Date: Sun, 5 Mar 2017 09:08:18 +0000
From: "Maxime Villard"
Subject: CVS commit: src/sys/arch/x86/include
To: source-changes@NetBSD.org

Module Name:    src
Committed By:   maxv
Date:           Sun Mar  5 09:08:18 UTC 2017

Modified Files:
        src/sys/arch/x86/include: pmap.h

Log Message:
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.

On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor: each syscall/trap is
intercepted by Xen and forwarded to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages
flushed, Xen marks the userland pages as global while keeping the kernel
ones local. This way, when returning to userland, only the kernel pages
get flushed, which makes sense since they are the only ones that got
removed from the mapping. Xen differentiates the userland pages by looking
at the PG_u bit in their PTE: if a page has this bit set, Xen tags it as
global; otherwise Xen adds the bit itself but keeps the page local.

The problem is that, since we set PG_u on the kernel pages, Xen believes
our kernel pages are in fact userland pages, so it marks them as global.
Therefore, when returning to userland, the kernel pages do get removed from
the page tree, but they are not flushed from the TLB, which means that they
are still accessible.

As a result, and depending on the DTLB size, userland has a small window in
which it can read and write the most recently accessed kernel pages. This is
enough to fully escalate privileges: the sysent structure is read on every
syscall, so chances are that it will still be cached in the TLB. Userland
can then use this to patch a chosen syscall, make it point to a userland
function, retrieve %gs and compute the address of its credentials, and
finally grant itself root privileges.

To generate a diff of this commit:
cvs rdiff -u -r1.62 -r1.63 src/sys/arch/x86/include/pmap.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.