Sun Mar 5 09:08:18 2017 UTC
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.

On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor: each syscall/trap is
intercepted by Xen and forwarded manually to the kernel. Before doing so,
the hypervisor modifies the page tables so that the kernel becomes
accessible. Later, when returning to userland, the hypervisor removes the
kernel pages and flushes the TLB.

However, TLB flushes are costly, and in order to reduce the number of pages
flushed, Xen marks the userland pages as global while keeping the kernel
ones local. This way, when returning to userland, only the kernel pages
get flushed, which makes sense since they are the only ones that were
removed from the mapping.
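
The effect of the global/local split can be illustrated with a toy model.
This is only a sketch under stated assumptions: struct tlb, tlb_fill(),
tlb_flush_local() and tlb_hit() are hypothetical names invented for the
illustration, not Xen or NetBSD code, and the 8-slot size is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SLOTS 8	/* arbitrary size for the illustration */

/* One cached translation in a toy TLB (hypothetical model, not Xen code). */
struct tlb_entry {
	bool valid;
	bool global;	/* mirrors the G bit of the PTE it was loaded from */
	uint64_t va;
};

struct tlb {
	struct tlb_entry slot[TLB_SLOTS];
};

/* Cache a translation; returns false if the toy TLB is full. */
static bool
tlb_fill(struct tlb *t, uint64_t va, bool global)
{
	for (size_t i = 0; i < TLB_SLOTS; i++) {
		if (!t->slot[i].valid) {
			t->slot[i] = (struct tlb_entry){ true, global, va };
			return true;
		}
	}
	return false;
}

/* Non-global flush: only local entries are dropped, global ones survive. */
static void
tlb_flush_local(struct tlb *t)
{
	for (size_t i = 0; i < TLB_SLOTS; i++) {
		if (!t->slot[i].global)
			t->slot[i].valid = false;
	}
}

/* Returns true if a translation for va is still cached. */
static bool
tlb_hit(const struct tlb *t, uint64_t va)
{
	for (size_t i = 0; i < TLB_SLOTS; i++) {
		if (t->slot[i].valid && t->slot[i].va == va)
			return true;
	}
	return false;
}
```

In this model, an entry filled with global set to true still hits after
tlb_flush_local(), while a local entry does not. That is the intended
behaviour for userland pages, and it is harmless only as long as kernel
pages are never tagged global.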

Xen differentiates the userland pages by looking at the PG_u bit in their
PTEs: if a page has this bit, Xen tags it as global; otherwise Xen manually
adds the bit but keeps the page local. Because we set PG_u on the kernel
pages, Xen believes they are in fact userland pages and marks them as
global. Therefore, when returning to userland, the kernel pages do get
removed from the page tree but are not flushed from the TLB, which means
that they are still accessible.
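
The tagging rule described above can be sketched as follows. This is a
simplified model, not Xen's actual code: xen_adjust_pte() and kernel_pte()
are hypothetical helpers, and pt_entry_t is redefined locally so the
example stands alone. The bit values are the standard x86 PTE bits.

```c
#include <stdint.h>

typedef uint64_t pt_entry_t;	/* local stand-in for the kernel type */

/* Standard x86 PTE bits. */
#define PG_V	0x001ULL	/* valid (present) */
#define PG_RW	0x002ULL	/* writable */
#define PG_u	0x004ULL	/* user accessible */
#define PG_G	0x100ULL	/* global: survives non-global TLB flushes */

/*
 * Hypothetical model of the rule from the commit message: a PTE that
 * already carries PG_u is treated as a userland page and tagged global;
 * any other PTE gets PG_u added but stays local.
 */
static pt_entry_t
xen_adjust_pte(pt_entry_t pte)
{
	if (pte & PG_u)
		return pte | PG_G;
	return (pte | PG_u) & ~PG_G;
}

/* Build a kernel mapping for pa, with pg_k playing the role of PG_k. */
static pt_entry_t
kernel_pte(uint64_t pa, pt_entry_t pg_k)
{
	return pa | PG_V | PG_RW | pg_k;
}
```

With pg_k equal to PG_u (the old Xen/amd64 definition of PG_k), the
adjusted PTE ends up with PG_G set; with pg_k equal to 0, it stays local
and is flushed on return to userland.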

With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.


(maxv)
diff -r1.62 -r1.63 src/sys/arch/x86/include/pmap.h

--- src/sys/arch/x86/include/pmap.h 2017/02/11 14:11:24 1.62
+++ src/sys/arch/x86/include/pmap.h 2017/03/05 09:08:18 1.63
@@ -1,14 +1,14 @@
-/* $NetBSD: pmap.h,v 1.62 2017/02/11 14:11:24 maxv Exp $ */
+/* $NetBSD: pmap.h,v 1.63 2017/03/05 09:08:18 maxv Exp $ */
 
 /*
  * Copyright (c) 1997 Charles D. Cranor and Washington University.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  * notice, this list of conditions and the following disclaimer in the
  * documentation and/or other materials provided with the distribution.
@@ -170,35 +170,27 @@ struct pmap {
 	uint64_t pm_ncsw; /* for assertions */
 	struct vm_page *pm_gc_ptp; /* pages from pmap g/c */
 };
 
 /* macro to access pm_pdirpa slots */
 #ifdef PAE
 #define pmap_pdirpa(pmap, index) \
 	((pmap)->pm_pdirpa[l2tol3(index)] + l2tol2(index) * sizeof(pd_entry_t))
 #else
 #define pmap_pdirpa(pmap, index) \
 	((pmap)->pm_pdirpa[0] + (index) * sizeof(pd_entry_t))
 #endif
 
-/*
- * flag to be used for kernel mappings: PG_u on Xen/amd64,
- * 0 otherwise.
- */
-#if defined(XEN) && defined(__x86_64__)
-#define PG_k PG_u
-#else
 #define PG_k 0
-#endif
 
 /*
  * MD flags that we use for pmap_enter and pmap_kenter_pa:
  */
 
 /*
  * global kernel variables
  */
 
 /*
  * PDPpaddr is the physical address of the kernel's PDP.
  * - i386 non-PAE and amd64: PDPpaddr corresponds directly to the %cr3
  * value associated to the kernel process, proc0.