Sun Mar 5 09:08:18 2017 UTC
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.

On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is enforced by the hypervisor: each syscall/trap is
intercepted by Xen and forwarded to the kernel. Before doing so, the
hypervisor modifies the page tables so that the kernel pages become
accessible. Later, when returning to userland, the hypervisor removes
the kernel pages from the page tables and flushes the TLB.
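
As a toy model of that round trip (the names and structure below are
purely illustrative, not Xen or NetBSD code): the only thing standing
between userland and the kernel is whether the kernel pages are
currently present in the guest's page tables.

#include <stdbool.h>

/* Hypothetical model of the guest's current view of memory. */
struct guest_view {
	bool kernel_mapped;		/* are the kernel pages reachable? */
};

static void
enter_guest_kernel(struct guest_view *v)
{
	v->kernel_mapped = true;	/* Xen maps the kernel pages in */
}

static void
return_to_userland(struct guest_view *v)
{
	v->kernel_mapped = false;	/* Xen removes them again... */
	/* ...and must flush the now-stale TLB entries for them. */
}

int
main(void)
{
	struct guest_view v = { .kernel_mapped = false };
	enter_guest_kernel(&v);		/* syscall/trap intercepted by Xen */
	return_to_userland(&v);
	return v.kernel_mapped;		/* 0: kernel not reachable from userland */
}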

However, TLB flushes are costly, so to reduce the number of pages
flushed, Xen marks the userland pages as global while keeping the
kernel ones local. This way, when returning to userland, only the
kernel pages get flushed - which makes sense, since they are the only
ones that were removed from the mapping.
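
For illustration, a minimal self-contained sketch of that distinction
(not Xen code; PG_G mirrors the architectural x86 "global" PTE bit,
and the helper name is made up):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pt_entry_t;

#define PG_V	0x0000000000000001ULL	/* x86 PTE bit 0: valid */
#define PG_G	0x0000000000000100ULL	/* x86 PTE bit 8: global */

/*
 * A local (non-global) TLB flush - the kind performed when switching
 * back to the userland view - evicts only non-global entries;
 * translations marked global survive it.
 */
static bool
survives_local_flush(pt_entry_t pte)
{
	return (pte & PG_G) != 0;
}

int
main(void)
{
	printf("userland page (global): %d\n", survives_local_flush(PG_V | PG_G));
	printf("kernel page (local):    %d\n", survives_local_flush(PG_V));
	return 0;
}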

Xen identifies userland pages by looking at the PG_u bit in the PTE:
if a page has this bit set, Xen tags it as global; otherwise Xen adds
the bit itself but keeps the page local. The problem is that, since we
set PG_u on the kernel pages, Xen believes our kernel pages are in fact
userland pages, and therefore marks them as global. As a result, when
returning to userland, the kernel pages do get removed from the page
tree, but they are not flushed from the TLB - which means they are
still accessible.
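
The consequence follows directly from the old and new PG_k definitions
in the diff below. The self-contained sketch here shows why the old
definition made kernel translations survive the flush; the predicate is
a simplification of Xen's behaviour as described above (not its actual
code), and the _OLD/_NEW suffixes are mine, added for the comparison.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pt_entry_t;

#define PG_V	0x0000000000000001ULL	/* valid */
#define PG_RW	0x0000000000000002ULL	/* writable */
#define PG_u	0x0000000000000004ULL	/* user-accessible */

/* Old Xen/amd64 definition, removed by this commit. */
#define PG_k_OLD	PG_u
/* New definition: kernel mappings no longer carry PG_u. */
#define PG_k_NEW	0

/* Simplified model of the behaviour described above. */
static bool
xen_marks_global(pt_entry_t pte)
{
	return (pte & PG_u) != 0;	/* PG_u set => "userland" => global */
}

int
main(void)
{
	pt_entry_t old_kpte = PG_V | PG_RW | PG_k_OLD;
	pt_entry_t new_kpte = PG_V | PG_RW | PG_k_NEW;

	/* old: 1 -> kernel translation is NOT flushed on return to userland */
	printf("old PG_k: global=%d\n", xen_marks_global(old_kpte));
	/* new: 0 -> it is flushed along with the rest of the kernel pages */
	printf("new PG_k: global=%d\n", xen_marks_global(new_kpte));
	return 0;
}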

With this - and depending on the DTLB size - userland has a small
window during which it can read/write the last kernel pages accessed,
which is enough to completely escalate privileges: the sysent structure
is systematically read when performing a syscall, and chances are it
will still be cached in the TLB. Userland can then use this to patch a
chosen syscall, make it point to a userland function, retrieve %gs and
compute the address of its credentials, and finally grant itself root
privileges.


(maxv)
diff -r1.62 -r1.63 src/sys/arch/x86/include/pmap.h

cvs diff -r1.62 -r1.63 src/sys/arch/x86/include/pmap.h

--- src/sys/arch/x86/include/pmap.h 2017/02/11 14:11:24 1.62
+++ src/sys/arch/x86/include/pmap.h 2017/03/05 09:08:18 1.63
@@ -1,4 +1,4 @@
-/*	$NetBSD: pmap.h,v 1.62 2017/02/11 14:11:24 maxv Exp $	*/
+/*	$NetBSD: pmap.h,v 1.63 2017/03/05 09:08:18 maxv Exp $	*/
 
 /*
  * Copyright (c) 1997 Charles D. Cranor and Washington University.
@@ -180,15 +180,7 @@
 	((pmap)->pm_pdirpa[0] + (index) * sizeof(pd_entry_t))
 #endif
 
-/*
- * flag to be used for kernel mappings: PG_u on Xen/amd64,
- * 0 otherwise.
- */
-#if defined(XEN) && defined(__x86_64__)
-#define PG_k PG_u
-#else
 #define PG_k 0
-#endif
 
 /*
  * MD flags that we use for pmap_enter and pmap_kenter_pa: