Now
MAIN commitmail json YAML
src/sys/arch/x86/x86/x86_tlb.c@1.6
/
diff
/
nxr@1.6
src/sys/dev/nvmm/x86/nvmm_x86_svm.c@1.29 / diff / nxr@1.29
src/sys/dev/nvmm/x86/nvmm_x86_vmx.c@1.9 / diff / nxr@1.9
src/sys/dev/nvmm/x86/nvmm_x86_svm.c@1.29 / diff / nxr@1.29
src/sys/dev/nvmm/x86/nvmm_x86_vmx.c@1.9 / diff / nxr@1.9
Another locking issue in NVMM: the {svm,vmx}_tlb_flush functions take VCPU
mutexes which can sleep, but their context does not allow it.
Rewrite the TLB handling code to fix that. It becomes a bit complex. In
short, we use a per-VM generation number, which we increase on each TLB
flush, before sending a broadcast IPI to everybody. The IPIs cause a
#VMEXIT of each VCPU, and each VCPU Loop will synchronize the per-VM gen
with a per-VCPU copy, and apply the flushes as neededi lazily.
The behavior differs between AMD and Intel; in short, on Intel we don't
flush the hTLB (EPT cache) if a context switch of a VCPU occurs, so now,
we need to maintain a kcpuset to know which VCPU's hTLBs are active on
which hCPU. This creates some redundancy on Intel, ie there are cases
where we flush the hTLB several times unnecessarily; but hTLB flushes are
very rare, so there is no real performance regression.
The thing is lock-less and non-blocking, so it solves our problem.
mutexes which can sleep, but their context does not allow it.
Rewrite the TLB handling code to fix that. It becomes a bit complex. In
short, we use a per-VM generation number, which we increase on each TLB
flush, before sending a broadcast IPI to everybody. The IPIs cause a
#VMEXIT of each VCPU, and each VCPU Loop will synchronize the per-VM gen
with a per-VCPU copy, and apply the flushes as neededi lazily.
The behavior differs between AMD and Intel; in short, on Intel we don't
flush the hTLB (EPT cache) if a context switch of a VCPU occurs, so now,
we need to maintain a kcpuset to know which VCPU's hTLBs are active on
which hCPU. This creates some redundancy on Intel, ie there are cases
where we flush the hTLB several times unnecessarily; but hTLB flushes are
very rare, so there is no real performance regression.
The thing is lock-less and non-blocking, so it solves our problem.