Date: Thu, 16 Jul 2020 09:57:17 +0000
From: "Manuel Bouyer"
Subject: CVS commit: pkgsrc/sysutils/xenkernel411
To: pkgsrc-changes@NetBSD.org
Reply-To: bouyer@netbsd.org

Module Name:    pkgsrc
Committed By:   bouyer
Date:           Thu Jul 16 09:57:17 UTC 2020

Modified Files:
        pkgsrc/sysutils/xenkernel411: Makefile distinfo
Added Files:
        pkgsrc/sysutils/xenkernel411/patches: patch-XSA317 patch-XSA319
            patch-XSA320 patch-XSA321 patch-XSA328

Log Message:
Add patches for Xen Security Advisories XSA317, XSA319, XSA320, XSA321
and XSA328

Bump PKGREVISION

To generate a diff of this commit:
cvs rdiff -u -r1.13 -r1.14 pkgsrc/sysutils/xenkernel411/Makefile
cvs rdiff -u -r1.11 -r1.12 pkgsrc/sysutils/xenkernel411/distinfo
cvs rdiff -u -r0 -r1.1 pkgsrc/sysutils/xenkernel411/patches/patch-XSA317 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA319 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA320 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA321 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA328

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
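For users tracking pkgsrc from CVS, a minimal sketch of picking up this
update is shown below; the /usr/pkgsrc location and the choice between
"make package" and "make replace" are assumptions about the local setup,
not part of the commit.

# refresh the package directory to Makefile 1.14, distinfo 1.12 and the
# new patch-XSA* files
cd /usr/pkgsrc/sysutils/xenkernel411
cvs update -dP

# rebuild; the patch phase applies the patch-XSA* files and checks them
# against the checksums recorded in distinfo
make package
# or, to replace an already installed xenkernel411 package in place:
# make replace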
--_----------=_1594893437232860 Content-Disposition: inline Content-Length: 52086 Content-Transfer-Encoding: binary Content-Type: text/x-diff; charset=us-ascii Modified files: Index: pkgsrc/sysutils/xenkernel411/Makefile diff -u pkgsrc/sysutils/xenkernel411/Makefile:1.13 pkgsrc/sysutils/xenkernel411/Makefile:1.14 --- pkgsrc/sysutils/xenkernel411/Makefile:1.13 Wed Apr 15 15:37:19 2020 +++ pkgsrc/sysutils/xenkernel411/Makefile Thu Jul 16 09:57:17 2020 @@ -1,7 +1,7 @@ -# $NetBSD: Makefile,v 1.13 2020/04/15 15:37:19 bouyer Exp $ +# $NetBSD: Makefile,v 1.14 2020/07/16 09:57:17 bouyer Exp $ VERSION= 4.11.3 -PKGREVISION= 2 +PKGREVISION= 3 DISTNAME= xen-${VERSION} PKGNAME= xenkernel411-${VERSION} CATEGORIES= sysutils Index: pkgsrc/sysutils/xenkernel411/distinfo diff -u pkgsrc/sysutils/xenkernel411/distinfo:1.11 pkgsrc/sysutils/xenkernel411/distinfo:1.12 --- pkgsrc/sysutils/xenkernel411/distinfo:1.11 Wed Apr 15 15:45:04 2020 +++ pkgsrc/sysutils/xenkernel411/distinfo Thu Jul 16 09:57:17 2020 @@ -1,4 +1,4 @@ -$NetBSD: distinfo,v 1.11 2020/04/15 15:45:04 bouyer Exp $ +$NetBSD: distinfo,v 1.12 2020/07/16 09:57:17 bouyer Exp $ SHA1 (xen411/xen-4.11.3.tar.gz) = 2d77152168d6f9dcea50db9cb8e3e6a0720a4a1b RMD160 (xen411/xen-4.11.3.tar.gz) = cfb2e699842867b60d25a01963c564a6c5e580da @@ -12,7 +12,12 @@ SHA1 (patch-XSA310) = 77b711f4b75de1d473 SHA1 (patch-XSA311) = 4d3e6cc39c2b95cb3339961271df2bc885667927 SHA1 (patch-XSA313) = b2f281d6aed1207727cd454dcb5e914c7f6fb44b SHA1 (patch-XSA316) = 9cce683315e4c1ca6d53b578e69ae71e1db2b3eb +SHA1 (patch-XSA317) = 3a3e7bf8f115bebaf56001afcf68c2bd501c00a5 SHA1 (patch-XSA318) = d0dcbb99ab584098aed7995a7a05d5bf4ac28d47 +SHA1 (patch-XSA319) = 4954bdc849666e1c735c3281256e4850c0594ee8 +SHA1 (patch-XSA320) = 38d84a2ded4ccacee455ba64eb3b369e5661fbfd +SHA1 (patch-XSA321) = 5281304282a26ee252344ec26b07d25ac4ce8b54 +SHA1 (patch-XSA328) = a9b02c183a5dbfb6c0fe50824f18896fcab4a9e9 SHA1 (patch-xen_Makefile) = 465388d80de414ca3bb84faefa0f52d817e423a6 SHA1 (patch-xen_Rules.mk) = c743dc63f51fc280d529a7d9e08650292c171dac SHA1 (patch-xen_arch_x86_Rules.mk) = 0bedfc53a128a87b6a249ae04fbdf6a053bfb70b Added files: Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA317 diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA317:1.1 --- /dev/null Thu Jul 16 09:57:17 2020 +++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA317 Thu Jul 16 09:57:17 2020 @@ -0,0 +1,52 @@ +$NetBSD: patch-XSA317,v 1.1 2020/07/16 09:57:17 bouyer Exp $ + +From aeb46e92f915f19a61d5a8a1f4b696793f64e6fb Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Thu, 19 Mar 2020 13:17:31 +0000 +Subject: [PATCH] xen/common: event_channel: Don't ignore error in + get_free_port() + +Currently, get_free_port() is assuming that the port has been allocated +when evtchn_allocate_port() is not return -EBUSY. + +However, the function may return an error when: + - We exhausted all the event channels. This can happen if the limit + configured by the administrator for the guest ('max_event_channels' + in xl cfg) is higher than the ABI used by the guest. For instance, + if the guest is using 2L, the limit should not be higher than 4095. + - We cannot allocate memory (e.g Xen has not more memory). + +Users of get_free_port() (such as EVTCHNOP_alloc_unbound) will validly +assuming the port was valid and will next call evtchn_from_port(). This +will result to a crash as the memory backing the event channel structure +is not present. 
+ +Fixes: 368ae9a05fe ("xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU") +Signed-off-by: Julien Grall +Reviewed-by: Jan Beulich +--- + xen/common/event_channel.c | 8 ++++---- + 1 file changed, 4 insertions(+), 4 deletions(-) + +diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c +index e86e2bfab0..a8d182b584 100644 +--- xen/common/event_channel.c.orig ++++ xen/common/event_channel.c +@@ -195,10 +195,10 @@ static int get_free_port(struct domain *d) + { + int rc = evtchn_allocate_port(d, port); + +- if ( rc == -EBUSY ) +- continue; +- +- return port; ++ if ( rc == 0 ) ++ return port; ++ else if ( rc != -EBUSY ) ++ return rc; + } + + return -ENOSPC; +-- +2.17.1 + Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA319 diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA319:1.1 --- /dev/null Thu Jul 16 09:57:17 2020 +++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA319 Thu Jul 16 09:57:17 2020 @@ -0,0 +1,29 @@ +$NetBSD: patch-XSA319,v 1.1 2020/07/16 09:57:17 bouyer Exp $ + +From: Jan Beulich +Subject: x86/shadow: correct an inverted conditional in dirty VRAM tracking + +This originally was "mfn_x(mfn) == INVALID_MFN". Make it like this +again, taking the opportunity to also drop the unnecessary nearby +braces. + +This is XSA-319. + +Fixes: 246a5a3377c2 ("xen: Use a typesafe to define INVALID_MFN") +Signed-off-by: Jan Beulich +Reviewed-by: Andrew Cooper + +--- xen/arch/x86/mm/shadow/common.c.orig ++++ xen/arch/x86/mm/shadow/common.c +@@ -3252,10 +3252,8 @@ int shadow_track_dirty_vram(struct domai + int dirty = 0; + paddr_t sl1ma = dirty_vram->sl1ma[i]; + +- if ( !mfn_eq(mfn, INVALID_MFN) ) +- { ++ if ( mfn_eq(mfn, INVALID_MFN) ) + dirty = 1; +- } + else + { + page = mfn_to_page(mfn); Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA320 diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA320:1.1 --- /dev/null Thu Jul 16 09:57:17 2020 +++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA320 Thu Jul 16 09:57:17 2020 @@ -0,0 +1,371 @@ +$NetBSD: patch-XSA320,v 1.1 2020/07/16 09:57:17 bouyer Exp $ + +From: Andrew Cooper +Subject: x86/spec-ctrl: CPUID/MSR definitions for Special Register Buffer Data Sampling + +This is part of XSA-320 / CVE-2020-0543 + +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich +Acked-by: Wei Liu + +diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown +index 194615bfc5..9be18ac99f 100644 +--- docs/misc/xen-command-line.markdown.orig ++++ docs/misc/xen-command-line.markdown +@@ -489,10 +489,10 @@ accounting for hardware capabilities as enumerated via CPUID. + + Currently accepted: + +-The Speculation Control hardware features `md-clear`, `ibrsb`, `stibp`, `ibpb`, +-`l1d-flush` and `ssbd` are used by default if available and applicable. They can +-be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and +-won't offer them to guests. ++The Speculation Control hardware features `srbds-ctrl`, `md-clear`, `ibrsb`, ++`stibp`, `ibpb`, `l1d-flush` and `ssbd` are used by default if available and ++applicable. They can be ignored, e.g. `no-ibrsb`, at which point Xen won't ++use them itself, and won't offer them to guests. 
+ + ### cpuid\_mask\_cpu (AMD only) + > `= fam_0f_rev_c | fam_0f_rev_d | fam_0f_rev_e | fam_0f_rev_f | fam_0f_rev_g | fam_10_rev_b | fam_10_rev_c | fam_11_rev_b` +diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c +index 5a1702d703..1235c8b91e 100644 +--- tools/libxl/libxl_cpuid.c.orig ++++ tools/libxl/libxl_cpuid.c +@@ -202,6 +202,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str) + + {"avx512-4vnniw",0x00000007, 0, CPUID_REG_EDX, 2, 1}, + {"avx512-4fmaps",0x00000007, 0, CPUID_REG_EDX, 3, 1}, ++ {"srbds-ctrl", 0x00000007, 0, CPUID_REG_EDX, 9, 1}, + {"md-clear", 0x00000007, 0, CPUID_REG_EDX, 10, 1}, + {"ibrsb", 0x00000007, 0, CPUID_REG_EDX, 26, 1}, + {"stibp", 0x00000007, 0, CPUID_REG_EDX, 27, 1}, +diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c +index 4c9af6b7f0..8fb54c3001 100644 +--- tools/misc/xen-cpuid.c.orig ++++ tools/misc/xen-cpuid.c +@@ -142,6 +142,7 @@ static const char *str_7d0[32] = + { + [ 2] = "avx512_4vnniw", [ 3] = "avx512_4fmaps", + ++ /* 8 */ [ 9] = "srbds-ctrl", + [10] = "md-clear", + /* 12 */ [13] = "tsx-force-abort", + +diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c +index 04aefa555d..b8e5b6fe67 100644 +--- xen/arch/x86/cpuid.c.orig ++++ xen/arch/x86/cpuid.c +@@ -58,6 +58,11 @@ static int __init parse_xen_cpuid(const char *s) + if ( !val ) + setup_clear_cpu_cap(X86_FEATURE_SSBD); + } ++ else if ( (val = parse_boolean("srbds-ctrl", s, ss)) >= 0 ) ++ { ++ if ( !val ) ++ setup_clear_cpu_cap(X86_FEATURE_SRBDS_CTRL); ++ } + else + rc = -EINVAL; + +diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c +index ccb316c547..256e58d82b 100644 +--- xen/arch/x86/msr.c.orig ++++ xen/arch/x86/msr.c +@@ -154,6 +154,7 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val) + /* Write-only */ + case MSR_TSX_FORCE_ABORT: + case MSR_TSX_CTRL: ++ case MSR_MCU_OPT_CTRL: + /* Not offered to guests. */ + goto gp_fault; + +@@ -243,6 +244,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) + /* Read-only */ + case MSR_TSX_FORCE_ABORT: + case MSR_TSX_CTRL: ++ case MSR_MCU_OPT_CTRL: + /* Not offered to guests. */ + goto gp_fault; + +diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c +index ab196b156d..94ab8dd786 100644 +--- xen/arch/x86/spec_ctrl.c.orig ++++ xen/arch/x86/spec_ctrl.c +@@ -365,12 +365,13 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps) + printk("Speculative mitigation facilities:\n"); + + /* Hardware features which pertain to speculative mitigations. */ +- printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n", ++ printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n", + (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "", + (_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP" : "", + (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH)) ? " L1D_FLUSH" : "", + (_7d0 & cpufeat_mask(X86_FEATURE_SSBD)) ? " SSBD" : "", + (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "", ++ (_7d0 & cpufeat_mask(X86_FEATURE_SRBDS_CTRL)) ? " SRBDS_CTRL" : "", + (e8b & cpufeat_mask(X86_FEATURE_IBPB)) ? " IBPB" : "", + (caps & ARCH_CAPS_IBRS_ALL) ? " IBRS_ALL" : "", + (caps & ARCH_CAPS_RDCL_NO) ? 
" RDCL_NO" : "", +diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h +index 1761a01f1f..480d1d8102 100644 +--- xen/include/asm-x86/msr-index.h.orig ++++ xen/include/asm-x86/msr-index.h +@@ -177,6 +177,9 @@ + #define MSR_IA32_VMX_TRUE_ENTRY_CTLS 0x490 + #define MSR_IA32_VMX_VMFUNC 0x491 + ++#define MSR_MCU_OPT_CTRL 0x00000123 ++#define MCU_OPT_CTRL_RNGDS_MITG_DIS (_AC(1, ULL) << 0) ++ + /* K7/K8 MSRs. Not complete. See the architecture manual for a more + complete list. */ + #define MSR_K7_EVNTSEL0 0xc0010000 +diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h +index a14d8a7013..9d210e74a0 100644 +--- xen/include/public/arch-x86/cpufeatureset.h.orig ++++ xen/include/public/arch-x86/cpufeatureset.h +@@ -242,6 +242,7 @@ XEN_CPUFEATURE(IBPB, 8*32+12) /*A IBPB support only (no IBRS, used by + /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */ + XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A AVX512 Neural Network Instructions */ + XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A AVX512 Multiply Accumulation Single Precision */ ++XEN_CPUFEATURE(SRBDS_CTRL, 9*32+ 9) /* MSR_MCU_OPT_CTRL and RNGDS_MITG_DIS. */ + XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /*A VERW clears microarchitectural buffers */ + XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */ + XEN_CPUFEATURE(IBRSB, 9*32+26) /*A IBRS and IBPB support (used by Intel) */ +From: Andrew Cooper +Subject: x86/spec-ctrl: Mitigate the Special Register Buffer Data Sampling sidechannel + +See patch documentation and comments. + +This is part of XSA-320 / CVE-2020-0543 + +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich + +diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown +index 9be18ac99f..3356e59fee 100644 +--- docs/misc/xen-command-line.markdown.orig ++++ docs/misc/xen-command-line.markdown +@@ -1858,7 +1858,7 @@ false disable the quirk workaround, which is also the default. + ### spec-ctrl (x86) + > `= List of [ , xen=, {pv,hvm,msr-sc,rsb,md-clear}=, + > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu, +-> l1d-flush}= ]` ++> l1d-flush,srb-lock}= ]` + + Controls for speculative execution sidechannel mitigations. By default, Xen + will pick the most appropriate mitigations based on compiled in support, +@@ -1930,6 +1930,12 @@ Irrespective of Xen's setting, the feature is virtualised for HVM guests to + use. By default, Xen will enable this mitigation on hardware believed to be + vulnerable to L1TF. + ++On hardware supporting SRBDS_CTRL, the `srb-lock=` option can be used to force ++or prevent Xen from protect the Special Register Buffer from leaking stale ++data. By default, Xen will enable this mitigation, except on parts where MDS ++is fixed and TAA is fixed/mitigated (in which case, there is believed to be no ++way for an attacker to obtain the stale data). 
++ + ### sync\_console + > `= ` + +diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c +index 4c12794809..30e1bd5cd3 100644 +--- xen/arch/x86/acpi/power.c.orig ++++ xen/arch/x86/acpi/power.c +@@ -266,6 +266,9 @@ static int enter_state(u32 state) + ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr); + spec_ctrl_exit_idle(ci); + ++ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) ) ++ wrmsrl(MSR_MCU_OPT_CTRL, default_xen_mcu_opt_ctrl); ++ + done: + spin_debug_enable(); + local_irq_restore(flags); +diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c +index 0887806e85..d24d215946 100644 +--- xen/arch/x86/smpboot.c.orig ++++ xen/arch/x86/smpboot.c +@@ -369,12 +369,14 @@ void start_secondary(void *unused) + microcode_resume_cpu(cpu); + + /* +- * If MSR_SPEC_CTRL is available, apply Xen's default setting and discard +- * any firmware settings. Note: MSR_SPEC_CTRL may only become available +- * after loading microcode. ++ * If any speculative control MSRs are available, apply Xen's default ++ * settings. Note: These MSRs may only become available after loading ++ * microcode. + */ + if ( boot_cpu_has(X86_FEATURE_IBRSB) ) + wrmsrl(MSR_SPEC_CTRL, default_xen_spec_ctrl); ++ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) ) ++ wrmsrl(MSR_MCU_OPT_CTRL, default_xen_mcu_opt_ctrl); + + tsx_init(); /* Needs microcode. May change HLE/RTM feature bits. */ + +diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c +index 94ab8dd786..a306d10c34 100644 +--- xen/arch/x86/spec_ctrl.c.orig ++++ xen/arch/x86/spec_ctrl.c +@@ -63,6 +63,9 @@ static unsigned int __initdata l1d_maxphysaddr; + static bool __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */ + static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination. */ + ++static int8_t __initdata opt_srb_lock = -1; ++uint64_t __read_mostly default_xen_mcu_opt_ctrl; ++ + static int __init parse_bti(const char *s) + { + const char *ss; +@@ -166,6 +169,7 @@ static int __init parse_spec_ctrl(const char *s) + opt_ibpb = false; + opt_ssbd = false; + opt_l1d_flush = 0; ++ opt_srb_lock = 0; + } + else if ( val > 0 ) + rc = -EINVAL; +@@ -231,6 +235,8 @@ static int __init parse_spec_ctrl(const char *s) + opt_eager_fpu = val; + else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 ) + opt_l1d_flush = val; ++ else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 ) ++ opt_srb_lock = val; + else + rc = -EINVAL; + +@@ -394,7 +400,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps) + "\n"); + + /* Settings for Xen's protection, irrespective of guests. */ +- printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s, Other:%s%s%s\n", ++ printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s, Other:%s%s%s%s\n", + thunk == THUNK_NONE ? "N/A" : + thunk == THUNK_RETPOLINE ? "RETPOLINE" : + thunk == THUNK_LFENCE ? "LFENCE" : +@@ -405,6 +411,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps) + (default_xen_spec_ctrl & SPEC_CTRL_SSBD) ? " SSBD+" : " SSBD-", + !(caps & ARCH_CAPS_TSX_CTRL) ? "" : + (opt_tsx & 1) ? " TSX+" : " TSX-", ++ !boot_cpu_has(X86_FEATURE_SRBDS_CTRL) ? "" : ++ opt_srb_lock ? " SRB_LOCK+" : " SRB_LOCK-", + opt_ibpb ? " IBPB" : "", + opt_l1d_flush ? " L1D_FLUSH" : "", + opt_md_clear_pv || opt_md_clear_hvm ? 
" VERW" : ""); +@@ -1196,6 +1204,34 @@ void __init init_speculation_mitigations(void) + tsx_init(); + } + ++ /* Calculate suitable defaults for MSR_MCU_OPT_CTRL */ ++ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) ) ++ { ++ uint64_t val; ++ ++ rdmsrl(MSR_MCU_OPT_CTRL, val); ++ ++ /* ++ * On some SRBDS-affected hardware, it may be safe to relax srb-lock ++ * by default. ++ * ++ * On parts which enumerate MDS_NO and not TAA_NO, TSX is the only way ++ * to access the Fill Buffer. If TSX isn't available (inc. SKU ++ * reasons on some models), or TSX is explicitly disabled, then there ++ * is no need for the extra overhead to protect RDRAND/RDSEED. ++ */ ++ if ( opt_srb_lock == -1 && ++ (caps & (ARCH_CAPS_MDS_NO|ARCH_CAPS_TAA_NO)) == ARCH_CAPS_MDS_NO && ++ (!cpu_has_hle || ((caps & ARCH_CAPS_TSX_CTRL) && opt_tsx == 0)) ) ++ opt_srb_lock = 0; ++ ++ val &= ~MCU_OPT_CTRL_RNGDS_MITG_DIS; ++ if ( !opt_srb_lock ) ++ val |= MCU_OPT_CTRL_RNGDS_MITG_DIS; ++ ++ default_xen_mcu_opt_ctrl = val; ++ } ++ + print_details(thunk, caps); + + /* +@@ -1227,6 +1263,9 @@ void __init init_speculation_mitigations(void) + + wrmsrl(MSR_SPEC_CTRL, bsp_delay_spec_ctrl ? 0 : default_xen_spec_ctrl); + } ++ ++ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) ) ++ wrmsrl(MSR_MCU_OPT_CTRL, default_xen_mcu_opt_ctrl); + } + + static void __init __maybe_unused build_assertions(void) +diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h +index 333d180b7e..bf10d2ce5c 100644 +--- xen/include/asm-x86/spec_ctrl.h.orig ++++ xen/include/asm-x86/spec_ctrl.h +@@ -46,6 +46,8 @@ extern int8_t opt_pv_l1tf_hwdom, opt_pv_l1tf_domu; + */ + extern paddr_t l1tf_addr_mask, l1tf_safe_maddr; + ++extern uint64_t default_xen_mcu_opt_ctrl; ++ + static inline void init_shadow_spec_ctrl_state(void) + { + struct cpu_info *info = get_cpu_info(); +From: Andrew Cooper +Subject: x86/spec-ctrl: Allow the RDRAND/RDSEED features to be hidden + +RDRAND/RDSEED can be hidden using cpuid= to mitigate SRBDS if microcode +isn't available. + +This is part of XSA-320 / CVE-2020-0543. + +Signed-off-by: Andrew Cooper +Acked-by: Julien Grall + +diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown +index 3356e59fee..ac397e7de0 100644 +--- docs/misc/xen-command-line.markdown.orig ++++ docs/misc/xen-command-line.markdown +@@ -487,12 +487,18 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels. + This option allows for fine tuning of the facilities Xen will use, after + accounting for hardware capabilities as enumerated via CPUID. + ++Unless otherwise noted, options only have any effect in their negative form, ++to hide the named feature(s). Ignoring a feature using this mechanism will ++cause Xen not to use the feature, nor offer them as usable to guests. ++ + Currently accepted: + + The Speculation Control hardware features `srbds-ctrl`, `md-clear`, `ibrsb`, + `stibp`, `ibpb`, `l1d-flush` and `ssbd` are used by default if available and +-applicable. They can be ignored, e.g. `no-ibrsb`, at which point Xen won't +-use them itself, and won't offer them to guests. ++applicable. They can all be ignored. ++ ++`rdrand` and `rdseed` can be ignored, as a mitigation to XSA-320 / ++CVE-2020-0543. 
+ + ### cpuid\_mask\_cpu (AMD only) + > `= fam_0f_rev_c | fam_0f_rev_d | fam_0f_rev_e | fam_0f_rev_f | fam_0f_rev_g | fam_10_rev_b | fam_10_rev_c | fam_11_rev_b` +diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c +index b8e5b6fe67..78d08dbb32 100644 +--- xen/arch/x86/cpuid.c.orig ++++ xen/arch/x86/cpuid.c +@@ -63,6 +63,16 @@ static int __init parse_xen_cpuid(const char *s) + if ( !val ) + setup_clear_cpu_cap(X86_FEATURE_SRBDS_CTRL); + } ++ else if ( (val = parse_boolean("rdrand", s, ss)) >= 0 ) ++ { ++ if ( !val ) ++ setup_clear_cpu_cap(X86_FEATURE_RDRAND); ++ } ++ else if ( (val = parse_boolean("rdseed", s, ss)) >= 0 ) ++ { ++ if ( !val ) ++ setup_clear_cpu_cap(X86_FEATURE_RDSEED); ++ } + else + rc = -EINVAL; + Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA321 diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA321:1.1 --- /dev/null Thu Jul 16 09:57:17 2020 +++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA321 Thu Jul 16 09:57:17 2020 @@ -0,0 +1,586 @@ +$NetBSD: patch-XSA321,v 1.1 2020/07/16 09:57:17 bouyer Exp $ + +From: Jan Beulich +Subject: vtd: improve IOMMU TLB flush + +Do not limit PSI flushes to order 0 pages, in order to avoid doing a +full TLB flush if the passed in page has an order greater than 0 and +is aligned. Should increase the performance of IOMMU TLB flushes when +dealing with page orders greater than 0. + +This is part of XSA-321. + +Signed-off-by: Jan Beulich + +--- xen/drivers/passthrough/vtd/iommu.c.orig ++++ xen/drivers/passthrough/vtd/iommu.c +@@ -612,13 +612,14 @@ static int __must_check iommu_flush_iotl + if ( iommu_domid == -1 ) + continue; + +- if ( page_count != 1 || gfn == gfn_x(INVALID_GFN) ) ++ if ( !page_count || (page_count & (page_count - 1)) || ++ gfn == gfn_x(INVALID_GFN) || !IS_ALIGNED(gfn, page_count) ) + rc = iommu_flush_iotlb_dsi(iommu, iommu_domid, + 0, flush_dev_iotlb); + else + rc = iommu_flush_iotlb_psi(iommu, iommu_domid, + (paddr_t)gfn << PAGE_SHIFT_4K, +- PAGE_ORDER_4K, ++ get_order_from_pages(page_count), + !dma_old_pte_present, + flush_dev_iotlb); + +From: +Subject: vtd: prune (and rename) cache flush functions + +Rename __iommu_flush_cache to iommu_sync_cache and remove +iommu_flush_cache_page. Also remove the iommu_flush_cache_entry +wrapper and just use iommu_sync_cache instead. Note the _entry suffix +was meaningless as the wrapper was already taking a size parameter in +bytes. While there also constify the addr parameter. + +No functional change intended. + +This is part of XSA-321. 
+ +Reviewed-by: Jan Beulich + +--- xen/drivers/passthrough/vtd/extern.h.orig ++++ xen/drivers/passthrough/vtd/extern.h +@@ -37,8 +37,7 @@ void disable_qinval(struct iommu *iommu) + int enable_intremap(struct iommu *iommu, int eim); + void disable_intremap(struct iommu *iommu); + +-void iommu_flush_cache_entry(void *addr, unsigned int size); +-void iommu_flush_cache_page(void *addr, unsigned long npages); ++void iommu_sync_cache(const void *addr, unsigned int size); + int iommu_alloc(struct acpi_drhd_unit *drhd); + void iommu_free(struct acpi_drhd_unit *drhd); + +--- xen/drivers/passthrough/vtd/intremap.c.orig ++++ xen/drivers/passthrough/vtd/intremap.c +@@ -231,7 +231,7 @@ static void free_remap_entry(struct iomm + iremap_entries, iremap_entry); + + update_irte(iommu, iremap_entry, &new_ire, false); +- iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry)); ++ iommu_sync_cache(iremap_entry, sizeof(*iremap_entry)); + iommu_flush_iec_index(iommu, 0, index); + + unmap_vtd_domain_page(iremap_entries); +@@ -403,7 +403,7 @@ static int ioapic_rte_to_remap_entry(str + } + + update_irte(iommu, iremap_entry, &new_ire, !init); +- iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry)); ++ iommu_sync_cache(iremap_entry, sizeof(*iremap_entry)); + iommu_flush_iec_index(iommu, 0, index); + + unmap_vtd_domain_page(iremap_entries); +@@ -694,7 +694,7 @@ static int msi_msg_to_remap_entry( + update_irte(iommu, iremap_entry, &new_ire, msi_desc->irte_initialized); + msi_desc->irte_initialized = true; + +- iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry)); ++ iommu_sync_cache(iremap_entry, sizeof(*iremap_entry)); + iommu_flush_iec_index(iommu, 0, index); + + unmap_vtd_domain_page(iremap_entries); +--- xen/drivers/passthrough/vtd/iommu.c.orig ++++ xen/drivers/passthrough/vtd/iommu.c +@@ -158,7 +158,8 @@ static void __init free_intel_iommu(stru + } + + static int iommus_incoherent; +-static void __iommu_flush_cache(void *addr, unsigned int size) ++ ++void iommu_sync_cache(const void *addr, unsigned int size) + { + int i; + static unsigned int clflush_size = 0; +@@ -173,16 +174,6 @@ static void __iommu_flush_cache(void *ad + cacheline_flush((char *)addr + i); + } + +-void iommu_flush_cache_entry(void *addr, unsigned int size) +-{ +- __iommu_flush_cache(addr, size); +-} +- +-void iommu_flush_cache_page(void *addr, unsigned long npages) +-{ +- __iommu_flush_cache(addr, PAGE_SIZE * npages); +-} +- + /* Allocate page table, return its machine address */ + u64 alloc_pgtable_maddr(struct acpi_drhd_unit *drhd, unsigned long npages) + { +@@ -207,7 +198,7 @@ u64 alloc_pgtable_maddr(struct acpi_drhd + vaddr = __map_domain_page(cur_pg); + memset(vaddr, 0, PAGE_SIZE); + +- iommu_flush_cache_page(vaddr, 1); ++ iommu_sync_cache(vaddr, PAGE_SIZE); + unmap_domain_page(vaddr); + cur_pg++; + } +@@ -242,7 +233,7 @@ static u64 bus_to_context_maddr(struct i + } + set_root_value(*root, maddr); + set_root_present(*root); +- iommu_flush_cache_entry(root, sizeof(struct root_entry)); ++ iommu_sync_cache(root, sizeof(struct root_entry)); + } + maddr = (u64) get_context_addr(*root); + unmap_vtd_domain_page(root_entries); +@@ -300,7 +291,7 @@ static u64 addr_to_dma_page_maddr(struct + */ + dma_set_pte_readable(*pte); + dma_set_pte_writable(*pte); +- iommu_flush_cache_entry(pte, sizeof(struct dma_pte)); ++ iommu_sync_cache(pte, sizeof(struct dma_pte)); + } + + if ( level == 2 ) +@@ -674,7 +665,7 @@ static int __must_check dma_pte_clear_on + + dma_clear_pte(*pte); + spin_unlock(&hd->arch.mapping_lock); +- 
iommu_flush_cache_entry(pte, sizeof(struct dma_pte)); ++ iommu_sync_cache(pte, sizeof(struct dma_pte)); + + if ( !this_cpu(iommu_dont_flush_iotlb) ) + rc = iommu_flush_iotlb_pages(domain, addr >> PAGE_SHIFT_4K, 1); +@@ -716,7 +707,7 @@ static void iommu_free_page_table(struct + iommu_free_pagetable(dma_pte_addr(*pte), next_level); + + dma_clear_pte(*pte); +- iommu_flush_cache_entry(pte, sizeof(struct dma_pte)); ++ iommu_sync_cache(pte, sizeof(struct dma_pte)); + } + + unmap_vtd_domain_page(pt_vaddr); +@@ -1449,7 +1440,7 @@ int domain_context_mapping_one( + context_set_address_width(*context, agaw); + context_set_fault_enable(*context); + context_set_present(*context); +- iommu_flush_cache_entry(context, sizeof(struct context_entry)); ++ iommu_sync_cache(context, sizeof(struct context_entry)); + spin_unlock(&iommu->lock); + + /* Context entry was previously non-present (with domid 0). */ +@@ -1602,7 +1593,7 @@ int domain_context_unmap_one( + + context_clear_present(*context); + context_clear_entry(*context); +- iommu_flush_cache_entry(context, sizeof(struct context_entry)); ++ iommu_sync_cache(context, sizeof(struct context_entry)); + + iommu_domid= domain_iommu_domid(domain, iommu); + if ( iommu_domid == -1 ) +@@ -1828,7 +1819,7 @@ static int __must_check intel_iommu_map_ + + *pte = new; + +- iommu_flush_cache_entry(pte, sizeof(struct dma_pte)); ++ iommu_sync_cache(pte, sizeof(struct dma_pte)); + spin_unlock(&hd->arch.mapping_lock); + unmap_vtd_domain_page(page); + +@@ -1862,7 +1853,7 @@ int iommu_pte_flush(struct domain *d, u6 + int iommu_domid; + int rc = 0; + +- iommu_flush_cache_entry(pte, sizeof(struct dma_pte)); ++ iommu_sync_cache(pte, sizeof(struct dma_pte)); + + for_each_drhd_unit ( drhd ) + { +From: +Subject: x86/iommu: introduce a cache sync hook + +The hook is only implemented for VT-d and it uses the already existing +iommu_sync_cache function present in VT-d code. The new hook is +added so that the cache can be flushed by code outside of VT-d when +using shared page tables. + +Note that alloc_pgtable_maddr must use the now locally defined +sync_cache function, because IOMMU ops are not yet setup the first +time the function gets called during IOMMU initialization. + +No functional change intended. + +This is part of XSA-321. 
+ +Reviewed-by: Jan Beulich + +--- xen/drivers/passthrough/vtd/extern.h.orig ++++ xen/drivers/passthrough/vtd/extern.h +@@ -37,7 +37,6 @@ void disable_qinval(struct iommu *iommu) + int enable_intremap(struct iommu *iommu, int eim); + void disable_intremap(struct iommu *iommu); + +-void iommu_sync_cache(const void *addr, unsigned int size); + int iommu_alloc(struct acpi_drhd_unit *drhd); + void iommu_free(struct acpi_drhd_unit *drhd); + +--- xen/drivers/passthrough/vtd/iommu.c.orig ++++ xen/drivers/passthrough/vtd/iommu.c +@@ -159,7 +159,7 @@ static void __init free_intel_iommu(stru + + static int iommus_incoherent; + +-void iommu_sync_cache(const void *addr, unsigned int size) ++static void sync_cache(const void *addr, unsigned int size) + { + int i; + static unsigned int clflush_size = 0; +@@ -198,7 +198,7 @@ u64 alloc_pgtable_maddr(struct acpi_drhd + vaddr = __map_domain_page(cur_pg); + memset(vaddr, 0, PAGE_SIZE); + +- iommu_sync_cache(vaddr, PAGE_SIZE); ++ sync_cache(vaddr, PAGE_SIZE); + unmap_domain_page(vaddr); + cur_pg++; + } +@@ -2760,6 +2760,7 @@ const struct iommu_ops intel_iommu_ops = + .iotlb_flush_all = iommu_flush_iotlb_all, + .get_reserved_device_memory = intel_iommu_get_reserved_device_memory, + .dump_p2m_table = vtd_dump_p2m_table, ++ .sync_cache = sync_cache, + }; + + /* +--- xen/include/asm-x86/iommu.h.orig ++++ xen/include/asm-x86/iommu.h +@@ -98,6 +98,13 @@ extern bool untrusted_msi; + int pi_update_irte(const struct pi_desc *pi_desc, const struct pirq *pirq, + const uint8_t gvec); + ++#define iommu_sync_cache(addr, size) ({ \ ++ const struct iommu_ops *ops = iommu_get_ops(); \ ++ \ ++ if ( ops->sync_cache ) \ ++ ops->sync_cache(addr, size); \ ++}) ++ + #endif /* !__ARCH_X86_IOMMU_H__ */ + /* + * Local variables: +--- xen/include/xen/iommu.h.orig ++++ xen/include/xen/iommu.h +@@ -161,6 +161,7 @@ struct iommu_ops { + void (*update_ire_from_apic)(unsigned int apic, unsigned int reg, unsigned int value); + unsigned int (*read_apic_from_ire)(unsigned int apic, unsigned int reg); + int (*setup_hpet_msi)(struct msi_desc *); ++ void (*sync_cache)(const void *addr, unsigned int size); + #endif /* CONFIG_X86 */ + int __must_check (*suspend)(void); + void (*resume)(void); +From: +Subject: vtd: don't assume addresses are aligned in sync_cache + +Current code in sync_cache assume that the address passed in is +aligned to a cache line size. Fix the code to support passing in +arbitrary addresses not necessarily aligned to a cache line size. + +This is part of XSA-321. + +Reviewed-by: Jan Beulich + +--- xen/drivers/passthrough/vtd/iommu.c.orig ++++ xen/drivers/passthrough/vtd/iommu.c +@@ -161,8 +161,8 @@ static int iommus_incoherent; + + static void sync_cache(const void *addr, unsigned int size) + { +- int i; +- static unsigned int clflush_size = 0; ++ static unsigned long clflush_size = 0; ++ const void *end = addr + size; + + if ( !iommus_incoherent ) + return; +@@ -170,8 +170,9 @@ static void sync_cache(const void *addr, + if ( clflush_size == 0 ) + clflush_size = get_cache_line_size(); + +- for ( i = 0; i < size; i += clflush_size ) +- cacheline_flush((char *)addr + i); ++ addr -= (unsigned long)addr & (clflush_size - 1); ++ for ( ; addr < end; addr += clflush_size ) ++ cacheline_flush((char *)addr); + } + + /* Allocate page table, return its machine address */ +From: +Subject: x86/alternative: introduce alternative_2 + +It's based on alternative_io_2 without inputs or outputs but with an +added memory clobber. + +This is part of XSA-321. 
+ +Acked-by: Jan Beulich + +--- xen/include/asm-x86/alternative.h.orig ++++ xen/include/asm-x86/alternative.h +@@ -113,6 +113,11 @@ extern void alternative_instructions(voi + #define alternative(oldinstr, newinstr, feature) \ + asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) : : : "memory") + ++#define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \ ++ asm volatile (ALTERNATIVE_2(oldinstr, newinstr1, feature1, \ ++ newinstr2, feature2) \ ++ : : : "memory") ++ + /* + * Alternative inline assembly with input. + * +From: +Subject: vtd: optimize CPU cache sync + +Some VT-d IOMMUs are non-coherent, which requires a cache write back +in order for the changes made by the CPU to be visible to the IOMMU. +This cache write back was unconditionally done using clflush, but there are +other more efficient instructions to do so, hence implement support +for them using the alternative framework. + +This is part of XSA-321. + +Reviewed-by: Jan Beulich + +--- xen/drivers/passthrough/vtd/extern.h.orig ++++ xen/drivers/passthrough/vtd/extern.h +@@ -63,7 +63,6 @@ int __must_check qinval_device_iotlb_syn + u16 did, u16 size, u64 addr); + + unsigned int get_cache_line_size(void); +-void cacheline_flush(char *); + void flush_all_cache(void); + + u64 alloc_pgtable_maddr(struct acpi_drhd_unit *drhd, unsigned long npages); +--- xen/drivers/passthrough/vtd/iommu.c.orig ++++ xen/drivers/passthrough/vtd/iommu.c +@@ -31,6 +31,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -172,7 +173,42 @@ static void sync_cache(const void *addr, + + addr -= (unsigned long)addr & (clflush_size - 1); + for ( ; addr < end; addr += clflush_size ) +- cacheline_flush((char *)addr); ++/* ++ * The arguments to a macro must not include preprocessor directives. Doing so ++ * results in undefined behavior, so we have to create some defines here in ++ * order to avoid it. ++ */ ++#if defined(HAVE_AS_CLWB) ++# define CLWB_ENCODING "clwb %[p]" ++#elif defined(HAVE_AS_XSAVEOPT) ++# define CLWB_ENCODING "data16 xsaveopt %[p]" /* clwb */ ++#else ++# define CLWB_ENCODING ".byte 0x66, 0x0f, 0xae, 0x30" /* clwb (%%rax) */ ++#endif ++ ++#define BASE_INPUT(addr) [p] "m" (*(const char *)(addr)) ++#if defined(HAVE_AS_CLWB) || defined(HAVE_AS_XSAVEOPT) ++# define INPUT BASE_INPUT ++#else ++# define INPUT(addr) "a" (addr), BASE_INPUT(addr) ++#endif ++ /* ++ * Note regarding the use of NOP_DS_PREFIX: it's faster to do a clflush ++ * + prefix than a clflush + nop, and hence the prefix is added instead ++ * of letting the alternative framework fill the gap by appending nops. 
++ */ ++ alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %[p]", ++ "data16 clflush %[p]", /* clflushopt */ ++ X86_FEATURE_CLFLUSHOPT, ++ CLWB_ENCODING, ++ X86_FEATURE_CLWB, /* no outputs */, ++ INPUT(addr)); ++#undef INPUT ++#undef BASE_INPUT ++#undef CLWB_ENCODING ++ ++ alternative_2("", "sfence", X86_FEATURE_CLFLUSHOPT, ++ "sfence", X86_FEATURE_CLWB); + } + + /* Allocate page table, return its machine address */ +--- xen/drivers/passthrough/vtd/x86/vtd.c.orig ++++ xen/drivers/passthrough/vtd/x86/vtd.c +@@ -53,11 +53,6 @@ unsigned int get_cache_line_size(void) + return ((cpuid_ebx(1) >> 8) & 0xff) * 8; + } + +-void cacheline_flush(char * addr) +-{ +- clflush(addr); +-} +- + void flush_all_cache() + { + wbinvd(); +From: +Subject: x86/ept: flush cache when modifying PTEs and sharing page tables + +Modifications made to the page tables by EPT code need to be written +to memory when the page tables are shared with the IOMMU, as Intel +IOMMUs can be non-coherent and thus require changes to be written to +memory in order to be visible to the IOMMU. + +In order to achieve this make sure data is written back to memory +after writing an EPT entry when the recalc bit is not set in +atomic_write_ept_entry. If such bit is set, the entry will be +adjusted and atomic_write_ept_entry will be called a second time +without the recalc bit set. Note that when splitting a super page the +new tables resulting of the split should also be written back. + +Failure to do so can allow devices behind the IOMMU access to the +stale super page, or cause coherency issues as changes made by the +processor to the page tables are not visible to the IOMMU. + +This allows to remove the VT-d specific iommu_pte_flush helper, since +the cache write back is now performed by atomic_write_ept_entry, and +hence iommu_iotlb_flush can be used to flush the IOMMU TLB. The newly +used method (iommu_iotlb_flush) can result in less flushes, since it +might sometimes be called rightly with 0 flags, in which case it +becomes a no-op. + +This is part of XSA-321. + +Reviewed-by: Jan Beulich + +--- xen/arch/x86/mm/p2m-ept.c.orig ++++ xen/arch/x86/mm/p2m-ept.c +@@ -90,6 +90,19 @@ static int atomic_write_ept_entry(ept_en + + write_atomic(&entryptr->epte, new.epte); + ++ /* ++ * The recalc field on the EPT is used to signal either that a ++ * recalculation of the EMT field is required (which doesn't effect the ++ * IOMMU), or a type change. Type changes can only be between ram_rw, ++ * logdirty and ioreq_server: changes to/from logdirty won't work well with ++ * an IOMMU anyway, as IOMMU #PFs are not synchronous and will lead to ++ * aborts, and changes to/from ioreq_server are already fully flushed ++ * before returning to guest context (see ++ * XEN_DMOP_map_mem_type_to_ioreq_server). ++ */ ++ if ( !new.recalc && iommu_hap_pt_share ) ++ iommu_sync_cache(entryptr, sizeof(*entryptr)); ++ + if ( unlikely(oldmfn != mfn_x(INVALID_MFN)) ) + put_page(mfn_to_page(_mfn(oldmfn))); + +@@ -319,6 +332,9 @@ static bool_t ept_split_super_page(struc + break; + } + ++ if ( iommu_hap_pt_share ) ++ iommu_sync_cache(table, EPT_PAGETABLE_ENTRIES * sizeof(ept_entry_t)); ++ + unmap_domain_page(table); + + /* Even failed we should install the newly allocated ept page. 
*/ +@@ -875,7 +894,7 @@ out: + need_modify_vtd_table ) + { + if ( iommu_hap_pt_share ) +- rc = iommu_pte_flush(d, gfn, &ept_entry->epte, order, vtd_pte_present); ++ rc = iommu_flush_iotlb(d, gfn, vtd_pte_present, 1u << order); + else + { + if ( iommu_flags ) +--- xen/drivers/passthrough/vtd/iommu.c.orig ++++ xen/drivers/passthrough/vtd/iommu.c +@@ -612,10 +612,8 @@ static int __must_check iommu_flush_all( + return rc; + } + +-static int __must_check iommu_flush_iotlb(struct domain *d, +- unsigned long gfn, +- bool_t dma_old_pte_present, +- unsigned int page_count) ++int iommu_flush_iotlb(struct domain *d, unsigned long gfn, ++ bool dma_old_pte_present, unsigned int page_count) + { + struct domain_iommu *hd = dom_iommu(d); + struct acpi_drhd_unit *drhd; +@@ -1880,53 +1878,6 @@ static int __must_check intel_iommu_unma + return dma_pte_clear_one(d, (paddr_t)gfn << PAGE_SHIFT_4K); + } + +-int iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte, +- int order, int present) +-{ +- struct acpi_drhd_unit *drhd; +- struct iommu *iommu = NULL; +- struct domain_iommu *hd = dom_iommu(d); +- bool_t flush_dev_iotlb; +- int iommu_domid; +- int rc = 0; +- +- iommu_sync_cache(pte, sizeof(struct dma_pte)); +- +- for_each_drhd_unit ( drhd ) +- { +- iommu = drhd->iommu; +- if ( !test_bit(iommu->index, &hd->arch.iommu_bitmap) ) +- continue; +- +- flush_dev_iotlb = !!find_ats_dev_drhd(iommu); +- iommu_domid= domain_iommu_domid(d, iommu); +- if ( iommu_domid == -1 ) +- continue; +- +- rc = iommu_flush_iotlb_psi(iommu, iommu_domid, +- (paddr_t)gfn << PAGE_SHIFT_4K, +- order, !present, flush_dev_iotlb); +- if ( rc > 0 ) +- { +- iommu_flush_write_buffer(iommu); +- rc = 0; +- } +- } +- +- if ( unlikely(rc) ) +- { +- if ( !d->is_shutting_down && printk_ratelimit() ) +- printk(XENLOG_ERR VTDPREFIX +- " d%d: IOMMU pages flush failed: %d\n", +- d->domain_id, rc); +- +- if ( !is_hardware_domain(d) ) +- domain_crash(d); +- } +- +- return rc; +-} +- + static int __init vtd_ept_page_compatible(struct iommu *iommu) + { + u64 ept_cap, vtd_cap = iommu->cap; +--- xen/include/asm-x86/iommu.h.orig ++++ xen/include/asm-x86/iommu.h +@@ -87,8 +87,9 @@ int iommu_setup_hpet_msi(struct msi_desc + + /* While VT-d specific, this must get declared in a generic header. */ + int adjust_vtd_irq_affinities(void); +-int __must_check iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte, +- int order, int present); ++int __must_check iommu_flush_iotlb(struct domain *d, unsigned long gfn, ++ bool dma_old_pte_present, ++ unsigned int page_count); + bool_t iommu_supports_eim(void); + int iommu_enable_x2apic_IR(void); + void iommu_disable_x2apic_IR(void); Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA328 diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA328:1.1 --- /dev/null Thu Jul 16 09:57:17 2020 +++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA328 Thu Jul 16 09:57:17 2020 @@ -0,0 +1,213 @@ +$NetBSD: patch-XSA328,v 1.1 2020/07/16 09:57:17 bouyer Exp $ + +From: Jan Beulich +Subject: x86/EPT: ept_set_middle_entry() related adjustments + +ept_split_super_page() wants to further modify the newly allocated +table, so have ept_set_middle_entry() return the mapped pointer rather +than tearing it down and then getting re-established right again. + +Similarly ept_next_level() wants to hand back a mapped pointer of +the next level page, so re-use the one established by +ept_set_middle_entry() in case that path was taken. 
+ +Pull the setting of suppress_ve ahead of insertion into the higher level +table, and don't have ept_split_super_page() set the field a 2nd time. + +This is part of XSA-328. + +Signed-off-by: Jan Beulich + +--- xen/arch/x86/mm/p2m-ept.c.orig ++++ xen/arch/x86/mm/p2m-ept.c +@@ -228,8 +228,9 @@ static void ept_p2m_type_to_flags(struct + #define GUEST_TABLE_SUPER_PAGE 2 + #define GUEST_TABLE_POD_PAGE 3 + +-/* Fill in middle levels of ept table */ +-static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry) ++/* Fill in middle level of ept table; return pointer to mapped new table. */ ++static ept_entry_t *ept_set_middle_entry(struct p2m_domain *p2m, ++ ept_entry_t *ept_entry) + { + mfn_t mfn; + ept_entry_t *table; +@@ -237,7 +238,12 @@ static int ept_set_middle_entry(struct p + + mfn = p2m_alloc_ptp(p2m, 0); + if ( mfn_eq(mfn, INVALID_MFN) ) +- return 0; ++ return NULL; ++ ++ table = map_domain_page(mfn); ++ ++ for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ ) ++ table[i].suppress_ve = 1; + + ept_entry->epte = 0; + ept_entry->mfn = mfn_x(mfn); +@@ -249,14 +255,7 @@ static int ept_set_middle_entry(struct p + + ept_entry->suppress_ve = 1; + +- table = map_domain_page(mfn); +- +- for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ ) +- table[i].suppress_ve = 1; +- +- unmap_domain_page(table); +- +- return 1; ++ return table; + } + + /* free ept sub tree behind an entry */ +@@ -294,10 +293,10 @@ static bool_t ept_split_super_page(struc + + ASSERT(is_epte_superpage(ept_entry)); + +- if ( !ept_set_middle_entry(p2m, &new_ept) ) ++ table = ept_set_middle_entry(p2m, &new_ept); ++ if ( !table ) + return 0; + +- table = map_domain_page(_mfn(new_ept.mfn)); + trunk = 1UL << ((level - 1) * EPT_TABLE_ORDER); + + for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ ) +@@ -308,7 +307,6 @@ static bool_t ept_split_super_page(struc + epte->sp = (level > 1); + epte->mfn += i * trunk; + epte->snp = (iommu_enabled && iommu_snoop); +- epte->suppress_ve = 1; + + ept_p2m_type_to_flags(p2m, epte, epte->sa_p2mt, epte->access); + +@@ -347,8 +345,7 @@ static int ept_next_level(struct p2m_dom + ept_entry_t **table, unsigned long *gfn_remainder, + int next_level) + { +- unsigned long mfn; +- ept_entry_t *ept_entry, e; ++ ept_entry_t *ept_entry, *next = NULL, e; + u32 shift, index; + + shift = next_level * EPT_TABLE_ORDER; +@@ -373,19 +370,17 @@ static int ept_next_level(struct p2m_dom + if ( read_only ) + return GUEST_TABLE_MAP_FAILED; + +- if ( !ept_set_middle_entry(p2m, ept_entry) ) ++ next = ept_set_middle_entry(p2m, ept_entry); ++ if ( !next ) + return GUEST_TABLE_MAP_FAILED; +- else +- e = atomic_read_ept_entry(ept_entry); /* Refresh */ ++ /* e is now stale and hence may not be used anymore below. */ + } +- + /* The only time sp would be set here is if we had hit a superpage */ +- if ( is_epte_superpage(&e) ) ++ else if ( is_epte_superpage(&e) ) + return GUEST_TABLE_SUPER_PAGE; + +- mfn = e.mfn; + unmap_domain_page(*table); +- *table = map_domain_page(_mfn(mfn)); ++ *table = next ?: map_domain_page(_mfn(e.mfn)); + *gfn_remainder &= (1UL << shift) - 1; + return GUEST_TABLE_NORMAL_PAGE; + } +From: +Subject: x86/ept: atomically modify entries in ept_next_level + +ept_next_level was passing a live PTE pointer to ept_set_middle_entry, +which was then modified without taking into account that the PTE could +be part of a live EPT table. 
This wasn't a security issue because the +pages returned by p2m_alloc_ptp are zeroed, so adding such an entry +before actually initializing it didn't allow a guest to access +physical memory addresses it wasn't supposed to access. + +This is part of XSA-328. + +Reviewed-by: Jan Beulich + +--- xen/arch/x86/mm/p2m-ept.c.orig ++++ xen/arch/x86/mm/p2m-ept.c +@@ -348,6 +348,8 @@ static int ept_next_level(struct p2m_dom + ept_entry_t *ept_entry, *next = NULL, e; + u32 shift, index; + ++ ASSERT(next_level); ++ + shift = next_level * EPT_TABLE_ORDER; + + index = *gfn_remainder >> shift; +@@ -364,16 +366,20 @@ static int ept_next_level(struct p2m_dom + + if ( !is_epte_present(&e) ) + { ++ int rc; ++ + if ( e.sa_p2mt == p2m_populate_on_demand ) + return GUEST_TABLE_POD_PAGE; + + if ( read_only ) + return GUEST_TABLE_MAP_FAILED; + +- next = ept_set_middle_entry(p2m, ept_entry); ++ next = ept_set_middle_entry(p2m, &e); + if ( !next ) + return GUEST_TABLE_MAP_FAILED; +- /* e is now stale and hence may not be used anymore below. */ ++ ++ rc = atomic_write_ept_entry(ept_entry, e, next_level); ++ ASSERT(rc == 0); + } + /* The only time sp would be set here is if we had hit a superpage */ + else if ( is_epte_superpage(&e) ) + +this has to be applied after patch-XSA328 + +From: +Subject: x86/ept: flush cache when modifying PTEs and sharing page tables + +Modifications made to the page tables by EPT code need to be written +to memory when the page tables are shared with the IOMMU, as Intel +IOMMUs can be non-coherent and thus require changes to be written to +memory in order to be visible to the IOMMU. + +In order to achieve this make sure data is written back to memory +after writing an EPT entry when the recalc bit is not set in +atomic_write_ept_entry. If such bit is set, the entry will be +adjusted and atomic_write_ept_entry will be called a second time +without the recalc bit set. Note that when splitting a super page the +new tables resulting of the split should also be written back. + +Failure to do so can allow devices behind the IOMMU access to the +stale super page, or cause coherency issues as changes made by the +processor to the page tables are not visible to the IOMMU. + +This allows to remove the VT-d specific iommu_pte_flush helper, since +the cache write back is now performed by atomic_write_ept_entry, and +hence iommu_iotlb_flush can be used to flush the IOMMU TLB. The newly +used method (iommu_iotlb_flush) can result in less flushes, since it +might sometimes be called rightly with 0 flags, in which case it +becomes a no-op. + +This is part of XSA-321. + +Reviewed-by: Jan Beulich + +--- xen/arch/x86/mm/p2m-ept.c.orig ++++ xen/arch/x86/mm/p2m-ept.c +@@ -394,6 +394,9 @@ + if ( !next ) + return GUEST_TABLE_MAP_FAILED; + ++ if ( iommu_hap_pt_share ) ++ iommu_sync_cache(next, EPT_PAGETABLE_ENTRIES * sizeof(ept_entry_t)); ++ + rc = atomic_write_ept_entry(ept_entry, e, next_level); + ASSERT(rc == 0); + } --_----------=_1594893437232860--