Commit dd611c2

Daniel Sneddon authored and smb49 committed
x86/speculation: Add RSB VM Exit protections
BugLink: https://bugs.launchpad.net/bugs/1989230

commit 2b12993 upstream.

tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.

== Background ==

Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.

To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced. eIBRS is an "always on" IBRS, in other words, just turn it
on once instead of writing the MSR on every privilege level change. When
eIBRS is enabled, more privileged modes should be protected from less
privileged modes, including protecting VMMs from guests.

== Problem ==

Here's a simplification of how guests are run on Linux' KVM:

void run_kvm_guest(void)
{
	// Prepare to run guest
	VMRESUME();
	// Clean up after guest runs
}

The execution flow for that would look something like this to the
processor:

1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()

Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:

* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
  touched and Linux has to do a 32-entry stuffing.

* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
  IBRS=1 shortly after VM exit, has a documented side effect of flushing
  the RSB except in this PBRSB situation where the software needs to
  stuff the last RSB entry "by hand".

IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.

However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.

Balanced CALL/RET instruction pairs such as in step #5 are not affected.

== Solution ==

The PBRSB issue affects a wide variety of Intel processors which
support eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RETPOLINE triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RETPOLINE need no further mitigation - i.e.,
eIBRS systems which enable retpoline explicitly.

However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RETPOLINE
and most of them need a new mitigation.

Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB filling at
vmexit.

The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline --
which ensures that speculation can never reach an unbalanced RET. Then,
ensure this CALL is retired before continuing execution with an LFENCE.

In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.

There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.

[ bp: Massage, incorporate review comments from Andy Cooper. ]
[ Pawan: Update commit message to replace RSB_VMEXIT with RETPOLINE ]

Signed-off-by: Daniel Sneddon <[email protected]>
Co-developed-by: Pawan Gupta <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>
1 parent d4bdf5e commit dd611c2

File tree: 8 files changed, +101 -3 lines changed

Documentation/admin-guide/hw-vuln/spectre.rst

Lines changed: 8 additions & 0 deletions
@@ -422,6 +422,14 @@ The possible values in this file are:
   'RSB filling'   Protection of RSB on context switch enabled
   =============   ===========================================
 
+ - EIBRS Post-barrier Return Stack Buffer (PBRSB) protection status:
+
+  ===========================  =======================================================
+  'PBRSB-eIBRS: SW sequence'   CPU is affected and protection of RSB on VMEXIT enabled
+  'PBRSB-eIBRS: Vulnerable'    CPU is vulnerable
+  'PBRSB-eIBRS: Not affected'  CPU is not affected by PBRSB
+  ===========================  =======================================================
+
 Full mitigation might require a microcode update from the CPU
 vendor. When the necessary microcode is not available, the kernel will
 report vulnerability.

arch/x86/include/asm/cpufeatures.h

Lines changed: 2 additions & 0 deletions
@@ -286,6 +286,7 @@
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 #define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+ 6) /* "" Fill RSB on VM exit when EIBRS is enabled */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -406,5 +407,6 @@
 #define X86_BUG_ITLB_MULTIHIT		X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
 #define X86_BUG_SRBDS			X86_BUG(24) /* CPU may leak RNG bits if not mitigated */
 #define X86_BUG_MMIO_STALE_DATA		X86_BUG(25) /* CPU is affected by Processor MMIO Stale Data vulnerabilities */
+#define X86_BUG_EIBRS_PBRSB		X86_BUG(26) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
 
 #endif /* _ASM_X86_CPUFEATURES_H */

arch/x86/include/asm/msr-index.h

Lines changed: 4 additions & 0 deletions
@@ -129,6 +129,10 @@
 						 * bit available to control VERW
 						 * behavior.
 						 */
+#define ARCH_CAP_PBRSB_NO		BIT(24)	/*
+						 * Not susceptible to Post-Barrier
+						 * Return Stack Buffer Predictions.
+						 */
 
 #define MSR_IA32_FLUSH_CMD		0x0000010b
 #define L1D_FLUSH			BIT(0)	/*

arch/x86/include/asm/nospec-branch.h

Lines changed: 15 additions & 0 deletions
@@ -63,6 +63,13 @@
 	jnz 771b;				\
 	add $(BITS_PER_LONG/8) * nr, sp;
 
+#define __ISSUE_UNBALANCED_RET_GUARD(sp)	\
+	call 881f;				\
+	int3;					\
+881:						\
+	add $(BITS_PER_LONG/8), sp;		\
+	lfence;
+
 #ifdef __ASSEMBLY__
 
 /*
@@ -130,6 +137,14 @@
 #else
 	call *\reg
 #endif
+.endm
+
+.macro ISSUE_UNBALANCED_RET_GUARD ftr:req
+	ANNOTATE_NOSPEC_ALTERNATIVE
+	ALTERNATIVE "jmp .Lskip_pbrsb_\@", \
+		__stringify(__ISSUE_UNBALANCED_RET_GUARD(%_ASM_SP)) \
+		\ftr
+.Lskip_pbrsb_\@:
 .endm
 
 /*

arch/x86/kernel/cpu/bugs.c

Lines changed: 60 additions & 1 deletion
@@ -1043,6 +1043,49 @@ static enum spectre_v2_mitigation __init spectre_v2_select_retpoline(void)
 	return SPECTRE_V2_RETPOLINE;
 }
 
+static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode)
+{
+	/*
+	 * Similar to context switches, there are two types of RSB attacks
+	 * after VM exit:
+	 *
+	 * 1) RSB underflow
+	 *
+	 * 2) Poisoned RSB entry
+	 *
+	 * When retpoline is enabled, both are mitigated by filling/clearing
+	 * the RSB.
+	 *
+	 * When IBRS is enabled, while #1 would be mitigated by the IBRS branch
+	 * prediction isolation protections, RSB still needs to be cleared
+	 * because of #2. Note that SMEP provides no protection here, unlike
+	 * user-space-poisoned RSB entries.
+	 *
+	 * eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB
+	 * bug is present then a LITE version of RSB protection is required,
+	 * just a single call needs to retire before a RET is executed.
+	 */
+	switch (mode) {
+	case SPECTRE_V2_NONE:
+	/* These modes already fill RSB at vmexit */
+	case SPECTRE_V2_LFENCE:
+	case SPECTRE_V2_RETPOLINE:
+	case SPECTRE_V2_EIBRS_RETPOLINE:
+		return;
+
+	case SPECTRE_V2_EIBRS_LFENCE:
+	case SPECTRE_V2_EIBRS:
+		if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
+			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
+			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
+		}
+		return;
+	}
+
+	pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation at VM exit");
+	dump_stack();
+}
+
 static void __init spectre_v2_select_mitigation(void)
 {
 	enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -1135,6 +1178,8 @@ static void __init spectre_v2_select_mitigation(void)
 	setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
 	pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
 
+	spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
+
 	/*
 	 * Retpoline means the kernel is safe because it has no indirect
 	 * branches. Enhanced IBRS protects firmware too, so, enable restricted
@@ -1879,6 +1924,19 @@ static char *ibpb_state(void)
 	return "";
 }
 
+static char *pbrsb_eibrs_state(void)
+{
+	if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
+		if (boot_cpu_has(X86_FEATURE_RSB_VMEXIT_LITE) ||
+		    boot_cpu_has(X86_FEATURE_RETPOLINE))
+			return ", PBRSB-eIBRS: SW sequence";
+		else
+			return ", PBRSB-eIBRS: Vulnerable";
+	} else {
+		return ", PBRSB-eIBRS: Not affected";
+	}
+}
+
 static ssize_t spectre_v2_show_state(char *buf)
 {
 	if (spectre_v2_enabled == SPECTRE_V2_LFENCE)
@@ -1891,12 +1949,13 @@ static ssize_t spectre_v2_show_state(char *buf)
 	    spectre_v2_enabled == SPECTRE_V2_EIBRS_LFENCE)
 		return sprintf(buf, "Vulnerable: eIBRS+LFENCE with unprivileged eBPF and SMT\n");
 
-	return sprintf(buf, "%s%s%s%s%s%s\n",
+	return sprintf(buf, "%s%s%s%s%s%s%s\n",
 		       spectre_v2_strings[spectre_v2_enabled],
 		       ibpb_state(),
 		       boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
 		       stibp_state(),
 		       boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? ", RSB filling" : "",
+		       pbrsb_eibrs_state(),
 		       spectre_v2_module_string());
 }

arch/x86/kernel/cpu/common.c

Lines changed: 10 additions & 2 deletions
@@ -1025,6 +1025,7 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 #define NO_SWAPGS		BIT(6)
 #define NO_ITLB_MULTIHIT	BIT(7)
 #define NO_SPECTRE_V2		BIT(8)
+#define NO_EIBRS_PBRSB		BIT(9)
 
 #define VULNWL(_vendor, _family, _model, _whitelist)	\
 	{ X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist }
@@ -1065,7 +1066,7 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
 
 	VULNWL_INTEL(ATOM_GOLDMONT,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
 	VULNWL_INTEL(ATOM_GOLDMONT_D,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
-	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
 
 	/*
 	 * Technically, swapgs isn't serializing on AMD (despite it previously
@@ -1075,7 +1076,9 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
 	 * good enough for our purposes.
 	 */
 
-	VULNWL_INTEL(ATOM_TREMONT_D,		NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_TREMONT,		NO_EIBRS_PBRSB),
+	VULNWL_INTEL(ATOM_TREMONT_L,		NO_EIBRS_PBRSB),
+	VULNWL_INTEL(ATOM_TREMONT_D,		NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
 
 	/* AMD Family 0xf - 0x12 */
 	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
@@ -1236,6 +1239,11 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 	    !arch_cap_mmio_immune(ia32_cap))
 		setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
 
+	if (cpu_has(c, X86_FEATURE_IBRS_ENHANCED) &&
+	    !cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
+	    !(ia32_cap & ARCH_CAP_PBRSB_NO))
+		setup_force_cpu_bug(X86_BUG_EIBRS_PBRSB);
+
 	if (cpu_matches(cpu_vuln_whitelist, NO_MELTDOWN))
 		return;

arch/x86/kvm/vmx/vmenter.S

Lines changed: 1 addition & 0 deletions
@@ -92,6 +92,7 @@ ENTRY(vmx_vmexit)
 	pop %_ASM_AX
 .Lvmexit_skip_rsb:
 #endif
+	ISSUE_UNBALANCED_RET_GUARD X86_FEATURE_RSB_VMEXIT_LITE
 	ret
 ENDPROC(vmx_vmexit)

tools/arch/x86/include/asm/cpufeatures.h

Lines changed: 1 addition & 0 deletions
@@ -284,6 +284,7 @@
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 #define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+ 6) /* "" Fill RSB on VM-Exit when EIBRS is enabled */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
