pvts-mat

[LTS 8.6]
CVE-2023-0597
VULN-3958

Problem

https://access.redhat.com/security/cve/CVE-2023-0597

A possible unauthorized memory access flaw was found in the Linux kernel cpu_entry_area mapping of X86 CPU data to memory, where a user may guess the location of exception stack(s) or other important data. This issue could allow a local user to gain access to some important data with expected location in memory.

Affected: yes

This flaw is independent of any config options and affects all x86 kernels. The commit 97e3d26 that fixes the issue is absent from ciqlts8_6's history, along with the accompanying changes (see Solution). Additionally, the settings of the related randomization options in configs/kernel-x86_64.config

CONFIG_RANDOMIZE_BASE=y
CONFIG_RANDOMIZE_MEMORY=y

show that randomization is meant to be in place wherever it applies.
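
A check of this kind can be scripted. The following is a minimal sketch, assuming the config file uses the usual `CONFIG_X=y` line format; the helper name `enabled_options` and the `CONFIG_EXAMPLE_DISABLED` line are illustrative only and not taken from the repository.

```python
import re

def enabled_options(config_text):
    """Return the set of CONFIG_* options set to 'y' in kernel config text."""
    enabled = set()
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=y$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled

# Sample text: the first two lines are the settings quoted above,
# the third is a hypothetical disabled option added for contrast.
SAMPLE = """\
CONFIG_RANDOMIZE_BASE=y
CONFIG_RANDOMIZE_MEMORY=y
# CONFIG_EXAMPLE_DISABLED is not set
"""

REQUIRED = {"CONFIG_RANDOMIZE_BASE", "CONFIG_RANDOMIZE_MEMORY"}
missing = REQUIRED - enabled_options(SAMPLE)
print("missing:", sorted(missing))
```

In practice the text would be read from configs/kernel-x86_64.config instead of the inline sample.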

Solution

The official mainline fix for CVE-2023-0597 is 97e3d26, but on LTS 8.6 the actual solution is complicated by changes to the kernel's memory mapping that were never backported, as well as by several follow-up fixes to the fix in mainline.

Consider the branched-off timeline of changes to the 97e3d26-affected files:

Label    File
-------  -------------------------------------
A        arch/x86/include/asm/cpu_entry_area.h
B        arch/x86/include/asm/pgtable_areas.h
C        arch/x86/kernel/hw_breakpoint.c
D        arch/x86/mm/cpu_entry_area.c

| Id | ABCD | kernel-mainline                          |       Date | Descr                                                                               | ciqlts8_6                                  |
|----+------+------------------------------------------+------------+-------------------------------------------------------------------------------------+--------------------------------------------|
|    | ---# | decb9ac4a9739c16e228f7b2918bfdca34cc89a9 | 2024-08-25 | x86/cpu_entry_area: Annotate percpu_setup_exception_stacks() as __init              |                                            |
|  5 | ---# | a3f547addcaa10df5a226526bc9e2d9a94542344 | 2023-03-22 | x86/mm: Do not shuffle CPU entry areas without KASLR                                |                                            |
|    | --#- | 7914695743d598b189d549f2f57af24aa5633705 | 2023-01-31 | x86/amd: Cache debug register values in percpu variables                            |                                            |
|    | ---# | 3c202d14a9d73fb63c3dccb18feac5618c21e1c4 | 2022-12-20 | prandom: remove prandom_u32_max()                                                   |                                            |
|  4 | ---# | 97650148a15e0b30099d6175ffe278b9f55ec66a | 2022-12-15 | x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area            |                                            |
|  3 | ---# | 80d72a8f76e8f3f0b5a70b8c7022578e17bde8e7 | 2022-12-15 | x86/mm: Recompute physical address for every page of per-CPU CEA mapping            |                                            |
|  0 | #### | 97e3d26b5e5f371b3ee223d94dd123e6c442ba80 | 2022-12-15 | x86/mm: Randomize per-cpu entry area                                                |                                            |
|  2 | ---# | 3f148f3318140035e87decc1214795ff0755757b | 2022-12-15 | x86/kasan: Map shadow for percpu pages on demand                                    |                                            |
|    | ---# | d76c4f7a610ac56c5b06e34258859945e77d190c | 2022-11-22 | x86/cpu: Remove X86_FEATURE_XENPV usage in setup_cpu_entry_area()                   |                                            |
|    | #--- | e87f4152e542610d0b4c6c8548964a68a59d2040 | 2022-04-04 | task_stack, x86/cea: Force-inline stack helpers                                     |                                            |
|    | #--# | 541ac97186d9ea88491961a46284de3603c914fd | 2021-10-06 | x86/sev: Make the #VC exception stacks part of the default stacks storage           |                                            |
|    | --#- | 3943abf2dbfae9ea4d2da05c1db569a0603f76da | 2021-02-05 | x86/debug: Prevent data breakpoints on cpu_dr7                                      |                                            |
|    | --#- | c4bed4b96918ff1d062ee81fdae4d207da4fa9b0 | 2021-02-05 | x86/debug: Prevent data breakpoints on __per_cpu_offset                             |                                            |
|    | --#- | 9ad22e165994ccb64d85b68499eaef97342c175b | 2021-02-01 | x86/debug: Fix DR6 handling                                                         |                                            |
|    | ---# | 6b27edd74a5e9669120f7bd0ae1f475d124c1042 | 2020-09-09 | x86/dumpstack/64: Add noinstr version of get_stack_info()                           | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | #--- | 02772fb9b68e6a72a5e17f994048df832fe2b15e | 2020-09-09 | x86/sev-es: Allocate and map an IST stack for #VC handler                           | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | --#- | d53d9bc0cf783e93b374de3895145c7375e570ba | 2020-09-04 | x86/debug: Change thread.debugreg6 to thread.virtual_dr6                            |                                            |
|    | --#- | f4956cf83ed12271bdbd5b547f3378add72bbffb | 2020-09-04 | x86/debug: Support negative polarity DR6 bits                                       | ~ 927f65e976a77f9c9d29b4a50d4b8157aff37f26 |
|    | --#- | 21d44be7b6ff4c254dc971e2c99d4082dd470afd | 2020-09-04 | x86/debug: Simplify hw_breakpoint_handler()                                         |                                            |
|    | --#- | b84d42b6c6ac6a60519286e72b69f2dbf08dfb70 | 2020-09-04 | x86/debug: Remove aout_dump_debugregs()                                             |                                            |
|    | --#- | df561f6688fef775baa341a0f5d960becd248b11 | 2020-08-23 | treewide: Use fallthrough pseudo-keyword                                            | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | #--# | fd501d4f0399700011acde486576c7c1eb8e7a61 | 2020-06-11 | x86/entry: Remove DBn stacks                                                        | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | --#- | 84b6a3491567a540f955e18d8e615493afa36df0 | 2020-06-11 | x86/entry: Optimize local_db_save() for virt                                        |                                            |
|    | --#- | fdef24dfccb7be06e6ebe11d6c6c56987421870f | 2020-06-11 | x86/hw_breakpoint: Prevent data breakpoints on user_pcid_flush_mask                 |                                            |
|    | --#- | f9fe0b89f05441c6e4034e024c2c75a0d93024c1 | 2020-06-11 | x86/hw_breakpoint: Prevent data breakpoints on per_cpu cpu_tss_rw                   |                                            |
|    | --#- | 97417cb9ad4ed052d7a4c5c0d75db1ff1b0981fb | 2020-06-11 | x86/hw_breakpoint: Prevent data breakpoints on direct GDT                           |                                            |
|    | --#- | d390e6de89d30402bd06056c40cea72328aec9b1 | 2020-06-11 | x86/hw_breakpoint: Add within_area() to check data breakpoints                      |                                            |
|    | --#- | 9f58fdde95c9509a4ded27a6d0035e79294002b4 | 2020-06-11 | x86/db: Split out dr6/7 handling                                                    | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | --#- | 24ae0c91cbc57c2deb0401bd653453a508acdcee | 2020-06-11 | x86/hw_breakpoint: Prevent data breakpoints on cpu_entry_area                       |                                            |
|    | ---# | 65fddcfca8ad14778f71a57672fd01e8112d30fa | 2020-06-09 | mm: reorder includes after introduction of linux/pgtable.h                          |                                            |
|    | ---# | ca5999fde0a1761665a38e4c9a72dbcd7d190a81 | 2020-06-09 | mm: introduce include/linux/pgtable.h                                               |                                            |
|    | ---# | 593309423cbad0fab659a685834416cf12d8f581 | 2020-04-14 | x86/32: Remove CONFIG_DOUBLEFAULT                                                   |                                            |
|    | ##-- | 186525bd6b83efc592672e2d6185e4d7c810d2b4 | 2019-12-10 | mm, x86/mm: Untangle address space layout definitions from basic pgtable type…      |                                            |
|    | #--# | dc4e0021b00b5a4ecba56fae509217776592b0aa | 2019-11-26 | x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area                        |                                            |
|  1 | #--# | 05b042a1944322844eaae7ea596d5f154166d68a | 2019-11-25 | x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the…     |                                            |
|    | #--- | 880a98c339961eaa074393e3a2117cbe9125b8bb | 2019-11-21 | x86/cpu_entry_area: Add guard page for entry stack on 32bit                         |                                            |
|    | ---# | 6b546e1c9ad2a25f874f8bc6077d0f55f9446414 | 2019-11-16 | x86/tss: Fix and move VMX BUILD_BUG_ON()                                            | # a1c405ca16baa1afc756c1d4ccbcc0c3a00cb453 |
|    | #--- | 6184488a19be96d89cb6c36fb4bc277198309484 | 2019-10-01 | x86: Use the correct SPDX License Identifier in headers                             |                                            |
|    | --#- | 1a59d1b8e05ea6ab45f7e18897de1ef0e6bc3da6 | 2019-05-30 | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156                  |                                            |
|    | #--# | 2a594d4ccf3f10f80b77d71bd3dad10813ac0137 | 2019-04-17 | x86/exceptions: Split debug IST stack                                               | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | #--- | 1bdb67e5aa2d5d43c48cb7d93393fcba276c9e71 | 2019-04-17 | x86/exceptions: Enable IST guard pages                                              | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | #--- | 3207426925d2b4da390be8068df1d1c2b36e5918 | 2019-04-17 | x86/exceptions: Disconnect IST index and stack order                                | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | #--# | 7623f37e411156e6e09b95cf5c76e509c5fda640 | 2019-04-17 | x86/cpu_entry_area: Provide exception stack accessor                                | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | ---# | a4af767ae59cc579569bbfe49513a0037d5989ee | 2019-04-17 | x86/cpu_entry_area: Prepare for IST guard pages                                     | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | #--# | 019b17b3ffe48100e52f609ca1c6ed6e5a40cba1 | 2019-04-17 | x86/exceptions: Add structs for exception stacks                                    | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | ---# | 881a463cf21dbf83aab2cf6c9a359f34f88c2491 | 2019-04-17 | x86/cpu_entry_area: Cleanup setup functions                                         | # 604239d6f80ebd4301c47285d7305d7262dc69a6 |
|    | --#- | e898e69d6b9475bf123f99b3c5d1a67bb7cb2361 | 2019-03-22 | x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error |                                            |
|    | ---# | ba2ba356b2c849ec62d5fefa9cd4168163b13211 | 2019-02-08 | x86/cpu_entry_area: Move percpu_setup_debug_store() to __init section               | ~ cfe70862b56bc8d9b13ca055348ca01c446ca605 |
|    | --#- | fab940755d1d78377901450b6ee7c77356e06821 | 2019-01-30 | x86/hw_breakpoints, kprobes: Remove kprobes ifdeffery                               |                                            |
|    | --#- | 6fcebf1302b43e7a610d1d2fa89f41e693249aa5 | 2019-01-26 | x86/kernel: Mark expected switch-case fall-throughs                                 |                                            |
|    | #--# | bf904d2762ee6fc1e4acfcb0772bbfb4a27ad8a6 | 2018-09-12 | x86/pti/64: Remove the SYSCALL64 entry trampoline                                   | ~ e2093e17c786fae6762be28d622e0dfeefb6d37a |
|    | ---# | 6855dc41b24619c3d1de3dbd27dd0546b0e45272 | 2018-08-14 | x86: Add entry trampolines to kcore                                                 | ~ f472db7b53074bb1e36e626e139b0afabdc5d9d0 |
|    | ---# | d83212d5dd6761625fe87cc23016bbaa47303271 | 2018-08-14 | kallsyms, x86: Export addresses of PTI entry trampolines                            | ~ 2e5dbe65824c98ee7602b2afef51eec8fd06d93f |
|    | --#- | a0baf043c5cfa3a489a63ac50f5201c31a651e21 | 2018-06-26 | perf/arch/x86: Implement hw_breakpoint_arch_parse()                                 | ~ 94bbfa1bcb24b573c0fababaac1574c6b947866a |
|    | --#- | 8e983ff9ac02a8fb454ed09c2462bdb3617006a8 | 2018-06-26 | perf/hw_breakpoint: Pass arch breakpoint struct to arch_check_bp_in_kernelspace()   | ~ 70558b454254ddaa55838662c2d9dad5b039f07a |
|    | ---# | 0f561fce4d6979a50415616896512f87a6d1d5c8 | 2018-04-12 | x86/pti: Enable global pages for shared areas                                       | = 0f561fce4d6979a50415616896512f87a6d1d5c8 |

The commits labeled 0 through 5 make up the solution proposed in this PR.

  • Commit 0 is the CVE-2023-0597 fix proper, with some considerable differences from upstream, discussed below.
  • Commit 1 was picked to ease some conflicts for 0.
  • Commit 5 is an official bugfix to 0.
  • Commits 2, 3, 4 are unofficial bugfixes to 0, and also serve as prerequisites for 5.

The complete list of relations, official and actual, is as follows:

1 --prereq--> 0  *
2 --bugfix--> 0
2 --prereq--> 0  *
2 --prereq--> 5
3 --bugfix--> 2  *
3 --prereq--> 5
4 --bugfix--> 2  *
4 --prereq--> 5
5 --bugfix--> 0  *

Relations marked with an asterisk * are the ones that could be expressed in commit headlines with the cve-pre and cve-bf attributes.
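
The relations above constrain the order in which the commits can be applied. The sketch below derives one admissible order with Python's standard `graphlib` module (Python 3.9+). It rests on two modelling assumptions that are not stated in the list itself: a prerequisite is applied before its target, and a bugfix is applied after its target unless the same pair is also listed as a prerequisite (the case of 2 and 0). The function name `pick_order` is illustrative.

```python
from graphlib import TopologicalSorter

# (source, kind, target), copied from the relations list above.
RELATIONS = [
    (1, "prereq", 0),
    (2, "bugfix", 0),
    (2, "prereq", 0),
    (2, "prereq", 5),
    (3, "bugfix", 2),
    (3, "prereq", 5),
    (4, "bugfix", 2),
    (4, "prereq", 5),
    (5, "bugfix", 0),
]

def pick_order(relations):
    """Return one commit order that satisfies the modelled constraints."""
    prereq_pairs = {(src, dst) for src, kind, dst in relations if kind == "prereq"}
    sorter = TopologicalSorter()
    for src, kind, dst in relations:
        if kind == "prereq":
            # Assumption: a prerequisite goes in before its target.
            sorter.add(dst, src)
        elif (src, dst) not in prereq_pairs:
            # Assumption: a bugfix goes in after the commit it fixes,
            # unless the pair is also listed as a prerequisite.
            sorter.add(src, dst)
    return list(sorter.static_order())

order = pick_order(RELATIONS)
print(order)
```

Several orders satisfy these constraints, so the printed result is only one of them.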

Commentary on the applied changes:

  1. Commit 1 (05b042a) fixes a bug in the calculation of the cpu entry area size and renames some constants in arch/x86/include/asm/cpu_entry_area.h which are later modified by 0. Its "Fixes" attribute points to the immediately preceding commit 880a98c, but that commit merely exposed the bug rather than introducing it.

  2. Commit 2 (3f148f3) is actually a bugfix of 0, but it was put earlier in mainline's history - note the AuthorDate 2022-10-27 of 0 and 2022-10-28 of 2, as well as 3f148f3's message:

    Thanks to the 0day folks for finding and reporting this to be an issue.

    [ dhansen: tweak changelog since this will get committed before peterz's
    actual cpu-entry-area randomization ]

    It was cherry-picked before 0 to preserve the ordering in kernel-mainline, and marked as cve-pre to avoid possible confusion around cve-bf and because it might as well be treated as preparation for the fix.

  3. Commit 0 (97e3d26) was picked with the following changes from the upstream:

    1. Ignored changes in arch/x86/kernel/hw_breakpoint.c. The modified function within_cpu_entry() doesn't exist in the ciqlts8_6 revision. The conflict could have been resolved purely by cherry-picking 24ae0c9, d390e6d, 97417cb, but that would have introduced dead code: the within_area() and within_cpu_entry() functions.
    2. Moved the arch/x86/include/asm/pgtable_areas.h changes to arch/x86/include/asm/cpu_entry_area.h. This had to be done because the 186525b commit, which factored out the relevant #defines from cpu_entry_area.h to pgtable_areas.h, is missing from ciqlts8_6 history. It was decided not to backport this commit as a prerequisite since it is too extensive and makes changes unrelated to the patch.
    3. Made a small adaptation of the changes relating to cea_offset() definitions in arch/x86/mm/cpu_entry_area.c, which was necessary because the dc4e002 commit is missing from ciqlts8_6 history. That commit was too functionally intrusive to backport as a prerequisite just to auto-resolve this single conflict.

  4. Commits 3 (80d72a8) and 4 (9765014) expand on the bugfix in 2, so they had to be included for completeness.

  5. Commit 5 (a3f547a) makes the randomization implemented in 0 configurable, which it should have been from the beginning. Commits 2, 3, 4 were also included because they are prerequisites of 5:

    Since we have 3f148f3 ("x86/kasan: Map shadow for percpu pages on
    demand") and followups, we can use the more relaxed guard
    kasrl_enabled() (in contrast to kaslr_memory_enabled()).
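
The combined effect of commits 0 and 5 can be modelled roughly as below. This is a simplified Python model under stated assumptions, not the kernel code: it assumes every CPU needs its own distinct slot in the mapping range, that slots are drawn at random when KASLR is enabled, and that they fall back to a fixed per-CPU order otherwise. The name `assign_cea_slots` and the slot counts are illustrative.

```python
import random

def assign_cea_slots(num_cpus, num_slots, kaslr_enabled, rng=None):
    """Return a list mapping each CPU index to a distinct slot index."""
    if num_cpus > num_slots:
        raise ValueError("not enough slots for all CPUs")
    if not kaslr_enabled:
        # Without KASLR nothing is shuffled: CPU i keeps slot i.
        return list(range(num_cpus))
    if rng is None:
        rng = random.Random()
    # With KASLR each CPU gets a random slot, with no slot used twice.
    return rng.sample(range(num_slots), num_cpus)

print(assign_cea_slots(4, 16, kaslr_enabled=False))
print(assign_cea_slots(4, 16, kaslr_enabled=True))
```

The point of the model is only the guard: with KASLR disabled the layout stays predictable by design, so shuffling the entry areas would add nothing.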

kABI check: passed

DEBUG=1 CVE=CVE-2023-0597 ./ninja.sh _kabi_checked__x86_64--test--ciqlts8_6-CVE-2023-0597

[0/1] Check ABI of kernel [ciqlts8_6-CVE-2023-0597]
++ uname -m
+ python3 /data/src/ctrliq-github/kernel-dist-git-el-8.6/SOURCES/check-kabi -k /data/src/ctrliq-github/kernel-dist-git-el-8.6/SOURCES/Module.kabi_x86_64 -s vms/x86_64--build--ciqlts8_6/build_files/kernel-src-tree-ciqlts8_6-CVE-2023-0597/Module.symvers
kABI check passed
+ touch state/kernels/ciqlts8_6-CVE-2023-0597/x86_64/kabi_checked

Boot test: passed

boot-test.log

Kselftests: passed relative

Reference

kselftests–ciqlts8_6–run1.log

Patch

kselftests–ciqlts8_6-CVE-2023-0597–run1.log

Comparison

The test results for the reference kernel and the patched kernel are the same.
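
A comparison like this can be double-checked independently of the tooling. The sketch below assumes four-column rows of the form test case, reference status, patched status, summary, as in the comparison output in this section; the helper names `parse_rows` and `differences` are illustrative, and the extra row is hypothetical, added only to show that a mismatch would be caught.

```python
def parse_rows(text):
    """Map each test case to its (reference, patched) status pair."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and anything that is not a four-column row.
        if len(parts) == 4 and parts[0] != "TestCase":
            name, reference, patched, _summary = parts
            rows[name] = (reference, patched)
    return rows

def differences(rows):
    """Return the test cases whose status differs between the two runs."""
    return sorted(name for name, (ref, patched) in rows.items() if ref != patched)

# Rows copied from the comparison in this section.
SAMPLE = """\
TestCase                                     Status0  Status1  Summary
android:run.sh                               skip     skip     same
bpf:test_lru_map                             fail     fail     same
breakpoints:breakpoint_test                  pass     pass     same
"""

# Hypothetical row, not a real result.
HYPOTHETICAL_ROW = "example:changed_test pass fail differ\n"

print(differences(parse_rows(SAMPLE)))
print(differences(parse_rows(SAMPLE + HYPOTHETICAL_ROW)))
```

An empty list for the real rows corresponds to the "same" summary in every line.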

$ ktests.xsh diff  kselftests*.log

Column    File
--------  ---------------------------------------------
Status0   kselftests--ciqlts8_6--run1.log
Status1   kselftests--ciqlts8_6-CVE-2023-0597--run1.log

TestCase                                     Status0  Status1  Summary
android:run.sh                               skip     skip     same
bpf:get_cgroup_id_user                       pass     pass     same
bpf:test_bpftool.sh                          pass     pass     same
bpf:test_bpftool_build.sh                    pass     pass     same
bpf:test_bpftool_metadata.sh                 pass     pass     same
bpf:test_cgroup_storage                      pass     pass     same
bpf:test_dev_cgroup                          pass     pass     same
bpf:test_doc_build.sh                        pass     pass     same
bpf:test_flow_dissector.sh                   pass     pass     same
bpf:test_lirc_mode2.sh                       pass     pass     same
bpf:test_lpm_map                             pass     pass     same
bpf:test_lru_map                             fail     fail     same
bpf:test_lwt_ip_encap.sh                     pass     pass     same
bpf:test_lwt_seg6local.sh                    pass     pass     same
bpf:test_netcnt                              pass     pass     same
bpf:test_offload.py                          pass     pass     same
bpf:test_skb_cgroup_id.sh                    pass     pass     same
bpf:test_sock                                pass     pass     same
bpf:test_sock_addr.sh                        pass     pass     same
bpf:test_sysctl                              pass     pass     same
bpf:test_tag                                 pass     pass     same
bpf:test_tc_edt.sh                           pass     pass     same
bpf:test_tc_tunnel.sh                        pass     pass     same
bpf:test_tcp_check_syncookie.sh              pass     pass     same
bpf:test_tcpnotify_user                      pass     pass     same
bpf:test_tunnel.sh                           pass     pass     same
bpf:test_verifier                            pass     pass     same
bpf:test_verifier_log                        pass     pass     same
bpf:test_xdp_meta.sh                         pass     pass     same
bpf:test_xdp_redirect.sh                     pass     pass     same
bpf:test_xdp_veth.sh                         pass     pass     same
bpf:test_xdp_vlan_mode_generic.sh            pass     pass     same
bpf:test_xdp_vlan_mode_native.sh             pass     pass     same
bpf:test_xdping.sh                           pass     pass     same
bpf:urandom_read                             pass     pass     same
breakpoints:breakpoint_test                  pass     pass     same
capabilities:test_execve                     pass     pass     same
core:close_range_test                        pass     pass     same
cpu-hotplug:cpu-on-off-test.sh               pass     pass     same
cpufreq:main.sh                              fail     fail     same
exec:execveat                                pass     pass     same
firmware:fw_run_tests.sh                     skip     skip     same
fpu:run_test_fpu.sh                          skip     skip     same
fpu:test_fpu                                 pass     pass     same
ftrace:ftracetest                            fail     fail     same
futex:run.sh                                 pass     pass     same
gpio:gpio-mockup.sh                          fail     fail     same
intel_pstate:run.sh                          pass     pass     same
ipc:msgque                                   pass     pass     same
kcmp:kcmp_test                               pass     pass     same
kexec:test_kexec_file_load.sh                skip     skip     same
kexec:test_kexec_load.sh                     skip     skip     same
kvm:access_tracking_perf_test                fail     fail     same
kvm:amx_test                                 fail     fail     same
kvm:cr4_cpuid_sync_test                      fail     fail     same
kvm:debug_regs                               fail     fail     same
kvm:demand_paging_test                       pass     pass     same
kvm:dirty_log_perf_test                      pass     pass     same
kvm:dirty_log_test                           fail     fail     same
kvm:emulator_error_test                      fail     fail     same
kvm:evmcs_test                               fail     fail     same
kvm:get_cpuid_test                           fail     fail     same
kvm:get_msr_index_features                   fail     fail     same
kvm:hardware_disable_test                    pass     pass     same
kvm:hyperv_clock                             fail     fail     same
kvm:hyperv_cpuid                             fail     fail     same
kvm:hyperv_features                          fail     fail     same
kvm:kvm_binary_stats_test                    pass     pass     same
kvm:kvm_create_max_vcpus                     skip     skip     same
kvm:kvm_page_table_test                      pass     pass     same
kvm:kvm_pv_test                              fail     fail     same
kvm:memslot_modification_stress_test         pass     pass     same
kvm:memslot_perf_test                        fail     fail     same
kvm:mmio_warning_test                        fail     fail     same
kvm:mmu_role_test                            fail     fail     same
kvm:platform_info_test                       fail     fail     same
kvm:rseq_test                                fail     fail     same
kvm:set_boot_cpu_id                          fail     fail     same
kvm:set_memory_region_test                   pass     pass     same
kvm:set_sregs_test                           fail     fail     same
kvm:smm_test                                 fail     fail     same
kvm:state_test                               fail     fail     same
kvm:steal_time                               pass     pass     same
kvm:svm_int_ctl_test                         fail     fail     same
kvm:svm_vmcall_test                          fail     fail     same
kvm:sync_regs_test                           fail     fail     same
kvm:tsc_msrs_test                            fail     fail     same
kvm:userspace_msr_exit_test                  fail     fail     same
kvm:vmx_apic_access_test                     fail     fail     same
kvm:vmx_close_while_nested_test              fail     fail     same
kvm:vmx_dirty_log_test                       fail     fail     same
kvm:vmx_nested_tsc_scaling_test              fail     fail     same
kvm:vmx_pmu_msrs_test                        fail     fail     same
kvm:vmx_preemption_timer_test                fail     fail     same
kvm:vmx_set_nested_state_test                fail     fail     same
kvm:vmx_tsc_adjust_test                      fail     fail     same
kvm:xapic_ipi_test                           fail     fail     same
kvm:xen_shinfo_test                          fail     fail     same
kvm:xen_vmcall_test                          fail     fail     same
kvm:xss_msr_test                             fail     fail     same
lib:bitmap.sh                                skip     skip     same
lib:prime_numbers.sh                         skip     skip     same
lib:printf.sh                                skip     skip     same
lib:scanf.sh                                 fail     fail     same
livepatch:test-callbacks.sh                  pass     pass     same
livepatch:test-ftrace.sh                     pass     pass     same
livepatch:test-livepatch.sh                  pass     pass     same
livepatch:test-shadow-vars.sh                pass     pass     same
livepatch:test-state.sh                      pass     pass     same
membarrier:membarrier_test_multi_thread      pass     pass     same
membarrier:membarrier_test_single_thread     pass     pass     same
memfd:memfd_test                             pass     pass     same
memfd:run_fuse_test.sh                       fail     fail     same
memfd:run_hugetlbfs_test.sh                  pass     pass     same
memory-hotplug:mem-on-off-test.sh            pass     pass     same
mount:run_tests.sh                           pass     pass     same
net/forwarding:bridge_port_isolation.sh      pass     pass     same
net/forwarding:bridge_sticky_fdb.sh          pass     pass     same
net/forwarding:bridge_vlan_aware.sh          fail     fail     same
net/forwarding:bridge_vlan_unaware.sh        pass     pass     same
net/forwarding:ethtool.sh                    fail     fail     same
net/forwarding:gre_multipath.sh              fail     fail     same
net/forwarding:ip6_forward_instats_vrf.sh    fail     fail     same
net/forwarding:ipip_flat_gre.sh              pass     pass     same
net/forwarding:ipip_flat_gre_key.sh          pass     pass     same
net/forwarding:ipip_flat_gre_keys.sh         pass     pass     same
net/forwarding:ipip_hier_gre.sh              pass     pass     same
net/forwarding:ipip_hier_gre_key.sh          pass     pass     same
net/forwarding:loopback.sh                   skip     skip     same
net/forwarding:mirror_gre.sh                 fail     fail     same
net/forwarding:mirror_gre_bound.sh           pass     pass     same
net/forwarding:mirror_gre_bridge_1d.sh       pass     pass     same
net/forwarding:mirror_gre_bridge_1q.sh       pass     pass     same
net/forwarding:mirror_gre_bridge_1q_lag.sh   pass     pass     same
net/forwarding:mirror_gre_changes.sh         fail     fail     same
net/forwarding:mirror_gre_flower.sh          fail     fail     same
net/forwarding:mirror_gre_lag_lacp.sh        pass     pass     same
net/forwarding:mirror_gre_neigh.sh           pass     pass     same
net/forwarding:mirror_gre_nh.sh              pass     pass     same
net/forwarding:mirror_gre_vlan.sh            pass     pass     same
net/forwarding:mirror_vlan.sh                pass     pass     same
net/forwarding:router.sh                     fail     fail     same
net/forwarding:router_bridge.sh              pass     pass     same
net/forwarding:router_bridge_vlan.sh         pass     pass     same
net/forwarding:router_broadcast.sh           fail     fail     same
net/forwarding:router_multicast.sh           fail     fail     same
net/forwarding:router_multipath.sh           fail     fail     same
net/forwarding:router_vid_1.sh               pass     pass     same
net/forwarding:tc_chains.sh                  pass     pass     same
net/forwarding:tc_flower.sh                  pass     pass     same
net/forwarding:tc_flower_router.sh           pass     pass     same
net/forwarding:tc_mpls_l2vpn.sh              pass     pass     same
net/forwarding:tc_shblocks.sh                pass     pass     same
net/forwarding:tc_vlan_modify.sh             pass     pass     same
net/forwarding:vxlan_asymmetric.sh           pass     pass     same
net/forwarding:vxlan_bridge_1d.sh            fail     fail     same
net/forwarding:vxlan_bridge_1d_port_8472.sh  pass     pass     same
net/forwarding:vxlan_bridge_1q.sh            fail     fail     same
net/forwarding:vxlan_bridge_1q_port_8472.sh  pass     pass     same
net/forwarding:vxlan_symmetric.sh            pass     pass     same
net/mptcp:diag.sh                            pass     pass     same
net/mptcp:mptcp_connect.sh                   pass     pass     same
net/mptcp:mptcp_sockopt.sh                   pass     pass     same
net/mptcp:pm_netlink.sh                      pass     pass     same
net:bareudp.sh                               pass     pass     same
net:devlink_port_split.py                    pass     pass     same
net:drop_monitor_tests.sh                    skip     skip     same
net:fcnal-test.sh                            pass     pass     same
net:fib-onlink-tests.sh                      pass     pass     same
net:fib_rule_tests.sh                        fail     fail     same
net:fib_tests.sh                             pass     pass     same
net:gre_gso.sh                               pass     pass     same
net:icmp_redirect.sh                         pass     pass     same
net:ip6_gre_headroom.sh                      pass     pass     same
net:ipv6_flowlabel.sh                        pass     pass     same
net:l2tp.sh                                  pass     pass     same
net:msg_zerocopy.sh                          fail     fail     same
net:netdevice.sh                             pass     pass     same
net:pmtu.sh                                  pass     pass     same
net:psock_snd.sh                             fail     fail     same
net:reuseaddr_conflict                       pass     pass     same
net:reuseport_bpf                            pass     pass     same
net:reuseport_bpf_cpu                        pass     pass     same
net:reuseport_bpf_numa                       pass     pass     same
net:reuseport_dualstack                      pass     pass     same
net:rtnetlink.sh                             skip     skip     same
net:run_afpackettests                        pass     pass     same
net:run_netsocktests                         pass     pass     same
net:rxtimestamp.sh                           pass     pass     same
net:so_txtime.sh                             fail     fail     same
net:test_bpf.sh                              pass     pass     same
net:test_vxlan_fdb_changelink.sh             pass     pass     same
net:tls                                      pass     pass     same
net:traceroute.sh                            pass     pass     same
net:udpgro.sh                                fail     fail     same
net:udpgro_bench.sh                          fail     fail     same
net:udpgso.sh                                pass     pass     same
net:veth.sh                                  fail     fail     same
net:vrf-xfrm-tests.sh                        pass     pass     same
netfilter:conntrack_icmp_related.sh          fail     fail     same
netfilter:conntrack_tcp_unreplied.sh         fail     fail     same
netfilter:ipvs.sh                            skip     skip     same
netfilter:nft_flowtable.sh                   fail     fail     same
netfilter:nft_meta.sh                        pass     pass     same
netfilter:nft_nat.sh                         skip     skip     same
netfilter:nft_queue.sh                       skip     skip     same
nsfs:owner                                   pass     pass     same
nsfs:pidns                                   pass     pass     same
proc:fd-001-lookup                           pass     pass     same
proc:fd-002-posix-eq                         pass     pass     same
proc:fd-003-kthread                          pass     pass     same
proc:proc-loadavg-001                        pass     pass     same
proc:proc-self-map-files-001                 pass     pass     same
proc:proc-self-map-files-002                 fail     fail     same
proc:proc-self-syscall                       pass     pass     same
proc:proc-self-wchan                         pass     pass     same
proc:proc-uptime-001                         pass     pass     same
proc:proc-uptime-002                         pass     pass     same
proc:read                                    pass     pass     same
proc:setns-dcache                            fail     fail     same
pstore:pstore_post_reboot_tests              skip     skip     same
pstore:pstore_tests                          fail     fail     same
ptrace:peeksiginfo                           pass     pass     same
ptrace:vmaccess                              fail     fail     same
rseq:basic_percpu_ops_test                   pass     pass     same
rseq:basic_test                              pass     pass     same
rseq:param_test                              pass     pass     same
rseq:param_test_benchmark                    pass     pass     same
rseq:param_test_compare_twice                pass     pass     same
rseq:run_param_test.sh                       fail     fail     same
sgx:test_sgx                                 fail     fail     same
sigaltstack:sas                              pass     pass     same
size:get_size                                pass     pass     same
splice:default_file_splice_read.sh           pass     pass     same
static_keys:test_static_keys.sh              skip     skip     same
tc-testing:tdc.sh                            pass     pass     same
timens:clock_nanosleep                       pass     pass     same
timens:exec                                  pass     pass     same
timens:procfs                                pass     pass     same
timens:timens                                pass     pass     same
timens:timer                                 pass     pass     same
timens:timerfd                               pass     pass     same
timers:inconsistency-check                   fail     fail     same
timers:mqueue-lat                            pass     pass     same
timers:nanosleep                             pass     pass     same
timers:nsleep-lat                            fail     fail     same
timers:posix_timers                          pass     pass     same
timers:rtcpie                                pass     pass     same
timers:set-timer-lat                         fail     fail     same
timers:threadtest                            pass     pass     same
tpm2:test_smoke.sh                           fail     fail     same
tpm2:test_space.sh                           fail     fail     same
vm:run_vmtests                               fail     fail     same
x86:amx_64                                   fail     fail     same
x86:check_initial_reg_state_64               pass     pass     same
x86:corrupt_xstate_header_64                 pass     pass     same
x86:fsgsbase_64                              pass     pass     same
x86:fsgsbase_restore_64                      pass     pass     same
x86:ioperm_64                                pass     pass     same
x86:iopl_64                                  pass     pass     same
x86:mov_ss_trap_64                           pass     pass     same
x86:mpx-mini-test_64                         fail     fail     same
x86:protection_keys_64                       pass     pass     same
x86:sigaltstack_64                           pass     pass     same
x86:sigreturn_64                             pass     pass     same
x86:single_step_syscall_64                   pass     pass     same
x86:syscall_nt_64                            pass     pass     same
x86:sysret_rip_64                            pass     pass     same
x86:sysret_ss_attrs_64                       pass     pass     same
x86:test_mremap_vdso_64                      pass     pass     same
x86:test_vdso_64                             pass     pass     same
x86:test_vsyscall_64                         pass     pass     same
zram:zram.sh                                 pass     pass     same

… make the CPU_ENTRY_AREA_PAGES assert precise

jira VULN-3958
cve-pre CVE-2023-0597
commit-author Ingo Molnar <[email protected]>
commit 05b042a

When two recent commits that increased the size of the 'struct cpu_entry_area'
were merged in -tip, the 32-bit defconfig build started failing on the following
build time assert:

  ./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_189’ declared with attribute error: BUILD_BUG_ON failed: CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE
  arch/x86/mm/cpu_entry_area.c:189:2: note: in expansion of macro ‘BUILD_BUG_ON’
  In function ‘setup_cpu_entry_area_ptes’,

Which corresponds to the following build time assert:

	BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE);

The purpose of this assert is to sanity check the fixed-value definition of
CPU_ENTRY_AREA_PAGES in arch/x86/include/asm/pgtable_32_types.h:

	#define CPU_ENTRY_AREA_PAGES    (NR_CPUS * 41)

The '41' is supposed to match sizeof(struct cpu_entry_area)/PAGE_SIZE, which value
we didn't want to define in such a low level header, because it would cause
dependency hell.

Every time the size of cpu_entry_area is changed, we have to adjust CPU_ENTRY_AREA_PAGES
accordingly - and this assert is checking that constraint.

But the assert is both imprecise and buggy, primarily because it doesn't
include the single readonly IDT page that is mapped at CPU_ENTRY_AREA_BASE
(which begins at a PMD boundary).

This bug was hidden by the fact that by accident CPU_ENTRY_AREA_PAGES is defined
too large upstream (v5.4-rc8):

	#define CPU_ENTRY_AREA_PAGES    (NR_CPUS * 40)

While 'struct cpu_entry_area' is 155648 bytes, or 38 pages. So we had two extra
pages, which hid the bug.

The following commit (not yet upstream) increased the size to 40 pages:

  x86/iopl: ("Restrict iopl() permission scope")

... but increased CPU_ENTRY_AREA_PAGES only to 41 - i.e. shortening the gap
to just 1 extra page.

Then another not-yet-upstream commit changed the size again:

  880a98c: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")

Which increased the cpu_entry_area size from 38 to 39 pages, but
didn't change CPU_ENTRY_AREA_PAGES (kept it at 40). This worked
fine, because we still had a page left from the accidental 'reserve'.

But when these two commits were merged into the same tree, the
combined size of cpu_entry_area grew from 38 to 40 pages, while
CPU_ENTRY_AREA_PAGES finally caught up to 40 as well.

Which is fine in terms of functionality, but the assert broke:

	BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE);

because CPU_ENTRY_AREA_MAP_SIZE is the total size of the area,
which is 1 page larger due to the IDT page.

To fix all this, change the assert to two precise asserts:

	BUILD_BUG_ON((CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE);
	BUILD_BUG_ON(CPU_ENTRY_AREA_TOTAL_SIZE != CPU_ENTRY_AREA_MAP_SIZE);

This takes the IDT page into account, and also connects the size-based
define of CPU_ENTRY_AREA_TOTAL_SIZE with the address-subtraction based
define of CPU_ENTRY_AREA_MAP_SIZE.
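
As a rough userspace illustration of how the two asserts tie the definitions together, the sketch below re-creates the arithmetic with C11 `static_assert`. All `SKETCH_*` names and values are assumptions made for this example (a hypothetical 8 CPUs, 39 pages per CPU, an arbitrary base address); they are not copied from the kernel headers.

```c
#include <assert.h>   /* static_assert (C11) */

/* Illustrative values only; the kernel derives these from its own headers. */
#define SKETCH_PAGE_SIZE      4096UL
#define SKETCH_NR_CPUS        8UL    /* hypothetical CPU count */
#define SKETCH_PAGES_PER_CPU  39UL   /* assumed per-CPU struct size in pages */

/* Fixed-value definition, playing the role of CPU_ENTRY_AREA_PAGES. */
#define SKETCH_CEA_PAGES      (SKETCH_NR_CPUS * SKETCH_PAGES_PER_CPU)

/* Size-based definitions: the per-CPU array, then the array plus the IDT page. */
#define SKETCH_CEA_ARRAY_SIZE \
	(SKETCH_NR_CPUS * SKETCH_PAGES_PER_CPU * SKETCH_PAGE_SIZE)
#define SKETCH_CEA_TOTAL_SIZE (SKETCH_CEA_ARRAY_SIZE + SKETCH_PAGE_SIZE)

/* Address-based definitions: the IDT page sits at the base, the array follows. */
#define SKETCH_CEA_BASE       0xff800000UL   /* arbitrary example address */
#define SKETCH_CEA_PER_CPU    (SKETCH_CEA_BASE + SKETCH_PAGE_SIZE)
#define SKETCH_CEA_MAP_SIZE \
	(SKETCH_CEA_PER_CPU + SKETCH_CEA_ARRAY_SIZE - SKETCH_CEA_BASE)

/* The '+ 1' accounts for the single read-only IDT page. */
static_assert((SKETCH_CEA_PAGES + 1) * SKETCH_PAGE_SIZE == SKETCH_CEA_MAP_SIZE,
	      "fixed page count must match the mapped size exactly");
static_assert(SKETCH_CEA_TOTAL_SIZE == SKETCH_CEA_MAP_SIZE,
	      "size-based and address-based definitions must agree");

/* Expose the computed size so it can also be checked at run time. */
unsigned long sketch_cea_map_size(void)
{
	return SKETCH_CEA_MAP_SIZE;
}
```

Because both checks use equality rather than `<`, changing `SKETCH_PAGES_PER_CPU` in only one place breaks the build in this sketch, which is the behaviour the commit message asks for.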

Also clean up some of the names which made it rather confusing:

 - 'CPU_ENTRY_AREA_TOT_SIZE' wasn't actually the 'total' size of
   the cpu-entry-area, but the per-cpu array size, so rename this
   to CPU_ENTRY_AREA_ARRAY_SIZE.

 - Introduce CPU_ENTRY_AREA_TOTAL_SIZE that _is_ the total mapping
   size, with the IDT included.

 - Add comments where '+1' denotes the IDT mapping - it wasn't
   obvious and took me about 3 hours to decode...

Finally, because this particular commit is actually applied after
this patch:

  880a98c: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")

Fix the CPU_ENTRY_AREA_PAGES value from 40 pages to the correct 39 pages.

All future commits that change cpu_entry_area will have to adjust
this value precisely.

As a side note, we should probably attempt to remove CPU_ENTRY_AREA_PAGES
and derive its value directly from the structure, without causing
header hell - but that is an adventure for another day! :-)

Fixes: 880a98c: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")
	Cc: Thomas Gleixner <[email protected]>
	Cc: Borislav Petkov <[email protected]>
	Cc: Peter Zijlstra (Intel) <[email protected]>
	Cc: Linus Torvalds <[email protected]>
	Cc: Andy Lutomirski <[email protected]>
	Cc: [email protected]
	Signed-off-by: Ingo Molnar <[email protected]>
(cherry picked from commit 05b042a)
	Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-3958
cve-pre CVE-2023-0597
commit-author Andrey Ryabinin <[email protected]>
commit 3f148f3

KASAN maps shadow for the entire CPU-entry-area:
  [CPU_ENTRY_AREA_BASE, CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE]

This will explode once the per-cpu entry areas are randomized since it
will increase CPU_ENTRY_AREA_MAP_SIZE to 512 GB and KASAN fails to
allocate shadow for such big area.

Fix this by allocating KASAN shadow only for really used cpu entry area
addresses mapped by cea_map_percpu_pages()
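
To give a sense of scale, the sketch below computes how much shadow memory a region needs, assuming the generic KASAN ratio of one shadow byte per 8 bytes of memory. The function and macro names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: generic KASAN granule of 8 bytes, i.e. a scale shift of 3. */
#define SKETCH_KASAN_SHADOW_SCALE_SHIFT 3

/* Number of shadow bytes needed to cover `size` bytes of memory. */
uint64_t sketch_shadow_bytes(uint64_t size)
{
	const uint64_t granule = 1ULL << SKETCH_KASAN_SHADOW_SCALE_SHIFT;

	/* Guard the round-up below against wrapping around. */
	assert(size <= UINT64_MAX - (granule - 1));

	/* Round up so that a partial granule still gets a shadow byte. */
	return (size + granule - 1) >> SKETCH_KASAN_SHADOW_SCALE_SHIFT;
}
```

Under that assumption, `sketch_shadow_bytes(512ULL << 30)` comes to 64 GiB of shadow for the whole reserved range, whereas a single 4 KiB page needs only 512 bytes, which is why populating shadow only for the pages actually mapped is so much cheaper.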

Thanks to the 0day folks for finding and reporting this to be an issue.

[ dhansen: tweak changelog since this will get committed before peterz's
	   actual cpu-entry-area randomization ]

	Signed-off-by: Andrey Ryabinin <[email protected]>
	Signed-off-by: Dave Hansen <[email protected]>
	Tested-by: Yujie Liu <[email protected]>
	Cc: kernel test robot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 3f148f3)
	Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-3958
cve CVE-2023-0597
commit-author Peter Zijlstra <[email protected]>
commit 97e3d26
upstream-diff |
  1. Ignored changes in `arch/x86/kernel/hw_breakpoint.c'. The modified
     function `within_cpu_entry()' doesn't exist in `ciqlts8_6'
     revision. The conflict might have been resolved by pure cherry
     picking of 24ae0c9,
     d390e6d,
     97417cb, but would result in
     introducing dead code: `within_area()' and `within_cpu_entry()'
     functions.
  2. Moved the `arch/x86/include/asm/pgtable_areas.h' changes to
     `arch/x86/include/asm/cpu_entry_area.h'. This must have been done
     because of the 186525b commit
     missing from `ciqlts8_6' history, which factored out the relevant
     #defines from `cpu_entry_area.h' to `pgtable_areas.h'. It was decided
     not to backport this commit as prerequisite since it's too extensive
     and making changes not related to the patch.
  3. Made small adaptation of changes relating to `cea_offset()'
     definitions in `arch/x86/mm/cpu_entry_area.c' which was necessary
     because of the dc4e002 commit
     missing from `ciqlts8_6' history. It was too functionality-intrusive
     to backport as prerequisite for auto resolution of just this single
     conflict.

Seth found that the CPU-entry-area, the piece of per-cpu data that is
mapped into the userspace page-tables for kPTI, is not subject to any
randomization -- irrespective of kASLR settings.

On x86_64 a whole P4D (512 GB) of virtual address space is reserved for
this structure, which is plenty large enough to randomize things a
little.

As such, use a straight forward randomization scheme that avoids
duplicates to spread the existing CPUs over the available space.
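
A duplicate-avoiding scheme of this kind can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the kernel's code: `sketch_pick_cea_slots()` is a hypothetical name, and `rand()` merely stands in for whatever random source the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Pick a distinct slot in [0, max_slots) for each of `ncpus` CPUs.
 * Returns false when there are more CPUs than slots.
 */
bool sketch_pick_cea_slots(unsigned int *slots, unsigned int ncpus,
			   unsigned int max_slots)
{
	assert(slots != NULL || ncpus == 0);

	if (ncpus > max_slots)
		return false;

	for (unsigned int i = 0; i < ncpus; i++) {
		unsigned int candidate;
		bool taken;

		do {
			/* rand() is a placeholder random source for the sketch. */
			candidate = (unsigned int)rand() % max_slots;

			/* Reject the candidate if an earlier CPU already owns it. */
			taken = false;
			for (unsigned int j = 0; j < i; j++) {
				if (slots[j] == candidate) {
					taken = true;
					break;
				}
			}
		} while (taken);

		slots[i] = candidate;
	}
	return true;
}
```

The retry loop is quadratic in the number of CPUs, which is tolerable in this sketch because the slot count is assumed to be far larger than the CPU count, so collisions are rare.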

  [ bp: Fix le build. ]

	Reported-by: Seth Jenkins <[email protected]>
	Reviewed-by: Kees Cook <[email protected]>
	Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
	Signed-off-by: Dave Hansen <[email protected]>
	Signed-off-by: Borislav Petkov <[email protected]>
(cherry picked from commit 97e3d26)
	Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-3958
cve-bf CVE-2023-0597
commit-author Sean Christopherson <[email protected]>
commit 80d72a8

Recompute the physical address for each per-CPU page in the CPU entry
area. A recent commit inadvertently modified cea_map_percpu_pages() such
that every PTE is mapped to the physical address of the first page.
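
The class of bug described here can be illustrated with a small userspace sketch. Everything in it is hypothetical: `sketch_map_pages()` writes into a plain array instead of page tables, and `sketch_toy_to_phys()` is a toy translation standing in for a real virtual-to-physical lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical lookup type: translate one virtual address to a physical one. */
typedef uint64_t (*sketch_translate_fn)(uintptr_t vaddr);

/* Toy translation used only for demonstration: a fixed offset. */
uint64_t sketch_toy_to_phys(uintptr_t vaddr)
{
	return (uint64_t)vaddr + 0x100000u;
}

/* Fill one fake "PTE" per page, starting at virtual address `src`. */
void sketch_map_pages(uint64_t *ptes, uintptr_t src, size_t pages,
		      sketch_translate_fn to_phys)
{
	assert(ptes != NULL && to_phys != NULL);

	for (size_t i = 0; i < pages; i++, src += SKETCH_PAGE_SIZE) {
		/*
		 * Translate inside the loop. Hoisting this call above the
		 * loop would point every entry at the first page.
		 */
		ptes[i] = to_phys(src);
	}
}
```

Calling `sketch_map_pages(ptes, 0x1000, 3, sketch_toy_to_phys)` yields three different entries; with the translation hoisted out of the loop, all three would equal the first one.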

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
	Signed-off-by: Sean Christopherson <[email protected]>
	Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
	Reviewed-by: Andrey Ryabinin <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
(cherry picked from commit 80d72a8)
	Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-3958
cve-bf CVE-2023-0597
commit-author Sean Christopherson <[email protected]>
commit 9765014

Populate a KASAN shadow for the entire possible per-CPU range of the CPU
entry area instead of requiring that each individual chunk map a shadow.
Mapping shadows individually is error prone, e.g. the per-CPU GDT mapping
was left behind, which can lead to not-present page faults during KASAN
validation if the kernel performs a software lookup into the GDT.  The DS
buffer is also likely affected.

The motivation for mapping the per-CPU areas on-demand was to avoid
mapping the entire 512GiB range that's reserved for the CPU entry area;
shaving a few bytes by not creating shadows for potentially unused memory
was not a goal.

The bug is most easily reproduced by doing a sigreturn with a garbage
CS in the sigcontext, e.g.

  int main(void)
  {
    struct sigcontext regs;

    syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
    syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
    syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

    memset(&regs, 0, sizeof(regs));
    regs.cs = 0x1d0;
    syscall(__NR_rt_sigreturn);
    return 0;
  }

to coerce the kernel into doing a GDT lookup to compute CS.base when
reading the instruction bytes on the subsequent #GP to determine whether
or not the #GP is something the kernel should handle, e.g. to fixup UMIP
violations or to emulate CLI/STI for IOPL=3 applications.

  BUG: unable to handle page fault for address: fffffbc8379ace00
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kasan_check_range+0xdf/0x190
  Call Trace:
   <TASK>
   get_desc+0xb0/0x1d0
   insn_get_seg_base+0x104/0x270
   insn_fetch_from_user+0x66/0x80
   fixup_umip_exception+0xb1/0x530
   exc_general_protection+0x181/0x210
   asm_exc_general_protection+0x22/0x30
  RIP: 0003:0x0
  Code: Unable to access opcode bytes at 0xffffffffffffffd6.
  RSP: 0003:0000000000000000 EFLAGS: 00000202
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
   </TASK>

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
	Reported-by: [email protected]
	Suggested-by: Andrey Ryabinin <[email protected]>
	Signed-off-by: Sean Christopherson <[email protected]>
	Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
	Reviewed-by: Andrey Ryabinin <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
(cherry picked from commit 9765014)
	Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-3958
cve-bf CVE-2023-0597
commit-author Michal Koutný <[email protected]>
commit a3f547a

The commit 97e3d26 ("x86/mm: Randomize per-cpu entry area") fixed
an omission of KASLR on CPU entry areas. It doesn't take into account
KASLR switches though, which may result in unintended non-determinism
when a user wants to avoid it (e.g. debugging, benchmarking).

Generate only a single combination of CPU entry areas offsets -- the
linear array that existed prior to randomization -- when KASLR is turned off.

Since we have 3f148f3 ("x86/kasan: Map shadow for percpu pages on
demand") and followups, we can use the more relaxed guard
kaslr_enabled() (in contrast to kaslr_memory_enabled()).
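
The deterministic fallback can be sketched as below. This is a userspace illustration with hypothetical names: the `kaslr_on` parameter stands in for the kernel's KASLR check, and the randomized path is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * When KASLR is off, give CPU i slot i (the linear layout) and return true.
 * When KASLR is on, leave `slots` untouched and return false so that the
 * caller can run its randomized assignment instead.
 */
bool sketch_assign_linear_if_no_kaslr(unsigned int *slots, unsigned int ncpus,
				      bool kaslr_on)
{
	assert(slots != NULL || ncpus == 0);

	if (kaslr_on)
		return false;

	for (unsigned int i = 0; i < ncpus; i++)
		slots[i] = i;

	return true;
}
```

Keeping the linear case as an early exit means that a boot with KASLR disabled always produces the same layout, which is what debugging and benchmarking runs rely on.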

Fixes: 97e3d26 ("x86/mm: Randomize per-cpu entry area")
	Signed-off-by: Michal Koutný <[email protected]>
	Signed-off-by: Dave Hansen <[email protected]>
	Cc: [email protected]
Link: https://lore.kernel.org/all/20230306193144.24605-1-mkoutny%40suse.com
(cherry picked from commit a3f547a)
	Signed-off-by: Marcin Wcisło <[email protected]>

@bmastbergen bmastbergen left a comment

🥌

@PlaidCat PlaidCat left a comment

:shipit:

@PlaidCat PlaidCat merged commit ca47d0b into ctrliq:ciqlts8_6 Sep 26, 2025
2 of 3 checks passed