[LTS 8.6] CVE-2023-0597 #591
Merged
Conversation
… make the CPU_ENTRY_AREA_PAGES assert precise

jira VULN-3958
cve-pre CVE-2023-0597
commit-author Ingo Molnar <[email protected]>
commit 05b042a

When two recent commits that increased the size of the 'struct cpu_entry_area' were merged in -tip, the 32-bit defconfig build started failing on the following build time assert:

    ./include/linux/compiler.h:391:38: error: call to '__compiletime_assert_189' declared with attribute error: BUILD_BUG_ON failed: CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE
    arch/x86/mm/cpu_entry_area.c:189:2: note: in expansion of macro 'BUILD_BUG_ON'
    In function 'setup_cpu_entry_area_ptes',

Which corresponds to the following build time assert:

    BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE);

The purpose of this assert is to sanity check the fixed-value definition of CPU_ENTRY_AREA_PAGES in arch/x86/include/asm/pgtable_32_types.h:

    #define CPU_ENTRY_AREA_PAGES    (NR_CPUS * 41)

The '41' is supposed to match sizeof(struct cpu_entry_area)/PAGE_SIZE, which value we didn't want to define in such a low level header, because it would cause dependency hell.

Every time the size of cpu_entry_area is changed, we have to adjust CPU_ENTRY_AREA_PAGES accordingly - and this assert is checking that constraint.

But the assert is both imprecise and buggy, primarily because it doesn't include the single readonly IDT page that is mapped at CPU_ENTRY_AREA_BASE (which begins at a PMD boundary).

This bug was hidden by the fact that by accident CPU_ENTRY_AREA_PAGES is defined too large upstream (v5.4-rc8):

    #define CPU_ENTRY_AREA_PAGES    (NR_CPUS * 40)

While 'struct cpu_entry_area' is 155648 bytes, or 38 pages. So we had two extra pages, which hid the bug.

The following commit (not yet upstream) increased the size to 40 pages:

    x86/iopl: ("Restrict iopl() permission scope")

... but increased CPU_ENTRY_AREA_PAGES only to 41 - i.e. shortening the gap to just 1 extra page.

Then another not-yet-upstream commit changed the size again:

    880a98c: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")

Which increased the cpu_entry_area size from 38 to 39 pages, but didn't change CPU_ENTRY_AREA_PAGES (kept it at 40). This worked fine, because we still had a page left from the accidental 'reserve'.

But when these two commits were merged into the same tree, the combined size of cpu_entry_area grew from 38 to 40 pages, while CPU_ENTRY_AREA_PAGES finally caught up to 40 as well. Which is fine in terms of functionality, but the assert broke:

    BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE);

because CPU_ENTRY_AREA_MAP_SIZE is the total size of the area, which is 1 page larger due to the IDT page.

To fix all this, change the assert to two precise asserts:

    BUILD_BUG_ON((CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE);
    BUILD_BUG_ON(CPU_ENTRY_AREA_TOTAL_SIZE != CPU_ENTRY_AREA_MAP_SIZE);

This takes the IDT page into account, and also connects the size-based define of CPU_ENTRY_AREA_TOTAL_SIZE with the address-subtraction based define of CPU_ENTRY_AREA_MAP_SIZE.

Also clean up some of the names which made it rather confusing:

- 'CPU_ENTRY_AREA_TOT_SIZE' wasn't actually the 'total' size of the cpu-entry-area, but the per-cpu array size, so rename this to CPU_ENTRY_AREA_ARRAY_SIZE.
- Introduce CPU_ENTRY_AREA_TOTAL_SIZE that _is_ the total mapping size, with the IDT included.
- Add comments where '+1' denotes the IDT mapping - it wasn't obvious and took me about 3 hours to decode...

Finally, because this particular commit is actually applied after this patch:

    880a98c: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")

Fix the CPU_ENTRY_AREA_PAGES value from 40 pages to the correct 39 pages.

All future commits that change cpu_entry_area will have to adjust this value precisely.

As a side note, we should probably attempt to remove CPU_ENTRY_AREA_PAGES and derive its value directly from the structure, without causing header hell - but that is an adventure for another day! :-)

Fixes: 880a98c: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
(cherry picked from commit 05b042a)
Signed-off-by: Marcin Wcisło <[email protected]>
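For a concrete feel of the bookkeeping the two new asserts pin down, here is a minimal standalone sketch (C11, compiles as-is). NR_CPUS and the 39-page per-cpu size are illustrative placeholders, not the real kernel headers; only the "+1 readonly IDT page" relationship mirrors the commit:

    /* Standalone sketch, not the kernel headers. */
    #define PAGE_SIZE                  4096
    #define NR_CPUS                    8     /* illustrative */
    #define CEA_PAGES_PER_CPU          39    /* stand-in for sizeof(struct cpu_entry_area)/PAGE_SIZE */
    #define CPU_ENTRY_AREA_PAGES       (NR_CPUS * CEA_PAGES_PER_CPU)

    /* Per-cpu array size (the define formerly misnamed CPU_ENTRY_AREA_TOT_SIZE): */
    #define CPU_ENTRY_AREA_ARRAY_SIZE  (CPU_ENTRY_AREA_PAGES * PAGE_SIZE)
    /* Total mapping size; the '+1' page is the readonly IDT at CPU_ENTRY_AREA_BASE: */
    #define CPU_ENTRY_AREA_TOTAL_SIZE  (CPU_ENTRY_AREA_ARRAY_SIZE + PAGE_SIZE)
    /* Stand-in for the address-subtraction based define in the real headers: */
    #define CPU_ENTRY_AREA_MAP_SIZE    CPU_ENTRY_AREA_TOTAL_SIZE

    /* The same constraints the commit's BUILD_BUG_ON()s enforce, inverted for
     * _Static_assert (which fails when the condition is false): */
    _Static_assert((CPU_ENTRY_AREA_PAGES + 1) * PAGE_SIZE == CPU_ENTRY_AREA_MAP_SIZE,
                   "page-count define must match the mapped range exactly");
    _Static_assert(CPU_ENTRY_AREA_TOTAL_SIZE == CPU_ENTRY_AREA_MAP_SIZE,
                   "size-based and address-based defines must agree");

    int main(void) { return 0; }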
jira VULN-3958
cve-pre CVE-2023-0597
commit-author Andrey Ryabinin <[email protected]>
commit 3f148f3

KASAN maps shadow for the entire CPU-entry-area:

    [CPU_ENTRY_AREA_BASE, CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE]

This will explode once the per-cpu entry areas are randomized, since it will increase CPU_ENTRY_AREA_MAP_SIZE to 512 GB and KASAN fails to allocate shadow for such a big area.

Fix this by allocating KASAN shadow only for really used cpu entry area addresses mapped by cea_map_percpu_pages().

Thanks to the 0day folks for finding and reporting this to be an issue.

[ dhansen: tweak changelog since this will get committed before peterz's actual cpu-entry-area randomization ]

Signed-off-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Tested-by: Yujie Liu <[email protected]>
Cc: kernel test robot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 3f148f3)
Signed-off-by: Marcin Wcisło <[email protected]>
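To see why this matters, a back-of-the-envelope sketch (userspace, illustrative numbers only): KASAN shadow costs one byte per eight bytes of covered address space, so shadowing the whole randomized 512 GB range would need 64 GB, while shadowing only what cea_map_percpu_pages() actually maps stays in the kilobyte range. The per-cpu size and CPU count below are assumptions:

    #include <stdio.h>

    int main(void)
    {
        const unsigned long long GiB = 1ULL << 30;
        unsigned long long map_size = 512 * GiB;    /* randomized CPU_ENTRY_AREA_MAP_SIZE */
        unsigned long long per_cpu  = 39 * 4096ULL; /* per-cpu area actually mapped (assumed) */
        unsigned long long ncpus    = 64;           /* assumed */

        /* KASAN shadow is 1 byte per 8 bytes of covered address space. */
        printf("shadow for the whole range : %llu GiB\n", (map_size / 8) / GiB);
        printf("shadow for used areas only : %llu KiB\n", (ncpus * per_cpu / 8) >> 10);
        return 0;
    }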
jira VULN-3958
cve CVE-2023-0597
commit-author Peter Zijlstra <[email protected]>
commit 97e3d26

upstream-diff |

1. Ignored changes in `arch/x86/kernel/hw_breakpoint.c'. The modified function `within_cpu_entry()' doesn't exist in the `ciqlts8_6' revision. The conflict might have been resolved by pure cherry-picking of 24ae0c9, d390e6d, 97417cb, but that would result in introducing dead code: the `within_area()' and `within_cpu_entry()' functions.
2. Moved the `arch/x86/include/asm/pgtable_areas.h' changes to `arch/x86/include/asm/cpu_entry_area.h'. This had to be done because of the 186525b commit missing from the `ciqlts8_6' history, which factored out the relevant #defines from `cpu_entry_area.h' to `pgtable_areas.h'. It was decided not to backport this commit as a prerequisite since it is too extensive and makes changes not related to the patch.
3. Made a small adaptation of the changes relating to the `cea_offset()' definitions in `arch/x86/mm/cpu_entry_area.c', which was necessary because of the dc4e002 commit missing from the `ciqlts8_6' history. That commit was too functionality-intrusive to backport as a prerequisite for auto resolution of just this single conflict.

Seth found that the CPU-entry-area; the piece of per-cpu data that is mapped into the userspace page-tables for kPTI is not subject to any randomization -- irrespective of kASLR settings.

On x86_64 a whole P4D (512 GB) of virtual address space is reserved for this structure, which is plenty large enough to randomize things a little.

As such, use a straightforward randomization scheme that avoids duplicates to spread the existing CPUs over the available space.

[ bp: Fix le build. ]

Reported-by: Seth Jenkins <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
(cherry picked from commit 97e3d26)
Signed-off-by: Marcin Wcisło <[email protected]>
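The duplicate-avoiding scheme amounts to rejection sampling over the per-cpu slots that fit in the reserved space. A userspace sketch of that shape (the slot count and PRNG are stand-ins for the kernel's per-cpu machinery, not the actual implementation):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define NCPUS    8    /* assumed CPU count */
    #define MAX_CEA  64   /* assumed number of per-cpu slots fitting in the reserved space */

    static unsigned int cea_offset[NCPUS];

    int main(void)
    {
        srand((unsigned)time(NULL));

        for (int i = 0; i < NCPUS; i++) {
            unsigned int cea;
    again:
            cea = (unsigned int)rand() % MAX_CEA;
            /* Reject a slot already handed out to an earlier CPU. */
            for (int j = 0; j < i; j++)
                if (cea_offset[j] == cea)
                    goto again;
            cea_offset[i] = cea;
        }

        for (int i = 0; i < NCPUS; i++)
            printf("cpu %d -> slot %u\n", i, cea_offset[i]);
        return 0;
    }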
jira VULN-3958
cve-bf CVE-2023-0597
commit-author Sean Christopherson <[email protected]>
commit 80d72a8

Recompute the physical address for each per-CPU page in the CPU entry area. A recent commit inadvertently modified cea_map_percpu_pages() such that every PTE is mapped to the physical address of the first page.

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andrey Ryabinin <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
(cherry picked from commit 80d72a8)
Signed-off-by: Marcin Wcisło <[email protected]>
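Reduced to a userspace mock, the shape of the bug and the fix looks like this (addresses are made up; the mock "page table" stands in for the real PTE writes): the broken loop reused the first page's physical address on every iteration, whereas the fix advances it in lockstep with the virtual address:

    #include <stdio.h>

    #define PAGE_SIZE 4096UL
    #define NPAGES    4

    int main(void)
    {
        unsigned long pte[NPAGES];     /* mock page table: virt page -> phys */
        unsigned long pa = 0x100000UL; /* made-up physical base */

        /* Buggy shape: every iteration passed the initial 'pa'.
         * Fixed shape: advance 'pa' with each page, as below. */
        for (unsigned int i = 0; i < NPAGES; i++, pa += PAGE_SIZE)
            pte[i] = pa;

        for (unsigned int i = 0; i < NPAGES; i++)
            printf("page %u -> phys %#lx\n", i, pte[i]);
        return 0;
    }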
jira VULN-3958
cve-bf CVE-2023-0597
commit-author Sean Christopherson <[email protected]>
commit 9765014

Populate a KASAN shadow for the entire possible per-CPU range of the CPU entry area instead of requiring that each individual chunk map a shadow. Mapping shadows individually is error prone, e.g. the per-CPU GDT mapping was left behind, which can lead to not-present page faults during KASAN validation if the kernel performs a software lookup into the GDT. The DS buffer is also likely affected.

The motivation for mapping the per-CPU areas on-demand was to avoid mapping the entire 512GiB range that's reserved for the CPU entry area; shaving a few bytes by not creating shadows for potentially unused memory was not a goal.

The bug is most easily reproduced by doing a sigreturn with a garbage CS in the sigcontext, e.g.

    int main(void)
    {
        struct sigcontext regs;

        syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
        syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
        syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

        memset(&regs, 0, sizeof(regs));
        regs.cs = 0x1d0;
        syscall(__NR_rt_sigreturn);
        return 0;
    }

to coerce the kernel into doing a GDT lookup to compute CS.base when reading the instruction bytes on the subsequent #GP to determine whether or not the #GP is something the kernel should handle, e.g. to fixup UMIP violations or to emulate CLI/STI for IOPL=3 applications.

    BUG: unable to handle page fault for address: fffffbc8379ace00
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0
    Oops: 0000 [#1] PREEMPT SMP KASAN
    CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
    RIP: 0010:kasan_check_range+0xdf/0x190
    Call Trace:
     <TASK>
     get_desc+0xb0/0x1d0
     insn_get_seg_base+0x104/0x270
     insn_fetch_from_user+0x66/0x80
     fixup_umip_exception+0xb1/0x530
     exc_general_protection+0x181/0x210
     asm_exc_general_protection+0x22/0x30
    RIP: 0003:0x0
    Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    RSP: 0003:0000000000000000 EFLAGS: 00000202
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
     </TASK>

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: [email protected]
Suggested-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andrey Ryabinin <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
(cherry picked from commit 9765014)
Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-3958
cve-bf CVE-2023-0597
commit-author Michal Koutný <[email protected]>
commit a3f547a

The commit 97e3d26 ("x86/mm: Randomize per-cpu entry area") fixed an omission of KASLR on CPU entry areas. It doesn't take into account KASLR switches though, which may result in unintended non-determinism when a user wants to avoid it (e.g. debugging, benchmarking).

Generate only a single combination of CPU entry area offsets -- the linear array that existed prior to randomization -- when KASLR is turned off.

Since we have 3f148f3 ("x86/kasan: Map shadow for percpu pages on demand") and followups, we can use the more relaxed guard kaslr_enabled() (in contrast to kaslr_memory_enabled()).

Fixes: 97e3d26 ("x86/mm: Randomize per-cpu entry area")
Signed-off-by: Michal Koutný <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/all/20230306193144.24605-1-mkoutny%40suse.com
(cherry picked from commit a3f547a)
Signed-off-by: Marcin Wcisło <[email protected]>
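A minimal mock of the added guard (kaslr_enabled() below is a stand-in modeled on the kernel helper; everything else is invented for illustration): with KASLR off, each CPU gets back the deterministic linear slot it had before randomization.

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    static bool kaslr_off = true;  /* e.g. booted with "nokaslr" */

    /* Stand-in for the kernel helper of the same name. */
    static bool kaslr_enabled(void) { return !kaslr_off; }

    static unsigned int cea_slot(unsigned int cpu, unsigned int max_cea)
    {
        if (!kaslr_enabled())
            return cpu;                         /* linear array, as before 97e3d26 */
        return (unsigned int)rand() % max_cea;  /* randomized (duplicate check elided) */
    }

    int main(void)
    {
        for (unsigned int cpu = 0; cpu < 4; cpu++)
            printf("cpu %u -> slot %u\n", cpu, cea_slot(cpu, 64));
        return 0;
    }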
bmastbergen approved these changes Sep 26, 2025
🥌
PlaidCat approved these changes Sep 26, 2025
[LTS 8.6] CVE-2023-0597
VULN-3958
Problem
https://access.redhat.com/security/cve/CVE-2023-0597
Affected: yes
This flaw is independent of any config options and affects all x86 kernels. The commit 97e3d26 solving the issue is absent from ciqlts8_6's history, along with other accompanying changes (see Solution). Additionally, the option settings for the related randomization techniques found in configs/kernel-x86_64.config (kernel-src-tree/configs/kernel-x86_64.config, lines 4565 to 4566 at 58ac554) clearly display the desire to have the randomization in place wherever it may apply.
Solution
The official mainline fix for CVE-2023-0597 is 97e3d26, but the actual solution is complicated on LTS 8.6 by non-backported changes to the kernel's memory mapping, as well as by multiple fixes of the fix present in mainline.
Consider the branched-off timeline of changes to the 97e3d26-affected files:
The commits identified with 0, 1, 2, 3, 4, 5 comprise the solution proposed in this PR. 0 is the CVE-2023-0597 fix proper, with some considerable upstream diffs, discussed below. 1 was picked to ease some conflicts for 0. 5 is an official bugfix to 0. 2, 3, 4 are unofficial bugfixes to 0, and also serve as prerequisites for 5.

The complete relations list - official and actual - is as follows (marked with an asterisk * are the ones which could be expressed in commit headlines with the cve-pre and cve-bf attributes).

Commentary on the applied changes:
Commit 1 (05b042a) fixes a bug in the calculation of the cpu entry area size and renames some constants in arch/x86/include/asm/cpu_entry_area.h which are later modified by 0. Its "Fixes" attribute points to the immediately preceding commit 880a98c, but that commit merely exposed the bug, it didn't introduce it.

Commit 2 (3f148f3) is actually a bugfix of 0, but it was put earlier in mainline's history - note the AuthorDate 2022-10-27 of 0 and 2022-10-28 of 2, as well as 3f148f3's message ("[ dhansen: tweak changelog since this will get committed before peterz's actual cpu-entry-area randomization ]"). It was cherry-picked before 0 to preserve the ordering in kernel-mainline, and marked as cve-pre to avoid possible confusion around cve-bf and because it might as well be treated as preparation for the fix.
Commit 0 (97e3d26) was picked with the following changes from the upstream:

1. Ignored changes in arch/x86/kernel/hw_breakpoint.c. The modified function within_cpu_entry() doesn't exist in the ciqlts8_6 revision. The conflict might have been resolved purely by cherry-picking 24ae0c9, d390e6d, 97417cb, but that would have resulted in introducing dead code: the within_area() and within_cpu_entry() functions.
2. Moved the arch/x86/include/asm/pgtable_areas.h changes to arch/x86/include/asm/cpu_entry_area.h. This had to be done because of the 186525b commit missing from the ciqlts8_6 history, which factored out the relevant #defines from cpu_entry_area.h to pgtable_areas.h. It was decided not to backport this commit as a prerequisite since it is too extensive and makes changes not related to the patch.
3. Made a small adaptation of the changes relating to the cea_offset() definitions in arch/x86/mm/cpu_entry_area.c, which was necessary because of the dc4e002 commit missing from the ciqlts8_6 history. That commit was too functionality-intrusive to backport as a prerequisite for auto resolution of just this single conflict.
Commits 3 (80d72a8) and 4 (9765014) expand on the 2 bugfix, so they had to be included for completeness.

Commit 5 (a3f547a) makes the randomization implemented in 0 configurable, which it should have been from the beginning. Commits 2, 3, 4 were included also because they are 5's prerequisites, per a3f547a's message: "Since we have 3f148f3 ("x86/kasan: Map shadow for percpu pages on demand") and followups, we can use the more relaxed guard kaslr_enabled() (in contrast to kaslr_memory_enabled())."

kABI check: passed
Boot test: passed (boot-test.log)

Kselftests: passed relative
Reference: kselftests–ciqlts8_6–run1.log
Patch: kselftests–ciqlts8_6-CVE-2023-0597–run1.log
Comparison: the test results for the reference kernel and the patch are the same.