Description
While implementing finer-grained ASLR I came across this comment:
bootloader/src/binary/level_4_entries.rs
Line 40 in ac46d04
We mark the first 512GiB of the address space as unusable for dynamically generated addresses. I think we do this because we identity map the context switch code into kernel memory and this code most likely resides within the first 512GiB of the address space:
Lines 166 to 181 in a445433
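For context, with 4-level paging each level 4 entry covers 512GiB, so marking the first 512GiB as unusable corresponds to reserving level 4 entry 0. A minimal sketch of the index computation, assuming 4-level paging; the constant and function names are illustrative and not taken from the bootloader:

```rust
/// Number of entries in an x86_64 page table.
const ENTRY_COUNT: u64 = 512;
/// Each level 4 entry covers 2^39 bytes = 512 GiB.
const LEVEL_4_ENTRY_SIZE: u64 = 1 << 39;

/// Returns the level 4 page table index of a virtual address
/// (bits 39..=47 of the address).
fn level_4_index(virt_addr: u64) -> u64 {
    (virt_addr >> 39) & (ENTRY_COUNT - 1)
}
```

Any address below `LEVEL_4_ENTRY_SIZE` falls into entry 0, which is why identity mapped bootloader code at low physical addresses ends up in that entry.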
This causes a number of (admittedly small and unlikely) problems:
- The identity mapped pages could overlap with the kernel or other mappings
- We don't expose the identity mapped addresses to the kernel in `Mappings`
- An attacker could make use of the identity mapped pages to defeat ASLR
- We mark a lot of usable memory as unusable, and because of that we can't check for overlaps: there would be a lot of false positives. We currently just ignore overlaps.
We could probably work around those problems while still mapping parts of the bootloader into the kernel's address space, but I'd like to propose another solution: We use another very short-lived page table to do the context switch. This page table would only map a few pages containing code that switches to the kernel's page table. Importantly, we would set the page table up in such a way that the kernel's entrypoint is just after the page table switch instruction, so we don't have to use any code to jump to the kernel; it would simply be the next instruction.
I don't think we could reliably map such code into the bootloader's address space because we'd have to map the code just before the kernel's entrypoint, which could be close to the bootloader's code, so that's why I want to use a short-lived page table.
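To make the placement concrete, here is a rough sketch of how the address of the switch instruction and the pages to map in the temporary page table could be computed. It assumes the switch is a single `mov cr3, rax` (encoded as `0F 22 D8`) and 4KiB pages; `TrampolinePlacement` and `place_trampoline` are hypothetical names, not existing bootloader APIs:

```rust
/// Encoding of `mov cr3, rax`; three bytes long.
const MOV_CR3_RAX: [u8; 3] = [0x0f, 0x22, 0xd8];
const PAGE_SIZE: u64 = 4096;

/// Hypothetical description of where the switch instruction would live.
#[derive(Debug, PartialEq, Eq)]
struct TrampolinePlacement {
    /// Virtual address of the first byte of `mov cr3, rax`.
    instr_addr: u64,
    /// Start of the first page that has to be mapped in the temporary table.
    first_page: u64,
    /// 1 if the instruction fits in one page, 2 if it straddles a boundary.
    page_count: u64,
}

/// Places the instruction so that it ends exactly at `entry_point`.
/// Uses wrapping arithmetic, so very small entry addresses wrap around
/// to the top of the address space.
fn place_trampoline(entry_point: u64) -> TrampolinePlacement {
    let len = MOV_CR3_RAX.len() as u64;
    let instr_addr = entry_point.wrapping_sub(len);
    let first_page = instr_addr & !(PAGE_SIZE - 1);
    let last_page = instr_addr.wrapping_add(len - 1) & !(PAGE_SIZE - 1);
    let page_count = if first_page == last_page { 1 } else { 2 };
    TrampolinePlacement { instr_addr, first_page, page_count }
}
```

Only the pages holding the instruction bytes are computed here. Whatever code loads `rax` and enters the trampoline would need its own mapping, which this sketch leaves out.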
We also identity map a GDT into the kernel's address space:
Lines 183 to 193 in a445433
We should probably make the GDT's location configurable and expose it in `Mappings`.
I'd be happy to work on a PR for this.
Activity
bjorn3 commented on Jun 26, 2022
Can't the kernel make a page table from scratch and simply not map this memory range to the bootloader? I would expect any kernel implementing KASLR or a userspace to build their page tables from scratch and not identity map anything. AFAIK only the physical memory map needs to be respected. The virtual memory mapping can vary freely as a kernel wishes.
phil-opp commented on Jun 27, 2022
Interesting idea! However, AFAIK the kernel's entry point address can be an arbitrary offset, e.g. in the middle of the `.text` section. So the memory before the entry point might already be used by other kernel code.
Freax13 commented on Jun 27, 2022
Well in theory a kernel could do anything that we do in stage 4, so yeah they could totally just create their own page tables, but I'd argue that we shouldn't expect kernels to do that. Personally, in my kernel, I copy and update the page table created by the bootloader, but never create a new page table completely from scratch, and it's been working great.
That's exactly my point: none of the pages in the page table created by the bootloader are identity mapped except for the context switch code and the GDT.
Freax13 commented on Jun 27, 2022
The short-lived context switch page table wouldn't contain any entries from the kernel's page table. It'd just contain some entries to switch to the kernel's page table, so there's no way the two could overlap.
phil-opp commented on Jun 27, 2022
Ah, I think I understand what you mean now. Assume the kernel's entry point address is `0x2ec060`. We would then map the context switch function in the temp page table in a way that it lives on the same virtual page as the entry point? We also offset it within the page so that the page table reload happens exactly at the instruction before `0x2ec060`? Does this always work without violating any alignment requirements?
Freax13 commented on Jun 27, 2022
Yes, except that instead of mapping the context switch function, we might just write the opcodes manually. I don't think we'll have to write many, and it's probably easier/more reliable than making the function work when placed at a different address.
Almost. I'm not aware of any alignment requirements that could cause problems, but there's another problem: This won't work if the entrypoint is placed right after the address space gap. The instruction pointer will not automatically jump the gap, so this will cause a GP. `mov cr3, rax` is a 3 byte instruction, so if the entrypoint is at `0xffff_8000_0000_0000`, `0xffff_8000_0000_0001` or `0xffff_8000_0000_0002`, this won't work. All other locations (including `0`) should work fine though.
phil-opp commented on Jun 27, 2022
I don't think that there are kernels that link their `.text` section right at the lower/upper half boundary. So that should not be a problem.
Sounds like it would be worth a try! So feel free to open a PR if you like, preferably against the `next` branch (I'm trying my best to finish the rewrite soon).
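The boundary case discussed above could be rejected up front with a check along these lines. This is only a sketch, assuming 48-bit canonical addresses (4-level paging) and a 3 byte `mov cr3, rax`; `is_canonical` and `entry_point_supported` are hypothetical helpers, not part of the bootloader:

```rust
/// Length of `mov cr3, rax` in bytes.
const MOV_CR3_RAX_LEN: u64 = 3;
/// First address of the upper half with 48-bit virtual addresses.
const HIGHER_HALF_START: u64 = 0xffff_8000_0000_0000;

/// An address is canonical if bits 47..=63 are all zeros or all ones.
fn is_canonical(addr: u64) -> bool {
    let upper = addr >> 47;
    upper == 0 || upper == 0x1_ffff
}

/// Returns true if the three instruction bytes placed directly before
/// `entry_point` all lie at canonical addresses, i.e. the instruction
/// does not start inside the address space gap.
fn entry_point_supported(entry_point: u64) -> bool {
    is_canonical(entry_point)
        && (1..=MOV_CR3_RAX_LEN).all(|back| is_canonical(entry_point.wrapping_sub(back)))
}
```

With this check, only entry points at `HIGHER_HALF_START`, `HIGHER_HALF_START + 1` and `HIGHER_HALF_START + 2` (and non-canonical addresses) are rejected, matching the cases listed in the thread.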