Skip to content

Commit ed53278

Browse files
davidhildenbrandgregkh
authored andcommitted
mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
commit e822969 upstream. Patch series "mm: fix max_pfn not falling on section boundary", v2. Playing with different memory sizes for a x86-64 guest, I discovered that some memmaps (highest section if max_mem does not fall on the section boundary) are marked as being valid and online, but contain garbage. We have to properly initialize these memmaps. Looking at /proc/kpageflags and friends, I found some more issues, partially related to this. This patch (of 3): If max_pfn is not aligned to a section boundary, we can easily run into BUGs. This can e.g., be triggered on x86-64 under QEMU by specifying a memory size that is not a multiple of 128MB (e.g., 4097MB, but also 4160MB). I was told that on real HW, we can easily have this scenario (esp., one of the main reasons sub-section hotadd of devmem was added). The issue is, that we have a valid memmap (pfn_valid()) for the whole section, and the whole section will be marked "online". pfn_to_online_page() will succeed, but the memmap contains garbage. E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m 4160M" - (see tools/vm/page-types.c): [ 200.476376] BUG: unable to handle page fault for address: fffffffffffffffe [ 200.477500] #PF: supervisor read access in kernel mode [ 200.478334] #PF: error_code(0x0000) - not-present page [ 200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0 [ 200.479557] Oops: 0000 [hardkernel#4] SMP NOPTI [ 200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G D W 5.5.0-rc1-next-20191209 hardkernel#93 [ 200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4 [ 200.481648] RIP: 0010:stable_page_flags+0x4d/0x410 [ 200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f [ 200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202 [ 200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000 [ 200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246 [ 200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000 [ 200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001 [ 200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08 [ 200.487130] FS: 00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000 [ 200.487804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0 [ 200.488897] Call Trace: [ 200.489115] kpageflags_read+0xe9/0x140 [ 200.489447] proc_reg_read+0x3c/0x60 [ 200.489755] vfs_read+0xc2/0x170 [ 200.490037] ksys_pread64+0x65/0xa0 [ 200.490352] do_syscall_64+0x5c/0xa0 [ 200.490665] entry_SYSCALL_64_after_hwframe+0x49/0xbe But it can be triggered much easier via "cat /proc/kpageflags > /dev/null" after cold/hot plugging a DIMM to such a system: [root@localhost ~]# cat /proc/kpageflags > /dev/null [ 111.517275] BUG: unable to handle page fault for address: fffffffffffffffe [ 111.517907] #PF: supervisor read access in kernel mode [ 111.518333] #PF: error_code(0x0000) - not-present page [ 111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0 This patch fixes that by at least zero-ing out that memmap (so e.g., page_to_pfn() will not crash). Commit 907ec5f ("mm: zero remaining unavailable struct pages") tried to fix a similar issue, but forgot to consider this special case. After this patch, there are still problems to solve. E.g., not all of these pages falling into a memory hole will actually get initialized later and set PageReserved - they are only zeroed out - but at least the immediate crashes are gone. A follow-up patch will take care of this. Link: http://lkml.kernel.org/r/[email protected] Fixes: f7f9910 ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: David Hildenbrand <[email protected]> Tested-by: Daniel Jordan <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Steven Sistare <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Bob Picco <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Dan Williams <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: <[email protected]> [4.15+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
1 parent 03c0309 commit ed53278

File tree

1 file changed

+12
-2
lines changed

1 file changed

+12
-2
lines changed

mm/page_alloc.c

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6933,7 +6933,8 @@ static u64 zero_pfn_range(unsigned long spfn, unsigned long epfn)
69336933
* This function also addresses a similar issue where struct pages are left
69346934
* uninitialized because the physical address range is not covered by
69356935
* memblock.memory or memblock.reserved. That could happen when memblock
6936-
* layout is manually configured via memmap=.
6936+
* layout is manually configured via memmap=, or when the highest physical
6937+
* address (max_pfn) does not end on a section boundary.
69376938
*/
69386939
void __init zero_resv_unavail(void)
69396940
{
@@ -6951,7 +6952,16 @@ void __init zero_resv_unavail(void)
69516952
pgcnt += zero_pfn_range(PFN_DOWN(next), PFN_UP(start));
69526953
next = end;
69536954
}
6954-
pgcnt += zero_pfn_range(PFN_DOWN(next), max_pfn);
6955+
6956+
/*
6957+
* Early sections always have a fully populated memmap for the whole
6958+
* section - see pfn_valid(). If the last section has holes at the
6959+
* end and that section is marked "online", the memmap will be
6960+
* considered initialized. Make sure that memmap has a well defined
6961+
* state.
6962+
*/
6963+
pgcnt += zero_pfn_range(PFN_DOWN(next),
6964+
round_up(max_pfn, PAGES_PER_SECTION));
69556965

69566966
/*
69576967
* Struct pages that do not have backing memory. This could be because

0 commit comments

Comments
 (0)