Commit d3378e8

Aili Yao authored and torvalds committed
mm/gup: check page poison status for coredump.
When we dump core for a process killed by a signal, the signal may be a SIGBUS with code BUS_MCEERR_AR or BUS_MCEERR_AO, which means it resulted from an ECC memory failure such as SRAR or SRAO. We expect the memory recovery work to have finished correctly, in which case get_dump_page() will not return the error page, because memory_failure() has set the process's pte invalid.

But memory_failure() may fail, and the process's related pte may not be correctly set invalid. With the current code, we then return the poisoned page and dump it, which leads to a system panic because the access happens in kernel code.

So check the poison status in get_dump_page(), and if it is set, return NULL.

There may be other scenarios where it is also better to check the poison status than to panic, so make a wrapper for this check. Thanks to David for the suggestion (<[email protected]>).

[[email protected]: s/0/false/]
[[email protected]: is_page_poisoned() arg cannot be null, per Matthew]

Link: https://lkml.kernel.org/r/20210322115233.05e4e82a@alex-virtual-machine
Link: https://lkml.kernel.org/r/20210319104437.6f30e80d@alex-virtual-machine
Signed-off-by: Aili Yao <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Aili Yao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
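For readers unfamiliar with the two si_code values named in the message, the following user-space sketch shows how a program could tell them apart. It is an illustration only and not part of this commit; the helper name describe_sigbus_code is hypothetical, while BUS_MCEERR_AR and BUS_MCEERR_AO are the constants exposed by <signal.h> on Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>

/* Hypothetical helper, not part of the commit: classify a SIGBUS si_code. */
static const char *describe_sigbus_code(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		/* Hardware memory error consumed by the process. */
		return "action required";
	case BUS_MCEERR_AO:
		/* Hardware memory error detected but not yet consumed. */
		return "action optional";
	default:
		/* Any other SIGBUS cause, e.g. BUS_ADRALN. */
		return "not a memory-failure SIGBUS";
	}
}
```

A handler installed with SA_SIGINFO could pass info->si_code to such a helper. Either memory-failure code is a hint that the core dump may run into a poisoned page.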
1 parent a5c5e44 commit d3378e8

2 files changed: +24 -0 lines changed


mm/gup.c

Lines changed: 4 additions & 0 deletions
@@ -1535,6 +1535,10 @@ struct page *get_dump_page(unsigned long addr)
 				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
 	if (locked)
 		mmap_read_unlock(mm);
+
+	if (ret == 1 && is_page_poisoned(page))
+		return NULL;
+
 	return (ret == 1) ? page : NULL;
 }
 #endif /* CONFIG_ELF_CORE */
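
The effect of the added early return can be modelled in user space. The sketch below assumes that the dump loop treats a NULL result as a page to skip rather than read; that loop is not shown in this commit. All names here (mock_page, mock_get_dump_page, count_dumped_pages) are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only what the model needs. */
struct mock_page {
	bool mapped;    /* models the page lookup returning ret == 1 */
	bool poisoned;  /* models is_page_poisoned(page) being true */
};

/* Mirrors the patched return logic: hand back NULL for poisoned pages. */
static struct mock_page *mock_get_dump_page(struct mock_page *page)
{
	int ret = page->mapped ? 1 : 0;

	if (ret == 1 && page->poisoned)
		return NULL;

	return (ret == 1) ? page : NULL;
}

/* Model of a dump loop: count the pages whose contents would be read. */
static size_t count_dumped_pages(struct mock_page *pages, size_t n)
{
	size_t dumped = 0;

	for (size_t i = 0; i < n; i++) {
		if (mock_get_dump_page(&pages[i]))
			dumped++;  /* page contents would be written out */
		/* NULL result: the page is skipped and its data never touched */
	}
	return dumped;
}
```

In this model a poisoned page is indistinguishable from an unmapped one as far as the caller is concerned, which is what lets the dump continue without touching the bad memory.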

mm/internal.h

Lines changed: 20 additions & 0 deletions
@@ -97,6 +97,26 @@ static inline void set_page_refcounted(struct page *page)
 	set_page_count(page, 1);
 }
 
+/*
+ * When kernel touch the user page, the user page may be have been marked
+ * poison but still mapped in user space, if without this page, the kernel
+ * can guarantee the data integrity and operation success, the kernel is
+ * better to check the posion status and avoid touching it, be good not to
+ * panic, coredump for process fatal signal is a sample case matching this
+ * scenario. Or if kernel can't guarantee the data integrity, it's better
+ * not to call this function, let kernel touch the poison page and get to
+ * panic.
+ */
+static inline bool is_page_poisoned(struct page *page)
+{
+	if (PageHWPoison(page))
+		return true;
+	else if (PageHuge(page) && PageHWPoison(compound_head(page)))
+		return true;
+
+	return false;
+}
+
 extern unsigned long highest_memmap_pfn;
 
 /*
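
The second branch of is_page_poisoned() suggests that for a huge page the poison flag may sit on the head page rather than on the subpage being inspected. A user-space model of that decision is sketched below. The struct and function names are hypothetical, and the three fields merely stand in for PageHWPoison(), PageHuge() and compound_head().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct page. */
struct model_page {
	bool hwpoison;            /* models PageHWPoison() */
	bool huge;                /* models PageHuge() */
	struct model_page *head;  /* models compound_head(); points to itself
	                           * for a head or non-compound page */
};

/* Same decision structure as the helper added in mm/internal.h. */
static bool model_is_page_poisoned(const struct model_page *page)
{
	if (page->hwpoison)
		return true;
	if (page->huge && page->head->hwpoison)
		return true;

	return false;
}
```

With this model, a clean tail page of a huge page whose head is flagged is reported as poisoned, while an ordinary clean page is not.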
