From 7118fc2906e2925d7edb5ed9c8a57f2a5f23b849 Mon Sep 17 00:00:00 2001
From: Mike Kravetz
Date: Wed, 30 Jun 2021 18:48:34 -0700
Subject: hugetlb: address ref count racing in prep_compound_gigantic_page

In [1], Jann Horn points out a possible race between
prep_compound_gigantic_page and __page_cache_add_speculative.  The root
cause of the possible race is prep_compound_gigantic_page
unconditionally setting the ref count of pages to zero.  It does this
because prep_compound_gigantic_page is handed a 'group' of pages from an
allocator and needs to convert that group of pages to a compound page.
The ref count of each page in this 'group' is one as set by the
allocator.  However, the ref count of compound page tail pages must be
zero.

The potential race comes about when ref counted pages are returned from
the allocator.  When this happens, other mm code could also take a
reference on the page.  __page_cache_add_speculative is one such
example.  Therefore, prep_compound_gigantic_page can not just set the
ref count of pages to zero as it does today.  Doing so would lose the
reference taken by any other code.  This would lead to BUGs in code
checking ref counts and could possibly even lead to memory corruption.

There are two possible ways to address this issue.

1) Make all allocators of gigantic groups of pages be able to return a
   properly constructed compound page.

2) Make prep_compound_gigantic_page be more careful when constructing a
   compound page.

This patch takes approach 2.

In prep_compound_gigantic_page, use cmpxchg to only set the ref count
to zero if it is one.  If the cmpxchg fails, call synchronize_rcu() in
the hope that the extra ref count will be dropped during an rcu grace
period.  This is not a performance critical code path and the wait
should be acceptable.  If the ref count is still inflated after the
grace period, then undo any modifications made and return an error.

Currently prep_compound_gigantic_page is type void and does not return
errors.  Modify the two callers to check for and handle error returns.
On error, the caller must free the 'group' of pages as they can not be
used to form a gigantic page.  After freeing pages, the runtime caller
(alloc_fresh_huge_page) will retry the allocation once.  Boot time
allocations can not be retried.

The routine prep_compound_page also unconditionally sets the ref count
of compound page tail pages to zero.  However, in this case the buddy
allocator is constructing a compound page from freshly allocated pages.
The ref count on those freshly allocated pages is already zero, so the
set_page_count(p, 0) is unnecessary and could lead to confusion.  Just
remove it.

[1] https://lore.kernel.org/linux-mm/CAG48ez23q0Jy9cuVnwAe7t_fdhMk2S7N5Hdi-GLcCeq5bsfLxw@mail.gmail.com/

Link: https://lkml.kernel.org/r/20210622021423.154662-3-mike.kravetz@oracle.com
Fixes: 58a84aa92723 ("thp: set compound tail page _count to zero")
Signed-off-by: Mike Kravetz
Reported-by: Jann Horn
Cc: Youquan Song
Cc: Andrea Arcangeli
Cc: Jan Kara
Cc: John Hubbard
Cc: "Kirill A. Shutemov"
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Muchun Song
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 mm/page_alloc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index db00ee8d79d2..eeff64843718 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -754,7 +754,6 @@ void prep_compound_page(struct page *page, unsigned int order)
 	__SetPageHead(page);
 	for (i = 1; i < nr_pages; i++) {
 		struct page *p = page + i;
-		set_page_count(p, 0);
 		p->mapping = TAIL_MAPPING;
 		set_compound_head(p, page);
 	}
--
cgit
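A minimal sketch of the cmpxchg scheme described above.  It assumes the
page_ref_freeze() helper from include/linux/page_ref.h, which wraps
atomic_cmpxchg(&page->_refcount, expected, 0), plus a hypothetical
out_error label for the undo path; it is an illustration, not
necessarily the exact code this patch adds to mm/hugetlb.c:

	/* For each would-be tail page p while building the compound page: */
	if (!page_ref_freeze(p, 1)) {
		/*
		 * A transient extra reference is in flight (e.g. taken by
		 * __page_cache_add_speculative).  This path is not
		 * performance critical, so wait one rcu grace period and
		 * retry the freeze once.
		 */
		synchronize_rcu();
		if (!page_ref_freeze(p, 1))
			goto out_error;	/* still inflated: undo and fail */
	}
	p->mapping = TAIL_MAPPING;
	set_compound_head(p, page);

page_ref_freeze(p, 1) succeeds only when the caller holds the sole
reference, which is exactly the "set the ref count to zero only if it
is one" condition the message describes.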
From 041711ce7cdf023f53d76f64d82b75210248e18d Mon Sep 17 00:00:00 2001
From: Zhen Lei
Date: Wed, 30 Jun 2021 18:53:17 -0700
Subject: mm: fix spelling mistakes

Fix some spelling mistakes in comments:
each having differents usage ==> each has a different usage
statments ==> statements
adresses ==> addresses
aggresive ==> aggressive
datas ==> data
posion ==> poison
higer ==> higher
precisly ==> precisely
wont ==> won't
We moves tha ==> We move the
endianess ==> endianness

Link: https://lkml.kernel.org/r/20210519065853.7723-2-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei
Reviewed-by: Souptick Joarder
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eeff64843718..6700cfcfab46 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3180,7 +3180,7 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
 	int cpu;

 	/*
-	 * Allocate in the BSS so we wont require allocation in
+	 * Allocate in the BSS so we won't require allocation in
	 * direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
	 */
 	static cpumask_t cpus_with_pcps;
--
cgit

From f7173090033c70886d925995e9dfdfb76dbb2441 Mon Sep 17 00:00:00 2001
From: Mel Gorman
Date: Wed, 30 Jun 2021 18:53:25 -0700
Subject: mm/page_alloc: make should_fail_alloc_page() static

make W=1 generates the following warning for mm/page_alloc.c

mm/page_alloc.c:3651:15: warning: no previous prototype for `should_fail_alloc_page' [-Wmissing-prototypes]
 noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
               ^~~~~~~~~~~~~~~~~~~~~~

This function is deliberately split out for BPF to allow errors to be
injected.  The function is not used anywhere else so it is local to the
file.  Make it static, which should still allow error injection to be
used, similar to how block/blk-core.c:should_fail_bio() works.

Link: https://lkml.kernel.org/r/20210520084809.8576-4-mgorman@techsingularity.net
Signed-off-by: Mel Gorman
Reviewed-by: Yang Shi
Acked-by: Vlastimil Babka
Cc: Dan Streetman
Cc: David Hildenbrand
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6700cfcfab46..2f71d92c45b4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3819,7 +3819,7 @@ static inline bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)

 #endif /* CONFIG_FAIL_PAGE_ALLOC */

-noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
+static noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
 {
 	return __should_fail_alloc_page(gfp_mask, order);
 }
--
cgit
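For the last patch: a static function remains a valid error injection
target as long as it stays noinline and is registered with
ALLOW_ERROR_INJECTION() from <linux/error-injection.h>; block/blk-core.c
uses this pattern for should_fail_bio() via
ALLOW_ERROR_INJECTION(should_fail_bio, ERRNO).  A sketch of the same
pattern applied here; the annotation is an assumption for illustration
and is not part of this diff:

#include <linux/error-injection.h>

static noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
{
	return __should_fail_alloc_page(gfp_mask, order);
}
/*
 * noinline keeps a distinct symbol for the injection framework to hook,
 * even though the function is now static; the TRUE type declares that
 * injection forces the function to return true.
 */
ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE);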