path: root/mm
2023-02-02zsmalloc: skip chain size calculation for pow_of_2 classesSergey Senozhatsky
If a class size is a power of 2 then it wastes no memory and the best configuration is 1 physical page per zspage. Link: https://lkml.kernel.org/r/20230118005210.2814763-3-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
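As a rough sketch of the early return (function names here are illustrative, not the exact zsmalloc symbols):

    /* A power-of-2 class size divides PAGE_SIZE evenly, so a single page wastes nothing. */
    static int get_pages_per_zspage(unsigned int class_size)
    {
            if (is_power_of_2(class_size))
                    return 1;                       /* skip the chain-size search entirely */
            return calc_chain_size(class_size);     /* hypothetical fallback for other classes */
    }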
2023-02-02zsmalloc: rework zspage chain size selectionSergey Senozhatsky
Patch series "zsmalloc: make zspage chain size configurable". Computers are bad at division. We currently decide the best zspage chain size (max number of physical pages per-zspage) by looking at a `used percentage` value. This is not enough as we lose precision during usage percentage calculations For example, let's look at size class 208: pages per zspage wasted bytes used% 1 144 96 2 80 99 3 16 99 4 160 99 Current algorithm will select 2 page per zspage configuration, as it's the first one to reach 99%. However, 3 pages per zspage waste less memory. Change algorithm and select zspage configuration that has lowest wasted value. Link: https://lkml.kernel.org/r/20230118005210.2814763-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20230118005210.2814763-2-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/page_alloc: use deferred_pages_enabled() wherever applicableAnshuman Khandual
Instead of directly accessing static deferred_pages, replace such instances with the helper deferred_pages_enabled(). No functional change is intended. Link: https://lkml.kernel.org/r/20230105082506.241529-1-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
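Such a helper is typically a thin wrapper around the static key; a hedged sketch:

    static inline bool deferred_pages_enabled(void)
    {
            return static_branch_unlikely(&deferred_pages);
    }

    /* callers then read naturally, e.g.: */
    if (deferred_pages_enabled())
            return;         /* struct pages not fully initialized yet */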
2023-02-02mm/page_ext: init page_ext early if there are no deferred struct pagesPasha Tatashin
page_ext must be initialized after all struct pages are initialized. Therefore, page_ext is initialized after page_alloc_init_late(), and can optionally be initialized earlier via the early_page_ext kernel parameter, which as a side effect also disables deferred struct pages. Automatically initialize page_ext early when there are no deferred struct pages, so that page_ext can be used during kernel boot to track, for example, early page allocations. [pasha.tatashin@soleen.com: fix build with CONFIG_PAGE_EXTENSION=n] Link: https://lkml.kernel.org/r/20230118155251.2522985-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20230117204617.1553748-1-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Charan Teja Kalla <quic_charante@quicinc.com> Cc: David Hildenbrand <david@redhat.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
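The idea, as a hedged sketch (variable names here are hypothetical, not the actual kernel symbols):

    /* If struct page init is not deferred, every struct page is ready at this point,
     * so page_ext can be set up early and page allocations tracked from boot. */
    if (!deferred_struct_pages)
            early_page_ext = true;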
2023-02-02mm/damon/core: skip apply schemes if emptyHuaisheng Ye
Sometimes there is no scheme in DAMON's context, for example when just using damo record to monitor a workload's data access pattern. If the current DAMON context doesn't have any scheme in the list, kdamond has no need to iterate over the list of all targets and regions only to do nothing. So, skip applying schemes when ctx->schemes is empty. Link: https://lkml.kernel.org/r/20230116062347.1148553-1-huaisheng.ye@intel.com Signed-off-by: Huaisheng Ye <huaisheng.ye@intel.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
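The check itself is tiny; a hedged sketch of the early return in the scheme-application path:

    /* Nothing to apply, so skip walking every target and region. */
    if (list_empty(&ctx->schemes))
            return;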
2023-02-02mm/secretmem: remove redundant initialization of pointer fileColin Ian King
The pointer file is being initialized with a value that is never read; it is re-assigned later on. Clean up the code by removing the redundant initialization. Link: https://lkml.kernel.org/r/20230116164332.79500-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02readahead: convert readahead_expand() to use a folioMatthew Wilcox (Oracle)
Replace the uses of page with a folio. Also add a missing test for workingset in the leading edge expansion. Link: https://lkml.kernel.org/r/20230116193941.2148487-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02filemap: convert filemap_range_has_page() to use a folioMatthew Wilcox (Oracle)
The folio isn't returned from this function, so this is an entirely internal change. Link: https://lkml.kernel.org/r/20230116193941.2148487-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02filemap: convert filemap_map_pmd() to take a folioMatthew Wilcox (Oracle)
Patch series "Some more filemap folio conversions". Three more places which could easily be converted to folios. The third one fixes a minor bug in readahead_expand(), but it's only a performance bug and there are few users of readahead_expand(), so I don't think it's worth backporting. This patch (of 3): Save a few calls to compound_head(). We specify exactly which page from the folio to use by passing in start_pgoff, which means this will work for a folio which is larger than PMD size. The rest of the VM isn't prepared for that yet, but now this function is. Link: https://lkml.kernel.org/r/20230116193941.2148487-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230116193941.2148487-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02rmap: add folio parameter to __page_set_anon_rmap()Matthew Wilcox (Oracle)
Avoid the compound_head() call in PageAnon() by passing in the folio that all callers have. Also save me from wondering whether page->mapping can ever be overwritten on a tail page (I don't think it can, but I'm not 100% sure). Link: https://lkml.kernel.org/r/20230116192959.2147032-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: clean up mlock_page / munlock_page references in commentsMatthew Wilcox (Oracle)
Change documentation and comments that refer to now-renamed functions. Link: https://lkml.kernel.org/r/20230116192827.2146732-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: remove munlock_vma_page()Matthew Wilcox (Oracle)
All callers now have a folio and can call munlock_vma_folio(). Update the documentation to refer to munlock_vma_folio(). Link: https://lkml.kernel.org/r/20230116192827.2146732-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: remove mlock_vma_page()Matthew Wilcox (Oracle)
All callers now have a folio and can call mlock_vma_folio(). Update the documentation to refer to mlock_vma_folio(). Link: https://lkml.kernel.org/r/20230116192827.2146732-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: remove page_evictable()Matthew Wilcox (Oracle)
Patch series "Remove leftover mlock/munlock page wrappers". We no longer need the various mlock page functions as all callers have folios. This patch (of 4): This function now has no users. Also update the unevictable-lru documentation to discuss folios instead of pages (mostly). [akpm@linux-foundation.org: fix Documentation/mm/unevictable-lru.rst underlining] Link: https://lkml.kernel.org/r/20230117145106.585b277b@canb.auug.org.au Link: https://lkml.kernel.org/r/20230116192827.2146732-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230116192827.2146732-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: convert mem_cgroup_css_from_page() to mem_cgroup_css_from_folio()Matthew Wilcox (Oracle)
Only one caller doesn't have a folio, so move the page_folio() call to that one caller from mem_cgroup_css_from_folio(). Link: https://lkml.kernel.org/r/20230116192507.2146150-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/fs: convert inode_attach_wb() to take a folioMatthew Wilcox (Oracle)
Patch series "Writeback folio conversions". Remove more calls to compound_head() by passing folios around instead of pages. This patch (of 2): The only caller of inode_attach_wb() which doesn't pass NULL already has a folio, so convert the whole call-chain to take folios. Link: https://lkml.kernel.org/r/20230116192507.2146150-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230116192507.2146150-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: use a folio in copy_present_pte()Matthew Wilcox (Oracle)
We still have to keep the page around because we need to know which page in the folio we're copying, but we can replace five implicit calls to compound_head() with one. Link: https://lkml.kernel.org/r/20230116191813.2145215-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: use a folio in copy_pte_range()Matthew Wilcox (Oracle)
Allocate an order-0 folio instead of a page and pass it all the way down the call chain. Removes dozens of calls to compound_head(). Link: https://lkml.kernel.org/r/20230116191813.2145215-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: convert wp_page_copy() to use foliosMatthew Wilcox (Oracle)
Use new_folio instead of new_page throughout, because we allocated it and know it's an order-0 folio. Most old_page uses become old_folio, but use vmf->page where we need the precise page. Link: https://lkml.kernel.org/r/20230116191813.2145215-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: convert do_anonymous_page() to use a folioMatthew Wilcox (Oracle)
Removes six calls to compound_head(); some inline and some external. Link: https://lkml.kernel.org/r/20230116191813.2145215-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: add vma_alloc_zeroed_movable_folio()Matthew Wilcox (Oracle)
Replace alloc_zeroed_user_highpage_movable(). The main difference is returning a folio containing a single page instead of returning the page, but take the opportunity to rename the function to match other allocation functions a little better and rewrite the documentation to place more emphasis on the zeroing rather than the highmem aspect. Link: https://lkml.kernel.org/r/20230116191813.2145215-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
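A hedged sketch of what the generic fallback might look like (architectures can provide their own version; this is an assumption, not necessarily the exact in-tree definition):

    static inline struct folio *
    vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma, unsigned long vaddr)
    {
            /* order-0, zeroed, movable allocation returned as a folio */
            return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false);
    }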
2023-02-02filemap: remove find_get_pages_range_tag()Vishal Moola (Oracle)
All callers to find_get_pages_range_tag(), find_get_pages_tag(), pagevec_lookup_range_tag(), and pagevec_lookup_tag() have been removed. Link: https://lkml.kernel.org/r/20230104211448.4804-24-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02page-writeback: convert write_cache_pages() to use filemap_get_folios_tag()Vishal Moola (Oracle)
Convert the function to use folios throughout. This is in preparation for the removal of find_get_pages_range_tag(). This change removes 8 calls to compound_head(), and the function now supports large folios. Link: https://lkml.kernel.org/r/20230104211448.4804-5-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02filemap: convert __filemap_fdatawait_range() to use filemap_get_folios_tag()Vishal Moola (Oracle)
Convert the function to use folios. This is in preparation for the removal of find_get_pages_range_tag(). This change removes 2 calls to compound_head(). Link: https://lkml.kernel.org/r/20230104211448.4804-4-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02filemap: add filemap_get_folios_tag()Vishal Moola (Oracle)
This is the equivalent of find_get_pages_range_tag(), except for folios instead of pages. One notable difference is that filemap_get_folios_tag() does not take a maximum-pages argument; instead it tries to fill a folio batch and stops either once full (15 folios) or on reaching the end of the search range. The new function supports large folios; the original function did not, since none of its callers used large folios. Link: https://lkml.kernel.org/r/20230104211448.4804-3-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
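A hedged caller-side sketch of the new interface (processing details omitted):

    struct folio_batch fbatch;
    unsigned int i, nr;

    folio_batch_init(&fbatch);
    while ((nr = filemap_get_folios_tag(mapping, &index, end,
                                        PAGECACHE_TAG_DIRTY, &fbatch))) {
            for (i = 0; i < nr; i++) {
                    struct folio *folio = fbatch.folios[i];
                    /* ... write back or otherwise handle the folio ... */
            }
            folio_batch_release(&fbatch);
            cond_resched();
    }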
2023-02-02mm: fix khugepaged with shmem_enabled=adviseDavid Stevens
Pass vm_flags as a parameter to shmem_is_huge, rather than reading the flags from the vm_area_struct in question. This allows the updated flags from hugepage_madvise to be passed to the check, which is necessary because madvise does not update the vm_area_struct's flags until after hugepage_madvise returns. This fixes an issue when shmem_enabled=advise, where MADV_HUGEPAGE on shmem was not able to register the mm_struct with khugepaged. Prior to cd89fb065099, the mm_struct was registered by MADV_HUGEPAGE regardless of the value of shmem_enabled (which was only checked when scanning vmas). Link: https://lkml.kernel.org/r/20230113023011.1784015-1-stevensd@google.com Fixes: cd89fb065099 ("mm,thp,shmem: make khugepaged obey tmpfs mount flags") Signed-off-by: David Stevens <stevensd@chromium.org> Cc: David Stevens <stevensd@chromium.org> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: discard __GFP_ATOMICNeilBrown
__GFP_ATOMIC serves little purpose. Its main effect is to set ALLOC_HARDER which adds a few little boosts to increase the chance of an allocation succeeding, one of which is to lower the water-mark at which it will succeed. It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also adjusts this watermark. It is probable that other users of __GFP_HIGH should benefit from the other little bonuses that __GFP_ATOMIC gets. __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM. There is little point to this. We already get a might_sleep() warning if __GFP_DIRECT_RECLAIM is set. __GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is probable that testing ALLOC_HARDER is a better fit here. __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might sleep. This should test __GFP_DIRECT_RECLAIM instead.

This patch:
 - removes __GFP_ATOMIC
 - allows __GFP_HIGH allocations to ignore watermark boosting as well as GFP_ATOMIC requests
 - makes other adjustments as suggested by the above.

The net result is no change to GFP_ATOMIC allocations. Other allocations that use __GFP_HIGH will benefit from a few different extra privileges. This affects: xen, dm, md, ntfs3, the vermilion frame buffer, hibernation, ksm and swap, all of which likely produce more benefit than cost if these selected allocations are more likely to succeed quickly. [mgorman: Minor adjustments to rework on top of a series] Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name Link: https://lkml.kernel.org/r/20230113111217.14134-7-mgorman@techsingularity.net Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
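For the tegra-smmu case, the replacement test would look roughly like this (a hedged sketch; gfpflags_allow_blocking() checks __GFP_DIRECT_RECLAIM):

    /* Instead of checking __GFP_ATOMIC to see whether the allocation may sleep: */
    bool may_sleep = gfpflags_allow_blocking(gfp);  /* true iff __GFP_DIRECT_RECLAIM is set */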
2023-02-02mm/page_alloc: explicitly define how __GFP_HIGH non-blocking allocations access reservesMel Gorman
GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a vague description. In preparation for the removal of __GFP_ATOMIC, redefine __GFP_ATOMIC to simply mean non-blocking and rename ALLOC_HARDER to ALLOC_NON_BLOCK accordingly. __GFP_HIGH is required for access to reserves but non-blocking is granted more access. For example, GFP_NOWAIT is non-blocking but has no special access to reserves. A __GFP_NOFAIL blocking allocation is granted access similar to __GFP_HIGH if the only alternative is an OOM kill. Link: https://lkml.kernel.org/r/20230113111217.14134-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/page_alloc: explicitly define what alloc flags deplete min reservesMel Gorman
As there are more ALLOC_ flags that affect reserves, define what flags affect reserves and clarify the effect of each flag. Link: https://lkml.kernel.org/r/20230113111217.14134-5-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/page_alloc: explicitly record high-order atomic allocations in alloc_flagsMel Gorman
A high-order ALLOC_HARDER allocation is assumed to be atomic. While that is accurate, it changes later in the series. In preparation, explicitly record high-order atomic allocations in gfp_to_alloc_flags(). Link: https://lkml.kernel.org/r/20230113111217.14134-4-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/page_alloc: treat RT tasks similar to __GFP_HIGHMel Gorman
RT tasks are allowed to dip below the min reserve but ALLOC_HARDER is typically combined with ALLOC_MIN_RESERVE so RT tasks are a little unusual. While there is some justification for allowing RT tasks access to memory reserves, there is a strong chance that an RT task that is also under memory pressure is at risk of missing deadlines anyway. Relax how much of the reserves an RT task can access by treating it the same as __GFP_HIGH allocations. Note that in a future kernel release the RT special casing will be removed. Hard realtime tasks should be locking down resources in advance and ensuring enough memory is available. Even a soft-realtime task like audio or video live decoding which cannot jitter should be allocating both memory and any disk space required up-front before the recording starts instead of relying on reserves. At best, reserve access will only delay the problem by a very short interval. Link: https://lkml.kernel.org/r/20230113111217.14134-3-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/page_alloc: rename ALLOC_HIGH to ALLOC_MIN_RESERVEMel Gorman
Patch series "Discard __GFP_ATOMIC", v3. Neil's patch has been residing in mm-unstable as commit 2fafb4fe8f7a ("mm: discard __GFP_ATOMIC") for a long time and recently brought up again. Most recently, I was worried that __GFP_HIGH allocations could use high-order atomic reserves which is unintentional but there was no response so lets revisit -- this series reworks how min reserves are used, protects highorder reserves and then finishes with Neil's patch with very minor modifications so it fits on top. There was a review discussion on renaming __GFP_DIRECT_RECLAIM to __GFP_ALLOW_BLOCKING but I didn't think it was that big an issue and is orthogonal to the removal of __GFP_ATOMIC. There were some concerns about how the gfp flags affect the min reserves but it never reached a solid conclusion so I made my own attempt. The series tries to iron out some of the details on how reserves are used. ALLOC_HIGH becomes ALLOC_MIN_RESERVE and ALLOC_HARDER becomes ALLOC_NON_BLOCK and documents how the reserves are affected. For example, ALLOC_NON_BLOCK (no direct reclaim) on its own allows 25% of the min reserve. ALLOC_MIN_RESERVE (__GFP_HIGH) allows 50% and both combined allows deeper access again. ALLOC_OOM allows access to 75%. High-order atomic allocations are explicitly handled with the caveat that no __GFP_ATOMIC flag means that any high-order allocation that specifies GFP_HIGH and cannot enter direct reclaim will be treated as if it was GFP_ATOMIC. This patch (of 6): __GFP_HIGH aliases to ALLOC_HIGH but the name does not really hint what it means. As ALLOC_HIGH is internal to the allocator, rename it to ALLOC_MIN_RESERVE to document that the min reserves can be depleted. Link: https://lkml.kernel.org/r/20230113111217.14134-1-mgorman@techsingularity.net Link: https://lkml.kernel.org/r/20230113111217.14134-2-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/page_ext: do not allocate space for page_ext->flags if not neededPasha Tatashin
There is an 8-byte page_ext->flags field allocated per page whenever CONFIG_PAGE_EXTENSION is enabled. However, not every user of page_ext uses flags. Therefore, check whether flags is needed by at least one user and, if so, allocate space for it. For example, with page_table_check enabled on a machine with 128G of memory, before the fix:

  [ 2.244288] allocated 536870912 bytes of page_ext

and after the fix:

  [ 2.160154] allocated 268435456 bytes of page_ext

Also, add a kernel-doc comment before page_ext_operations that describes the fields, and remove the check for whether need() is set, as that is now a required field. [pasha.tatashin@soleen.com: address comments from Mike Rapoport] Link: https://lkml.kernel.org/r/20230117202103.1412449-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20230113154253.92480-1-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Charan Teja Kalla <quic_charante@quicinc.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
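A hedged sketch of the space accounting (the structure member and variable names are illustrative assumptions):

    bool need_flags = false;
    int i;

    for (i = 0; i < ARRAY_SIZE(page_ext_ops); i++) {
            if (page_ext_ops[i]->need() && page_ext_ops[i]->need_shared_flags) {
                    need_flags = true;
                    break;
            }
    }
    if (need_flags)
            entry_size += sizeof(unsigned long);    /* room for page_ext->flags per page */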
2023-02-02mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVEDavid Hildenbrand
__HAVE_ARCH_PTE_SWP_EXCLUSIVE is now supported by all architectures that support swp PTEs, so let's drop it. Link: https://lkml.kernel.org/r/20230113171026.582290-27-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checksDavid Hildenbrand
Patch series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". This is the follow-up on [1]: [PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on most prominent enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all remaining architectures that support swap PTEs. This makes sure that exclusive anonymous pages will stay exclusive, even after they were swapped out -- for example, making GUP R/W FOLL_GET of anonymous pages reliable. Details can be found in [1]. This primarily fixes remaining known O_DIRECT memory corruptions that can happen on concurrent swapout, whereby we can lose DMA reads to a page (modifying the user page by writing to it). To verify, there are two test cases (requiring swap space, obviously): (1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries triggering a race condition. (2) My vmsplice() test case [3] that tries to detect if the exclusive marker was lost during swapout, not relying on a race condition. For example, on 32bit x86 (with and without PAE), my test case fails without these patches: $ ./test_swp_exclusive FAIL: page was replaced during COW But succeeds with these patches: $ ./test_swp_exclusive PASS: page was not replaced during COW Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even the ones where swap support might be in a questionable state? This is the first step towards removing "readable_exclusive" migration entries, and instead using pte_swp_exclusive() also with (readable) migration entries instead (as suggested by Peter). The only missing piece for that is supporting pmd_swp_exclusive() on relevant architectures with THP migration support. As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,, we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch. I tried cross-compiling all relevant setups and tested on x86 and sparc64 so far. CCing arch maintainers only on this cover letter and on the respective patch(es). [1] https://lkml.kernel.org/r/20220329164329.208407-1-david@redhat.com [2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c [3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c This patch (of 26): We want to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures. Let's extend our sanity checks, especially testing that our PTE bit does not affect: * is_swap_pte() -> pte_present() and pte_none() * the swap entry + type * pte_swp_soft_dirty() Especially, the pfn_pte() is dodgy when the swap PTE layout differs heavily from ordinary PTEs. Let's properly construct a swap PTE from swap type+offset. [david@redhat.com: fix build] Link: https://lkml.kernel.org/r/6aaad548-cf48-77fa-9d6c-db83d724b2eb@redhat.com Link: https://lkml.kernel.org/r/20230113171026.582290-1-david@redhat.com Link: https://lkml.kernel.org/r/20230113171026.582290-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: <aou@eecs.berkeley.edu> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. 
Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: H. Peter Anvin (Intel) <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Nadav Amit <namit@vmware.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Yang Shi <shy828301@gmail.com> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/khugepaged: convert release_pte_pages() to use foliosVishal Moola (Oracle)
Converts release_pte_pages() to use folios instead of pages. Link: https://lkml.kernel.org/r/20230114001556.43795-2-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/khugepaged: introduce release_pte_folio() to replace release_pte_page()Vishal Moola (Oracle)
release_pte_page() is converted to be a wrapper for release_pte_folio() to help facilitate the khugepaged conversion to folios. This replaces 3 calls to compound_head() with 1, and saves 85 bytes of kernel text. Link: https://lkml.kernel.org/r/20230114001556.43795-1-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
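The wrapper pattern is a one-liner; a hedged sketch:

    static void release_pte_page(struct page *page)
    {
            release_pte_folio(page_folio(page));
    }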
2023-02-02kmsan: silence -Wmissing-prototypes warningsAlexander Potapenko
When building the kernel with W=1, the compiler reports numerous warnings about the missing prototypes for KMSAN instrumentation hooks. Because these functions are not supposed to be called explicitly by the kernel code (calls to them are emitted by the compiler), they do not have to be declared in the headers. Instead, we add forward declarations right before the definitions to silence the warnings produced by -Wmissing-prototypes. Link: https://lkml.kernel.org/r/20230112103147.382416-1-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Suggested-by: Marco Elver <elver@google.com> Reviewed-by: Marco Elver <elver@google.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202301020356.dFruA4I5-lkp@intel.com/T/ Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
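A hedged illustration with a hypothetical hook name: a forward declaration immediately before the definition satisfies -Wmissing-prototypes without exposing the hook in a header.

    void __msan_example_hook(void *addr, size_t size);      /* silences -Wmissing-prototypes */
    void __msan_example_hook(void *addr, size_t size)
    {
            /* body; only compiler-emitted calls ever reach this */
    }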
2023-02-02mm: mlock: update the interface to use foliosLorenzo Stoakes
Update the mlock interface to accept folios rather than pages, bringing the interface in line with the internal implementation. munlock_vma_page() still requires a page_folio() conversion, however this is consistent with the existent mlock_vma_page() implementation and a product of rmap still dealing in pages rather than folios. Link: https://lkml.kernel.org/r/cba12777c5544305014bc0cbec56bb4cc71477d8.1673526881.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: mlock: use folios and a folio batch internallyLorenzo Stoakes
This brings mlock in line with the folio batches declared in mm/swap.c and makes the code more consistent across the two. The existing mechanism for identifying which operation each folio in the batch is undergoing is maintained, i.e. using the lower 2 bits of the struct folio address (previously struct page address). This should continue to function correctly as folios remain at least system word-aligned. All invocations of mlock() pass either a non-compound page or the head of a THP-compound page and no tail pages need updating so this functionality works with struct folios being used internally rather than struct pages. In this patch the external interface is kept identical to before in order to maintain separation between patches in the series, using a rather awkward conversion from struct page to struct folio in relevant functions. However, this maintenance of the existing interface is intended to be temporary - the next patch in the series will update the interfaces to accept folios directly. Link: https://lkml.kernel.org/r/9f894d54d568773f4ed3cb0eef5f8932f62c95f4.1673526881.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: madvise: use vm_normal_folio() in madvise_free_pte_range()Kefeng Wang
There is already a vm_normal_folio(), use it to make madvise_free_pte_range() only use a folio. Link: https://lkml.kernel.org/r/20230112124028.16964-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
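A hedged sketch of the call site after the change (inside the pte-walking loop):

    struct folio *folio = vm_normal_folio(vma, addr, ptent);

    if (!folio)
            continue;       /* not a normal mapped folio; move on to the next pte */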
2023-02-02shmem: convert shmem_write_end() to use a folioMatthew Wilcox (Oracle)
Use a folio internally in shmem_write_end(), which saves a number of calls to compound_head(), lets us get rid of the custom code to zero out the rest of a THP, and supports folios of arbitrary size. Link: https://lkml.kernel.org/r/20230112131031.1209553-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/memory-failure: convert unpoison_memory() to foliosSidhartha Kumar
Use a folio inside unpoison_memory which replaces a compound_head() call with a call to page_folio(). Link: https://lkml.kernel.org/r/20230112204608.80136-9-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/memory-failure: convert hugetlb_set_page_hwpoison() to foliosSidhartha Kumar
Change hugetlb_set_page_hwpoison() to folio_set_hugetlb_hwpoison() and use a folio internally. Link: https://lkml.kernel.org/r/20230112204608.80136-8-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/memory-failure: convert __free_raw_hwp_pages() to foliosSidhartha Kumar
Change __free_raw_hwp_pages() to __folio_free_raw_hwp() and modify its callers to pass in a folio. Link: https://lkml.kernel.org/r/20230112204608.80136-7-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/memory-failure: convert raw_hwp_list_head() to foliosSidhartha Kumar
Change raw_hwp_list_head() to take in a folio and modify its callers to pass in a folio. Also converts two users of hugetlb-specific page macros to their folio equivalents. Link: https://lkml.kernel.org/r/20230112204608.80136-6-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/memory-failure: convert free_raw_hwp_pages() to foliosSidhartha Kumar
Change free_raw_hwp_pages() to folio_free_raw_hwp(); this converts two users of hugetlb-specific page macros to their folio equivalents. Link: https://lkml.kernel.org/r/20230112204608.80136-5-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/memory-failure: convert hugetlb_clear_page_hwpoison to foliosSidhartha Kumar
Change hugetlb_clear_page_hwpoison() to folio_clear_hugetlb_hwpoison() by changing the function to take in a folio. This converts one use of ClearPageHWPoison and HPageRawHwpUnreliable to their folio equivalents. Link: https://lkml.kernel.org/r/20230112204608.80136-4-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/memory-failure: convert try_memory_failure_hugetlb() to foliosSidhartha Kumar
Use a struct folio rather than a head page in try_memory_failure_hugetlb. This converts one user of SetHPageMigratable to the folio equivalent. Link: https://lkml.kernel.org/r/20230112204608.80136-3-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm/memory-failure: convert __get_huge_page_for_hwpoison() to foliosSidhartha Kumar
Patch series "convert hugepage memory failure functions to folios". This series contains a 1:1 straightforward page to folio conversion for memory failure functions which deal with huge pages. I renamed a few functions to fit with how other folio operating functions are named. These include: hugetlb_clear_page_hwpoison -> folio_clear_hugetlb_hwpoison free_raw_hwp_pages -> folio_free_raw_hwp __free_raw_hwp_pages -> __folio_free_raw_hwp hugetlb_set_page_hwpoison -> folio_set_hugetlb_hwpoison The goal of this series was to reduce users of the hugetlb specific page flag macros which take in a page so users are protected by the compiler to make sure they are operating on a head page. This patch (of 8): Use a folio throughout the function rather than using a head page. This also reduces the users of the page version of hugetlb specific page flags. Link: https://lkml.kernel.org/r/20230112204608.80136-2-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>