summaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)Author
10 daysmm/mremap: put VMA check and prep logic into helper functionLorenzo Stoakes
Rather than lumping everything together in do_mremap(), add a new helper function, check_prep_vma(), to do the work relating to each VMA. This further lays groundwork for subsequent patches which will allow for batched VMA mremap(). Additionally, if we set vrm->new_addr == vrm->addr when prepping the VMA, this avoids us needing to do so in the expand VMA mlocked case. No functional change intended. Link: https://lkml.kernel.org/r/15efa3c57935f7f8894094b94c1803c2f322c511.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 daysmm/mremap: refactor initial parameter sanity checksLorenzo Stoakes
We are currently checking some things later, and some things immediately. Aggregate the checks and avoid ones that need not be made. Simplify things by aligning lengths immediately. Defer setting the delta parameter until later, which removes some duplicate code in the hugetlb case. We can safely perform the checks moved from mremap_to() to check_mremap_params() because: * If we set a new address via vrm_set_new_addr(), then this is guaranteed to not overlap nor to position the new VMA past TASK_SIZE, so there's no need to check these later. * We can simply page align lengths immediately. We do not need to check for overlap nor TASK_SIZE sanity after hugetlb alignment as this asserts addresses are huge-aligned, then huge-aligns lengths, rounding down. This means any existing overlap would have already been caught. Moving things around like this lays the groundwork for subsequent changes to permit operations on batches of VMAs. No functional change intended. Link: https://lkml.kernel.org/r/c862d625c98b1abd861c406f2bfad8baf3287f83.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 daysmm/mremap: perform some simple cleanupsLorenzo Stoakes
Patch series "mm/mremap: permit mremap() move of multiple VMAs", v4. Historically we've made it a uAPI requirement that mremap() may only operate on a single VMA at a time. For instances where VMAs need to be resized, this makes sense, as it becomes very difficult to determine what a user actually wants should they indicate a desire to expand or shrink the size of multiple VMAs (truncate? Adjust sizes individually? Some other strategy?). However, in instances where a user is moving VMAs, it is restrictive to disallow this. This is especially the case when anonymous mapping remap may or may not be mergeable depending on whether VMAs have or have not been faulted due to anon_vma assignment and folio index alignment with vma->vm_pgoff. Often this can result in surprising impact where a moved region is faulted, then moved back and a user fails to observe a merge from otherwise compatible, adjacent VMAs. This change allows such cases to work without the user having to be cognizant of whether a prior mremap() move or other VMA operations has resulted in VMA fragmentation. In order to do this, this series performs a large amount of refactoring, most pertinently - grouping sanity checks together, separately those that check input parameters and those relating to VMAs. we also simplify the post-mmap lock drop processing for uffd and mlock()'d VMAs. With this done, we can then fairly straightforwardly implement this functionality. This works exclusively for mremap() invocations which specify MREMAP_FIXED. It is not compatible with VMAs which use userfaultfd, as the notification of the userland fault handler would require us to drop the mmap lock. It is also not compatible with file-backed mappings with customised get_unmapped_area() handlers as these may not honour MREMAP_FIXED. The input and output addresses ranges must not overlap. We carefully account for moves which would result in VMA iterator invalidation. While there can be gaps between VMAs in the input range, there can be no gap before the first VMA in the range. This patch (of 10): We const-ify the vrm flags parameter to indicate this will never change. We rename resize_is_valid() to remap_is_valid(), as this function does not only apply to cases where we resize, so it's simply confusing to refer to that here. We remove the BUG() from mremap_at(), as we should not BUG() unless we are certain it'll result in system instability. We rename vrm_charge() to vrm_calc_charge() to make it clear this simply calculates the charged number of pages rather than actually adjusting any state. We update the comment for vrm_implies_new_addr() to explain that MREMAP_DONTUNMAP does not require a set address, but will always be moved. Additionally consistently use 'res' rather than 'ret' for result values. No functional change intended. Link: https://lkml.kernel.org/r/cover.1752770784.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/d35ad8ce6b2c33b2f2f4ef7ec415f04a35cba34f.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 daysmm/vma: refactor vma_modify_flags_name() to vma_modify_name()Lorenzo Stoakes
The single instance in which we use this function doesn't actually need to change VMA flags, so remove this parameter and update the caller accordingly. [lorenzo.stoakes@oracle.com: correct comment] Link: https://lkml.kernel.org/r/77f45b2e-a748-4635-9381-a5051091087f@lucifer.local Link: https://lkml.kernel.org/r/20250714135839.178032-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 daysmm: optimize lru_note_cost() by adding lru_note_cost_unlock_irq()Hugh Dickins
Dropping a lock, just to demand it again for an afterthought, cannot be good if contended: convert lru_note_cost() to lru_note_cost_unlock_irq(). [hughd@google.com: delete unneeded comment] Link: https://lkml.kernel.org/r/dbf9352a-1ed9-a021-c0c7-9309ac73e174@google.com Link: https://lkml.kernel.org/r/21100102-51b6-79d5-03db-1bb7f97fa94c@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Tested-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 daysmm/mglru: stop try_to_inc_min_seq() if min_seq[type] has not increasedHao Jia
In try_to_inc_min_seq(), if min_seq[type] has not increased. In other words, min_seq[type] == lrugen->min_seq[type]. Then we should return directly to avoid unnecessary overhead later. Corollary: If min_seq[type] of both anonymous and file is not increased, try_to_inc_min_seq() will fail. Proof: It is known that min_seq[type] has not increased, that is, min_seq[type] is equal to lrugen->min_seq[type], then the following: case 1: min_seq[type] has not been reassigned and changed before judgment min_seq[type] <= lrugen->min_seq[type]. Then the subsequent min_seq[type] <= lrugen->min_seq[type] judgment will always be true. case 2: min_seq[type] is reassigned to seq, before judgment min_seq[type] <= lrugen->min_seq[type]. Then at least the condition of min_seq[type] > seq must be met before min_seq[type] will be reassigned to seq. That is to say, before the reassignment, lrugen->min_seq[type] > seq is met, and then min_seq[type] = seq. Then the following min_seq[type](seq) <= lrugen->min_seq[type] judgment is always true. Therefore, in try_to_inc_min_seq(), If min_seq[type] of both anonymous and file is not increased, we can return false directly to avoid unnecessary overhead. Link: https://lkml.kernel.org/r/20250703023946.65315-1-jiahao.kernel@gmail.com Signed-off-by: Hao Jia <jiahao1@lixiang.com> Suggested-by: Yuanchu Xie <yuanchu@google.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kinsey Ho <kinseyho@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 daysmm/damon/core: commit damos_quota_goal->nidSeongJae Park
DAMOS quota goal uses 'nid' field when the metric is DAMOS_QUOTA_NODE_MEM_{USED,FREE}_BP. But the goal commit function is not updating the goal's nid field. Fix it. Link: https://lkml.kernel.org/r/20250719181932.72944-1-sj@kernel.org Fixes: 0e1c773b501f ("mm/damon/core: introduce damos quota goal metrics for memory node utilization") [6.16.x] Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 dayskfence: Remove mention of PG_slabMatthew Wilcox (Oracle)
Improve the documentation slightly, assuming I understood it correctly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20250611155916.2579160-10-willy@infradead.org Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
11 daysmm: move randomize_va_space into memory.cJoel Granados
Move the randomize_va_space variable together with all its sysctl table elements into memory.c. Register it to the "kernel" directory by adding it to the subsys initialization calls This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
12 daysMerge branch 'acpi-apei'Rafael J. Wysocki
Merge ACPI APEI updates for 6.17-rc1: - Fix iomem-related sparse warnings in the APEI EINJ driver (Zaid Alali, Tony Luck) - Add EINJv2 error injection support to the APEI EINJ driver (Zaid Alali) - Fix memory corruption in error_type_set() in the APEI EINJ driver (Dan Carpenter) - Fix less than zero comparison on a size_t variable in the APEI EINJ driver (Colin Ian King) - Fix check and iounmap of an uninitialized pointer in the APEI EINJ driver (Colin Ian King) - Add TAINT_MACHINE_CHECK to the GHES panic path in APEI to improve diagnostics and post-mortem analysis (Breno Leitao) - Update APEI reviewer records in MAINTAINERS (Rafael Wysocki) - Fix the handling of synchronous uncorrected memory errors in APEI (Shuai Xue) * acpi-apei: ACPI: APEI: handle synchronous exceptions in task work ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered ACPI: APEI: MAINTAINERS: Update reviewers for APEI ACPI: APEI: EINJ: Fix trigger actions ACPI: APEI: GHES: add TAINT_MACHINE_CHECK on GHES panic path ACPI: APEI: EINJ: Fix check and iounmap of uninitialized pointer p ACPI: APEI: EINJ: Fix less than zero comparison on a size_t variable ACPI: APEI: EINJ: prevent memory corruption in error_type_set() ACPI: APEI: EINJ: Update the documentation for EINJv2 support ACPI: APEI: EINJ: Enable EINJv2 error injections ACPI: APEI: EINJ: Create debugfs files to enter device id and syndrome ACPI: APEI: EINJ: Discover EINJv2 parameters ACPI: APEI: EINJ: Add einjv2 extension struct ACPI: APEI: EINJ: Enable the discovery of EINJv2 capabilities ACPI: APEI: EINJ: Fix kernel test sparse warnings
2025-07-19kasan: use vmalloc_dump_obj() for vmalloc error reportsMarco Elver
Since 6ee9b3d84775 ("kasan: remove kasan_find_vm_area() to prevent possible deadlock"), more detailed info about the vmalloc mapping and the origin was dropped due to potential deadlocks. While fixing the deadlock is necessary, that patch was too quick in killing an otherwise useful feature, and did no due-diligence in understanding if an alternative option is available. Restore printing more helpful vmalloc allocation info in KASAN reports with the help of vmalloc_dump_obj(). Example report: | BUG: KASAN: vmalloc-out-of-bounds in vmalloc_oob+0x4c9/0x610 | Read of size 1 at addr ffffc900002fd7f3 by task kunit_try_catch/493 | | CPU: [...] | Call Trace: | <TASK> | dump_stack_lvl+0xa8/0xf0 | print_report+0x17e/0x810 | kasan_report+0x155/0x190 | vmalloc_oob+0x4c9/0x610 | [...] | | The buggy address belongs to a 1-page vmalloc region starting at 0xffffc900002fd000 allocated at vmalloc_oob+0x36/0x610 | The buggy address belongs to the physical page: | page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x126364 | flags: 0x200000000000000(node=0|zone=2) | raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000 | raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 | page dumped because: kasan: bad access detected | | [..] Link: https://lkml.kernel.org/r/20250716152448.3877201-1-elver@google.com Fixes: 6ee9b3d84775 ("kasan: remove kasan_find_vm_area() to prevent possible deadlock") Signed-off-by: Marco Elver <elver@google.com> Suggested-by: Uladzislau Rezki <urezki@gmail.com> Acked-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Yunseong Kim <ysk@kzalloc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show()Nathan Chancellor
After a recent change in clang to expose uninitialized warnings from const variables [1], there is a false positive warning from the if statement in advisor_mode_show(). mm/ksm.c:3687:11: error: variable 'output' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] 3687 | else if (ksm_advisor == KSM_ADVISOR_SCAN_TIME) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/ksm.c:3690:33: note: uninitialized use occurs here 3690 | return sysfs_emit(buf, "%s\n", output); | ^~~~~~ Rewrite the if statement to implicitly make KSM_ADVISOR_NONE the else branch so that it is obvious to the compiler that ksm_advisor can only be KSM_ADVISOR_NONE or KSM_ADVISOR_SCAN_TIME due to the assignments in advisor_mode_store(). Link: https://lkml.kernel.org/r/20250715-ksm-fix-clang-21-uninit-warning-v1-1-f443feb4bfc4@kernel.org Fixes: 66790e9a735b ("mm/ksm: add sysfs knobs for advisor") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Closes: https://github.com/ClangBuiltLinux/linux/issues/2100 Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 [1] Acked-by: David Hildenbrand <david@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Stefan Roesch <shr@devkernel.io> Cc: xu xin <xu.xin16@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=nHarry Yoo
Commit 48b4800a1c6a ("zsmalloc: page migration support") added support for migrating zsmalloc pages using the movable_operations migration framework. However, the commit did not take into account that zsmalloc supports migration only when CONFIG_COMPACTION is enabled. Tracing shows that zsmalloc was still passing the __GFP_MOVABLE flag even when compaction is not supported. This can result in unmovable pages being allocated from movable page blocks (even without stealing page blocks), ZONE_MOVABLE and CMA area. Possible user visible effects: - Some ZONE_MOVABLE memory can be not actually movable - CMA allocation can fail because of this - Increased memory fragmentation due to ignoring the page mobility grouping feature I'm not really sure who uses kernels without compaction support, though :( To fix this, clear the __GFP_MOVABLE flag when !IS_ENABLED(CONFIG_COMPACTION). Link: https://lkml.kernel.org/r/20250704103053.6913-1-harry.yoo@oracle.com Fixes: 48b4800a1c6a ("zsmalloc: page migration support") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_listJinjiang Tu
In shrink_folio_list(), the hwpoisoned folio may be large folio, which can't be handled by unmap_poisoned_folio(). For THP, try_to_unmap_one() must be passed with TTU_SPLIT_HUGE_PMD to split huge PMD first and then retry. Without TTU_SPLIT_HUGE_PMD, we will trigger null-ptr deref of pvmw.pte. Even we passed TTU_SPLIT_HUGE_PMD, we will trigger a WARN_ON_ONCE due to the page isn't in swapcache. Since UCE is rare in real world, and race with reclaimation is more rare, just skipping the hwpoisoned large folio is enough. memory_failure() will handle it if the UCE is triggered again. This happens when memory reclaim for large folio races with memory_failure(), and will lead to kernel panic. The race is as follows: cpu0 cpu1 shrink_folio_list memory_failure TestSetPageHWPoison unmap_poisoned_folio --> trigger BUG_ON due to unmap_poisoned_folio couldn't handle large folio [tujinjiang@huawei.com: add comment to unmap_poisoned_folio()] Link: https://lkml.kernel.org/r/69fd4e00-1b13-d5f7-1c82-705c7d977ea4@huawei.com Link: https://lkml.kernel.org/r/20250627125747.3094074-2-tujinjiang@huawei.com Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio") Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/ Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/page_owner: convert set_page_owner_migrate_reason() to foliosSidhartha Kumar
Both callers of set_page_owner_migrate_reason() use folios. Convert the function to take a folio directly and move the &folio->page conversion inside __set_page_owner_migrate_reason(). Link: https://lkml.kernel.org/r/20250711145910.90135-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/memfd: replace deprecated strcpy() with memcpy() in alloc_name()Thorsten Blum
strcpy() is deprecated; use memcpy() instead. Not copying the NUL terminator is safe because strncpy_from_user() would overwrite it anyway by appending uname to the destination buffer at index MFD_NAME_PREFIX_LEN. No functional changes intended. Link: https://github.com/KSPP/linux/issues/88 Link: https://lkml.kernel.org/r/20250712174516.64243-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/core: remove damon_callbackSeongJae Park
All damon_callback usages are replicated by damon_call() and damos_walk(). Time to say goodbye. Remove damon_callback. Link: https://lkml.kernel.org/r/20250712195016.151108-15-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/sysfs: remove damon_sysfs_before_terminate()SeongJae Park
DAMON core layer does target cleanup on its own. Remove duplicated and unnecessarily selective cleanup attempts in DAMON sysfs interface. Link: https://lkml.kernel.org/r/20250712195016.151108-14-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/core: destroy targets when kdamond_fn() finishSeongJae Park
When kdamond_fn() completes, the targets are kept. Those are kept to let callers do additional cleanups if they need. There are no such additional cleanups though. DAMON sysfs interface deallocates those in before_terminate() callback, to reduce unnecessary memory usage, for [f]vaddr use case. Just destroy the targets for every case in the core layer. This saves more memory and simplifies the logic. Link: https://lkml.kernel.org/r/20250712195016.151108-13-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/sysfs: remove damon_sysfs_destroy_targets()SeongJae Park
The function was introduced for putting pids and deallocating unnecessary targets. Hence it is called before damon_destroy_ctx(). Now vaddr puts pid for each target destruction (cleanup_target()). damon_destroy_ctx() deallocates the targets anyway. So damon_sysfs_destroy_targets() has no reason to exist. Remove it. Link: https://lkml.kernel.org/r/20250712195016.151108-12-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/vaddr: put pid in cleanup_target()SeongJae Park
Implement cleanup_target() callback for [f]vaddr, which calls put_pid() for each target that will be destroyed. Also remove redundant put_pid() calls in core, sysfs and sample modules, which were required to be done redundantly due to the lack of such self cleanup in vaddr. Link: https://lkml.kernel.org/r/20250712195016.151108-11-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/core: add cleanup_target() ops callbackSeongJae Park
Some DAMON operation sets may need additional cleanup per target. For example, [f]vaddr need to put pids of each target. Each user and core logic is doing that redundantly. Add another DAMON ops callback that will be used for doing such cleanups in operations set layer. [sj@kernel.org: add kernel-doc comment for damon_operations->cleanup_target] Link: https://lkml.kernel.org/r/20250715185239.89152-2-sj@kernel.org [sj@kernel.org: remove damon_ctx->callback kernel-doc comment] Link: https://lkml.kernel.org/r/20250715185239.89152-3-sj@kernel.org Link: https://lkml.kernel.org/r/20250712195016.151108-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/core: do not call ops.cleanup() when destroying targetsSeongJae Park
damon_operations.cleanup() is documented to be called for kdamond termination, but also being called for targets destruction, which is done for any damon_ctx destruction. Nobody is using the callback for now, though. Remove the cleanup() call under the destruction. Link: https://lkml.kernel.org/r/20250712195016.151108-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/lru_sort: use damon_call() repeat mode instead of damon_callbackSeongJae Park
DAMON_LRU_SORT uses damon_callback for periodically reading and writing DAMON internal data and parameters. Use its alternative, damon_call() repeat mode. Link: https://lkml.kernel.org/r/20250712195016.151108-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/reclaim: use damon_call() repeat mode instead of damon_callbackSeongJae Park
DAMON_RECLAIM uses damon_callback for periodically reading and writing DAMON internal data and parameters. Use its alternative, damon_call() repeat mode. Link: https://lkml.kernel.org/r/20250712195016.151108-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/stat: use damon_call() repeat mode instead of damon_callbackSeongJae Park
DAMON_STAT uses damon_callback for periodically reading DAMON internal data. Use its alternative, damon_call() repeat mode. Link: https://lkml.kernel.org/r/20250712195016.151108-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/core: introduce repeat mode damon_call()SeongJae Park
damon_call() can be useful for reading or writing DAMON internal data for one time. A common pattern of DAMON core usage from DAMON modules is doing such reads and writes repeatedly, for example, to periodically update the DAMOS stats. To do that with damon_call(), callers should call damon_call() repeatedly, with their own delay loop. Each caller doing that is repetitive. Introduce a repeat mode damon_call(). Callers can use the mode by setting a new field in damon_call_control. If the mode is turned on, damon_call() returns success immediately, and DAMON repeats invoking the callback function inside the kdamond main loop. Link: https://lkml.kernel.org/r/20250712195016.151108-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon: accept parallel damon_call() requestsSeongJae Park
Patch series "mm/damon: remove damon_callback". damon_callback was the only way for communicating with DAMON for contexts running on its worker thread. The interface is flexible and simple. But as DAMON evolves with more features, damon_callback has become somewhat too old. With runtime parameters update, for example, its lack of synchronization support was found to be inconvenient. Arguably it is also not easy to use correctly since the callers should understand when each callback is called, and implication of the return values from the callbacks. To replace it, damon_call() and damos_walk() are introduced. And those replaced a few damon_callback use cases. Some use cases of damon_callback such as parallel or repetitive DAMON internal data reading and additional cleanups cannot simply be replaced by damon_call() and damos_walk(), though. To allow those replaceable, extend damon_call() for parallel and/or repeated callbacks and modify the core/ops layers for additional resources cleanup. With the updates, replace the remaining damon_callback usages and finally say goodbye to damon_callback. This patch (of 14): Calling damon_call() while it is serving for another parallel thread immediately fails with -EBUSY. The caller should call it again, later. Each caller implementing such retry logic would be redundant. Accept parallel damon_call() requests and do the wait instead of the caller. Link: https://lkml.kernel.org/r/20250712195016.151108-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250712195016.151108-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm: simplify min_brk handling in brk()Xuanye Liu
Set min_brk to mm->start_brk by default, and override it with mm->end_data only when CONFIG_COMPAT_BRK is enabled and brk_randomized is false. This makes the logic clearer with no functional change. Link: https://lkml.kernel.org/r/20250710025859.926355-1-liuqiye2025@163.com Signed-off-by: Xuanye Liu <liuqiye2025@163.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19readahead: use folio_nr_pages() instead of shift operationChi Zhiling
folio_nr_pages() is faster helper function to get the number of pages when NR_PAGES_IN_LARGE_FOLIO is enabled. Link: https://lkml.kernel.org/r/20250710060451.3535957-1-chizhiling@163.com Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/hmm: move pmd_to_hmm_pfn_flags() to the respective #ifdefferyAndy Shevchenko
When pmd_to_hmm_pfn_flags() is unused, it prevents kernel builds with clang, `make W=1` and CONFIG_TRANSPARENT_HUGEPAGE=n: mm/hmm.c:186:29: warning: unused function 'pmd_to_hmm_pfn_flags' [-Wunused-function] Fix this by moving the function to the respective existing ifdeffery for its the only user. See also: 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build") Link: https://lkml.kernel.org/r/20250710082403.664093-1-andriy.shevchenko@linux.intel.com Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Bill Wendling <morbo@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm: introduce per-node proactive reclaim interfaceDavidlohr Bueso
This adds support for allowing proactive reclaim in general on a NUMA system. A per-node interface extends support for beyond a memcg-specific interface, respecting the current semantics of memory.reclaim: respecting aging LRU and not supporting artificially triggering eviction on nodes belonging to non-bottom tiers. This patch allows userspace to do: echo "512M swappiness=10" > /sys/devices/system/node/nodeX/reclaim One of the premises for this is to semantically align as best as possible with memory.reclaim. During a brief time memcg did support nodemask until 55ab834a86a9 (Revert "mm: add nodes= arg to memory.reclaim"), for which semantics around reclaim (eviction) vs demotion were not clear, rendering charging expectations to be broken. With this approach: 1. Users who do not use memcg can benefit from proactive reclaim. The memcg interface is not NUMA aware and there are usecases that are focusing on NUMA balancing rather than workload memory footprint. 2. Proactive reclaim on top tiers will trigger demotion, for which memory is still byte-addressable. Reclaiming on the bottom nodes will trigger evicting to swap (the traditional sense of reclaim). This follows the semantics of what is today part of the aging process on tiered memory, mirroring what every other form of reclaim does (reactive and memcg proactive reclaim). Furthermore per-node proactive reclaim is not as susceptible to the memcg charging problem mentioned above. 3. Unlike the nodes= arg, this interface avoids confusing semantics, such as what exactly the user wants when mixing top-tier and low-tier nodes in the nodemask. Further per-node interface is less exposed to "free up memory in my container" usecases, where eviction is intended. 4. Users that *really* want to free up memory can use proactive reclaim on nodes knowingly to be on the bottom tiers to force eviction in a natural way - higher access latencies are still better than swap. If compelled, while no guarantees and perhaps not worth the effort, users could also also potentially follow a ladder-like approach to eventually free up the memory. Alternatively, perhaps an 'evict' option could be added to the parameters for both memory.reclaim and per-node interfaces to force this action unconditionally. [akpm@linux-foundation.org: user_proactive_reclaim(): return -EBUSY on PGDAT_RECLAIM_LOCKED contention, per Roman] [dave@stgolabs.net: memcg && node is also a bogus case, per Shakeel] Link: https://lkml.kernel.org/r/20250717235604.2atyx2aobwowpge3@offworld Link: https://lkml.kernel.org/r/20250623185851.830632-5-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/vmscan: make __node_reclaim() more genericDavidlohr Bueso
As this will be called from non page allocator paths for proactive reclaim, allow users to pass the sc and nr of pages, and adjust the return value as well. No change in semantics. Link: https://lkml.kernel.org/r/20250623185851.830632-4-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/memcg: make memory.reclaim interface genericDavidlohr Bueso
This adds a general call for both parsing as well as the common reclaim semantics. memcg is still the only user and no change in semantics. [akpm@linux-foundation.org: fix CONFIG_NUMA=n build] Link: https://lkml.kernel.org/r/20250623185851.830632-3-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/vmscan: respect psi_memstall region in node reclaimDavidlohr Bueso
Patch series "mm: per-node proactive reclaim", v2. This adds support for allowing proactive reclaim in general on a NUMA system. A per-node interface extends support for beyond a memcg-specific interface, respecting the current semantics of memory.reclaim: respecting aging LRU and not supporting artificially triggering eviction on nodes belonging to non-bottom tiers. This patch allows userspace to do: echo 512M swappiness=10 > /sys/devices/system/node/nodeX/reclaim One of the premises for this is to semantically align as best as possible with memory.reclaim. During a brief time memcg did support nodemask until 55ab834a86a9 (Revert "mm: add nodes= arg to memory.reclaim"), for which semantics around reclaim (eviction) vs demotion were not clear, rendering charging expectations to be broken. With this approach: 1. Users who do not use memcg can benefit from proactive reclaim. 2. Proactive reclaim on top tiers will trigger demotion, for which memory is still byte-addressable. Reclaiming on the bottom nodes will trigger evicting to swap (the traditional sense of reclaim). This follows the semantics of what is today part of the aging process on tiered memory, mirroring what every other form of reclaim does (reactive and memcg proactive reclaim). Furthermore per-node proactive reclaim is not as susceptible to the memcg charging problem mentioned above. 3. Unlike memcg, there should be no surprises of callers expecting reclaim but instead got a demotion. Essentially relying on behavior of shrink_folio_list() after 6b426d071419 ("mm: disable top-tier fallback to reclaim on proactive reclaim"), without the expectations of try_to_free_mem_cgroup_pages(). 4. Unlike the nodes= arg, this interface avoids confusing semantics, such as what exactly the user wants when mixing top-tier and low-tier nodes in the nodemask. Further per-node interface is less exposed to "free up memory in my container" usecases, where eviction is intended. 5. Users that *really* want to free up memory can use proactive reclaim on nodes knowingly to be on the bottom tiers to force eviction in a natural way - higher access latencies are still better than swap. If compelled, while no guarantees and perhaps not worth the effort, users could also also potentially follow a ladder-like approach to eventually free up the memory. Alternatively, perhaps an 'evict' option could be added to the parameters for both memory.reclaim and per-node interfaces to force this action unconditionally. This patch (of 4): ... rather benign but keep proper ending order. Link: https://lkml.kernel.org/r/20250623185851.830632-1-dave@stgolabs.net Link: https://lkml.kernel.org/r/20250623185851.830632-2-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/memory.c: use folios in __access_remote_vm()Vishal Moola (Oracle)
Use kmap_local_folio() instead of kmap_local_page(). Replaces 2 calls to compound_head() with one. This prepares us for the removal of unmap_and_put_page(). Link: https://lkml.kernel.org/r/20250709194017.927978-5-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jordan Rome <linux@jordanrome.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/memory.c: use folios in __copy_remote_vm_str()Vishal Moola (Oracle)
Patch series "Remove unmap_and_put_page()". This patchset uses folios in both the callers of unmap_and_put_page(), saving a couple calls to compound_head() wrappers. This patch (of 3): Use kmap_local_folio() instead of kmap_local_page(). Replaces 2 calls to compound_head() from unmap_and_put_page() with one. This prepares us for the removal of unmap_and_put_page(). Link: https://lkml.kernel.org/r/20250709194017.927978-3-vishal.moola@gmail.com Link: https://lkml.kernel.org/r/20250709194017.927978-4-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jordan Rome <linux@jordanrome.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/vaddr: apply filters in migrate_{hot/cold}Bijan Tabatabai
The paddr versions of migrate_{hot/cold} filter out folios from migration based on the scheme's filters. This patch does the same for the vaddr versions of those schemes. The filtering code is mostly the same for the paddr and vaddr versions. The exception is the young filter. paddr determines if a page is young by doing a folio rmap walk to find the page table entries corresponding to the folio. However, vaddr schemes have easier access to the page tables, so we add some logic to avoid the extra work. Link: https://lkml.kernel.org/r/20250709005952.17776-14-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon: move folio filtering from paddr to ops-commonBijan Tabatabai
This patch moves damos_pa_filter_match and the functions it calls to ops-common, renaming it to damos_folio_filter_match. Doing so allows us to share the filtering logic for the vaddr version of the migrate_{hot,cold} schemes. Link: https://lkml.kernel.org/r/20250709005952.17776-13-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/vaddr: use damos->migrate_dests in migrate_{hot,cold}Bijan Tabatabai
damos->migrate_dests provides a list of nodes the migrate_{hot,cold} actions should migrate to, as well as the weights which specify the ratio pages should be migrated to each destination node. This patch interleaves pages in the migrate_{hot,cold} actions according to the information provided in damos->migrate_dests if it is used. The interleaving algorithm used is similar to the one used in weighted_interleave_nid(). If damos->migration_dests is not provided, the actions migrate pages to the node specified in damos->target_nid as before. Link: https://lkml.kernel.org/r/20250709005952.17776-12-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/vaddr: add vaddr versions of migrate_{hot,cold}Bijan Tabatabai
migrate_{hot,cold} are paddr schemes that are used to migrate hot/cold data to a specified node. However, these schemes are only available when doing physical address monitoring. This patch adds an implementation for them virtual address monitoring as well. Link: https://lkml.kernel.org/r/20250709005952.17776-10-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon: move migration helpers from paddr to ops-commonBijan Tabatabai
This patch moves the damon_pa_migrate_pages function along with its corresponding helper functions from paddr to ops-common. The function prefix of "damon_pa_" was also changed to just "damon_" accordingly. This patch will allow page migration to be available to vaddr schemes as well as paddr schemes. Link: https://lkml.kernel.org/r/20250709005952.17776-9-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/core: commit damos->migrate_destsBijan Tabatabai
When committing new scheme parameters from the sysfs, copy the migrate_dests struct of the source schemes into the destination schemes. Link: https://lkml.kernel.org/r/20250709005952.17776-8-bijan311@gmail.com Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/sysfs-schemes: set damos->migrate_destsSeongJae Park
Pass user-specified multiple DAMOS action destinations and their weights to DAMON core API, so that user requests can really work. Link: https://lkml.kernel.org/r/20250709005952.17776-5-bijan311@gmail.com Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/sysfs-schemes: implement DAMOS action destinations directorySeongJae Park
DAMOS_MIGRATE_{HOT,COLD} can have multiple action destinations and their weights. Implement sysfs directory named 'dests' under each scheme directory to let DAMON sysfs ABI users utilize the feature. The interface is similar to other multiple parameters directory like kdamonds or filters. The directory contains only nr_dests file initially. Writing a number of desired destinations to nr_dests creates directories of the number. Each of the created directories has two files named id and weight. Users can then write the destination's identifier (node id in case of DAMOS_MIGRATE_*) and weight to the files. Link: https://lkml.kernel.org/r/20250709005952.17776-4-bijan311@gmail.com Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Bijan Tabatabai <bijantabatab@micron.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/core: add damos->migrate_dests fieldSeongJae Park
Add a new field to 'struct damos', namely migrate_dests, to allow DAMON API callers specify multiple migration destination nodes and their weights. Also update 'struct damos' creation and destruction functions accordingly to initialize the new field and free up the API caller-allocated buffers on those, respectively. Link: https://lkml.kernel.org/r/20250709005952.17776-3-bijan311@gmail.com Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon/core: commit damos->target_nidBijan Tabatabai
When committing new scheme parameters from the sysfs, the target_nid field of the damos struct would not be copied. This would result in the target_nid field to retain its original value, despite being updated in the sysfs interface. This patch fixes this issue by copying target_nid in damos_commit(). Link: https://lkml.kernel.org/r/20250709004729.17252-1-bijan311@gmail.com Fixes: 83dc7bbaecae ("mm/damon/sysfs: use damon_commit_ctx()") Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm, vmstat: remove the NR_WRITEBACK_TEMP node_stat_item counterVlastimil Babka
The only user of the counter (FUSE) was removed in commit 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") so follow the established pattern of removing the counter and hardcoding 0 in meminfo output, as done recently with NR_BOUNCE. Update documentation for procfs, including for the value for Bounce that was missed when removing its counter. Also remove the mention of NR_WRITEBACK_TEMP implications from a comment in wb_position_ratio(). The rest of the comment there about fuse setting bdi->max_ratio to 1% is still correct. [vbabka@suse.cz: v2] Link: https://lkml.kernel.org/r/5a848e15-6a57-4ecb-a015-d4f358b8a5d3@suse.cz Link: https://lkml.kernel.org/r/20250625-nr_writeback_removal-v1-1-7f2a0df70faa@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joanne Koong <joannelkoong@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Maxim Patlasov <mpatlasov@parallels.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/vmstat: utilize designated initializers for the vmstat_text arrayKirill A. Shutemov
The vmstat_text array defines labels for counters displayed in /proc/vmstat. The current definition of the array implies a specific order of the counters in their enums, making it fragile. To make it clear which counter the label is for, use designated initializers. Link: https://lkml.kernel.org/r/20250603084556.113975-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm: strictly check vmstat_text array sizeKirill A. Shutemov
/proc/vmstat displays counters from various sources. It is easy to forget to add or remove a label when a new counter is added or removed. There is a BUILD_BUG_ON() function that catches missing labels. However, for some reason, it ignores extra labels. Let's make the check strict. This would help to catch issues when a counter is removed. Link: https://lkml.kernel.org/r/20250529110541.2960330-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>