summaryrefslogtreecommitdiff
path: root/mm/damon/vaddr.c
AgeCommit message (Collapse)Author
2025-01-25mm/damon: ask apply_scheme() to report filter-passed region-internal bytesSeongJae Park
Some DAMOS filter types including those for young page, anon page, and belonging memcg are handled by underlying DAMON operations set implementation, via damon_operations->apply_scheme() interface. How many bytes of the region have passed the filter can be useful for DAMOS scheme tuning and access pattern monitoring. Modify the interface to let the callback implementation reports back the number if possible. Link: https://lkml.kernel.org/r/20250106193401.109161-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm/damon/vaddr: add 'nr_piece == 1' check in damon_va_evenly_split_region()Zheng Yejian
As discussed in [1], damon_va_evenly_split_region() is called to size-evenly split a region into 'nr_pieces' small regions, when nr_pieces == 1, no actual split is required. Check that case for better code readability and add a simple kunit testcase. [1] https://lore.kernel.org/all/20241021163316.12443-1-sj@kernel.org/ Link: https://lkml.kernel.org/r/20241022083927.3592237-3-zhengyejian@huaweicloud.com Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Fernand Sieber <sieberf@amazon.com> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Ye Weihua <yeweihua4@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm/damon/vaddr: fix issue in damon_va_evenly_split_region()Zheng Yejian
Patch series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()". v2. According to the logic of damon_va_evenly_split_region(), currently following split case would not meet the expectation: Suppose DAMON_MIN_REGION=0x1000, Case: Split [0x0, 0x3000) into 2 pieces, then the result would be acutually 3 regions: [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000) but NOT the expected 2 regions: [0x0, 0x1000), [0x1000, 0x3000) !!! The root cause is that when calculating size of each split piece in damon_va_evenly_split_region(): `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);` both the dividing and the ALIGN_DOWN may cause loss of precision, then each time split one piece of size 'sz_piece' from origin 'start' to 'end' would cause more pieces are split out than expected!!! To fix it, count for each piece split and make sure no more than 'nr_pieces'. In addition, add above case into damon_test_split_evenly(). And add 'nr_piece == 1' check in damon_va_evenly_split_region() for better code readability and add a corresponding kunit testcase. This patch (of 2): According to the logic of damon_va_evenly_split_region(), currently following split case would not meet the expectation: Suppose DAMON_MIN_REGION=0x1000, Case: Split [0x0, 0x3000) into 2 pieces, then the result would be acutually 3 regions: [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000) but NOT the expected 2 regions: [0x0, 0x1000), [0x1000, 0x3000) !!! The root cause is that when calculating size of each split piece in damon_va_evenly_split_region(): `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);` both the dividing and the ALIGN_DOWN may cause loss of precision, then each time split one piece of size 'sz_piece' from origin 'start' to 'end' would cause more pieces are split out than expected!!! To fix it, count for each piece split and make sure no more than 'nr_pieces'. In addition, add above case into damon_test_split_evenly(). After this patch, damon-operations test passed: # ./tools/testing/kunit/kunit.py run damon-operations [...] ============== damon-operations (6 subtests) =============== [PASSED] damon_test_three_regions_in_vmas [PASSED] damon_test_apply_three_regions1 [PASSED] damon_test_apply_three_regions2 [PASSED] damon_test_apply_three_regions3 [PASSED] damon_test_apply_three_regions4 [PASSED] damon_test_split_evenly ================ [PASSED] damon-operations ================= Link: https://lkml.kernel.org/r/20241022083927.3592237-1-zhengyejian@huaweicloud.com Link: https://lkml.kernel.org/r/20241022083927.3592237-2-zhengyejian@huaweicloud.com Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces") Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Fernand Sieber <sieberf@amazon.com> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Ye Weihua <yeweihua4@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm: add missing mmu_notifier_clear_young for !MMU_NOTIFIERJames Houghton
Remove the now unnecessary ifdef in mm/damon/vaddr.c as well. Link: https://lkml.kernel.org/r/20241021160212.9935-1-jthoughton@google.com Signed-off-by: James Houghton <jthoughton@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-21Merge tag 'mm-stable-2024-09-20-02-31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Along with the usual shower of singleton patches, notable patch series in this pull request are: - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification. - "Some cleanups for shmem" from Baolin Wang. No functional changes - mode code reuse, better function naming, logic simplifications. - "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only. - "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup. - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage. - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature, it adds new feilds to /proc/vmstat such as $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but partivularly useful for "the dynamic kernel stack project". - "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory. - "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters". - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident. - "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded. - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector. - "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalizaion and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness. - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance. - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo. - "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page(). - "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown. - "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature, - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have notied yet. - "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code. - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code. - "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated. - "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation. - "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code. - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes. - "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into simgle-page folios when swapping out shmem. - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios. - "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios. - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal(). - "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type! - "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed. - "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store. - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operqtion is occurring during an unrelated vma tree walk. - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making ot cleaner, more testable and better tested. - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests. - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions. - "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats. - "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning. - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs. - "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization. - "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates. - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags. - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy. This was overprovisioning THPs in sparsely accessed memory areas. - "zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning. - "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas. - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups. - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support. - "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory. - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memry. - "mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios" * tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits) zram: free secondary algorithms names uprobes: turn xol_area->pages[2] into xol_area->page uprobes: introduce the global struct vm_special_mapping xol_mapping Revert "uprobes: use vm_special_mapping close() functionality" mm: support large folios swap-in for sync io devices mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios mm: fix swap_read_folio_zeromap() for large folios with partial zeromap mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries set_memory: add __must_check to generic stubs mm/vma: return the exact errno in vms_gather_munmap_vmas() memcg: cleanup with !CONFIG_MEMCG_V1 mm/show_mem.c: report alloc tags in human readable units mm: support poison recovery from copy_present_page() mm: support poison recovery from do_cow_fault() resource, kunit: add test case for region_intersects() resource: make alloc_free_mem_region() works for iomem_resource mm: z3fold: deprecate CONFIG_Z3FOLD vfio/pci: implement huge_fault support mm/arm64: support large pfn mappings mm/x86: support large pfn mappings ...
2024-09-09mm/damon/vaddr: protect vma traversal in __damon_va_thre_regions() with rcu ↵Liam R. Howlett
read lock Traversing VMAs of a given maple tree should be protected by rcu read lock. However, __damon_va_three_regions() is not doing the protection. Hold the lock. Link: https://lkml.kernel.org/r/20240905001204.1481-1-sj@kernel.org Fixes: d0cf3dd47f0d ("damon: convert __damon_va_three_regions to use the VMA iterator") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/b83651a0-5b24-4206-b860-cb54ffdf209b@roeck-us.net Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03mm/damon: move kunit tests to tests/ subdirectory with _kunit suffixSeongJae Park
There was a discussion about better places for kunit test code[1] and test file name suffix[2]. Folowwing the conclusion, move kunit tests for DAMON to mm/damon/tests/ subdirectory and rename those. [1] https://lore.kernel.org/CABVgOS=pUdWb6NDHszuwb1HYws4a1-b1UmN=i8U_ED7HbDT0mg@mail.gmail.com [2] https://lore.kernel.org/CABVgOSmKwPq7JEpHfS6sbOwsR0B-DBDk_JP-ZD9s9ZizvpUjbQ@mail.gmail.com Link: https://lkml.kernel.org/r/20240827030336.7930-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12mm: provide mm_struct and address to huge_ptep_get()Christophe Leroy
On powerpc 8xx huge_ptep_get() will need to know whether the given ptep is a PTE entry or a PMD entry. This cannot be known with the PMD entry itself because there is no easy way to know it from the content of the entry. So huge_ptep_get() will need to know either the size of the page or get the pmd. In order to be consistent with huge_ptep_get_and_clear(), give mm and address to huge_ptep_get(). Link: https://lkml.kernel.org/r/cc00c70dd384298796a4e1b25d6c4eb306d3af85.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29mm/damon/vaddr: change asm-generic/mman-common.h to linux/mman.hTanzir Hasan
asm-generic/mman-common.h can be replaced by linux/mman.h and the file will still build correctly. It is an asm-generic file which should be avoided if possible. Link: https://lkml.kernel.org/r/20231221-asmgenericvaddr-v1-1-742b170c914e@google.com Fixes: 6dea8add4d28 ("mm/damon/vaddr: support DAMON-based Operation Schemes") Signed-off-by: Tanzir Hasan <tanzirh@google.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20mm/damon: update email of SeongJaeSeongJae Park
Patch series "mm/damon: misc updates for 6.8". Update comments, tests, and documents for DAMON. This patch (of 6): SeongJae is using his kernel.org account for DAMON development. Update the old email addresses on the comments of DAMON source files. Link: https://lkml.kernel.org/r/20231213190338.54146-1-sj@kernel.org Link: https://lkml.kernel.org/r/20231213190338.54146-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04mm/damon/core: use pseudo-moving sum for nr_accesses_bpSeongJae Park
Let nr_accesses_bp be calculated as a pseudo-moving sum that updated for every sampling interval, using damon_moving_sum(). This is assumed to be useful for cases that the aggregation interval is set quite huge, but the monivoting results need to be collected earlier than next aggregation interval is passed. Link: https://lkml.kernel.org/r/20230915025251.72816-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04mm/damon/vaddr: call damon_update_region_access_rate() alwaysSeongJae Park
When getting mm_struct of the monitoring target process fails, there wil be no need to increase the access rate counter (nr_accesses) of the regions for the process. Hence, damon_va_check_accesses() skips calling damon_update_region_access_rate() in the case. This breaks the assumption that damon_update_region_access_rate() is called for every region, for every sampling interval. Call the function for every region even in the case. This might increase the overhead in some cases, but such case would not be frequent, so no significant impact is really expected. Link: https://lkml.kernel.org/r/20230915025251.72816-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04mm/damon/core: define and use a dedicated function for region access rate updateSeongJae Park
Patch series "mm/damon: provide pseudo-moving sum based access rate". DAMON checks the access to each region for every sampling interval, increase the access rate counter of the region, namely nr_accesses, if the access was made. For every aggregation interval, the counter is reset. The counter is exposed to users to be used as a metric showing the relative access rate (frequency) of each region. In other words, DAMON provides access rate of each region in every aggregation interval. The aggregation avoids temporal access pattern changes making things confusing. However, this also makes a few DAMON-related operations to unnecessarily need to be aligned to the aggregation interval. This can restrict the flexibility of DAMON applications, especially when the aggregation interval is huge. To provide the monitoring results in finer-grained timing while keeping handling of temporal access pattern change, this patchset implements a pseudo-moving sum based access rate metric. It is pseudo-moving sum because strict moving sum implementation would need to keep all values for last time window, and that could incur high overhead of there could be arbitrary number of values in a time window. Especially in case of the nr_accesses, since the sampling interval and aggregation interval can arbitrarily set and the past values should be maintained for every region, it could be risky. The pseudo-moving sum assumes there were no temporal access pattern change in last discrete time window to remove the needs for keeping the list of the last time window values. As a result, it beocmes not strict moving sum implementation, but provides a reasonable accuracy. Also, it keeps an important property of the moving sum. That is, the moving sum becomes same to discrete-window based sum at the time that aligns to the time window. This means using the pseudo moving sum based nr_accesses makes no change to users who shows the value for every aggregation interval. Patches Sequence ---------------- The sequence of the patches is as follows. The first four patches are for preparation of the change. The first two (patches 1 and 2) implements a helper function for nr_accesses update and eliminate corner case that skips use of the function, respectively. Following two (patches 3 and 4) respectively implement the pseudo-moving sum function and its simple unit test case. Two patches for making DAMON to use the pseudo-moving sum follow. The fifthe one (patch 5) introduces a new field for representing the pseudo-moving sum-based access rate of each region, and the sixth one makes the new representation to actually updated with the pseudo-moving sum function. Last two patches (patches 7 and 8) makes followup fixes for skipping unnecessary updates and marking the moving sum function as static, respectively. This patch (of 8): Each DAMON operarions set is updating nr_accesses field of each damon_region for each of their access check results, from the check_accesses() callback. Directly accessing the field could make things complex to manage and change in future. Define and use a dedicated function for the purpose. Link: https://lkml.kernel.org/r/20230915025251.72816-1-sj@kernel.org Link: https://lkml.kernel.org/r/20230915025251.72816-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29mm: hugetlb: add huge page size param to set_huge_pte_at()Ryan Roberts
Patch series "Fix set_huge_pte_at() panic on arm64", v2. This series fixes a bug in arm64's implementation of set_huge_pte_at(), which can result in an unprivileged user causing a kernel panic. The problem was triggered when running the new uffd poison mm selftest for HUGETLB memory. This test (and the uffd poison feature) was merged for v6.5-rc7. Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable (correctly this time) to get it backported to v6.5, where the issue first showed up. Description of Bug ================== arm64's huge pte implementation supports multiple huge page sizes, some of which are implemented in the page table with multiple contiguous entries. So set_huge_pte_at() needs to work out how big the logical pte is, so that it can also work out how many physical ptes (or pmds) need to be written. It previously did this by grabbing the folio out of the pte and querying its size. However, there are cases when the pte being set is actually a swap entry. But this also used to work fine, because for huge ptes, we only ever saw migration entries and hwpoison entries. And both of these types of swap entries have a PFN embedded, so the code would grab that and everything still worked out. But over time, more calls to set_huge_pte_at() have been added that set swap entry types that do not embed a PFN. And this causes the code to go bang. The triggering case is for the uffd poison test, commit 99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") - added in v6.5-rc7. Although review shows that there are other call sites that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger on arm64 because arm64 doesn't support UFFD WP. If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise, it will dereference a bad pointer in page_folio(): static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry) { VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry)); return page_folio(pfn_to_page(swp_offset_pfn(entry))); } Fix === The simplest fix would have been to revert the dodgy cleanup commit 18f3962953e4 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since things have moved on, this would have required an audit of all the new set_huge_pte_at() call sites to see if they should be converted to set_huge_swap_pte_at(). As per the original intent of the change, it would also leave us open to future bugs when people invariably get it wrong and call the wrong helper. So instead, I've added a huge page size parameter to set_huge_pte_at(). This means that the arm64 code has the size in all cases. It's a bigger change, due to needing to touch the arches that implement the function, but it is entirely mechanical, so in my view, low risk. I've compile-tested all touched arches; arm64, parisc, powerpc, riscv, s390, sparc (and additionally x86_64). I've additionally booted and run mm selftests against arm64, where I observe the uffd poison test is fixed, and there are no other regressions. This patch (of 2): In order to fix a bug, arm64 needs to be told the size of the huge page for which the pte is being set in set_huge_pte_at(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. No behavioral changes intended. Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx] Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> [vmalloc change] Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> [6.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21merge mm-hotfixes-stable into mm-stable to pick up depended-upon changesAndrew Morton
2023-08-21damon: use pmdp_get instead of drectly dereferencing pmdLevi Yun
As ptep_get, Use the pmdp_get wrapper when we accessing pmdval instead of directly dereferencing pmd. Link: https://lkml.kernel.org/r/20230727212157.2985025-1-ppbuk5246@gmail.com Signed-off-by: Levi Yun <ppbuk5246@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21mm: enable page walking API to lock vmas during the walkSuren Baghdasaryan
walk_page_range() and friends often operate under write-locked mmap_lock. With introduction of vma locks, the vmas have to be locked as well during such walks to prevent concurrent page faults in these areas. Add an additional member to mm_walk_ops to indicate locking requirements for the walk. The change ensures that page walks which prevent concurrent page faults by write-locking mmap_lock, operate correctly after introduction of per-vma locks. With per-vma locks page faults can be handled under vma lock without taking mmap_lock at all, so write locking mmap_lock would not stop them. The change ensures vmas are properly locked during such walks. A sample issue this solves is do_mbind() performing queue_pages_range() to queue pages for migration. Without this change a concurrent page can be faulted into the area and be left out of migration. Link: https://lkml.kernel.org/r/20230804152724.3090321-2-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Suggested-by: Jann Horn <jannh@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: ptep_get() conversionRyan Roberts
Convert all instances of direct pte_t* dereferencing to instead use ptep_get() helper. This means that by default, the accesses change from a C dereference to a READ_ONCE(). This is technically the correct thing to do since where pgtables are modified by HW (for access/dirty) they are volatile and therefore we should always ensure READ_ONCE() semantics. But more importantly, by always using the helper, it can be overridden by the architecture to fully encapsulate the contents of the pte. Arch code is deliberately not converted, as the arch code knows best. It is intended that arch code (arm64) will override the default with its own implementation that can (e.g.) hide certain bits from the core code, or determine young/dirty status by mixing in state from another source. Conversion was done using Coccinelle: ---- // $ make coccicheck \ // COCCI=ptepget.cocci \ // SPFLAGS="--include-headers" \ // MODE=patch virtual patch @ depends on patch @ pte_t *v; @@ - *v + ptep_get(v) ---- Then reviewed and hand-edited to avoid multiple unnecessary calls to ptep_get(), instead opting to store the result of a single call in a variable, where it is correct to do so. This aims to negate any cost of READ_ONCE() and will benefit arch-overrides that may be more complex. Included is a fix for an issue in an earlier version of this patch that was pointed out by kernel test robot. The issue arose because config MMU=n elides definition of the ptep helper functions, including ptep_get(). HUGETLB_PAGE=n configs still define a simple huge_ptep_clear_flush() for linking purposes, which dereferences the ptep. So when both configs are disabled, this caused a build error because ptep_get() is not defined. Fix by continuing to do a direct dereference when MMU=n. This is safe because for this config the arch code cannot be trying to virtualize the ptes because none of the ptep helpers are defined. Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() failsHugh Dickins
Simple walk_page_range() users should set ACTION_AGAIN to retry when pte_offset_map_lock() fails. No need to check pmd_trans_unstable(): that was precisely to avoid the possiblity of calling pte_offset_map() on a racily removed or inserted THP entry, but such cases are now safely handled inside it. Likewise there is no need to check pmd_none() or pmd_bad() before calling it. Link: https://lkml.kernel.org/r/c77d9d10-3aad-e3ce-4896-99e91c7947f3@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: SeongJae Park <sj@kernel.org> for mm/damon part Cc: Alistair Popple <apopple@nvidia.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Song Liu <song@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zack Rusin <zackr@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09mm/damon/ops-common: atomically test and clear young on ptes and pmdsRyan Roberts
It is racy to non-atomically read a pte, then clear the young bit, then write it back as this could discard dirty information. Further, it is bad practice to directly set a pte entry within a table. Instead clearing young must go through the arch-provided helper, ptep_test_and_clear_young() to ensure it is modified atomically and to give the arch code visibility and allow it to check (and potentially modify) the operation. Link: https://lkml.kernel.org/r/20230602092949.545577-3-ryan.roberts@arm.com Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces"). Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18mm/damon/vaddr: record appropriate folio size when the access is not foundSeongJae Park
DAMON virtual address spaces monitoring operations set doesn't set folio size of the access checked address if access is not found. It could result in unnecessary and inefficient repeated check. Appropriately set the size regardless of access check result. Link: https://lkml.kernel.org/r/20230109213335.62525-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18mm/damon/vaddr: support folio of neither HPAGE_PMD_SIZE nor PAGE_SIZESeongJae Park
DAMON virtual address space monitoring operations set treats folios having non-HPAGE_PMD_SIZE size as having PAGE_SIZE size. Use the exact size of the folio. Link: https://lkml.kernel.org/r/20230109213335.62525-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18mm/damon/vaddr: rename 'damon_young_walk_private->page_sz' to 'folio_sz'SeongJae Park
Patch series "mm/damon/{v,p}addr: misc fixups for folio usage". DAMON's monitoring operations set for the virtual and the physical address spaces use folio now, but some code is not reflecting the fact. Further cleanup the code for folio usage. This patch (of 6): DAMON's virtual address space monitoring operations set is using folio now. Rename 'damon_pa_access_chk_result->page_sz' to reflect the fact. Link: https://lkml.kernel.org/r/20230109213335.62525-1-sj@kernel.org Link: https://lkml.kernel.org/r/20230109213335.62525-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18mm/damon/vaddr: convert hugetlb related functions to use a folioKefeng Wang
Convert damon_hugetlb_mkold() and damon_young_hugetlb_entry() to use a folio. Link: https://lkml.kernel.org/r/20221230070849.63358-9-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18mm/damon/vaddr: convert damon_young_pmd_entry() to use a folioKefeng Wang
With damon_get_folio(), let's convert damon_young_pmd_entry() to use a folio. Link: https://lkml.kernel.org/r/20221230070849.63358-7-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12mm/damon: use damon_sz_region() in appropriate placeXin Hao
In many places we can use damon_sz_region() to instead of "r->ar.end - r->ar.start". Link: https://lkml.kernel.org/r/20220927001946.85375-2-xhao@linux.alibaba.com Signed-off-by: Xin Hao <xhao@linux.alibaba.com> Suggested-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03mm/damon: rename damon_pageout_score() to damon_cold_score()Kaixu Xia
In the beginning there is only one damos_action 'DAMOS_PAGEOUT' that need to get the coldness score of a region for a scheme, which using damon_pageout_score() to do that. But now there are also other damos_action actions need the coldness score, so rename it to damon_cold_score() to make more sense. Link: https://lkml.kernel.org/r/1663423014-28907-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03mm/damon: use 'struct damon_target *' instead of 'void *' in target_valid()Kaixu Xia
We could use 'struct damon_target *' directly instead of 'void *' in target_valid() operation to make code simple. Link: https://lkml.kernel.org/r/1663241621-13293-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03mm/damon: simplify the parameter passing for 'prepare_access_checks'Kaixu Xia
Patch series "mm/damon: code simplifications and cleanups". This patchset contains some code simplifications and cleanups for DAMON. This patch (of 4): The parameter 'struct damon_ctx *ctx' isn't used in the functions __damon_{p,v}a_prepare_access_check(), so we can remove it and simplify the parameter passing. Link: https://lkml.kernel.org/r/1663060287-30201-1-git-send-email-kaixuxia@tencent.com Link: https://lkml.kernel.org/r/1663060287-30201-2-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03mm/damon/core: use a dedicated struct for monitoring attributesSeongJae Park
DAMON monitoring attributes are directly defined as fields of 'struct damon_ctx'. This makes 'struct damon_ctx' a little long and complicated. This commit defines and uses a struct, 'struct damon_attrs', which is dedicated for only the monitoring attributes to make the purpose of the five values clearer and simplify 'struct damon_ctx'. Link: https://lkml.kernel.org/r/20220913174449.50645-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03mm/damon/vaddr: add a comment for 'default' case in damon_va_apply_scheme()Kaixu Xia
The switch case 'DAMOS_STAT' and switch case 'default' have same return value in damon_va_apply_scheme(), and the 'default' case is for DAMOS actions that not supported by 'vaddr'. It might make sense to add a comment here. [akpm@linux-foundation.org: fx comment grammar] Link: https://lkml.kernel.org/r/1662606797-23534-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26damon: convert __damon_va_three_regions to use the VMA iteratorLiam R. Howlett
This rather specialised walk can use the VMA iterator. If this proves to be too slow, we can write a custom routine to find the two largest gaps, but it will be somewhat complicated, so let's see if we need it first. Update the kunit test case to use the maple tree. This also fixes an issue with the kunit testcase not adding the last VMA to the list. Link: https://lkml.kernel.org/r/20220906194824.2110408-16-Liam.Howlett@oracle.com Fixes: 17ccae8bb5c9 (mm/damon: add kunit tests) Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11mm/damon/vaddr: remove comparison between mm and last_mm when checking ↵Kaixu Xia
region accesses The damon regions that belong to the same damon target have the same 'struct mm_struct *mm', so it's unnecessary to compare the mm and last_mm objects among the damon regions in one damon target when checking accesses. But the check is necessary when the target changed in '__damon_va_check_accesses()', so we can simplify the whole operation by using the bool 'same_target' to indicate whether the target changed. Link: https://lkml.kernel.org/r/1661590971-20893-3-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11mm/damon: simplify the parameter passing for 'check_accesses'Kaixu Xia
Patch series "mm/damon: Simplify the damon regions access check", v2. This patchset simplifies the operations when checking the damon regions accesses. This patch (of 2): The parameter 'struct damon_ctx *ctx' isn't used in the functions __damon_{p,v}a_check_access(), so we can remove it and simplify the parameter passing. Link: https://lkml.kernel.org/r/1661590971-20893-1-git-send-email-kaixuxia@tencent.com Link: https://lkml.kernel.org/r/1661590971-20893-2-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11mm/damon: replace pmd_huge() with pmd_trans_huge() for THPBaolin Wang
pmd_huge() is usually used to indicate a pmd level hugetlb. However a pmd mapped huge page can only be THP in damon_mkold_pmd_entry() or damon_young_pmd_entry(), so replace pmd_huge() with pmd_trans_huge() in this case to make the code more readable according to the discussion [1]. [1] https://lore.kernel.org/all/098c1480-416d-bca9-cedb-ca495df69b64@linux.alibaba.com/ Link: https://lkml.kernel.org/r/a9e010ca5d299e18d740c7c52290ecb6a014dde6.1660805030.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11mm/damon: validate if the pmd entry is present before accessingBaolin Wang
pmd_huge() is used to validate if the pmd entry is mapped by a huge page, also including the case of non-present (migration or hwpoisoned) pmd entry on arm64 or x86 architectures. This means that pmd_pfn() can not get the correct pfn number for a non-present pmd entry, which will cause damon_get_page() to get an incorrect page struct (also may be NULL by pfn_to_online_page()), making the access statistics incorrect. This means that the DAMON may make incorrect decision according to the incorrect statistics, for example, DAMON may can not reclaim cold page in time due to this cold page was regarded as accessed mistakenly if DAMOS_PAGEOUT operation is specified. Moreover it does not make sense that we still waste time to get the page of the non-present entry. Just treat it as not-accessed and skip it, which maintains consistency with non-present pte level entries. So add pmd entry present validation to fix the above issues. Link: https://lkml.kernel.org/r/58b1d1f5fbda7db49ca886d9ef6783e3dcbbbc98.1660805030.git.baolin.wang@linux.alibaba.com Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/damon: use set_huge_pte_at() to make huge pte oldBaolin Wang
The huge_ptep_set_access_flags() can not make the huge pte old according to the discussion [1], that means we will always mornitor the young state of the hugetlb though we stopped accessing the hugetlb, as a result DAMON will get inaccurate accessing statistics. So changing to use set_huge_pte_at() to make the huge pte old to fix this issue. [1] https://lore.kernel.org/all/Yqy97gXI4Nqb7dYo@arm.com/ Link: https://lkml.kernel.org/r/1655692482-28797-1-git-send-email-baolin.wang@linux.alibaba.com Fixes: 49f4203aae06 ("mm/damon: add access checking for hugetlb pages") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: SeongJae Park <sj@kernel.org> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm: damon: use HPAGE_PMD_SIZEKefeng Wang
Use HPAGE_PMD_SIZE instead of open coding. Link: https://lkml.kernel.org/r/20220517145120.118523-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/damon/vaddr: remove damon_va_apply_three_regions()SeongJae Park
'damon_va_apply_three_regions()' is just a wrapper of its general version, 'damon_set_regions()'. This commit replaces the wrapper calls to directly call the general version. Link: https://lkml.kernel.org/r/20220429160606.127307-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/damon/vaddr: move 'damon_set_regions()' to coreSeongJae Park
This commit moves 'damon_set_regions()' from vaddr to core, as it is aimed to be used by not only 'vaddr' but also other parts of DAMON. Link: https://lkml.kernel.org/r/20220429160606.127307-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/damon/vaddr: generalize damon_va_apply_three_regions()SeongJae Park
'damon_va_apply_three_regions()' is for adjusting address ranges to fit in three discontiguous ranges. The function can be generalized for arbitrary number of discontiguous ranges and reused for future usage, such as arbitrary online regions update. For such future usage, this commit introduces a generalized version of the function called 'damon_set_regions()'. Link: https://lkml.kernel.org/r/20220429160606.127307-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/damon/vaddr: register a damon_operations for fixed virtual address ranges ↵SeongJae Park
monitoring Patch series "support fixed virtual address ranges monitoring". The monitoring operations set for virtual address spaces automatically updates the monitoring target regions to cover entire mappings of the virtual address spaces as much as possible. Some users could have more information about their programs than kernel and therefore have interest in not entire regions but only specific regions. For such cases, the automatic monitoring target regions updates are only unnecessary overhead or distractions. This patchset adds supports for the use case on DAMON's kernel API (DAMON_OPS_FVADDR) and sysfs interface ('fvaddr' keyword for 'operations' sysfs file). This patch (of 3): The monitoring operations set for virtual address spaces automatically updates the monitoring target regions to cover entire mappings of the virtual address spaces as much as possible. Some users could have more information about their programs than kernel and therefore have interest in not entire regions but only specific regions. For such cases, the automatic monitoring target regions updates are only unnecessary overheads or distractions. For such cases, DAMON's API users can simply set the '->init()' and '->update()' of the DAMON context's '->ops' NULL, and set the target monitoring regions when creating the context. But, that would be a dirty hack. Worse yet, the hack is unavailable for DAMON user space interface users. To support the use case in a clean way that can easily exported to the user space, this commit adds another monitoring operations set called 'fvaddr', which is same to 'vaddr' but does not automatically update the monitoring regions. Instead, it will only respect the virtual address regions which have explicitly passed at the initial context creation. Note that this commit leave sysfs interface not supporting the feature yet. The support will be made in a following commit. Link: https://lkml.kernel.org/r/20220426231750.48822-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220426231750.48822-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-03-22mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()SeongJae Park
Because DAMON debugfs interface and DAMON-based proactive reclaim are now using monitoring operations via registration mechanism, damon_{p,v}a_{target_valid,set_operations}() functions have no user. This commit clean them up. Link: https://lkml.kernel.org/r/20220215184603.1479-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/damon/paddr,vaddr: register themselves to DAMON in subsys_initcallSeongJae Park
This commit makes the monitoring operations for the physical address space and virtual address spaces register themselves to DAMON in the subsys_initcall step. Later, in-kernel DAMON user code can use them via damon_select_ops() without have to unnecessarily depend on all possible monitoring operations implementations. Link: https://lkml.kernel.org/r/20220215184603.1479-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/damon: rename damon_primitives to damon_operationsSeongJae Park
Patch series "Allow DAMON user code independent of monitoring primitives". In-kernel DAMON user code is required to configure the monitoring context (struct damon_ctx) with proper monitoring primitives (struct damon_primitive). This makes the user code dependent to all supporting monitoring primitives. For example, DAMON debugfs interface depends on both DAMON_VADDR and DAMON_PADDR, though some users have interest in only one use case. As more monitoring primitives are introduced, the problem will be bigger. To minimize such unnecessary dependency, this patchset makes monitoring primitives can be registered by the implemnting code and later dynamically searched and selected by the user code. In addition to that, this patchset renames monitoring primitives to monitoring operations, which is more easy to intuitively understand what it means and how it would be structed. This patch (of 8): DAMON has a set of callback functions called monitoring primitives and let it can be configured with various implementations for easy extension for different address spaces and usages. However, the word 'primitive' is not so explicit. Meanwhile, many other structs resembles similar purpose calls themselves 'operations'. To make the code easier to be understood, this commit renames 'damon_primitives' to 'damon_operations' before it is too late to rename. Link: https://lkml.kernel.org/r/20220215184603.1479-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220215184603.1479-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Xin Hao <xhao@linux.alibaba.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/damon: remove redundant page validationBaolin Wang
It will never get a NULL page by pte_page() as discussed in thread [1], thus remove the redundant page validation to fix below Smatch static checker warning. mm/damon/vaddr.c:405 damon_hugetlb_mkold() warn: 'page' can't be NULL. [1] https://lore.kernel.org/linux-mm/20220106091200.GA14564@kili/ Link: https://lkml.kernel.org/r/6d32f7d201b8970d53f51b6c5717d472aed2987c.1642386715.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: SeongJae Park <sj@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/damon: remove the target id conceptSeongJae Park
DAMON asks each monitoring target ('struct damon_target') to have one 'unsigned long' integer called 'id', which should be unique among the targets of same monitoring context. Meaning of it is, however, totally up to the monitoring primitives that registered to the monitoring context. For example, the virtual address spaces monitoring primitives treats the id as a 'struct pid' pointer. This makes the code flexible, but ugly, not well-documented, and type-unsafe[1]. Also, identification of each target can be done via its index. For the reason, this commit removes the concept and uses clear type definition. For now, only 'struct pid' pointer is used for the virtual address spaces monitoring. If DAMON is extended in future so that we need to put another identifier field in the struct, we will use a union for such primitives-dependent fields and document which primitives are using which type. [1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/ Link: https://lkml.kernel.org/r/20211230100723.2238-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure logSeongJae Park
The failure log message for 'damon_va_three_regions()' prints the target id, which is a 'struct pid' pointer in the case. To avoid exposing the kernel pointer via the log, this makes the log to use the index of the target in the context's targets list instead. Link: https://lkml.kernel.org/r/20211229131016.23641-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure loggingSeongJae Park
Failure of 'damon_va_three_regions()' is logged using 'pr_err()'. But, the function can fail in legal situations. To avoid making users be surprised and to keep the kernel clean, this makes the log to be printed using 'pr_debug()'. Link: https://lkml.kernel.org/r/20211229131016.23641-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15mm/damon: add access checking for hugetlb pagesBaolin Wang
The process's VMAs can be mapped by hugetlb page, but now the DAMON did not implement the access checking for hugetlb pte, so we can not get the actual access count like below if a process VMAs were mapped by hugetlb. damon_aggregated: target_id=18446614368406014464 nr_regions=12 4194304-5476352: 0 545 damon_aggregated: target_id=18446614368406014464 nr_regions=12 140662370467840-140662372970496: 0 545 damon_aggregated: target_id=18446614368406014464 nr_regions=12 140662372970496-140662375460864: 0 545 damon_aggregated: target_id=18446614368406014464 nr_regions=12 140662375460864-140662377951232: 0 545 damon_aggregated: target_id=18446614368406014464 nr_regions=12 140662377951232-140662380449792: 0 545 damon_aggregated: target_id=18446614368406014464 nr_regions=12 140662380449792-140662382944256: 0 545 ...... Thus this patch adds hugetlb access checking support, with this patch we can see below VMA mapped by hugetlb access count. damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296486649856-140296489914368: 1 3 damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296489914368-140296492978176: 1 3 damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296492978176-140296495439872: 1 3 damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296495439872-140296498311168: 1 3 damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296498311168-140296501198848: 1 3 damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296501198848-140296504320000: 1 3 damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296504320000-140296507568128: 1 2 ...... [baolin.wang@linux.alibaba.com: fix unused var warning] Link: https://lkml.kernel.org/r/1aaf9c11-0d8e-b92d-5c92-46e50a6e8d4e@linux.alibaba.com [baolin.wang@linux.alibaba.com: v3] Link: https://lkml.kernel.org/r/486927ecaaaecf2e3a7fbe0378ec6e1c58b50747.1640852276.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/6afcbd1fda5f9c7c24f320d26a98188c727ceec3.1639623751.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>