summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2025-07-24selftests/proc: add verbose mode for /proc/pid/maps tearing testsSuren Baghdasaryan
Add verbose mode to the /proc/pid/maps tearing tests to print debugging information. VERBOSE environment variable is used to enable it. Usage example: VERBOSE=1 ./proc-maps-race Link: https://lkml.kernel.org/r/20250719182854.3166724-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeongjun Park <aha310510@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: T.J. Mercier <tjmercier@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Ye Bin <yebin10@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-24selftests/proc: extend /proc/pid/maps tearing test to include vma remappingSuren Baghdasaryan
Test that /proc/pid/maps does not report unexpected holes in the address space when we concurrently remap a part of a vma into the middle of another vma. This remapping results in the destination vma being split into three parts and the part in the middle being patched back from, all done concurrently from under the reader. We should always see either original vma or the split one with no holes. Link: https://lkml.kernel.org/r/20250719182854.3166724-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeongjun Park <aha310510@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: T.J. Mercier <tjmercier@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Ye Bin <yebin10@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-24selftests/proc: extend /proc/pid/maps tearing test to include vma resizingSuren Baghdasaryan
Test that /proc/pid/maps does not report unexpected holes in the address space when a vma at the edge of the page is being concurrently remapped. This remapping results in the vma shrinking and expanding from under the reader. We should always see either shrunk or expanded (original) version of the vma. Link: https://lkml.kernel.org/r/20250719182854.3166724-3-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeongjun Park <aha310510@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: T.J. Mercier <tjmercier@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Ye Bin <yebin10@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-24selftests/proc: add /proc/pid/maps tearing from vma split testSuren Baghdasaryan
Patch series "use per-vma locks for /proc/pid/maps reads", v8. Reading /proc/pid/maps requires read-locking mmap_lock which prevents any other task from concurrently modifying the address space. This guarantees coherent reporting of virtual address ranges, however it can block important updates from happening. Oftentimes /proc/pid/maps readers are low priority monitoring tasks and them blocking high priority tasks results in priority inversion. Locking the entire address space is required to present fully coherent picture of the address space, however even current implementation does not strictly guarantee that by outputting vmas in page-size chunks and dropping mmap_lock in between each chunk. Address space modifications are possible while mmap_lock is dropped and userspace reading the content is expected to deal with possible concurrent address space modifications. Considering these relaxed rules, holding mmap_lock is not strictly needed as long as we can guarantee that a concurrently modified vma is reported either in its original form or after it was modified. This patchset switches from holding mmap_lock while reading /proc/pid/maps to taking per-vma locks as we walk the vma tree. This reduces the contention with tasks modifying the address space because they would have to contend for the same vma as opposed to the entire address space. Previous version of this patchset [1] tried to perform /proc/pid/maps reading under RCU, however its implementation is quite complex and the results are worse than the new version because it still relied on mmap_lock speculation which retries if any part of the address space gets modified. New implementaion is both simpler and results in less contention. Note that similar approach would not work for /proc/pid/smaps reading as it also walks the page table and that's not RCU-safe. Paul McKenney's designed a test [2] to measure mmap/munmap latencies while concurrently reading /proc/pid/maps. The test has a pair of processes scanning /proc/PID/maps, and another process unmapping and remapping 4K pages from a 128MB range of anonymous memory. At the end of each 10 second run, the latency of each mmap() or munmap() operation is measured, and for each run the maximum and mean latency is printed. The map/unmap process is started first, its PID is passed to the scanners, and then the map/unmap process waits until both scanners are running before starting its timed test. The scanners keep scanning until the specified /proc/PID/maps file disappears. The latest results from Paul: Stock mm-unstable, all of the runs had maximum latencies in excess of 0.5 milliseconds, and with 80% of the runs' latencies exceeding a full millisecond, and ranging up beyond 4 full milliseconds. In contrast, 99% of the runs with this patch series applied had maximum latencies of less than 0.5 milliseconds, with the single outlier at only 0.608 milliseconds. From a median-performance (as opposed to maximum-latency) viewpoint, this patch series also looks good, with stock mm weighing in at 11 microseconds and patch series at 6 microseconds, better than a 2x improvement. Before the change: ./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2 0.011 0.008 0.521 0.011 0.008 0.552 0.011 0.008 0.590 0.011 0.008 0.660 ... 0.011 0.015 2.987 0.011 0.015 3.038 0.011 0.016 3.431 0.011 0.016 4.707 After the change: ./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2 0.006 0.005 0.026 0.006 0.005 0.029 0.006 0.005 0.034 0.006 0.005 0.035 ... 0.006 0.006 0.421 0.006 0.006 0.423 0.006 0.006 0.439 0.006 0.006 0.608 The patchset also adds a number of tests to check for /proc/pid/maps data coherency. They are designed to detect any unexpected data tearing while performing some common address space modifications (vma split, resize and remap). Even before these changes, reading /proc/pid/maps might have inconsistent data because the file is read page-by-page with mmap_lock being dropped between the pages. An example of user-visible inconsistency can be that the same vma is printed twice: once before it was modified and then after the modifications. For example if vma was extended, it might be found and reported twice. What is not expected is to see a gap where there should have been a vma both before and after modification. This patchset increases the chances of such tearing, therefore it's even more important now to test for unexpected inconsistencies. In [3] Lorenzo identified the following possible vma merging/splitting scenarios: Merges with changes to existing vmas: 1 Merge both - mapping a vma over another one and between two vmas which can be merged after this replacement; 2. Merge left full - mapping a vma at the end of an existing one and completely over its right neighbor; 3. Merge left partial - mapping a vma at the end of an existing one and partially over its right neighbor; 4. Merge right full - mapping a vma before the start of an existing one and completely over its left neighbor; 5. Merge right partial - mapping a vma before the start of an existing one and partially over its left neighbor; Merges without changes to existing vmas: 6. Merge both - mapping a vma into a gap between two vmas which can be merged after the insertion; 7. Merge left - mapping a vma at the end of an existing one; 8. Merge right - mapping a vma before the start end of an existing one; Splits 9. Split with new vma at the lower address; 10. Split with new vma at the higher address; If such merges or splits happen concurrently with the /proc/maps reading we might report a vma twice, once before the modification and once after it is modified: Case 1 might report overwritten and previous vma along with the final merged vma; Case 2 might report previous and the final merged vma; Case 3 might cause us to retry once we detect the temporary gap caused by shrinking of the right neighbor; Case 4 might report overritten and the final merged vma; Case 5 might cause us to retry once we detect the temporary gap caused by shrinking of the left neighbor; Case 6 might report previous vma and the gap along with the final marged vma; Case 7 might report previous and the final merged vma; Case 8 might report the original gap and the final merged vma covering the gap; Case 9 might cause us to retry once we detect the temporary gap caused by shrinking of the original vma at the vma start; Case 10 might cause us to retry once we detect the temporary gap caused by shrinking of the original vma at the vma end; In all these cases the retry mechanism prevents us from reporting possible temporary gaps. [1] https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/ [2] https://github.com/paulmckrcu/proc-mmap_sem-test [3] https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.local/ The /proc/pid/maps file is generated page by page, with the mmap_lock released between pages. This can lead to inconsistent reads if the underlying vmas are concurrently modified. For instance, if a vma split or merge occurs at a page boundary while /proc/pid/maps is being read, the same vma might be seen twice: once before and once after the change. This duplication is considered acceptable for userspace handling. However, observing a "hole" where a vma should be (e.g., due to a vma being replaced and the space temporarily being empty) is unacceptable. Implement a test that: 1. Forks a child process which continuously modifies its address space, specifically targeting a vma at the boundary between two pages. 2. The parent process repeatedly reads the child's /proc/pid/maps. 3. The parent process checks the last vma of the first page and the first vma of the second page for consistency, looking for the effects of vma splits or merges. The test duration is configurable via DURATION environment variable expressed in seconds. The default test duration is 5 seconds. Example Command: DURATION=10 ./proc-maps-race Link: https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/ [1] Link: https://github.com/paulmckrcu/proc-mmap_sem-test [2] Link: https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.local/ [3] Link: https://lkml.kernel.org/r/20250719182854.3166724-1-surenb@google.com Link: https://lkml.kernel.org/r/20250719182854.3166724-2-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeongjun Park <aha310510@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: T.J. Mercier <tjmercier@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Ye Bin <yebin10@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-24tools/testing/selftests: extend mremap_test to test multi-VMA mremapLorenzo Stoakes
Now that we have added the ability to move multiple VMAs at once, assert that this functions correctly, both overwriting VMAs and moving backwards and forwards with merge and VMA invalidation. Additionally assert that page tables are correctly propagated by setting random data and reading it back. Link: https://lkml.kernel.org/r/139074a24a011ca4ed52498a7fa2080024b43917.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19selftests/mm: pagemap_scan ioctl: add PFN ZERO test casesMuhammad Usama Anjum
Add test cases to test the correctness of PFN ZERO flag of pagemap_scan ioctl. Test with normal pages backed memory and huge pages backed memory. Link: https://lkml.kernel.org/r/20250707073321.106431-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: David Hildenbrand <david@redhat.com> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13tools/testing/selftests: add mremap() unfaulted/faulted test casesLorenzo Stoakes
Assert that mremap() behaviour is as expected when moving around unfaulted VMAs immediately adjacent to faulted ones, as well as moving around faulted VMAs and placing them back immediately adjacent to the VMA from which they were moved. This also introduces a shared helper for the syscall version of mremap() so we don't encounter any issues with libc filtering parameters. Link: https://lkml.kernel.org/r/20250702084717.21360-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13selftests/damon/sysfs.py: test DAMOS schemes parameters setupSeongJae Park
Add DAMON sysfs interface functionality tests for basic DAMOS schemes parameters setup. Link: https://lkml.kernel.org/r/20250628160428.53115-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13selftests/damon/sysfs.py: test adaptive targets parameterSeongJae Park
Add DAMON sysfs interface functionality tests for setup of basic adaptive targets parameters. Link: https://lkml.kernel.org/r/20250628160428.53115-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13selftests/damon/sysfs.py: test monitoring attribute parametersSeongJae Park
Add DAMON sysfs interface functionality tests for DAMON monitoring attribute parameters, including intervals, intervals tuning goals, and min/max number of regions. Link: https://lkml.kernel.org/r/20250628160428.53115-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13selftests/damon: add python and drgn-based DAMON sysfs testSeongJae Park
Add a python-written DAMON sysfs functionality selftest. It sets DAMON parameters using Python module _damon_sysfs, reads updated kernel internal DAMON status and parameters using a 'drgn' script, namely drgn_dump_damon_status.py, and compare if the resulted DAMON internal status is as expected. The test is very minimum at the moment. Link: https://lkml.kernel.org/r/20250628160428.53115-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13selftests/damon/_damon_sysfs: set Kdamond.pid in start()SeongJae Park
_damon_sysfs.py is a Python module for reading and writing DAMON sysfs for testing. It is not reading resulting kdamond pids. Read and update those when starting kdamonds. Link: https://lkml.kernel.org/r/20250628160428.53115-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13selftests/damon: add drgn script for extracting damon statusSeongJae Park
Patch series "selftests/damon: add python and drgn based DAMON sysfs functionality tests". DAMON sysfs interface is the bridge between the user space and the kernel space for DAMON parameters. There is no good and simple test to see if the parameters are set as expected. Existing DAMON selftests therefore test end-to-end features. For example, damos_quota_goal.py runs a DAMOS scheme with quota goal set against a test program running an artificial access pattern, and see if the result is as expected. Such tests cover only a few part of DAMON. Adding more tests is also complicated. Finally, the reliability of the test itself on different systems is bad. 'drgn' is a tool that can extract kernel internal data structures like DAMON parameters. Add a test that passes specific DAMON parameters via DAMON sysfs reusing _damon_sysfs.py, extract resulting DAMON parameters via 'drgn', and compare those. Note that this test is not adding exhaustive tests of all DAMON parameters and input combinations but very basic things. Advancing the test infrastructure and adding more tests are future works. This patch (of 6): 'drgn' is a useful tool for extracting kernel internal data structures such as DAMON's parameter and running status. Add a 'drgn' script that extracts such DAMON internal data at runtime, for using it as a tool for seeing if a test input has made expected results in the kernel. The script saves or prints out the DAMON internal data as a json file or string. This is for making use of it not very depends on 'drgn'. If 'drgn' is not available on a test setup and we find alternative tools for doing that, the json-based tests can be updated to use an alternative tool in future. Note that the script is tested with 'drgn v0.0.22'. Link: https://lkml.kernel.org/r/20250628160428.53115-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250628160428.53115-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-12Merge branch 'mm-hotfixes-stable' into mm-stable to pick up changes whichAndrew Morton
are required for a merge of the series "mm: folio_pte_batch() improvements".
2025-07-09ksm_tests: skip hugepage test when Transparent Hugepages are disabledLi Wang
Some systems (e.g. minimal or real-time kernels) may not enable Transparent Hugepages (THP), causing MADV_HUGEPAGE to return EINVAL. This patch introduces a runtime check using the existing THP sysfs interface and skips the hugepage merging test (`-H`) when THP is not available. To avoid those failures: # ----------------------------- # running ./ksm_tests -H -s 100 # ----------------------------- # ksm_tests: MADV_HUGEPAGE: Invalid argument # [FAIL] not ok 1 ksm_tests -H -s 100 # exit=2 # -------------------- # running ./khugepaged # -------------------- # Reading PMD pagesize failed# [FAIL] not ok 1 khugepaged # exit=1 # -------------------- # running ./soft-dirty # -------------------- # TAP version 13 # 1..15 # ok 1 Test test_simple # ok 2 Test test_vma_reuse dirty bit of allocated page # ok 3 Test test_vma_reuse dirty bit of reused address page # Bail out! Reading PMD pagesize failed# Planned tests != run tests (15 != 3) # # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0 # [FAIL] not ok 1 soft-dirty # exit=1 # SUMMARY: PASS=0 SKIP=0 FAIL=1 # ------------------- # running ./migration # ------------------- # TAP version 13 # 1..3 # # Starting 3 tests from 1 test cases. # # RUN migration.private_anon ... # # OK migration.private_anon # ok 1 migration.private_anon # # RUN migration.shared_anon ... # # OK migration.shared_anon # ok 2 migration.shared_anon # # RUN migration.private_anon_thp ... # # migration.c:196:private_anon_thp:Expected madvise(ptr, TWOMEG, MADV_HUGEPAGE) (-1) == 0 (0) # # private_anon_thp: Test terminated by assertion # # FAIL migration.private_anon_thp # not ok 3 migration.private_anon_thp # # FAILED: 2 / 3 tests passed. # # Totals: pass:2 fail:1 xfail:0 xpass:0 skip:0 error:0 # [FAIL] not ok 1 migration # exit=1 It's true that CONFIG_TRANSPARENT_HUGEPAGE=y is explicitly enabled in tools/testing/selftests/mm/config, so ideally the runtime environment should also support THP. However, in practice, we've found that on some systems: - THP is disabled at boot time (transparent_hugepage=never) - Or manually disabled via sysfs - Or unavailable in RT kernels, containers, or minimal CI environments In these cases, the test will fail with EINVAL on madvise(MADV_HUGEPAGE), even though the kernel config is correct. To make the test suite more robust and avoid false negatives, this patch adds a runtime check for /sys/kernel/mm/transparent_hugepage/enabled. If THP is not available, the hugepage test (-H) is skipped with a clear message. Link: https://lkml.kernel.org/r/20250624032748.393836-1-liwang@redhat.com Signed-off-by: Li Wang <liwang@redhat.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftests/mm: fix UFFDIO_API usage with proper two-step feature negotiationLi Wang
The current implementation of test_unmerge_uffd_wp() explicitly sets `uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP` before calling UFFDIO_API. This can cause the ioctl() call to fail with EINVAL on kernels that do not support UFFD-WP, leading the test to fail unnecessarily: # ------------------------------ # running ./ksm_functional_tests # ------------------------------ # TAP version 13 # 1..9 # # [RUN] test_unmerge # ok 1 Pages were unmerged # # [RUN] test_unmerge_zero_pages # ok 2 KSM zero pages were unmerged # # [RUN] test_unmerge_discarded # ok 3 Pages were unmerged # # [RUN] test_unmerge_uffd_wp # not ok 4 UFFDIO_API failed <----- # # [RUN] test_prot_none # ok 5 Pages were unmerged # # [RUN] test_prctl # ok 6 Setting/clearing PR_SET_MEMORY_MERGE works # # [RUN] test_prctl_fork # # No pages got merged # # [RUN] test_prctl_fork_exec # ok 7 PR_SET_MEMORY_MERGE value is inherited # # [RUN] test_prctl_unmerge # ok 8 Pages were unmerged # Bail out! 1 out of 8 tests failed # # Planned tests != run tests (9 != 8) # # Totals: pass:7 fail:1 xfail:0 xpass:0 skip:0 error:0 # [FAIL] This patch improves compatibility and robustness of the UFFD-WP test (test_unmerge_uffd_wp) by correctly implementing the UFFDIO_API two-step handshake as recommended by the userfaultfd(2) man page. Key changes: 1. Use features=0 in the initial UFFDIO_API call to query supported feature bits, rather than immediately requesting WP support. 2. Skip the test gracefully if: - UFFDIO_API fails with EINVAL (e.g. unsupported API version), or - UFFD_FEATURE_PAGEFAULT_FLAG_WP is not advertised by the kernel. 3. Close the initial userfaultfd and create a new one before enabling the required feature, since UFFDIO_API can only be called once per fd. 4. Improve diagnostics by distinguishing between expected and unexpected failures, using strerror() to report errors. This ensures the test behaves correctly across a wider range of kernel versions and configurations, while preserving the intended behavior on kernels that support UFFD-WP. [liwang@redhat.com: fail the test if sys_userfaultfd() fails, per David] Link: https://lkml.kernel.org/r/20250625004645.400520-1-liwang@redhat.com Link: https://lkml.kernel.org/r/20250624042411.395285-1-liwang@redhat.com Signed-off-by: Li Wang <liwang@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Li Wang <liwang@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftests/mm: remove duplicate .gitignore entriesMoon Hee Lee
Remove redundant entries in .gitignore confirmed by: $ sort tools/testing/selftests/mm/.gitignore | uniq -d hugetlb_dio pkey_sighandler_tests_32 pkey_sighandler_tests_64 These entries were originally added by [1], and later duplicated by [2]. [1] https://lore.kernel.org/all/20240924185911.117937-1-lorenzo.stoakes@oracle.com/ [2] https://lore.kernel.org/all/20241125064036.413536-1-lizhijian@fujitsu.com/ Link: https://lkml.kernel.org/r/20250626020758.163243-1-moonhee.lee.ca@gmail.com Signed-off-by: Moon Hee Lee <moonhee.lee.ca@gmail.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftests/mm: reduce uffd-unit-test poison test to minimumPeter Xu
The test will still generate quite some unwanted MCE error messages to syslog. There was old proposal ratelimiting the MCE messages from kernel, but that has risk of hiding real useful information on production systems. We can at least reduce the test to minimum to not over-pollute dmesg, however trying to not lose its coverage too much. [peterx@redhat.com: reduce uffd-unit-test poison test to minimum] Link: https://lkml.kernel.org/r/aF2RSsjuEOtzXcUa@x1.local Link: https://lkml.kernel.org/r/20250620150058.1729489-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftets/damon: add a test for memcg_path leakSeongJae Park
There was a memory leak bug in DAMOS sysfs memcg_path file. Add a selftest to ensure the bug never comes back. Link: https://lkml.kernel.org/r/20250619183608.6647-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09mm: remove callers of pfn_t functionalityAlistair Popple
All PFN_* pfn_t flags have been removed. Therefore there is no longer a need for the pfn_t type and all uses can be replaced with normal pfns. Link: https://lkml.kernel.org/r/bbedfa576c9822f8032494efbe43544628698b1f.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Björn Töpel <bjorn@rivosinc.com> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09mm: remove PFN_DEV, PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LASTAlistair Popple
The PFN_MAP flag is no longer used for anything, so remove it. The PFN_SG_CHAIN and PFN_SG_LAST flags never appear to have been used so also remove them. The last user of PFN_SPECIAL was removed by 653d7825c149 ("dcssblk: mark DAX broken, remove FS_DAX_LIMITED support"). Users of PFN_DEV were removed earlier in this series by "mm: Remove remaining uses of PFN_DEV". Link: https://lkml.kernel.org/r/670b3950d70b4d97b905bb597dadfd3633de4314.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Björn Töpel <bjorn@rivosinc.com> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftests/udmabuf: add a test to pin first before writing to memfdVivek Kasireddy
Unlike the existing tests, this new test will create a memfd (backed by hugetlb) and pin the folios in it (a small subset) before writing/ populating it with data. This is a valid use-case that invokes the memfd_alloc_folio() kernel API and is expected to work unless there aren't enough hugetlb folios to satisfy the allocation needs. Link: https://lkml.kernel.org/r/20250618053415.1036185-4-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Steve Sistare <steven.sistare@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09mm: update architecture and driver code to use vm_flags_tLorenzo Stoakes
In future we intend to change the vm_flags_t type, so it isn't correct for architecture and driver code to assume it is unsigned long. Correct this assumption across the board. Overall, this patch does not introduce any functional change. Link: https://lkml.kernel.org/r/b6eb1894abc5555ece80bb08af5c022ef780c8bc.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09mm: update core kernel code to use vm_flags_t consistentlyLorenzo Stoakes
The core kernel code is currently very inconsistent in its use of vm_flags_t vs. unsigned long. This prevents us from changing the type of vm_flags_t in the future and is simply not correct, so correct this. While this results in rather a lot of churn, it is a critical pre-requisite for a future planned change to VMA flag type. Additionally, update VMA userland tests to account for the changes. To make review easier and to break things into smaller parts, driver and architecture-specific changes is left for a subsequent commit. The code has been adjusted to cascade the changes across all calling code as far as is needed. We will adjust architecture-specific and driver code in a subsequent patch. Overall, this patch does not introduce any functional change. Link: https://lkml.kernel.org/r/d1588e7bb96d1ea3fe7b9df2c699d5b4592d901d.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Kees Cook <kees@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09mm: change vm_get_page_prot() to accept vm_flags_t argumentLorenzo Stoakes
Patch series "use vm_flags_t consistently". The VMA flags field vma->vm_flags is of type vm_flags_t. Right now this is exactly equivalent to unsigned long, but it should not be assumed to be. Much code that references vma->vm_flags already correctly uses vm_flags_t, but a fairly large chunk of code simply uses unsigned long and assumes that the two are equivalent. This series corrects that and has us use vm_flags_t consistently. This series is motivated by the desire to, in a future series, adjust vm_flags_t to be a u64 regardless of whether the kernel is 32-bit or 64-bit in order to deal with the VMA flag exhaustion issue and avoid all the various problems that arise from it (being unable to use certain features in 32-bit, being unable to add new flags except for 64-bit, etc.) This is therefore a critical first step towards that goal. At any rate, using the correct type is of value regardless. We additionally take the opportunity to refer to VMA flags as vm_flags where possible to make clear what we're referring to. Overall, this series does not introduce any functional change. This patch (of 3): We abstract the type of the VMA flags to vm_flags_t, however in may places it is simply assumed this is unsigned long, which is simply incorrect. At the moment this is simply an incongruity, however in future we plan to change this type and therefore this change is a critical requirement for doing so. Overall, this patch does not introduce any functional change. [lorenzo.stoakes@oracle.com: add missing vm_get_page_prot() instance, remove include] Link: https://lkml.kernel.org/r/552f88e1-2df8-4e95-92b8-812f7c8db829@lucifer.local Link: https://lkml.kernel.org/r/cover.1750274467.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/a12769720a2743f235643b158c4f4f0a9911daf0.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09tools/testing/radix-tree: test maple tree chaining mas_preallocate() callsLiam R. Howlett
Testing calling multiple mas_preallocate() calls in a row after adjusting the maple state. Ensures new calls to mas_preallocate() will change the number of allocated nodes. Link: https://lkml.kernel.org/r/20250616184521.3382795-4-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Hailong Liu <hailong.liu@oppo.com> Cc: "Liam R. Howlett" <howlett@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09testing/radix-tree/maple: increase readers and reduce delay for faster machinesLiam R. Howlett
Faster machines may not see the initial or updated value in the race condition. Reduce the delay so that faster machines are less likely to fail testing of the race conditions. Link: https://lkml.kernel.org/r/20250616184521.3382795-2-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <howlett@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Hailong Liu <hailong.liu@oppo.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftest/mm: skip if fallocate() is unsupported in gup_longtermMark Brown
Currently gup_longterm assumes that filesystems support fallocate() and uses that to allocate space in files, however this is an optional feature and is in particular not implemented by NFSv3 which is commonly used in CI systems leading to spurious failures. Check for lack of support and report a skip instead for that case. Link: https://lkml.kernel.org/r/20250613-selftest-mm-gup-longterm-fallocate-nfs-v1-1-758a104c175f@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09mm/vma: use vmg->target to specify target VMA for new VMA mergeLorenzo Stoakes
In commit 3a75ccba047b ("mm: simplify vma merge structure and expand comments") we introduced the vmg->target field to make the merging of existing VMAs simpler - clarifying precisely which VMA would eventually become the merged VMA once the merge operation was complete. New VMA merging did not get quite the same treatment, retaining the rather confusing convention of storing the target VMA in vmg->middle. This patch corrects this state of affairs, utilising vmg->target for this purpose for both vma_merge_new_range() and also for vma_expand(). We retain the WARN_ON for vmg->middle being specified in vma_merge_new_range() as doing so would make no sense, but add an additional debug assert for setting vmg->target. This patch additionally updates VMA userland testing to account for this change. [lorenzo.stoakes@oracle.com: make comment consistent in vma_expand()] Link: https://lkml.kernel.org/r/c54f45e3-a6ac-4749-93c0-cc9e3080ee37@lucifer.local Link: https://lkml.kernel.org/r/20250613184807.108089-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftests: mm: add shmem collapse as a default test itemBaolin Wang
Currently, we only test anonymous memory collapse by default. We should also add shmem collapse as a default test item to catch issues that could break the test cases. Link: https://lkml.kernel.org/r/a30b1529b399f2e649b5a05c3d352f41a68faeae.1749779183.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Tested-by: Dev Jain <dev.jain@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Barry Song <baohua@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftests: khugepaged: fix the shmem collapse failureBaolin Wang
When running the khugepaged selftest for shmem (./khugepaged all:shmem), I encountered the following test failures: : Run test: collapse_full (khugepaged:shmem) : Collapse multiple fully populated PTE table.... Fail : ... : Run test: collapse_single_pte_entry (khugepaged:shmem) : Collapse PTE table with single PTE entry present.... Fail : ... : Run test: collapse_full_of_compound (khugepaged:shmem) : Allocate huge page... OK : Split huge page leaving single PTE page table full of compound pages... OK : Collapse PTE table full of compound pages.... Fail The reason for the failure is that it will set MADV_NOHUGEPAGE to prevent khugepaged from continuing to scan shmem VMA after khugepaged finishes scanning in the wait_for_scan() function. Moreover, shmem requires a refault to establish PMD mappings. However, after commit 2b0f922323cc ("mm: don't install PMD mappings when THPs are disabled by the hw/process/vma"), PMD mappings are prevented if the VMA is set with MADV_NOHUGEPAGE flag, so shmem cannot establish PMD mappings during refault. One way to fix this issue is to move the MADV_NOHUGEPAGE setting after the shmem refault. After shmem refault and check huge, the test case will unmap the shmem immediately. So it seems unnecessary to set the MADV_NOHUGEPAGE. Then we can simply drop the MADV_NOHUGEPAGE setting, and all khugepaged test cases passed. Link: https://lkml.kernel.org/r/d8502fc50d0304c2afd27ced062b1d636b7a872e.1749779183.git.baolin.wang@linux.alibaba.com Fixes: 2b0f922323cc ("mm: don't install PMD mappings when THPs are disabled by the hw/process/vma") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Tested-by: Dev Jain <dev.jain@arm.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftests/mm: use generic read_sysfs in thuge-gen testPu Lehui
As generic read_sysfs is available in vm_utils, let's use is in thuge-gen test. Link: https://lkml.kernel.org/r/20250611100106.1331197-1-pulehui@huaweicloud.com Signed-off-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftests/mm: check for YAMA ptrace_scope configuraiton before modifying itMark Brown
When running the memfd_secret test run_vmtests.sh unconditionally tries to confgiure the YAMA LSM's ptrace_scope configuration, leading to an error if YAMA is not in the running kernel: # ./run_vmtests.sh: line 432: /proc/sys/kernel/yama/ptrace_scope: No such file or directory # # ---------------------- # # running ./memfd_secret # # ---------------------- Check that this file is present before trying to write to it. The indentation here is a bit odd, and it doesn't seem great that we configure but don't restore ptrace_scope. Link: https://lkml.kernel.org/r/20250610-selftest-mm-enable-yama-v1-1-0097b6713116@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftests/mm: add messages about test errors to the cow testsMark Brown
It is not sufficiently clear what the individual tests in the cow test program are checking so add messages for the failure cases. Link: https://lkml.kernel.org/r/20250610-selftest-mm-cow-tweaks-v1-4-43cd7457500f@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftests/mm: don't compare return values to in cowMark Brown
Tweak the coding style for checking for non-zero return values. While we're at it also remove a now redundant oring of the madvise() return code. Link: https://lkml.kernel.org/r/20250610-selftest-mm-cow-tweaks-v1-3-43cd7457500f@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09selftests/mm: convert some cow error reports to ksft_perror()Mark Brown
This prints the errno and a string decode of it. Link: https://lkml.kernel.org/r/20250610-selftest-mm-cow-tweaks-v1-2-43cd7457500f@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09kselftest/mm: clarify errors for pipe()Mark Brown
Patch series "selftests/mm: Tweaks to the cow test". A collection of non-functional updates from David Hildenbrand's review. This patch (of 4): Specify that errors reported from pipe() failures are the result of failures. Link: https://lkml.kernel.org/r/20250610-selftest-mm-cow-tweaks-v1-0-43cd7457500f@kernel.org Link: https://lkml.kernel.org/r/20250610-selftest-mm-cow-tweaks-v1-1-43cd7457500f@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09tools/testing/selftests: add VMA merge tests for KSM mergeLorenzo Stoakes
Add test to assert that we have now allowed merging of VMAs when KSM merging-by-default has been set by prctl(PR_SET_MEMORY_MERGE, ...). We simply perform a trivial mapping of adjacent VMAs expecting a merge, however prior to recent changes implementing this mode earlier than before, these merges would not have succeeded. Assert that we have fixed this! Link: https://lkml.kernel.org/r/6dec7aabf062c6b121cfac992c9c716cefdda00c.1748537921.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Tested-by: Chengming Zhou <chengming.zhou@linux.dev> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Stefan Roesch <shr@devkernel.io> Cc: Xu Xin <xu.xin16@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09mm: prevent KSM from breaking VMA merging for new VMAsLorenzo Stoakes
If a user wishes to enable KSM mergeability for an entire process and all fork/exec'd processes that come after it, they use the prctl() PR_SET_MEMORY_MERGE operation. This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set (in order to indicate they are KSM mergeable), as well as setting this flag for all existing VMAs and propagating this across fork/exec. However it also breaks VMA merging for new VMAs, both in the process and all forked (and fork/exec'd) child processes. This is because when a new mapping is proposed, the flags specified will never have VM_MERGEABLE set. However all adjacent VMAs will already have VM_MERGEABLE set, rendering VMAs unmergeable by default. To work around this, we try to set the VM_MERGEABLE flag prior to attempting a merge. In the case of brk() this can always be done. However on mmap() things are more complicated - while KSM is not supported for MAP_SHARED file-backed mappings, it is supported for MAP_PRIVATE file-backed mappings. These mappings may have deprecated .mmap() callbacks specified which could, in theory, adjust flags and thus KSM eligibility. So we check to determine whether this is possible. If not, we set VM_MERGEABLE prior to the merge attempt on mmap(), otherwise we retain the previous behaviour. This fixes VMA merging for all new anonymous mappings, which covers the majority of real-world cases, so we should see a significant improvement in VMA mergeability. For MAP_PRIVATE file-backed mappings, those which implement the .mmap_prepare() hook and shmem are both known to be safe, so we allow these, disallowing all other cases. Also add stubs for newly introduced function invocations to VMA userland testing. [lorenzo.stoakes@oracle.com: correctly invoke late KSM check after mmap hook] Link: https://lkml.kernel.org/r/5861f8f6-cf5a-4d82-a062-139fb3f9cddb@lucifer.local Link: https://lkml.kernel.org/r/3ba660af716d87a18ca5b4e635f2101edeb56340.1748537921.git.lorenzo.stoakes@oracle.com Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport! Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Xu Xin <xu.xin16@zte.com.cn> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Stefan Roesch <shr@devkernel.io> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09tools/mm: add script to display page state for a given PID and VADDRYe Liu
Introduces a new drgn script, `show_page_info.py`, which allows users to analyze the state of a page given a process ID (PID) and a virtual address (VADDR). This can help kernel developers or debuggers easily inspect page-related information in a live kernel or vmcore. The script extracts information such as the page flags, mapping, and other metadata relevant to diagnosing memory issues. Output example: sudo ./show_page_info.py 1 0x7fc988181000 PID: 1 Comm: systemd mm: 0xffff8d22c4089700 RAW: 0017ffffc000416c fffff939062ff708 fffff939062ffe08 ffff8d23062a12a8 RAW: 0000000000000000 ffff8d2323438f60 0000002500000007 ffff8d23203ff500 Page Address: 0xfffff93905664e00 Page Flags: PG_referenced|PG_uptodate|PG_lru|PG_head|PG_active| PG_private|PG_reported|PG_has_hwpoisoned Page Size: 4096 Page PFN: 0x159938 Page Physical: 0x159938000 Page Virtual: 0xffff8d2319938000 Page Refcount: 37 Page Mapcount: 7 Page Index: 0x0 Page Memcg Data: 0xffff8d23203ff500 Memcg Name: init.scope Memcg Path: /sys/fs/cgroup/memory/init.scope Page Mapping: 0xffff8d23062a12a8 Page Anon/File: File Page VMA: 0xffff8d22e06e0e40 VMA Start: 0x7fc988181000 VMA End: 0x7fc988185000 This page is part of a compound page. This page is the head page of a compound page. Head Page: 0xfffff93905664e00 Compound Order: 2 Number of Pages: 4 Link: https://lkml.kernel.org/r/20250530055855.687067-1-ye.liu@linux.dev Signed-off-by: Ye Liu <liuye@kylinos.cn> Tested-by: SeongJae Park <sj@kernel.org> Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Omar Sandoval <osandov@osandov.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09kallsyms: fix build without execinfoAchill Gilgenast
Some libc's like musl libc don't provide execinfo.h since it's not part of POSIX. In order to fix compilation on musl, only include execinfo.h if available (HAVE_BACKTRACE_SUPPORT) This was discovered with c104c16073b7 ("Kunit to check the longest symbol length") which starts to include linux/kallsyms.h with Alpine Linux' configs. Link: https://lkml.kernel.org/r/20250622014608.448718-1-fossdd@pwned.life Fixes: c104c16073b7 ("Kunit to check the longest symbol length") Signed-off-by: Achill Gilgenast <fossdd@pwned.life> Cc: Luis Henriques <luis@igalia.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-06Merge tag 'objtool_urgent_for_v6.16_rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Borislav Petkov: - Fix the compilation of an x86 kernel on a big engian machine due to a missed endianness conversion * tag 'objtool_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Add missing endian conversion to read_annotate()
2025-07-06Merge tag 'locking_urgent_for_v6.16_rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: - Disable FUTEX_PRIVATE_HASH for this cycle due to a performance regression - Add a selftests compilation product to the corresponding .gitignore file * tag 'locking_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/futex: Add futex_numa to .gitignore futex: Temporary disable FUTEX_PRIVATE_HASH
2025-07-06selftests/futex: Add futex_numa to .gitignoreTerry Tritton
futex_numa was never added to the .gitignore file. Add it. Fixes: 9140f57c1c13 ("futex,selftests: Add another FUTEX2_NUMA selftest") Signed-off-by: Terry Tritton <terry.tritton@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/all/20250704103749.10341-1-terry.tritton@linaro.org
2025-07-04Merge tag 'vfs-6.16-rc5.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a regression caused by the anonymous inode rework. Making them regular files causes various places in the kernel to tip over starting with io_uring. Revert to the former status quo and port our assertion to be based on checking the inode so we don't lose the valuable VFS_*_ON_*() assertions that have already helped discover weird behavior our outright bugs. - Fix the the upper bound calculation in fuse_fill_write_pages() - Fix priority inversion issues in the eventpoll code - Make secretmen use anon_inode_make_secure_inode() to avoid bypassing the LSM layer - Fix a netfs hang due to missing case in final DIO read result collection - Fix a double put of the netfs_io_request struct - Provide some helpers to abstract out NETFS_RREQ_IN_PROGRESS flag wrangling - Fix infinite looping in netfs_wait_for_pause/request() - Fix a netfs ref leak on an extra subrequest inserted into a request's list of subreqs - Fix various cifs RPC callbacks to set NETFS_SREQ_NEED_RETRY if a subrequest fails retriably - Fix a cifs warning in the workqueue code when reconnecting a channel - Fix the updating of i_size in netfs to avoid a race between testing if we should have extended the file with a DIO write and changing i_size - Merge the places in netfs that update i_size on write - Fix coredump socket selftests * tag 'vfs-6.16-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: anon_inode: rework assertions netfs: Update tracepoints in a number of ways netfs: Renumber the NETFS_RREQ_* flags to make traces easier to read netfs: Merge i_size update functions netfs: Fix i_size updating smb: client: set missing retry flag in cifs_writev_callback() smb: client: set missing retry flag in cifs_readv_callback() smb: client: set missing retry flag in smb2_writev_callback() netfs: Fix ref leak on inserted extra subreq in write retry netfs: Fix looping in wait functions netfs: Provide helpers to perform NETFS_RREQ_IN_PROGRESS flag wangling netfs: Fix double put of request netfs: Fix hang due to missing case in final DIO read result collection eventpoll: Fix priority inversion problem fuse: fix fuse_fill_write_pages() upper bound calculation fs: export anon_inode_make_secure_inode() and fix secretmem LSM bypass selftests/coredump: Fix "socket_detect_userspace_client" test failure
2025-07-02Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd fixes from Jason Gunthorpe: "Some changes to the userspace selftest framework cause the iommufd tests to start failing. This turned out to be bugs in the iommufd side that were just getting uncovered. - Deal with MAP_HUGETLB mmaping more than requested even when in MAP_FIXED mode - Fixup missing error flow cleanup in the test - Check that the memory allocations suceeded - Suppress some bogus gcc 'may be used uninitialized' warnings" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd/selftest: Fix build warnings due to uninitialized mfd iommufd/selftest: Add asserts testing global mfd iommufd/selftest: Add missing close(mfd) in memfd_mmap() iommufd/selftest: Fix iommufd_dirty_tracking with large hugepage sizes
2025-07-01objtool: Add missing endian conversion to read_annotate()Heiko Carstens
Trying to compile an x86 kernel on big endian results in this error: net/ipv4/netfilter/iptable_nat.o: warning: objtool: iptable_nat_table_init+0x150: Unknown annotation type: 50331648 make[5]: *** [scripts/Makefile.build:287: net/ipv4/netfilter/iptable_nat.o] Error 255 Reason is a missing endian conversion in read_annotate(). Add the missing conversion to fix this. Fixes: 2116b349e29a ("objtool: Generic annotation infrastructure") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250630131230.4130185-1-hca@linux.ibm.com
2025-06-28Merge tag 'loongarch-fixes-6.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: - replace __ASSEMBLY__ with __ASSEMBLER__ in headers like others - fix build warnings about export.h - reserve the EFI memory map region for kdump - handle __init vs inline mismatches - fix some KVM bugs * tag 'loongarch-fixes-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Disable updating of "num_cpu" and "feature" LoongArch: KVM: Check validity of "num_cpu" from user space LoongArch: KVM: Check interrupt route from physical CPU LoongArch: KVM: Fix interrupt route update with EIOINTC LoongArch: KVM: Add address alignment check for IOCSR emulation LoongArch: KVM: Avoid overflow with array index LoongArch: Handle KCOV __init vs inline mismatches LoongArch: Reserve the EFI memory map region LoongArch: Fix build warnings about export.h LoongArch: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers
2025-06-27Merge tag 'mm-hotfixes-stable-2025-06-27-16-56' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes. 6 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 5 are for MM" * tag 'mm-hotfixes-stable-2025-06-27-16-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: add Lorenzo as THP co-maintainer mailmap: update Duje Mihanović's email address selftests/mm: fix validate_addr() helper crashdump: add CONFIG_KEYS dependency mailmap: correct name for a historical account of Zijun Hu mailmap: add entries for Zijun Hu fuse: fix runtime warning on truncate_folio_batch_exceptionals() scripts/gdb: fix dentry_name() lookup mm/damon/sysfs-schemes: free old damon_sysfs_scheme_filter->memcg_path on write mm/alloc_tag: fix the kmemleak false positive issue in the allocation of the percpu variable tag->counters lib/group_cpus: fix NULL pointer dereference from group_cpus_evenly() mm/hugetlb: remove unnecessary holding of hugetlb_lock MAINTAINERS: add missing files to mm page alloc section MAINTAINERS: add tree entry to mm init block mm: add OOM killer maintainer structure fs/proc/task_mmu: fix PAGE_IS_PFNZERO detection for the huge zero folio
2025-06-27Merge tag 'block-6.16-20250626' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fixes for ublk: - fix C++ narrowing warnings in the uapi header - update/improve UBLK_F_SUPPORT_ZERO_COPY comment in uapi header - fix for the ublk ->queue_rqs() implementation, limiting a batch to just the specific task AND ring - ublk_get_data() error handling fix - sanity check more arguments in ublk_ctrl_add_dev() - selftest addition - NVMe pull request via Christoph: - reset delayed remove_work after reconnect - fix atomic write size validation - Fix for a warning introduced in bdev_count_inflight_rw() in this merge window * tag 'block-6.16-20250626' of git://git.kernel.dk/linux: block: fix false warning in bdev_count_inflight_rw() ublk: sanity check add_dev input for underflow nvme: fix atomic write size validation nvme: refactor the atomic write unit detection nvme: reset delayed remove_work after reconnect ublk: setup ublk_io correctly in case of ublk_get_data() failure ublk: update UBLK_F_SUPPORT_ZERO_COPY comment in UAPI header ublk: fix narrowing warnings in UAPI header selftests: ublk: don't take same backing file for more than one ublk devices ublk: build batch from IOs in same io_ring_ctx and io task