summaryrefslogtreecommitdiff
path: root/tools/testing/selftests/mm
AgeCommit message (Collapse)Author
2025-05-31Merge tag 'mm-nonmm-stable-2025-05-31-15-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "hung_task: extend blocking task stacktrace dump to semaphore" from Lance Yang enhances the hung task detector. The detector presently dumps the blocking tasks's stack when it is blocked on a mutex. Lance's series extends this to semaphores - "nilfs2: improve sanity checks in dirty state propagation" from Wentao Liang addresses a couple of minor flaws in nilfs2 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn fixes a couple of issues in the gdb scripts - "Support kdump with LUKS encryption by reusing LUKS volume keys" from Coiby Xu addresses a usability problem with kdump. When the dump device is LUKS-encrypted, the kdump kernel may not have the keys to the encrypted filesystem. A full writeup of this is in the series [0/N] cover letter - "sysfs: add counters for lockups and stalls" from Max Kellermann adds /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and /sys/kernel/rcu_stall_count - "fork: Page operation cleanups in the fork code" from Pasha Tatashin implements a number of code cleanups in fork.c - "scripts/gdb/symbols: determine KASLR offset on s390 during early boot" from Ilya Leoshkevich fixes some s390 issues in the gdb scripts * tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits) llist: make llist_add_batch() a static inline delayacct: remove redundant code and adjust indentation squashfs: add optional full compressed block caching crash_dump, nvme: select CONFIGFS_FS as built-in scripts/gdb/symbols: determine KASLR offset on s390 during early boot scripts/gdb/symbols: factor out pagination_off() scripts/gdb/symbols: factor out get_vmlinux() kernel/panic.c: format kernel-doc comments mailmap: update and consolidate Casey Connolly's name and email nilfs2: remove wbc->for_reclaim handling fork: define a local GFP_VMAP_STACK fork: check charging success before zeroing stack fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code fork: clean-up ifdef logic around stack allocation kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count x86/crash: make the page that stores the dm crypt keys inaccessible x86/crash: pass dm crypt keys to kdump kernel Revert "x86/mm: Remove unused __set_memory_prot()" crash_dump: retrieve dm crypt keys in kdump kernel ...
2025-05-22selftests/mm: deduplicate second mmap() of 5*PAGE_SIZE at baseMark Brown
The map_fixed_noreplace test does two blocks of test starting from a mapping of 5 pages at the base address, logging a test result for each initial mapping. These are logged with the same test name, causing test automation software to see two reports for the same test in a single run. Tweak the log message for the second one to deduplicate. Link: https://lkml.kernel.org/r/20250518-selftests-mm-map-fixed-noreplace-dup-v1-1-1a11a62c5e9f@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-22selftests/mm: add simple VM_PFNMAP tests based on mmap'ing /dev/memDavid Hildenbrand
Let's test some basic functionality using /dev/mem. These tests will implicitly cover some PAT (Page Attribute Handling) handling on x86. These tests will only run when /dev/mem access to the first two pages in physical address space is possible and allowed; otherwise, the tests are skipped. On current x86-64 with PAT inside a VM, all tests pass: TAP version 13 1..6 # Starting 6 tests from 1 test cases. # RUN pfnmap.madvise_disallowed ... # OK pfnmap.madvise_disallowed ok 1 pfnmap.madvise_disallowed # RUN pfnmap.munmap_split ... # OK pfnmap.munmap_split ok 2 pfnmap.munmap_split # RUN pfnmap.mremap_fixed ... # OK pfnmap.mremap_fixed ok 3 pfnmap.mremap_fixed # RUN pfnmap.mremap_shrink ... # OK pfnmap.mremap_shrink ok 4 pfnmap.mremap_shrink # RUN pfnmap.mremap_expand ... # OK pfnmap.mremap_expand ok 5 pfnmap.mremap_expand # RUN pfnmap.fork ... # OK pfnmap.fork ok 6 pfnmap.fork # PASSED: 6 / 6 tests passed. # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0 However, we are able to trigger: [ 27.888251] x86/PAT: pfnmap:1790 freeing invalid memtype [mem 0x00000000-0x00000fff] There are probably more things worth testing in the future, such as MAP_PRIVATE handling. But this set of tests is sufficient to cover most of the things we will rework regarding PAT handling. Link: https://lkml.kernel.org/r/20250509153033.952746-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12mm/selftests: add a test to verify mmap_changing race with -EAGAINPeter Xu
Add an unit test to verify the recent mmap_changing ABI breakage. Note that I used some tricks here and there to make the test simple, e.g. I abused UFFDIO_MOVE on top of shmem with the fact that I know what I want to test will be even earlier than the vma type check. Rich comments were added to explain trivial details. Before that fix, -EAGAIN would have been written to the copy field most of the time but not always; the test should be able to reliably trigger the outlier case. After the fix, it's written always, the test verifies that making sure corresponding field (e.g. copy.copy for UFFDIO_COPY) is updated. [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/20250424215729.194656-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12selftests/mm: use long for dwRegionSizeSiddarth G
Change the type of 'dwRegionSize' in wp_init() and wp_free() from int to long to match callers that pass long or unsigned long long values. wp_addr_range function is left unchanged because it passes 'dwRegionSize' parameter directly to pagemap_ioctl, which expects an int. This patch does not fix any actual known issues. It aligns parameter types with their actual usage and avoids any potential future issues. Link: https://lkml.kernel.org/r/20250427102639.39978-1-siddarthsgml@gmail.com Signed-off-by: Siddarth G <siddarthsgml@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftests: fix some typos in tools/testing/selftestsChelsy Ratnawat
Fix multiple spelling errors: - "rougly" -> "roughly" - "fielesystems" -> "filesystems" - "Can'" -> "Can't" Link: https://lkml.kernel.org/r/20250503211959.507815-1-chelsyratnawat2001@gmail.com Signed-off-by: Chelsy Ratnawat <chelsyratnawat2001@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftests/mm: restore default nr_hugepages value during cleanup in ↵Donet Tom
hugetlb_reparenting_test.sh During cleanup, the value of /proc/sys/vm/nr_hugepages is currently being set to 0. At the end of the test, if all tests pass, the original nr_hugepages value is restored. However, if any test fails, it remains set to 0. With this patch, we ensure that the original nr_hugepages value is restored during cleanup, regardless of whether the test passes or fails. Link: https://lkml.kernel.org/r/20250410100748.2310-1-donettom@linux.ibm.com Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Cc: Li Wang <liwang@redhat.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11tools/testing/selftests: assert that anon merge cases behave as expectedLorenzo Stoakes
Prior to the recently applied commit that permits this merge, mprotect()'ing a faulted VMA, adjacent to an unfaulted VMA, such that the two share characteristics would fail to merge due to what appear to be unintended consequences of commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"). Now we have fixed this bug, assert that we can indeed merge anonymous VMAs this way. Also assert that forked source/target VMAs are equally rejected. Previously, all empty target anon merges with one VMA faulted and the other unfaulted would be rejected incorrectly, now we ensure that unforked merge, but forked do not. Additionally, add the new test file to the MEMORY MAPPING section in MAINTAINERS, as these tests are explicitly memory mapping related. Link: https://lkml.kernel.org/r/2b69330274a3b71721f7042c5eabe91143934415.1744104124.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11tools/testing: add PROCMAP_QUERY helper functions in mm self testsLorenzo Stoakes
The PROCMAP_QUERY ioctl() is very useful - it allows for binary access to /proc/$pid/[s]maps data and thus convenient lookup of data contained there. This patch exposes this for convenient use by mm self tests so the state of VMAs can easily be queried. Link: https://lkml.kernel.org/r/ce83d877093d1fc594762cf4b82f0c27963030ee.1744104124.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftest/mm: make hugetlb_reparenting_test tolerant to async reparentingLi Wang
In cgroup v2, memory and hugetlb usage reparenting is asynchronous. This can cause test flakiness when immediately asserting usage after deleting a child cgroup. To address this, add a helper function `assert_with_retry()` that checks usage values with a timeout-based retry. This improves test stability without relying on fixed sleep delays. Also bump up the tolerance size to 7MB. To avoid False Positives: ... # Assert memory charged correctly for child only use. # actual a = 11 MB # expected a = 0 MB # fail # cleanup # [FAIL] not ok 11 hugetlb_reparenting_test.sh -cgroup-v2 # exit=1 # 0 # SUMMARY: PASS=10 SKIP=0 FAIL=1 Link: https://lkml.kernel.org/r/20250407084201.74492-1-liwang@redhat.com Signed-off-by: Li Wang <liwang@redhat.com> Tested-by: Donet Tom <donettom@linux.ibm.com> Cc: Waiman Long <longman@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftests/mm: add PAGEMAP_SCAN guard region testAndrei Vagin
Add a selftest to verify the PAGEMAP_SCAN ioctl correctly reports guard regions using the newly introduced PAGE_IS_GUARD flag. Link: https://lkml.kernel.org/r/20250324065328.107678-4-avagin@google.com Signed-off-by: Andrei Vagin <avagin@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftests/mm: convert page_size to unsigned longSiddarth G
Cppcheck warning: int result is assigned to long long variable. If the variable is long long to avoid loss of information, then you have loss of information. This patch changes the type of page_size from 'unsigned int' to 'unsigned long' instead of using ULL suffixes. Changing hpage_size to 'unsigned long' was considered, but since gethugepage() expects an int, this change was avoided. Link: https://lkml.kernel.org/r/20250403101345.29226-1-siddarthsgml@gmail.com Signed-off-by: Siddarth G <siddarthsgml@gmail.com> Reported-by: David Binderman <dcb314@hotmail.com> Closes: https://lore.kernel.org/all/AS8PR02MB10217315060BBFDB21F19643E9CA62@AS8PR02MB10217.eurprd02.prod.outlook.com/ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-07selftests/mm: fix a build failure on powerpcNysal Jan K.A.
The compiler is unaware of the size of code generated by the ".rept" assembler directive. This results in the compiler emitting branch instructions where the offset to branch to exceeds the maximum allowed value, resulting in build failures like the following: CC protection_keys /tmp/ccypKWAE.s: Assembler messages: /tmp/ccypKWAE.s:2073: Error: operand out of range (0x0000000000020158 is not between 0xffffffffffff8000 and 0x0000000000007ffc) /tmp/ccypKWAE.s:2509: Error: operand out of range (0x0000000000020130 is not between 0xffffffffffff8000 and 0x0000000000007ffc) Fix the issue by manually adding nop instructions using the preprocessor. Link: https://lkml.kernel.org/r/20250428131937.641989-2-nysal@linux.ibm.com Fixes: 46036188ea1f ("selftests/mm: build with -O2") Reported-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Tested-by: Donet Tom <donettom@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-07selftests/mm: fix build break when compiling pkey_util.cMadhavan Srinivasan
Commit 50910acd6f615 ("selftests/mm: use sys_pkey helpers consistently") added a pkey_util.c to refactor some of the protection_keys functions accessible by other tests. But this broken the build in powerpc in two ways, pkey-powerpc.h: In function `arch_is_powervm': pkey-powerpc.h:73:21: error: storage size of `buf' isn't known 73 | struct stat buf; | ^~~ pkey-powerpc.h:75:14: error: implicit declaration of function `stat'; did you mean `strcat'? [-Wimplicit-function-declaration] 75 | if ((stat("/sys/firmware/devicetree/base/ibm,partition-name", &buf) == 0) && | ^~~~ | strcat Since pkey_util.c includes pkeys-helper.h, which in turn includes pkeys-powerpc.h, stat.h including is missing for "struct stat". This is fixed by adding "sys/stat.h" in pkeys-powerpc.h Secondly, pkey-powerpc.h:55:18: warning: format `%llx' expects argument of type `long long unsigned int', but argument 3 has type `u64' {aka `long unsigned int'} [-Wformat=] 55 | dprintf4("%s() changing %016llx to %016llx\n", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 56 | __func__, __read_pkey_reg(), pkey_reg); | ~~~~~~~~~~~~~~~~~ | | | u64 {aka long unsigned int} pkey-helpers.h:63:32: note: in definition of macro `dprintf_level' 63 | sigsafe_printf(args); \ | ^~~~ These format specifier related warning are removed by adding "__SANE_USERSPACE_TYPES__" to pkeys_utils.c. Link: https://lkml.kernel.org/r/20250428131937.641989-1-nysal@linux.ibm.com Fixes: 50910acd6f61 ("selftests/mm: use sys_pkey helpers consistently") Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-07tools/testing/selftests: fix guard region test tmpfs assumptionLorenzo Stoakes
The current implementation of the guard region tests assume that /tmp is mounted as tmpfs, that is shmem. This isn't always the case, and at least one instance of a spurious test failure has been reported as a result. This assumption is unsafe, rushed and silly - and easily remedied by simply using memfd, so do so. We also have to fixup the readonly_file test to explicitly only be applicable to file-backed cases. Link: https://lkml.kernel.org/r/20250425162436.564002-1-lorenzo.stoakes@oracle.com Fixes: 272f37d3e99a ("tools/selftests: expand all guard region tests to file-backed") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Ryan Roberts <ryan.roberts@arm.com> Closes: https://lore.kernel.org/linux-mm/a2d2766b-0ab4-437b-951a-8595a7506fe9@arm.com/ Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-07selftests/mm: compaction_test: support platform with huge mount of memoryFeng Tang
When running mm selftest to verify mm patches, 'compaction_test' case failed on an x86 server with 1TB memory. And the root cause is that it has too much free memory than what the test supports. The test case tries to allocate 100000 huge pages, which is about 200 GB for that x86 server, and when it succeeds, it expects it's large than 1/3 of 80% of the free memory in system. This logic only works for platform with 750 GB ( 200 / (1/3) / 80% ) or less free memory, and may raise false alarm for others. Fix it by changing the fixed page number to self-adjustable number according to the real number of free memory. Link: https://lkml.kernel.org/r/20250423103645.2758-1-feng.tang@linux.alibaba.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Acked-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@inux.alibaba.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11selftests/mm: fix compiler -Wmaybe-uninitialized warningAnshuman Khandual
Following build warning comes up for cow test as 'transferred' variable has not been initialized. Fix the warning via zero init for the variable. CC cow cow.c: In function `do_test_vmsplice_in_parent': cow.c:365:61: warning: `transferred' may be used uninitialized [-Wmaybe-uninitialized] 365 | cur = read(fds[0], new + total, transferred - total); | ~~~~~~~~~~~~^~~~~~~ cow.c:296:29: note: `transferred' was declared here 296 | ssize_t cur, total, transferred; | ^~~~~~~~~~~ CC compaction_test CC gup_longterm Link: https://lkml.kernel.org/r/20250409095006.1422620-1-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11selftests/mm: generate a temporary mountpoint for cgroup filesystemMark Brown
Currently if the filesystem for the cgroups version it wants to use is not mounted charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh tests will attempt to mount it on the hard coded path /dev/cgroup/memory, deleting that directory when the test finishes. This will fail if there is not a preexisting directory at that path, and since the directory is deleted subsequent runs of the test will fail. Instead of relying on this hard coded directory name use mktemp to generate a temporary directory to use as a mountpoint, fixing both the assumption and the disruption caused by deleting a preexisting directory. This means that if the relevant cgroup filesystem is not already mounted then we rely on having coreutils (which provides mktemp) installed. I suspect that many current users are relying on having things automounted by default, and given that the script relies on bash it's probably not an unreasonable requirement. Link: https://lkml.kernel.org/r/20250404-kselftest-mm-cgroup2-detection-v1-1-3dba6d32ba8c@kernel.org Fixes: 209376ed2a84 ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting") Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Aishwarya TCV <aishwarya.tcv@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01selftest/mm: va_high_addr_switch: add ppc64 support checkLi Wang
Add PPC64 Radix MMU support to the va_high_addr_switch.sh by introducing check_supported_ppc64(). The function verifies: - 5-level paging (PGTABLE_LEVELS >= 5) enable in kernel config - Radix MMU (required for PPC64 5-level translation) - HugePages availability (needed for some tests) If any check fails, the test is skipped (ksft_skip). This ensures compatibility with Power9/Power10 systems running in Radix MMU mode. Avoid failures on 4-level paging system: # mmap(NULL, MAP_HUGETLB): 0xffffffffffffffff - FAILED # mmap(LOW_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED # mmap(HIGH_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED # mmap(HIGH_ADDR, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED # mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED # mmap(-1, MAP_HUGETLB): 0xffffffffffffffff - FAILED # mmap(-1, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED # mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2*HUGETLB_SIZE, MAP_HUGETLB): 0xffffffffffffffff - FAILED # mmap(ADDR_SWITCH_HINT , 2*HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED Link: https://lkml.kernel.org/r/20250327114813.25980-1-liwang@redhat.com Signed-off-by: Li Wang <liwang@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01Merge tag 'mm-stable-2025-03-30-16-52' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. * tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits) mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex() x86/mm: restore early initialization of high_memory for 32-bits mm/vmscan: don't try to reclaim hwpoison folio mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper cgroup: docs: add pswpin and pswpout items in cgroup v2 doc mm: vmscan: split proactive reclaim statistics from direct reclaim statistics selftests/mm: speed up split_huge_page_test selftests/mm: uffd-unit-tests support for hugepages > 2M docs/mm/damon/design: document active DAMOS filter type mm/damon: implement a new DAMOS filter type for active pages fs/dax: don't disassociate zero page entries MM documentation: add "Unaccepted" meminfo entry selftests/mm: add commentary about 9pfs bugs fork: use __vmalloc_node() for stack allocation docs/mm: Physical Memory: Populate the "Zones" section xen: balloon: update the NR_BALLOON_PAGES state hv_balloon: update the NR_BALLOON_PAGES state balloon_compaction: update the NR_BALLOON_PAGES state meminfo: add a per node counter for balloon drivers mm: remove references to folio in __memcg_kmem_uncharge_page() ...
2025-03-25Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Nothing major this time around. Apart from the usual perf/PMU updates, some page table cleanups, the notable features are average CPU frequency based on the AMUv1 counters, CONFIG_HOTPLUG_SMT and MOPS instructions (memcpy/memset) in the uaccess routines. Perf and PMUs: - Support for the 'Rainier' CPU PMU from Arm - Preparatory driver changes and cleanups that pave the way for BRBE support - Support for partial virtualisation of the Apple-M1 PMU - Support for the second event filter in Arm CSPMU designs - Minor fixes and cleanups (CMN and DWC PMUs) - Enable EL2 requirements for FEAT_PMUv3p9 Power, CPU topology: - Support for AMUv1-based average CPU frequency - Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It adds a generic topology_is_primary_thread() function overridden by x86 and powerpc New(ish) features: - MOPS (memcpy/memset) support for the uaccess routines Security/confidential compute: - Fix the DMA address for devices used in Realms with Arm CCA. The CCA architecture uses the address bit to differentiate between shared and private addresses - Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by default Memory management clean-ups: - Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs - Some minor page table accessor clean-ups - PIE/POE (permission indirection/overlay) helpers clean-up Kselftests: - MTE: skip hugetlb tests if MTE is not supported on such mappings and user correct naming for sync/async tag checking modes Miscellaneous: - Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people request) - Sysreg updates for new register fields - CPU type info for some Qualcomm Kryo cores" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits) arm64: mm: Don't use %pK through printk perf/arm_cspmu: Fix missing io.h include arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists arm64: cputype: Add MIDR_CORTEX_A76AE arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list arm64/sysreg: Enforce whole word match for open/close tokens arm64/sysreg: Fix unbalanced closing block arm64: Kconfig: Enable HOTPLUG_SMT arm64: topology: Support SMT control on ACPI based system arch_topology: Support SMT control for OF based system cpu/SMT: Provide a default topology_is_primary_thread() arm64/mm: Define PTDESC_ORDER perf/arm_cspmu: Add PMEVFILT2R support perf/arm_cspmu: Generalise event filtering perf/arm_cspmu: Move register definitons to header arm64/kernel: Always use level 2 or higher for early mappings arm64/mm: Drop PXD_TABLE_BIT arm64/mm: Check pmd_table() in pmd_trans_huge() ...
2025-03-24Merge tag 'vfs-6.15-rc1.pidfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs pidfs updates from Christian Brauner: - Allow retrieving exit information after a process has been reaped through pidfds via the new PIDFD_INTO_EXIT extension for the PIDFD_GET_INFO ioctl. Various tools need access to information about a process/task even after it has already been reaped. Pidfd polling allows waiting on either task exit or for a task to have been reaped. The contract for PIDFD_INFO_EXIT is simply that EPOLLHUP must be observed before exit information can be retrieved, i.e., exit information is only provided once the task has been reaped and then can be retrieved as long as the pidfd is open. - Add PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels allowing userspace to forgo allocating a file descriptor for their own process. This is useful in scenarios where users want to act on their own process through pidfds and is akin to AT_FDCWD. - Improve premature thread-group leader and subthread exec behavior when polling on pidfds: (1) During a multi-threaded exec by a subthread, i.e., non-thread-group leader thread, all other threads in the thread-group including the thread-group leader are killed and the struct pid of the thread-group leader will be taken over by the subthread that called exec. IOW, two tasks change their TIDs. (2) A premature thread-group leader exit means that the thread-group leader exited before all of the other subthreads in the thread-group have exited. Both cases lead to inconsistencies for pidfd polling with PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the current thread-group leader may or may not see an exit notification on the file descriptor depending on when poll is performed. If the poll is performed before the exec of the subthread has concluded an exit notification is generated for the old thread-group leader. If the poll is performed after the exec of the subthread has concluded no exit notification is generated for the old thread-group leader. The correct behavior is to simply not generate an exit notification on the struct pid of a subhthread exec because the struct pid is taken over by the subthread and thus remains alive. But this is difficult to handle because a thread-group may exit premature as mentioned in (2). In that case an exit notification is reliably generated but the subthreads may continue to run for an indeterminate amount of time and thus also may exec at some point. After this pull no exit notifications will be generated for a PIDFD_THREAD pidfd for a thread-group leader until all subthreads have been reaped. If a subthread should exec before no exit notification will be generated until that task exits or it creates subthreads and repeates the cycle. This means an exit notification indicates the ability for the father to reap the child. * tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits) selftests/pidfd: third test for multi-threaded exec polling selftests/pidfd: second test for multi-threaded exec polling selftests/pidfd: first test for multi-threaded exec polling pidfs: improve multi-threaded exec and premature thread-group leader exit polling pidfs: ensure that PIDFS_INFO_EXIT is available selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest selftests/pidfd: add third PIDFD_INFO_EXIT selftest selftests/pidfd: add second PIDFD_INFO_EXIT selftest selftests/pidfd: add first PIDFD_INFO_EXIT selftest selftests/pidfd: expand common pidfd header pidfs/selftests: ensure correct headers for ioctl handling selftests/pidfd: fix header inclusion pidfs: allow to retrieve exit information pidfs: record exit code and cgroupid at exit pidfs: use private inode slab cache pidfs: move setting flags into pidfs_alloc_file() pidfd: rely on automatic cleanup in __pidfd_prepare() ...
2025-03-21selftests/mm: speed up split_huge_page_testRyan Roberts
create_pagecache_thp_and_fd() was previously writing a file sized at twice the PMD size by making a per-byte write syscall. This was quite slow when the PMD size is 4M, but completely intolerable for 32M (PMD size for arm64's 16K page size), and 512M (PMD size for arm64's 64K page size). The byte pattern has a 256 byte period, so let's create a 1K buffer and fill it with exactly 4 periods. Then we can write the buffer as many times as is required to fill the file. This makes things much more tolerable. The test now passes for 16K page size. It still fails for 64K page size because MAX_PAGECACHE_ORDER is too small for 512M folio size (I think). Link: https://lkml.kernel.org/r/20250318174343.243631-3-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Rafael Aquini <raquini@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21selftests/mm: uffd-unit-tests support for hugepages > 2MRyan Roberts
uffd-unit-tests uses a memory area with a fixed 32M size. Then it calculates the number of pages by dividing by page_size, which itself is either the base page size or the PMD huge page size depending on the test config. For the latter, we end up with nr_pages=1 for arm64 16K base pages, and nr_pages=0 for 64K base pages. This doesn't end well. So let's make the 32M size a floor and also ensure that we have at least 2 pages given the PMD size. With this change, the tests pass on arm64 64K base page size configuration. Link: https://lkml.kernel.org/r/20250318174343.243631-2-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Rafael Aquini <raquini@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21selftests/mm: add commentary about 9pfs bugsBrendan Jackman
As discussed here: https://lore.kernel.org/lkml/Z9RRkL1hom48z3Tt@google.com/ This code could benefit from some more commentary. To avoid needing to comment the same thing in multiple places (I guess more of these SKIPs will need to be added over time, for now I am only like 20% of the way through Project Run run_vmtests.sh Successfully), add a dummy "skip tests for this specific reason" function that basically just serves as a hook to hang comments on. Link: https://lkml.kernel.org/r/20250317-9pfs-comments-v1-1-9ac96043e146@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17selftests/mm/cow: fix the incorrect error handlingCyan Yang
Error handling doesn't check the correct return value. This patch will fix it. Link: https://lkml.kernel.org/r/20250312043840.71799-1-cyan.yang@sifive.com Fixes: f4b5fd6946e2 ("selftests/vm: anon_cow: THP tests") Signed-off-by: Cyan Yang <cyan.yang@sifive.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17selftests/mm: add tests for folio_split(), buddy allocator like splitZi Yan
It splits page cache folios to orders from 0 to 8 at different in-folio offset. Link: https://lkml.kernel.org/r/20250307174001.242794-9-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Kairui Song <kasong@tencent.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16tools/selftests: add guard region test for /proc/$pid/pagemapLorenzo Stoakes
Add a test to the guard region self tests to assert that the /proc/$pid/pagemap information now made availabile to the user correctly identifies and reports guard regions. As a part of this change, update vm_util.h to add the new bit (note there is no header file in the kernel where this is exposed, the user is expected to provide their own mask) and utilise the helper functions there for pagemap functionality. [lorenzo.stoakes@oracle.com: fixup define name] Link: https://lkml.kernel.org/r/32e83941-e6f5-42ee-9292-a44c16463cf1@lucifer.local Link: https://lkml.kernel.org/r/164feb0a43ae72650e6b20c3910213f469566311.1740139449.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm/mlock: print error on failureBrendan Jackman
It's not really possible to start diagnosing this without knowing the actual error. Also update the mlock2 helper to behave like libc would by setting errno and returning -1. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-12-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: skip mlock tests if nobody user can't read itBrendan Jackman
If running from a directory that can't be read by unprivileged users, executing on-fault-test via the nobody user will fail. The kselftest build does give the file the correct permissions, but after being installed it might be in a directory without global execute permissions. Since the script can't safely fix that, just skip if it happens. Note that the stderr of the `ls` command is unfiltered meaning the user sees a "permission denied" error that can help inform them why the test was skipped. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-11-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: ensure uffd-wp-mremap gets pages of each sizeBrendan Jackman
This test allocates a page of every available size and doesn't have any SKIP logic if the allocation fails. So, ensure it's available and skip the test if we can't do so. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-10-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: drop unnecessary sudo usageBrendan Jackman
This script must be run as root anyway (see all the writing to privileged files in /proc etc). Remove the unnecessary use of sudo to avoid breaking on single-user systems that don't have sudo. This also avoids confusing readers. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-9-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: skip gup_longterm tests on weird filesystemsBrendan Jackman
Some filesystems don't support ftruncate()ing unlinked files. They return ENOENT. In that case, skip the test. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-8-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: skip map_populate on weird filesystemsBrendan Jackman
It seems that 9pfs does not allow truncating unlinked files, Mark Brown has noted that NFS may also behave this way. It doesn't seem quite right to call this a "bug" but it's probably a special enough case that it makes sense for the test to just SKIP if it happens. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-7-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: don't fail uffd-stress if too many CPUsBrendan Jackman
This calculation divides a fixed parameter by an environment-dependent parameter i.e. the number of CPUs. The simple way to avoid machine-specific failures here is to just put a cap on the max value of the latter. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-6-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Suggested-by: Mateusz Guzik <mjguzik@gmail.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: print some details when uffd-stress gets bad paramsBrendan Jackman
So this can be debugged more easily. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-5-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm/uffd: rename nr_cpus -> nr_parallelBrendan Jackman
A later commit will bound this variable so it no longer necessarily matches the number of CPUs. Rename it appropriately. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-4-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: skip uffd-wp-mremap if userfaultfd not availableBrendan Jackman
It's obvious that this should fail in that case, but still, save the reader the effort of figuring out that they've run into this by just SKIPping Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-3-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: skip uffd-stress if userfaultfd not availableBrendan Jackman
It's pretty obvious that the test wouldn't work if you don't have the feature enabled. But, it's still useful to SKIP instead of failing so the reader can immediately tell that this is the reason why. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-2-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: report errno when things fail in gup_longtermBrendan Jackman
Patch series "selftests/mm: Some cleanups from trying to run them", v4. I never had much luck running mm selftests so I spent a few hours digging into why. Looks like most of the reason is missing SKIP checks, so this series is just adding a bunch of those that I found. I did not do anything like all of them, just the ones I spotted in gup_longterm, gup_test, mmap, userfaultfd and memfd_secret. It's a bit unfortunate to have to skip those tests when ftruncate() fails, but I don't have time to dig deep enough into it to actually make them pass. I have observed the issue on 9pfs and heard rumours that NFS has a similar problem. I'm now able to run these test groups successfully: - mmap - gup_test - compaction - migration - page_frag - userfaultfd - mlock I've never gone past "Waiting for hugetlb memory to get depleted", in the hugetlb tests. I don't know if they are stuck or if they would eventually work if I was patient enough (testing on a 1G machine). I have not investigated further. I had some issues with mlock tests failing due to -ENOSRCH from mlock2(), I can no longer reproduce that though, things work OK now. Of the remaining tests there may be others that work fine, but there's no convenient way to survey the whole output of run_vmtests.sh so I'm just going test by test here. In my spare moments I am slowly chipping away at a setup to run these tests continuously in a reasonably hermetic QEMU environment via virtme-ng: https://github.com/bjackman/linux/blob/5fad4b9c592290f38e0f8bc73c9abb9c99d8787c/README.md Hopefully that will eventually offer a way to provide a "canned" environment where the tests are known to work, which can be fairly easily reproduced by any developer. This patch (of 12): Just reporting failure doesn't tell you what went wrong. This can fail in different ways so report errno to help the reader get started debugging. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-0-dec210a658f5@google.com Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-1-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: fix spellingUjwal Kundur
Fix misspelling flagged by codespell. Link: https://lkml.kernel.org/r/20250215081803.1793-1-ujwal.kundur@gmail.com Signed-off-by: Ujwal Kundur <ujwal.kundur@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16tools/selftests: add file/shmem-backed mapping guard region testsLorenzo Stoakes
Extend the guard region self tests to explicitly assert that guard regions work correctly for functionality specific to file-backed and shmem mappings. In addition to testing all of the existing guard region functionality that is currently tested against anonymous mappings against file-backed and shmem mappings (except those which are exclusive to anonymous mapping), we now also: * Test that MADV_SEQUENTIAL does not cause unexpected readahead behaviour. * Test that MAP_PRIVATE behaves as expected with guard regions installed in both a shared and private mapping of an fd. * Test that a read-only file can correctly establish guard regions. * Test a probable fault-around case does not interfere with guard regions (or vice-versa). * Test that truncation does not eliminate guard regions. * Test that hole punching functions as expected in the presence of guard regions. * Test that a read-only mapping of a memfd write sealed mapping can have guard regions established within it and function correctly without violation of the seal. * Test that guard regions installed into a mapping of the anonymous zero page function correctly. Link: https://lkml.kernel.org/r/90c16bec5fcaafcd1700dfa3e9988c3e1aa9ac1d.1739469950.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16tools/selftests: expand all guard region tests to file-backedLorenzo Stoakes
Extend the guard region tests to allow for test fixture variants for anon, shmem, and local file files. This allows us to assert that each of the expected behaviours of anonymous memory also applies correctly to file-backed (both shmem and an a file created locally in the current working directory) and thus asserts the same correctness guarantees as all the remaining tests do. The fixture teardown is now performed in the parent process rather than child forked ones, meaning cleanup is always performed, including unlinking any generated temporary files. Additionally the variant fixture data type now contains an enum value indicating the type of backing store and the mmap() invocation is abstracted to allow for the mapping of whichever backing store the variant is testing. We adjust tests as necessary to account for the fact they may now reference files rather than anonymous memory. Link: https://lkml.kernel.org/r/ab42228d2bd9b8aa18e9faebcd5c88732a7e5820.1739469950.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: rename guard-pages to guard-regionsLorenzo Stoakes
The feature formerly referred to as guard pages is more correctly referred to as 'guard regions', as in fact no pages are ever allocated in the process of installing the regions. To avoid confusion, rename the tests accordingly. [lorenzo.stoakes@oracle.com: fix guard regions invocation] Link: https://lkml.kernel.org/r/13426c71-d069-4407-9340-b227ff8b8736@lucifer.local Link: https://lkml.kernel.org/r/1c3cd04a3f69b5756b94bda701ac88325a9be18b.1739469950.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: allow tests to run with no huge pages supportMark Brown
Currently the mm selftests refuse to run if huge pages are not available in the current system but this is an optional feature and not all the tests actually require them. Change the test during startup to be non-fatal and skip or omit tests which actually rely on having huge pages, allowing the other tests to be run. The gup_test does support using madvise() to configure huge pages but it ignores the error code so we just let it run. Link: https://lkml.kernel.org/r/20250212-kselftest-mm-no-hugepages-v1-2-44702f538522@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nico Pache <npache@redhat.com> Cc: Mariano Pache <npache@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests: mm: fix typoEric Salem
Fix misspelling. Link: https://lkml.kernel.org/r/77e0e915-36c3-4c95-84b8-0b73aaa17951@gmail.com Signed-off-by: Eric Salem <ericsalem@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: fix thuge-gen test name uniquenessMark Brown
The thuge-gen test_mmap() and test_shmget() tests are repeatedly run for a variety of sizes but always report the result of their test with the same name, meaning that automated sysetms running the tests are unable to distinguish between the various tests. Add the supplied sizes to the logged test names to distinguish between runs. My test automation was getting pretty confused about what was going on - the test names are a pretty important external interface. Link: https://lkml.kernel.org/r/20250204-kselftest-mm-fix-dups-v1-1-6afe417ef4bb@kernel.org Fixes: b38bd9b2c448 ("selftests/mm: thuge-gen: conform to TAP format output") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: test splitting file-backed THP to any lower orderZi Yan
Now split_huge_page*() supports shmem THP split to any lower order. Test it. The test now reads file content out after split to check if the split corrupts the file data. Link: https://lkml.kernel.org/r/20250122161928.1240637-3-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: make file-backed THP split work by writing PMD size dataZi Yan
Commit acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs") changes huge=always to allocate THP/mTHP based on write size and split_huge_page_test does not write PMD size data, so file-back THP is not created during the test. Fix it by writing PMD size data. Link: https://lkml.kernel.org/r/20250122161928.1240637-1-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculationRafael Aquini
We noticed that uffd-stress test was always failing to run when invoked for the hugetlb profiles on x86_64 systems with a processor count of 64 or bigger: ... # ------------------------------------ # running ./uffd-stress hugetlb 128 32 # ------------------------------------ # ERROR: invalid MiB (errno=9, @uffd-stress.c:459) ... # [FAIL] not ok 3 uffd-stress hugetlb 128 32 # exit=1 ... The problem boils down to how run_vmtests.sh (mis)calculates the size of the region it feeds to uffd-stress. The latter expects to see an amount of MiB while the former is just giving out the number of free hugepages halved down. This measurement discrepancy ends up violating uffd-stress' assertion on number of hugetlb pages allocated per CPU, causing it to bail out with the error above. This commit fixes that issue by adjusting run_vmtests.sh's half_ufd_size_MB calculation so it properly renders the region size in MiB, as expected, while maintaining all of its original constraints in place. Link: https://lkml.kernel.org/r/20250218192251.53243-1-aquini@redhat.com Fixes: 2e47a445d7b3 ("selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation") Signed-off-by: Rafael Aquini <raquini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>