Age | Commit message (Collapse) | Author |
|
Add verbose mode to the /proc/pid/maps tearing tests to print debugging
information. VERBOSE environment variable is used to enable it.
Usage example: VERBOSE=1 ./proc-maps-race
Link: https://lkml.kernel.org/r/20250719182854.3166724-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Test that /proc/pid/maps does not report unexpected holes in the address
space when we concurrently remap a part of a vma into the middle of
another vma. This remapping results in the destination vma being split
into three parts and the part in the middle being patched back from, all
done concurrently from under the reader. We should always see either
original vma or the split one with no holes.
Link: https://lkml.kernel.org/r/20250719182854.3166724-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Test that /proc/pid/maps does not report unexpected holes in the address
space when a vma at the edge of the page is being concurrently remapped.
This remapping results in the vma shrinking and expanding from under the
reader. We should always see either shrunk or expanded (original) version
of the vma.
Link: https://lkml.kernel.org/r/20250719182854.3166724-3-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "use per-vma locks for /proc/pid/maps reads", v8.
Reading /proc/pid/maps requires read-locking mmap_lock which prevents any
other task from concurrently modifying the address space. This guarantees
coherent reporting of virtual address ranges, however it can block
important updates from happening. Oftentimes /proc/pid/maps readers are
low priority monitoring tasks and them blocking high priority tasks
results in priority inversion.
Locking the entire address space is required to present fully coherent
picture of the address space, however even current implementation does not
strictly guarantee that by outputting vmas in page-size chunks and
dropping mmap_lock in between each chunk. Address space modifications are
possible while mmap_lock is dropped and userspace reading the content is
expected to deal with possible concurrent address space modifications.
Considering these relaxed rules, holding mmap_lock is not strictly needed
as long as we can guarantee that a concurrently modified vma is reported
either in its original form or after it was modified.
This patchset switches from holding mmap_lock while reading /proc/pid/maps
to taking per-vma locks as we walk the vma tree. This reduces the
contention with tasks modifying the address space because they would have
to contend for the same vma as opposed to the entire address space.
Previous version of this patchset [1] tried to perform /proc/pid/maps
reading under RCU, however its implementation is quite complex and the
results are worse than the new version because it still relied on
mmap_lock speculation which retries if any part of the address space gets
modified. New implementaion is both simpler and results in less
contention. Note that similar approach would not work for /proc/pid/smaps
reading as it also walks the page table and that's not RCU-safe.
Paul McKenney's designed a test [2] to measure mmap/munmap latencies while
concurrently reading /proc/pid/maps. The test has a pair of processes
scanning /proc/PID/maps, and another process unmapping and remapping 4K
pages from a 128MB range of anonymous memory. At the end of each 10
second run, the latency of each mmap() or munmap() operation is measured,
and for each run the maximum and mean latency is printed. The map/unmap
process is started first, its PID is passed to the scanners, and then the
map/unmap process waits until both scanners are running before starting
its timed test. The scanners keep scanning until the specified
/proc/PID/maps file disappears.
The latest results from Paul:
Stock mm-unstable, all of the runs had maximum latencies in excess of 0.5
milliseconds, and with 80% of the runs' latencies exceeding a full
millisecond, and ranging up beyond 4 full milliseconds. In contrast, 99%
of the runs with this patch series applied had maximum latencies of less
than 0.5 milliseconds, with the single outlier at only 0.608 milliseconds.
From a median-performance (as opposed to maximum-latency) viewpoint, this
patch series also looks good, with stock mm weighing in at 11 microseconds
and patch series at 6 microseconds, better than a 2x improvement.
Before the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
0.011 0.008 0.521
0.011 0.008 0.552
0.011 0.008 0.590
0.011 0.008 0.660
...
0.011 0.015 2.987
0.011 0.015 3.038
0.011 0.016 3.431
0.011 0.016 4.707
After the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
0.006 0.005 0.026
0.006 0.005 0.029
0.006 0.005 0.034
0.006 0.005 0.035
...
0.006 0.006 0.421
0.006 0.006 0.423
0.006 0.006 0.439
0.006 0.006 0.608
The patchset also adds a number of tests to check for /proc/pid/maps data
coherency. They are designed to detect any unexpected data tearing while
performing some common address space modifications (vma split, resize and
remap). Even before these changes, reading /proc/pid/maps might have
inconsistent data because the file is read page-by-page with mmap_lock
being dropped between the pages. An example of user-visible inconsistency
can be that the same vma is printed twice: once before it was modified and
then after the modifications. For example if vma was extended, it might
be found and reported twice. What is not expected is to see a gap where
there should have been a vma both before and after modification. This
patchset increases the chances of such tearing, therefore it's even more
important now to test for unexpected inconsistencies.
In [3] Lorenzo identified the following possible vma merging/splitting
scenarios:
Merges with changes to existing vmas:
1 Merge both - mapping a vma over another one and between two vmas which
can be merged after this replacement;
2. Merge left full - mapping a vma at the end of an existing one and
completely over its right neighbor;
3. Merge left partial - mapping a vma at the end of an existing one and
partially over its right neighbor;
4. Merge right full - mapping a vma before the start of an existing one
and completely over its left neighbor;
5. Merge right partial - mapping a vma before the start of an existing one
and partially over its left neighbor;
Merges without changes to existing vmas:
6. Merge both - mapping a vma into a gap between two vmas which can be
merged after the insertion;
7. Merge left - mapping a vma at the end of an existing one;
8. Merge right - mapping a vma before the start end of an existing one;
Splits
9. Split with new vma at the lower address;
10. Split with new vma at the higher address;
If such merges or splits happen concurrently with the /proc/maps reading
we might report a vma twice, once before the modification and once after
it is modified:
Case 1 might report overwritten and previous vma along with the final
merged vma;
Case 2 might report previous and the final merged vma;
Case 3 might cause us to retry once we detect the temporary gap caused by
shrinking of the right neighbor;
Case 4 might report overritten and the final merged vma;
Case 5 might cause us to retry once we detect the temporary gap caused by
shrinking of the left neighbor;
Case 6 might report previous vma and the gap along with the final marged
vma;
Case 7 might report previous and the final merged vma;
Case 8 might report the original gap and the final merged vma covering the
gap;
Case 9 might cause us to retry once we detect the temporary gap caused by
shrinking of the original vma at the vma start;
Case 10 might cause us to retry once we detect the temporary gap caused by
shrinking of the original vma at the vma end;
In all these cases the retry mechanism prevents us from reporting possible
temporary gaps.
[1] https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/
[2] https://github.com/paulmckrcu/proc-mmap_sem-test
[3] https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.local/
The /proc/pid/maps file is generated page by page, with the mmap_lock
released between pages. This can lead to inconsistent reads if the
underlying vmas are concurrently modified. For instance, if a vma split
or merge occurs at a page boundary while /proc/pid/maps is being read, the
same vma might be seen twice: once before and once after the change. This
duplication is considered acceptable for userspace handling. However,
observing a "hole" where a vma should be (e.g., due to a vma being
replaced and the space temporarily being empty) is unacceptable.
Implement a test that:
1. Forks a child process which continuously modifies its address
space, specifically targeting a vma at the boundary between two pages.
2. The parent process repeatedly reads the child's /proc/pid/maps.
3. The parent process checks the last vma of the first page and the
first vma of the second page for consistency, looking for the effects
of vma splits or merges.
The test duration is configurable via DURATION environment variable
expressed in seconds. The default test duration is 5 seconds.
Example Command: DURATION=10 ./proc-maps-race
Link: https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/ [1]
Link: https://github.com/paulmckrcu/proc-mmap_sem-test [2]
Link: https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.local/ [3]
Link: https://lkml.kernel.org/r/20250719182854.3166724-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250719182854.3166724-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now that we have added the ability to move multiple VMAs at once, assert
that this functions correctly, both overwriting VMAs and moving backwards
and forwards with merge and VMA invalidation.
Additionally assert that page tables are correctly propagated by setting
random data and reading it back.
Link: https://lkml.kernel.org/r/139074a24a011ca4ed52498a7fa2080024b43917.1752770784.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add test cases to test the correctness of PFN ZERO flag of pagemap_scan
ioctl. Test with normal pages backed memory and huge pages backed memory.
Link: https://lkml.kernel.org/r/20250707073321.106431-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Assert that mremap() behaviour is as expected when moving around unfaulted
VMAs immediately adjacent to faulted ones, as well as moving around
faulted VMAs and placing them back immediately adjacent to the VMA from
which they were moved.
This also introduces a shared helper for the syscall version of mremap()
so we don't encounter any issues with libc filtering parameters.
Link: https://lkml.kernel.org/r/20250702084717.21360-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add DAMON sysfs interface functionality tests for basic DAMOS schemes
parameters setup.
Link: https://lkml.kernel.org/r/20250628160428.53115-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add DAMON sysfs interface functionality tests for setup of basic adaptive
targets parameters.
Link: https://lkml.kernel.org/r/20250628160428.53115-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add DAMON sysfs interface functionality tests for DAMON monitoring
attribute parameters, including intervals, intervals tuning goals, and
min/max number of regions.
Link: https://lkml.kernel.org/r/20250628160428.53115-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a python-written DAMON sysfs functionality selftest. It sets DAMON
parameters using Python module _damon_sysfs, reads updated kernel internal
DAMON status and parameters using a 'drgn' script, namely
drgn_dump_damon_status.py, and compare if the resulted DAMON internal
status is as expected. The test is very minimum at the moment.
Link: https://lkml.kernel.org/r/20250628160428.53115-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
_damon_sysfs.py is a Python module for reading and writing DAMON sysfs for
testing. It is not reading resulting kdamond pids. Read and update those
when starting kdamonds.
Link: https://lkml.kernel.org/r/20250628160428.53115-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "selftests/damon: add python and drgn based DAMON sysfs
functionality tests".
DAMON sysfs interface is the bridge between the user space and the kernel
space for DAMON parameters. There is no good and simple test to see if
the parameters are set as expected. Existing DAMON selftests therefore
test end-to-end features. For example, damos_quota_goal.py runs a DAMOS
scheme with quota goal set against a test program running an artificial
access pattern, and see if the result is as expected. Such tests cover
only a few part of DAMON. Adding more tests is also complicated.
Finally, the reliability of the test itself on different systems is bad.
'drgn' is a tool that can extract kernel internal data structures like
DAMON parameters. Add a test that passes specific DAMON parameters via
DAMON sysfs reusing _damon_sysfs.py, extract resulting DAMON parameters
via 'drgn', and compare those. Note that this test is not adding
exhaustive tests of all DAMON parameters and input combinations but very
basic things. Advancing the test infrastructure and adding more tests are
future works.
This patch (of 6):
'drgn' is a useful tool for extracting kernel internal data structures
such as DAMON's parameter and running status. Add a 'drgn' script that
extracts such DAMON internal data at runtime, for using it as a tool for
seeing if a test input has made expected results in the kernel.
The script saves or prints out the DAMON internal data as a json file or
string. This is for making use of it not very depends on 'drgn'. If
'drgn' is not available on a test setup and we find alternative tools for
doing that, the json-based tests can be updated to use an alternative tool
in future.
Note that the script is tested with 'drgn v0.0.22'.
Link: https://lkml.kernel.org/r/20250628160428.53115-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250628160428.53115-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
are required for a merge of the series "mm: folio_pte_batch()
improvements".
|
|
Some systems (e.g. minimal or real-time kernels) may not enable
Transparent Hugepages (THP), causing MADV_HUGEPAGE to return EINVAL. This
patch introduces a runtime check using the existing THP sysfs interface
and skips the hugepage merging test (`-H`) when THP is not available.
To avoid those failures:
# -----------------------------
# running ./ksm_tests -H -s 100
# -----------------------------
# ksm_tests: MADV_HUGEPAGE: Invalid argument
# [FAIL]
not ok 1 ksm_tests -H -s 100 # exit=2
# --------------------
# running ./khugepaged
# --------------------
# Reading PMD pagesize failed# [FAIL]
not ok 1 khugepaged # exit=1
# --------------------
# running ./soft-dirty
# --------------------
# TAP version 13
# 1..15
# ok 1 Test test_simple
# ok 2 Test test_vma_reuse dirty bit of allocated page
# ok 3 Test test_vma_reuse dirty bit of reused address page
# Bail out! Reading PMD pagesize failed# Planned tests != run tests (15 != 3)
# # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
# [FAIL]
not ok 1 soft-dirty # exit=1
# SUMMARY: PASS=0 SKIP=0 FAIL=1
# -------------------
# running ./migration
# -------------------
# TAP version 13
# 1..3
# # Starting 3 tests from 1 test cases.
# # RUN migration.private_anon ...
# # OK migration.private_anon
# ok 1 migration.private_anon
# # RUN migration.shared_anon ...
# # OK migration.shared_anon
# ok 2 migration.shared_anon
# # RUN migration.private_anon_thp ...
# # migration.c:196:private_anon_thp:Expected madvise(ptr, TWOMEG, MADV_HUGEPAGE) (-1) == 0 (0)
# # private_anon_thp: Test terminated by assertion
# # FAIL migration.private_anon_thp
# not ok 3 migration.private_anon_thp
# # FAILED: 2 / 3 tests passed.
# # Totals: pass:2 fail:1 xfail:0 xpass:0 skip:0 error:0
# [FAIL]
not ok 1 migration # exit=1
It's true that CONFIG_TRANSPARENT_HUGEPAGE=y is explicitly enabled in
tools/testing/selftests/mm/config, so ideally the runtime environment
should also support THP.
However, in practice, we've found that on some systems:
- THP is disabled at boot time (transparent_hugepage=never)
- Or manually disabled via sysfs
- Or unavailable in RT kernels, containers, or minimal CI environments
In these cases, the test will fail with EINVAL on madvise(MADV_HUGEPAGE),
even though the kernel config is correct.
To make the test suite more robust and avoid false negatives, this patch
adds a runtime check for /sys/kernel/mm/transparent_hugepage/enabled.
If THP is not available, the hugepage test (-H) is skipped with a clear
message.
Link: https://lkml.kernel.org/r/20250624032748.393836-1-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The current implementation of test_unmerge_uffd_wp() explicitly sets
`uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP` before calling
UFFDIO_API. This can cause the ioctl() call to fail with EINVAL on
kernels that do not support UFFD-WP, leading the test to fail
unnecessarily:
# ------------------------------
# running ./ksm_functional_tests
# ------------------------------
# TAP version 13
# 1..9
# # [RUN] test_unmerge
# ok 1 Pages were unmerged
# # [RUN] test_unmerge_zero_pages
# ok 2 KSM zero pages were unmerged
# # [RUN] test_unmerge_discarded
# ok 3 Pages were unmerged
# # [RUN] test_unmerge_uffd_wp
# not ok 4 UFFDIO_API failed <-----
# # [RUN] test_prot_none
# ok 5 Pages were unmerged
# # [RUN] test_prctl
# ok 6 Setting/clearing PR_SET_MEMORY_MERGE works
# # [RUN] test_prctl_fork
# # No pages got merged
# # [RUN] test_prctl_fork_exec
# ok 7 PR_SET_MEMORY_MERGE value is inherited
# # [RUN] test_prctl_unmerge
# ok 8 Pages were unmerged
# Bail out! 1 out of 8 tests failed
# # Planned tests != run tests (9 != 8)
# # Totals: pass:7 fail:1 xfail:0 xpass:0 skip:0 error:0
# [FAIL]
This patch improves compatibility and robustness of the UFFD-WP test
(test_unmerge_uffd_wp) by correctly implementing the UFFDIO_API two-step
handshake as recommended by the userfaultfd(2) man page.
Key changes:
1. Use features=0 in the initial UFFDIO_API call to query supported
feature bits, rather than immediately requesting WP support.
2. Skip the test gracefully if:
- UFFDIO_API fails with EINVAL (e.g. unsupported API version), or
- UFFD_FEATURE_PAGEFAULT_FLAG_WP is not advertised by the kernel.
3. Close the initial userfaultfd and create a new one before enabling
the required feature, since UFFDIO_API can only be called once per fd.
4. Improve diagnostics by distinguishing between expected and unexpected
failures, using strerror() to report errors.
This ensures the test behaves correctly across a wider range of kernel
versions and configurations, while preserving the intended behavior on
kernels that support UFFD-WP.
[liwang@redhat.com: fail the test if sys_userfaultfd() fails, per David]
Link: https://lkml.kernel.org/r/20250625004645.400520-1-liwang@redhat.com
Link: https://lkml.kernel.org/r/20250624042411.395285-1-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Li Wang <liwang@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove redundant entries in .gitignore confirmed by:
$ sort tools/testing/selftests/mm/.gitignore | uniq -d
hugetlb_dio
pkey_sighandler_tests_32
pkey_sighandler_tests_64
These entries were originally added by [1], and later duplicated by [2].
[1] https://lore.kernel.org/all/20240924185911.117937-1-lorenzo.stoakes@oracle.com/
[2] https://lore.kernel.org/all/20241125064036.413536-1-lizhijian@fujitsu.com/
Link: https://lkml.kernel.org/r/20250626020758.163243-1-moonhee.lee.ca@gmail.com
Signed-off-by: Moon Hee Lee <moonhee.lee.ca@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The test will still generate quite some unwanted MCE error messages to
syslog. There was old proposal ratelimiting the MCE messages from kernel,
but that has risk of hiding real useful information on production systems.
We can at least reduce the test to minimum to not over-pollute dmesg,
however trying to not lose its coverage too much.
[peterx@redhat.com: reduce uffd-unit-test poison test to minimum]
Link: https://lkml.kernel.org/r/aF2RSsjuEOtzXcUa@x1.local
Link: https://lkml.kernel.org/r/20250620150058.1729489-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There was a memory leak bug in DAMOS sysfs memcg_path file. Add a
selftest to ensure the bug never comes back.
Link: https://lkml.kernel.org/r/20250619183608.6647-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
All PFN_* pfn_t flags have been removed. Therefore there is no longer a
need for the pfn_t type and all uses can be replaced with normal pfns.
Link: https://lkml.kernel.org/r/bbedfa576c9822f8032494efbe43544628698b1f.1750323463.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Björn Töpel <bjorn@rivosinc.com>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Inki Dae <m.szyprowski@samsung.com>
Cc: John Groves <john@groves.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The PFN_MAP flag is no longer used for anything, so remove it. The
PFN_SG_CHAIN and PFN_SG_LAST flags never appear to have been used so also
remove them. The last user of PFN_SPECIAL was removed by 653d7825c149
("dcssblk: mark DAX broken, remove FS_DAX_LIMITED support").
Users of PFN_DEV were removed earlier in this series by "mm: Remove
remaining uses of PFN_DEV".
Link: https://lkml.kernel.org/r/670b3950d70b4d97b905bb597dadfd3633de4314.1750323463.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Björn Töpel <bjorn@rivosinc.com>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Inki Dae <m.szyprowski@samsung.com>
Cc: John Groves <john@groves.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Unlike the existing tests, this new test will create a memfd (backed by
hugetlb) and pin the folios in it (a small subset) before writing/
populating it with data. This is a valid use-case that invokes the
memfd_alloc_folio() kernel API and is expected to work unless there aren't
enough hugetlb folios to satisfy the allocation needs.
Link: https://lkml.kernel.org/r/20250618053415.1036185-4-vivek.kasireddy@intel.com
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Steve Sistare <steven.sistare@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In future we intend to change the vm_flags_t type, so it isn't correct for
architecture and driver code to assume it is unsigned long. Correct this
assumption across the board.
Overall, this patch does not introduce any functional change.
Link: https://lkml.kernel.org/r/b6eb1894abc5555ece80bb08af5c022ef780c8bc.1750274467.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The core kernel code is currently very inconsistent in its use of
vm_flags_t vs. unsigned long. This prevents us from changing the type of
vm_flags_t in the future and is simply not correct, so correct this.
While this results in rather a lot of churn, it is a critical
pre-requisite for a future planned change to VMA flag type.
Additionally, update VMA userland tests to account for the changes.
To make review easier and to break things into smaller parts, driver and
architecture-specific changes is left for a subsequent commit.
The code has been adjusted to cascade the changes across all calling code
as far as is needed.
We will adjust architecture-specific and driver code in a subsequent patch.
Overall, this patch does not introduce any functional change.
Link: https://lkml.kernel.org/r/d1588e7bb96d1ea3fe7b9df2c699d5b4592d901d.1750274467.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Kees Cook <kees@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "use vm_flags_t consistently".
The VMA flags field vma->vm_flags is of type vm_flags_t. Right now this
is exactly equivalent to unsigned long, but it should not be assumed to
be.
Much code that references vma->vm_flags already correctly uses vm_flags_t,
but a fairly large chunk of code simply uses unsigned long and assumes
that the two are equivalent.
This series corrects that and has us use vm_flags_t consistently.
This series is motivated by the desire to, in a future series, adjust
vm_flags_t to be a u64 regardless of whether the kernel is 32-bit or
64-bit in order to deal with the VMA flag exhaustion issue and avoid all
the various problems that arise from it (being unable to use certain
features in 32-bit, being unable to add new flags except for 64-bit, etc.)
This is therefore a critical first step towards that goal. At any rate,
using the correct type is of value regardless.
We additionally take the opportunity to refer to VMA flags as vm_flags
where possible to make clear what we're referring to.
Overall, this series does not introduce any functional change.
This patch (of 3):
We abstract the type of the VMA flags to vm_flags_t, however in may places
it is simply assumed this is unsigned long, which is simply incorrect.
At the moment this is simply an incongruity, however in future we plan to
change this type and therefore this change is a critical requirement for
doing so.
Overall, this patch does not introduce any functional change.
[lorenzo.stoakes@oracle.com: add missing vm_get_page_prot() instance, remove include]
Link: https://lkml.kernel.org/r/552f88e1-2df8-4e95-92b8-812f7c8db829@lucifer.local
Link: https://lkml.kernel.org/r/cover.1750274467.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/a12769720a2743f235643b158c4f4f0a9911daf0.1750274467.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Testing calling multiple mas_preallocate() calls in a row after adjusting
the maple state. Ensures new calls to mas_preallocate() will change the
number of allocated nodes.
Link: https://lkml.kernel.org/r/20250616184521.3382795-4-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Hailong Liu <hailong.liu@oppo.com>
Cc: "Liam R. Howlett" <howlett@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Faster machines may not see the initial or updated value in the race
condition. Reduce the delay so that faster machines are less likely to
fail testing of the race conditions.
Link: https://lkml.kernel.org/r/20250616184521.3382795-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <howlett@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Hailong Liu <hailong.liu@oppo.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently gup_longterm assumes that filesystems support fallocate() and
uses that to allocate space in files, however this is an optional feature
and is in particular not implemented by NFSv3 which is commonly used in CI
systems leading to spurious failures. Check for lack of support and
report a skip instead for that case.
Link: https://lkml.kernel.org/r/20250613-selftest-mm-gup-longterm-fallocate-nfs-v1-1-758a104c175f@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In commit 3a75ccba047b ("mm: simplify vma merge structure and expand
comments") we introduced the vmg->target field to make the merging of
existing VMAs simpler - clarifying precisely which VMA would eventually
become the merged VMA once the merge operation was complete.
New VMA merging did not get quite the same treatment, retaining the rather
confusing convention of storing the target VMA in vmg->middle.
This patch corrects this state of affairs, utilising vmg->target for this
purpose for both vma_merge_new_range() and also for vma_expand().
We retain the WARN_ON for vmg->middle being specified in
vma_merge_new_range() as doing so would make no sense, but add an
additional debug assert for setting vmg->target.
This patch additionally updates VMA userland testing to account for this
change.
[lorenzo.stoakes@oracle.com: make comment consistent in vma_expand()]
Link: https://lkml.kernel.org/r/c54f45e3-a6ac-4749-93c0-cc9e3080ee37@lucifer.local
Link: https://lkml.kernel.org/r/20250613184807.108089-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, we only test anonymous memory collapse by default. We should
also add shmem collapse as a default test item to catch issues that could
break the test cases.
Link: https://lkml.kernel.org/r/a30b1529b399f2e649b5a05c3d352f41a68faeae.1749779183.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Tested-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When running the khugepaged selftest for shmem (./khugepaged all:shmem), I
encountered the following test failures:
: Run test: collapse_full (khugepaged:shmem)
: Collapse multiple fully populated PTE table.... Fail
: ...
: Run test: collapse_single_pte_entry (khugepaged:shmem)
: Collapse PTE table with single PTE entry present.... Fail
: ...
: Run test: collapse_full_of_compound (khugepaged:shmem)
: Allocate huge page... OK
: Split huge page leaving single PTE page table full of compound pages... OK
: Collapse PTE table full of compound pages.... Fail
The reason for the failure is that it will set MADV_NOHUGEPAGE to prevent
khugepaged from continuing to scan shmem VMA after khugepaged finishes
scanning in the wait_for_scan() function. Moreover, shmem requires a
refault to establish PMD mappings.
However, after commit 2b0f922323cc ("mm: don't install PMD mappings when
THPs are disabled by the hw/process/vma"), PMD mappings are prevented if
the VMA is set with MADV_NOHUGEPAGE flag, so shmem cannot establish PMD
mappings during refault.
One way to fix this issue is to move the MADV_NOHUGEPAGE setting after the
shmem refault. After shmem refault and check huge, the test case will
unmap the shmem immediately. So it seems unnecessary to set the
MADV_NOHUGEPAGE.
Then we can simply drop the MADV_NOHUGEPAGE setting, and all khugepaged
test cases passed.
Link: https://lkml.kernel.org/r/d8502fc50d0304c2afd27ced062b1d636b7a872e.1749779183.git.baolin.wang@linux.alibaba.com
Fixes: 2b0f922323cc ("mm: don't install PMD mappings when THPs are disabled by the hw/process/vma")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Tested-by: Dev Jain <dev.jain@arm.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As generic read_sysfs is available in vm_utils, let's use is in thuge-gen
test.
Link: https://lkml.kernel.org/r/20250611100106.1331197-1-pulehui@huaweicloud.com
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When running the memfd_secret test run_vmtests.sh unconditionally tries
to confgiure the YAMA LSM's ptrace_scope configuration, leading to an error
if YAMA is not in the running kernel:
# ./run_vmtests.sh: line 432: /proc/sys/kernel/yama/ptrace_scope: No such file or directory
# # ----------------------
# # running ./memfd_secret
# # ----------------------
Check that this file is present before trying to write to it.
The indentation here is a bit odd, and it doesn't seem great that we
configure but don't restore ptrace_scope.
Link: https://lkml.kernel.org/r/20250610-selftest-mm-enable-yama-v1-1-0097b6713116@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It is not sufficiently clear what the individual tests in the cow test
program are checking so add messages for the failure cases.
Link: https://lkml.kernel.org/r/20250610-selftest-mm-cow-tweaks-v1-4-43cd7457500f@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Tweak the coding style for checking for non-zero return values.
While we're at it also remove a now redundant oring of the madvise()
return code.
Link: https://lkml.kernel.org/r/20250610-selftest-mm-cow-tweaks-v1-3-43cd7457500f@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This prints the errno and a string decode of it.
Link: https://lkml.kernel.org/r/20250610-selftest-mm-cow-tweaks-v1-2-43cd7457500f@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "selftests/mm: Tweaks to the cow test".
A collection of non-functional updates from David Hildenbrand's review.
This patch (of 4):
Specify that errors reported from pipe() failures are the result of
failures.
Link: https://lkml.kernel.org/r/20250610-selftest-mm-cow-tweaks-v1-0-43cd7457500f@kernel.org
Link: https://lkml.kernel.org/r/20250610-selftest-mm-cow-tweaks-v1-1-43cd7457500f@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add test to assert that we have now allowed merging of VMAs when KSM
merging-by-default has been set by prctl(PR_SET_MEMORY_MERGE, ...).
We simply perform a trivial mapping of adjacent VMAs expecting a merge,
however prior to recent changes implementing this mode earlier than
before, these merges would not have succeeded.
Assert that we have fixed this!
Link: https://lkml.kernel.org/r/6dec7aabf062c6b121cfac992c9c716cefdda00c.1748537921.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Tested-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Xu Xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If a user wishes to enable KSM mergeability for an entire process and all
fork/exec'd processes that come after it, they use the prctl()
PR_SET_MEMORY_MERGE operation.
This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set
(in order to indicate they are KSM mergeable), as well as setting this
flag for all existing VMAs and propagating this across fork/exec.
However it also breaks VMA merging for new VMAs, both in the process and
all forked (and fork/exec'd) child processes.
This is because when a new mapping is proposed, the flags specified will
never have VM_MERGEABLE set. However all adjacent VMAs will already have
VM_MERGEABLE set, rendering VMAs unmergeable by default.
To work around this, we try to set the VM_MERGEABLE flag prior to
attempting a merge. In the case of brk() this can always be done.
However on mmap() things are more complicated - while KSM is not supported
for MAP_SHARED file-backed mappings, it is supported for MAP_PRIVATE
file-backed mappings.
These mappings may have deprecated .mmap() callbacks specified which
could, in theory, adjust flags and thus KSM eligibility.
So we check to determine whether this is possible. If not, we set
VM_MERGEABLE prior to the merge attempt on mmap(), otherwise we retain the
previous behaviour.
This fixes VMA merging for all new anonymous mappings, which covers the
majority of real-world cases, so we should see a significant improvement
in VMA mergeability.
For MAP_PRIVATE file-backed mappings, those which implement the
.mmap_prepare() hook and shmem are both known to be safe, so we allow
these, disallowing all other cases.
Also add stubs for newly introduced function invocations to VMA userland
testing.
[lorenzo.stoakes@oracle.com: correctly invoke late KSM check after mmap hook]
Link: https://lkml.kernel.org/r/5861f8f6-cf5a-4d82-a062-139fb3f9cddb@lucifer.local
Link: https://lkml.kernel.org/r/3ba660af716d87a18ca5b4e635f2101edeb56340.1748537921.git.lorenzo.stoakes@oracle.com
Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport!
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xu Xin <xu.xin16@zte.com.cn>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Introduces a new drgn script, `show_page_info.py`, which allows users
to analyze the state of a page given a process ID (PID) and a virtual
address (VADDR). This can help kernel developers or debuggers easily
inspect page-related information in a live kernel or vmcore.
The script extracts information such as the page flags, mapping, and
other metadata relevant to diagnosing memory issues.
Output example:
sudo ./show_page_info.py 1 0x7fc988181000
PID: 1 Comm: systemd mm: 0xffff8d22c4089700
RAW: 0017ffffc000416c fffff939062ff708 fffff939062ffe08 ffff8d23062a12a8
RAW: 0000000000000000 ffff8d2323438f60 0000002500000007 ffff8d23203ff500
Page Address: 0xfffff93905664e00
Page Flags: PG_referenced|PG_uptodate|PG_lru|PG_head|PG_active|
PG_private|PG_reported|PG_has_hwpoisoned
Page Size: 4096
Page PFN: 0x159938
Page Physical: 0x159938000
Page Virtual: 0xffff8d2319938000
Page Refcount: 37
Page Mapcount: 7
Page Index: 0x0
Page Memcg Data: 0xffff8d23203ff500
Memcg Name: init.scope
Memcg Path: /sys/fs/cgroup/memory/init.scope
Page Mapping: 0xffff8d23062a12a8
Page Anon/File: File
Page VMA: 0xffff8d22e06e0e40
VMA Start: 0x7fc988181000
VMA End: 0x7fc988185000
This page is part of a compound page.
This page is the head page of a compound page.
Head Page: 0xfffff93905664e00
Compound Order: 2
Number of Pages: 4
Link: https://lkml.kernel.org/r/20250530055855.687067-1-ye.liu@linux.dev
Signed-off-by: Ye Liu <liuye@kylinos.cn>
Tested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Some libc's like musl libc don't provide execinfo.h since it's not part of
POSIX. In order to fix compilation on musl, only include execinfo.h if
available (HAVE_BACKTRACE_SUPPORT)
This was discovered with c104c16073b7 ("Kunit to check the longest symbol
length") which starts to include linux/kallsyms.h with Alpine Linux'
configs.
Link: https://lkml.kernel.org/r/20250622014608.448718-1-fossdd@pwned.life
Fixes: c104c16073b7 ("Kunit to check the longest symbol length")
Signed-off-by: Achill Gilgenast <fossdd@pwned.life>
Cc: Luis Henriques <luis@igalia.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
- Fix the compilation of an x86 kernel on a big engian machine due to a
missed endianness conversion
* tag 'objtool_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Add missing endian conversion to read_annotate()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Borislav Petkov:
- Disable FUTEX_PRIVATE_HASH for this cycle due to a performance
regression
- Add a selftests compilation product to the corresponding .gitignore
file
* tag 'locking_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/futex: Add futex_numa to .gitignore
futex: Temporary disable FUTEX_PRIVATE_HASH
|
|
futex_numa was never added to the .gitignore file.
Add it.
Fixes: 9140f57c1c13 ("futex,selftests: Add another FUTEX2_NUMA selftest")
Signed-off-by: Terry Tritton <terry.tritton@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/all/20250704103749.10341-1-terry.tritton@linaro.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix a regression caused by the anonymous inode rework. Making them
regular files causes various places in the kernel to tip over
starting with io_uring.
Revert to the former status quo and port our assertion to be based on
checking the inode so we don't lose the valuable VFS_*_ON_*()
assertions that have already helped discover weird behavior our
outright bugs.
- Fix the the upper bound calculation in fuse_fill_write_pages()
- Fix priority inversion issues in the eventpoll code
- Make secretmen use anon_inode_make_secure_inode() to avoid bypassing
the LSM layer
- Fix a netfs hang due to missing case in final DIO read result
collection
- Fix a double put of the netfs_io_request struct
- Provide some helpers to abstract out NETFS_RREQ_IN_PROGRESS flag
wrangling
- Fix infinite looping in netfs_wait_for_pause/request()
- Fix a netfs ref leak on an extra subrequest inserted into a request's
list of subreqs
- Fix various cifs RPC callbacks to set NETFS_SREQ_NEED_RETRY if a
subrequest fails retriably
- Fix a cifs warning in the workqueue code when reconnecting a channel
- Fix the updating of i_size in netfs to avoid a race between testing
if we should have extended the file with a DIO write and changing
i_size
- Merge the places in netfs that update i_size on write
- Fix coredump socket selftests
* tag 'vfs-6.16-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
anon_inode: rework assertions
netfs: Update tracepoints in a number of ways
netfs: Renumber the NETFS_RREQ_* flags to make traces easier to read
netfs: Merge i_size update functions
netfs: Fix i_size updating
smb: client: set missing retry flag in cifs_writev_callback()
smb: client: set missing retry flag in cifs_readv_callback()
smb: client: set missing retry flag in smb2_writev_callback()
netfs: Fix ref leak on inserted extra subreq in write retry
netfs: Fix looping in wait functions
netfs: Provide helpers to perform NETFS_RREQ_IN_PROGRESS flag wangling
netfs: Fix double put of request
netfs: Fix hang due to missing case in final DIO read result collection
eventpoll: Fix priority inversion problem
fuse: fix fuse_fill_write_pages() upper bound calculation
fs: export anon_inode_make_secure_inode() and fix secretmem LSM bypass
selftests/coredump: Fix "socket_detect_userspace_client" test failure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
"Some changes to the userspace selftest framework cause the iommufd
tests to start failing. This turned out to be bugs in the iommufd side
that were just getting uncovered.
- Deal with MAP_HUGETLB mmaping more than requested even when in
MAP_FIXED mode
- Fixup missing error flow cleanup in the test
- Check that the memory allocations suceeded
- Suppress some bogus gcc 'may be used uninitialized' warnings"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd/selftest: Fix build warnings due to uninitialized mfd
iommufd/selftest: Add asserts testing global mfd
iommufd/selftest: Add missing close(mfd) in memfd_mmap()
iommufd/selftest: Fix iommufd_dirty_tracking with large hugepage sizes
|
|
Trying to compile an x86 kernel on big endian results in this error:
net/ipv4/netfilter/iptable_nat.o: warning: objtool: iptable_nat_table_init+0x150: Unknown annotation type: 50331648
make[5]: *** [scripts/Makefile.build:287: net/ipv4/netfilter/iptable_nat.o] Error 255
Reason is a missing endian conversion in read_annotate().
Add the missing conversion to fix this.
Fixes: 2116b349e29a ("objtool: Generic annotation infrastructure")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250630131230.4130185-1-hca@linux.ibm.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
- replace __ASSEMBLY__ with __ASSEMBLER__ in headers like others
- fix build warnings about export.h
- reserve the EFI memory map region for kdump
- handle __init vs inline mismatches
- fix some KVM bugs
* tag 'loongarch-fixes-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Disable updating of "num_cpu" and "feature"
LoongArch: KVM: Check validity of "num_cpu" from user space
LoongArch: KVM: Check interrupt route from physical CPU
LoongArch: KVM: Fix interrupt route update with EIOINTC
LoongArch: KVM: Add address alignment check for IOCSR emulation
LoongArch: KVM: Avoid overflow with array index
LoongArch: Handle KCOV __init vs inline mismatches
LoongArch: Reserve the EFI memory map region
LoongArch: Fix build warnings about export.h
LoongArch: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"16 hotfixes.
6 are cc:stable and the remainder address post-6.15 issues or aren't
considered necessary for -stable kernels. 5 are for MM"
* tag 'mm-hotfixes-stable-2025-06-27-16-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS: add Lorenzo as THP co-maintainer
mailmap: update Duje Mihanović's email address
selftests/mm: fix validate_addr() helper
crashdump: add CONFIG_KEYS dependency
mailmap: correct name for a historical account of Zijun Hu
mailmap: add entries for Zijun Hu
fuse: fix runtime warning on truncate_folio_batch_exceptionals()
scripts/gdb: fix dentry_name() lookup
mm/damon/sysfs-schemes: free old damon_sysfs_scheme_filter->memcg_path on write
mm/alloc_tag: fix the kmemleak false positive issue in the allocation of the percpu variable tag->counters
lib/group_cpus: fix NULL pointer dereference from group_cpus_evenly()
mm/hugetlb: remove unnecessary holding of hugetlb_lock
MAINTAINERS: add missing files to mm page alloc section
MAINTAINERS: add tree entry to mm init block
mm: add OOM killer maintainer structure
fs/proc/task_mmu: fix PAGE_IS_PFNZERO detection for the huge zero folio
|
|
Pull block fixes from Jens Axboe:
- Fixes for ublk:
- fix C++ narrowing warnings in the uapi header
- update/improve UBLK_F_SUPPORT_ZERO_COPY comment in uapi header
- fix for the ublk ->queue_rqs() implementation, limiting a batch
to just the specific task AND ring
- ublk_get_data() error handling fix
- sanity check more arguments in ublk_ctrl_add_dev()
- selftest addition
- NVMe pull request via Christoph:
- reset delayed remove_work after reconnect
- fix atomic write size validation
- Fix for a warning introduced in bdev_count_inflight_rw() in this
merge window
* tag 'block-6.16-20250626' of git://git.kernel.dk/linux:
block: fix false warning in bdev_count_inflight_rw()
ublk: sanity check add_dev input for underflow
nvme: fix atomic write size validation
nvme: refactor the atomic write unit detection
nvme: reset delayed remove_work after reconnect
ublk: setup ublk_io correctly in case of ublk_get_data() failure
ublk: update UBLK_F_SUPPORT_ZERO_COPY comment in UAPI header
ublk: fix narrowing warnings in UAPI header
selftests: ublk: don't take same backing file for more than one ublk devices
ublk: build batch from IOs in same io_ring_ctx and io task
|