author: Barry Song <v-songbaohua@oppo.com> 2025-06-08 10:01:50 +1200
committer: Andrew Morton <akpm@linux-foundation.org> 2025-07-09 22:42:01 -0700
commit: a6fde7add78d122f5e09cb6280f99c4b5ead7d56
tree: 549cbc634c5ce05cb0651b7b89d9f56138978fc3 /fs/proc/generic.c
parent: 5e00e31867d16e235bb693b900c85e86dc2c3464
mm: use per_vma lock for MADV_DONTNEED
Certain madvise operations, especially MADV_DONTNEED, occur far more
frequently than other madvise options, particularly in native and Java
heaps for dynamic memory management.

Currently, the mmap_lock is always held during these operations, even when
unnecessary. This causes lock contention and can lead to severe priority
inversion, where low-priority threads, such as Android's HeapTaskDaemon,
hold the lock and block higher-priority threads.

This patch enables the use of per-VMA locks when the advised range lies
entirely within a single VMA, avoiding the need for full VMA traversal. In
practice, userspace heaps rarely issue MADV_DONTNEED across multiple VMAs.

Tangquan's testing shows that over 99.5% of memory reclaimed by Android
benefits from this per-VMA lock optimization. After extended runtime,
217,735 madvise calls from HeapTaskDaemon used the per-VMA path, while
only 1,231 fell back to mmap_lock.

To simplify handling, the implementation falls back to the standard
mmap_lock if userfaultfd is enabled on the VMA, avoiding the complexity of
userfaultfd_remove().

Many thanks to Lorenzo's work[1] on:

"mm/madvise: support VMA read locks for MADV_DONTNEED[_LOCKED]

Then use this mechanism to permit VMA locking to be done later in the
madvise() logic and also to allow altering of the locking mode to permit
falling back to an mmap read lock if required."

One important point, as pointed out by Jann[2], is that
untagged_addr_remote() requires holding mmap_lock. This is because
address tagging on x86 and RISC-V is quite complex.

Until untagged_addr_remote() becomes atomic, which seems unlikely in the
near future, we cannot support per-VMA locks for remote processes. So for
now, only local processes are supported.

Lance said:

: Just to put some numbers on it, I ran a micro-benchmark with 100
: parallel threads, where each thread calls madvise() on its own 1GiB
: chunk of 64KiB mTHP-backed memory. The performance gain is huge:
:
: 1) MADV_DONTNEED saw its average time drop from 0.0508s to 0.0270s
: (~47% faster)
:
: 2) MADV_FREE saw its average time drop from 0.3078s to 0.1095s (~64%
: faster)

[lorenzo.stoakes@oracle.com: avoid any chance of uninitialised pointer deref]
Link: https://lkml.kernel.org/r/309d22ca-6cd9-4601-8402-d441a07d9443@lucifer.local
Link: https://lore.kernel.org/all/0b96ce61-a52c-4036-b5b6-5c50783db51f@lucifer.local/ [1]
Link: https://lore.kernel.org/all/CAG48ez11zi-1jicHUZtLhyoNPGGVB+ROeAJCUw48bsjk4bbEkA@mail.gmail.com/ [2]
Link: https://lkml.kernel.org/r/20250607220150.2980-1-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Cc: Lance Yang <ioworker0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
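
The fallback rules described in the message (single VMA only, no userfaultfd, local process only) can be sketched as a small userspace model. This is not the kernel code from the patch: every name here (struct model_vma, model_find_vma(), model_pick_lock() and the MODEL_LOCK_* values) is hypothetical, and the linear scan merely stands in for the kernel's real VMA lookup. It only illustrates the order of the checks under those assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a VMA: half-open range [start, end). */
struct model_vma {
	unsigned long start;
	unsigned long end;
	bool has_userfaultfd;	/* userfaultfd is armed on this mapping */
};

/* Hypothetical lock modes; the real kernel enum may differ. */
enum model_lock_mode {
	MODEL_LOCK_VMA_READ,	/* the per-VMA read lock is sufficient */
	MODEL_LOCK_MMAP_READ,	/* fall back to the mmap_lock */
};

/* Return the VMA containing addr, or NULL if addr is unmapped. */
static const struct model_vma *
model_find_vma(const struct model_vma *vmas, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].start && addr < vmas[i].end)
			return &vmas[i];
	return NULL;
}

/* Decide which lock a MADV_DONTNEED over [start, end) would take. */
static enum model_lock_mode
model_pick_lock(const struct model_vma *vmas, size_t n,
		unsigned long start, unsigned long end, bool remote)
{
	const struct model_vma *vma;

	/* Remote processes need untagged_addr_remote(), hence mmap_lock. */
	if (remote)
		return MODEL_LOCK_MMAP_READ;

	vma = model_find_vma(vmas, n, start);
	if (!vma)
		return MODEL_LOCK_MMAP_READ;

	/* The advised range must lie entirely within this one VMA. */
	if (end > vma->end)
		return MODEL_LOCK_MMAP_READ;

	/* Avoid the complexity of userfaultfd_remove(). */
	if (vma->has_userfaultfd)
		return MODEL_LOCK_MMAP_READ;

	return MODEL_LOCK_VMA_READ;
}
```

In this model, a local call over a range inside one userfaultfd-free mapping returns MODEL_LOCK_VMA_READ; a range crossing into a second mapping, a userfaultfd-armed mapping, an unmapped start address, or a remote caller all return MODEL_LOCK_MMAP_READ.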