diff options
author | Suren Baghdasaryan <surenb@google.com> | 2025-07-19 11:28:54 -0700 |
---|---|---|
committer | Andrew Morton <akpm@linux-foundation.org> | 2025-07-24 19:12:37 -0700 |
commit | 5631da56c9a87ea41d69d1bbbc1cee327eb9354b (patch) | |
tree | 6ef59386661dfddcfcfd40757cb7c4e48630a589 /fs/proc/internal.h | |
parent | 03d98703f7e172778786ebd7c5f2471d0f65d3a6 (diff) |
fs/proc/task_mmu: read proc/pid/maps under per-vma lock
With maple_tree supporting vma tree traversal under RCU and per-vma locks,
/proc/pid/maps can be read while holding individual vma locks instead of
locking the entire address space.
A completely lockless approach (walking vma tree under RCU) would be quite
complex with the main issue being get_vma_name() using callbacks which
might not work correctly with a stable vma copy, requiring original
(unstable) vma - see special_mapping_name() for example.
When per-vma lock acquisition fails, we take the mmap_lock for reading,
lock the vma, release the mmap_lock and continue. This fallback to mmap
read lock guarantees the reader to make forward progress even during lock
contention. This will interfere with the writer but for a very short time
while we are acquiring the per-vma lock and only when there was contention
on the vma reader is interested in.
We shouldn't see a repeated fallback to mmap read locks in practice, as
this require a very unlikely series of lock contentions (for instance due
to repeated vma split operations). However even if this did somehow
happen, we would still progress.
One case requiring special handling is when a vma changes between the time
it was found and the time it got locked. A problematic case would be if a
vma got shrunk so that its vm_start moved higher in the address space and
a new vma was installed at the beginning:
reader found: |--------VMA A--------|
VMA is modified: |-VMA B-|----VMA A----|
reader locks modified VMA A
reader reports VMA A: | gap |----VMA A----|
This would result in reporting a gap in the address space that does not
exist. To prevent this we retry the lookup after locking the vma, however
we do that only when we identify a gap and detect that the address space
was changed after we found the vma.
This change is designed to reduce mmap_lock contention and prevent a
process reading /proc/pid/maps files (often a low priority task, such as
monitoring/data collection services) from blocking address space updates.
Note that this change has a userspace visible disadvantage: it allows for
sub-page data tearing as opposed to the previous mechanism where data
tearing could happen only between pages of generated output data. Since
current userspace considers data tearing between pages to be acceptable,
we assume is will be able to handle sub-page data tearing as well.
Link: https://lkml.kernel.org/r/20250719182854.3166724-7-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'fs/proc/internal.h')
-rw-r--r-- | fs/proc/internal.h | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 3d48ffe72583..7c235451c5ea 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -384,6 +384,11 @@ struct proc_maps_private { struct task_struct *task; struct mm_struct *mm; struct vma_iterator iter; + loff_t last_pos; +#ifdef CONFIG_PER_VMA_LOCK + bool mmap_locked; + struct vm_area_struct *locked_vma; +#endif #ifdef CONFIG_NUMA struct mempolicy *task_mempolicy; #endif |