path: root/mm
2023-02-09mm/swapfile: remove pr_debug in get_swap_pages()Longlong Xia
It's known that get_swap_pages() may fail to find available space in some extreme cases, but the pr_debug() provides useless information. Let's remove it. Link: https://lkml.kernel.org/r/20230131071035.1085968-1-xialonglong1@huawei.com Signed-off-by: Longlong Xia <xialonglong1@huawei.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09memory tier: release the new_memtier in find_create_memory_tier()Tong Tiangen
In find_create_memory_tier(), if device registration fails, we should remove new_memtier from the tier list and put its device, instead of doing this for memtier. Link: https://lkml.kernel.org/r/20230129040651.1329208-1-tongtiangen@huawei.com Fixes: 9832fb87834e ("mm/demotion: expose memory tier details via sysfs") Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Guohanjun <guohanjun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09kasan: infer allocation size by scanning metadataKuan-Ying Lee
Make KASAN scan metadata to infer the requested allocation size instead of printing cache->object_size.

This patch fixes confusing slab-out-of-bounds reports as reported in: https://bugzilla.kernel.org/show_bug.cgi?id=216457

As an example of the confusing behavior, the report below hints that the allocation size was 192, while the kernel actually called kmalloc(184):

==================================================================
BUG: KASAN: slab-out-of-bounds in _find_next_bit+0x143/0x160 lib/find_bit.c:109
Read of size 8 at addr ffff8880175766b8 by task kworker/1:1/26
...
The buggy address belongs to the object at ffff888017576600
 which belongs to the cache kmalloc-192 of size 192
The buggy address is located 184 bytes inside of
 192-byte region [ffff888017576600, ffff8880175766c0)
...
Memory state around the buggy address:
 ffff888017576580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff888017576600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888017576680: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
                                        ^
 ffff888017576700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888017576780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

With this patch, the report shows:

==================================================================
...
The buggy address belongs to the object at ffff888017576600
 which belongs to the cache kmalloc-192 of size 192
The buggy address is located 0 bytes to the right of
 allocated 184-byte region [ffff888017576600, ffff8880175766b8)
...
==================================================================

Also report slab use-after-free bugs as "slab-use-after-free" and print "freed" instead of "allocated" in the report when describing the accessed memory region.

Also improve the metadata-related comment in kasan_find_first_bad_addr and use addr_has_metadata across KASAN code instead of open-coding KASAN_SHADOW_START checks.

[akpm@linux-foundation.org: fix printk warning] Link: https://bugzilla.kernel.org/show_bug.cgi?id=216457 Link: https://lkml.kernel.org/r/20230129021437.18812-1-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Co-developed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: export dump_mm()Suren Baghdasaryan
mmap_assert_write_locked() is used in vm_flags modifiers. Because mmap_assert_write_locked() uses dump_mm() and vm_flags are sometimes modified from inside a module, it's necessary to export dump_mm() function. Link: https://lkml.kernel.org/r/20230126193752.297968-8-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: introduce __vm_flags_mod and use it in untrack_pfnSuren Baghdasaryan
There are scenarios when vm_flags can be modified without exclusive mmap_lock, such as: - after VMA was isolated and mmap_lock was downgraded or dropped - in exit_mmap when there are no other mm users and locking is unnecessary Introduce __vm_flags_mod to avoid assertions when the caller takes responsibility for the required locking. Pass a hint to untrack_pfn to conditionally use __vm_flags_mod for flags modification to avoid assertion. Link: https://lkml.kernel.org/r/20230126193752.297968-7-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
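The shape of the helper pair can be sketched roughly as follows (a simplified, assumed form based on the description above; the real definitions live in include/linux/mm.h and differ in detail):

    /* Sketch only: the locked variant asserts mmap_lock is held for write,
     * the __ variant trusts the caller to provide whatever locking applies. */
    static inline void __vm_flags_mod(struct vm_area_struct *vma,
                                      vm_flags_t set, vm_flags_t clear)
    {
            vma->vm_flags |= set;
            vma->vm_flags &= ~clear;
    }

    static inline void vm_flags_mod(struct vm_area_struct *vma,
                                    vm_flags_t set, vm_flags_t clear)
    {
            mmap_assert_write_locked(vma->vm_mm);
            __vm_flags_mod(vma, set, clear);
    }

With the hint described above, untrack_pfn() can pick the __ variant when it runs in a context where the assertion would be invalid.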
2023-02-09mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
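The conversion itself is mechanical; a hedged before/after sketch of a typical call site (illustrative, not a hunk from the actual patch):

    /* before: open-coded flag updates on vma->vm_flags */
    vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
    vma->vm_flags &= ~VM_MAYWRITE;

    /* after: modifier calls, which can assert locking and track the change */
    vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
    vm_flags_clear(vma, VM_MAYWRITE);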
2023-02-09mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASKSuren Baghdasaryan
To simplify the usage of VM_LOCKED_CLEAR_MASK in vm_flags_clear(), replace it with VM_LOCKED_MASK bitmask and convert all users. Link: https://lkml.kernel.org/r/20230126193752.297968-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
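Assuming VM_LOCKED_MASK groups the mlock-related bits (VM_LOCKED | VM_LOCKONFAULT), the conversion reads roughly like this (sketch):

    /* before: clear the mlock bits by AND-ing with an inverted mask */
    vma->vm_flags &= VM_LOCKED_CLEAR_MASK;

    /* after: clear the grouped bits directly */
    vm_flags_clear(vma, VM_LOCKED_MASK);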
2023-02-09vma_merge: set vma iterator to correct position.Liam R. Howlett
When merging the previous value, set the vma iterator to the previous slot. Don't use the vma iterator to get the next/prev so that it is in the correct position for a write. Link: https://lkml.kernel.org/r/20230120162650.984577-50-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mmap: remove __vma_adjust()Liam R. Howlett
Inline the work of __vma_adjust() into vma_merge(). This reduces code size and has the added benefits of the comments for the cases being located with the code. Change the comments referencing vma_adjust() accordingly. [Liam.Howlett@oracle.com: fix vma_merge() offset when expanding the next vma] Link: https://lkml.kernel.org/r/20230130195713.2881766-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20230120162650.984577-49-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mmap: convert do_brk_flags() to use vma_prepare() and vma_complete()Liam R. Howlett
Use the abstracted vma locking for do_brk_flags() Link: https://lkml.kernel.org/r/20230120162650.984577-48-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mmap: introduce dup_vma_anon() helperLiam R. Howlett
Create a helper for duplicating the anon vma when adjusting the vma. This simplifies the logic of __vma_adjust(). Link: https://lkml.kernel.org/r/20230120162650.984577-47-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mmap: don't use __vma_adjust() in shift_arg_pages()Liam R. Howlett
Introduce shrink_vma() which uses the vma_prepare() and vma_complete() functions to reduce the vma coverage. Convert shift_arg_pages() to use expand_vma() and the new shrink_vma() function. Remove support from __vma_adjust() to reduce a vma size since shift_arg_pages() is the only user that shrinks a VMA in this way. Link: https://lkml.kernel.org/r/20230120162650.984577-46-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mremap: convert vma_adjust() to vma_expand()Liam R. Howlett
Stop using vma_adjust() in preparation for removing the function. Export vma_expand() to use instead. Link: https://lkml.kernel.org/r/20230120162650.984577-45-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: don't use __vma_adjust() in __split_vma()Liam R. Howlett
Use the abstracted locking and maple tree operations. Since __split_vma() is the only user of the __vma_adjust() function to use the insert argument, drop that argument. Remove the NULL passed through from fs/exec's shift_arg_pages() and mremap() at the same time. Link: https://lkml.kernel.org/r/20230120162650.984577-44-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mmap: introduce init_vma_prep() and init_multi_vma_prep()Liam R. Howlett
Add init_vma_prep() and init_multi_vma_prep() to set up the struct vma_prepare. This is to abstract the locking when adjusting the VMAs. Also drop the __vma_adjust() int variable remove_next in favour of a pointer to the VMA to remove. Rename next_next to remove2 since this better reflects its use. Link: https://lkml.kernel.org/r/20230120162650.984577-43-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mmap: use vma_prepare() and vma_complete() in vma_expand()Liam R. Howlett
Use the new locking functions for vma_expand(). This reduces code duplication. At the same time change VM_BUG_ON() to VM_WARN_ON() Link: https://lkml.kernel.org/r/20230120162650.984577-42-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mmap: refactor locking out of __vma_adjust()Liam R. Howlett
Move the locking into vma_prepare() and vma_complete() for use elsewhere Link: https://lkml.kernel.org/r/20230120162650.984577-41-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mmap: move anon_vma setting in __vma_adjust()Liam R. Howlett
Move the anon_vma setting & warn_no up the function. This is done to clear up the locking later. Link: https://lkml.kernel.org/r/20230120162650.984577-40-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: change munmap splitting order and move_vma()Liam R. Howlett
Splitting can be more efficient when the order is not of concern. Change do_vmi_align_munmap() to reduce walking of the tree during split operations. move_vma() must also be altered to remove the dependency of keeping the original VMA as the active part of the split. Transition to using vma iterator to look up the prev and/or next vma after munmap. [Liam.Howlett@oracle.com: fix vma iterator initialization] Link: https://lkml.kernel.org/r/20230126212011.980350-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20230120162650.984577-39-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mmap: clean up mmap_region() unrollingLiam R. Howlett
Move the unrolling logic to the error path as opposed to duplicating it within the function body. This reduces the potential of missing an update to one path when making changes. Link: https://lkml.kernel.org/r/20230120162650.984577-38-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Li Zetao <lizetao1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: add vma iterator to vma_adjust() argumentsLiam R. Howlett
Change the vma_adjust() function definition to accept the vma iterator and pass it through to __vma_adjust(). Update fs/exec to use the new vma_adjust() function parameters. Update mm/mremap to use the new vma_adjust() function parameters. Revert the __split_vma() calls back from __vma_adjust() to vma_adjust() and pass through the vma iterator. Link: https://lkml.kernel.org/r/20230120162650.984577-37-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: pass vma iterator through to __vma_adjust()Liam R. Howlett
Pass the iterator through to be used in __vma_adjust(). The state of the iterator needs to be correct for the operation that will occur so make the adjustments. Link: https://lkml.kernel.org/r/20230120162650.984577-36-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: remove unnecessary write to vma iterator in __vma_adjust()Liam R. Howlett
If the vma start address is going to change due to an insert, then it is safe to not write the vma to the tree. The write of the insert vma will alter the tree as necessary. Link: https://lkml.kernel.org/r/20230120162650.984577-35-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09madvise: use split_vma() instead of __split_vma()Liam R. Howlett
The split_vma() wrapper is specifically for this use case, so use it. [Liam.Howlett@oracle.com: fix VMA_ITERATOR start position] Link: https://lkml.kernel.org/r/20230125135809.85262-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20230120162650.984577-34-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: pass through vma iterator to __vma_adjust()Liam R. Howlett
Pass the vma iterator through to __vma_adjust() so the state can be updated. Link: https://lkml.kernel.org/r/20230120162650.984577-33-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mmap: convert __vma_adjust() to use vma iteratorLiam R. Howlett
Use the vma iterator internally for __vma_adjust(). Avoid using the maple tree interface directly for type safety. Link: https://lkml.kernel.org/r/20230120162650.984577-32-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/damon/vaddr-test.h: stop using vma_mas_store() for maple tree storeLiam R. Howlett
Prepare for the removal of the vma_mas_store() function by open coding the maple tree store in this test code. Set the range of the maple state and call the store function directly. Link: https://lkml.kernel.org/r/20230120162650.984577-31-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: switch vma_merge(), split_vma(), and __split_vma to vma iteratorLiam R. Howlett
Drop the vmi_* functions and transition all users to use the vma iterator directly. Link: https://lkml.kernel.org/r/20230120162650.984577-30-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09nommu: pass through vma iterator to shrink_vma()Liam R. Howlett
Rename the function to vmi_shrink_vma() to indicate it takes the vma iterator. Use the iterator to preallocate and drop the delete function. The maple tree is able to do the modification more easily than the linked list and rbtree, so just clear the necessary area in the tree. add_vma_to_mm() is no longer used, so drop this function. vmi_add_vma_to_mm() is now only used once, so inline this function into do_mmap(). Link: https://lkml.kernel.org/r/20230120162650.984577-29-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09nommu: convert nommu to using the vma iteratorLiam R. Howlett
Gain type safety in nommu by using the vma_iterator and not the maple tree directly. Link: https://lkml.kernel.org/r/20230120162650.984577-28-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mremap: use vmi version of vma_merge()Liam R. Howlett
Use the vma iterator so that the iterator can be invalidated or updated to avoid each caller doing so. Link: https://lkml.kernel.org/r/20230120162650.984577-27-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mmap: use vmi version of vma_merge()Liam R. Howlett
Use the vma iterator so that the iterator can be invalidated or updated to avoid each caller doing so. Link: https://lkml.kernel.org/r/20230120162650.984577-26-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mmap: pass through vmi iterator to __split_vma()Liam R. Howlett
Use the vma iterator so that the iterator can be invalidated or updated to avoid each caller doing so. Link: https://lkml.kernel.org/r/20230120162650.984577-25-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09madvise: use vmi iterator for __split_vma() and vma_merge()Liam R. Howlett
Use the vma iterator so that the iterator can be invalidated or updated to avoid each caller doing so. Link: https://lkml.kernel.org/r/20230120162650.984577-24-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mempolicy: convert to vma iteratorLiam R. Howlett
Use the vma iterator so that the iterator can be invalidated or updated to avoid each caller doing so. Link: https://lkml.kernel.org/r/20230120162650.984577-21-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mlock: convert mlock to vma iteratorLiam R. Howlett
Use the vma iterator so that the iterator can be invalidated or updated to avoid each caller doing so. Link: https://lkml.kernel.org/r/20230120162650.984577-19-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: change mprotect_fixup to vma iteratorLiam R. Howlett
Use the vma iterator so that the iterator can be invalidated or updated to avoid each caller doing so. Link: https://lkml.kernel.org/r/20230120162650.984577-18-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09ipc/shm: introduce new do_vma_munmap() to munmapLiam R. Howlett
The shm code already has the vma iterator in position for a write. do_vmi_munmap() searches for the correct position and aligns the write, so it is not the right function to use in this case. The shm VMA tree modification is similar to the brk munmap situation: the vma iterator is in position and the VMA is already known. This patch generalizes the brk munmap function do_brk_munmap() so it can be used by any other caller that has the vma iterator already in position to munmap a VMA. Link: https://lkml.kernel.org/r/20230126212049.980501-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Sven Schnelle <svens@linux.ibm.com> Link: https://lore.kernel.org/linux-mm/yt9dh6wec21a.fsf@linux.ibm.com/ Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
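The interface of the generalized helper, as assumed from the description above (a sketch; the exact prototype may differ):

    /*
     * Sketch: munmap a single, already-looked-up VMA with the vma iterator
     * already positioned on it, as brk() and now ipc/shm need.
     */
    int do_vma_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
                      struct vm_area_struct *vma, unsigned long start,
                      unsigned long end, bool downgrade);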
2023-02-09mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()Liam R. Howlett
These wrappers are short-lived in this patch set so that each user can be converted on its own. In the end, these functions are renamed in one commit. Link: https://lkml.kernel.org/r/20230120162650.984577-15-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mmap: convert vma_expand() to use vma iteratorLiam R. Howlett
Use the vma iterator instead of the maple state for type safety and for consistency through the mm code. Link: https://lkml.kernel.org/r/20230120162650.984577-14-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iteratorLiam R. Howlett
Start passing the vma iterator through the mm code. This will allow for reuse of the state and cleaner invalidation if necessary. Link: https://lkml.kernel.org/r/20230120162650.984577-13-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mmap: remove preallocation from do_mas_align_munmap()Liam R. Howlett
In preparation for passing the vma state through split, the pre-allocation that occurs before the split has to be moved to after it. Since the preallocation would then live right next to the store, just call store instead of preallocating. This effectively restores the potential error path of splitting and not munmap'ing which pre-dates the maple tree. Link: https://lkml.kernel.org/r/20230120162650.984577-12-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mmap: convert vma_link() vma iteratorLiam R. Howlett
Avoid using the maple tree interface directly. Link: https://lkml.kernel.org/r/20230120162650.984577-11-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm/mmap: convert brk to use vma iteratorLiam R. Howlett
Use the vma iterator API for the brk() system call. This will provide type safety at compile time. Link: https://lkml.kernel.org/r/20230120162650.984577-9-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: expand vma iterator interfaceLiam R. Howlett
Add wrappers for the maple tree to the vma iterator. This will provide type safety at compile time. Link: https://lkml.kernel.org/r/20230120162650.984577-8-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
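A hedged sketch of how callers walk VMAs through the typed iterator instead of touching the maple tree directly (simplified; variable names are illustrative):

    /* Iterate all VMAs of an mm via the vma iterator. */
    VMA_ITERATOR(vmi, mm, 0);
    struct vm_area_struct *vma;

    for_each_vma(vmi, vma) {
            /* operate on vma; vma_prev()/vma_next() and the vma_iter_*
             * wrappers reposition and write through the same iterator */
    }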
2023-02-09mm: shrinkers: fix deadlock in shrinker debugfsQi Zheng
The debugfs_remove_recursive() is invoked by unregister_shrinker(), which is holding the write lock of shrinker_rwsem. It will wait for the handler of the debugfs file to complete. The handler also needs to hold the read lock of shrinker_rwsem to do something. So it may cause the following deadlock:

CPU0                                          CPU1

debugfs_file_get()
shrinker_debugfs_count_show()/shrinker_debugfs_scan_write()

                                              unregister_shrinker()
                                              --> down_write(&shrinker_rwsem);
                                              debugfs_remove_recursive()
                                                  // wait for (A)
                                              --> wait_for_completion();
    // wait for (B)
--> down_read_killable(&shrinker_rwsem)
debugfs_file_put() -- (A)

                                              up_write() -- (B)

The down_read_killable() can be killed, so that the above deadlock can be recovered. But it still requires an extra kill action, otherwise it will block all subsequent shrinker-related operations, so it's better to fix it.

[akpm@linux-foundation.org: fix CONFIG_SHRINKER_DEBUG=n stub] Link: https://lkml.kernel.org/r/20230202105612.64641-1-zhengqi.arch@bytedance.com Fixes: 5035ebc644ae ("mm: shrinkers: introduce debugfs interface for memory shrinkers") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: hwpoison: support recovery from ksm_might_need_to_copy()Kefeng Wang
When the kernel copies a page from ksm_might_need_to_copy() but runs into an uncorrectable error, it will crash since the poisoned page is consumed by the kernel; this is similar to the issue recently fixed by Copy-on-write poison recovery. When an error is detected during the page copy, return VM_FAULT_HWPOISON in do_swap_page(), and install a hwpoison entry in unuse_pte() during swapoff, which helps us avoid a system crash. Note: memory failure on a KSM page will be skipped, but memory_failure_queue() is still called to be consistent with the general memory failure process, and KSM page recovery could be supported in the future. [wangkefeng.wang@huawei.com: enhance unuse_pte(), fix issue found by lkp] Link: https://lkml.kernel.org/r/20221213120523.141588-1-wangkefeng.wang@huawei.com [wangkefeng.wang@huawei.com: update changelog, alter ksm_might_need_to_copy(), restore unlikely() in unuse_pte()] Link: https://lkml.kernel.org/r/20230201074433.96641-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20221209072801.193221-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
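The recovery path on the do_swap_page() side can be sketched as follows (assumed shape; details such as the error encoding returned by ksm_might_need_to_copy() may differ from the actual patch):

    /* Sketch: a poisoned source page makes the KSM copy fail, and the
     * fault is reported as hwpoison instead of consuming the bad page. */
    page = ksm_might_need_to_copy(page, vma, vmf->address);
    if (unlikely(!page)) {
            ret = VM_FAULT_OOM;
            goto out_page;
    } else if (unlikely(PTR_ERR(page) == -EHWPOISON)) {
            ret = VM_FAULT_HWPOISON;
            goto out_page;
    }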
2023-02-09kasan: fix Oops due to missing calls to kasan_arch_is_ready()Christophe Leroy
On powerpc64, you can build a kernel with KASAN as soon as you build it with RADIX MMU support. However, if the CPU doesn't have RADIX MMU, KASAN isn't enabled at init and the following Oops is encountered.

[ 0.000000][ T0] KASAN not enabled as it requires radix!
[ 4.484295][ T26] BUG: Unable to handle kernel data access at 0xc00e000000804a04
[ 4.485270][ T26] Faulting instruction address: 0xc00000000062ec6c
[ 4.485748][ T26] Oops: Kernel access of bad area, sig: 11 [#1]
[ 4.485920][ T26] BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 4.486259][ T26] Modules linked in:
[ 4.486637][ T26] CPU: 0 PID: 26 Comm: kworker/u2:2 Not tainted 6.2.0-rc3-02590-gf8a023b0a805 #249
[ 4.486907][ T26] Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,HEAD pSeries
[ 4.487445][ T26] Workqueue: eval_map_wq .tracer_init_tracefs_work_func
[ 4.488744][ T26] NIP: c00000000062ec6c LR: c00000000062bb84 CTR: c0000000002ebcd0
[ 4.488867][ T26] REGS: c0000000049175c0 TRAP: 0380 Not tainted (6.2.0-rc3-02590-gf8a023b0a805)
[ 4.489028][ T26] MSR: 8000000002009032 <SF,VEC,EE,ME,IR,DR,RI> CR: 44002808 XER: 00000000
[ 4.489584][ T26] CFAR: c00000000062bb80 IRQMASK: 0
[ 4.489584][ T26] GPR00: c0000000005624d4 c000000004917860 c000000001cfc000 1800000000804a04
[ 4.489584][ T26] GPR04: c0000000003a2650 0000000000000cc0 c00000000000d3d8 c00000000000d3d8
[ 4.489584][ T26] GPR08: c0000000049175b0 a80e000000000000 0000000000000000 0000000017d78400
[ 4.489584][ T26] GPR12: 0000000044002204 c000000003790000 c00000000435003c c0000000043f1c40
[ 4.489584][ T26] GPR16: c0000000043f1c68 c0000000043501a0 c000000002106138 c0000000043f1c08
[ 4.489584][ T26] GPR20: c0000000043f1c10 c0000000043f1c20 c000000004146c40 c000000002fdb7f8
[ 4.489584][ T26] GPR24: c000000002fdb834 c000000003685e00 c000000004025030 c000000003522e90
[ 4.489584][ T26] GPR28: 0000000000000cc0 c0000000003a2650 c000000004025020 c000000004025020
[ 4.491201][ T26] NIP [c00000000062ec6c] .kasan_byte_accessible+0xc/0x20
[ 4.491430][ T26] LR [c00000000062bb84] .__kasan_check_byte+0x24/0x90
[ 4.491767][ T26] Call Trace:
[ 4.491941][ T26] [c000000004917860] [c00000000062ae70] .__kasan_kmalloc+0xc0/0x110 (unreliable)
[ 4.492270][ T26] [c0000000049178f0] [c0000000005624d4] .krealloc+0x54/0x1c0
[ 4.492453][ T26] [c000000004917990] [c0000000003a2650] .create_trace_option_files+0x280/0x530
[ 4.492613][ T26] [c000000004917a90] [c000000002050d90] .tracer_init_tracefs_work_func+0x274/0x2c0
[ 4.492771][ T26] [c000000004917b40] [c0000000001f9948] .process_one_work+0x578/0x9f0
[ 4.492927][ T26] [c000000004917c30] [c0000000001f9ebc] .worker_thread+0xfc/0x950
[ 4.493084][ T26] [c000000004917d60] [c00000000020be84] .kthread+0x1a4/0x1b0
[ 4.493232][ T26] [c000000004917e10] [c00000000000d3d8] .ret_from_kernel_thread+0x58/0x60
[ 4.495642][ T26] Code: 60000000 7cc802a6 38a00000 4bfffc78 60000000 7cc802a6 38a00001 4bfffc68 60000000 3d20a80e 7863e8c2 792907c6 <7c6348ae> 20630007 78630fe0 68630001
[ 4.496704][ T26] ---[ end trace 0000000000000000 ]---

The Oops is due to kasan_byte_accessible() not checking the readiness of KASAN. Add the missing call to kasan_arch_is_ready() and bail out when not ready. The same problem is observed with ____kasan_kfree_large(), so fix it the same way.

Also, as KASAN is not available and no shadow area is allocated for linear memory mapping, there is no point in allocating shadow mem for vmalloc memory, as shown below in /sys/kernel/debug/kernel_page_tables:

---[ kasan shadow mem start ]---
0xc00f000000000000-0xc00f00000006ffff 0x00000000040f0000 448K r w pte valid present dirty accessed
0xc00f000000860000-0xc00f00000086ffff 0x000000000ac10000 64K r w pte valid present dirty accessed
0xc00f3ffffffe0000-0xc00f3fffffffffff 0x0000000004d10000 128K r w pte valid present dirty accessed
---[ kasan shadow mem end ]---

So, also verify KASAN readiness before allocating and poisoning shadow mem for VMAs.

Link: https://lkml.kernel.org/r/150768c55722311699fdcf8f5379e8256749f47d.1674716617.git.christophe.leroy@csgroup.eu Fixes: 41b7a347bf14 ("powerpc: Book3S 64-bit outline-only KASAN support") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reported-by: Nathan Lynch <nathanl@linux.ibm.com> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> [5.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
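The pattern of the fix, as a hedged sketch (the real patch touches kasan_byte_accessible(), ____kasan_kfree_large() and the vmalloc shadow hooks; the elided body stands for the existing shadow check):

    /* Sketch: bail out of generic-KASAN checks while the arch has not set
     * up shadow memory yet, instead of dereferencing a missing shadow. */
    bool kasan_byte_accessible(const void *addr)
    {
            if (!kasan_arch_is_ready())
                    return true;    /* no shadow yet: report "accessible" */

            /* ... existing shadow-byte check ... */
    }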
2023-02-08mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available earlyThomas Gleixner
The memory allocators are available during early boot even in the phase where interrupts are disabled and scheduling is not yet possible. The setup is such that GFP_KERNEL allocations work in this phase without causing might_alloc() splats to be emitted, because the system state is SYSTEM_BOOTING at that point, which prevents the warnings from triggering. Most allocation/free functions use local_irq_save()/restore() or a lock variant of that. But kmem_cache_alloc_bulk() and kmem_cache_free_bulk() use local_[lock]_irq_disable()/enable(), which leads to a lockdep warning when interrupts are enabled during the early boot phase. This went unnoticed so far as there are no early users of these interfaces. The upcoming conversion of the interrupt descriptor store from radix_tree to maple_tree triggered this warning as maple_tree uses the bulk interface. Cure this by moving the kmem_cache_alloc/free() bulk variants of SLUB and SLAB to local[_lock]_irq_save()/restore(). There is obviously no reclaim possible and required at this point so there is no need to expand this coverage further. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
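As a hedged illustration of the change described above (SLUB's bulk path; context and surrounding code simplified):

    /* before: assumes interrupts are enabled, so lockdep complains when the
     * bulk API is used during early boot with interrupts still disabled */
    local_lock_irq(&s->cpu_slab->lock);
    /* ... refill objects from the per-cpu slab ... */
    local_unlock_irq(&s->cpu_slab->lock);

    /* after: save and restore the interrupt state instead */
    unsigned long irqflags;

    local_lock_irqsave(&s->cpu_slab->lock, irqflags);
    /* ... refill objects from the per-cpu slab ... */
    local_unlock_irqrestore(&s->cpu_slab->lock, irqflags);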
2023-02-07Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."Aaron Thompson
This reverts commit 115d9d77bb0f9152c60b6e8646369fa7f6167593.

The pages being freed by memblock_free_late() have already been initialized, but if they are in the deferred init range, __free_one_page() might access nearby uninitialized pages when trying to coalesce buddies. This can, for example, trigger this BUG:

  BUG: unable to handle page fault for address: ffffe964c02580c8
  RIP: 0010:__list_del_entry_valid+0x3f/0x70
  <TASK>
  __free_one_page+0x139/0x410
  __free_pages_ok+0x21d/0x450
  memblock_free_late+0x8c/0xb9
  efi_free_boot_services+0x16b/0x25c
  efi_enter_virtual_mode+0x403/0x446
  start_kernel+0x678/0x714
  secondary_startup_64_no_verify+0xd2/0xdb
  </TASK>

A proper fix will be more involved so revert this change for the time being.

Fixes: 115d9d77bb0f ("mm: Always release pages to the buddy allocator in memblock_free_late().") Signed-off-by: Aaron Thompson <dev@aaront.org> Link: https://lore.kernel.org/r/20230207082151.1303-1-dev@aaront.org Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>