summaryrefslogtreecommitdiff
path: root/mm/kasan
AgeCommit message (Collapse)Author
2025-09-08mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc()Uladzislau Rezki (Sony)
kasan_populate_vmalloc() and its helpers ignore the caller's gfp_mask and always allocate memory using the hardcoded GFP_KERNEL flag. This makes them inconsistent with vmalloc(), which was recently extended to support GFP_NOFS and GFP_NOIO allocations. Page table allocations performed during shadow population also ignore the external gfp_mask. To preserve the intended semantics of GFP_NOFS and GFP_NOIO, wrap the apply_to_page_range() calls into the appropriate memalloc scope. xfs calls vmalloc with GFP_NOFS, so this bug could lead to deadlock. There was a report here https://lkml.kernel.org/r/686ea951.050a0220.385921.0016.GAE@google.com This patch: - Extends kasan_populate_vmalloc() and helpers to take gfp_mask; - Passes gfp_mask down to alloc_pages_bulk() and __get_free_page(); - Enforces GFP_NOFS/NOIO semantics with memalloc_*_save()/restore() around apply_to_page_range(); - Updates vmalloc.c and percpu allocator call sites accordingly. Link: https://lkml.kernel.org/r/20250831121058.92971-1-urezki@gmail.com Fixes: 451769ebb7e7 ("mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc") Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reported-by: syzbot+3470c9ffee63e4abafeb@syzkaller.appspotmail.com Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-27mm: introduce and use {pgd,p4d}_populate_kernel()Harry Yoo
Introduce and use {pgd,p4d}_populate_kernel() in core MM code when populating PGD and P4D entries for the kernel address space. These helpers ensure proper synchronization of page tables when updating the kernel portion of top-level page tables. Until now, the kernel has relied on each architecture to handle synchronization of top-level page tables in an ad-hoc manner. For example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes"). However, this approach has proven fragile for following reasons: 1) It is easy to forget to perform the necessary page table synchronization when introducing new changes. For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") overlooked the need to synchronize page tables for the vmemmap area. 2) It is also easy to overlook that the vmemmap and direct mapping areas must not be accessed before explicit page table synchronization. For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")) caused crashes by accessing the vmemmap area before calling sync_global_pgds(). To address this, as suggested by Dave Hansen, introduce _kernel() variants of the page table population helpers, which invoke architecture-specific hooks to properly synchronize page tables. These are introduced in a new header file, include/linux/pgalloc.h, so they can be called from common code. They reuse existing infrastructure for vmalloc and ioremap. Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK, and the actual synchronization is performed by arch_sync_kernel_mappings(). This change currently targets only x86_64, so only PGD and P4D level helpers are introduced. Currently, these helpers are no-ops since no architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK. In theory, PUD and PMD level helpers can be added later if needed by other architectures. For now, 32-bit architectures (x86-32 and arm) only handle PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless we introduce a PMD level helper. [harry.yoo@oracle.com: fix KASAN build error due to p*d_populate_kernel()] Link: https://lkml.kernel.org/r/20250822020727.202749-1-harry.yoo@oracle.com Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: bibo mao <maobibo@loongson.cn> Cc: Borislav Betkov <bp@alien8.de> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-27mm/kasan: avoid lazy MMU mode hazardsAlexander Gordeev
Functions __kasan_populate_vmalloc() and __kasan_depopulate_vmalloc() use apply_to_pte_range(), which enters lazy MMU mode. In that mode updating PTEs may not be observed until the mode is left. That may lead to a situation in which otherwise correct reads and writes to a PTE using ptep_get(), set_pte(), pte_clear() and other access primitives bring wrong results when the vmalloc shadow memory is being (de-)populated. To avoid these hazards leave the lazy MMU mode before and re-enter it after each PTE manipulation. Link: https://lkml.kernel.org/r/0d2efb7ddddbff6b288fbffeeb10166e90771718.1755528662.git.agordeev@linux.ibm.com Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-27mm/kasan: fix vmalloc shadow memory (de-)population racesAlexander Gordeev
While working on the lazy MMU mode enablement for s390 I hit pretty curious issues in the kasan code. The first is related to a custom kasan-based sanitizer aimed at catching invalid accesses to PTEs and is inspired by [1] conversation. The kasan complains on valid PTE accesses, while the shadow memory is reported as unpoisoned: [ 102.783993] ================================================================== [ 102.784008] BUG: KASAN: out-of-bounds in set_pte_range+0x36c/0x390 [ 102.784016] Read of size 8 at addr 0000780084cf9608 by task vmalloc_test/0/5542 [ 102.784019] [ 102.784040] CPU: 1 UID: 0 PID: 5542 Comm: vmalloc_test/0 Kdump: loaded Tainted: G OE 6.16.0-gcc-ipte-kasan-11657-gb2d930c4950e #340 PREEMPT [ 102.784047] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 102.784049] Hardware name: IBM 8561 T01 703 (LPAR) [ 102.784052] Call Trace: [ 102.784054] [<00007fffe0147ac0>] dump_stack_lvl+0xe8/0x140 [ 102.784059] [<00007fffe0112484>] print_address_description.constprop.0+0x34/0x2d0 [ 102.784066] [<00007fffe011282c>] print_report+0x10c/0x1f8 [ 102.784071] [<00007fffe090785a>] kasan_report+0xfa/0x220 [ 102.784078] [<00007fffe01d3dec>] set_pte_range+0x36c/0x390 [ 102.784083] [<00007fffe01d41c2>] leave_ipte_batch+0x3b2/0xb10 [ 102.784088] [<00007fffe07d3650>] apply_to_pte_range+0x2f0/0x4e0 [ 102.784094] [<00007fffe07e62e4>] apply_to_pmd_range+0x194/0x3e0 [ 102.784099] [<00007fffe07e820e>] __apply_to_page_range+0x2fe/0x7a0 [ 102.784104] [<00007fffe07e86d8>] apply_to_page_range+0x28/0x40 [ 102.784109] [<00007fffe090a3ec>] __kasan_populate_vmalloc+0xec/0x310 [ 102.784114] [<00007fffe090aa36>] kasan_populate_vmalloc+0x96/0x130 [ 102.784118] [<00007fffe0833a04>] alloc_vmap_area+0x3d4/0xf30 [ 102.784123] [<00007fffe083a8ba>] __get_vm_area_node+0x1aa/0x4c0 [ 102.784127] [<00007fffe083c4f6>] __vmalloc_node_range_noprof+0x126/0x4e0 [ 102.784131] [<00007fffe083c980>] __vmalloc_node_noprof+0xd0/0x110 [ 102.784135] [<00007fffe083ca32>] vmalloc_noprof+0x32/0x40 [ 102.784139] [<00007fff608aa336>] fix_size_alloc_test+0x66/0x150 [test_vmalloc] [ 102.784147] [<00007fff608aa710>] test_func+0x2f0/0x430 [test_vmalloc] [ 102.784153] [<00007fffe02841f8>] kthread+0x3f8/0x7a0 [ 102.784159] [<00007fffe014d8b4>] __ret_from_fork+0xd4/0x7d0 [ 102.784164] [<00007fffe299c00a>] ret_from_fork+0xa/0x30 [ 102.784173] no locks held by vmalloc_test/0/5542. [ 102.784176] [ 102.784178] The buggy address belongs to the physical page: [ 102.784186] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x84cf9 [ 102.784198] flags: 0x3ffff00000000000(node=0|zone=1|lastcpupid=0x1ffff) [ 102.784212] page_type: f2(table) [ 102.784225] raw: 3ffff00000000000 0000000000000000 0000000000000122 0000000000000000 [ 102.784234] raw: 0000000000000000 0000000000000000 f200000000000001 0000000000000000 [ 102.784248] page dumped because: kasan: bad access detected [ 102.784250] [ 102.784252] Memory state around the buggy address: [ 102.784260] 0000780084cf9500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 102.784274] 0000780084cf9580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 102.784277] >0000780084cf9600: fd 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 102.784290] ^ [ 102.784293] 0000780084cf9680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 102.784303] 0000780084cf9700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 102.784306] ================================================================== The second issue hits when the custom sanitizer above is not implemented, but the kasan itself is still active: [ 1554.438028] Unable to handle kernel pointer dereference in virtual kernel address space [ 1554.438065] Failing address: 001c0ff0066f0000 TEID: 001c0ff0066f0403 [ 1554.438076] Fault in home space mode while using kernel ASCE. [ 1554.438103] AS:00000000059d400b R2:0000000ffec5c00b R3:00000000c6c9c007 S:0000000314470001 P:00000000d0ab413d [ 1554.438158] Oops: 0011 ilc:2 [#1]SMP [ 1554.438175] Modules linked in: test_vmalloc(E+) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) sunrpc(E) pkey_pckmo(E) uvdevice(E) s390_trng(E) rng_core(E) eadm_sch(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) loop(E) i2c_core(E) drm_panel_orientation_quirks(E) nfnetlink(E) ctcm(E) fsm(E) zfcp(E) scsi_transport_fc(E) diag288_wdt(E) watchdog(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha1_s390(E) sha_common(E) pkey(E) autofs4(E) [ 1554.438319] Unloaded tainted modules: pkey_uv(E):1 hmac_s390(E):2 [ 1554.438354] CPU: 1 UID: 0 PID: 1715 Comm: vmalloc_test/0 Kdump: loaded Tainted: G E 6.16.0-gcc-ipte-kasan-11657-gb2d930c4950e #350 PREEMPT [ 1554.438368] Tainted: [E]=UNSIGNED_MODULE [ 1554.438374] Hardware name: IBM 8561 T01 703 (LPAR) [ 1554.438381] Krnl PSW : 0704e00180000000 00007fffe1d3d6ae (memset+0x5e/0x98) [ 1554.438396] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 [ 1554.438409] Krnl GPRS: 0000000000000001 001c0ff0066f0000 001c0ff0066f0000 00000000000000f8 [ 1554.438418] 00000000000009fe 0000000000000009 0000000000000000 0000000000000002 [ 1554.438426] 0000000000005000 000078031ae655c8 00000feffdcf9f59 0000780258672a20 [ 1554.438433] 0000780243153500 00007f8033780000 00007fffe083a510 00007f7fee7cfa00 [ 1554.438452] Krnl Code: 00007fffe1d3d6a0: eb540008000c srlg %r5,%r4,8 00007fffe1d3d6a6: b9020055 ltgr %r5,%r5 #00007fffe1d3d6aa: a784000b brc 8,00007fffe1d3d6c0 >00007fffe1d3d6ae: 42301000 stc %r3,0(%r1) 00007fffe1d3d6b2: d2fe10011000 mvc 1(255,%r1),0(%r1) 00007fffe1d3d6b8: 41101100 la %r1,256(%r1) 00007fffe1d3d6bc: a757fff9 brctg %r5,00007fffe1d3d6ae 00007fffe1d3d6c0: 42301000 stc %r3,0(%r1) [ 1554.438539] Call Trace: [ 1554.438545] [<00007fffe1d3d6ae>] memset+0x5e/0x98 [ 1554.438552] ([<00007fffe083a510>] remove_vm_area+0x220/0x400) [ 1554.438562] [<00007fffe083a9d6>] vfree.part.0+0x26/0x810 [ 1554.438569] [<00007fff6073bd50>] fix_align_alloc_test+0x50/0x90 [test_vmalloc] [ 1554.438583] [<00007fff6073c73a>] test_func+0x46a/0x6c0 [test_vmalloc] [ 1554.438593] [<00007fffe0283ac8>] kthread+0x3f8/0x7a0 [ 1554.438603] [<00007fffe014d8b4>] __ret_from_fork+0xd4/0x7d0 [ 1554.438613] [<00007fffe299ac0a>] ret_from_fork+0xa/0x30 [ 1554.438622] INFO: lockdep is turned off. [ 1554.438627] Last Breaking-Event-Address: [ 1554.438632] [<00007fffe1d3d65c>] memset+0xc/0x98 [ 1554.438644] Kernel panic - not syncing: Fatal exception: panic_on_oops This series fixes the above issues and is a pre-requisite for the s390 lazy MMU mode implementation. test_vmalloc was used to stress-test the fixes. This patch (of 2): When vmalloc shadow memory is established the modification of the corresponding page tables is not protected by any locks. Instead, the locking is done per-PTE. This scheme however has defects. kasan_populate_vmalloc_pte() - while ptep_get() read is atomic the sequence pte_none(ptep_get()) is not. Doing that outside of the lock might lead to a concurrent PTE update and what could be seen as a shadow memory corruption as result. kasan_depopulate_vmalloc_pte() - by the time a page whose address was extracted from ptep_get() read and cached in a local variable outside of the lock is attempted to get free, could actually be freed already. To avoid these put ptep_get() itself and the code that manipulates the result of the read under lock. In addition, move freeing of the page out of the atomic context. Link: https://lkml.kernel.org/r/cover.1755528662.git.agordeev@linux.ibm.com Link: https://lkml.kernel.org/r/adb258634194593db294c0d1fb35646e894d6ead.1755528662.git.agordeev@linux.ibm.com Link: https://lore.kernel.org/linux-mm/5b0609c9-95ee-4e48-bb6d-98f57c5d2c31@arm.com/ [1] Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-27kunit: kasan_test: disable fortify string checker on kasan_strings() testYeoreum Yun
Similar to commit 09c6304e38e4 ("kasan: test: fix compatibility with FORTIFY_SOURCE") the kernel is panicing in kasan_string(). This is due to the `src` and `ptr` not being hidden from the optimizer which would disable the runtime fortify string checker. Call trace: __fortify_panic+0x10/0x20 (P) kasan_strings+0x980/0x9b0 kunit_try_run_case+0x68/0x190 kunit_generic_run_threadfn_adapter+0x34/0x68 kthread+0x1c4/0x228 ret_from_fork+0x10/0x20 Code: d503233f a9bf7bfd 910003fd 9424b243 (d4210000) ---[ end trace 0000000000000000 ]--- note: kunit_try_catch[128] exited with irqs disabled note: kunit_try_catch[128] exited with preempt_count 1 # kasan_strings: try faulted: last ** replaying previous printk message ** # kasan_strings: try faulted: last line seen mm/kasan/kasan_test_c.c:1600 # kasan_strings: internal error occurred preventing test case from running: -4 Link: https://lkml.kernel.org/r/20250801120236.2962642-1-yeoreum.yun@arm.com Fixes: 73228c7ecc5e ("KASAN: port KASAN Tests to KUnit") Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-05kasan/test: fix protection against compiler elisionJann Horn
The kunit test is using assignments to "static volatile void *kasan_ptr_result" to prevent elision of memory loads, but that's not working: In this variable definition, the "volatile" applies to the "void", not to the pointer. To make "volatile" apply to the pointer as intended, it must follow after the "*". This makes the kasan_memchr test pass again on my system. The kasan_strings test is still failing because all the definitions of load_unaligned_zeropad() are lacking explicit instrumentation hooks and ASAN does not instrument asm() memory operands. Link: https://lkml.kernel.org/r/20250728-kasan-kunit-fix-volatile-v1-1-e7157c9af82d@google.com Fixes: 5f1c8108e7ad ("mm:kasan: fix sparse warnings: Should it be static?") Signed-off-by: Jann Horn <jannh@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Cc: Nihar Chaithanya <niharchaithanya@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-05Merge tag 'mm-stable-2025-08-03-12-35' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: "Significant patch series in this pull request: - "mseal cleanups" (Lorenzo Stoakes) Some mseal cleaning with no intended functional change. - "Optimizations for khugepaged" (David Hildenbrand) Improve khugepaged throughput by batching PTE operations for large folios. This gain is mainly for arm64. - "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes" (Mike Rapoport) A bugfix, additional debug code and cleanups to the execmem code. - "mm/shmem, swap: bugfix and improvement of mTHP swap in" (Kairui Song) Bugfixes, cleanups and performance improvememnts to the mTHP swapin code" * tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (38 commits) mm: mempool: fix crash in mempool_free() for zero-minimum pools mm: correct type for vmalloc vm_flags fields mm/shmem, swap: fix major fault counting mm/shmem, swap: rework swap entry and index calculation for large swapin mm/shmem, swap: simplify swapin path and result handling mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO mm/shmem, swap: tidy up swap entry splitting mm/shmem, swap: tidy up THP swapin checks mm/shmem, swap: avoid redundant Xarray lookup during swapin x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations execmem: drop writable parameter from execmem_fill_trapping_insns() execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP) execmem: move execmem_force_rw() and execmem_restore_rox() before use execmem: rework execmem_cache_free() execmem: introduce execmem_alloc_rw() execmem: drop unused execmem_update_copy() mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped mm/rmap: add anon_vma lifetime debug check mm: remove mm/io-mapping.c ...
2025-08-02kasan: skip quarantine if object is still accessible under RCUJann Horn
Currently, enabling KASAN masks bugs where a lockless lookup path gets a pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be recycled and is insufficiently careful about handling recycled objects: KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine queues, even when it can't actually detect UAF in these objects, and the quarantine prevents fast recycling. When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed after an RCU grace period and put them on the quarantine, while disabling CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately; but that hasn't actually been working. I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I could only trigger this bug in a KASAN build by disabling CONFIG_SLUB_RCU_DEBUG and applying this patch. Link: https://lkml.kernel.org/r/20250723-kasan-tsbrcu-noquarantine-v1-1-846c8645976c@google.com Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-31Merge tag 'mm-stable-2025-07-30-15-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "As usual, many cleanups. The below blurbiage describes 42 patchsets. 21 of those are partially or fully cleanup work. "cleans up", "cleanup", "maintainability", "rationalizes", etc. I never knew the MM code was so dirty. "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes) addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for merging with existing adjacent VMAs. "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park) adds a new kernel module which simplifies the setup and usage of DAMON in production environments. "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig) is a cleanup to the writeback code which removes a couple of pointers from struct writeback_control. "drivers/base/node.c: optimization and cleanups" (Donet Tom) contains largely uncorrelated cleanups to the NUMA node setup and management code. "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman) does some maintenance work on the userfaultfd code. "Readahead tweaks for larger folios" (Ryan Roberts) implements some tuneups for pagecache readahead when it is reading into order>0 folios. "selftests/mm: Tweaks to the cow test" (Mark Brown) provides some cleanups and consistency improvements to the selftests code. "Optimize mremap() for large folios" (Dev Jain) does that. A 37% reduction in execution time was measured in a memset+mremap+munmap microbenchmark. "Remove zero_user()" (Matthew Wilcox) expunges zero_user() in favor of the more modern memzero_page(). "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand) addresses some warts which David noticed in the huge page code. These were not known to be causing any issues at this time. "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park) provides some cleanup and consolidation work in DAMON. "use vm_flags_t consistently" (Lorenzo Stoakes) uses vm_flags_t in places where we were inappropriately using other types. "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy) increases the reliability of large page allocation in the memfd code. "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple) removes several now-unneeded PFN_* flags. "mm/damon: decouple sysfs from core" (SeongJae Park) implememnts some cleanup and maintainability work in the DAMON sysfs layer. "madvise cleanup" (Lorenzo Stoakes) does quite a lot of cleanup/maintenance work in the madvise() code. "madvise anon_name cleanups" (Vlastimil Babka) provides additional cleanups on top or Lorenzo's effort. "Implement numa node notifier" (Oscar Salvador) creates a standalone notifier for NUMA node memory state changes. Previously these were lumped under the more general memory on/offline notifier. "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan) cleans up the pageblock isolation code and fixes a potential issue which doesn't seem to cause any problems in practice. "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park) adds additional drgn- and python-based DAMON selftests which are more comprehensive than the existing selftest suite. "Misc rework on hugetlb faulting path" (Oscar Salvador) fixes a rather obscure deadlock in the hugetlb fault code and follows that fix with a series of cleanups. "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport) rationalizes and cleans up the highmem-specific code in the CMA allocator. "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand) provides cleanups and future-preparedness to the migration code. "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park) adds some tracepoints to some DAMON auto-tuning code. "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park) does that. "mm/damon: misc cleanups" (SeongJae Park) also does what it claims. "mm: folio_pte_batch() improvements" (David Hildenbrand) cleans up the large folio PTE batching code. "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park) facilitates dynamic alteration of DAMON's inter-node allocation policy. "Remove unmap_and_put_page()" (Vishal Moola) provides a couple of page->folio conversions. "mm: per-node proactive reclaim" (Davidlohr Bueso) implements a per-node control of proactive reclaim - beyond the current memcg-based implementation. "mm/damon: remove damon_callback" (SeongJae Park) replaces the damon_callback interface with a more general and powerful damon_call()+damos_walk() interface. "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes) implements a number of mremap cleanups (of course) in preparation for adding new mremap() functionality: newly permit the remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It still excludes some specialized situations where this cannot be performed reliably. "drop hugetlb_free_pgd_range()" (Anthony Yznaga) switches some sparc hugetlb code over to the generic version and removes the thus-unneeded hugetlb_free_pgd_range(). "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park) augments the present userspace-requested update of DAMON sysfs monitoring files. Automatic update is now provided, along with a tunable to control the update interval. "Some randome fixes and cleanups to swapfile" (Kemeng Shi) does what is claims. "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand) provides (and uses) a means by which debug-style functions can grab a copy of a pageframe and inspect it locklessly without tripping over the races inherent in operating on the live pageframe directly. "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan) addresses the large contention issues which can be triggered by reads from that procfs file. Latencies are reduced by more than half in some situations. The series also introduces several new selftests for the /proc/pid/maps interface. "__folio_split() clean up" (Zi Yan) cleans up __folio_split()! "Optimize mprotect() for large folios" (Dev Jain) provides some quite large (>3x) speedups to mprotect() when dealing with large folios. "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian) does some cleanup work in the selftests code. "tools/testing: expand mremap testing" (Lorenzo Stoakes) extends the mremap() selftest in several ways, including adding more checking of Lorenzo's recently added "permit mremap() move of multiple VMAs" feature. "selftests/damon/sysfs.py: test all parameters" (SeongJae Park) extends the DAMON sysfs interface selftest so that it tests all possible user-requested parameters. Rather than the present minimal subset" * tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits) MAINTAINERS: add missing headers to mempory policy & migration section MAINTAINERS: add missing file to cgroup section MAINTAINERS: add MM MISC section, add missing files to MISC and CORE MAINTAINERS: add missing zsmalloc file MAINTAINERS: add missing files to page alloc section MAINTAINERS: add missing shrinker files MAINTAINERS: move memremap.[ch] to hotplug section MAINTAINERS: add missing mm_slot.h file THP section MAINTAINERS: add missing interval_tree.c to memory mapping section MAINTAINERS: add missing percpu-internal.h file to per-cpu section mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() selftests/damon: introduce _common.sh to host shared function selftests/damon/sysfs.py: test runtime reduction of DAMON parameters selftests/damon/sysfs.py: test non-default parameters runtime commit selftests/damon/sysfs.py: generalize DAMON context commit assertion selftests/damon/sysfs.py: generalize monitoring attributes commit assertion selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion selftests/damon/sysfs.py: test DAMOS filters commitment selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion selftests/damon/sysfs.py: test DAMOS destinations commitment ...
2025-07-19kasan: use vmalloc_dump_obj() for vmalloc error reportsMarco Elver
Since 6ee9b3d84775 ("kasan: remove kasan_find_vm_area() to prevent possible deadlock"), more detailed info about the vmalloc mapping and the origin was dropped due to potential deadlocks. While fixing the deadlock is necessary, that patch was too quick in killing an otherwise useful feature, and did no due-diligence in understanding if an alternative option is available. Restore printing more helpful vmalloc allocation info in KASAN reports with the help of vmalloc_dump_obj(). Example report: | BUG: KASAN: vmalloc-out-of-bounds in vmalloc_oob+0x4c9/0x610 | Read of size 1 at addr ffffc900002fd7f3 by task kunit_try_catch/493 | | CPU: [...] | Call Trace: | <TASK> | dump_stack_lvl+0xa8/0xf0 | print_report+0x17e/0x810 | kasan_report+0x155/0x190 | vmalloc_oob+0x4c9/0x610 | [...] | | The buggy address belongs to a 1-page vmalloc region starting at 0xffffc900002fd000 allocated at vmalloc_oob+0x36/0x610 | The buggy address belongs to the physical page: | page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x126364 | flags: 0x200000000000000(node=0|zone=2) | raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000 | raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 | page dumped because: kasan: bad access detected | | [..] Link: https://lkml.kernel.org/r/20250716152448.3877201-1-elver@google.com Fixes: 6ee9b3d84775 ("kasan: remove kasan_find_vm_area() to prevent possible deadlock") Signed-off-by: Marco Elver <elver@google.com> Suggested-by: Uladzislau Rezki <urezki@gmail.com> Acked-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Yunseong Kim <ysk@kzalloc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-12Merge branch 'mm-hotfixes-stable' into mm-stable to pick up changes whichAndrew Morton
are required for a merge of the series "mm: folio_pte_batch() improvements".
2025-07-09mm: unexport globally copy_to_kernel_nofaultSabyrzhan Tasbolatov
copy_to_kernel_nofault() is an internal helper which should not be visible to loadable modules – exporting it would give exploit code a cheap oracle to probe kernel addresses. Instead, keep the helper un-exported and compile the kunit case that exercises it only when mm/kasan/kasan_test.o is linked into vmlinux. [snovitoll@gmail.com: add a brief comment to `#ifndef MODULE`] Link: https://lkml.kernel.org/r/20250622141142.79332-1-snovitoll@gmail.com Link: https://lkml.kernel.org/r/20250622051906.67374-1-snovitoll@gmail.com Fixes: ca79a00bb9a8 ("kasan: migrate copy_user_test to kunit") Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Marco Elver <elver@google.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09kasan: remove kasan_find_vm_area() to prevent possible deadlockYeoreum Yun
find_vm_area() couldn't be called in atomic_context. If find_vm_area() is called to reports vm area information, kasan can trigger deadlock like: CPU0 CPU1 vmalloc(); alloc_vmap_area(); spin_lock(&vn->busy.lock) spin_lock_bh(&some_lock); <interrupt occurs> <in softirq> spin_lock(&some_lock); <access invalid address> kasan_report(); print_report(); print_address_description(); kasan_find_vm_area(); find_vm_area(); spin_lock(&vn->busy.lock) // deadlock! To prevent possible deadlock while kasan reports, remove kasan_find_vm_area(). Link: https://lkml.kernel.org/r/20250703181018.580833-1-yeoreum.yun@arm.com Fixes: c056a364e954 ("kasan: print virtual mapping info in reports") Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reported-by: Yunseong Kim <ysk@kzalloc.com> Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-28Merge tag 'hardening-v6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - Update overflow helpers to ease refactoring of on-stack flex array instances (Gustavo A. R. Silva, Kees Cook) - lkdtm: Use SLAB_NO_MERGE instead of constructors (Harry Yoo) - Simplify CONFIG_CC_HAS_COUNTED_BY (Jan Hendrik Farr) - Disable u64 usercopy KUnit test on 32-bit SPARC (Thomas Weißschuh) - Add missed designated initializers now exposed by fixed randstruct (Nathan Chancellor, Kees Cook) - Document compilers versions for __builtin_dynamic_object_size - Remove ARM_SSP_PER_TASK GCC plugin - Fix GCC plugin randstruct, add selftests, and restore COMPILE_TEST builds - Kbuild: induce full rebuilds when dependencies change with GCC plugins, the Clang sanitizer .scl file, or the randstruct seed. - Kbuild: Switch from -Wvla to -Wvla-larger-than=1 - Correct several __nonstring uses for -Wunterminated-string-initialization * tag 'hardening-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits) Revert "hardening: Disable GCC randstruct for COMPILE_TEST" lib/tests: randstruct: Add deep function pointer layout test lib/tests: Add randstruct KUnit test randstruct: gcc-plugin: Remove bogus void member net: qede: Initialize qede_ll_ops with designated initializer scsi: qedf: Use designated initializer for struct qed_fcoe_cb_ops md/bcache: Mark __nonstring look-up table integer-wrap: Force full rebuild when .scl file changes randstruct: Force full rebuild when seed changes gcc-plugins: Force full rebuild when plugins change kbuild: Switch from -Wvla to -Wvla-larger-than=1 hardening: simplify CONFIG_CC_HAS_COUNTED_BY overflow: Fix direct struct member initialization in _DEFINE_FLEX() kunit/overflow: Add tests for STACK_FLEX_ARRAY_SIZE() helper overflow: Add STACK_FLEX_ARRAY_SIZE() helper input/joystick: magellan: Mark __nonstring look-up table const watchdog: exar: Shorten identity name to fit correctly mod_devicetable: Enlarge the maximum platform_device_id name length overflow: Clarify expectations for getting DEFINE_FLEX variable sizes compiler_types: Identify compiler versions for __builtin_dynamic_object_size ...
2025-05-20kasan: avoid sleepable page allocation from atomic contextAlexander Gordeev
apply_to_pte_range() enters the lazy MMU mode and then invokes kasan_populate_vmalloc_pte() callback on each page table walk iteration. However, the callback can go into sleep when trying to allocate a single page, e.g. if an architecutre disables preemption on lazy MMU mode enter. On s390 if make arch_enter_lazy_mmu_mode() -> preempt_enable() and arch_leave_lazy_mmu_mode() -> preempt_disable(), such crash occurs: [ 0.663336] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321 [ 0.663348] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd [ 0.663358] preempt_count: 1, expected: 0 [ 0.663366] RCU nest depth: 0, expected: 0 [ 0.663375] no locks held by kthreadd/2. [ 0.663383] Preemption disabled at: [ 0.663386] [<0002f3284cbb4eda>] apply_to_pte_range+0xfa/0x4a0 [ 0.663405] CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 6.15.0-rc5-gcc-kasan-00043-gd76bb1ebb558-dirty #162 PREEMPT [ 0.663408] Hardware name: IBM 3931 A01 701 (KVM/Linux) [ 0.663409] Call Trace: [ 0.663410] [<0002f3284c385f58>] dump_stack_lvl+0xe8/0x140 [ 0.663413] [<0002f3284c507b9e>] __might_resched+0x66e/0x700 [ 0.663415] [<0002f3284cc4f6c0>] __alloc_frozen_pages_noprof+0x370/0x4b0 [ 0.663419] [<0002f3284ccc73c0>] alloc_pages_mpol+0x1a0/0x4a0 [ 0.663421] [<0002f3284ccc8518>] alloc_frozen_pages_noprof+0x88/0xc0 [ 0.663424] [<0002f3284ccc8572>] alloc_pages_noprof+0x22/0x120 [ 0.663427] [<0002f3284cc341ac>] get_free_pages_noprof+0x2c/0xc0 [ 0.663429] [<0002f3284cceba70>] kasan_populate_vmalloc_pte+0x50/0x120 [ 0.663433] [<0002f3284cbb4ef8>] apply_to_pte_range+0x118/0x4a0 [ 0.663435] [<0002f3284cbc7c14>] apply_to_pmd_range+0x194/0x3e0 [ 0.663437] [<0002f3284cbc99be>] __apply_to_page_range+0x2fe/0x7a0 [ 0.663440] [<0002f3284cbc9e88>] apply_to_page_range+0x28/0x40 [ 0.663442] [<0002f3284ccebf12>] kasan_populate_vmalloc+0x82/0xa0 [ 0.663445] [<0002f3284cc1578c>] alloc_vmap_area+0x34c/0xc10 [ 0.663448] [<0002f3284cc1c2a6>] __get_vm_area_node+0x186/0x2a0 [ 0.663451] [<0002f3284cc1e696>] __vmalloc_node_range_noprof+0x116/0x310 [ 0.663454] [<0002f3284cc1d950>] __vmalloc_node_noprof+0xd0/0x110 [ 0.663457] [<0002f3284c454b88>] alloc_thread_stack_node+0xf8/0x330 [ 0.663460] [<0002f3284c458d56>] dup_task_struct+0x66/0x4d0 [ 0.663463] [<0002f3284c45be90>] copy_process+0x280/0x4b90 [ 0.663465] [<0002f3284c460940>] kernel_clone+0xd0/0x4b0 [ 0.663467] [<0002f3284c46115e>] kernel_thread+0xbe/0xe0 [ 0.663469] [<0002f3284c4e440e>] kthreadd+0x50e/0x7f0 [ 0.663472] [<0002f3284c38c04a>] __ret_from_fork+0x8a/0xf0 [ 0.663475] [<0002f3284ed57ff2>] ret_from_fork+0xa/0x38 Instead of allocating single pages per-PTE, bulk-allocate the shadow memory prior to applying kasan_populate_vmalloc_pte() callback on a page range. Link: https://lkml.kernel.org/r/c61d3560297c93ed044f0b1af085610353a06a58.1747316918.git.agordeev@linux.ibm.com Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Cc: Daniel Axtens <dja@axtens.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-08kbuild: Switch from -Wvla to -Wvla-larger-than=1Kees Cook
Variable Length Arrays (VLAs) on the stack must not be used in the kernel. Function parameter VLAs[1] should be usable, but -Wvla will warn for those. For example, this will produce a warning but it is not using a stack VLA: int something(size_t n, int array[n]) { ... Clang has no way yet to distinguish between the VLA types[2], so depend on GCC for now to keep stack VLAs out of the tree by using GCC's -Wvla-larger-than=N option (though GCC may split -Wvla similarly[3] to how Clang is planning to). While GCC 8+ supports -Wvla-larger-than, only 9+ supports ...=0[4], so use -Wvla-larger-than=1. Adjust mm/kasan/Makefile to remove it from CFLAGS (GCC <9 appears unable to disable the warning correctly[5]). The VLA usage in lib/test_ubsan.c was removed in commit 9d7ca61b1366 ("lib/test_ubsan.c: VLA no longer used in kernel") so the lib/Makefile disabling of VLA checking can be entirely removed. Link: https://en.cppreference.com/w/c/language/array [1] Link: https://github.com/llvm/llvm-project/issues/57098 [2] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98217 [3] Link: https://lore.kernel.org/lkml/7780883c-0ac8-4aaa-b850-469e33b50672@linux.ibm.com/ [4] Link: https://lore.kernel.org/r/202505071331.4iOzqmuE-lkp@intel.com/ [5] Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Link: https://lore.kernel.org/r/20250418213235.work.532-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-04-18Merge tag 'hardening-v6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - lib/prime_numbers: KUnit test should not select PRIME_NUMBERS (Geert Uytterhoeven) - ubsan: Fix panic from test_ubsan_out_of_bounds (Mostafa Saleh) - ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP (Nathan Chancellor) - string: Add load_unaligned_zeropad() code path to sized_strscpy() (Peter Collingbourne) - kasan: Add strscpy() test to trigger tag fault on arm64 (Vincenzo Frascino) - Disable GCC randstruct for COMPILE_TEST * tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lib/prime_numbers: KUnit test should not select PRIME_NUMBERS ubsan: Fix panic from test_ubsan_out_of_bounds lib/Kconfig.ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP hardening: Disable GCC randstruct for COMPILE_TEST kasan: Add strscpy() test to trigger tag fault on arm64 string: Add load_unaligned_zeropad() code path to sized_strscpy()
2025-04-15kasan: Add strscpy() test to trigger tag fault on arm64Vincenzo Frascino
When we invoke strscpy() with a maximum size of N bytes, it assumes that: - It can always read N bytes from the source. - It always write N bytes (zero-padded) to the destination. On aarch64 with Memory Tagging Extension enabled if we pass an N that is bigger then the source buffer, it would previously trigger an MTE fault. Implement a KASAN KUnit test that triggers the issue with the previous implementation of read_word_at_a_time() on aarch64 with MTE enabled. Cc: Will Deacon <will@kernel.org> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Peter Collingbourne <pcc@google.com> Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://linux-review.googlesource.com/id/If88e396b9e7c058c1a4b5a252274120e77b1898a Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250403000703.2584581-3-pcc@google.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-04-11mm/kasan: add module decriptionArnd Bergmann
Modules without a description now cause a warning: WARNING: modpost: missing MODULE_DESCRIPTION() in mm/kasan/kasan_test.o [akpm@linux-foundation.org: update description text, per Andrey] Link: https://lkml.kernel.org/r/20250324173242.1501003-9-arnd@kernel.org Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Cc: Macro Elver <elver@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nihar Chaithanya <niharchaithanya@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Cc: Stehen Rothwell <sfr@canb.auug.org.au> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01mm/kasan: use SLAB_NO_MERGE flag instead of an empty constructorHarry Yoo
Use SLAB_NO_MERGE flag to prevent merging instead of providing an empty constructor. Using an empty constructor in this manner is an abuse of slab interface. The SLAB_NO_MERGE flag should be used with caution, but in this case, it is acceptable as the cache is intended solely for debugging purposes. No functional changes intended. Link: https://lkml.kernel.org/r/20250318015926.1629748-1-harry.yoo@oracle.com Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17kasan: don't call find_vm_area() in a PREEMPT_RT kernelWaiman Long
The following bug report was found when running a PREEMPT_RT debug kernel. BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 140605, name: kunit_try_catch preempt_count: 1, expected: 0 Call trace: rt_spin_lock+0x70/0x140 find_vmap_area+0x84/0x168 find_vm_area+0x1c/0x50 print_address_description.constprop.0+0x2a0/0x320 print_report+0x108/0x1f8 kasan_report+0x90/0xc8 Since commit e30a0361b851 ("kasan: make report_lock a raw spinlock"), report_lock was changed to raw_spinlock_t to fix another similar PREEMPT_RT problem. That alone isn't enough to cover other corner cases. print_address_description() is always invoked under the report_lock. The context under this lock is always atomic even on PREEMPT_RT. find_vm_area() acquires vmap_node::busy.lock which is a spinlock_t, becoming a sleeping lock on PREEMPT_RT and must not be acquired in atomic context. Don't invoke find_vm_area() on PREEMPT_RT and just print the address. Non-PREEMPT_RT builds remain unchanged. Add a DEFINE_WAIT_OVERRIDE_MAP() macro to tell lockdep that this lock nesting is allowed because the PREEMPT_RT part (which is invalid) has been taken care of. This macro was first introduced in commit 0cce06ba859a ("debugobjects,locking: Annotate debug_object_fill_pool() wait type violation"). Link: https://lkml.kernel.org/r/20250217204402.60533-1-longman@redhat.com Fixes: e30a0361b851 ("kasan: make report_lock a raw spinlock") Signed-off-by: Waiman Long <longman@redhat.com> Suggested-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mariano Pache <npache@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()Thorsten Blum
Remove hard-coded strings by using the str_on_off() helper function. Link: https://lkml.kernel.org/r/20250116062403.2496-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()Thorsten Blum
Remove hard-coded strings by using the str_on_off() helper function. Link: https://lkml.kernel.org/r/20250114150935.780869-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25kasan: use correct kernel-doc formatRandy Dunlap
Use the correct kernel-doc character following function parameters or struct members (':' instead of '-') to eliminate kernel-doc warnings. kasan.h:509: warning: Function parameter or struct member 'addr' not described in 'kasan_poison' kasan.h:509: warning: Function parameter or struct member 'size' not described in 'kasan_poison' kasan.h:509: warning: Function parameter or struct member 'value' not described in 'kasan_poison' kasan.h:509: warning: Function parameter or struct member 'init' not described in 'kasan_poison' kasan.h:522: warning: Function parameter or struct member 'addr' not described in 'kasan_unpoison' kasan.h:522: warning: Function parameter or struct member 'size' not described in 'kasan_unpoison' kasan.h:522: warning: Function parameter or struct member 'init' not described in 'kasan_unpoison' kasan.h:539: warning: Function parameter or struct member 'address' not described in 'kasan_poison_last_granule' kasan.h:539: warning: Function parameter or struct member 'size' not described in 'kasan_poison_last_granule' Link: https://lkml.kernel.org/r/20250111063249.910975-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm:kasan: fix sparse warnings: Should it be static?Nihar Chaithanya
Yes, when making the global variables kasan_ptr_result and kasan_int_result as static volatile, the warnings are removed and the variable and assignments are retained, but when just static is used I understand that it might be optimized. Add a fix making the global varaibles - static volatile, removing the warnings: mm/kasan/kasan_test.c:36:6: warning: symbol 'kasan_ptr_result' was not declared. Should it be static? mm/kasan/kasan_test.c:37:5: warning: symbol 'kasan_int_result' was not declared. Should it be static? Link: https://lkml.kernel.org/r/20241011114537.35664-1-niharchaithanya@gmail.com Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312261010.o0lRiI9b-lkp@intel.com/ Signed-off-by: Nihar Chaithanya <niharchaithanya@gmail.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13kasan: make kasan_record_aux_stack_noalloc() the default behaviourPeter Zijlstra
kasan_record_aux_stack_noalloc() was introduced to record a stack trace without allocating memory in the process. It has been added to callers which were invoked while a raw_spinlock_t was held. More and more callers were identified and changed over time. Is it a good thing to have this while functions try their best to do a locklessly setup? The only downside of having kasan_record_aux_stack() not allocate any memory is that we end up without a stacktrace if stackdepot runs out of memory and at the same stacktrace was not recorded before To quote Marco Elver from https://lore.kernel.org/all/CANpmjNPmQYJ7pv1N3cuU8cP18u7PP_uoZD8YxwZd4jtbof9nVQ@mail.gmail.com/ | I'd be in favor, it simplifies things. And stack depot should be | able to replenish its pool sufficiently in the "non-aux" cases | i.e. regular allocations. Worst case we fail to record some | aux stacks, but I think that's only really bad if there's a bug | around one of these allocations. In general the probabilities | of this being a regression are extremely small [...] Make the kasan_record_aux_stack_noalloc() behaviour default as kasan_record_aux_stack(). [bigeasy@linutronix.de: dressed the diff as patch] Link: https://lkml.kernel.org/r/20241122155451.Mb2pmeyJ@linutronix.de Fixes: 7cb3007ce2da ("kasan: generic: introduce kasan_record_aux_stack_noalloc()") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: syzbot+39f85d612b7c20d8db48@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67275485.050a0220.3c8d68.0a37.GAE@google.com Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Ben Segall <bsegall@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: <kasan-dev@googlegroups.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: syzkaller-bugs@googlegroups.com Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-08Merge tag 'mm-hotfixes-stable-2024-12-07-22-39' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "24 hotfixes. 17 are cc:stable. 15 are MM and 9 are non-MM. The usual bunch of singletons - please see the relevant changelogs for details" * tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits) iio: magnetometer: yas530: use signed integer type for clamp limits sched/numa: fix memory leak due to the overwritten vma->numab_state mm/damon: fix order of arguments in damos_before_apply tracepoint lib: stackinit: hide never-taken branch from compiler mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio() scatterlist: fix incorrect func name in kernel-doc mm: correct typo in MMAP_STATE() macro mm: respect mmap hint address when aligning for THP mm: memcg: declare do_memsw_account inline mm/codetag: swap tags when migrate pages ocfs2: update seq_file index in ocfs2_dlm_seq_next stackdepot: fix stack_depot_save_flags() in NMI context mm: open-code page_folio() in dump_page() mm: open-code PageTail in folio_flags() and const_folio_flags() mm: fix vrealloc()'s KASAN poisoning logic Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()" selftests/damon: add _damon_sysfs.py to TEST_FILES selftest: hugetlb_dio: fix test naming ocfs2: free inode when ocfs2_get_init_inode() fails nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry() ...
2024-12-05kasan: make report_lock a raw spinlockJared Kangas
If PREEMPT_RT is enabled, report_lock is a sleeping spinlock and must not be locked when IRQs are disabled. However, KASAN reports may be triggered in such contexts. For example: char *s = kzalloc(1, GFP_KERNEL); kfree(s); local_irq_disable(); char c = *s; /* KASAN report here leads to spin_lock() */ local_irq_enable(); Make report_spinlock a raw spinlock to prevent rescheduling when PREEMPT_RT is enabled. Link: https://lkml.kernel.org/r/20241119210234.1602529-1-jkangas@redhat.com Fixes: 342a93247e08 ("locking/spinlock: Provide RT variant header: <linux/spinlock_rt.h>") Signed-off-by: Jared Kangas <jkangas@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-02module: Convert symbol namespace to string literalPeter Zijlstra
Clean up the existing export namespace code along the same lines of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ && $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]*\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-11-26Merge tag 'rust-6.13' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Enable a series of lints, including safety-related ones, e.g. the compiler will now warn about missing safety comments, as well as unnecessary ones. How safety documentation is organized is a frequent source of review comments, thus having the compiler guide new developers on where they are expected (and where not) is very nice. - Start using '#[expect]': an interesting feature in Rust (stabilized in 1.81.0) that makes the compiler warn if an expected warning was _not_ emitted. This is useful to avoid forgetting cleaning up locally ignored diagnostics ('#[allow]'s). - Introduce '.clippy.toml' configuration file for Clippy, the Rust linter, which will allow us to tweak its behaviour. For instance, our first use cases are declaring a disallowed macro and, more importantly, enabling the checking of private items. - Lints-related fixes and cleanups related to the items above. - Migrate from 'receiver_trait' to 'arbitrary_self_types': to get the kernel into stable Rust, one of the major pieces of the puzzle is the support to write custom types that can be used as 'self', i.e. as receivers, since the kernel needs to write types such as 'Arc' that common userspace Rust would not. 'arbitrary_self_types' has been accepted to become stable, and this is one of the steps required to get there. - Remove usage of the 'new_uninit' unstable feature. - Use custom C FFI types. Includes a new 'ffi' crate to contain our custom mapping, instead of using the standard library 'core::ffi' one. The actual remapping will be introduced in a later cycle. - Map '__kernel_{size_t,ssize_t,ptrdiff_t}' to 'usize'/'isize' instead of 32/64-bit integers. - Fix 'size_t' in bindgen generated prototypes of C builtins. - Warn on bindgen < 0.69.5 and libclang >= 19.1 due to a double issue in the projects, which we managed to trigger with the upcoming tracepoint support. It includes a build test since some distributions backported the fix (e.g. Debian -- thanks!). All major distributions we list should be now OK except Ubuntu non-LTS. 'macros' crate: - Adapt the build system to be able run the doctests there too; and clean up and enable the corresponding doctests. 'kernel' crate: - Add 'alloc' module with generic kernel allocator support and remove the dependency on the Rust standard library 'alloc' and the extension traits we used to provide fallible methods with flags. Add the 'Allocator' trait and its implementations '{K,V,KV}malloc'. Add the 'Box' type (a heap allocation for a single value of type 'T' that is also generic over an allocator and considers the kernel's GFP flags) and its shorthand aliases '{K,V,KV}Box'. Add 'ArrayLayout' type. Add 'Vec' (a contiguous growable array type) and its shorthand aliases '{K,V,KV}Vec', including iterator support. For instance, now we may write code such as: let mut v = KVec::new(); v.push(1, GFP_KERNEL)?; assert_eq!(&v, &[1]); Treewide, move as well old users to these new types. - 'sync' module: add global lock support, including the 'GlobalLockBackend' trait; the 'Global{Lock,Guard,LockedBy}' types and the 'global_lock!' macro. Add the 'Lock::try_lock' method. - 'error' module: optimize 'Error' type to use 'NonZeroI32' and make conversion functions public. - 'page' module: add 'page_align' function. - Add 'transmute' module with the existing 'FromBytes' and 'AsBytes' traits. - 'block::mq::request' module: improve rendered documentation. - 'types' module: extend 'Opaque' type documentation and add simple examples for the 'Either' types. drm/panic: - Clean up a series of Clippy warnings. Documentation: - Add coding guidelines for lints and the '#[expect]' feature. - Add Ubuntu to the list of distributions in the Quick Start guide. MAINTAINERS: - Add Danilo Krummrich as maintainer of the new 'alloc' module. And a few other small cleanups and fixes" * tag 'rust-6.13' of https://github.com/Rust-for-Linux/linux: (82 commits) rust: alloc: Fix `ArrayLayout` allocations docs: rust: remove spurious item in `expect` list rust: allow `clippy::needless_lifetimes` rust: warn on bindgen < 0.69.5 and libclang >= 19.1 rust: use custom FFI integer types rust: map `__kernel_size_t` and friends also to usize/isize rust: fix size_t in bindgen prototypes of C builtins rust: sync: add global lock support rust: macros: enable the rest of the tests rust: macros: enable paste! use from macro_rules! rust: enable macros::module! tests rust: kbuild: expand rusttest target for macros rust: types: extend `Opaque` documentation rust: block: fix formatting of `kernel::block::mq::request` module rust: macros: fix documentation of the paste! macro rust: kernel: fix THIS_MODULE header path in ThisModule doc comment rust: page: add Rust version of PAGE_ALIGN rust: helpers: remove unnecessary header includes rust: exports: improve grammar in commentary drm/panic: allow verbose version check ...
2024-11-25Merge tag 'slab-for-6.13-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - Add new slab_strict_numa boot parameter to enforce per-object memory policies on top of slab folio policies, for systems where saving cost of remote accesses is more important than minimizing slab allocation overhead (Christoph Lameter) - Fix for freeptr_offset alignment check being too strict for m68k (Geert Uytterhoeven) - krealloc() fixes for not violating __GFP_ZERO guarantees on krealloc() when slub_debug (redzone and object tracking) is enabled (Feng Tang) - Fix a memory leak in case sysfs registration fails for a slab cache, and also no longer fail to create the cache in that case (Hyeonggon Yoo) - Fix handling of detected consistency problems (due to buggy slab user) with slub_debug enabled, so that it does not cause further list corruption bugs (yuan.gao) - Code cleanup and kerneldocs polishing (Zhen Lei, Vlastimil Babka) * tag 'slab-for-6.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: slab: Fix too strict alignment check in create_cache() mm/slab: Allow cache creation to proceed even if sysfs registration fails mm/slub: Avoid list corruption when removing a slab from the full list mm/slub, kunit: Add testcase for krealloc redzone and zeroing mm/slub: Improve redzone check and zeroing for krealloc() mm/slub: Consider kfence case for get_orig_size() SLUB: Add support for per object memory policies mm, slab: add kerneldocs for common SLAB_ flags mm/slab: remove duplicate check in create_cache() mm/slub: Move krealloc() and related code to slub.c mm/kasan: Don't store metadata inside kmalloc object when slub_debug_orig_size is on
2024-11-11mm: define general function pXd_init()Bibo Mao
pud_init(), pmd_init() and kernel_pte_init() are duplicated defined in file kasan.c and sparse-vmemmap.c as weak functions. Move them to generic header file pgtable.h, architecture can redefine them. Link: https://lkml.kernel.org/r/20241104070712.52902-1-maobibo@loongson.cn Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: WANG Xuerui <kernel@xen0n.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_callerNihar Chaithanya
The Kunit tests for kmalloc_track_caller and kmalloc_node_track_caller were missing in kasan_test_c.c, which check that these functions poison the memory properly. Add a Kunit test: -> kmalloc_tracker_caller_oob_right(): This includes out-of-bounds access test for kmalloc_track_caller and kmalloc_node_track_caller. Link: https://lkml.kernel.org/r/20241014190128.442059-1-niharchaithanya@gmail.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216509 Signed-off-by: Nihar Chaithanya <niharchaithanya@gmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOWSabyrzhan Tasbolatov
During running KASAN Kunit tests with CONFIG_KASAN enabled, the following "warning" is reported by kunit framework: # kasan_atomics: Test should be marked slow (runtime: 2.604703115s) It took 2.6 seconds on my PC (Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz), apparently, due to multiple atomic checks in kasan_atomics_helper(). Let's mark it with KUNIT_CASE_SLOW which reports now as: # kasan_atomics.speed: slow Link: https://lkml.kernel.org/r/20241101184011.3369247-3-snovitoll@gmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbolsSabyrzhan Tasbolatov
Patch series "kasan: few improvements on kunit tests". This patch series addresses the issue [1] with KASAN symbols used in the Kunit test, but exported as EXPORT_SYMBOL_GPL. Also a small tweak of marking kasan_atomics() as KUNIT_CASE_SLOW to avoid kunit report that the test should be marked as slow. This patch (of 2): Replace EXPORT_SYMBOL_GPL with EXPORT_SYMBOL_IF_KUNIT to mark the symbols as visible only if CONFIG_KUNIT is enabled. KASAN Kunit test should import the namespace EXPORTED_FOR_KUNIT_TESTING to use these marked symbols. Link: https://lkml.kernel.org/r/20241101184011.3369247-1-snovitoll@gmail.com Link: https://lkml.kernel.org/r/20241101184011.3369247-2-snovitoll@gmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Reported-by: Andrey Konovalov <andreyknvl@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218315 Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11kasan: delete CONFIG_KASAN_MODULE_TESTSabyrzhan Tasbolatov
Since we've migrated all tests to the KUnit framework, we can delete CONFIG_KASAN_MODULE_TEST and mentioning of it in the documentation as well. I've used the online translator to modify the non-English documentation. [snovitoll@gmail.com: fix indentation in translation] Link: https://lkml.kernel.org/r/20241020042813.3223449-1-snovitoll@gmail.com Link: https://lkml.kernel.org/r/20241016131802.3115788-4-snovitoll@gmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alex Shi <alexs@kernel.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11kasan: migrate copy_user_test to kunitSabyrzhan Tasbolatov
Migrate the copy_user_test to the KUnit framework to verify out-of-bound detection via KASAN reports in copy_from_user(), copy_to_user() and their static functions. This is the last migrated test in kasan_test_module.c, therefore delete the file. [arnd@arndb.de: export copy_to_kernel_nofault] Link: https://lkml.kernel.org/r/20241018151112.3533820-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20241016131802.3115788-3-snovitoll@gmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alex Shi <alexs@kernel.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm, kasan, kmsan: instrument copy_from/to_kernel_nofaultSabyrzhan Tasbolatov
Instrument copy_from_kernel_nofault() with KMSAN for uninitialized kernel memory check and copy_to_kernel_nofault() with KASAN, KCSAN to detect the memory corruption. syzbot reported that bpf_probe_read_kernel() kernel helper triggered KASAN report via kasan_check_range() which is not the expected behaviour as copy_from_kernel_nofault() is meant to be a non-faulting helper. Solution is, suggested by Marco Elver, to replace KASAN, KCSAN check in copy_from_kernel_nofault() with KMSAN detection of copying uninitilaized kernel memory. In copy_to_kernel_nofault() we can retain instrument_write() explicitly for the memory corruption instrumentation. copy_to_kernel_nofault() is tested on x86_64 and arm64 with CONFIG_KASAN_SW_TAGS. On arm64 with CONFIG_KASAN_HW_TAGS, kunit test currently fails. Need more clarification on it. [akpm@linux-foundation.org: fix comment layout, per checkpatch Link: https://lore.kernel.org/linux-mm/CANpmjNMAVFzqnCZhEity9cjiqQ9CVN1X7qeeeAp_6yKjwKo8iw@mail.gmail.com/ Link: https://lkml.kernel.org/r/20241011035310.2982017-1-snovitoll@gmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Reviewed-by: Marco Elver <elver@google.com> Reported-by: syzbot+61123a5daeb9f7454599@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=61123a5daeb9f7454599 Reported-by: Andrey Konovalov <andreyknvl@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=210505 Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> [KASAN] Tested-by: Andrey Konovalov <andreyknvl@gmail.com> [KASAN] Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05mm/vmalloc: combine all TLB flush operations of KASAN shadow virtual address ↵Adrian Huang
into one operation When compiling kernel source 'make -j $(nproc)' with the up-and-running KASAN-enabled kernel on a 256-core machine, the following soft lockup is shown: watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [kworker/28:1:1760] CPU: 28 PID: 1760 Comm: kworker/28:1 Kdump: loaded Not tainted 6.10.0-rc5 #95 Workqueue: events drain_vmap_area_work RIP: 0010:smp_call_function_many_cond+0x1d8/0xbb0 Code: 38 c8 7c 08 84 c9 0f 85 49 08 00 00 8b 45 08 a8 01 74 2e 48 89 f1 49 89 f7 48 c1 e9 03 41 83 e7 07 4c 01 e9 41 83 c7 03 f3 90 <0f> b6 01 41 38 c7 7c 08 84 c0 0f 85 d4 06 00 00 8b 45 08 a8 01 75 RSP: 0018:ffffc9000cb3fb60 EFLAGS: 00000202 RAX: 0000000000000011 RBX: ffff8883bc4469c0 RCX: ffffed10776e9949 RDX: 0000000000000002 RSI: ffff8883bb74ca48 RDI: ffffffff8434dc50 RBP: ffff8883bb74ca40 R08: ffff888103585dc0 R09: ffff8884533a1800 R10: 0000000000000004 R11: ffffffffffffffff R12: ffffed1077888d39 R13: dffffc0000000000 R14: ffffed1077888d38 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff8883bc400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005577b5c8d158 CR3: 0000000004850000 CR4: 0000000000350ef0 Call Trace: <IRQ> ? watchdog_timer_fn+0x2cd/0x390 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x300/0x6d0 ? sched_clock_cpu+0x69/0x4e0 ? __pfx___hrtimer_run_queues+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? ktime_get_update_offsets_now+0x7f/0x2a0 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? hrtimer_interrupt+0x2ca/0x760 ? __sysvec_apic_timer_interrupt+0x8c/0x2b0 ? sysvec_apic_timer_interrupt+0x6a/0x90 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? smp_call_function_many_cond+0x1d8/0xbb0 ? __pfx_do_kernel_range_flush+0x10/0x10 on_each_cpu_cond_mask+0x20/0x40 flush_tlb_kernel_range+0x19b/0x250 ? srso_return_thunk+0x5/0x5f ? kasan_release_vmalloc+0xa7/0xc0 purge_vmap_node+0x357/0x820 ? __pfx_purge_vmap_node+0x10/0x10 __purge_vmap_area_lazy+0x5b8/0xa10 drain_vmap_area_work+0x21/0x30 process_one_work+0x661/0x10b0 worker_thread+0x844/0x10e0 ? srso_return_thunk+0x5/0x5f ? __kthread_parkme+0x82/0x140 ? __pfx_worker_thread+0x10/0x10 kthread+0x2a5/0x370 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Debugging Analysis: 1. The following ftrace log shows that the lockup CPU spends too much time iterating vmap_nodes and flushing TLB when purging vm_area structures. (Some info is trimmed). kworker: funcgraph_entry: | drain_vmap_area_work() { kworker: funcgraph_entry: | mutex_lock() { kworker: funcgraph_entry: 1.092 us | __cond_resched(); kworker: funcgraph_exit: 3.306 us | } ... ... kworker: funcgraph_entry: | flush_tlb_kernel_range() { ... ... kworker: funcgraph_exit: # 7533.649 us | } ... ... kworker: funcgraph_entry: 2.344 us | mutex_unlock(); kworker: funcgraph_exit: $ 23871554 us | } The drain_vmap_area_work() spends over 23 seconds. There are 2805 flush_tlb_kernel_range() calls in the ftrace log. * One is called in __purge_vmap_area_lazy(). * Others are called by purge_vmap_node->kasan_release_vmalloc. purge_vmap_node() iteratively releases kasan vmalloc allocations and flushes TLB for each vmap_area. - [Rough calculation] Each flush_tlb_kernel_range() runs about 7.5ms. -- 2804 * 7.5ms = 21.03 seconds. -- That's why a soft lock is triggered. 2. Extending the soft lockup time can work around the issue (For example, # echo 60 > /proc/sys/kernel/watchdog_thresh). This confirms the above-mentioned speculation: drain_vmap_area_work() spends too much time. If we combine all TLB flush operations of the KASAN shadow virtual address into one operation in the call path 'purge_vmap_node()->kasan_release_vmalloc()', the running time of drain_vmap_area_work() can be saved greatly. The idea is from the flush_tlb_kernel_range() call in __purge_vmap_area_lazy(). And, the soft lockup won't be triggered. Here is the test result based on 6.10: [6.10 wo/ the patch] 1. ftrace latency profiling (record a trace if the latency > 20s). echo 20000000 > /sys/kernel/debug/tracing/tracing_thresh echo drain_vmap_area_work > /sys/kernel/debug/tracing/set_graph_function echo function_graph > /sys/kernel/debug/tracing/current_tracer echo 1 > /sys/kernel/debug/tracing/tracing_on 2. Run `make -j $(nproc)` to compile the kernel source 3. Once the soft lockup is reproduced, check the ftrace log: cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 76) $ 50412985 us | } /* __purge_vmap_area_lazy */ 76) $ 50412997 us | } /* drain_vmap_area_work */ 76) $ 29165911 us | } /* __purge_vmap_area_lazy */ 76) $ 29165926 us | } /* drain_vmap_area_work */ 91) $ 53629423 us | } /* __purge_vmap_area_lazy */ 91) $ 53629434 us | } /* drain_vmap_area_work */ 91) $ 28121014 us | } /* __purge_vmap_area_lazy */ 91) $ 28121026 us | } /* drain_vmap_area_work */ [6.10 w/ the patch] 1. Repeat step 1-2 in "[6.10 wo/ the patch]" 2. The soft lockup is not triggered and ftrace log is empty. cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 3. Setting 'tracing_thresh' to 10/5 seconds does not get any ftrace log. 4. Setting 'tracing_thresh' to 1 second gets ftrace log. cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 23) $ 1074942 us | } /* __purge_vmap_area_lazy */ 23) $ 1074950 us | } /* drain_vmap_area_work */ The worst execution time of drain_vmap_area_work() is about 1 second. Link: https://lore.kernel.org/lkml/ZqFlawuVnOMY2k3E@pc638.lan/ Link: https://lkml.kernel.org/r/20240726165246.31326-1-ahuang12@lenovo.com Fixes: 282631cb2447 ("mm: vmalloc: remove global purge_vmap_area_root rb-tree") Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Co-developed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Jiwei Sun <sunjw10@lenovo.com> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30kasan: remove vmalloc_percpu testAndrey Konovalov
Commit 1a2473f0cbc0 ("kasan: improve vmalloc tests") added the vmalloc_percpu KASAN test with the assumption that __alloc_percpu always uses vmalloc internally, which is tagged by KASAN. However, __alloc_percpu might allocate memory from the first per-CPU chunk, which is not allocated via vmalloc(). As a result, the test might fail. Remove the test until proper KASAN annotation for the per-CPU allocated are added; tracked in https://bugzilla.kernel.org/show_bug.cgi?id=215019. Link: https://lkml.kernel.org/r/20241022160706.38943-1-andrey.konovalov@linux.dev Fixes: 1a2473f0cbc0 ("kasan: improve vmalloc tests") Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com> Reported-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/all/4a245fff-cc46-44d1-a5f9-fd2f1c3764ae@sifive.com/ Reported-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Link: https://lore.kernel.org/all/CACzwLxiWzNqPBp4C1VkaXZ2wDwvY3yZeetCi1TLGFipKW77drA@mail.gmail.com/ Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-29mm/kasan: Don't store metadata inside kmalloc object when ↵Feng Tang
slub_debug_orig_size is on For a kmalloc object, when both kasan and slub redzone sanity check are enabled, they could both manipulate its data space like storing kasan free meta data and setting up kmalloc redzone, and may affect accuracy of that object's 'orig_size'. As an accurate 'orig_size' will be needed by some function like krealloc() soon, save kasan's free meta data in slub's metadata area instead of inside object when 'orig_size' is enabled. This will make it easier to maintain/understand the code. Size wise, when these two options are both enabled, the slub meta data space is already huge, and this just slightly increase the overall size. Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-10-21LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel spaceBibo Mao
There are two pages in one TLB entry on LoongArch system. For kernel space, it requires both two pte entries (buddies) with PAGE_GLOBAL bit set, otherwise HW treats it as non-global tlb, there will be potential problems if tlb entry for kernel space is not global. Such as fail to flush kernel tlb with the function local_flush_tlb_kernel_range() which supposed only flush tlb with global bit. Kernel address space areas include percpu, vmalloc, vmemmap, fixmap and kasan areas. For these areas both two consecutive page table entries should be enabled with PAGE_GLOBAL bit. So with function set_pte() and pte_clear(), pte buddy entry is checked and set besides its own pte entry. However it is not atomic operation to set both two pte entries, there is problem with test_vmalloc test case. So function kernel_pte_init() is added to init a pte table when it is created for kernel address space, and the default initial pte value is PAGE_GLOBAL rather than zero at beginning. Then only its own pte entry need update with function set_pte() and pte_clear(), nothing to do with the pte buddy entry. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-10-15rust: treewide: switch to the kernel `Vec` typeDanilo Krummrich
Now that we got the kernel `Vec` in place, convert all existing `Vec` users to make use of it. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Gary Guo <gary@garyguo.net> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20241004154149.93856-20-dakr@kernel.org [ Converted `kasan_test_rust.rs` too, as discussed. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-10-07rust: enable `clippy::undocumented_unsafe_blocks` lintMiguel Ojeda
Checking that we are not missing any `// SAFETY` comments in our `unsafe` blocks is something we have wanted to do for a long time, as well as cleaning up the remaining cases that were not documented [1]. Back when Rust for Linux started, this was something that could have been done via a script, like Rust's `tidy`. Soon after, in Rust 1.58.0, Clippy implemented the `undocumented_unsafe_blocks` lint [2]. Even though the lint has a few false positives, e.g. in some cases where attributes appear between the comment and the `unsafe` block [3], there are workarounds and the lint seems quite usable already. Thus enable the lint now. We still have a few cases to clean up, so just allow those for the moment by writing a `TODO` comment -- some of those may be good candidates for new contributors. Link: https://github.com/Rust-for-Linux/linux/issues/351 [1] Link: https://rust-lang.github.io/rust-clippy/master/#/undocumented_unsafe_blocks [2] Link: https://github.com/rust-lang/rust-clippy/issues/13189 [3] Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Trevor Gross <tmgross@umich.edu> Tested-by: Gary Guo <gary@garyguo.net> Reviewed-by: Gary Guo <gary@garyguo.net> Link: https://lore.kernel.org/r/20240904204347.168520-5-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-09-25Merge tag 'rust-6.12' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull Rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Support 'MITIGATION_{RETHUNK,RETPOLINE,SLS}' (which cleans up objtool warnings), teach objtool about 'noreturn' Rust symbols and mimic '___ADDRESSABLE()' for 'module_{init,exit}'. With that, we should be objtool-warning-free, so enable it to run for all Rust object files. - KASAN (no 'SW_TAGS'), KCFI and shadow call sanitizer support. - Support 'RUSTC_VERSION', including re-config and re-build on change. - Split helpers file into several files in a folder, to avoid conflicts in it. Eventually those files will be moved to the right places with the new build system. In addition, remove the need to manually export the symbols defined there, reusing existing machinery for that. - Relax restriction on configurations with Rust + GCC plugins to just the RANDSTRUCT plugin. 'kernel' crate: - New 'list' module: doubly-linked linked list for use with reference counted values, which is heavily used by the upcoming Rust Binder. This includes 'ListArc' (a wrapper around 'Arc' that is guaranteed unique for the given ID), 'AtomicTracker' (tracks whether a 'ListArc' exists using an atomic), 'ListLinks' (the prev/next pointers for an item in a linked list), 'List' (the linked list itself), 'Iter' (an iterator over a 'List'), 'Cursor' (a cursor into a 'List' that allows to remove elements), 'ListArcField' (a field exclusively owned by a 'ListArc'), as well as support for heterogeneous lists. - New 'rbtree' module: red-black tree abstractions used by the upcoming Rust Binder. This includes 'RBTree' (the red-black tree itself), 'RBTreeNode' (a node), 'RBTreeNodeReservation' (a memory reservation for a node), 'Iter' and 'IterMut' (immutable and mutable iterators), 'Cursor' (bidirectional cursor that allows to remove elements), as well as an entry API similar to the Rust standard library one. - 'init' module: add 'write_[pin_]init' methods and the 'InPlaceWrite' trait. Add the 'assert_pinned!' macro. - 'sync' module: implement the 'InPlaceInit' trait for 'Arc' by introducing an associated type in the trait. - 'alloc' module: add 'drop_contents' method to 'BoxExt'. - 'types' module: implement the 'ForeignOwnable' trait for 'Pin<Box<T>>' and improve the trait's documentation. In addition, add the 'into_raw' method to the 'ARef' type. - 'error' module: in preparation for the upcoming Rust support for 32-bit architectures, like arm, locally allow Clippy lint for those. Documentation: - https://rust.docs.kernel.org has been announced, so link to it. - Enable rustdoc's "jump to definition" feature, making its output a bit closer to the experience in a cross-referencer. - Debian Testing now also provides recent Rust releases (outside of the freeze period), so add it to the list. MAINTAINERS: - Trevor is joining as reviewer of the "RUST" entry. And a few other small bits" * tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux: (54 commits) kasan: rust: Add KASAN smoke test via UAF kbuild: rust: Enable KASAN support rust: kasan: Rust does not support KHWASAN kbuild: rust: Define probing macros for rustc kasan: simplify and clarify Makefile rust: cfi: add support for CFI_CLANG with Rust cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS rust: support for shadow call stack sanitizer docs: rust: include other expressions in conditional compilation section kbuild: rust: replace proc macros dependency on `core.o` with the version text kbuild: rust: rebuild if the version text changes kbuild: rust: re-run Kconfig if the version text changes kbuild: rust: add `CONFIG_RUSTC_VERSION` rust: avoid `box_uninit_write` feature MAINTAINERS: add Trevor Gross as Rust reviewer rust: rbtree: add `RBTree::entry` rust: rbtree: add cursor rust: rbtree: add mutable iterator rust: rbtree: add iterator rust: rbtree: add red-black tree implementation backed by the C version ...
2024-09-16kasan: rust: Add KASAN smoke test via UAFMatthew Maurer
Adds a smoke test to ensure that KASAN in Rust is actually detecting a Rust-native UAF. There is significant room to expand this test suite, but this will at least ensure that flags are having the intended effect. The rename from kasan_test.c to kasan_test_c.c is in order to allow the single kasan_test.ko test suite to contain both a .o file produced by the C compiler and one produced by rustc. Signed-off-by: Matthew Maurer <mmaurer@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20240820194910.187826-5-mmaurer@google.com [ Applied empty line nit, removed double empty line, applied `rustfmt` and formatted crate comment. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-08-27slub: Introduce CONFIG_SLUB_RCU_DEBUGJann Horn
Currently, KASAN is unable to catch use-after-free in SLAB_TYPESAFE_BY_RCU slabs because use-after-free is allowed within the RCU grace period by design. Add a SLUB debugging feature which RCU-delays every individual kmem_cache_free() before either actually freeing the object or handing it off to KASAN, and change KASAN to poison freed objects as normal when this option is enabled. For now I've configured Kconfig.debug to default-enable this feature in the KASAN GENERIC and SW_TAGS modes; I'm not enabling it by default in HW_TAGS mode because I'm not sure if it might have unwanted performance degradation effects there. Note that this is mostly useful with KASAN in the quarantine-based GENERIC mode; SLAB_TYPESAFE_BY_RCU slabs are basically always also slabs with a ->ctor, and KASAN's assign_tag() currently has to assign fixed tags for those, reducing the effectiveness of SW_TAGS/HW_TAGS mode. (A possible future extension of this work would be to also let SLUB call the ->ctor() on every allocation instead of only when the slab page is allocated; then tag-based modes would be able to assign new tags on every reallocation.) Tested-by: syzbot+263726e59eab6b442723@syzkaller.appspotmail.com Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: Marco Elver <elver@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> #slab Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-08-27kasan: catch invalid free before SLUB reinitializes the objectJann Horn
Currently, when KASAN is combined with init-on-free behavior, the initialization happens before KASAN's "invalid free" checks. More importantly, a subsequent commit will want to RCU-delay the actual SLUB freeing of an object, and we'd like KASAN to still validate synchronously that freeing the object is permitted. (Otherwise this change will make the existing testcase kmem_cache_invalid_free fail.) So add a new KASAN hook that allows KASAN to pre-validate a kmem_cache_free() operation before SLUB actually starts modifying the object or its metadata. Inside KASAN, this: - moves checks from poison_slab_object() into check_slab_allocation() - moves kasan_arch_is_ready() up into callers of poison_slab_object() - removes "ip" argument of poison_slab_object() and __kasan_slab_free() (since those functions no longer do any reporting) Acked-by: Vlastimil Babka <vbabka@suse.cz> #slub Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-06-24kasan: fix bad call to unpoison_slab_objectAndrey Konovalov
Commit 29d7355a9d05 ("kasan: save alloc stack traces for mempool") messed up one of the calls to unpoison_slab_object: the last two arguments are supposed to be GFP flags and whether to init the object memory. Fix the call. Without this fix, __kasan_mempool_unpoison_object provides the object's size as GFP flags to unpoison_slab_object, which can cause LOCKDEP reports (and probably other issues). Link: https://lkml.kernel.org/r/20240614143238.60323-1-andrey.konovalov@linux.dev Fixes: 29d7355a9d05 ("kasan: save alloc stack traces for mempool") Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com> Reported-by: Brad Spengler <spender@grsecurity.net> Acked-by: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25fix missing vmalloc.h includesKent Overstreet
Patch series "Memory allocation profiling", v6. Overview: Low overhead [1] per-callsite memory allocation profiling. Not just for debug kernels, overhead low enough to be deployed in production. Example output: root@moria-kvm:~# sort -rn /proc/allocinfo 127664128 31168 mm/page_ext.c:270 func:alloc_page_ext 56373248 4737 mm/slub.c:2259 func:alloc_slab_page 14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded 14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash 13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs 11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio 9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node 4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable 4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start 3940352 962 mm/memory.c:4214 func:alloc_anon_folio 2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node ... Usage: kconfig options: - CONFIG_MEM_ALLOC_PROFILING - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT - CONFIG_MEM_ALLOC_PROFILING_DEBUG adds warnings for allocations that weren't accounted because of a missing annotation sysctl: /proc/sys/vm/mem_profiling Runtime info: /proc/allocinfo Notes: [1]: Overhead To measure the overhead we are comparing the following configurations: (1) Baseline with CONFIG_MEMCG_KMEM=n (2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) (3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) (4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1) (5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT (6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y (7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y Performance overhead: To evaluate performance we implemented an in-kernel test executing multiple get_free_page/free_page and kmalloc/kfree calls with allocation sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU affinity set to a specific CPU to minimize the noise. Below are results from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on 56 core Intel Xeon: kmalloc pgalloc (1 baseline) 6.764s 16.902s (2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%) (3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%) (4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%) (5 memcg) 13.388s (+97.94%) 48.460s (+186.71%) (6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%) (7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%) Memory overhead: Kernel size: text data bss dec diff (1) 26515311 18890222 17018880 62424413 (2) 26524728 19423818 16740352 62688898 264485 (3) 26524724 19423818 16740352 62688894 264481 (4) 26524728 19423818 16740352 62688898 264485 (5) 26541782 18964374 16957440 62463596 39183 Memory consumption on a 56 core Intel CPU with 125GB of memory: Code tags: 192 kB PageExts: 262144 kB (256MB) SlabExts: 9876 kB (9.6MB) PcpuExts: 512 kB (0.5MB) Total overhead is 0.2% of total memory. Benchmarks: Hackbench tests run 100 times: hackbench -s 512 -l 200 -g 15 -f 25 -P baseline disabled profiling enabled profiling avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023) stdev 0.0137 0.0188 0.0077 hackbench -l 10000 baseline disabled profiling enabled profiling avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859) stdev 0.0933 0.0286 0.0489 stress-ng tests: stress-ng --class memory --seq 4 -t 60 stress-ng --class cpu --seq 4 -t 60 Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/ [2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/ This patch (of 37): The next patch drops vmalloc.h from a system header in order to fix a circular dependency; this adds it to all the files that were pulling it in implicitly. [kent.overstreet@linux.dev: fix arch/alpha/lib/memcpy.c] Link: https://lkml.kernel.org/r/20240327002152.3339937-1-kent.overstreet@linux.dev [surenb@google.com: fix arch/x86/mm/numa_32.c] Link: https://lkml.kernel.org/r/20240402180933.1663992-1-surenb@google.com [kent.overstreet@linux.dev: a few places were depending on sizes.h] Link: https://lkml.kernel.org/r/20240404034744.1664840-1-kent.overstreet@linux.dev [arnd@arndb.de: fix mm/kasan/hw_tags.c] Link: https://lkml.kernel.org/r/20240404124435.3121534-1-arnd@kernel.org [surenb@google.com: fix arc build] Link: https://lkml.kernel.org/r/20240405225115.431056-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-2-surenb@google.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>