git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-08-19	mm: rust: add page.rs to MEMORY MANAGEMENT - RUST	Alice Ryhl
	The page.rs file currently isn't included anywhere, and I think it's a good fit for the MEMORY MANAGEMENT - RUST entry. The file was originally added for use by Rust Binder, but I believe there is also work to use it in the upcoming scatterlist abstractions. Link: https://lkml.kernel.org/r/20250814075454.1596482-1-aliceryhl@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	iov_iter: iterate_folioq: fix handling of offset >= folio size	Dominique Martinet
	It's apparently possible to get an iov advanced all the way up to the end of the current page we're looking at, e.g. (gdb) p *iter $24 = {iter_type = 4 '\004', nofault = false, data_source = false, iov_offset = 4096, {__ubuf_iovec = { iov_base = 0xffff88800f5bc000, iov_len = 655}, {{__iov = 0xffff88800f5bc000, kvec = 0xffff88800f5bc000, bvec = 0xffff88800f5bc000, folioq = 0xffff88800f5bc000, xarray = 0xffff88800f5bc000, ubuf = 0xffff88800f5bc000}, count = 655}}, {nr_segs = 2, folioq_slot = 2 '\002', xarray_start = 2}} Where iov_offset is 4k with 4k-sized folios This should have been fine because we're only in the 2nd slot and there's another one after this, but iterate_folioq should not try to map a folio that skips the whole size, and more importantly part here does not end up zero (because 'PAGE_SIZE - skip % PAGE_SIZE' ends up PAGE_SIZE and not zero..), so skip forward to the "advance to next folio" code Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-0-a0ffad2b665a@codewreck.org Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-1-a0ffad2b665a@codewreck.org Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios") Reported-by: Maximilian Bosch <maximilian@mbosch.me> Reported-by: Ryan Lahfa <ryan@lahfa.xyz> Reported-by: Christian Theune <ct@flyingcircus.io> Reported-by: Arnout Engelen <arnout@bzzt.net> Link: https://lkml.kernel.org/r/D4LHHUNLG79Y.12PI0X6BEHRHW@mbosch.me/ Acked-by: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> [6.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	selftests/damon: fix selftests by installing drgn related script	Sang-Heon Jeon
	drgn_dump_damon_status is not installed during kselftest setup. It can break other tests which depend on drgn_dump_damon_status. Install drgn_dump_damon_status files to fix broken test. Link: https://lkml.kernel.org/r/20250812140046.660486-1-ekffu200098@gmail.com Fixes: f3e8e1e51362 ("selftests/damon: add drgn script for extracting damon status") Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Honggyu Kim <honggyu.kim@sk.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	.mailmap: add entry for Easwar Hariharan	Easwar Hariharan
	Map my old, obsolete work email address to my current one. Link: https://lkml.kernel.org/r/20250812180218.92755-1-easwar.hariharan@linux.microsoft.com Signed-off-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Cc: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Shannon Nelson <sln@onemain.com> Cc: Dmitry Baryshkov <lumag@kernel.org> Cc: Hans Verkuil <hverkuil@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	selftests/mm: add test for invalid multi VMA operations	Lorenzo Stoakes
	We can use UFFD to easily assert invalid multi VMA moves, so do so, asserting expected behaviour when VMAs invalid for a multi VMA operation are encountered. We assert both that such operations are not permitted, and that we do not even attempt to move the first VMA under these circumstances. We also assert that we can still move a single VMA regardless. We then assert that a partial failure can occur if the invalid VMA appears later in the range of multiple VMAs, both at the very next VMA, and also at the end of the range. As part of this change, we are using the is_range_valid() helper more aggressively. Therefore, fix a bug where stale buffered data would hang around on success, causing subsequent calls to is_range_valid() to potentially give invalid results. We simply have to fflush() the stream on success to resolve this issue. Link: https://lkml.kernel.org/r/c4fb86dd5ba37610583ad5fc0e0c2306ddf318b9.1754218667.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/mremap: catch invalid multi VMA moves earlier	Lorenzo Stoakes
	Previously, any attempt to solely move a VMA would require that the span specified reside within the span of that single VMA, with no gaps before or afterwards. After commit d23cb648e365 ("mm/mremap: permit mremap() move of multiple VMAs"), the multi VMA move permitted a gap to exist only after VMAs. This was done to provide maximum flexibility. However, We have consequently permitted this behaviour for the move of a single VMA including those not eligible for multi VMA move. The change introduced here means that we no longer permit non-eligible VMAs from being moved in this way. This is consistent, as it means all eligible VMA moves are treated the same, and all non-eligible moves are treated as they were before. This change does not break previous behaviour, which equally would have disallowed such a move (only in all cases). [lorenzo.stoakes@oracle.com: do not incorrectly reference invalid VMA in VM_WARN_ON_ONCE()] Link: https://lkml.kernel.org/r/b6dbda20-667e-4053-abae-8ed4fa84bb6c@lucifer.local Link: https://lkml.kernel.org/r/2b5aad5681573be85b5b8fac61399af6fb6b68b6.1754218667.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/mremap: allow multi-VMA move when filesystem uses thp_get_unmapped_area	Lorenzo Stoakes
	The multi-VMA move functionality introduced in commit d23cb648e365 ("mm/mremap: permit mremap() move of multiple VMA") doesn't allow moves of file-backed mappings which specify a custom f_op->get_unmapped_area handler excepting hugetlb and shmem. We expand this to include thp_get_unmapped_area to support file-backed mappings for filesystems which use large folios. Additionally, when the first VMA in a range is not compatible with a multi-VMA move, instead of moving the first VMA and returning an error, this series results in us not moving anything and returning an error immediately. Examining this second change in detail: The semantics of multi-VMA moves in mremap() very clearly indicate that a failure can result in a partial move of VMAs. This is in line with other aggregate operations within the kernel, which share these semantics. There are two classes of failures we're concerned with - eligiblity for mutli-VMA move, and transient failures that would occur even if the user individually moved each VMA. The latter is due to out-of-memory conditions (which, given the allocations involved are small, would likely be fatal in any case), or hitting the mapping limit. Regardless of the cause, transient issues would be fatal anyway, so it isn't really material which VMAs succeeded at being moved or not. However with when it comes to multi-VMA move eligiblity, we face another issue - we must allow a single VMA to succeed regardless of this eligiblity (as, of course, it is not a multi-VMA move) - but we must then fail multi-VMA operations. The two means by which VMAs may fail the eligbility test are - the VMAs being UFFD-armed, or the VMA being file-backed and providing its own f_op->get_unmapped_area() helper (because this may result in MREMAP_FIXED being disregarded), excepting those known to correctly handle MREMAP_FIXED. It is therefore conceivable that a user could erroneously try to use this functionality in these instances, and would prefer to not perform any move at all should that occur. This series therefore avoids any move of subsequent VMAs should the first be multi-VMA move ineligble and the input span exceeds that of the first VMA. We also add detailed test logic to assert that multi VMA move with ineligible VMAs functions as expected. This patch (of 3): We currently restrict multi-VMA move to avoid filesystems or drivers which provide a custom f_op->get_unmapped_area handler unless it is known to correctly handle MREMAP_FIXED. We do this so we do not get unexpected result when moving from one area to another (for instance, if the handler would align things resulting in the moved VMAs having different gaps than the original mapping). More and more filesystems are moving to using large folios, and typically do so (in part) by setting f_op->get_unmapped_area to thp_get_unmapped_area. When mremap() invokes the file system's get_unmapped MREMAP_FIXED, it does so via get_unmapped_area(), called in vrm_set_new_addr(). In order to do so, it converts the MREMAP_FIXED flag to a MAP_FIXED flag and passes this to the unmapped area handler. The __get_unmapped_area() function (called by get_unmapped_area()) in turn invokes the filesystem or driver's f_op->get_unmapped_area() handler. Therefore this is a point at which thp_get_unmapped_area() may be called (also, this is the case for anonymous mappings where the size is huge page aligned). thp_get_unmapped_area() calls thp_get_unmapped_area_vmflags() and __thp_get_unmapped_area() in turn (falling back to mm_get_unmapped_area_vm_flags() which is known to handle MAP_FIXED correctly). The __thp_get_unmapped_area() function in turn does nothing to change the address hint, nor the MAP_FIXED flag, only adjusting alignment parameters. It hten calls mm_get_unmapped_area_vmflags(), and in turn arch-specific unmapped area functions, all of which honour MAP_FIXED correctly. Therefore, we can safely add thp_get_unmapped_area to the known-good handlers. Link: https://lkml.kernel.org/r/cover.1754218667.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/4f2542340c29c84d3d470b0c605e916b192f6c81.1754218667.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/damon/core: fix commit_ops_filters by using correct nth function	Sang-Heon Jeon
	damos_commit_ops_filters() incorrectly uses damos_nth_filter() which iterates core_filters. As a result, performing a commit unintentionally corrupts ops_filters. Add damos_nth_ops_filter() which iterates ops_filters. Use this function to fix issues caused by wrong iteration. Link: https://lkml.kernel.org/r/20250810124201.15743-1-ekffu200098@gmail.com Fixes: 3607cc590f18 ("mm/damon/core: support committing ops_filters") # 6.15.x Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	tools/testing: add linux/args.h header and fix radix, VMA tests	Lorenzo Stoakes
	Commit 857d18f23ab1 ("cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks") accidentally broke the radix tree, VMA userland tests by including linux/args.h which is not present in the tools/include directory. This patch copies this over and adds an #ifdef block to avoid duplicate __CONCAT declaration in conflict with system headers when we ultimately include this. Link: https://lkml.kernel.org/r/20250811052654.33286-1-lorenzo.stoakes@oracle.com Fixes: 857d18f23ab1 ("cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/debug_vm_pgtable: clear page table entries at destroy_args()	Herton R. Krzesinski
	The mm/debug_vm_pagetable test allocates manually page table entries for the tests it runs, using also its manually allocated mm_struct. That in itself is ok, but when it exits, at destroy_args() it fails to clear those entries with the *_clear functions. The problem is that leaves stale entries. If another process allocates an mm_struct with a pgd at the same address, it may end up running into the stale entry. This is happening in practice on a debug kernel with CONFIG_DEBUG_VM_PGTABLE=y, for example this is the output with some extra debugging I added (it prints a warning trace if pgtables_bytes goes negative, in addition to the warning at check_mm() function): [ 2.539353] debug_vm_pgtable: [get_random_vaddr ]: random_vaddr is 0x7ea247140000 [ 2.539366] kmem_cache info [ 2.539374] kmem_cachep 0x000000002ce82385 - freelist 0x0000000000000000 - offset 0x508 [ 2.539447] debug_vm_pgtable: [init_args ]: args->mm is 0x000000002267cc9e (...) [ 2.552800] WARNING: CPU: 5 PID: 116 at include/linux/mm.h:2841 free_pud_range+0x8bc/0x8d0 [ 2.552816] Modules linked in: [ 2.552843] CPU: 5 UID: 0 PID: 116 Comm: modprobe Not tainted 6.12.0-105.debug_vm2.el10.ppc64le+debug #1 VOLUNTARY [ 2.552859] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW910.00 (VL910_062) hv:phyp pSeries [ 2.552872] NIP: c0000000007eef3c LR: c0000000007eef30 CTR: c0000000003d8c90 [ 2.552885] REGS: c0000000622e73b0 TRAP: 0700 Not tainted (6.12.0-105.debug_vm2.el10.ppc64le+debug) [ 2.552899] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002822 XER: 0000000a [ 2.552954] CFAR: c0000000008f03f0 IRQMASK: 0 [ 2.552954] GPR00: c0000000007eef30 c0000000622e7650 c000000002b1ac00 0000000000000001 [ 2.552954] GPR04: 0000000000000008 0000000000000000 c0000000007eef30 ffffffffffffffff [ 2.552954] GPR08: 00000000ffff00f5 0000000000000001 0000000000000048 0000000000004000 [ 2.552954] GPR12: 00000003fa440000 c000000017ffa300 c0000000051d9f80 ffffffffffffffdb [ 2.552954] GPR16: 0000000000000000 0000000000000008 000000000000000a 60000000000000e0 [ 2.552954] GPR20: 4080000000000000 c0000000113af038 00007fffcf130000 0000700000000000 [ 2.552954] GPR24: c000000062a6a000 0000000000000001 8000000062a68000 0000000000000001 [ 2.552954] GPR28: 000000000000000a c000000062ebc600 0000000000002000 c000000062ebc760 [ 2.553170] NIP [c0000000007eef3c] free_pud_range+0x8bc/0x8d0 [ 2.553185] LR [c0000000007eef30] free_pud_range+0x8b0/0x8d0 [ 2.553199] Call Trace: [ 2.553207] [c0000000622e7650] [c0000000007eef30] free_pud_range+0x8b0/0x8d0 (unreliable) [ 2.553229] [c0000000622e7750] [c0000000007f40b4] free_pgd_range+0x284/0x3b0 [ 2.553248] [c0000000622e7800] [c0000000007f4630] free_pgtables+0x450/0x570 [ 2.553274] [c0000000622e78e0] [c0000000008161c0] exit_mmap+0x250/0x650 [ 2.553292] [c0000000622e7a30] [c0000000001b95b8] __mmput+0x98/0x290 [ 2.558344] [c0000000622e7a80] [c0000000001d1018] exit_mm+0x118/0x1b0 [ 2.558361] [c0000000622e7ac0] [c0000000001d141c] do_exit+0x2ec/0x870 [ 2.558376] [c0000000622e7b60] [c0000000001d1ca8] do_group_exit+0x88/0x150 [ 2.558391] [c0000000622e7bb0] [c0000000001d1db8] sys_exit_group+0x48/0x50 [ 2.558407] [c0000000622e7be0] [c00000000003d810] system_call_exception+0x1e0/0x4c0 [ 2.558423] [c0000000622e7e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec (...) [ 2.558892] ---[ end trace 0000000000000000 ]--- [ 2.559022] BUG: Bad rss-counter state mm:000000002267cc9e type:MM_ANONPAGES val:1 [ 2.559037] BUG: non-zero pgtables_bytes on freeing mm: -6144 Here the modprobe process ended up with an allocated mm_struct from the mm_struct slab that was used before by the debug_vm_pgtable test. That is not a problem, since the mm_struct is initialized again etc., however, if it ends up using the same pgd table, it bumps into the old stale entry when clearing/freeing the page table entries, so it tries to free an entry already gone (that one which was allocated by the debug_vm_pgtable test), which also explains the negative pgtables_bytes since it's accounting for not allocated entries in the current process. As far as I looked pgd_{alloc,free} etc. does not clear entries, and clearing of the entries is explicitly done in the free_pgtables-> free_pgd_range->free_p4d_range->free_pud_range->free_pmd_range-> free_pte_range path. However, the debug_vm_pgtable test does not call free_pgtables, since it allocates mm_struct and entries manually for its test and eg. not goes through page faults. So it also should clear manually the entries before exit at destroy_args(). This problem was noticed on a reboot X number of times test being done on a powerpc host, with a debug kernel with CONFIG_DEBUG_VM_PGTABLE enabled. Depends on the system, but on a 100 times reboot loop the problem could manifest once or twice, if a process ends up getting the right mm->pgd entry with the stale entries used by mm/debug_vm_pagetable. After using this patch, I couldn't reproduce/experience the problems anymore. I was able to reproduce the problem as well on latest upstream kernel (6.16). I also modified destroy_args() to use mmput() instead of mmdrop(), there is no reason to hold mm_users reference and not release the mm_struct entirely, and in the output above with my debugging prints I already had patched it to use mmput, it did not fix the problem, but helped in the debugging as well. Link: https://lkml.kernel.org/r/20250731214051.4115182-1-herton@redhat.com Fixes: 3c9b84f044a9 ("mm/debug_vm_pgtable: introduce struct pgtable_debug_args") Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Gavin Shan <gshan@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	squashfs: fix memory leak in squashfs_fill_super	Phillip Lougher
	If sb_min_blocksize returns 0, squashfs_fill_super exits without freeing allocated memory (sb->s_fs_info). Fix this by moving the call to sb_min_blocksize to before memory is allocated. Link: https://lkml.kernel.org/r/20250811223740.110392-1-phillip@squashfs.org.uk Fixes: 734aa85390ea ("Squashfs: check return result of sb_min_blocksize") Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: Scott GUO <scottzhguo@tencent.com> Closes: https://lore.kernel.org/all/20250811061921.3807353-1-scott_gzh@163.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	kho: warn if KHO is disabled due to an error	Pasha Tatashin
	During boot scratch area is allocated based on command line parameters or auto calculated. However, scratch area may fail to allocate, and in that case KHO is disabled. Currently, no warning is printed that KHO is disabled, which makes it confusing for the end user to figure out why KHO is not available. Add the missing warning message. Link: https://lkml.kernel.org/r/20250808201804.772010-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	kho: mm: don't allow deferred struct page with KHO	Pasha Tatashin
	KHO uses struct pages for the preserved memory early in boot, however, with deferred struct page initialization, only a small portion of memory has properly initialized struct pages. This problem was detected where vmemmap is poisoned, and illegal flag combinations are detected. Don't allow them to be enabled together, and later we will have to teach KHO to work properly with deferred struct page init kernel feature. Link: https://lkml.kernel.org/r/20250808201804.772010-3-pasha.tatashin@soleen.com Fixes: 4e1d010e3bda ("kexec: add config option for KHO") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	kho: init new_physxa->phys_bits to fix lockdep	Pasha Tatashin
	Patch series "Several KHO Hotfixes". Three unrelated fixes for Kexec Handover. This patch (of 3): Lockdep shows the following warning: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. [<ffffffff810133a6>] dump_stack_lvl+0x66/0xa0 [<ffffffff8136012c>] assign_lock_key+0x10c/0x120 [<ffffffff81358bb4>] register_lock_class+0xf4/0x2f0 [<ffffffff813597ff>] __lock_acquire+0x7f/0x2c40 [<ffffffff81360cb0>] ? __pfx_hlock_conflict+0x10/0x10 [<ffffffff811707be>] ? native_flush_tlb_global+0x8e/0xa0 [<ffffffff8117096e>] ? __flush_tlb_all+0x4e/0xa0 [<ffffffff81172fc2>] ? __kernel_map_pages+0x112/0x140 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff81359556>] lock_acquire+0xe6/0x280 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff8100b9e0>] _raw_spin_lock+0x30/0x40 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff813ec327>] xa_load_or_alloc+0x67/0xe0 [<ffffffff813eb4c0>] kho_preserve_folio+0x90/0x100 [<ffffffff813ebb7f>] __kho_finalize+0xcf/0x400 [<ffffffff813ebef4>] kho_finalize+0x34/0x70 This is becase xa has its own lock, that is not initialized in xa_load_or_alloc. Modifiy __kho_preserve_order(), to properly call xa_init(&new_physxa->phys_bits); Link: https://lkml.kernel.org/r/20250808201804.772010-2-pasha.tatashin@soleen.com Fixes: fc33e4b44b27 ("kexec: enable KHO support for memory preservation") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	NFS: Fix a race when updating an existing write	Trond Myklebust
	After nfs_lock_and_join_requests() tests for whether the request is still attached to the mapping, nothing prevents a call to nfs_inode_remove_request() from succeeding until we actually lock the page group. The reason is that whoever called nfs_inode_remove_request() doesn't necessarily have a lock on the page group head. So in order to avoid races, let's take the page group lock earlier in nfs_lock_and_join_requests(), and hold it across the removal of the request in nfs_inode_remove_request(). Reported-by: Jeff Layton <jlayton@kernel.org> Tested-by: Joe Quanaim <jdq@meta.com> Tested-by: Andrew Steffen <aksteffen@meta.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: bd37d6fce184 ("NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request()") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-19	Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull mount fixes from Al Viro: "Fixes for several recent mount-related regressions" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: change_mnt_propagation(): calculate propagation source only if we'll need it use uniform permission checks for all mount propagation changes propagate_umount(): only surviving overmounts should be reparented fix the softlockups in attach_recursive_mnt()
2025-08-19	Merge tag 'ovl-fixes-6.17-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs fixes from Amir Goldstein: "Fixes for two fallouts from Neil's directory locking changes" * tag 'ovl-fixes-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: fix possible double unlink ovl: use I_MUTEX_PARENT when locking parent in ovl_create_temp()
2025-08-19	Merge tag 'vfs-6.17-rc3.fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix two memory leaks in pidfs - Prevent changing the idmapping of an already idmapped mount without OPEN_TREE_CLONE through open_tree_attr() - Don't fail listing extended attributes in kernfs when no extended attributes are set - Fix the return value in coredump_parse() - Fix the error handling for unbuffered writes in netfs - Fix broken data integrity guarantees for O_SYNC writes via iomap - Fix UAF in __mark_inode_dirty() - Keep inode->i_blkbits constant in fuse - Fix coredump selftests - Fix get_unused_fd_flags() usage in do_handle_open() - Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES - Fix use-after-free in bh_read() - Fix incorrect lflags value in the move_mount() syscall * tag 'vfs-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: signal: Fix memory leak for PIDFD_SELF* sentinels kernfs: don't fail listing extended attributes coredump: Fix return value in coredump_parse() fs/buffer: fix use-after-free when call bh_read() helper pidfs: Fix memory leak in pidfd_info() netfs: Fix unbuffered write error handling fhandle: do_handle_open() should get FD with user flags module: Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES fs: fix incorrect lflags value in the move_mount syscall selftests/coredump: Remove the read() that fails the test fuse: keep inode->i_blkbits constant iomap: Fix broken data integrity guarantees for O_SYNC writes selftests/mount_setattr: add smoke tests for open_tree_attr(2) bug open_tree_attr: do not allow id-mapping changes without OPEN_TREE_CLONE fs: writeback: fix use-after-free in __mark_inode_dirty()
2025-08-19	cifs: Fix oops due to uninitialised variable	David Howells
	Fix smb3_init_transform_rq() to initialise buffer to NULL before calling netfs_alloc_folioq_buffer() as netfs assumes it can append to the buffer it is given. Setting it to NULL means it should start a fresh buffer, but the value is currently undefined. Fixes: a2906d3316fc ("cifs: Switch crypto buffer to use a folio_queue rather than an xarray") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.org> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-19	change_mnt_propagation(): calculate propagation source only if we'll need it	Al Viro
	We only need it when mount in question was sending events downstream (then recepients need to switch to new master) or the mount is being turned into slave (then we need a new master for it). That wouldn't be a big deal, except that it causes quite a bit of work when umount_tree() is taking a large peer group out. Adding a trivial "don't bother calling propagation_source() unless we are going to use its results" logics improves the things quite a bit. We are still doing unnecessary work on bulk removals from propagation graph, but the full solution for that will have to wait for the next merge window. Fixes: 955336e204ab "do_make_slave(): choose new master sanely" Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-08-19	use uniform permission checks for all mount propagation changes	Al Viro
	do_change_type() and do_set_group() are operating on different aspects of the same thing - propagation graph. The latter asks for mounts involved to be mounted in namespace(s) the caller has CAP_SYS_ADMIN for. The former is a mess - originally it didn't even check that mount is mounted. That got fixed, but the resulting check turns out to be too strict for userland - in effect, we check that mount is in our namespace, having already checked that we have CAP_SYS_ADMIN there. What we really need (in both cases) is * only touch mounts that are mounted. That's a must-have constraint - data corruption happens if it get violated. * don't allow to mess with a namespace unless you already have enough permissions to do so (i.e. CAP_SYS_ADMIN in its userns). That's an equivalent of what do_set_group() does; let's extract that into a helper (may_change_propagation()) and use it in both do_set_group() and do_change_type(). Fixes: 12f147ddd6de "do_change_type(): refuse to operate on unmounted/not ours mounts" Acked-by: Andrei Vagin <avagin@gmail.com> Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-08-19	propagate_umount(): only surviving overmounts should be reparented	Al Viro
	... as the comments in reparent() clearly say. As it is, we reparent all overmounts of the mounts being taken out, including those that are taken out themselves. It's not only a potentially massive slowdown (on a pathological setup we might end up with O(N^2) time for N mounts being kicked out), it can end up with incorrect ->overmount in the surviving mounts. Fixes: f0d0ba19985d "Rewrite of propagate_umount()" Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-08-19	fix the softlockups in attach_recursive_mnt()	Al Viro
	In case when we mounting something on top of a large stack of overmounts, all of them being peers of each other, we get quadratic time by the depth of overmount stack. Easily fixed by doing commit_tree() before reparenting the overmount; simplifies commit_tree() as well - it doesn't need to skip the already mounted stuff that had been reparented on top of the new mounts. Since we are holding mount_lock through both reparenting and call of commit_tree(), the order does not matter from the mount hash point of view. Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com> Tested-by: "Lai, Yi" <yi1.lai@linux.intel.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Fixes: 663206854f02 "copy_tree(): don't link the mounts via mnt_list" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-08-19	regulator: tps65219: regulator: tps65219: Fix error codes in probe()	Dan Carpenter
	There is a copy and paste error and we accidentally use "PTR_ERR(rdev)" instead of "error". The "rdev" pointer is valid at this point. Also there is no need to print the error code in the error message because dev_err_probe() already prints that. So clean up the error message a bit. Fixes: 38c9f98db20a ("regulator: tps65219: Add support for TPS65215 Regulator IRQs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/aKRGmVdbvT1HBvm8@stanley.mountain Signed-off-by: Mark Brown <broonie@kernel.org>
2025-08-19	drm/xe: Assign ioctl xe file handler to vm in xe_vm_create	Piotr Piórkowski
	In several code paths, such as xe_pt_create(), the vm->xef field is used to determine whether a VM originates from userspace or the kernel. Previously, this handler was only assigned in xe_vm_create_ioctl(), after the VM was created by xe_vm_create(). However, xe_vm_create() triggers page table creation, and that function assumes vm->xef should be already set. This could lead to incorrect origin detection. To fix this problem and ensure consistency in the initialization of the VM object, let's move the assignment of this handler to xe_vm_create. v2: - take reference to the xe file object only when xef is not NULL - release the reference to the xe file object on the error path (Matthew) Fixes: 7f387e6012b6 ("drm/xe: add XE_BO_FLAG_PINNED_LATE_RESTORE") Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250811104358.2064150-2-piotr.piorkowski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> (cherry picked from commit 9337166fa1d80f7bb7c7d3a8f901f21c348c0f2a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-19	usb: xhci: fix host not responding after suspend and resume	Niklas Neronin
	Partially revert commit e1db856bd288 ("usb: xhci: remove '0' write to write-1-to-clear register") because the patch cleared the Interrupt Pending bit during interrupt enabling and disabling. The Interrupt Pending bit should only be cleared when the driver has handled the interrupt. Ideally, all interrupts should be handled before disabling the interrupt; consequently, no interrupt should be pending when enabling the interrupt. For this reason, keep the debug message informing if an interrupt is still pending when an interrupt is disabled. Because the Interrupt Pending bit is write-1-to-clear, writing '0' to it ensures that the state does not change. Link: https://lore.kernel.org/linux-usb/20250818231103.672ec7ed@foxbook Fixes: e1db856bd288 ("usb: xhci: remove '0' write to write-1-to-clear register") Closes: https://bbs.archlinux.org/viewtopic.php?id=307641 cc: stable@vger.kernel.org # 6.16+ Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250819125844.2042452-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19	usb: xhci: Fix slot_id resource race conflict	Weitao Wang
	xHC controller may immediately reuse a slot_id after it's disabled, giving it to a new enumerating device before the xhci driver freed all resources related to the disabled device. In such a scenario, device-A with slot_id equal to 1 is disconnecting while device-B is enumerating, device-B will fail to enumerate in the follow sequence. 1.[device-A] send disable slot command 2.[device-B] send enable slot command 3.[device-A] disable slot command completed and wakeup waiting thread 4.[device-B] enable slot command completed with slot_id equal to 1 and wakeup waiting thread 5.[device-B] driver checks that slot_id is still in use (by device-A) in xhci_alloc_virt_device, and fail to enumerate due to this conflict 6.[device-A] xhci->devs[slot_id] set to NULL in xhci_free_virt_device To fix driver's slot_id resources conflict, clear xhci->devs[slot_id] and xhci->dcbba->dev_context_ptrs[slot_id] pointers in the interrupt context when disable slot command completes successfully. Simultaneously, adjust function xhci_free_virt_device to accurately handle device release. [minor smatch warning and commit message fix -Mathias] Cc: stable@vger.kernel.org Fixes: 7faac1953ed1 ("xhci: avoid race between disable slot command and host runtime suspend") Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250819125844.2042452-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19	signal: Fix memory leak for PIDFD_SELF* sentinels	Adrian Huang (Lenovo)
	Commit f08d0c3a7111 ("pidfd: add PIDFD_SELF* sentinels to refer to own thread/process") introduced a leak by acquiring a pid reference through get_task_pid(), which increments pid->count but never drops it with put_pid(). As a result, kmemleak reports unreferenced pid objects after running tools/testing/selftests/pidfd/pidfd_test, for example: unreferenced object 0xff1100206757a940 (size 160): comm "pidfd_test", pid 16965, jiffies 4294853028 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 fd 57 50 04 .............WP. 5e 44 00 00 00 00 00 00 18 de 34 17 01 00 11 ff ^D........4..... backtrace (crc cd8844d4): kmem_cache_alloc_noprof+0x2f4/0x3f0 alloc_pid+0x54/0x3d0 copy_process+0xd58/0x1740 kernel_clone+0x99/0x3b0 __do_sys_clone3+0xbe/0x100 do_syscall_64+0x7b/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fix this by calling put_pid() after do_pidfd_send_signal() returns. Fixes: f08d0c3a7111 ("pidfd: add PIDFD_SELF* sentinels to refer to own thread/process") Signed-off-by: Adrian Huang (Lenovo) <adrianhuang0701@gmail.com> Link: https://lore.kernel.org/20250818134310.12273-1-adrianhuang0701@gmail.com Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-19	kernfs: don't fail listing extended attributes	Christian Brauner
	Userspace doesn't expect a failure to list extended attributes: $ ls -lA /sys/ ls: /sys/: No data available ls: /sys/kernel: No data available ls: /sys/power: No data available ls: /sys/class: No data available ls: /sys/devices: No data available ls: /sys/dev: No data available ls: /sys/hypervisor: No data available ls: /sys/fs: No data available ls: /sys/bus: No data available ls: /sys/firmware: No data available ls: /sys/block: No data available ls: /sys/module: No data available total 0 drwxr-xr-x 2 root root 0 Jan 1 1970 block drwxr-xr-x 52 root root 0 Jan 1 1970 bus drwxr-xr-x 88 root root 0 Jan 1 1970 class drwxr-xr-x 4 root root 0 Jan 1 1970 dev drwxr-xr-x 11 root root 0 Jan 1 1970 devices drwxr-xr-x 3 root root 0 Jan 1 1970 firmware drwxr-xr-x 10 root root 0 Jan 1 1970 fs drwxr-xr-x 2 root root 0 Jul 2 09:43 hypervisor drwxr-xr-x 14 root root 0 Jan 1 1970 kernel drwxr-xr-x 251 root root 0 Jan 1 1970 module drwxr-xr-x 3 root root 0 Jul 2 09:43 power Fix it by simply reporting success when no extended attributes are available instead of reporting ENODATA. Link: https://lore.kernel.org/78b13bcdae82ade95e88f315682966051f461dde.camel@linaro.org Fixes: d1f4e9026007 ("kernfs: remove iattr_mutex") # mainline only Reported-by: André Draszik <andre.draszik@linaro.org> Link: https://lore.kernel.org/20250819-ahndung-abgaben-524a535f8101@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-19	coredump: Fix return value in coredump_parse()	Dan Carpenter
	The coredump_parse() function is bool type. It should return true on success and false on failure. The cn_printf() returns zero on success or negative error codes. This mismatch means that when "return err;" here, it is treated as success instead of failure. Change it to return false instead. Fixes: a5715af549b2 ("coredump: make coredump_parse() return bool") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/aKRGu14w5vPSZLgv@stanley.mountain Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-19	fs/buffer: fix use-after-free when call bh_read() helper	Ye Bin
	There's issue as follows: BUG: KASAN: stack-out-of-bounds in end_buffer_read_sync+0xe3/0x110 Read of size 8 at addr ffffc9000168f7f8 by task swapper/3/0 CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.16.0-862.14.0.6.x86_64 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Call Trace: <IRQ> dump_stack_lvl+0x55/0x70 print_address_description.constprop.0+0x2c/0x390 print_report+0xb4/0x270 kasan_report+0xb8/0xf0 end_buffer_read_sync+0xe3/0x110 end_bio_bh_io_sync+0x56/0x80 blk_update_request+0x30a/0x720 scsi_end_request+0x51/0x2b0 scsi_io_completion+0xe3/0x480 ? scsi_device_unbusy+0x11e/0x160 blk_complete_reqs+0x7b/0x90 handle_softirqs+0xef/0x370 irq_exit_rcu+0xa5/0xd0 sysvec_apic_timer_interrupt+0x6e/0x90 </IRQ> Above issue happens when do ntfs3 filesystem mount, issue may happens as follows: mount IRQ ntfs_fill_super read_cache_page do_read_cache_folio filemap_read_folio mpage_read_folio do_mpage_readpage ntfs_get_block_vbo bh_read submit_bh wait_on_buffer(bh); blk_complete_reqs scsi_io_completion scsi_end_request blk_update_request end_bio_bh_io_sync end_buffer_read_sync __end_buffer_read_notouch unlock_buffer wait_on_buffer(bh);--> return will return to caller put_bh --> trigger stack-out-of-bounds In the mpage_read_folio() function, the stack variable 'map_bh' is passed to ntfs_get_block_vbo(). Once unlock_buffer() unlocks and wait_on_buffer() returns to continue processing, the stack variable is likely to be reclaimed. Consequently, during the end_buffer_read_sync() process, calling put_bh() may result in stack overrun. If the bh is not allocated on the stack, it belongs to a folio. Freeing a buffer head which belongs to a folio is done by drop_buffers() which will fail to free buffers which are still locked. So it is safe to call put_bh() before __end_buffer_read_notouch(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/20250811141830.343774-1-yebin@huaweicloud.com Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-19	most: core: Drop device reference after usage in get_channel()	Miaoqian Lin
	In get_channel(), the reference obtained by bus_find_device_by_name() was dropped via put_device() before accessing the device's driver data Move put_device() after usage to avoid potential issues. Fixes: 2485055394be ("staging: most: core: drop device reference") Cc: stable <stable@kernel.org> Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20250804082955.3621026-1-linmq006@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19	comedi: Make insn_rw_emulate_bits() do insn->n samples	Ian Abbott
	The `insn_rw_emulate_bits()` function is used as a default handler for `INSN_READ` instructions for subdevices that have a handler for `INSN_BITS` but not for `INSN_READ`. Similarly, it is used as a default handler for `INSN_WRITE` instructions for subdevices that have a handler for `INSN_BITS` but not for `INSN_WRITE`. It works by emulating the `INSN_READ` or `INSN_WRITE` instruction handling with a constructed `INSN_BITS` instruction. However, `INSN_READ` and `INSN_WRITE` instructions are supposed to be able read or write multiple samples, indicated by the `insn->n` value, but `insn_rw_emulate_bits()` currently only handles a single sample. For `INSN_READ`, the comedi core will copy `insn->n` samples back to user-space. (That triggered KASAN kernel-infoleak errors when `insn->n` was greater than 1, but that is being fixed more generally elsewhere in the comedi core.) Make `insn_rw_emulate_bits()` either handle `insn->n` samples, or return an error, to conform to the general expectation for `INSN_READ` and `INSN_WRITE` handlers. Fixes: ed9eccbe8970 ("Staging: add comedi core") Cc: stable <stable@kernel.org> # 5.13+ Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Link: https://lore.kernel.org/r/20250725141034.87297-1-abbotti@mev.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19	comedi: Fix use of uninitialized memory in do_insn_ioctl() and ↵	Ian Abbott
	do_insnlist_ioctl() syzbot reports a KMSAN kernel-infoleak in `do_insn_ioctl()`. A kernel buffer is allocated to hold `insn->n` samples (each of which is an `unsigned int`). For some instruction types, `insn->n` samples are copied back to user-space, unless an error code is being returned. The problem is that not all the instruction handlers that need to return data to userspace fill in the whole `insn->n` samples, so that there is an information leak. There is a similar syzbot report for `do_insnlist_ioctl()`, although it does not have a reproducer for it at the time of writing. One culprit is `insn_rw_emulate_bits()` which is used as the handler for `INSN_READ` or `INSN_WRITE` instructions for subdevices that do not have a specific handler for that instruction, but do have an `INSN_BITS` handler. For `INSN_READ` it only fills in at most 1 sample, so if `insn->n` is greater than 1, the remaining `insn->n - 1` samples copied to userspace will be uninitialized kernel data. Another culprit is `vm80xx_ai_insn_read()` in the "vm80xx" driver. It never returns an error, even if it fails to fill the buffer. Fix it in `do_insn_ioctl()` and `do_insnlist_ioctl()` by making sure that uninitialized parts of the allocated buffer are zeroed before handling each instruction. Thanks to Arnaud Lecomte for their fix to `do_insn_ioctl()`. That fix replaced the call to `kmalloc_array()` with `kcalloc()`, but it is not always necessary to clear the whole buffer. Fixes: ed9eccbe8970 ("Staging: add comedi core") Reported-by: syzbot+a5e45f768aab5892da5d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a5e45f768aab5892da5d Reported-by: syzbot+fb4362a104d45ab09cf9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fb4362a104d45ab09cf9 Cc: stable <stable@kernel.org> # 5.13+ Cc: Arnaud Lecomte <contact@arnaud-lcm.com> Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Link: https://lore.kernel.org/r/20250725125324.80276-1-abbotti@mev.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19	comedi: pcl726: Prevent invalid irq number	Edward Adam Davis
	The reproducer passed in an irq number(0x80008000) that was too large, which triggered the oob. Added an interrupt number check to prevent users from passing in an irq number that was too large. If `it->options[1]` is 31, then `1 << it->options[1]` is still invalid because it shifts a 1-bit into the sign bit (which is UB in C). Possible solutions include reducing the upper bound on the `it->options[1]` value to 30 or lower, or using `1U << it->options[1]`. The old code would just not attempt to request the IRQ if the `options[1]` value were invalid. And it would still configure the device without interrupts even if the call to `request_irq` returned an error. So it would be better to combine this test with the test below. Fixes: fff46207245c ("staging: comedi: pcl726: enable the interrupt support code") Cc: stable <stable@kernel.org> # 5.13+ Reported-by: syzbot+5cd373521edd68bebcb3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5cd373521edd68bebcb3 Tested-by: syzbot+5cd373521edd68bebcb3@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Reviewed-by: Ian Abbott <abbotti@mev.co.uk> Link: https://lore.kernel.org/r/tencent_3C66983CC1369E962436264A50759176BF09@qq.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19	cdx: Fix off-by-one error in cdx_rpmsg_probe()	Thorsten Blum
	In cdx_rpmsg_probe(), strscpy() is incorrectly called with the length of the source string (excluding the NUL terminator) rather than the size of the destination buffer. This results in one character less being copied from 'cdx_rpmsg_id_table[0].name' to 'chinfo.name'. Use the destination buffer size instead to ensure the name is copied correctly. Cc: stable <stable@kernel.org> Fixes: 2a226927d9b8 ("cdx: add rpmsg communication channel for CDX") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20250806090512.121260-2-thorsten.blum@linux.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19	fpga: zynq_fpga: Fix the wrong usage of dma_map_sgtable()	Xu Yilun
	dma_map_sgtable() returns only 0 or the error code. Read sgt->nents to get the number of mapped segments. Fixes: 37e00703228a ("zynq_fpga: use sgtable-based scatterlist wrappers") Reported-by: Pavel Pisa <pisa@fel.cvut.cz> Closes: https://lore.kernel.org/linux-fpga/202508041548.22955.pisa@fel.cvut.cz/ Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Tested-by: Pavel Pisa <pisa@fel.cvut.cz> Link: https://lore.kernel.org/r/20250806070605.1920909-2-yilun.xu@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19	usb: typec: fusb302: Revert incorrect threaded irq fix	Sebastian Reichel
	The fusb302 irq handler has been carefully optimized by Hans de Goede in commit 207338ec5a27 ("usb: typec: fusb302: Improve suspend/resume handling"). A recent 'fix' undid most of that work to avoid a virtio-gpio driver bug. This reverts the incorrect fix, since it is of very low quality. It reverts the quirks from Hans change (and thus reintroduces the problems fixed by Hans) while keeping the overhead from the original change. The proper fix to support using fusb302 with an interrupt line provided by virtio-gpio must be implemented in the virtio driver instead, which should support disabling the IRQ from the fusb302 interrupt routine. Cc: Hans de Goede <hansg@kernel.org> Cc: Yongbo Zhang <giraffesnn123@gmail.com> Fixes: 1c2d81bded19 ("usb: typec: fusb302: fix scheduling while atomic when using virtio-gpio") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://lore.kernel.org/r/20250818-fusb302-unthreaded-irq-v1-1-3a9a11a9f56f@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19	USB: core: Update kerneldoc for usb_hcd_giveback_urb()	Alan Stern
	The kerneldoc added for usb_hcd_giveback_urb() by commit 41631d3616c3 ("usb: core: Replace in_interrupt() in comments") is unclear and incorrect. Update the text for greater clarity and to say that URBs for a root hub will always use a BH context for their completion. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/41eaae05-116a-4568-940c-eeb94ab6baa0@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19	net: ti: icssg-prueth: Fix HSR and switch offload Enablement during firwmare ↵	MD Danish Anwar
	reload. To enable HSR / Switch offload, certain configurations are needed. Currently they are done inside icssg_change_mode(). This function only gets called if we move from one mode to another without bringing the links up / down. Once in HSR / Switch mode, if we bring the links down and bring it back up again. The callback sequence is, - emac_ndo_stop() Firmwares are stopped - emac_ndo_open() Firmwares are loaded In this path icssg_change_mode() doesn't get called and as a result the configurations needed for HSR / Switch is not done. To fix this, put all these configurations in a separate function icssg_enable_fw_offload() and call this from both icssg_change_mode() and emac_ndo_open() Fixes: 56375086d093 ("net: ti: icssg-prueth: Enable HSR Tx duplication, Tx Tag and Rx Tag offload") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Link: https://patch.msgid.link/20250814105106.1491871-1-danishanwar@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-19	ppp: fix race conditions in ppp_fill_forward_path	Qingfang Deng
	ppp_fill_forward_path() has two race conditions: 1. The ppp->channels list can change between list_empty() and list_first_entry(), as ppp_lock() is not held. If the only channel is deleted in ppp_disconnect_channel(), list_first_entry() may access an empty head or a freed entry, and trigger a panic. 2. pch->chan can be NULL. When ppp_unregister_channel() is called, pch->chan is set to NULL before pch is removed from ppp->channels. Fix these by using a lockless RCU approach: - Use list_first_or_null_rcu() to safely test and access the first list entry. - Convert list modifications on ppp->channels to their RCU variants and add synchronize_net() after removal. - Check for a NULL pch->chan before dereferencing it. Fixes: f6efc675c9dd ("net: ppp: resolve forwarding path for bridge pppoe devices") Signed-off-by: Qingfang Deng <dqfext@gmail.com> Link: https://patch.msgid.link/20250814012559.3705-2-dqfext@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-19	net: ethernet: mtk_ppe: add RCU lock around dev_fill_forward_path	Qingfang Deng
	Ensure ndo_fill_forward_path() is called with RCU lock held. Fixes: 2830e314778d ("net: ethernet: mtk-ppe: fix traffic offload with bridged wlan") Signed-off-by: Qingfang Deng <dqfext@gmail.com> Link: https://patch.msgid.link/20250814012559.3705-1-dqfext@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-19	x86/bugs: Fix GDS mitigation selecting when mitigation is off	Li RongQing
	The current GDS mitigation logic incorrectly returns early when the attack vector mitigation is turned off, which leads to two problems: 1. CPUs without ARCH_CAP_GDS_CTRL support are incorrectly marked with GDS_MITIGATION_OFF when they should be marked as GDS_MITIGATION_UCODE_NEEDED. 2. The mitigation state checks and locking verification that follow are skipped, which means: - fail to detect if the mitigation was locked - miss the warning when trying to disable a locked mitigation Remove the early return to ensure proper mitigation state handling. This allows: - Proper mitigation classification for non-ARCH_CAP_GDS_CTRL CPUs - Complete mitigation state verification This also addresses the failed MSR 0x123 write attempt at boot on non-ARCH_CAP_GDS_CTRL CPUs: unchecked MSR access error: WRMSR to 0x123 (tried to write 0x0000000000000010) at rIP: ... (update_gds_msr) Call Trace: identify_secondary_cpu start_secondary common_startup_64 WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/bugs.c:1053 update_gds_msr [ bp: Massage, zap superfluous braces. ] Fixes: 8c7261abcb7ad ("x86/bugs: Add attack vector controls for GDS") Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Link: https://lore.kernel.org/20250819023356.2012-1-lirongqing@baidu.com
2025-08-19	drm/i915/gt: Relocate compression repacking WA for JSL/EHL	Sebastian Brzezinka
	CACHE_MODE_0 registers should be saved and restored as part of the context, not during engine reset. Move the related workaround (Disable Repacking for Compression) from rcs_engine_wa_init() to icl_ctx_workarounds_init() for Jasper Lake and Elkhart Lake platforms. This ensures the WA is applied during context initialisation. BSPEC: 11322 Fixes: 0ddae025ab6c ("drm/i915: Disable compression tricks on JSL") Closes: Fixes: 0ddae025ab6c ("drm/i915: Disable compression tricks on JSL") Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> Cc: stable@vger.kernel.org # v6.13+ Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/4feaa24094e019e000ceb6011d8cd419b0361b3f.1754902406.git.sebastian.brzezinka@intel.com (cherry picked from commit c9932f0d604e4c8f2c6018e598a322acb43c68a2) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
2025-08-19	drm/i915: silence rpm wakeref asserts on GEN11_GU_MISC_IIR access	Jani Nikula
	Commit 8d9908e8fe9c ("drm/i915/display: remove small micro-optimizations in irq handling") not only removed the optimizations, it also enabled wakeref asserts for the GEN11_GU_MISC_IIR access. Silence the asserts by wrapping the access inside intel_display_rpm_assert_{block,unblock}(). Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Closes: https://lore.kernel.org/r/aG0tWkfmxWtxl_xc@zx2c4.com Fixes: 8d9908e8fe9c ("drm/i915/display: remove small micro-optimizations in irq handling") Cc: stable@vger.kernel.org # v6.13+ Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jouni Högander <jouni.hogander@intel.com> Link: https://lore.kernel.org/r/20250805115656.832235-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit cbd3baeffbc08052ce7dc53f11bf5524b4411056) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
2025-08-19	ALSA: hda/realtek: Audio disappears on HP 15-fc000 after warm boot again	Kailang Yang
	There was a similar bug in the past (Bug 217440), which was fixed for this laptop. The same issue is occurring again as of kernel v.6.12.2. The symptoms are very similar - initially audio works but after a warm reboot, the audio completely disappears until the computer is powered off (there is no audio output at all). The issue is also related by caused by a different change now. By bisecting different kernel versions, I found that reverting cc3d0b5dd989 in patch_realtek.c[] restores the sound and it works fine after the reboot. [] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/sound/pci/hda/patch_realtek.c?h=v6.12.2&id=4ed7f16070a8475c088ff423b2eb11ba15eb89b6 [ patch description reformatted by tiwai ] Fixes: cc3d0b5dd989 ("ALSA: hda/realtek: Update ALC256 depop procedure") Link: https://bugzilla.kernel.org/show_bug.cgi?id=220109 Signed-off-by: Kailang Yang <kailang@realtek.com> Link: https://lore.kernel.org/5317ca723c82447a938414fcca85cbf5@realtek.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-08-19	ALSA: hda/realtek: Fix headset mic on ASUS Zenbook 14	Vasiliy Kovalev
	Add a PCI quirk to enable microphone input on the headphone jack on the ASUS Zenbook 14 UM3406HA laptop. This model uses an ALC294 codec with CS35L41 amplifiers over I2C, and the existing fixup for it did not enable the headset microphone. A new fix is introduced to get the mic working while keeping the amplifier settings correct. Fixes: 61cbc08fdb04 ("ALSA: hda/realtek: Add quirks for ASUS 2024 Zenbooks") Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Link: https://patch.msgid.link/20250818204243.247297-1-kovalev@altlinux.org Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-08-18	scsi: ufs: ufs-qcom: Fix ESI null pointer dereference	Nitin Rawat
	ESI/MSI is a performance optimization feature that provides dedicated interrupts per MCQ hardware queue. This is optional feature and UFS MCQ should work with and without ESI feature. Commit e46a28cea29a ("scsi: ufs: qcom: Remove the MSI descriptor abuse") brings a regression in ESI (Enhanced System Interrupt) configuration that causes a null pointer dereference when Platform MSI allocation fails. The issue occurs in when platform_device_msi_init_and_alloc_irqs() in ufs_qcom_config_esi() fails (returns -EINVAL) but the current code uses __free() macro for automatic cleanup free MSI resources that were never successfully allocated. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Call trace: mutex_lock+0xc/0x54 (P) platform_device_msi_free_irqs_all+0x1c/0x40 ufs_qcom_config_esi+0x1d0/0x220 [ufs_qcom] ufshcd_config_mcq+0x28/0x104 ufshcd_init+0xa3c/0xf40 ufshcd_pltfrm_init+0x504/0x7d4 ufs_qcom_probe+0x20/0x58 [ufs_qcom] Fix by restructuring the ESI configuration to try MSI allocation first, before any other resource allocation and instead use explicit cleanup instead of __free() macro to avoid cleanup of unallocated resources. Tested on SM8750 platform with MCQ enabled, both with and without Platform ESI support. Fixes: e46a28cea29a ("scsi: ufs: qcom: Remove the MSI descriptor abuse") Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Nitin Rawat <quic_nitirawa@quicinc.com> Link: https://lore.kernel.org/r/20250811073330.20230-1-quic_nitirawa@quicinc.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-08-18	scsi: ufs: core: Rename ufshcd_wait_for_doorbell_clr()	Bart Van Assche
	The name ufshcd_wait_for_doorbell_clr() refers to legacy mode. Commit 8d077ede48c1 ("scsi: ufs: Optimize the command queueing code") added support for MCQ mode in this function. Since then the name of this function is misleading. Hence change the name of this function into something that is appropriate for both legacy and MCQ mode. Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250815155842.472867-5-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-08-18	scsi: ufs: core: Fix the return value documentation	Bart Van Assche
	ufshcd_wait_for_dev_cmd() and all its callers can return an OCS error. OCS errors are represented by positive integers. Remove the WARN_ONCE() statements that complain about positive error codes and update the documentation. Keep the behavior of ufshcd_wait_for_dev_cmd() because this return value may end be passed as the second argument of bsg_job_done() and bsg_job_done() handles positive and negative error codes differently. Cc: Peter Wang <peter.wang@mediatek.com> Fixes: cc59f3b68542 ("scsi: ufs: core: Improve return value documentation") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250815155842.472867-4-bvanassche@acm.org Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>