summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-09-05Merge tag '6.17-RC4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - Fix two potential NULL pointer references - Two debugging improvements (to help debug recent issues) a new tracepoint, and minor improvement to DebugData - Trivial comment cleanup * tag '6.17-RC4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: prevent NULL pointer dereference in UTF16 conversion smb: client: show negotiated cipher in DebugData smb: client: add new tracepoint to trace lease break notification smb: client: fix spellings in comments smb: client: Fix NULL pointer dereference in cifs_debug_dirs_proc_show()
2025-09-05btrfs: don't allow adding block device of less than 1 MBMark Harmstone
Commit 15ae0410c37a79 ("btrfs-progs: add error handling for device_get_partition_size_fd_stat()") in btrfs-progs inadvertently changed it so that if the BLKGETSIZE64 ioctl on a block device returned a size of 0, this was no longer seen as an error condition. Unfortunately this is how disconnected NBD devices behave, meaning that with btrfs-progs 6.16 it's now possible to add a device you can't remove: # btrfs device add /dev/nbd0 /root/temp # btrfs device remove /dev/nbd0 /root/temp ERROR: error removing device '/dev/nbd0': Invalid argument This check should always have been done kernel-side anyway, so add a check in btrfs_init_new_device() that the new device doesn't have a size less than BTRFS_DEVICE_RANGE_RESERVED (i.e. 1 MB). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-05xfs: enable online fsck by default in KconfigDarrick J. Wong
Online fsck has been a part of upstream for over a year now without any serious problems. Turn it on by default in time for the 2025 LTS kernel, and get rid of the "say N if unsure" messages for the default Y options. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2025-09-05xfs: use deferred reaping for data device cow extentsDarrick J. Wong
Don't roll the whole transaction after every extent, that's rather inefficient. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2025-09-05xfs: remove deprecated sysctl knobsDarrick J. Wong
These sysctl knobs were scheduled for removal in September 2025. That time has come, so remove them. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2025-09-05xfs: remove static reap limits from repair.hDarrick J. Wong
Delete XREAP_MAX_BINVAL and XREAP_MAX_DEFER_CHAIN because the reap code now calculates those limits dynamically, so they're no longer needed. Move the third limit (XREP_MAX_ITRUNCATE_EFIS) to the one file that uses it. Note that the btree rebuilding code should reserve exactly the number of blocks needed to rebuild a btree, so it is rare that the newbt code will need to add any EFIs to the commit transaction. That's why that static limit remains. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2025-09-05xfs: remove deprecated mount optionsDarrick J. Wong
These four mount options were scheduled for removal in September 2025, so remove them now. Cc: preichl@redhat.com Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2025-09-05xfs: disable deprecated features by default in KconfigDarrick J. Wong
We promised to turn off these old features by default in September 2025. Do so now. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2025-09-05xfs: compute file mapping reap limits dynamicallyDarrick J. Wong
Reaping file fork mappings is a little different -- log recovery can free the blocks for us, so we only try to process a single mapping at a time. Therefore, we only need to figure out the maximum number of blocks that we can invalidate in a single transaction. The rough calculation here is: nr_extents = (logres - reservation used by any one step) / (space used per binval) Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2025-09-05xfs: compute realtime device CoW staging extent reap limits dynamicallyDarrick J. Wong
Calculate the maximum number of CoW staging extents that can be reaped in a single transaction chain. The rough calculation here is: nr_extents = (logres - reservation used by any one step) / (space used by intents per extent) Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2025-09-05xfs: compute data device CoW staging extent reap limits dynamicallyDarrick J. Wong
Calculate the maximum number of CoW staging extents that can be reaped in a single transaction chain. The rough calculation here is: nr_extents = (logres - reservation used by any one step) / (space used by intents per extent + space used for a few buffer invalidations) Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2025-09-05xfs: compute per-AG extent reap limits dynamicallyDarrick J. Wong
Calculate the maximum number of extents that can be reaped in a single transaction chain, and the number of buffers that can be invalidated in a single transaction. The rough calculation here is: nr_extents = (logres - reservation used by any one step) / (space used by intents per extent + space used per binval) Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2025-09-05xfs: convert the ifork reap code to use xreap_stateDarrick J. Wong
Convert the file fork reaping code to use struct xreap_state so that we can reuse the dynamic state tracking code. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2025-09-05xfs: prepare reaping code for dynamic limitsDarrick J. Wong
The online repair block reaping code employs static limits to decide if it's time to roll the transaction or finish the deferred item chains to avoid overflowing the scrub transaction's reservation. However, the use of static limits aren't great -- btree blocks are assumed to be scattered around the AG and the buffers need to be invalidated, whereas COW staging extents are usually contiguous and do not have buffers. We would like to configure the limits dynamically. To get ready for this, reorganize struct xreap_state to store dynamic limits, and add helpers to hide some of the details of how the limits are enforced. Also rename the "xreap roll" functions to include the word "binval" because they only exist to decide when we should roll the transaction to deal with buffer invalidations. No functional changes intended here. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2025-09-05xfs: use deferred intent items for reaping crosslinked blocksDarrick J. Wong
When we're removing rmap records for crosslinked blocks, use deferred intent items so that we can try to free/unmap as many of the old data structure's blocks as we can in the same transaction as the commit. Cc: <stable@vger.kernel.org> # v6.6 Fixes: 1c7ce115e52106 ("xfs: reap large AG metadata extents when possible") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2025-09-05fuse: virtio_fs: fix page fault for DAX page addressHaiyue Wang
The commit ced17ee32a99 ("Revert "virtio: reject shm region if length is zero"") exposes the following DAX page fault bug (this fix the failure that getting shm region alway returns false because of zero length): The commit 21aa65bf82a7 ("mm: remove callers of pfn_t functionality") handles the DAX physical page address incorrectly: the removed macro 'phys_to_pfn_t()' should be replaced with 'PHYS_PFN()'. [ 1.390321] BUG: unable to handle page fault for address: ffffd3fb40000008 [ 1.390875] #PF: supervisor read access in kernel mode [ 1.391257] #PF: error_code(0x0000) - not-present page [ 1.391509] PGD 0 P4D 0 [ 1.391626] Oops: Oops: 0000 [#1] SMP NOPTI [ 1.391806] CPU: 6 UID: 1000 PID: 162 Comm: weston Not tainted 6.17.0-rc3-WSL2-STABLE #2 PREEMPT(none) [ 1.392361] RIP: 0010:dax_to_folio+0x14/0x60 [ 1.392653] Code: 52 c9 c3 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 c1 ef 05 48 c1 e7 06 48 03 3d 34 b5 31 01 <48> 8b 57 08 48 89 f8 f6 c2 01 75 2b 66 90 c3 cc cc cc cc f7 c7 ff [ 1.393727] RSP: 0000:ffffaf7d04407aa8 EFLAGS: 00010086 [ 1.394003] RAX: 000000a000000000 RBX: ffffaf7d04407bb0 RCX: 0000000000000000 [ 1.394524] RDX: ffffd17b40000008 RSI: 0000000000000083 RDI: ffffd3fb40000000 [ 1.394967] RBP: 0000000000000011 R08: 000000a000000000 R09: 0000000000000000 [ 1.395400] R10: 0000000000001000 R11: ffffaf7d04407c10 R12: 0000000000000000 [ 1.395806] R13: ffffa020557be9c0 R14: 0000014000000001 R15: 0000725970e94000 [ 1.396268] FS: 000072596d6d2ec0(0000) GS:ffffa0222dc59000(0000) knlGS:0000000000000000 [ 1.396715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.397100] CR2: ffffd3fb40000008 CR3: 000000011579c005 CR4: 0000000000372ef0 [ 1.397518] Call Trace: [ 1.397663] <TASK> [ 1.397900] dax_insert_entry+0x13b/0x390 [ 1.398179] dax_fault_iter+0x2a5/0x6c0 [ 1.398443] dax_iomap_pte_fault+0x193/0x3c0 [ 1.398750] __fuse_dax_fault+0x8b/0x270 [ 1.398997] ? vm_mmap_pgoff+0x161/0x210 [ 1.399175] __do_fault+0x30/0x180 [ 1.399360] do_fault+0xc4/0x550 [ 1.399547] __handle_mm_fault+0x8e3/0xf50 [ 1.399731] ? do_syscall_64+0x72/0x1e0 [ 1.399958] handle_mm_fault+0x192/0x2f0 [ 1.400204] do_user_addr_fault+0x20e/0x700 [ 1.400418] exc_page_fault+0x66/0x150 [ 1.400602] asm_exc_page_fault+0x26/0x30 [ 1.400831] RIP: 0033:0x72596d1bf703 [ 1.401076] Code: 31 f6 45 31 e4 48 8d 15 b3 73 00 00 e8 06 03 00 00 8b 83 68 01 00 00 e9 8e fa ff ff 0f 1f 00 48 8b 44 24 08 4c 89 ee 48 89 df <c7> 00 21 43 34 12 e8 72 09 00 00 e9 6a fa ff ff 0f 1f 44 00 00 e8 [ 1.402172] RSP: 002b:00007ffc350f6dc0 EFLAGS: 00010202 [ 1.402488] RAX: 0000725970e94000 RBX: 00005b7c642c2560 RCX: 0000725970d359a7 [ 1.402898] RDX: 0000000000000003 RSI: 00007ffc350f6dc0 RDI: 00005b7c642c2560 [ 1.403284] RBP: 00007ffc350f6e90 R08: 000000000000000d R09: 0000000000000000 [ 1.403634] R10: 00007ffc350f6dd8 R11: 0000000000000246 R12: 0000000000000001 [ 1.404078] R13: 00007ffc350f6dc0 R14: 0000725970e29ce0 R15: 0000000000000003 [ 1.404450] </TASK> [ 1.404570] Modules linked in: [ 1.404821] CR2: ffffd3fb40000008 [ 1.405029] ---[ end trace 0000000000000000 ]--- [ 1.405323] RIP: 0010:dax_to_folio+0x14/0x60 [ 1.405556] Code: 52 c9 c3 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 c1 ef 05 48 c1 e7 06 48 03 3d 34 b5 31 01 <48> 8b 57 08 48 89 f8 f6 c2 01 75 2b 66 90 c3 cc cc cc cc f7 c7 ff [ 1.406639] RSP: 0000:ffffaf7d04407aa8 EFLAGS: 00010086 [ 1.406910] RAX: 000000a000000000 RBX: ffffaf7d04407bb0 RCX: 0000000000000000 [ 1.407379] RDX: ffffd17b40000008 RSI: 0000000000000083 RDI: ffffd3fb40000000 [ 1.407800] RBP: 0000000000000011 R08: 000000a000000000 R09: 0000000000000000 [ 1.408246] R10: 0000000000001000 R11: ffffaf7d04407c10 R12: 0000000000000000 [ 1.408666] R13: ffffa020557be9c0 R14: 0000014000000001 R15: 0000725970e94000 [ 1.409170] FS: 000072596d6d2ec0(0000) GS:ffffa0222dc59000(0000) knlGS:0000000000000000 [ 1.409608] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.409977] CR2: ffffd3fb40000008 CR3: 000000011579c005 CR4: 0000000000372ef0 [ 1.410437] Kernel panic - not syncing: Fatal exception [ 1.410857] Kernel Offset: 0xc000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: 21aa65bf82a7 ("mm: remove callers of pfn_t functionality") Signed-off-by: Haiyue Wang <haiyuewa@163.com> Link: https://lore.kernel.org/20250904120339.972-1-haiyuewa@163.com Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-05eventpoll: Replace rwlock with spinlockNam Cao
The ready event list of an epoll object is protected by read-write semaphore: - The consumer (waiter) acquires the write lock and takes items. - the producer (waker) takes the read lock and adds items. The point of this design is enabling epoll to scale well with large number of producers, as multiple producers can hold the read lock at the same time. Unfortunately, this implementation may cause scheduling priority inversion problem. Suppose the consumer has higher scheduling priority than the producer. The consumer needs to acquire the write lock, but may be blocked by the producer holding the read lock. Since read-write semaphore does not support priority-boosting for the readers (even with CONFIG_PREEMPT_RT=y), we have a case of priority inversion: a higher priority consumer is blocked by a lower priority producer. This problem was reported in [1]. Furthermore, this could also cause stall problem, as described in [2]. Fix this problem by replacing rwlock with spinlock. This reduces the event bandwidth, as the producers now have to contend with each other for the spinlock. According to the benchmark from https://github.com/rouming/test-tools/blob/master/stress-epoll.c: On 12 x86 CPUs: Before After Diff threads events/ms events/ms 8 7162 4956 -31% 16 8733 5383 -38% 32 7968 5572 -30% 64 10652 5739 -46% 128 11236 5931 -47% On 4 riscv CPUs: Before After Diff threads events/ms events/ms 8 2958 2833 -4% 16 3323 3097 -7% 32 3451 3240 -6% 64 3554 3178 -11% 128 3601 3235 -10% Although the numbers look bad, it should be noted that this benchmark creates multiple threads who do nothing except constantly generating new epoll events, thus contention on the spinlock is high. For real workload, the event rate is likely much lower, and the performance drop is not as bad. Using another benchmark (perf bench epoll wait) where spinlock contention is lower, improvement is even observed on x86: On 12 x86 CPUs: Before: Averaged 110279 operations/sec (+- 1.09%), total secs = 8 After: Averaged 114577 operations/sec (+- 2.25%), total secs = 8 On 4 riscv CPUs: Before: Averaged 175767 operations/sec (+- 0.62%), total secs = 8 After: Averaged 167396 operations/sec (+- 0.23%), total secs = 8 In conclusion, no one is likely to be upset over this change. After all, spinlock was used originally for years, and the commit which converted to rwlock didn't mention a real workload, just that the benchmark numbers are nice. This patch is not exactly the revert of commit a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention"), because git revert conflicts in some places which are not obvious on the resolution. This patch is intended to be backported, therefore go with the obvious approach: - Replace rwlock_t with spinlock_t one to one - Delete list_add_tail_lockless() and chain_epi_lockless(). These were introduced to allow producers to concurrently add items to the list. But now that spinlock no longer allows producers to touch the event list concurrently, these two functions are not necessary anymore. Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention") Signed-off-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/ec92458ea357ec503c737ead0f10b2c6e4c37d47.1752581388.git.namcao@linutronix.de Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Cc: stable@vger.kernel.org Reported-by: Frederic Weisbecker <frederic@kernel.org> Closes: https://lore.kernel.org/linux-rt-users/20210825132754.GA895675@lothringen/ [1] Reported-by: Valentin Schneider <vschneid@redhat.com> Closes: https://lore.kernel.org/linux-rt-users/xhsmhttqvnall.mognet@vschneid.remote.csb/ [2] Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-05xfs: Replace strncpy with memcpyMarcelo Moreira
The changes modernizes the code by aligning it with current kernel best practices. It improves code clarity and consistency, as strncpy is deprecated as explained in Documentation/process/deprecated.rst. This change does not alter the functionality or introduce any behavioral changes. Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Marcelo Moreira <marcelomoreira1905@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-04cifs: prevent NULL pointer dereference in UTF16 conversionMakar Semyonov
There can be a NULL pointer dereference bug here. NULL is passed to __cifs_sfu_make_node without checks, which passes it unchecked to cifs_strndup_to_utf16, which in turn passes it to cifs_local_to_utf16_bytes where '*from' is dereferenced, causing a crash. This patch adds a check for NULL 'src' in cifs_strndup_to_utf16 and returns NULL early to prevent dereferencing NULL pointer. Found by Linux Verification Center (linuxtesting.org) with SVACE Signed-off-by: Makar Semyonov <m.semenov@tssltd.ru> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-03binfmt_elf: preserve original ELF e_flags for core dumpsSvetlana Parfenova
Some architectures, such as RISC-V, use the ELF e_flags field to encode ABI-specific information (e.g., ISA extensions, fpu support). Debuggers like GDB rely on these flags in core dumps to correctly interpret optional register sets. If the flags are missing or incorrect, GDB may warn and ignore valid data, for example: warning: Unexpected size of section '.reg2/213' in core file. This can prevent access to fpu or other architecture-specific registers even when they were dumped. Save the e_flags field during ELF binary loading (in load_elf_binary()) into the mm_struct, and later retrieve it during core dump generation (in fill_note_info()). Kconfig option CONFIG_ARCH_HAS_ELF_CORE_EFLAGS is introduced for architectures that require this behaviour. Signed-off-by: Svetlana Parfenova <svetlana.parfenova@syntacore.com> Link: https://lore.kernel.org/r/20250901135350.619485-1-svetlana.parfenova@syntacore.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-03Merge tag 'v6.17-rc4-ksmbd-fix' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fix from Steve French: - fix handling filenames with ":" (colon) in them * tag 'v6.17-rc4-ksmbd-fix' of git://git.samba.org/ksmbd: ksmbd: allow a filename to contain colons on SMB3.1.1 posix extensions
2025-09-02smb: client: show negotiated cipher in DebugDataBharath SM
Print the negotiated encryption cipher type in DebugData Signed-off-by: Bharath SM <bharathsm@microsoft.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-02smb: client: add new tracepoint to trace lease break notificationBharath SM
Add smb3_lease_break_enter to trace lease break notifications, recording lease state, flags, epoch, and lease key. Align smb3_lease_not_found to use the same payload and print format. Signed-off-by: Bharath SM <bharathsm@microsoft.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-02smb: client: fix spellings in commentsBharath SM
correct spellings in comments Signed-off-by: Bharath SM <bharathsm@microsoft.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-02Merge tag 'mm-hotfixes-stable-2025-09-01-17-20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes. 13 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 11 of these fixes are for MM. This includes a three-patch series from Harry Yoo which fixes an intermittent boot failure which can occur on x86 systems. And a two-patch series from Alexander Gordeev which fixes a KASAN crash on S390 systems" * tag 'mm-hotfixes-stable-2025-09-01-17-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: fix possible deadlock in kmemleak x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() mm: introduce and use {pgd,p4d}_populate_kernel() mm: move page table sync declarations to linux/pgtable.h proc: fix missing pde_set_flags() for net proc files mm: fix accounting of memmap pages mm/damon/core: prevent unnecessary overflow in damos_set_effective_quota() kexec: add KEXEC_FILE_NO_CMA as a legal flag kasan: fix GCC mem-intrinsic prefix with sw tags mm/kasan: avoid lazy MMU mode hazards mm/kasan: fix vmalloc shadow memory (de-)population races kunit: kasan_test: disable fortify string checker on kasan_strings() test selftests/mm: fix FORCE_READ to read input value correctly mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE ocfs2: prevent release journal inode after journal shutdown rust: mm: mark VmaNew as transparent of_numa: fix uninitialized memory nodes causing kernel panic
2025-09-02Merge tag 'for-6.17-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix a few races related to inode link count - fix inode leak on failure to add link to inode - move transaction aborts closer to where they happen * tag 'for-6.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: avoid load/store tearing races when checking if an inode was logged btrfs: fix race between setting last_dir_index_offset and inode logging btrfs: fix race between logging inode and checking if it was logged before btrfs: simplify error handling logic for btrfs_link() btrfs: fix inode leak on failure to add link to inode btrfs: abort transaction on failure to add link to inode
2025-09-02btrfs: fix subvolume deletion lockup caused by inodes xarray raceOmar Sandoval
There is a race condition between inode eviction and inode caching that can cause a live struct btrfs_inode to be missing from the root->inodes xarray. Specifically, there is a window during evict() between the inode being unhashed and deleted from the xarray. If btrfs_iget() is called for the same inode in that window, it will be recreated and inserted into the xarray, but then eviction will delete the new entry, leaving nothing in the xarray: Thread 1 Thread 2 --------------------------------------------------------------- evict() remove_inode_hash() btrfs_iget_path() btrfs_iget_locked() btrfs_read_locked_inode() btrfs_add_inode_to_root() destroy_inode() btrfs_destroy_inode() btrfs_del_inode_from_root() __xa_erase In turn, this can cause issues for subvolume deletion. Specifically, if an inode is in this lost state, and all other inodes are evicted, then btrfs_del_inode_from_root() will call btrfs_add_dead_root() prematurely. If the lost inode has a delayed_node attached to it, then when btrfs_clean_one_deleted_snapshot() calls btrfs_kill_all_delayed_nodes(), it will loop forever because the delayed_nodes xarray will never become empty (unless memory pressure forces the inode out). We saw this manifest as soft lockups in production. Fix it by only deleting the xarray entry if it matches the given inode (using __xa_cmpxchg()). Fixes: 310b2f5d5a94 ("btrfs: use an xarray to track open inodes in a root") Cc: stable@vger.kernel.org # 6.11+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Co-authored-by: Leo Martins <loemra.dev@gmail.com> Signed-off-by: Leo Martins <loemra.dev@gmail.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-02btrfs: fix corruption reading compressed range when block size is smaller ↵Qu Wenruo
than page size [BUG] With 64K page size (aarch64 with 64K page size config) and 4K btrfs block size, the following workload can easily lead to a corrupted read: mkfs.btrfs -f -s 4k $dev > /dev/null mount -o compress $dev $mnt xfs_io -f -c "pwrite -S 0xff 0 64k" $mnt/base > /dev/null echo "correct result:" od -Ad -t x1 $mnt/base xfs_io -f -c "reflink $mnt/base 32k 0 32k" \ -c "reflink $mnt/base 0 32k 32k" \ -c "pwrite -S 0xff 60k 4k" $mnt/new > /dev/null echo "incorrect result:" od -Ad -t x1 $mnt/new umount $mnt This shows the following result: correct result: 0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff * 0065536 incorrect result: 0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff * 0032768 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 0061440 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff * 0065536 Notice the zero in the range [32K, 60K), which is incorrect. [CAUSE] With extra trace printk, it shows the following events during od: (some unrelated info removed like CPU and context) od-3457 btrfs_do_readpage: enter r/i=5/258 folio=0(65536) prev_em_start=0000000000000000 The "r/i" is indicating the root and inode number. In our case the file "new" is using ino 258 from fs tree (root 5). Here notice the @prev_em_start pointer is NULL. This means the btrfs_do_readpage() is called from btrfs_read_folio(), not from btrfs_readahead(). od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=0 got em start=0 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=4096 got em start=0 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=8192 got em start=0 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=12288 got em start=0 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=16384 got em start=0 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=20480 got em start=0 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=24576 got em start=0 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=28672 got em start=0 len=32768 These above 32K blocks will be read from the first half of the compressed data extent. od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=32768 got em start=32768 len=32768 Note here there is no btrfs_submit_compressed_read() call. Which is incorrect now. Although both extent maps at 0 and 32K are pointing to the same compressed data, their offsets are different thus can not be merged into the same read. So this means the compressed data read merge check is doing something wrong. od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=36864 got em start=32768 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=40960 got em start=32768 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=45056 got em start=32768 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=49152 got em start=32768 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=53248 got em start=32768 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=57344 got em start=32768 len=32768 od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=61440 skip uptodate od-3457 btrfs_submit_compressed_read: cb orig_bio: file off=0 len=61440 The function btrfs_submit_compressed_read() is only called at the end of folio read. The compressed bio will only have an extent map of range [0, 32K), but the original bio passed in is for the whole 64K folio. This will cause the decompression part to only fill the first 32K, leaving the rest untouched (aka, filled with zero). This incorrect compressed read merge leads to the above data corruption. There were similar problems that happened in the past, commit 808f80b46790 ("Btrfs: update fix for read corruption of compressed and shared extents") is doing pretty much the same fix for readahead. But that's back to 2015, where btrfs still only supports bs (block size) == ps (page size) cases. This means btrfs_do_readpage() only needs to handle a folio which contains exactly one block. Only btrfs_readahead() can lead to a read covering multiple blocks. Thus only btrfs_readahead() passes a non-NULL @prev_em_start pointer. With v5.15 kernel btrfs introduced bs < ps support. This breaks the above assumption that a folio can only contain one block. Now btrfs_read_folio() can also read multiple blocks in one go. But btrfs_read_folio() doesn't pass a @prev_em_start pointer, thus the existing bio force submission check will never be triggered. In theory, this can also happen for btrfs with large folios, but since large folio is still experimental, we don't need to bother it, thus only bs < ps support is affected for now. [FIX] Instead of passing @prev_em_start to do the proper compressed extent check, introduce one new member, btrfs_bio_ctrl::last_em_start, so that the existing bio force submission logic will always be triggered. CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-02btrfs: accept and ignore compression level for lzoCalvin Owens
The compression level is meaningless for lzo, but before commit 3f093ccb95f30 ("btrfs: harden parsing of compression mount options"), it was silently ignored if passed. After that commit, passing a level with lzo fails to mount: BTRFS error: unrecognized compression value lzo:1 It seems reasonable for users to expect that lzo would permit a numeric level option, as all the other algos do, even though the kernel's implementation of LZO currently only supports a single level. Because it has always worked to pass a level, it seems likely to me that users in the real world are relying on doing so. This patch restores the old behavior, giving "lzo:N" the same semantics as all of the other compression algos. To be clear, silly variants like "lzo:one", "lzo:the_first_option", or "lzo:armageddon" also used to work. This isn't meant to suggest that any possible mis-interpretation of mount options that once worked must continue to work forever. This is an exceptional case where it makes sense to preserve compatibility, both because the mis-interpretation is reasonable, and because nothing tangible is sacrificed. Finally update btrfs_show_options() to ignore the level of LZO, as it is only the default level without any extra meaning. Fixes: 3f093ccb95f30 ("btrfs: harden parsing of compression mount options") Reviewed-by: Daniel Vacek <neelx@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-02btrfs: fix squota compressed stats leakBoris Burkov
The following workload on a squota enabled fs: btrfs subvol create mnt/subvol # ensure subvol extents get accounted sync btrfs qgroup create 1/1 mnt btrfs qgroup assign mnt/subvol 1/1 mnt btrfs qgroup delete mnt/subvol # make the cleaner thread run btrfs filesystem sync mnt sleep 1 btrfs filesystem sync mnt btrfs qgroup destroy 1/1 mnt will fail with EBUSY. The reason is that 1/1 does the quick accounting when we assign subvol to it, gaining its exclusive usage as excl and excl_cmpr. But then when we delete subvol, the decrement happens via record_squota_delta() which does not update excl_cmpr, as squotas does not make any distinction between compressed and normal extents. Thus, we increment excl_cmpr but never decrement it, and are unable to delete 1/1. The two possible fixes are to make squota always mirror excl and excl_cmpr or to make the fast accounting separately track the plain and cmpr numbers. The latter felt cleaner to me so that is what I opted for. Fixes: 1e0e9d5771c3 ("btrfs: add helper for recording simple quota deltas") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-02procfs: add "pidns" mount optionAleksa Sarai
Since the introduction of pid namespaces, their interaction with procfs has been entirely implicit in ways that require a lot of dancing around by programs that need to construct sandboxes with different PID namespaces. Being able to explicitly specify the pid namespace to use when constructing a procfs super block will allow programs to no longer need to fork off a process which does then does unshare(2) / setns(2) and forks again in order to construct a procfs in a pidns. So, provide a "pidns" mount option which allows such users to just explicitly state which pid namespace they want that procfs instance to use. This interface can be used with fsconfig(2) either with a file descriptor or a path: fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd); fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0); or with classic mount(2) / mount(8): // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid"); As this new API is effectively shorthand for setns(2) followed by mount(2), the permission model for this mirrors pidns_install() to avoid opening up new attack surfaces by loosening the existing permission model. In order to avoid having to RCU-protect all users of proc_pid_ns() (to avoid UAFs), attempting to reconfigure an existing procfs instance's pid namespace will error out with -EBUSY. Creating new procfs instances is quite cheap, so this should not be an impediment to most users, and lets us avoid a lot of churn in fs/proc/* for a feature that it seems unlikely userspace would use. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/20250805-procfs-pidns-api-v4-2-705f984940e7@cyphar.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-02openat2: don't trigger automounts with RESOLVE_NO_XDEVAskar Safin
openat2 had a bug: if we pass RESOLVE_NO_XDEV, then openat2 doesn't traverse through automounts, but may still trigger them. (See the link for full bug report with reproducer.) This commit fixes this bug. Link: https://lore.kernel.org/linux-fsdevel/20250817075252.4137628-1-safinaskar@zohomail.com/ Fixes: fddb5d430ad9fa91b49b1 ("open: introduce openat2(2) syscall") Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Cc: stable@vger.kernel.org Signed-off-by: Askar Safin <safinaskar@zohomail.com> Link: https://lore.kernel.org/20250825181233.2464822-5-safinaskar@zohomail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-02namei: move cross-device check to __traverse_mountsAskar Safin
This is preparation to RESOLVE_NO_XDEV fix in following commits. Also this commit makes LOOKUP_NO_XDEV logic more clear: now we immediately fail with EXDEV on first mount crossing instead of waiting for very end. No functional change intended Signed-off-by: Askar Safin <safinaskar@zohomail.com> Link: https://lore.kernel.org/20250825181233.2464822-4-safinaskar@zohomail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-02namei: remove LOOKUP_NO_XDEV check from handle_mountsAskar Safin
This is preparation to RESOLVE_NO_XDEV fix in following commits. No functional change intended. The only place that ever looks at ND_JUMPED in nd->state is complete_walk() and we are not going to reach it if handle_mounts() returns an error Signed-off-by: Askar Safin <safinaskar@zohomail.com> Link: https://lore.kernel.org/20250825181233.2464822-3-safinaskar@zohomail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-02namei: move cross-device check to traverse_mountsAskar Safin
This is preparation to RESOLVE_NO_XDEV fix in following commits. No functional change intended Signed-off-by: Askar Safin <safinaskar@zohomail.com> Link: https://lore.kernel.org/20250825181233.2464822-2-safinaskar@zohomail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01copy_process: pass clone_flags as u64 across calltreeSimon Schuster
With the introduction of clone3 in commit 7f192e3cd316 ("fork: add clone3") the effective bit width of clone_flags on all architectures was increased from 32-bit to 64-bit, with a new type of u64 for the flags. However, for most consumers of clone_flags the interface was not changed from the previous type of unsigned long. While this works fine as long as none of the new 64-bit flag bits (CLONE_CLEAR_SIGHAND and CLONE_INTO_CGROUP) are evaluated, this is still undesirable in terms of the principle of least surprise. Thus, this commit fixes all relevant interfaces of callees to sys_clone3/copy_process (excluding the architecture-specific copy_thread) to consistently pass clone_flags as u64, so that no truncation to 32-bit integers occurs on 32-bit architectures. Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com> Link: https://lore.kernel.org/20250901-nios2-implement-clone3-v2-2-53fcf5577d57@siemens-energy.com Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01cramfs: Verify inode mode when loading from diskTetsuo Handa
The inode mode loaded from corrupted disk can be invalid. Do like what commit 0a9e74051313 ("isofs: Verify inode mode when loading from disk") does. Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/429b3ef1-13de-4310-9a8e-c2dc9a36234a@I-love.SAKURA.ne.jp Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01fs: remove vfs_ioctl exportGreg Kroah-Hartman
vfs_ioctl() is no longer called by anything outside of fs/ioctl.c, so remove the global symbol and export as it is not needed. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/2025083038-carving-amuck-a4ae@gregkh Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01Merge tag 'fuse-fixes-6.17-rc5' of ↵Christian Brauner
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse into vfs.fixes fuse fixes for 6.17-rc5 * tag 'fuse-fixes-6.17-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (6 commits) fuse: Block access to folio overlimit fuse: fix fuseblk i_blkbits for iomap partial writes fuse: reflect cached blocksize if blocksize was changed fuse: prevent overflow in copy_file_range return value fuse: check if copy_file_range() returns larger than requested size fuse: do not allow mapping a non-regular backing file Link: https://lore.kernel.org/CAJfpeguEVMMyw_zCb+hbOuSxdE2Z3Raw=SJsq=Y56Ae6dn2W3g@mail.gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01inode: fix whitespace issuesChristian Brauner
Fix two minor whitespace issues. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01fs: add an icount_read helperJosef Bacik
Instead of doing direct access to ->i_count, add a helper to handle this. This will make it easier to convert i_count to a refcount later. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/9bc62a84c6b9d6337781203f60837bd98fbc4a96.1756222464.git.josef@toxicpanda.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01fs: rework iput logicJosef Bacik
Currently, if we are the last iput, and we have the I_DIRTY_TIME bit set, we will grab a reference on the inode again and then mark it dirty and then redo the put. This is to make sure we delay the time update for as long as possible. We can rework this logic to simply dec i_count if it is not 1, and if it is do the time update while still holding the i_count reference. Then we can replace the atomic_dec_and_lock with locking the ->i_lock and doing atomic_dec_and_test, since we did the atomic_add_unless above. Co-developed-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/be208b89bdb650202e712ce2bcfc407ac7044c7a.1756222464.git.josef@toxicpanda.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-31hfs: clear offset and space out of valid records in b-tree nodeViacheslav Dubeyko
Currently, hfs_brec_remove() executes moving records towards the location of deleted record and it updates offsets of moved records. However, the hfs_brec_remove() logic ignores the "mess" of b-tree node's free space and it doesn't touch the offsets out of records number. Potentially, it could confuse fsck or driver logic or to be a reason of potential corruption cases. This patch reworks the logic of hfs_brec_remove() by means of clearing freed space of b-tree node after the records moving. And it clear the last offset that keeping old location of free space because now the offset before this one is keeping the actual offset to the free space after the record deletion. Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250815194918.38165-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31hfs: add logic of correcting a next unused CNIDViacheslav Dubeyko
The generic/736 xfstest fails for HFS case: BEGIN TEST default (1 test): hfs Mon May 5 03:18:32 UTC 2025 DEVICE: /dev/vdb HFS_MKFS_OPTIONS: MOUNT_OPTIONS: MOUNT_OPTIONS FSTYP -- hfs PLATFORM -- Linux/x86_64 kvm-xfstests 6.15.0-rc4-xfstests-g00b827f0cffa #1 SMP PREEMPT_DYNAMIC Fri May 25 MKFS_OPTIONS -- /dev/vdc MOUNT_OPTIONS -- /dev/vdc /vdc generic/736 [03:18:33][ 3.510255] run fstests generic/736 at 2025-05-05 03:18:33 _check_generic_filesystem: filesystem on /dev/vdb is inconsistent (see /results/hfs/results-default/generic/736.full for details) Ran: generic/736 Failures: generic/736 Failed 1 of 1 tests The HFS volume becomes corrupted after the test run: sudo fsck.hfs -d /dev/loop50 ** /dev/loop50 Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K. Executing fsck_hfs (version 540.1-Linux). ** Checking HFS volume. The volume name is untitled ** Checking extents overflow file. ** Checking catalog file. ** Checking catalog hierarchy. ** Checking volume bitmap. ** Checking volume information. invalid MDB drNxtCNID Master Directory Block needs minor repair (1, 0) Verify Status: VIStat = 0x8000, ABTStat = 0x0000 EBTStat = 0x0000 CBTStat = 0x0000 CatStat = 0x00000000 ** Repairing volume. ** Rechecking volume. ** Checking HFS volume. The volume name is untitled ** Checking extents overflow file. ** Checking catalog file. ** Checking catalog hierarchy. ** Checking volume bitmap. ** Checking volume information. ** The volume untitled was repaired successfully. The main reason of the issue is the absence of logic that corrects mdb->drNxtCNID/HFS_SB(sb)->next_id (next unused CNID) after deleting a record in Catalog File. This patch introduces a hfs_correct_next_unused_CNID() method that implements the necessary logic. In the case of Catalog File's record delete operation, the function logic checks that (deleted_CNID + 1) == next_unused_CNID and it finds/sets the new value of next_unused_CNID. sudo ./check generic/736 FSTYP -- hfs PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.15.0+ #6 SMP PREEMPT_DYNAMIC Tue Jun 10 15:02:48 PDT 2025 MKFS_OPTIONS -- /dev/loop51 MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch generic/736 33s Ran: generic/736 Passed all 1 tests sudo fsck.hfs -d /dev/loop50 ** /dev/loop50 Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K. Executing fsck_hfs (version 540.1-Linux). ** Checking HFS volume. The volume name is untitled ** Checking extents overflow file. ** Checking catalog file. ** Checking catalog hierarchy. ** Checking volume bitmap. ** Checking volume information. ** The volume untitled appears to be OK Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20250610231609.551930-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31hfsplus: fix KMSAN uninit-value issue in hfsplus_delete_cat()Viacheslav Dubeyko
The syzbot reported issue in hfsplus_delete_cat(): [ 70.682285][ T9333] ===================================================== [ 70.682943][ T9333] BUG: KMSAN: uninit-value in hfsplus_subfolders_dec+0x1d7/0x220 [ 70.683640][ T9333] hfsplus_subfolders_dec+0x1d7/0x220 [ 70.684141][ T9333] hfsplus_delete_cat+0x105d/0x12b0 [ 70.684621][ T9333] hfsplus_rmdir+0x13d/0x310 [ 70.685048][ T9333] vfs_rmdir+0x5ba/0x810 [ 70.685447][ T9333] do_rmdir+0x964/0xea0 [ 70.685833][ T9333] __x64_sys_rmdir+0x71/0xb0 [ 70.686260][ T9333] x64_sys_call+0xcd8/0x3cf0 [ 70.686695][ T9333] do_syscall_64+0xd9/0x1d0 [ 70.687119][ T9333] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 70.687646][ T9333] [ 70.687856][ T9333] Uninit was stored to memory at: [ 70.688311][ T9333] hfsplus_subfolders_inc+0x1c2/0x1d0 [ 70.688779][ T9333] hfsplus_create_cat+0x148e/0x1800 [ 70.689231][ T9333] hfsplus_mknod+0x27f/0x600 [ 70.689730][ T9333] hfsplus_mkdir+0x5a/0x70 [ 70.690146][ T9333] vfs_mkdir+0x483/0x7a0 [ 70.690545][ T9333] do_mkdirat+0x3f2/0xd30 [ 70.690944][ T9333] __x64_sys_mkdir+0x9a/0xf0 [ 70.691380][ T9333] x64_sys_call+0x2f89/0x3cf0 [ 70.691816][ T9333] do_syscall_64+0xd9/0x1d0 [ 70.692229][ T9333] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 70.692773][ T9333] [ 70.692990][ T9333] Uninit was stored to memory at: [ 70.693469][ T9333] hfsplus_subfolders_inc+0x1c2/0x1d0 [ 70.693960][ T9333] hfsplus_create_cat+0x148e/0x1800 [ 70.694438][ T9333] hfsplus_fill_super+0x21c1/0x2700 [ 70.694911][ T9333] mount_bdev+0x37b/0x530 [ 70.695320][ T9333] hfsplus_mount+0x4d/0x60 [ 70.695729][ T9333] legacy_get_tree+0x113/0x2c0 [ 70.696167][ T9333] vfs_get_tree+0xb3/0x5c0 [ 70.696588][ T9333] do_new_mount+0x73e/0x1630 [ 70.697013][ T9333] path_mount+0x6e3/0x1eb0 [ 70.697425][ T9333] __se_sys_mount+0x733/0x830 [ 70.697857][ T9333] __x64_sys_mount+0xe4/0x150 [ 70.698269][ T9333] x64_sys_call+0x2691/0x3cf0 [ 70.698704][ T9333] do_syscall_64+0xd9/0x1d0 [ 70.699117][ T9333] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 70.699730][ T9333] [ 70.699946][ T9333] Uninit was created at: [ 70.700378][ T9333] __alloc_pages_noprof+0x714/0xe60 [ 70.700843][ T9333] alloc_pages_mpol_noprof+0x2a2/0x9b0 [ 70.701331][ T9333] alloc_pages_noprof+0xf8/0x1f0 [ 70.701774][ T9333] allocate_slab+0x30e/0x1390 [ 70.702194][ T9333] ___slab_alloc+0x1049/0x33a0 [ 70.702635][ T9333] kmem_cache_alloc_lru_noprof+0x5ce/0xb20 [ 70.703153][ T9333] hfsplus_alloc_inode+0x5a/0xd0 [ 70.703598][ T9333] alloc_inode+0x82/0x490 [ 70.703984][ T9333] iget_locked+0x22e/0x1320 [ 70.704428][ T9333] hfsplus_iget+0x5c/0xba0 [ 70.704827][ T9333] hfsplus_btree_open+0x135/0x1dd0 [ 70.705291][ T9333] hfsplus_fill_super+0x1132/0x2700 [ 70.705776][ T9333] mount_bdev+0x37b/0x530 [ 70.706171][ T9333] hfsplus_mount+0x4d/0x60 [ 70.706579][ T9333] legacy_get_tree+0x113/0x2c0 [ 70.707019][ T9333] vfs_get_tree+0xb3/0x5c0 [ 70.707444][ T9333] do_new_mount+0x73e/0x1630 [ 70.707865][ T9333] path_mount+0x6e3/0x1eb0 [ 70.708270][ T9333] __se_sys_mount+0x733/0x830 [ 70.708711][ T9333] __x64_sys_mount+0xe4/0x150 [ 70.709158][ T9333] x64_sys_call+0x2691/0x3cf0 [ 70.709630][ T9333] do_syscall_64+0xd9/0x1d0 [ 70.710053][ T9333] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 70.710611][ T9333] [ 70.710842][ T9333] CPU: 3 UID: 0 PID: 9333 Comm: repro Not tainted 6.12.0-rc6-dirty #17 [ 70.711568][ T9333] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 70.712490][ T9333] ===================================================== [ 70.713085][ T9333] Disabling lock debugging due to kernel taint [ 70.713618][ T9333] Kernel panic - not syncing: kmsan.panic set ... [ 70.714159][ T9333] CPU: 3 UID: 0 PID: 9333 Comm: repro Tainted: G B 6.12.0-rc6-dirty #17 [ 70.715007][ T9333] Tainted: [B]=BAD_PAGE [ 70.715365][ T9333] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 70.716311][ T9333] Call Trace: [ 70.716621][ T9333] <TASK> [ 70.716899][ T9333] dump_stack_lvl+0x1fd/0x2b0 [ 70.717350][ T9333] dump_stack+0x1e/0x30 [ 70.717743][ T9333] panic+0x502/0xca0 [ 70.718116][ T9333] ? kmsan_get_metadata+0x13e/0x1c0 [ 70.718611][ T9333] kmsan_report+0x296/0x2a0 [ 70.719038][ T9333] ? __msan_metadata_ptr_for_load_4+0x24/0x40 [ 70.719859][ T9333] ? __msan_warning+0x96/0x120 [ 70.720345][ T9333] ? hfsplus_subfolders_dec+0x1d7/0x220 [ 70.720881][ T9333] ? hfsplus_delete_cat+0x105d/0x12b0 [ 70.721412][ T9333] ? hfsplus_rmdir+0x13d/0x310 [ 70.721880][ T9333] ? vfs_rmdir+0x5ba/0x810 [ 70.722458][ T9333] ? do_rmdir+0x964/0xea0 [ 70.722883][ T9333] ? __x64_sys_rmdir+0x71/0xb0 [ 70.723397][ T9333] ? x64_sys_call+0xcd8/0x3cf0 [ 70.723915][ T9333] ? do_syscall_64+0xd9/0x1d0 [ 70.724454][ T9333] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 70.725110][ T9333] ? vprintk_emit+0xd1f/0xe60 [ 70.725616][ T9333] ? vprintk_default+0x3f/0x50 [ 70.726175][ T9333] ? vprintk+0xce/0xd0 [ 70.726628][ T9333] ? _printk+0x17e/0x1b0 [ 70.727129][ T9333] ? __msan_metadata_ptr_for_load_4+0x24/0x40 [ 70.727739][ T9333] ? kmsan_get_metadata+0x13e/0x1c0 [ 70.728324][ T9333] __msan_warning+0x96/0x120 [ 70.728854][ T9333] hfsplus_subfolders_dec+0x1d7/0x220 [ 70.729479][ T9333] hfsplus_delete_cat+0x105d/0x12b0 [ 70.729984][ T9333] ? kmsan_get_shadow_origin_ptr+0x4a/0xb0 [ 70.730646][ T9333] ? __msan_metadata_ptr_for_load_4+0x24/0x40 [ 70.731296][ T9333] ? kmsan_get_metadata+0x13e/0x1c0 [ 70.731863][ T9333] hfsplus_rmdir+0x13d/0x310 [ 70.732390][ T9333] ? __pfx_hfsplus_rmdir+0x10/0x10 [ 70.732919][ T9333] vfs_rmdir+0x5ba/0x810 [ 70.733416][ T9333] ? kmsan_get_shadow_origin_ptr+0x4a/0xb0 [ 70.734044][ T9333] do_rmdir+0x964/0xea0 [ 70.734537][ T9333] __x64_sys_rmdir+0x71/0xb0 [ 70.735032][ T9333] x64_sys_call+0xcd8/0x3cf0 [ 70.735579][ T9333] do_syscall_64+0xd9/0x1d0 [ 70.736092][ T9333] ? irqentry_exit+0x16/0x60 [ 70.736637][ T9333] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 70.737269][ T9333] RIP: 0033:0x7fa9424eafc9 [ 70.737775][ T9333] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48 [ 70.739844][ T9333] RSP: 002b:00007fff099cd8d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000054 [ 70.740760][ T9333] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa9424eafc9 [ 70.741642][ T9333] RDX: 006c6f72746e6f63 RSI: 000000000000000a RDI: 0000000020000100 [ 70.742543][ T9333] RBP: 00007fff099cd8e0 R08: 00007fff099cd910 R09: 00007fff099cd910 [ 70.743376][ T9333] R10: 0000000000000000 R11: 0000000000000202 R12: 0000565430642260 [ 70.744247][ T9333] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 70.745082][ T9333] </TASK> The main reason of the issue that struct hfsplus_inode_info has not been properly initialized for the case of root folder. In the case of root folder, hfsplus_fill_super() calls the hfsplus_iget() that implements only partial initialization of struct hfsplus_inode_info and subfolders field is not initialized by hfsplus_iget() logic. This patch implements complete initialization of struct hfsplus_inode_info in the hfsplus_iget() logic with the goal to prevent likewise issues for the case of root folder. Reported-by: syzbot <syzbot+fdedff847a0e5e84c39f@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=fdedff847a0e5e84c39f Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250825225103.326401-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31hfs: fix KMSAN uninit-value issue in hfs_find_set_zero_bits()Viacheslav Dubeyko
The syzbot reported issue in hfs_find_set_zero_bits(): ===================================================== BUG: KMSAN: uninit-value in hfs_find_set_zero_bits+0x74d/0xb60 fs/hfs/bitmap.c:45 hfs_find_set_zero_bits+0x74d/0xb60 fs/hfs/bitmap.c:45 hfs_vbm_search_free+0x13c/0x5b0 fs/hfs/bitmap.c:151 hfs_extend_file+0x6a5/0x1b00 fs/hfs/extent.c:408 hfs_get_block+0x435/0x1150 fs/hfs/extent.c:353 __block_write_begin_int+0xa76/0x3030 fs/buffer.c:2151 block_write_begin fs/buffer.c:2262 [inline] cont_write_begin+0x10e1/0x1bc0 fs/buffer.c:2601 hfs_write_begin+0x85/0x130 fs/hfs/inode.c:52 cont_expand_zero fs/buffer.c:2528 [inline] cont_write_begin+0x35a/0x1bc0 fs/buffer.c:2591 hfs_write_begin+0x85/0x130 fs/hfs/inode.c:52 hfs_file_truncate+0x1d6/0xe60 fs/hfs/extent.c:494 hfs_inode_setattr+0x964/0xaa0 fs/hfs/inode.c:654 notify_change+0x1993/0x1aa0 fs/attr.c:552 do_truncate+0x28f/0x310 fs/open.c:68 do_ftruncate+0x698/0x730 fs/open.c:195 do_sys_ftruncate fs/open.c:210 [inline] __do_sys_ftruncate fs/open.c:215 [inline] __se_sys_ftruncate fs/open.c:213 [inline] __x64_sys_ftruncate+0x11b/0x250 fs/open.c:213 x64_sys_call+0xfe3/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:78 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4154 [inline] slab_alloc_node mm/slub.c:4197 [inline] __kmalloc_cache_noprof+0x7f7/0xed0 mm/slub.c:4354 kmalloc_noprof include/linux/slab.h:905 [inline] hfs_mdb_get+0x1cc8/0x2a90 fs/hfs/mdb.c:175 hfs_fill_super+0x3d0/0xb80 fs/hfs/super.c:337 get_tree_bdev_flags+0x6e3/0x920 fs/super.c:1681 get_tree_bdev+0x38/0x50 fs/super.c:1704 hfs_get_tree+0x35/0x40 fs/hfs/super.c:388 vfs_get_tree+0xb0/0x5c0 fs/super.c:1804 do_new_mount+0x738/0x1610 fs/namespace.c:3902 path_mount+0x6db/0x1e90 fs/namespace.c:4226 do_mount fs/namespace.c:4239 [inline] __do_sys_mount fs/namespace.c:4450 [inline] __se_sys_mount+0x6eb/0x7d0 fs/namespace.c:4427 __x64_sys_mount+0xe4/0x150 fs/namespace.c:4427 x64_sys_call+0xfa7/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:166 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 1 UID: 0 PID: 12609 Comm: syz.1.2692 Not tainted 6.16.0-syzkaller #0 PREEMPT(none) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 ===================================================== The HFS_SB(sb)->bitmap buffer is allocated in hfs_mdb_get(): HFS_SB(sb)->bitmap = kmalloc(8192, GFP_KERNEL); Finally, it can trigger the reported issue because kmalloc() doesn't clear the allocated memory. If allocated memory contains only zeros, then everything will work pretty fine. But if the allocated memory contains the "garbage", then it can affect the bitmap operations and it triggers the reported issue. This patch simply exchanges the kmalloc() on kzalloc() with the goal to guarantee the correctness of bitmap operations. Because, newly created allocation bitmap should have all available blocks free. Potentially, initialization bitmap's read operation could not fill the whole allocated memory and "garbage" in the not initialized memory will be the reason of volume coruptions and file system driver bugs. Reported-by: syzbot <syzbot+773fa9d79b29bd8b6831@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=773fa9d79b29bd8b6831 Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250820230636.179085-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31hfs: make proper initalization of struct hfs_find_dataViacheslav Dubeyko
Potenatially, __hfs_ext_read_extent() could operate by not initialized values of fd->key after hfs_brec_find() call: static inline int __hfs_ext_read_extent(struct hfs_find_data *fd, struct hfs_extent *extent, u32 cnid, u32 block, u8 type) { int res; hfs_ext_build_key(fd->search_key, cnid, block, type); fd->key->ext.FNum = 0; res = hfs_brec_find(fd); if (res && res != -ENOENT) return res; if (fd->key->ext.FNum != fd->search_key->ext.FNum || fd->key->ext.FkType != fd->search_key->ext.FkType) return -ENOENT; if (fd->entrylength != sizeof(hfs_extent_rec)) return -EIO; hfs_bnode_read(fd->bnode, extent, fd->entryoffset, sizeof(hfs_extent_rec)); return 0; } This patch changes kmalloc() on kzalloc() in hfs_find_init() and intializes fd->record, fd->keyoffset, fd->keylength, fd->entryoffset, fd->entrylength for the case if hfs_brec_find() has been found nothing in the b-tree node. Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250818225252.126427-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31hfsplus: fix KMSAN uninit-value issue in __hfsplus_ext_cache_extent()Viacheslav Dubeyko
The syzbot reported issue in __hfsplus_ext_cache_extent(): [ 70.194323][ T9350] BUG: KMSAN: uninit-value in __hfsplus_ext_cache_extent+0x7d0/0x990 [ 70.195022][ T9350] __hfsplus_ext_cache_extent+0x7d0/0x990 [ 70.195530][ T9350] hfsplus_file_extend+0x74f/0x1cf0 [ 70.195998][ T9350] hfsplus_get_block+0xe16/0x17b0 [ 70.196458][ T9350] __block_write_begin_int+0x962/0x2ce0 [ 70.196959][ T9350] cont_write_begin+0x1000/0x1950 [ 70.197416][ T9350] hfsplus_write_begin+0x85/0x130 [ 70.197873][ T9350] generic_perform_write+0x3e8/0x1060 [ 70.198374][ T9350] __generic_file_write_iter+0x215/0x460 [ 70.198892][ T9350] generic_file_write_iter+0x109/0x5e0 [ 70.199393][ T9350] vfs_write+0xb0f/0x14e0 [ 70.199771][ T9350] ksys_write+0x23e/0x490 [ 70.200149][ T9350] __x64_sys_write+0x97/0xf0 [ 70.200570][ T9350] x64_sys_call+0x3015/0x3cf0 [ 70.201065][ T9350] do_syscall_64+0xd9/0x1d0 [ 70.201506][ T9350] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 70.202054][ T9350] [ 70.202279][ T9350] Uninit was created at: [ 70.202693][ T9350] __kmalloc_noprof+0x621/0xf80 [ 70.203149][ T9350] hfsplus_find_init+0x8d/0x1d0 [ 70.203602][ T9350] hfsplus_file_extend+0x6ca/0x1cf0 [ 70.204087][ T9350] hfsplus_get_block+0xe16/0x17b0 [ 70.204561][ T9350] __block_write_begin_int+0x962/0x2ce0 [ 70.205074][ T9350] cont_write_begin+0x1000/0x1950 [ 70.205547][ T9350] hfsplus_write_begin+0x85/0x130 [ 70.206017][ T9350] generic_perform_write+0x3e8/0x1060 [ 70.206519][ T9350] __generic_file_write_iter+0x215/0x460 [ 70.207042][ T9350] generic_file_write_iter+0x109/0x5e0 [ 70.207552][ T9350] vfs_write+0xb0f/0x14e0 [ 70.207961][ T9350] ksys_write+0x23e/0x490 [ 70.208375][ T9350] __x64_sys_write+0x97/0xf0 [ 70.208810][ T9350] x64_sys_call+0x3015/0x3cf0 [ 70.209255][ T9350] do_syscall_64+0xd9/0x1d0 [ 70.209680][ T9350] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 70.210230][ T9350] [ 70.210454][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Not tainted 6.12.0-rc5 #5 [ 70.211174][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 70.212115][ T9350] ===================================================== [ 70.212734][ T9350] Disabling lock debugging due to kernel taint [ 70.213284][ T9350] Kernel panic - not syncing: kmsan.panic set ... [ 70.213858][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Tainted: G B 6.12.0-rc5 #5 [ 70.214679][ T9350] Tainted: [B]=BAD_PAGE [ 70.215057][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 70.215999][ T9350] Call Trace: [ 70.216309][ T9350] <TASK> [ 70.216585][ T9350] dump_stack_lvl+0x1fd/0x2b0 [ 70.217025][ T9350] dump_stack+0x1e/0x30 [ 70.217421][ T9350] panic+0x502/0xca0 [ 70.217803][ T9350] ? kmsan_get_metadata+0x13e/0x1c0 [ 70.218294][ Message fromT sy9350] kmsan_report+0x296/slogd@syzkaller 0x2aat Aug 18 22:11:058 ... kernel :[ 70.213284][ T9350] Kernel panic - not syncing: kmsan.panic [ 70.220179][ T9350] ? kmsan_get_metadata+0x13e/0x1c0 set ... [ 70.221254][ T9350] ? __msan_warning+0x96/0x120 [ 70.222066][ T9350] ? __hfsplus_ext_cache_extent+0x7d0/0x990 [ 70.223023][ T9350] ? hfsplus_file_extend+0x74f/0x1cf0 [ 70.224120][ T9350] ? hfsplus_get_block+0xe16/0x17b0 [ 70.224946][ T9350] ? __block_write_begin_int+0x962/0x2ce0 [ 70.225756][ T9350] ? cont_write_begin+0x1000/0x1950 [ 70.226337][ T9350] ? hfsplus_write_begin+0x85/0x130 [ 70.226852][ T9350] ? generic_perform_write+0x3e8/0x1060 [ 70.227405][ T9350] ? __generic_file_write_iter+0x215/0x460 [ 70.227979][ T9350] ? generic_file_write_iter+0x109/0x5e0 [ 70.228540][ T9350] ? vfs_write+0xb0f/0x14e0 [ 70.228997][ T9350] ? ksys_write+0x23e/0x490 [ 70.229458][ T9350] ? __x64_sys_write+0x97/0xf0 [ 70.229939][ T9350] ? x64_sys_call+0x3015/0x3cf0 [ 70.230432][ T9350] ? do_syscall_64+0xd9/0x1d0 [ 70.230941][ T9350] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 70.231926][ T9350] ? kmsan_get_metadata+0x13e/0x1c0 [ 70.232738][ T9350] ? kmsan_internal_set_shadow_origin+0x77/0x110 [ 70.233711][ T9350] ? kmsan_get_metadata+0x13e/0x1c0 [ 70.234516][ T9350] ? kmsan_get_shadow_origin_ptr+0x4a/0xb0 [ 70.235398][ T9350] ? __msan_metadata_ptr_for_load_4+0x24/0x40 [ 70.236323][ T9350] ? hfsplus_brec_find+0x218/0x9f0 [ 70.237090][ T9350] ? __pfx_hfs_find_rec_by_key+0x10/0x10 [ 70.237938][ T9350] ? __msan_instrument_asm_store+0xbf/0xf0 [ 70.238827][ T9350] ? __msan_metadata_ptr_for_store_4+0x27/0x40 [ 70.239772][ T9350] ? __hfsplus_ext_write_extent+0x536/0x620 [ 70.240666][ T9350] ? kmsan_get_metadata+0x13e/0x1c0 [ 70.241175][ T9350] __msan_warning+0x96/0x120 [ 70.241645][ T9350] __hfsplus_ext_cache_extent+0x7d0/0x990 [ 70.242223][ T9350] hfsplus_file_extend+0x74f/0x1cf0 [ 70.242748][ T9350] hfsplus_get_block+0xe16/0x17b0 [ 70.243255][ T9350] ? kmsan_internal_set_shadow_origin+0x77/0x110 [ 70.243878][ T9350] ? kmsan_get_metadata+0x13e/0x1c0 [ 70.244400][ T9350] ? kmsan_get_shadow_origin_ptr+0x4a/0xb0 [ 70.244967][ T9350] __block_write_begin_int+0x962/0x2ce0 [ 70.245531][ T9350] ? __pfx_hfsplus_get_block+0x10/0x10 [ 70.246079][ T9350] cont_write_begin+0x1000/0x1950 [ 70.246598][ T9350] hfsplus_write_begin+0x85/0x130 [ 70.247105][ T9350] ? __pfx_hfsplus_get_block+0x10/0x10 [ 70.247650][ T9350] ? __pfx_hfsplus_write_begin+0x10/0x10 [ 70.248211][ T9350] generic_perform_write+0x3e8/0x1060 [ 70.248752][ T9350] __generic_file_write_iter+0x215/0x460 [ 70.249314][ T9350] generic_file_write_iter+0x109/0x5e0 [ 70.249856][ T9350] ? kmsan_internal_set_shadow_origin+0x77/0x110 [ 70.250487][ T9350] vfs_write+0xb0f/0x14e0 [ 70.250930][ T9350] ? __pfx_generic_file_write_iter+0x10/0x10 [ 70.251530][ T9350] ksys_write+0x23e/0x490 [ 70.251974][ T9350] __x64_sys_write+0x97/0xf0 [ 70.252450][ T9350] x64_sys_call+0x3015/0x3cf0 [ 70.252924][ T9350] do_syscall_64+0xd9/0x1d0 [ 70.253384][ T9350] ? irqentry_exit+0x16/0x60 [ 70.253844][ T9350] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 70.254430][ T9350] RIP: 0033:0x7f7a92adffc9 [ 70.254873][ T9350] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48 [ 70.256674][ T9350] RSP: 002b:00007fff0bca3188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 70.257485][ T9350] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7a92adffc9 [ 70.258246][ T9350] RDX: 000000000208e24b RSI: 0000000020000100 RDI: 0000000000000004 [ 70.258998][ T9350] RBP: 00007fff0bca31a0 R08: 00007fff0bca31a0 R09: 00007fff0bca31a0 [ 70.259769][ T9350] R10: 0000000000000000 R11: 0000000000000202 R12: 000055e0d75f8250 [ 70.260520][ T9350] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 70.261286][ T9350] </TASK> [ 70.262026][ T9350] Kernel Offset: disabled (gdb) l *__hfsplus_ext_cache_extent+0x7d0 0xffffffff8318aef0 is in __hfsplus_ext_cache_extent (fs/hfsplus/extents.c:168). 163 fd->key->ext.cnid = 0; 164 res = hfs_brec_find(fd, hfs_find_rec_by_key); 165 if (res && res != -ENOENT) 166 return res; 167 if (fd->key->ext.cnid != fd->search_key->ext.cnid || 168 fd->key->ext.fork_type != fd->search_key->ext.fork_type) 169 return -ENOENT; 170 if (fd->entrylength != sizeof(hfsplus_extent_rec)) 171 return -EIO; 172 hfs_bnode_read(fd->bnode, extent, fd->entryoffset, The __hfsplus_ext_cache_extent() calls __hfsplus_ext_read_extent(): res = __hfsplus_ext_read_extent(fd, hip->cached_extents, inode->i_ino, block, HFSPLUS_IS_RSRC(inode) ? HFSPLUS_TYPE_RSRC : HFSPLUS_TYPE_DATA); And if inode->i_ino could be equal to zero or any non-available CNID, then hfs_brec_find() could not find the record in the tree. As a result, fd->key could be compared with fd->search_key. But hfsplus_find_init() uses kmalloc() for fd->key and fd->search_key allocation: int hfs_find_init(struct hfs_btree *tree, struct hfs_find_data *fd) { <skipped> ptr = kmalloc(tree->max_key_len * 2 + 4, GFP_KERNEL); if (!ptr) return -ENOMEM; fd->search_key = ptr; fd->key = ptr + tree->max_key_len + 2; <skipped> } Finally, fd->key is still not initialized if hfs_brec_find() has found nothing. This patch changes kmalloc() on kzalloc() in hfs_find_init() and intializes fd->record, fd->keyoffset, fd->keylength, fd->entryoffset, fd->entrylength for the case if hfs_brec_find() has been found nothing in the b-tree node. Reported-by: syzbot <syzbot+55ad87f38795d6787521@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=55ad87f38795d6787521 Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250818225232.126402-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31hfs: validate record offset in hfsplus_bmap_allocYang Chenzhi
hfsplus_bmap_alloc can trigger a crash if a record offset or length is larger than node_size [ 15.264282] BUG: KASAN: slab-out-of-bounds in hfsplus_bmap_alloc+0x887/0x8b0 [ 15.265192] Read of size 8 at addr ffff8881085ca188 by task test/183 [ 15.265949] [ 15.266163] CPU: 0 UID: 0 PID: 183 Comm: test Not tainted 6.17.0-rc2-gc17b750b3ad9 #14 PREEMPT(voluntary) [ 15.266165] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.266167] Call Trace: [ 15.266168] <TASK> [ 15.266169] dump_stack_lvl+0x53/0x70 [ 15.266173] print_report+0xd0/0x660 [ 15.266181] kasan_report+0xce/0x100 [ 15.266185] hfsplus_bmap_alloc+0x887/0x8b0 [ 15.266208] hfs_btree_inc_height.isra.0+0xd5/0x7c0 [ 15.266217] hfsplus_brec_insert+0x870/0xb00 [ 15.266222] __hfsplus_ext_write_extent+0x428/0x570 [ 15.266225] __hfsplus_ext_cache_extent+0x5e/0x910 [ 15.266227] hfsplus_ext_read_extent+0x1b2/0x200 [ 15.266233] hfsplus_file_extend+0x5a7/0x1000 [ 15.266237] hfsplus_get_block+0x12b/0x8c0 [ 15.266238] __block_write_begin_int+0x36b/0x12c0 [ 15.266251] block_write_begin+0x77/0x110 [ 15.266252] cont_write_begin+0x428/0x720 [ 15.266259] hfsplus_write_begin+0x51/0x100 [ 15.266262] cont_write_begin+0x272/0x720 [ 15.266270] hfsplus_write_begin+0x51/0x100 [ 15.266274] generic_perform_write+0x321/0x750 [ 15.266285] generic_file_write_iter+0xc3/0x310 [ 15.266289] __kernel_write_iter+0x2fd/0x800 [ 15.266296] dump_user_range+0x2ea/0x910 [ 15.266301] elf_core_dump+0x2a94/0x2ed0 [ 15.266320] vfs_coredump+0x1d85/0x45e0 [ 15.266349] get_signal+0x12e3/0x1990 [ 15.266357] arch_do_signal_or_restart+0x89/0x580 [ 15.266362] irqentry_exit_to_user_mode+0xab/0x110 [ 15.266364] asm_exc_page_fault+0x26/0x30 [ 15.266366] RIP: 0033:0x41bd35 [ 15.266367] Code: bc d1 f3 0f 7f 27 f3 0f 7f 6f 10 f3 0f 7f 77 20 f3 0f 7f 7f 30 49 83 c0 0f 49 29 d0 48 8d 7c 17 31 e9 9f 0b 00 00 66 0f ef c0 <f3> 0f 6f 0e f3 0f 6f 56 10 66 0f 74 c1 66 0f d7 d0 49 83 f8f [ 15.266369] RSP: 002b:00007ffc9e62d078 EFLAGS: 00010283 [ 15.266371] RAX: 00007ffc9e62d100 RBX: 0000000000000000 RCX: 0000000000000000 [ 15.266372] RDX: 00000000000000e0 RSI: 0000000000000000 RDI: 00007ffc9e62d100 [ 15.266373] RBP: 0000400000000040 R08: 00000000000000e0 R09: 0000000000000000 [ 15.266374] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 15.266375] R13: 0000000000000000 R14: 0000000000000000 R15: 0000400000000000 [ 15.266376] </TASK> When calling hfsplus_bmap_alloc to allocate a free node, this function first retrieves the bitmap from header node and map node using node->page together with the offset and length from hfs_brec_lenoff ``` len = hfs_brec_lenoff(node, 2, &off16); off = off16; off += node->page_offset; pagep = node->page + (off >> PAGE_SHIFT); data = kmap_local_page(*pagep); ``` However, if the retrieved offset or length is invalid(i.e. exceeds node_size), the code may end up accessing pages outside the allocated range for this node. This patch adds proper validation of both offset and length before use, preventing out-of-bounds page access. Move is_bnode_offset_valid and check_and_correct_requested_length to hfsplus_fs.h, as they may be required by other functions. Reported-by: syzbot+356aed408415a56543cd@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67bcb4a6.050a0220.bbfd1.008f.GAE@google.com/ Signed-off-by: Yang Chenzhi <yang.chenzhi@vivo.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20250818141734.8559-2-yang.chenzhi@vivo.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31hfsplus: return EIO when type of hidden directory mismatch in ↵Yangtao Li
hfsplus_fill_super() If Catalog File contains corrupted record for the case of hidden directory's type, regard it as I/O error instead of Invalid argument. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20250805165905.3390154-1-frank.li@vivo.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>