Age | Commit message (Collapse) | Author |
|
Pull smb client fixes from Steve French:
- Fix two potential NULL pointer references
- Two debugging improvements (to help debug recent issues) a new
tracepoint, and minor improvement to DebugData
- Trivial comment cleanup
* tag '6.17-RC4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: prevent NULL pointer dereference in UTF16 conversion
smb: client: show negotiated cipher in DebugData
smb: client: add new tracepoint to trace lease break notification
smb: client: fix spellings in comments
smb: client: Fix NULL pointer dereference in cifs_debug_dirs_proc_show()
|
|
Commit 15ae0410c37a79 ("btrfs-progs: add error handling for
device_get_partition_size_fd_stat()") in btrfs-progs inadvertently
changed it so that if the BLKGETSIZE64 ioctl on a block device returned
a size of 0, this was no longer seen as an error condition.
Unfortunately this is how disconnected NBD devices behave, meaning that
with btrfs-progs 6.16 it's now possible to add a device you can't
remove:
# btrfs device add /dev/nbd0 /root/temp
# btrfs device remove /dev/nbd0 /root/temp
ERROR: error removing device '/dev/nbd0': Invalid argument
This check should always have been done kernel-side anyway, so add a
check in btrfs_init_new_device() that the new device doesn't have a size
less than BTRFS_DEVICE_RANGE_RESERVED (i.e. 1 MB).
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Online fsck has been a part of upstream for over a year now without any
serious problems. Turn it on by default in time for the 2025 LTS
kernel, and get rid of the "say N if unsure" messages for the default Y
options.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
|
Don't roll the whole transaction after every extent, that's rather
inefficient.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
These sysctl knobs were scheduled for removal in September 2025. That
time has come, so remove them.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
|
Delete XREAP_MAX_BINVAL and XREAP_MAX_DEFER_CHAIN because the reap code
now calculates those limits dynamically, so they're no longer needed.
Move the third limit (XREP_MAX_ITRUNCATE_EFIS) to the one file that uses
it. Note that the btree rebuilding code should reserve exactly the
number of blocks needed to rebuild a btree, so it is rare that the newbt
code will need to add any EFIs to the commit transaction. That's why
that static limit remains.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
These four mount options were scheduled for removal in September 2025,
so remove them now.
Cc: preichl@redhat.com
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
|
We promised to turn off these old features by default in September 2025.
Do so now.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
|
Reaping file fork mappings is a little different -- log recovery can
free the blocks for us, so we only try to process a single mapping at a
time. Therefore, we only need to figure out the maximum number of
blocks that we can invalidate in a single transaction.
The rough calculation here is:
nr_extents = (logres - reservation used by any one step) /
(space used per binval)
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Calculate the maximum number of CoW staging extents that can be reaped
in a single transaction chain. The rough calculation here is:
nr_extents = (logres - reservation used by any one step) /
(space used by intents per extent)
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Calculate the maximum number of CoW staging extents that can be reaped
in a single transaction chain. The rough calculation here is:
nr_extents = (logres - reservation used by any one step) /
(space used by intents per extent +
space used for a few buffer invalidations)
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Calculate the maximum number of extents that can be reaped in a single
transaction chain, and the number of buffers that can be invalidated in
a single transaction. The rough calculation here is:
nr_extents = (logres - reservation used by any one step) /
(space used by intents per extent +
space used per binval)
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Convert the file fork reaping code to use struct xreap_state so that we
can reuse the dynamic state tracking code.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The online repair block reaping code employs static limits to decide if
it's time to roll the transaction or finish the deferred item chains to
avoid overflowing the scrub transaction's reservation. However, the
use of static limits aren't great -- btree blocks are assumed to be
scattered around the AG and the buffers need to be invalidated, whereas
COW staging extents are usually contiguous and do not have buffers. We
would like to configure the limits dynamically.
To get ready for this, reorganize struct xreap_state to store dynamic
limits, and add helpers to hide some of the details of how the limits
are enforced. Also rename the "xreap roll" functions to include the
word "binval" because they only exist to decide when we should roll the
transaction to deal with buffer invalidations.
No functional changes intended here.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
When we're removing rmap records for crosslinked blocks, use deferred
intent items so that we can try to free/unmap as many of the old data
structure's blocks as we can in the same transaction as the commit.
Cc: <stable@vger.kernel.org> # v6.6
Fixes: 1c7ce115e52106 ("xfs: reap large AG metadata extents when possible")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The commit ced17ee32a99 ("Revert "virtio: reject shm region if length is zero"")
exposes the following DAX page fault bug (this fix the failure that getting shm
region alway returns false because of zero length):
The commit 21aa65bf82a7 ("mm: remove callers of pfn_t functionality") handles
the DAX physical page address incorrectly: the removed macro 'phys_to_pfn_t()'
should be replaced with 'PHYS_PFN()'.
[ 1.390321] BUG: unable to handle page fault for address: ffffd3fb40000008
[ 1.390875] #PF: supervisor read access in kernel mode
[ 1.391257] #PF: error_code(0x0000) - not-present page
[ 1.391509] PGD 0 P4D 0
[ 1.391626] Oops: Oops: 0000 [#1] SMP NOPTI
[ 1.391806] CPU: 6 UID: 1000 PID: 162 Comm: weston Not tainted 6.17.0-rc3-WSL2-STABLE #2 PREEMPT(none)
[ 1.392361] RIP: 0010:dax_to_folio+0x14/0x60
[ 1.392653] Code: 52 c9 c3 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 c1 ef 05 48 c1 e7 06 48 03 3d 34 b5 31 01 <48> 8b 57 08 48 89 f8 f6 c2 01 75 2b 66 90 c3 cc cc cc cc f7 c7 ff
[ 1.393727] RSP: 0000:ffffaf7d04407aa8 EFLAGS: 00010086
[ 1.394003] RAX: 000000a000000000 RBX: ffffaf7d04407bb0 RCX: 0000000000000000
[ 1.394524] RDX: ffffd17b40000008 RSI: 0000000000000083 RDI: ffffd3fb40000000
[ 1.394967] RBP: 0000000000000011 R08: 000000a000000000 R09: 0000000000000000
[ 1.395400] R10: 0000000000001000 R11: ffffaf7d04407c10 R12: 0000000000000000
[ 1.395806] R13: ffffa020557be9c0 R14: 0000014000000001 R15: 0000725970e94000
[ 1.396268] FS: 000072596d6d2ec0(0000) GS:ffffa0222dc59000(0000) knlGS:0000000000000000
[ 1.396715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.397100] CR2: ffffd3fb40000008 CR3: 000000011579c005 CR4: 0000000000372ef0
[ 1.397518] Call Trace:
[ 1.397663] <TASK>
[ 1.397900] dax_insert_entry+0x13b/0x390
[ 1.398179] dax_fault_iter+0x2a5/0x6c0
[ 1.398443] dax_iomap_pte_fault+0x193/0x3c0
[ 1.398750] __fuse_dax_fault+0x8b/0x270
[ 1.398997] ? vm_mmap_pgoff+0x161/0x210
[ 1.399175] __do_fault+0x30/0x180
[ 1.399360] do_fault+0xc4/0x550
[ 1.399547] __handle_mm_fault+0x8e3/0xf50
[ 1.399731] ? do_syscall_64+0x72/0x1e0
[ 1.399958] handle_mm_fault+0x192/0x2f0
[ 1.400204] do_user_addr_fault+0x20e/0x700
[ 1.400418] exc_page_fault+0x66/0x150
[ 1.400602] asm_exc_page_fault+0x26/0x30
[ 1.400831] RIP: 0033:0x72596d1bf703
[ 1.401076] Code: 31 f6 45 31 e4 48 8d 15 b3 73 00 00 e8 06 03 00 00 8b 83 68 01 00 00 e9 8e fa ff ff 0f 1f 00 48 8b 44 24 08 4c 89 ee 48 89 df <c7> 00 21 43 34 12 e8 72 09 00 00 e9 6a fa ff ff 0f 1f 44 00 00 e8
[ 1.402172] RSP: 002b:00007ffc350f6dc0 EFLAGS: 00010202
[ 1.402488] RAX: 0000725970e94000 RBX: 00005b7c642c2560 RCX: 0000725970d359a7
[ 1.402898] RDX: 0000000000000003 RSI: 00007ffc350f6dc0 RDI: 00005b7c642c2560
[ 1.403284] RBP: 00007ffc350f6e90 R08: 000000000000000d R09: 0000000000000000
[ 1.403634] R10: 00007ffc350f6dd8 R11: 0000000000000246 R12: 0000000000000001
[ 1.404078] R13: 00007ffc350f6dc0 R14: 0000725970e29ce0 R15: 0000000000000003
[ 1.404450] </TASK>
[ 1.404570] Modules linked in:
[ 1.404821] CR2: ffffd3fb40000008
[ 1.405029] ---[ end trace 0000000000000000 ]---
[ 1.405323] RIP: 0010:dax_to_folio+0x14/0x60
[ 1.405556] Code: 52 c9 c3 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 c1 ef 05 48 c1 e7 06 48 03 3d 34 b5 31 01 <48> 8b 57 08 48 89 f8 f6 c2 01 75 2b 66 90 c3 cc cc cc cc f7 c7 ff
[ 1.406639] RSP: 0000:ffffaf7d04407aa8 EFLAGS: 00010086
[ 1.406910] RAX: 000000a000000000 RBX: ffffaf7d04407bb0 RCX: 0000000000000000
[ 1.407379] RDX: ffffd17b40000008 RSI: 0000000000000083 RDI: ffffd3fb40000000
[ 1.407800] RBP: 0000000000000011 R08: 000000a000000000 R09: 0000000000000000
[ 1.408246] R10: 0000000000001000 R11: ffffaf7d04407c10 R12: 0000000000000000
[ 1.408666] R13: ffffa020557be9c0 R14: 0000014000000001 R15: 0000725970e94000
[ 1.409170] FS: 000072596d6d2ec0(0000) GS:ffffa0222dc59000(0000) knlGS:0000000000000000
[ 1.409608] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.409977] CR2: ffffd3fb40000008 CR3: 000000011579c005 CR4: 0000000000372ef0
[ 1.410437] Kernel panic - not syncing: Fatal exception
[ 1.410857] Kernel Offset: 0xc000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Fixes: 21aa65bf82a7 ("mm: remove callers of pfn_t functionality")
Signed-off-by: Haiyue Wang <haiyuewa@163.com>
Link: https://lore.kernel.org/20250904120339.972-1-haiyuewa@163.com
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The ready event list of an epoll object is protected by read-write
semaphore:
- The consumer (waiter) acquires the write lock and takes items.
- the producer (waker) takes the read lock and adds items.
The point of this design is enabling epoll to scale well with large number
of producers, as multiple producers can hold the read lock at the same
time.
Unfortunately, this implementation may cause scheduling priority inversion
problem. Suppose the consumer has higher scheduling priority than the
producer. The consumer needs to acquire the write lock, but may be blocked
by the producer holding the read lock. Since read-write semaphore does not
support priority-boosting for the readers (even with CONFIG_PREEMPT_RT=y),
we have a case of priority inversion: a higher priority consumer is blocked
by a lower priority producer. This problem was reported in [1].
Furthermore, this could also cause stall problem, as described in [2].
Fix this problem by replacing rwlock with spinlock.
This reduces the event bandwidth, as the producers now have to contend with
each other for the spinlock. According to the benchmark from
https://github.com/rouming/test-tools/blob/master/stress-epoll.c:
On 12 x86 CPUs:
Before After Diff
threads events/ms events/ms
8 7162 4956 -31%
16 8733 5383 -38%
32 7968 5572 -30%
64 10652 5739 -46%
128 11236 5931 -47%
On 4 riscv CPUs:
Before After Diff
threads events/ms events/ms
8 2958 2833 -4%
16 3323 3097 -7%
32 3451 3240 -6%
64 3554 3178 -11%
128 3601 3235 -10%
Although the numbers look bad, it should be noted that this benchmark
creates multiple threads who do nothing except constantly generating new
epoll events, thus contention on the spinlock is high. For real workload,
the event rate is likely much lower, and the performance drop is not as
bad.
Using another benchmark (perf bench epoll wait) where spinlock contention
is lower, improvement is even observed on x86:
On 12 x86 CPUs:
Before: Averaged 110279 operations/sec (+- 1.09%), total secs = 8
After: Averaged 114577 operations/sec (+- 2.25%), total secs = 8
On 4 riscv CPUs:
Before: Averaged 175767 operations/sec (+- 0.62%), total secs = 8
After: Averaged 167396 operations/sec (+- 0.23%), total secs = 8
In conclusion, no one is likely to be upset over this change. After all,
spinlock was used originally for years, and the commit which converted to
rwlock didn't mention a real workload, just that the benchmark numbers are
nice.
This patch is not exactly the revert of commit a218cc491420 ("epoll: use
rwlock in order to reduce ep_poll_callback() contention"), because git
revert conflicts in some places which are not obvious on the resolution.
This patch is intended to be backported, therefore go with the obvious
approach:
- Replace rwlock_t with spinlock_t one to one
- Delete list_add_tail_lockless() and chain_epi_lockless(). These were
introduced to allow producers to concurrently add items to the list.
But now that spinlock no longer allows producers to touch the event
list concurrently, these two functions are not necessary anymore.
Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/ec92458ea357ec503c737ead0f10b2c6e4c37d47.1752581388.git.namcao@linutronix.de
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: stable@vger.kernel.org
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Closes: https://lore.kernel.org/linux-rt-users/20210825132754.GA895675@lothringen/ [1]
Reported-by: Valentin Schneider <vschneid@redhat.com>
Closes: https://lore.kernel.org/linux-rt-users/xhsmhttqvnall.mognet@vschneid.remote.csb/ [2]
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The changes modernizes the code by aligning it with current kernel best
practices. It improves code clarity and consistency, as strncpy is deprecated
as explained in Documentation/process/deprecated.rst. This change does
not alter the functionality or introduce any behavioral changes.
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Marcelo Moreira <marcelomoreira1905@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
There can be a NULL pointer dereference bug here. NULL is passed to
__cifs_sfu_make_node without checks, which passes it unchecked to
cifs_strndup_to_utf16, which in turn passes it to
cifs_local_to_utf16_bytes where '*from' is dereferenced, causing a crash.
This patch adds a check for NULL 'src' in cifs_strndup_to_utf16 and
returns NULL early to prevent dereferencing NULL pointer.
Found by Linux Verification Center (linuxtesting.org) with SVACE
Signed-off-by: Makar Semyonov <m.semenov@tssltd.ru>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Some architectures, such as RISC-V, use the ELF e_flags field to encode
ABI-specific information (e.g., ISA extensions, fpu support). Debuggers
like GDB rely on these flags in core dumps to correctly interpret
optional register sets. If the flags are missing or incorrect, GDB may
warn and ignore valid data, for example:
warning: Unexpected size of section '.reg2/213' in core file.
This can prevent access to fpu or other architecture-specific registers
even when they were dumped.
Save the e_flags field during ELF binary loading (in load_elf_binary())
into the mm_struct, and later retrieve it during core dump generation
(in fill_note_info()). Kconfig option CONFIG_ARCH_HAS_ELF_CORE_EFLAGS
is introduced for architectures that require this behaviour.
Signed-off-by: Svetlana Parfenova <svetlana.parfenova@syntacore.com>
Link: https://lore.kernel.org/r/20250901135350.619485-1-svetlana.parfenova@syntacore.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Pull smb server fix from Steve French:
- fix handling filenames with ":" (colon) in them
* tag 'v6.17-rc4-ksmbd-fix' of git://git.samba.org/ksmbd:
ksmbd: allow a filename to contain colons on SMB3.1.1 posix extensions
|
|
Print the negotiated encryption cipher type in DebugData
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Add smb3_lease_break_enter to trace lease break notifications,
recording lease state, flags, epoch, and lease key. Align
smb3_lease_not_found to use the same payload and print format.
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
correct spellings in comments
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes. 13 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels. 11 of these
fixes are for MM.
This includes a three-patch series from Harry Yoo which fixes an
intermittent boot failure which can occur on x86 systems. And a
two-patch series from Alexander Gordeev which fixes a KASAN crash on
S390 systems"
* tag 'mm-hotfixes-stable-2025-09-01-17-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: fix possible deadlock in kmemleak
x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
mm: introduce and use {pgd,p4d}_populate_kernel()
mm: move page table sync declarations to linux/pgtable.h
proc: fix missing pde_set_flags() for net proc files
mm: fix accounting of memmap pages
mm/damon/core: prevent unnecessary overflow in damos_set_effective_quota()
kexec: add KEXEC_FILE_NO_CMA as a legal flag
kasan: fix GCC mem-intrinsic prefix with sw tags
mm/kasan: avoid lazy MMU mode hazards
mm/kasan: fix vmalloc shadow memory (de-)population races
kunit: kasan_test: disable fortify string checker on kasan_strings() test
selftests/mm: fix FORCE_READ to read input value correctly
mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE
ocfs2: prevent release journal inode after journal shutdown
rust: mm: mark VmaNew as transparent
of_numa: fix uninitialized memory nodes causing kernel panic
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix a few races related to inode link count
- fix inode leak on failure to add link to inode
- move transaction aborts closer to where they happen
* tag 'for-6.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: avoid load/store tearing races when checking if an inode was logged
btrfs: fix race between setting last_dir_index_offset and inode logging
btrfs: fix race between logging inode and checking if it was logged before
btrfs: simplify error handling logic for btrfs_link()
btrfs: fix inode leak on failure to add link to inode
btrfs: abort transaction on failure to add link to inode
|
|
There is a race condition between inode eviction and inode caching that
can cause a live struct btrfs_inode to be missing from the root->inodes
xarray. Specifically, there is a window during evict() between the inode
being unhashed and deleted from the xarray. If btrfs_iget() is called
for the same inode in that window, it will be recreated and inserted
into the xarray, but then eviction will delete the new entry, leaving
nothing in the xarray:
Thread 1 Thread 2
---------------------------------------------------------------
evict()
remove_inode_hash()
btrfs_iget_path()
btrfs_iget_locked()
btrfs_read_locked_inode()
btrfs_add_inode_to_root()
destroy_inode()
btrfs_destroy_inode()
btrfs_del_inode_from_root()
__xa_erase
In turn, this can cause issues for subvolume deletion. Specifically, if
an inode is in this lost state, and all other inodes are evicted, then
btrfs_del_inode_from_root() will call btrfs_add_dead_root() prematurely.
If the lost inode has a delayed_node attached to it, then when
btrfs_clean_one_deleted_snapshot() calls btrfs_kill_all_delayed_nodes(),
it will loop forever because the delayed_nodes xarray will never become
empty (unless memory pressure forces the inode out). We saw this
manifest as soft lockups in production.
Fix it by only deleting the xarray entry if it matches the given inode
(using __xa_cmpxchg()).
Fixes: 310b2f5d5a94 ("btrfs: use an xarray to track open inodes in a root")
Cc: stable@vger.kernel.org # 6.11+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Co-authored-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
than page size
[BUG]
With 64K page size (aarch64 with 64K page size config) and 4K btrfs
block size, the following workload can easily lead to a corrupted read:
mkfs.btrfs -f -s 4k $dev > /dev/null
mount -o compress $dev $mnt
xfs_io -f -c "pwrite -S 0xff 0 64k" $mnt/base > /dev/null
echo "correct result:"
od -Ad -t x1 $mnt/base
xfs_io -f -c "reflink $mnt/base 32k 0 32k" \
-c "reflink $mnt/base 0 32k 32k" \
-c "pwrite -S 0xff 60k 4k" $mnt/new > /dev/null
echo "incorrect result:"
od -Ad -t x1 $mnt/new
umount $mnt
This shows the following result:
correct result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536
incorrect result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0032768 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0061440 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536
Notice the zero in the range [32K, 60K), which is incorrect.
[CAUSE]
With extra trace printk, it shows the following events during od:
(some unrelated info removed like CPU and context)
od-3457 btrfs_do_readpage: enter r/i=5/258 folio=0(65536) prev_em_start=0000000000000000
The "r/i" is indicating the root and inode number. In our case the file
"new" is using ino 258 from fs tree (root 5).
Here notice the @prev_em_start pointer is NULL. This means the
btrfs_do_readpage() is called from btrfs_read_folio(), not from
btrfs_readahead().
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=0 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=4096 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=8192 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=12288 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=16384 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=20480 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=24576 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=28672 got em start=0 len=32768
These above 32K blocks will be read from the first half of the
compressed data extent.
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=32768 got em start=32768 len=32768
Note here there is no btrfs_submit_compressed_read() call. Which is
incorrect now.
Although both extent maps at 0 and 32K are pointing to the same compressed
data, their offsets are different thus can not be merged into the same
read.
So this means the compressed data read merge check is doing something
wrong.
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=36864 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=40960 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=45056 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=49152 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=53248 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=57344 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=61440 skip uptodate
od-3457 btrfs_submit_compressed_read: cb orig_bio: file off=0 len=61440
The function btrfs_submit_compressed_read() is only called at the end of
folio read. The compressed bio will only have an extent map of range [0,
32K), but the original bio passed in is for the whole 64K folio.
This will cause the decompression part to only fill the first 32K,
leaving the rest untouched (aka, filled with zero).
This incorrect compressed read merge leads to the above data corruption.
There were similar problems that happened in the past, commit 808f80b46790
("Btrfs: update fix for read corruption of compressed and shared
extents") is doing pretty much the same fix for readahead.
But that's back to 2015, where btrfs still only supports bs (block size)
== ps (page size) cases.
This means btrfs_do_readpage() only needs to handle a folio which
contains exactly one block.
Only btrfs_readahead() can lead to a read covering multiple blocks.
Thus only btrfs_readahead() passes a non-NULL @prev_em_start pointer.
With v5.15 kernel btrfs introduced bs < ps support. This breaks the above
assumption that a folio can only contain one block.
Now btrfs_read_folio() can also read multiple blocks in one go.
But btrfs_read_folio() doesn't pass a @prev_em_start pointer, thus the
existing bio force submission check will never be triggered.
In theory, this can also happen for btrfs with large folios, but since
large folio is still experimental, we don't need to bother it, thus only
bs < ps support is affected for now.
[FIX]
Instead of passing @prev_em_start to do the proper compressed extent
check, introduce one new member, btrfs_bio_ctrl::last_em_start, so that
the existing bio force submission logic will always be triggered.
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The compression level is meaningless for lzo, but before commit
3f093ccb95f30 ("btrfs: harden parsing of compression mount options"),
it was silently ignored if passed.
After that commit, passing a level with lzo fails to mount:
BTRFS error: unrecognized compression value lzo:1
It seems reasonable for users to expect that lzo would permit a numeric
level option, as all the other algos do, even though the kernel's
implementation of LZO currently only supports a single level. Because it
has always worked to pass a level, it seems likely to me that users in
the real world are relying on doing so.
This patch restores the old behavior, giving "lzo:N" the same semantics
as all of the other compression algos.
To be clear, silly variants like "lzo:one", "lzo:the_first_option", or
"lzo:armageddon" also used to work. This isn't meant to suggest that
any possible mis-interpretation of mount options that once worked must
continue to work forever. This is an exceptional case where it makes
sense to preserve compatibility, both because the mis-interpretation is
reasonable, and because nothing tangible is sacrificed.
Finally update btrfs_show_options() to ignore the level of LZO, as it
is only the default level without any extra meaning.
Fixes: 3f093ccb95f30 ("btrfs: harden parsing of compression mount options")
Reviewed-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The following workload on a squota enabled fs:
btrfs subvol create mnt/subvol
# ensure subvol extents get accounted
sync
btrfs qgroup create 1/1 mnt
btrfs qgroup assign mnt/subvol 1/1 mnt
btrfs qgroup delete mnt/subvol
# make the cleaner thread run
btrfs filesystem sync mnt
sleep 1
btrfs filesystem sync mnt
btrfs qgroup destroy 1/1 mnt
will fail with EBUSY. The reason is that 1/1 does the quick accounting
when we assign subvol to it, gaining its exclusive usage as excl and
excl_cmpr. But then when we delete subvol, the decrement happens via
record_squota_delta() which does not update excl_cmpr, as squotas does
not make any distinction between compressed and normal extents. Thus,
we increment excl_cmpr but never decrement it, and are unable to delete
1/1. The two possible fixes are to make squota always mirror excl and
excl_cmpr or to make the fast accounting separately track the plain and
cmpr numbers. The latter felt cleaner to me so that is what I opted for.
Fixes: 1e0e9d5771c3 ("btrfs: add helper for recording simple quota deltas")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Since the introduction of pid namespaces, their interaction with procfs
has been entirely implicit in ways that require a lot of dancing around
by programs that need to construct sandboxes with different PID
namespaces.
Being able to explicitly specify the pid namespace to use when
constructing a procfs super block will allow programs to no longer need
to fork off a process which does then does unshare(2) / setns(2) and
forks again in order to construct a procfs in a pidns.
So, provide a "pidns" mount option which allows such users to just
explicitly state which pid namespace they want that procfs instance to
use. This interface can be used with fsconfig(2) either with a file
descriptor or a path:
fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
or with classic mount(2) / mount(8):
// mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
As this new API is effectively shorthand for setns(2) followed by
mount(2), the permission model for this mirrors pidns_install() to avoid
opening up new attack surfaces by loosening the existing permission
model.
In order to avoid having to RCU-protect all users of proc_pid_ns() (to
avoid UAFs), attempting to reconfigure an existing procfs instance's pid
namespace will error out with -EBUSY. Creating new procfs instances is
quite cheap, so this should not be an impediment to most users, and lets
us avoid a lot of churn in fs/proc/* for a feature that it seems
unlikely userspace would use.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/20250805-procfs-pidns-api-v4-2-705f984940e7@cyphar.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
openat2 had a bug: if we pass RESOLVE_NO_XDEV, then openat2
doesn't traverse through automounts, but may still trigger them.
(See the link for full bug report with reproducer.)
This commit fixes this bug.
Link: https://lore.kernel.org/linux-fsdevel/20250817075252.4137628-1-safinaskar@zohomail.com/
Fixes: fddb5d430ad9fa91b49b1 ("open: introduce openat2(2) syscall")
Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Cc: stable@vger.kernel.org
Signed-off-by: Askar Safin <safinaskar@zohomail.com>
Link: https://lore.kernel.org/20250825181233.2464822-5-safinaskar@zohomail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This is preparation to RESOLVE_NO_XDEV fix in following commits.
Also this commit makes LOOKUP_NO_XDEV logic more clear: now we
immediately fail with EXDEV on first mount crossing
instead of waiting for very end.
No functional change intended
Signed-off-by: Askar Safin <safinaskar@zohomail.com>
Link: https://lore.kernel.org/20250825181233.2464822-4-safinaskar@zohomail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This is preparation to RESOLVE_NO_XDEV fix in following commits.
No functional change intended.
The only place that ever looks at
ND_JUMPED in nd->state is complete_walk()
and we are not going to reach
it if handle_mounts() returns an error
Signed-off-by: Askar Safin <safinaskar@zohomail.com>
Link: https://lore.kernel.org/20250825181233.2464822-3-safinaskar@zohomail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This is preparation to RESOLVE_NO_XDEV fix in following commits.
No functional change intended
Signed-off-by: Askar Safin <safinaskar@zohomail.com>
Link: https://lore.kernel.org/20250825181233.2464822-2-safinaskar@zohomail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
With the introduction of clone3 in commit 7f192e3cd316 ("fork: add
clone3") the effective bit width of clone_flags on all architectures was
increased from 32-bit to 64-bit, with a new type of u64 for the flags.
However, for most consumers of clone_flags the interface was not
changed from the previous type of unsigned long.
While this works fine as long as none of the new 64-bit flag bits
(CLONE_CLEAR_SIGHAND and CLONE_INTO_CGROUP) are evaluated, this is still
undesirable in terms of the principle of least surprise.
Thus, this commit fixes all relevant interfaces of callees to
sys_clone3/copy_process (excluding the architecture-specific
copy_thread) to consistently pass clone_flags as u64, so that
no truncation to 32-bit integers occurs on 32-bit architectures.
Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com>
Link: https://lore.kernel.org/20250901-nios2-implement-clone3-v2-2-53fcf5577d57@siemens-energy.com
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The inode mode loaded from corrupted disk can be invalid. Do like what
commit 0a9e74051313 ("isofs: Verify inode mode when loading from disk")
does.
Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/429b3ef1-13de-4310-9a8e-c2dc9a36234a@I-love.SAKURA.ne.jp
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
vfs_ioctl() is no longer called by anything outside of fs/ioctl.c, so
remove the global symbol and export as it is not needed.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/2025083038-carving-amuck-a4ae@gregkh
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse into vfs.fixes
fuse fixes for 6.17-rc5
* tag 'fuse-fixes-6.17-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (6 commits)
fuse: Block access to folio overlimit
fuse: fix fuseblk i_blkbits for iomap partial writes
fuse: reflect cached blocksize if blocksize was changed
fuse: prevent overflow in copy_file_range return value
fuse: check if copy_file_range() returns larger than requested size
fuse: do not allow mapping a non-regular backing file
Link: https://lore.kernel.org/CAJfpeguEVMMyw_zCb+hbOuSxdE2Z3Raw=SJsq=Y56Ae6dn2W3g@mail.gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Fix two minor whitespace issues.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Instead of doing direct access to ->i_count, add a helper to handle
this. This will make it easier to convert i_count to a refcount later.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/9bc62a84c6b9d6337781203f60837bd98fbc4a96.1756222464.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Currently, if we are the last iput, and we have the I_DIRTY_TIME bit
set, we will grab a reference on the inode again and then mark it dirty
and then redo the put. This is to make sure we delay the time update
for as long as possible.
We can rework this logic to simply dec i_count if it is not 1, and if it
is do the time update while still holding the i_count reference.
Then we can replace the atomic_dec_and_lock with locking the ->i_lock
and doing atomic_dec_and_test, since we did the atomic_add_unless above.
Co-developed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/be208b89bdb650202e712ce2bcfc407ac7044c7a.1756222464.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Currently, hfs_brec_remove() executes moving records
towards the location of deleted record and it updates
offsets of moved records. However, the hfs_brec_remove()
logic ignores the "mess" of b-tree node's free space and
it doesn't touch the offsets out of records number.
Potentially, it could confuse fsck or driver logic or
to be a reason of potential corruption cases.
This patch reworks the logic of hfs_brec_remove()
by means of clearing freed space of b-tree node
after the records moving. And it clear the last
offset that keeping old location of free space
because now the offset before this one is keeping
the actual offset to the free space after the record
deletion.
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250815194918.38165-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
|
|
The generic/736 xfstest fails for HFS case:
BEGIN TEST default (1 test): hfs Mon May 5 03:18:32 UTC 2025
DEVICE: /dev/vdb
HFS_MKFS_OPTIONS:
MOUNT_OPTIONS: MOUNT_OPTIONS
FSTYP -- hfs
PLATFORM -- Linux/x86_64 kvm-xfstests 6.15.0-rc4-xfstests-g00b827f0cffa #1 SMP PREEMPT_DYNAMIC Fri May 25
MKFS_OPTIONS -- /dev/vdc
MOUNT_OPTIONS -- /dev/vdc /vdc
generic/736 [03:18:33][ 3.510255] run fstests generic/736 at 2025-05-05 03:18:33
_check_generic_filesystem: filesystem on /dev/vdb is inconsistent
(see /results/hfs/results-default/generic/736.full for details)
Ran: generic/736
Failures: generic/736
Failed 1 of 1 tests
The HFS volume becomes corrupted after the test run:
sudo fsck.hfs -d /dev/loop50
** /dev/loop50
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking HFS volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking catalog hierarchy.
** Checking volume bitmap.
** Checking volume information.
invalid MDB drNxtCNID
Master Directory Block needs minor repair
(1, 0)
Verify Status: VIStat = 0x8000, ABTStat = 0x0000 EBTStat = 0x0000
CBTStat = 0x0000 CatStat = 0x00000000
** Repairing volume.
** Rechecking volume.
** Checking HFS volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking catalog hierarchy.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled was repaired successfully.
The main reason of the issue is the absence of logic that
corrects mdb->drNxtCNID/HFS_SB(sb)->next_id (next unused
CNID) after deleting a record in Catalog File. This patch
introduces a hfs_correct_next_unused_CNID() method that
implements the necessary logic. In the case of Catalog File's
record delete operation, the function logic checks that
(deleted_CNID + 1) == next_unused_CNID and it finds/sets the new
value of next_unused_CNID.
sudo ./check generic/736
FSTYP -- hfs
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.15.0+ #6 SMP PREEMPT_DYNAMIC Tue Jun 10 15:02:48 PDT 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/736 33s
Ran: generic/736
Passed all 1 tests
sudo fsck.hfs -d /dev/loop50
** /dev/loop50
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking HFS volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking catalog hierarchy.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled appears to be OK
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20250610231609.551930-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
|
|
The syzbot reported issue in hfsplus_delete_cat():
[ 70.682285][ T9333] =====================================================
[ 70.682943][ T9333] BUG: KMSAN: uninit-value in hfsplus_subfolders_dec+0x1d7/0x220
[ 70.683640][ T9333] hfsplus_subfolders_dec+0x1d7/0x220
[ 70.684141][ T9333] hfsplus_delete_cat+0x105d/0x12b0
[ 70.684621][ T9333] hfsplus_rmdir+0x13d/0x310
[ 70.685048][ T9333] vfs_rmdir+0x5ba/0x810
[ 70.685447][ T9333] do_rmdir+0x964/0xea0
[ 70.685833][ T9333] __x64_sys_rmdir+0x71/0xb0
[ 70.686260][ T9333] x64_sys_call+0xcd8/0x3cf0
[ 70.686695][ T9333] do_syscall_64+0xd9/0x1d0
[ 70.687119][ T9333] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 70.687646][ T9333]
[ 70.687856][ T9333] Uninit was stored to memory at:
[ 70.688311][ T9333] hfsplus_subfolders_inc+0x1c2/0x1d0
[ 70.688779][ T9333] hfsplus_create_cat+0x148e/0x1800
[ 70.689231][ T9333] hfsplus_mknod+0x27f/0x600
[ 70.689730][ T9333] hfsplus_mkdir+0x5a/0x70
[ 70.690146][ T9333] vfs_mkdir+0x483/0x7a0
[ 70.690545][ T9333] do_mkdirat+0x3f2/0xd30
[ 70.690944][ T9333] __x64_sys_mkdir+0x9a/0xf0
[ 70.691380][ T9333] x64_sys_call+0x2f89/0x3cf0
[ 70.691816][ T9333] do_syscall_64+0xd9/0x1d0
[ 70.692229][ T9333] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 70.692773][ T9333]
[ 70.692990][ T9333] Uninit was stored to memory at:
[ 70.693469][ T9333] hfsplus_subfolders_inc+0x1c2/0x1d0
[ 70.693960][ T9333] hfsplus_create_cat+0x148e/0x1800
[ 70.694438][ T9333] hfsplus_fill_super+0x21c1/0x2700
[ 70.694911][ T9333] mount_bdev+0x37b/0x530
[ 70.695320][ T9333] hfsplus_mount+0x4d/0x60
[ 70.695729][ T9333] legacy_get_tree+0x113/0x2c0
[ 70.696167][ T9333] vfs_get_tree+0xb3/0x5c0
[ 70.696588][ T9333] do_new_mount+0x73e/0x1630
[ 70.697013][ T9333] path_mount+0x6e3/0x1eb0
[ 70.697425][ T9333] __se_sys_mount+0x733/0x830
[ 70.697857][ T9333] __x64_sys_mount+0xe4/0x150
[ 70.698269][ T9333] x64_sys_call+0x2691/0x3cf0
[ 70.698704][ T9333] do_syscall_64+0xd9/0x1d0
[ 70.699117][ T9333] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 70.699730][ T9333]
[ 70.699946][ T9333] Uninit was created at:
[ 70.700378][ T9333] __alloc_pages_noprof+0x714/0xe60
[ 70.700843][ T9333] alloc_pages_mpol_noprof+0x2a2/0x9b0
[ 70.701331][ T9333] alloc_pages_noprof+0xf8/0x1f0
[ 70.701774][ T9333] allocate_slab+0x30e/0x1390
[ 70.702194][ T9333] ___slab_alloc+0x1049/0x33a0
[ 70.702635][ T9333] kmem_cache_alloc_lru_noprof+0x5ce/0xb20
[ 70.703153][ T9333] hfsplus_alloc_inode+0x5a/0xd0
[ 70.703598][ T9333] alloc_inode+0x82/0x490
[ 70.703984][ T9333] iget_locked+0x22e/0x1320
[ 70.704428][ T9333] hfsplus_iget+0x5c/0xba0
[ 70.704827][ T9333] hfsplus_btree_open+0x135/0x1dd0
[ 70.705291][ T9333] hfsplus_fill_super+0x1132/0x2700
[ 70.705776][ T9333] mount_bdev+0x37b/0x530
[ 70.706171][ T9333] hfsplus_mount+0x4d/0x60
[ 70.706579][ T9333] legacy_get_tree+0x113/0x2c0
[ 70.707019][ T9333] vfs_get_tree+0xb3/0x5c0
[ 70.707444][ T9333] do_new_mount+0x73e/0x1630
[ 70.707865][ T9333] path_mount+0x6e3/0x1eb0
[ 70.708270][ T9333] __se_sys_mount+0x733/0x830
[ 70.708711][ T9333] __x64_sys_mount+0xe4/0x150
[ 70.709158][ T9333] x64_sys_call+0x2691/0x3cf0
[ 70.709630][ T9333] do_syscall_64+0xd9/0x1d0
[ 70.710053][ T9333] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 70.710611][ T9333]
[ 70.710842][ T9333] CPU: 3 UID: 0 PID: 9333 Comm: repro Not tainted 6.12.0-rc6-dirty #17
[ 70.711568][ T9333] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 70.712490][ T9333] =====================================================
[ 70.713085][ T9333] Disabling lock debugging due to kernel taint
[ 70.713618][ T9333] Kernel panic - not syncing: kmsan.panic set ...
[ 70.714159][ T9333] CPU: 3 UID: 0 PID: 9333 Comm: repro Tainted: G B 6.12.0-rc6-dirty #17
[ 70.715007][ T9333] Tainted: [B]=BAD_PAGE
[ 70.715365][ T9333] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 70.716311][ T9333] Call Trace:
[ 70.716621][ T9333] <TASK>
[ 70.716899][ T9333] dump_stack_lvl+0x1fd/0x2b0
[ 70.717350][ T9333] dump_stack+0x1e/0x30
[ 70.717743][ T9333] panic+0x502/0xca0
[ 70.718116][ T9333] ? kmsan_get_metadata+0x13e/0x1c0
[ 70.718611][ T9333] kmsan_report+0x296/0x2a0
[ 70.719038][ T9333] ? __msan_metadata_ptr_for_load_4+0x24/0x40
[ 70.719859][ T9333] ? __msan_warning+0x96/0x120
[ 70.720345][ T9333] ? hfsplus_subfolders_dec+0x1d7/0x220
[ 70.720881][ T9333] ? hfsplus_delete_cat+0x105d/0x12b0
[ 70.721412][ T9333] ? hfsplus_rmdir+0x13d/0x310
[ 70.721880][ T9333] ? vfs_rmdir+0x5ba/0x810
[ 70.722458][ T9333] ? do_rmdir+0x964/0xea0
[ 70.722883][ T9333] ? __x64_sys_rmdir+0x71/0xb0
[ 70.723397][ T9333] ? x64_sys_call+0xcd8/0x3cf0
[ 70.723915][ T9333] ? do_syscall_64+0xd9/0x1d0
[ 70.724454][ T9333] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 70.725110][ T9333] ? vprintk_emit+0xd1f/0xe60
[ 70.725616][ T9333] ? vprintk_default+0x3f/0x50
[ 70.726175][ T9333] ? vprintk+0xce/0xd0
[ 70.726628][ T9333] ? _printk+0x17e/0x1b0
[ 70.727129][ T9333] ? __msan_metadata_ptr_for_load_4+0x24/0x40
[ 70.727739][ T9333] ? kmsan_get_metadata+0x13e/0x1c0
[ 70.728324][ T9333] __msan_warning+0x96/0x120
[ 70.728854][ T9333] hfsplus_subfolders_dec+0x1d7/0x220
[ 70.729479][ T9333] hfsplus_delete_cat+0x105d/0x12b0
[ 70.729984][ T9333] ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[ 70.730646][ T9333] ? __msan_metadata_ptr_for_load_4+0x24/0x40
[ 70.731296][ T9333] ? kmsan_get_metadata+0x13e/0x1c0
[ 70.731863][ T9333] hfsplus_rmdir+0x13d/0x310
[ 70.732390][ T9333] ? __pfx_hfsplus_rmdir+0x10/0x10
[ 70.732919][ T9333] vfs_rmdir+0x5ba/0x810
[ 70.733416][ T9333] ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[ 70.734044][ T9333] do_rmdir+0x964/0xea0
[ 70.734537][ T9333] __x64_sys_rmdir+0x71/0xb0
[ 70.735032][ T9333] x64_sys_call+0xcd8/0x3cf0
[ 70.735579][ T9333] do_syscall_64+0xd9/0x1d0
[ 70.736092][ T9333] ? irqentry_exit+0x16/0x60
[ 70.736637][ T9333] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 70.737269][ T9333] RIP: 0033:0x7fa9424eafc9
[ 70.737775][ T9333] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[ 70.739844][ T9333] RSP: 002b:00007fff099cd8d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000054
[ 70.740760][ T9333] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa9424eafc9
[ 70.741642][ T9333] RDX: 006c6f72746e6f63 RSI: 000000000000000a RDI: 0000000020000100
[ 70.742543][ T9333] RBP: 00007fff099cd8e0 R08: 00007fff099cd910 R09: 00007fff099cd910
[ 70.743376][ T9333] R10: 0000000000000000 R11: 0000000000000202 R12: 0000565430642260
[ 70.744247][ T9333] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 70.745082][ T9333] </TASK>
The main reason of the issue that struct hfsplus_inode_info
has not been properly initialized for the case of root folder.
In the case of root folder, hfsplus_fill_super() calls
the hfsplus_iget() that implements only partial initialization of
struct hfsplus_inode_info and subfolders field is not
initialized by hfsplus_iget() logic.
This patch implements complete initialization of
struct hfsplus_inode_info in the hfsplus_iget() logic with
the goal to prevent likewise issues for the case of
root folder.
Reported-by: syzbot <syzbot+fdedff847a0e5e84c39f@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=fdedff847a0e5e84c39f
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250825225103.326401-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
|
|
The syzbot reported issue in hfs_find_set_zero_bits():
=====================================================
BUG: KMSAN: uninit-value in hfs_find_set_zero_bits+0x74d/0xb60 fs/hfs/bitmap.c:45
hfs_find_set_zero_bits+0x74d/0xb60 fs/hfs/bitmap.c:45
hfs_vbm_search_free+0x13c/0x5b0 fs/hfs/bitmap.c:151
hfs_extend_file+0x6a5/0x1b00 fs/hfs/extent.c:408
hfs_get_block+0x435/0x1150 fs/hfs/extent.c:353
__block_write_begin_int+0xa76/0x3030 fs/buffer.c:2151
block_write_begin fs/buffer.c:2262 [inline]
cont_write_begin+0x10e1/0x1bc0 fs/buffer.c:2601
hfs_write_begin+0x85/0x130 fs/hfs/inode.c:52
cont_expand_zero fs/buffer.c:2528 [inline]
cont_write_begin+0x35a/0x1bc0 fs/buffer.c:2591
hfs_write_begin+0x85/0x130 fs/hfs/inode.c:52
hfs_file_truncate+0x1d6/0xe60 fs/hfs/extent.c:494
hfs_inode_setattr+0x964/0xaa0 fs/hfs/inode.c:654
notify_change+0x1993/0x1aa0 fs/attr.c:552
do_truncate+0x28f/0x310 fs/open.c:68
do_ftruncate+0x698/0x730 fs/open.c:195
do_sys_ftruncate fs/open.c:210 [inline]
__do_sys_ftruncate fs/open.c:215 [inline]
__se_sys_ftruncate fs/open.c:213 [inline]
__x64_sys_ftruncate+0x11b/0x250 fs/open.c:213
x64_sys_call+0xfe3/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:78
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
slab_post_alloc_hook mm/slub.c:4154 [inline]
slab_alloc_node mm/slub.c:4197 [inline]
__kmalloc_cache_noprof+0x7f7/0xed0 mm/slub.c:4354
kmalloc_noprof include/linux/slab.h:905 [inline]
hfs_mdb_get+0x1cc8/0x2a90 fs/hfs/mdb.c:175
hfs_fill_super+0x3d0/0xb80 fs/hfs/super.c:337
get_tree_bdev_flags+0x6e3/0x920 fs/super.c:1681
get_tree_bdev+0x38/0x50 fs/super.c:1704
hfs_get_tree+0x35/0x40 fs/hfs/super.c:388
vfs_get_tree+0xb0/0x5c0 fs/super.c:1804
do_new_mount+0x738/0x1610 fs/namespace.c:3902
path_mount+0x6db/0x1e90 fs/namespace.c:4226
do_mount fs/namespace.c:4239 [inline]
__do_sys_mount fs/namespace.c:4450 [inline]
__se_sys_mount+0x6eb/0x7d0 fs/namespace.c:4427
__x64_sys_mount+0xe4/0x150 fs/namespace.c:4427
x64_sys_call+0xfa7/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:166
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
CPU: 1 UID: 0 PID: 12609 Comm: syz.1.2692 Not tainted 6.16.0-syzkaller #0 PREEMPT(none)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
=====================================================
The HFS_SB(sb)->bitmap buffer is allocated in hfs_mdb_get():
HFS_SB(sb)->bitmap = kmalloc(8192, GFP_KERNEL);
Finally, it can trigger the reported issue because kmalloc()
doesn't clear the allocated memory. If allocated memory contains
only zeros, then everything will work pretty fine.
But if the allocated memory contains the "garbage", then
it can affect the bitmap operations and it triggers
the reported issue.
This patch simply exchanges the kmalloc() on kzalloc()
with the goal to guarantee the correctness of bitmap operations.
Because, newly created allocation bitmap should have all
available blocks free. Potentially, initialization bitmap's read
operation could not fill the whole allocated memory and
"garbage" in the not initialized memory will be the reason of
volume coruptions and file system driver bugs.
Reported-by: syzbot <syzbot+773fa9d79b29bd8b6831@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=773fa9d79b29bd8b6831
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250820230636.179085-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
|
|
Potenatially, __hfs_ext_read_extent() could operate by
not initialized values of fd->key after hfs_brec_find() call:
static inline int __hfs_ext_read_extent(struct hfs_find_data *fd, struct hfs_extent *extent,
u32 cnid, u32 block, u8 type)
{
int res;
hfs_ext_build_key(fd->search_key, cnid, block, type);
fd->key->ext.FNum = 0;
res = hfs_brec_find(fd);
if (res && res != -ENOENT)
return res;
if (fd->key->ext.FNum != fd->search_key->ext.FNum ||
fd->key->ext.FkType != fd->search_key->ext.FkType)
return -ENOENT;
if (fd->entrylength != sizeof(hfs_extent_rec))
return -EIO;
hfs_bnode_read(fd->bnode, extent, fd->entryoffset, sizeof(hfs_extent_rec));
return 0;
}
This patch changes kmalloc() on kzalloc() in hfs_find_init()
and intializes fd->record, fd->keyoffset, fd->keylength,
fd->entryoffset, fd->entrylength for the case if hfs_brec_find()
has been found nothing in the b-tree node.
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250818225252.126427-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
|
|
The syzbot reported issue in __hfsplus_ext_cache_extent():
[ 70.194323][ T9350] BUG: KMSAN: uninit-value in __hfsplus_ext_cache_extent+0x7d0/0x990
[ 70.195022][ T9350] __hfsplus_ext_cache_extent+0x7d0/0x990
[ 70.195530][ T9350] hfsplus_file_extend+0x74f/0x1cf0
[ 70.195998][ T9350] hfsplus_get_block+0xe16/0x17b0
[ 70.196458][ T9350] __block_write_begin_int+0x962/0x2ce0
[ 70.196959][ T9350] cont_write_begin+0x1000/0x1950
[ 70.197416][ T9350] hfsplus_write_begin+0x85/0x130
[ 70.197873][ T9350] generic_perform_write+0x3e8/0x1060
[ 70.198374][ T9350] __generic_file_write_iter+0x215/0x460
[ 70.198892][ T9350] generic_file_write_iter+0x109/0x5e0
[ 70.199393][ T9350] vfs_write+0xb0f/0x14e0
[ 70.199771][ T9350] ksys_write+0x23e/0x490
[ 70.200149][ T9350] __x64_sys_write+0x97/0xf0
[ 70.200570][ T9350] x64_sys_call+0x3015/0x3cf0
[ 70.201065][ T9350] do_syscall_64+0xd9/0x1d0
[ 70.201506][ T9350] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 70.202054][ T9350]
[ 70.202279][ T9350] Uninit was created at:
[ 70.202693][ T9350] __kmalloc_noprof+0x621/0xf80
[ 70.203149][ T9350] hfsplus_find_init+0x8d/0x1d0
[ 70.203602][ T9350] hfsplus_file_extend+0x6ca/0x1cf0
[ 70.204087][ T9350] hfsplus_get_block+0xe16/0x17b0
[ 70.204561][ T9350] __block_write_begin_int+0x962/0x2ce0
[ 70.205074][ T9350] cont_write_begin+0x1000/0x1950
[ 70.205547][ T9350] hfsplus_write_begin+0x85/0x130
[ 70.206017][ T9350] generic_perform_write+0x3e8/0x1060
[ 70.206519][ T9350] __generic_file_write_iter+0x215/0x460
[ 70.207042][ T9350] generic_file_write_iter+0x109/0x5e0
[ 70.207552][ T9350] vfs_write+0xb0f/0x14e0
[ 70.207961][ T9350] ksys_write+0x23e/0x490
[ 70.208375][ T9350] __x64_sys_write+0x97/0xf0
[ 70.208810][ T9350] x64_sys_call+0x3015/0x3cf0
[ 70.209255][ T9350] do_syscall_64+0xd9/0x1d0
[ 70.209680][ T9350] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 70.210230][ T9350]
[ 70.210454][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Not tainted 6.12.0-rc5 #5
[ 70.211174][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 70.212115][ T9350] =====================================================
[ 70.212734][ T9350] Disabling lock debugging due to kernel taint
[ 70.213284][ T9350] Kernel panic - not syncing: kmsan.panic set ...
[ 70.213858][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Tainted: G B 6.12.0-rc5 #5
[ 70.214679][ T9350] Tainted: [B]=BAD_PAGE
[ 70.215057][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 70.215999][ T9350] Call Trace:
[ 70.216309][ T9350] <TASK>
[ 70.216585][ T9350] dump_stack_lvl+0x1fd/0x2b0
[ 70.217025][ T9350] dump_stack+0x1e/0x30
[ 70.217421][ T9350] panic+0x502/0xca0
[ 70.217803][ T9350] ? kmsan_get_metadata+0x13e/0x1c0
[ 70.218294][ Message fromT sy9350] kmsan_report+0x296/slogd@syzkaller 0x2aat Aug 18 22:11:058 ...
kernel
:[ 70.213284][ T9350] Kernel panic - not syncing: kmsan.panic [ 70.220179][ T9350] ? kmsan_get_metadata+0x13e/0x1c0
set ...
[ 70.221254][ T9350] ? __msan_warning+0x96/0x120
[ 70.222066][ T9350] ? __hfsplus_ext_cache_extent+0x7d0/0x990
[ 70.223023][ T9350] ? hfsplus_file_extend+0x74f/0x1cf0
[ 70.224120][ T9350] ? hfsplus_get_block+0xe16/0x17b0
[ 70.224946][ T9350] ? __block_write_begin_int+0x962/0x2ce0
[ 70.225756][ T9350] ? cont_write_begin+0x1000/0x1950
[ 70.226337][ T9350] ? hfsplus_write_begin+0x85/0x130
[ 70.226852][ T9350] ? generic_perform_write+0x3e8/0x1060
[ 70.227405][ T9350] ? __generic_file_write_iter+0x215/0x460
[ 70.227979][ T9350] ? generic_file_write_iter+0x109/0x5e0
[ 70.228540][ T9350] ? vfs_write+0xb0f/0x14e0
[ 70.228997][ T9350] ? ksys_write+0x23e/0x490
[ 70.229458][ T9350] ? __x64_sys_write+0x97/0xf0
[ 70.229939][ T9350] ? x64_sys_call+0x3015/0x3cf0
[ 70.230432][ T9350] ? do_syscall_64+0xd9/0x1d0
[ 70.230941][ T9350] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 70.231926][ T9350] ? kmsan_get_metadata+0x13e/0x1c0
[ 70.232738][ T9350] ? kmsan_internal_set_shadow_origin+0x77/0x110
[ 70.233711][ T9350] ? kmsan_get_metadata+0x13e/0x1c0
[ 70.234516][ T9350] ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[ 70.235398][ T9350] ? __msan_metadata_ptr_for_load_4+0x24/0x40
[ 70.236323][ T9350] ? hfsplus_brec_find+0x218/0x9f0
[ 70.237090][ T9350] ? __pfx_hfs_find_rec_by_key+0x10/0x10
[ 70.237938][ T9350] ? __msan_instrument_asm_store+0xbf/0xf0
[ 70.238827][ T9350] ? __msan_metadata_ptr_for_store_4+0x27/0x40
[ 70.239772][ T9350] ? __hfsplus_ext_write_extent+0x536/0x620
[ 70.240666][ T9350] ? kmsan_get_metadata+0x13e/0x1c0
[ 70.241175][ T9350] __msan_warning+0x96/0x120
[ 70.241645][ T9350] __hfsplus_ext_cache_extent+0x7d0/0x990
[ 70.242223][ T9350] hfsplus_file_extend+0x74f/0x1cf0
[ 70.242748][ T9350] hfsplus_get_block+0xe16/0x17b0
[ 70.243255][ T9350] ? kmsan_internal_set_shadow_origin+0x77/0x110
[ 70.243878][ T9350] ? kmsan_get_metadata+0x13e/0x1c0
[ 70.244400][ T9350] ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[ 70.244967][ T9350] __block_write_begin_int+0x962/0x2ce0
[ 70.245531][ T9350] ? __pfx_hfsplus_get_block+0x10/0x10
[ 70.246079][ T9350] cont_write_begin+0x1000/0x1950
[ 70.246598][ T9350] hfsplus_write_begin+0x85/0x130
[ 70.247105][ T9350] ? __pfx_hfsplus_get_block+0x10/0x10
[ 70.247650][ T9350] ? __pfx_hfsplus_write_begin+0x10/0x10
[ 70.248211][ T9350] generic_perform_write+0x3e8/0x1060
[ 70.248752][ T9350] __generic_file_write_iter+0x215/0x460
[ 70.249314][ T9350] generic_file_write_iter+0x109/0x5e0
[ 70.249856][ T9350] ? kmsan_internal_set_shadow_origin+0x77/0x110
[ 70.250487][ T9350] vfs_write+0xb0f/0x14e0
[ 70.250930][ T9350] ? __pfx_generic_file_write_iter+0x10/0x10
[ 70.251530][ T9350] ksys_write+0x23e/0x490
[ 70.251974][ T9350] __x64_sys_write+0x97/0xf0
[ 70.252450][ T9350] x64_sys_call+0x3015/0x3cf0
[ 70.252924][ T9350] do_syscall_64+0xd9/0x1d0
[ 70.253384][ T9350] ? irqentry_exit+0x16/0x60
[ 70.253844][ T9350] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 70.254430][ T9350] RIP: 0033:0x7f7a92adffc9
[ 70.254873][ T9350] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[ 70.256674][ T9350] RSP: 002b:00007fff0bca3188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 70.257485][ T9350] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7a92adffc9
[ 70.258246][ T9350] RDX: 000000000208e24b RSI: 0000000020000100 RDI: 0000000000000004
[ 70.258998][ T9350] RBP: 00007fff0bca31a0 R08: 00007fff0bca31a0 R09: 00007fff0bca31a0
[ 70.259769][ T9350] R10: 0000000000000000 R11: 0000000000000202 R12: 000055e0d75f8250
[ 70.260520][ T9350] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 70.261286][ T9350] </TASK>
[ 70.262026][ T9350] Kernel Offset: disabled
(gdb) l *__hfsplus_ext_cache_extent+0x7d0
0xffffffff8318aef0 is in __hfsplus_ext_cache_extent (fs/hfsplus/extents.c:168).
163 fd->key->ext.cnid = 0;
164 res = hfs_brec_find(fd, hfs_find_rec_by_key);
165 if (res && res != -ENOENT)
166 return res;
167 if (fd->key->ext.cnid != fd->search_key->ext.cnid ||
168 fd->key->ext.fork_type != fd->search_key->ext.fork_type)
169 return -ENOENT;
170 if (fd->entrylength != sizeof(hfsplus_extent_rec))
171 return -EIO;
172 hfs_bnode_read(fd->bnode, extent, fd->entryoffset,
The __hfsplus_ext_cache_extent() calls __hfsplus_ext_read_extent():
res = __hfsplus_ext_read_extent(fd, hip->cached_extents, inode->i_ino,
block, HFSPLUS_IS_RSRC(inode) ?
HFSPLUS_TYPE_RSRC :
HFSPLUS_TYPE_DATA);
And if inode->i_ino could be equal to zero or any non-available CNID,
then hfs_brec_find() could not find the record in the tree. As a result,
fd->key could be compared with fd->search_key. But hfsplus_find_init()
uses kmalloc() for fd->key and fd->search_key allocation:
int hfs_find_init(struct hfs_btree *tree, struct hfs_find_data *fd)
{
<skipped>
ptr = kmalloc(tree->max_key_len * 2 + 4, GFP_KERNEL);
if (!ptr)
return -ENOMEM;
fd->search_key = ptr;
fd->key = ptr + tree->max_key_len + 2;
<skipped>
}
Finally, fd->key is still not initialized if hfs_brec_find()
has found nothing.
This patch changes kmalloc() on kzalloc() in hfs_find_init()
and intializes fd->record, fd->keyoffset, fd->keylength,
fd->entryoffset, fd->entrylength for the case if hfs_brec_find()
has been found nothing in the b-tree node.
Reported-by: syzbot <syzbot+55ad87f38795d6787521@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=55ad87f38795d6787521
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250818225232.126402-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
|
|
hfsplus_bmap_alloc can trigger a crash if a
record offset or length is larger than node_size
[ 15.264282] BUG: KASAN: slab-out-of-bounds in hfsplus_bmap_alloc+0x887/0x8b0
[ 15.265192] Read of size 8 at addr ffff8881085ca188 by task test/183
[ 15.265949]
[ 15.266163] CPU: 0 UID: 0 PID: 183 Comm: test Not tainted 6.17.0-rc2-gc17b750b3ad9 #14 PREEMPT(voluntary)
[ 15.266165] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 15.266167] Call Trace:
[ 15.266168] <TASK>
[ 15.266169] dump_stack_lvl+0x53/0x70
[ 15.266173] print_report+0xd0/0x660
[ 15.266181] kasan_report+0xce/0x100
[ 15.266185] hfsplus_bmap_alloc+0x887/0x8b0
[ 15.266208] hfs_btree_inc_height.isra.0+0xd5/0x7c0
[ 15.266217] hfsplus_brec_insert+0x870/0xb00
[ 15.266222] __hfsplus_ext_write_extent+0x428/0x570
[ 15.266225] __hfsplus_ext_cache_extent+0x5e/0x910
[ 15.266227] hfsplus_ext_read_extent+0x1b2/0x200
[ 15.266233] hfsplus_file_extend+0x5a7/0x1000
[ 15.266237] hfsplus_get_block+0x12b/0x8c0
[ 15.266238] __block_write_begin_int+0x36b/0x12c0
[ 15.266251] block_write_begin+0x77/0x110
[ 15.266252] cont_write_begin+0x428/0x720
[ 15.266259] hfsplus_write_begin+0x51/0x100
[ 15.266262] cont_write_begin+0x272/0x720
[ 15.266270] hfsplus_write_begin+0x51/0x100
[ 15.266274] generic_perform_write+0x321/0x750
[ 15.266285] generic_file_write_iter+0xc3/0x310
[ 15.266289] __kernel_write_iter+0x2fd/0x800
[ 15.266296] dump_user_range+0x2ea/0x910
[ 15.266301] elf_core_dump+0x2a94/0x2ed0
[ 15.266320] vfs_coredump+0x1d85/0x45e0
[ 15.266349] get_signal+0x12e3/0x1990
[ 15.266357] arch_do_signal_or_restart+0x89/0x580
[ 15.266362] irqentry_exit_to_user_mode+0xab/0x110
[ 15.266364] asm_exc_page_fault+0x26/0x30
[ 15.266366] RIP: 0033:0x41bd35
[ 15.266367] Code: bc d1 f3 0f 7f 27 f3 0f 7f 6f 10 f3 0f 7f 77 20 f3 0f 7f 7f 30 49 83 c0 0f 49 29 d0 48 8d 7c 17 31 e9 9f 0b 00 00 66 0f ef c0 <f3> 0f 6f 0e f3 0f 6f 56 10 66 0f 74 c1 66 0f d7 d0 49 83 f8f
[ 15.266369] RSP: 002b:00007ffc9e62d078 EFLAGS: 00010283
[ 15.266371] RAX: 00007ffc9e62d100 RBX: 0000000000000000 RCX: 0000000000000000
[ 15.266372] RDX: 00000000000000e0 RSI: 0000000000000000 RDI: 00007ffc9e62d100
[ 15.266373] RBP: 0000400000000040 R08: 00000000000000e0 R09: 0000000000000000
[ 15.266374] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 15.266375] R13: 0000000000000000 R14: 0000000000000000 R15: 0000400000000000
[ 15.266376] </TASK>
When calling hfsplus_bmap_alloc to allocate a free node, this function
first retrieves the bitmap from header node and map node using node->page
together with the offset and length from hfs_brec_lenoff
```
len = hfs_brec_lenoff(node, 2, &off16);
off = off16;
off += node->page_offset;
pagep = node->page + (off >> PAGE_SHIFT);
data = kmap_local_page(*pagep);
```
However, if the retrieved offset or length is invalid(i.e. exceeds
node_size), the code may end up accessing pages outside the allocated
range for this node.
This patch adds proper validation of both offset and length before use,
preventing out-of-bounds page access. Move is_bnode_offset_valid and
check_and_correct_requested_length to hfsplus_fs.h, as they may be
required by other functions.
Reported-by: syzbot+356aed408415a56543cd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67bcb4a6.050a0220.bbfd1.008f.GAE@google.com/
Signed-off-by: Yang Chenzhi <yang.chenzhi@vivo.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20250818141734.8559-2-yang.chenzhi@vivo.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
|
|
hfsplus_fill_super()
If Catalog File contains corrupted record for the case of
hidden directory's type, regard it as I/O error instead of
Invalid argument.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20250805165905.3390154-1-frank.li@vivo.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
|