Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- Assorted fixes for the OP-TEE based pseudo-EFI variable store
- Fix for an OOB access when looking up the same non-existing efivarfs
entry multiple times in parallel
* tag 'efi-fixes-for-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efivarfs: Fix slab-out-of-bounds in efivarfs_d_compare
efi: stmm: Drop unneeded null pointer check
efi: stmm: Drop unused EFI error from setup_mm_hdr arguments
efi: stmm: Do not return EFI_OUT_OF_RESOURCES on internal errors
efi: stmm: Fix incorrect buffer allocation method
|
|
Pull smb client fixes from Steve French:
- Fix possible refcount leak in compound operations
- Fix remap_file_range() return code mapping, found by generic/157
* tag 'v6.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
fs/smb: Fix inconsistent refcnt update
smb3 client: fix return code mapping of remap_file_range
|
|
Pull xfs fixes from Carlos Maiolino:
"The highlight I'd like to point here is related to the XFS_RT
Kconfig, which has been updated to be enabled by default now if
CONFIG_BLK_DEV_ZONED is enabled.
This also contains a few fixes for zoned devices support in XFS,
specially related to swapon requests in inodes belonging to the zoned
FS.
A null-ptr dereference fix in the xattr data, due to a mishandling of
medium errors generated by block devices is also included"
* tag 'xfs-fixes-6.17-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: do not propagate ENODATA disk errors into xattr code
xfs: reject swapon for inodes on a zoned file system earlier
xfs: kick off inodegc when failing to reserve zoned blocks
xfs: remove xfs_last_used_zone
xfs: Default XFS_RT to Y if CONFIG_BLK_DEV_ZONED is enabled
|
|
Observed on kernel 6.6 (present on master as well):
BUG: KASAN: slab-out-of-bounds in memcmp+0x98/0xd0
Call trace:
kasan_check_range+0xe8/0x190
__asan_loadN+0x1c/0x28
memcmp+0x98/0xd0
efivarfs_d_compare+0x68/0xd8
__d_lookup_rcu_op_compare+0x178/0x218
__d_lookup_rcu+0x1f8/0x228
d_alloc_parallel+0x150/0x648
lookup_open.isra.0+0x5f0/0x8d0
open_last_lookups+0x264/0x828
path_openat+0x130/0x3f8
do_filp_open+0x114/0x248
do_sys_openat2+0x340/0x3c0
__arm64_sys_openat+0x120/0x1a0
If dentry->d_name.len < EFI_VARIABLE_GUID_LEN , 'guid' can become
negative, leadings to oob. The issue can be triggered by parallel
lookups using invalid filename:
T1 T2
lookup_open
->lookup
simple_lookup
d_add
// invalid dentry is added to hash list
lookup_open
d_alloc_parallel
__d_lookup_rcu
__d_lookup_rcu_op_compare
hlist_bl_for_each_entry_rcu
// invalid dentry can be retrieved
->d_compare
efivarfs_d_compare
// oob
Fix it by checking 'guid' before cmp.
Fixes: da27a24383b2 ("efivarfs: guid part of filenames are case-insensitive")
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
A possible inconsistent update of refcount was identified in `smb2_compound_op`.
Such inconsistent update could lead to possible resource leaks.
Why it is a possible bug:
1. In the comment section of the function, it clearly states that the
reference to `cfile` should be dropped after calling this function.
2. Every control flow path would check and drop the reference to
`cfile`, except the patched one.
3. Existing callers would not handle refcount update of `cfile` if
-ENOMEM is returned.
To fix the bug, an extra goto label "out" is added, to make sure that the
cleanup logic would always be respected. As the problem is caused by the
allocation failure of `vars`, the cleanup logic between label "finished"
and "out" can be safely ignored. According to the definition of function
`is_replayable_error`, the error code of "-ENOMEM" is not recoverable.
Therefore, the replay logic also gets ignored.
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
ENODATA (aka ENOATTR) has a very specific meaning in the xfs xattr code;
namely, that the requested attribute name could not be found.
However, a medium error from disk may also return ENODATA. At best,
this medium error may escape to userspace as "attribute not found"
when in fact it's an IO (disk) error.
At worst, we may oops in xfs_attr_leaf_get() when we do:
error = xfs_attr_leaf_hasname(args, &bp);
if (error == -ENOATTR) {
xfs_trans_brelse(args->trans, bp);
return error;
}
because an ENODATA/ENOATTR error from disk leaves us with a null bp,
and the xfs_trans_brelse will then null-deref it.
As discussed on the list, we really need to modify the lower level
IO functions to trap all disk errors and ensure that we don't let
unique errors like this leak up into higher xfs functions - many
like this should be remapped to EIO.
However, this patch directly addresses a reported bug in the xattr
code, and should be safe to backport to stable kernels. A larger-scope
patch to handle more unique errors at lower levels can follow later.
(Note, prior to 07120f1abdff we did not oops, but we did return the
wrong error code to userspace.)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines")
Cc: stable@vger.kernel.org # v5.9+
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
We were returning -EOPNOTSUPP for various remap_file_range cases
but for some of these the copy_file_range_syscall() requires -EINVAL
to be returned (e.g. where source and target file ranges overlap when
source and target are the same file). This fixes xfstest generic/157
which was expecting EINVAL for that (and also e.g. for when the src
offset is beyond end of file).
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Fix swapped handling of lru_gen and lru_gen_full debugfs files in
vmscan
- Fix debugfs mount options (uid, gid, mode) being silently ignored
- Fix leak of devres action in the unwind path of Devres::new()
- Documentation:
- Expand and fix documentation of (outdated) Device, DeviceContext
and generic driver infrastructure
- Fix C header link of faux device abstractions
- Clarify expected interaction with the security team
- Smooth text flow in the security bug reporting process
documentation
* tag 'driver-core-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
Documentation: smooth the text flow in the security bug reporting process
Documentation: clarify the expected collaboration with security bugs reporters
debugfs: fix mount options not being applied
rust: devres: fix leaking call to devm_add_action()
rust: faux: fix C header link
driver: rust: expand documentation for driver infrastructure
device: rust: expand documentation for Device
device: rust: expand documentation for DeviceContext
mm/vmscan: fix inverted polarity in lru_gen_seq_show()
|
|
Pull smb client fix from Steve French:
"Fix for netfs smb3 oops"
* tag '6.17-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix oops due to uninitialised variable
|
|
Pull NFS client fix from Trond Myklebust:
- NFS: Fix a data corrupting race when updating an existing write
* tag 'nfs-for-6.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a race when updating an existing write
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"20 hotfixes. 10 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels. 17 of these
fixes are for MM.
As usual, singletons all over the place, apart from a three-patch
series of KHO followup work from Pasha which is actually also a bunch
of singletons"
* tag 'mm-hotfixes-stable-2025-08-21-18-17' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/mremap: fix WARN with uffd that has remap events disabled
mm/damon/sysfs-schemes: put damos dests dir after removing its files
mm/migrate: fix NULL movable_ops if CONFIG_ZSMALLOC=m
mm/damon/core: fix damos_commit_filter not changing allow
mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
MAINTAINERS: mark MGLRU as maintained
mm: rust: add page.rs to MEMORY MANAGEMENT - RUST
iov_iter: iterate_folioq: fix handling of offset >= folio size
selftests/damon: fix selftests by installing drgn related script
.mailmap: add entry for Easwar Hariharan
selftests/mm: add test for invalid multi VMA operations
mm/mremap: catch invalid multi VMA moves earlier
mm/mremap: allow multi-VMA move when filesystem uses thp_get_unmapped_area
mm/damon/core: fix commit_ops_filters by using correct nth function
tools/testing: add linux/args.h header and fix radix, VMA tests
mm/debug_vm_pgtable: clear page table entries at destroy_args()
squashfs: fix memory leak in squashfs_fill_super
kho: warn if KHO is disabled due to an error
kho: mm: don't allow deferred struct page with KHO
kho: init new_physxa->phys_bits to fix lockdep
|
|
Pull smb server fixes from Steve French:
- fix refcount issue that can cause memory leak
- rate limit repeated connections from IPv6, not just IPv4 addresses
- fix potential null pointer access of smb direct work queue
* tag '6.17-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix refcount leak causing resource not released
ksmbd: extend the connection limiting mechanism to support IPv6
smb: server: split ksmbd_rdma_stop_listening() out of ksmbd_rdma_destroy()
|
|
If sb_min_blocksize returns 0, squashfs_fill_super exits without freeing
allocated memory (sb->s_fs_info).
Fix this by moving the call to sb_min_blocksize to before memory is
allocated.
Link: https://lkml.kernel.org/r/20250811223740.110392-1-phillip@squashfs.org.uk
Fixes: 734aa85390ea ("Squashfs: check return result of sb_min_blocksize")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: Scott GUO <scottzhguo@tencent.com>
Closes: https://lore.kernel.org/all/20250811061921.3807353-1-scott_gzh@163.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After nfs_lock_and_join_requests() tests for whether the request is
still attached to the mapping, nothing prevents a call to
nfs_inode_remove_request() from succeeding until we actually lock the
page group.
The reason is that whoever called nfs_inode_remove_request() doesn't
necessarily have a lock on the page group head.
So in order to avoid races, let's take the page group lock earlier in
nfs_lock_and_join_requests(), and hold it across the removal of the
request in nfs_inode_remove_request().
Reported-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Joe Quanaim <jdq@meta.com>
Tested-by: Andrew Steffen <aksteffen@meta.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: bd37d6fce184 ("NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request()")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Pull mount fixes from Al Viro:
"Fixes for several recent mount-related regressions"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
change_mnt_propagation(): calculate propagation source only if we'll need it
use uniform permission checks for all mount propagation changes
propagate_umount(): only surviving overmounts should be reparented
fix the softlockups in attach_recursive_mnt()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fixes from Amir Goldstein:
"Fixes for two fallouts from Neil's directory locking changes"
* tag 'ovl-fixes-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fix possible double unlink
ovl: use I_MUTEX_PARENT when locking parent in ovl_create_temp()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix two memory leaks in pidfs
- Prevent changing the idmapping of an already idmapped mount without
OPEN_TREE_CLONE through open_tree_attr()
- Don't fail listing extended attributes in kernfs when no extended
attributes are set
- Fix the return value in coredump_parse()
- Fix the error handling for unbuffered writes in netfs
- Fix broken data integrity guarantees for O_SYNC writes via iomap
- Fix UAF in __mark_inode_dirty()
- Keep inode->i_blkbits constant in fuse
- Fix coredump selftests
- Fix get_unused_fd_flags() usage in do_handle_open()
- Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES
- Fix use-after-free in bh_read()
- Fix incorrect lflags value in the move_mount() syscall
* tag 'vfs-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
signal: Fix memory leak for PIDFD_SELF* sentinels
kernfs: don't fail listing extended attributes
coredump: Fix return value in coredump_parse()
fs/buffer: fix use-after-free when call bh_read() helper
pidfs: Fix memory leak in pidfd_info()
netfs: Fix unbuffered write error handling
fhandle: do_handle_open() should get FD with user flags
module: Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES
fs: fix incorrect lflags value in the move_mount syscall
selftests/coredump: Remove the read() that fails the test
fuse: keep inode->i_blkbits constant
iomap: Fix broken data integrity guarantees for O_SYNC writes
selftests/mount_setattr: add smoke tests for open_tree_attr(2) bug
open_tree_attr: do not allow id-mapping changes without OPEN_TREE_CLONE
fs: writeback: fix use-after-free in __mark_inode_dirty()
|
|
Fix smb3_init_transform_rq() to initialise buffer to NULL before calling
netfs_alloc_folioq_buffer() as netfs assumes it can append to the buffer it
is given. Setting it to NULL means it should start a fresh buffer, but the
value is currently undefined.
Fixes: a2906d3316fc ("cifs: Switch crypto buffer to use a folio_queue rather than an xarray")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We only need it when mount in question was sending events downstream (then
recepients need to switch to new master) or the mount is being turned into
slave (then we need a new master for it).
That wouldn't be a big deal, except that it causes quite a bit of work
when umount_tree() is taking a large peer group out. Adding a trivial
"don't bother calling propagation_source() unless we are going to use
its results" logics improves the things quite a bit.
We are still doing unnecessary work on bulk removals from propagation graph,
but the full solution for that will have to wait for the next merge window.
Fixes: 955336e204ab "do_make_slave(): choose new master sanely"
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
do_change_type() and do_set_group() are operating on different
aspects of the same thing - propagation graph. The latter
asks for mounts involved to be mounted in namespace(s) the caller
has CAP_SYS_ADMIN for. The former is a mess - originally it
didn't even check that mount *is* mounted. That got fixed,
but the resulting check turns out to be too strict for userland -
in effect, we check that mount is in our namespace, having already
checked that we have CAP_SYS_ADMIN there.
What we really need (in both cases) is
* only touch mounts that are mounted. That's a must-have
constraint - data corruption happens if it get violated.
* don't allow to mess with a namespace unless you already
have enough permissions to do so (i.e. CAP_SYS_ADMIN in its userns).
That's an equivalent of what do_set_group() does; let's extract that
into a helper (may_change_propagation()) and use it in both
do_set_group() and do_change_type().
Fixes: 12f147ddd6de "do_change_type(): refuse to operate on unmounted/not ours mounts"
Acked-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... as the comments in reparent() clearly say. As it is, we reparent
*all* overmounts of the mounts being taken out, including those that
are taken out themselves. It's not only a potentially massive slowdown
(on a pathological setup we might end up with O(N^2) time for N mounts
being kicked out), it can end up with incorrect ->overmount in the
surviving mounts.
Fixes: f0d0ba19985d "Rewrite of propagate_umount()"
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
In case when we mounting something on top of a large stack of overmounts,
all of them being peers of each other, we get quadratic time by the
depth of overmount stack. Easily fixed by doing commit_tree() before
reparenting the overmount; simplifies commit_tree() as well - it doesn't
need to skip the already mounted stuff that had been reparented on top
of the new mounts.
Since we are holding mount_lock through both reparenting and call of
commit_tree(), the order does not matter from the mount hash point
of view.
Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com>
Tested-by: "Lai, Yi" <yi1.lai@linux.intel.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 663206854f02 "copy_tree(): don't link the mounts via mnt_list"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
No point in going down into the iomap mapping loop when we know it
will be rejected.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
XFS processes truncating unlinked inodes asynchronously and thus the free
space pool only sees them with a delay. The non-zoned write path thus
calls into inodegc to accelerate this processing before failing an
allocation due the lack of free blocks. Do the same for the zoned space
reservation.
Fixes: 0bb2193056b5 ("xfs: add support for zoned space reservations")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
This was my first attempt at caching the last used zone. But it turns out
for O_DIRECT or RWF_DONTCACHE that operate concurrently or in very short
sequence, the bmap btree does not record a written extent yet, so it fails.
Because it then still finds the last written zone it can lead to a weird
ping-pong around a few zones with writers seeing different values.
Remove it entirely as the later added xfs_cached_zone actually does a
much better job enforcing the locality as the zone is associated with the
inode in the MRU cache as soon as the zone is selected.
Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
XFS support for zoned block devices requires the realtime subvolume
support (XFS_RT) to be enabled. Change the default configuration value
of XFS_RT from N to CONFIG_BLK_DEV_ZONED to align with this requirement.
This change still allows the user to disable XFS_RT if this feature is
not desired for the user use case.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Userspace doesn't expect a failure to list extended attributes:
$ ls -lA /sys/
ls: /sys/: No data available
ls: /sys/kernel: No data available
ls: /sys/power: No data available
ls: /sys/class: No data available
ls: /sys/devices: No data available
ls: /sys/dev: No data available
ls: /sys/hypervisor: No data available
ls: /sys/fs: No data available
ls: /sys/bus: No data available
ls: /sys/firmware: No data available
ls: /sys/block: No data available
ls: /sys/module: No data available
total 0
drwxr-xr-x 2 root root 0 Jan 1 1970 block
drwxr-xr-x 52 root root 0 Jan 1 1970 bus
drwxr-xr-x 88 root root 0 Jan 1 1970 class
drwxr-xr-x 4 root root 0 Jan 1 1970 dev
drwxr-xr-x 11 root root 0 Jan 1 1970 devices
drwxr-xr-x 3 root root 0 Jan 1 1970 firmware
drwxr-xr-x 10 root root 0 Jan 1 1970 fs
drwxr-xr-x 2 root root 0 Jul 2 09:43 hypervisor
drwxr-xr-x 14 root root 0 Jan 1 1970 kernel
drwxr-xr-x 251 root root 0 Jan 1 1970 module
drwxr-xr-x 3 root root 0 Jul 2 09:43 power
Fix it by simply reporting success when no extended attributes are
available instead of reporting ENODATA.
Link: https://lore.kernel.org/78b13bcdae82ade95e88f315682966051f461dde.camel@linaro.org
Fixes: d1f4e9026007 ("kernfs: remove iattr_mutex") # mainline only
Reported-by: André Draszik <andre.draszik@linaro.org>
Link: https://lore.kernel.org/20250819-ahndung-abgaben-524a535f8101@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The coredump_parse() function is bool type. It should return true on
success and false on failure. The cn_printf() returns zero on success
or negative error codes. This mismatch means that when "return err;"
here, it is treated as success instead of failure. Change it to return
false instead.
Fixes: a5715af549b2 ("coredump: make coredump_parse() return bool")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/aKRGu14w5vPSZLgv@stanley.mountain
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
There's issue as follows:
BUG: KASAN: stack-out-of-bounds in end_buffer_read_sync+0xe3/0x110
Read of size 8 at addr ffffc9000168f7f8 by task swapper/3/0
CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.16.0-862.14.0.6.x86_64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_address_description.constprop.0+0x2c/0x390
print_report+0xb4/0x270
kasan_report+0xb8/0xf0
end_buffer_read_sync+0xe3/0x110
end_bio_bh_io_sync+0x56/0x80
blk_update_request+0x30a/0x720
scsi_end_request+0x51/0x2b0
scsi_io_completion+0xe3/0x480
? scsi_device_unbusy+0x11e/0x160
blk_complete_reqs+0x7b/0x90
handle_softirqs+0xef/0x370
irq_exit_rcu+0xa5/0xd0
sysvec_apic_timer_interrupt+0x6e/0x90
</IRQ>
Above issue happens when do ntfs3 filesystem mount, issue may happens
as follows:
mount IRQ
ntfs_fill_super
read_cache_page
do_read_cache_folio
filemap_read_folio
mpage_read_folio
do_mpage_readpage
ntfs_get_block_vbo
bh_read
submit_bh
wait_on_buffer(bh);
blk_complete_reqs
scsi_io_completion
scsi_end_request
blk_update_request
end_bio_bh_io_sync
end_buffer_read_sync
__end_buffer_read_notouch
unlock_buffer
wait_on_buffer(bh);--> return will return to caller
put_bh
--> trigger stack-out-of-bounds
In the mpage_read_folio() function, the stack variable 'map_bh' is
passed to ntfs_get_block_vbo(). Once unlock_buffer() unlocks and
wait_on_buffer() returns to continue processing, the stack variable
is likely to be reclaimed. Consequently, during the end_buffer_read_sync()
process, calling put_bh() may result in stack overrun.
If the bh is not allocated on the stack, it belongs to a folio. Freeing
a buffer head which belongs to a folio is done by drop_buffers() which
will fail to free buffers which are still locked. So it is safe to call
put_bh() before __end_buffer_read_notouch().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/20250811141830.343774-1-yebin@huaweicloud.com
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Several zoned mode fixes, mount option printing fixups, folio state
handling fixes and one log replay fix.
- zoned mode:
- zone activation and finish fixes
- block group reservation fixes
- mount option fixes:
- bring back printing of mount options with key=value that got
accidentally dropped during mount option parsing in 6.8
- fix inverse logic or typos when printing nodatasum/nodatacow
- folio status fixes:
- writeback fixes in zoned mode
- properly reset dirty/writeback if submission fails
- properly handle TOWRITE xarray mark/tag
- do not set mtime/ctime to current time when unlinking for log
replay"
* tag 'for-6.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix printing of mount info messages for NODATACOW/NODATASUM
btrfs: restore mount option info messages during mount
btrfs: fix incorrect log message for nobarrier mount option
btrfs: fix buffer index in wait_eb_writebacks()
btrfs: subpage: keep TOWRITE tag until folio is cleaned
btrfs: clear TAG_TOWRITE from buffer tree when submitting a tree block
btrfs: do not set mtime/ctime to current time when unlinking for log replay
btrfs: clear block dirty if btrfs_writepage_cow_fixup() failed
btrfs: clear block dirty if submit_one_sector() failed
btrfs: zoned: limit active zones to max_open_zones
btrfs: zoned: fix write time activation failure for metadata block group
btrfs: zoned: fix data relocation block group reservation
btrfs: zoned: skip ZONE FINISH of conventional zones
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
- Fix fast commit checks for file systems with ea_inode enabled
- Don't drop the i_version mount option on a remount
- Fix FIEMAP reporting when there are holes in a bigalloc file system
- Don't fail when mounting read-only when there are inodes in the
orphan file
- Fix hole length overflow for indirect mapped files on file systems
with an 8k or 16k block file system
* tag 'ext4_for_linus-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: prevent softlockup in jbd2_log_do_checkpoint()
ext4: fix incorrect function name in comment
ext4: use kmalloc_array() for array space allocation
ext4: fix hole length calculation overflow in non-extent inodes
ext4: don't try to clear the orphan_present feature block device is r/o
ext4: fix reserved gdt blocks handling in fsmap
ext4: fix fsmap end of range reporting with bigalloc
ext4: remove redundant __GFP_NOWARN
ext4: fix unused variable warning in ext4_init_new_dir
ext4: remove useless if check
ext4: check fast symlink for ea_inode correctly
ext4: preserve SB_I_VERSION on remount
ext4: show the default enabled i_version option
|
|
commit 9d23967b18c6 ("ovl: simplify an error path in
ovl_copy_up_workdir()") introduced the helper ovl_cleanup_unlocked(),
which is later used in several following patches to re-acquire the parent
inode lock and unlink a dentry that was earlier found using lookup.
This helper was eventually renamed to ovl_cleanup().
The helper ovl_parent_lock() is used to re-acquire the parent inode lock.
After acquiring the parent inode lock, the helper verifies that the
dentry has not since been moved to another parent, but it failed to
verify that the dentry wasn't unlinked from the parent.
This means that now every call to ovl_cleanup() could potentially
race with another thread, unlinking the dentry to be cleaned up
underneath overlayfs and trigger a vfs assertion.
Reported-by: syzbot+ec9fab8b7f0386b98a17@syzkaller.appspotmail.com
Tested-by: syzbot+ec9fab8b7f0386b98a17@syzkaller.appspotmail.com
Fixes: 9d23967b18c6 ("ovl: simplify an error path in ovl_copy_up_workdir()")
Suggested-by: NeilBrown <neil@brown.name>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
ovl_create_temp() treats "workdir" as a parent in which it creates an
object so it should use I_MUTEX_PARENT.
Prior to the commit identified below the lock was taken by the caller
which sometimes used I_MUTEX_PARENT and sometimes used I_MUTEX_NORMAL.
The use of I_MUTEX_NORMAL was incorrect but unfortunately copied into
ovl_create_temp().
Note to backporters: This patch only applies after the last Fixes given
below (post v6.16). To fix the bug in v6.7 and later the
inode_lock() call in ovl_copy_up_workdir() needs to nest using
I_MUTEX_PARENT.
Link: https://lore.kernel.org/all/67a72070.050a0220.3d72c.0022.GAE@google.com/
Cc: stable@vger.kernel.org
Reported-by: syzbot+7836a68852a10ec3d790@syzkaller.appspotmail.com
Tested-by: syzbot+7836a68852a10ec3d790@syzkaller.appspotmail.com
Fixes: c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held")
Fixes: d2c995581c7c ("ovl: Call ovl_create_temp() without lock held.")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
When ksmbd_conn_releasing(opinfo->conn) returns true,the refcount was not
decremented properly, causing a refcount leak that prevents the count from
reaching zero and the memory from being released.
Cc: stable@vger.kernel.org
Signed-off-by: Ziyan Xu <ziyan@securitygossip.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Update the connection tracking logic to handle both IPv4 and IPv6
address families.
Cc: stable@vger.kernel.org
Fixes: e6bb91939740 ("ksmbd: limit repeated connections from clients with the same IP")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We can't call destroy_workqueue(smb_direct_wq); before stop_sessions()!
Otherwise already existing connections try to use smb_direct_wq as
a NULL pointer.
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Mount options (uid, gid, mode) are silently ignored when debugfs is
mounted. This is a regression introduced during the conversion to the
new mount API.
When the mount API conversion was done, the parsed options were never
applied to the superblock when it was reused. As a result, the mount
options were ignored when debugfs was mounted.
Fix this by following the same pattern as the tracefs fix in commit
e4d32142d1de ("tracing: Fix tracefs mount options"). Call
debugfs_reconfigure() in debugfs_get_tree() to apply the mount options
to the superblock after it has been created or reused.
As an example, with the bug the "mode" mount option is ignored:
$ mount -o mode=0666 -t debugfs debugfs /tmp/debugfs_test
$ mount | grep debugfs_test
debugfs on /tmp/debugfs_test type debugfs (rw,relatime)
$ ls -ld /tmp/debugfs_test
drwx------ 25 root root 0 Aug 4 14:16 /tmp/debugfs_test
With the fix applied, it works as expected:
$ mount -o mode=0666 -t debugfs debugfs /tmp/debugfs_test
$ mount | grep debugfs_test
debugfs on /tmp/debugfs_test type debugfs (rw,relatime,mode=666)
$ ls -ld /tmp/debugfs_test
drw-rw-rw- 37 root root 0 Aug 2 17:28 /tmp/debugfs_test
Fixes: a20971c18752 ("vfs: Convert debugfs to use the new mount API")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220406
Cc: stable <stable@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
Link: https://lore.kernel.org/r/20250816-debugfs-mount-opts-v3-1-d271dad57b5b@posteo.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull xfs fixes from Carlos Maiolino:
- Fix an assert trigger introduced during the merge window
- Prevent atomic writes to be used with DAX
- Prevent users from using the max_atomic_write mount option without
reflink, as atomic writes > 1block are not supported without reflink
- Fix a null-pointer-deref in a tracepoint
* tag 'xfs-fixes-6.17-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: split xfs_zone_record_blocks
xfs: fix scrub trace with null pointer in quotacheck
xfs: reject max_atomic_write mount option for no reflink
xfs: disallow atomic writes on DAX
fs/dax: Reject IOCB_ATOMIC in dax_iomap_rw()
xfs: remove XFS_IBULK_SAME_AG
xfs: fully decouple XFS_IBULK* flags from XFS_IWALK* flags
xfs: fix frozen file system assert in xfs_trans_alloc
|
|
After running the program 'ioctl_pidfd03' of Linux Test Project (LTP) or
the program 'pidfd_info_test' in 'tools/testing/selftests/pidfd' of the
kernel source, kmemleak reports the following memory leaks:
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xff110020e5988000 (size 8216):
comm "ioctl_pidfd03", pid 10853, jiffies 4294800031
hex dump (first 32 bytes):
02 40 00 00 00 00 00 00 10 00 00 00 00 00 00 00 .@..............
00 00 00 00 af 01 00 00 80 00 00 00 00 00 00 00 ................
backtrace (crc 69483047):
kmem_cache_alloc_node_noprof+0x2fb/0x410
copy_process+0x178/0x1740
kernel_clone+0x99/0x3b0
__do_sys_clone3+0xbe/0x100
do_syscall_64+0x7b/0x2c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
unreferenced object 0xff11002097b70000 (size 8216):
comm "pidfd_info_test", pid 11840, jiffies 4294889165
hex dump (first 32 bytes):
06 40 00 00 00 00 00 00 10 00 00 00 00 00 00 00 .@..............
00 00 00 00 b5 00 00 00 80 00 00 00 00 00 00 00 ................
backtrace (crc a6286bb7):
kmem_cache_alloc_node_noprof+0x2fb/0x410
copy_process+0x178/0x1740
kernel_clone+0x99/0x3b0
__do_sys_clone3+0xbe/0x100
do_syscall_64+0x7b/0x2c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
The leak occurs because pidfd_info() obtains a task_struct via
get_pid_task() but never calls put_task_struct() to drop the reference,
leaving task->usage unbalanced.
Fix the issue by adding '__free(put_task) = NULL' to the local variable
'task', ensuring that put_task_struct() is automatically invoked when
the variable goes out of scope.
Fixes: 7477d7dce48a ("pidfs: allow to retrieve exit information")
Signed-off-by: Adrian Huang (Lenovo) <adrianhuang0701@gmail.com>
Link: https://lore.kernel.org/20250814094453.15232-1-adrianhuang0701@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If all the subrequests in an unbuffered write stream fail, the subrequest
collector doesn't update the stream->transferred value and it retains its
initial LONG_MAX value. Unfortunately, if all active streams fail, then we
take the smallest value of { LONG_MAX, LONG_MAX, ... } as the value to set
in wreq->transferred - which is then returned from ->write_iter().
LONG_MAX was chosen as the initial value so that all the streams can be
quickly assessed by taking the smallest value of all stream->transferred -
but this only works if we've set any of them.
Fix this by adding a flag to indicate whether the value in
stream->transferred is valid and checking that when we integrate the
values. stream->transferred can then be initialised to zero.
This was found by running the generic/750 xfstest against cifs with
cache=none. It splices data to the target file. Once (if) it has used up
all the available scratch space, the writes start failing with ENOSPC.
This causes ->write_iter() to fail. However, it was returning
wreq->transferred, i.e. LONG_MAX, rather than an error (because it thought
the amount transferred was non-zero) and iter_file_splice_write() would
then try to clean up that amount of pipe bufferage - leading to an oops
when it overran. The kernel log showed:
CIFS: VFS: Send error in write = -28
followed by:
BUG: kernel NULL pointer dereference, address: 0000000000000008
with:
RIP: 0010:iter_file_splice_write+0x3a4/0x520
do_splice+0x197/0x4e0
or:
RIP: 0010:pipe_buf_release (include/linux/pipe_fs_i.h:282)
iter_file_splice_write (fs/splice.c:755)
Also put a warning check into splice to announce if ->write_iter() returned
that it had written more than it was asked to.
Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Reported-by: Xiaoli Feng <fengxiaoli0714@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220445
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/915443.1755207950@warthog.procyon.org.uk
cc: Paulo Alcantara <pc@manguebit.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In f07c7cc4684a, do_handle_open() was switched to use the automatic
cleanup method for getting a FD. In that change it was also switched
to pass O_CLOEXEC unconditionally to get_unused_fd_flags() instead
of passing the user-specified flags.
I don't see anything in that commit description that indicates this was
intentional, so I am assuming it was an oversight.
With this fix, the FD will again be opened with, or without, O_CLOEXEC
according to what the user requested.
Fixes: f07c7cc4684a ("fhandle: simplify error handling")
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Link: https://lore.kernel.org/20250814235431.995876-4-tahbertschinger@gmail.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pull smb client fixes from Steve French:
- Fix unlink race and rename races
- SMB3.1.1 compression fix
- Avoid unneeded strlen calls in cifs_get_spnego_key
- Fix slab out of bounds in parse_server_interfaces()
- Fix mid leak and server buffer leak
- smbdirect send error path fix
- update internal version #
- Fix unneeded response time update in negotiate protocol
* tag '6.17-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: remove redundant lstrp update in negotiate protocol
cifs: update internal version number
smb: client: don't wait for info->send_pending == 0 on error
smb: client: fix mid_q_entry memleak leak with per-mid locking
smb3: fix for slab out of bounds on mount to ksmbd
cifs: avoid extra calls to strlen() in cifs_get_spnego_key()
cifs: Fix collect_sample() to handle any iterator type
smb: client: fix race with concurrent opens in rename(2)
smb: client: fix race with concurrent opens in unlink(2)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Align FSDAX enablement among multiple devices
- Fix EROFS_FS_ZIP_ACCEL build dependency again to prevent forcing
CRYPTO{,_DEFLATE}=y even if EROFS=m
- Fix atomic context detection to properly launch kworkers on demand
- Fix block count statistics for 48-bit addressing support
* tag 'erofs-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix block count report when 48-bit layout is on
erofs: fix atomic context detection when !CONFIG_DEBUG_LOCK_ALLOC
erofs: Do not select tristate symbols from bool symbols
erofs: Fallback to normal access if DAX is not supported on extra device
|
|
Both jbd2_log_do_checkpoint() and jbd2_journal_shrink_checkpoint_list()
periodically release j_list_lock after processing a batch of buffers to
avoid long hold times on the j_list_lock. However, since both functions
contend for j_list_lock, the combined time spent waiting and processing
can be significant.
jbd2_journal_shrink_checkpoint_list() explicitly calls cond_resched() when
need_resched() is true to avoid softlockups during prolonged operations.
But jbd2_log_do_checkpoint() only exits its loop when need_resched() is
true, relying on potentially sleeping functions like __flush_batch() or
wait_on_buffer() to trigger rescheduling. If those functions do not sleep,
the kernel may hit a softlockup.
watchdog: BUG: soft lockup - CPU#3 stuck for 156s! [kworker/u129:2:373]
CPU: 3 PID: 373 Comm: kworker/u129:2 Kdump: loaded Not tainted 6.6.0+ #10
Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.27 06/13/2017
Workqueue: writeback wb_workfn (flush-7:2)
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : native_queued_spin_lock_slowpath+0x358/0x418
lr : jbd2_log_do_checkpoint+0x31c/0x438 [jbd2]
Call trace:
native_queued_spin_lock_slowpath+0x358/0x418
jbd2_log_do_checkpoint+0x31c/0x438 [jbd2]
__jbd2_log_wait_for_space+0xfc/0x2f8 [jbd2]
add_transaction_credits+0x3bc/0x418 [jbd2]
start_this_handle+0xf8/0x560 [jbd2]
jbd2__journal_start+0x118/0x228 [jbd2]
__ext4_journal_start_sb+0x110/0x188 [ext4]
ext4_do_writepages+0x3dc/0x740 [ext4]
ext4_writepages+0xa4/0x190 [ext4]
do_writepages+0x94/0x228
__writeback_single_inode+0x48/0x318
writeback_sb_inodes+0x204/0x590
__writeback_inodes_wb+0x54/0xf8
wb_writeback+0x2cc/0x3d8
wb_do_writeback+0x2e0/0x2f8
wb_workfn+0x80/0x2a8
process_one_work+0x178/0x3e8
worker_thread+0x234/0x3b8
kthread+0xf0/0x108
ret_from_fork+0x10/0x20
So explicitly call cond_resched() in jbd2_log_do_checkpoint() to avoid
softlockup.
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://patch.msgid.link/20250812063752.912130-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Since commit 6b730a405037 “ext4: hoist ext4_block_write_begin and
replace the __block_write_begin”, the comment should be updated
accordingly from "__block_write_begin" to "ext4_block_write_begin".
Fixes: 6b730a405037 (“ext4: hoist ext4_block_write_begin and replace...")
Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://patch.msgid.link/20250812021709.1120716-1-liubaolin12138@163.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Commit 34331d7beed7 ("smb: client: fix first command failure during
re-negotiation") addressed a race condition by updating lstrp before
entering negotiate state. However, this approach may have some unintended
side effects.
The lstrp field is documented as "when we got last response from this
server", and updating it before actually receiving a server response
could potentially affect other mechanisms that rely on this timestamp.
For example, the SMB echo detection logic also uses lstrp as a reference
point. In scenarios with frequent user operations during reconnect states,
the repeated calls to cifs_negotiate_protocol() might continuously
update lstrp, which could interfere with the echo detection timing.
Additionally, commit 266b5d02e14f ("smb: client: fix race condition in
negotiate timeout by using more precise timing") introduced a dedicated
neg_start field specifically for tracking negotiate start time. This
provides a more precise solution for the original race condition while
preserving the intended semantics of lstrp.
Since the race condition is now properly handled by the neg_start
mechanism, the lstrp update in cifs_negotiate_protocol() is no longer
necessary and can be safely removed.
Fixes: 266b5d02e14f ("smb: client: fix race condition in negotiate timeout by using more precise timing")
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
to 2.56
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We already called ib_drain_qp() before and that makes sure
send_done() was called with IB_WC_WR_FLUSH_ERR, but
didn't called atomic_dec_and_test(&sc->send_io.pending.count)
So we may never reach the info->send_pending == 0 condition.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 5349ae5e05fa ("smb: client: let send_done() cleanup before calling smbd_disconnect_rdma_connection()")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This is step 4/4 of a patch series to fix mid_q_entry memory leaks
caused by race conditions in callback execution.
In compound_send_recv(), when wait_for_response() is interrupted by
signals, the code attempts to cancel pending requests by changing
their callbacks to cifs_cancelled_callback. However, there's a race
condition between signal interruption and network response processing
that causes both mid_q_entry and server buffer leaks:
```
User foreground process cifsd
cifs_readdir
open_cached_dir
cifs_send_recv
compound_send_recv
smb2_setup_request
smb2_mid_entry_alloc
smb2_get_mid_entry
smb2_mid_entry_alloc
mempool_alloc // alloc mid
kref_init(&temp->refcount); // refcount = 1
mid[0]->callback = cifs_compound_callback;
mid[1]->callback = cifs_compound_last_callback;
smb_send_rqst
rc = wait_for_response
wait_event_state TASK_KILLABLE
cifs_demultiplex_thread
allocate_buffers
server->bigbuf = cifs_buf_get()
standard_receive3
->find_mid()
smb2_find_mid
__smb2_find_mid
kref_get(&mid->refcount) // +1
cifs_handle_standard
handle_mid
/* bigbuf will also leak */
mid->resp_buf = server->bigbuf
server->bigbuf = NULL;
dequeue_mid
/* in for loop */
mids[0]->callback
cifs_compound_callback
/* Signal interrupts wait: rc = -ERESTARTSYS */
/* if (... || midQ[i]->mid_state == MID_RESPONSE_RECEIVED) *?
midQ[0]->callback = cifs_cancelled_callback;
cancelled_mid[i] = true;
/* The change comes too late */
mid->mid_state = MID_RESPONSE_READY
release_mid // -1
/* cancelled_mid[i] == true causes mid won't be released
in compound_send_recv cleanup */
/* cifs_cancelled_callback won't executed to release mid */
```
The root cause is that there's a race between callback assignment and
execution.
Fix this by introducing per-mid locking:
- Add spinlock_t mid_lock to struct mid_q_entry
- Add mid_execute_callback() for atomic callback execution
- Use mid_lock in cancellation paths to ensure atomicity
This ensures that either the original callback or the cancellation
callback executes atomically, preventing reference count leaks when
requests are interrupted by signals.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220404
Fixes: ee258d79159a ("CIFS: Move credit processing to mid callbacks for SMB3")
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
With KASAN enabled, it is possible to get a slab out of bounds
during mount to ksmbd due to missing check in parse_server_interfaces()
(see below):
BUG: KASAN: slab-out-of-bounds in
parse_server_interfaces+0x14ee/0x1880 [cifs]
Read of size 4 at addr ffff8881433dba98 by task mount/9827
CPU: 5 UID: 0 PID: 9827 Comm: mount Tainted: G
OE 6.16.0-rc2-kasan #2 PREEMPT(voluntary)
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Dell Inc. Precision Tower 3620/0MWYPT,
BIOS 2.13.1 06/14/2019
Call Trace:
<TASK>
dump_stack_lvl+0x9f/0xf0
print_report+0xd1/0x670
__virt_addr_valid+0x22c/0x430
? parse_server_interfaces+0x14ee/0x1880 [cifs]
? kasan_complete_mode_report_info+0x2a/0x1f0
? parse_server_interfaces+0x14ee/0x1880 [cifs]
kasan_report+0xd6/0x110
parse_server_interfaces+0x14ee/0x1880 [cifs]
__asan_report_load_n_noabort+0x13/0x20
parse_server_interfaces+0x14ee/0x1880 [cifs]
? __pfx_parse_server_interfaces+0x10/0x10 [cifs]
? trace_hardirqs_on+0x51/0x60
SMB3_request_interfaces+0x1ad/0x3f0 [cifs]
? __pfx_SMB3_request_interfaces+0x10/0x10 [cifs]
? SMB2_tcon+0x23c/0x15d0 [cifs]
smb3_qfs_tcon+0x173/0x2b0 [cifs]
? __pfx_smb3_qfs_tcon+0x10/0x10 [cifs]
? cifs_get_tcon+0x105d/0x2120 [cifs]
? do_raw_spin_unlock+0x5d/0x200
? cifs_get_tcon+0x105d/0x2120 [cifs]
? __pfx_smb3_qfs_tcon+0x10/0x10 [cifs]
cifs_mount_get_tcon+0x369/0xb90 [cifs]
? dfs_cache_find+0xe7/0x150 [cifs]
dfs_mount_share+0x985/0x2970 [cifs]
? check_path.constprop.0+0x28/0x50
? save_trace+0x54/0x370
? __pfx_dfs_mount_share+0x10/0x10 [cifs]
? __lock_acquire+0xb82/0x2ba0
? __kasan_check_write+0x18/0x20
cifs_mount+0xbc/0x9e0 [cifs]
? __pfx_cifs_mount+0x10/0x10 [cifs]
? do_raw_spin_unlock+0x5d/0x200
? cifs_setup_cifs_sb+0x29d/0x810 [cifs]
cifs_smb3_do_mount+0x263/0x1990 [cifs]
Reported-by: Namjae Jeon <linkinjeon@kernel.org>
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|