| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix potential NULL pointer dereference when replaying tree log after
an error
- release path before initializing extent tree to avoid potential
deadlock when allocating new inode
- on filesystems with block size > page size
- fix potential read out of bounds during encoded read of an inline
extent
- only enforce free space tree if v1 cache is required
- print correct tree id in error message
* tag 'for-6.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: show correct warning if can't read data reloc tree
btrfs: fix NULL pointer dereference in do_abort_log_replay()
btrfs: force free space tree for bs > ps cases
btrfs: only enforce free space tree if v1 cache is required for bs < ps cases
btrfs: release path before initializing extent tree in btrfs_read_locked_inode()
btrfs: avoid access-beyond-folio for bs > ps encoded writes
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Remove incorrect __user annotation from struct xattr_args::value
- Documentation fix: Add missing kernel-doc description for the @isnew
parameter in ilookup5_nowait() to silence Sphinx warnings
- Documentation fix: Fix kernel-doc comment for __start_dirop() - the
function name in the comment was wrong and the @state parameter was
undocumented
- Replace dynamic folio_batch allocation with stack allocation in
iomap_zero_range(). The dynamic allocation was problematic for
ext4-on-iomap work (didn't handle allocation failure properly) and
triggered lockdep complaints. Uses a flag instead to control batch
usage
- Re-add #ifdef guards around PIDFD_GET_<ns-type>_NAMESPACE ioctls.
When a namespace type is disabled, ns->ops is NULL, causes crashes
during inode eviction when closing the fd. The ifdefs were removed in
a recent simplification but are still needed
- Fixe a race where a folio could be unlocked before the trailing zeros
(for EOF within the page) were written
- Split out a dedicated lease_dispose_list() helper since lease code
paths always know they're disposing of leases. Removes unnecessary
runtime flag checks and prepares for upcoming lease_manager
enhancements
- Fix userland delegation requests succeeding despite conflicting
opens. Previously, FL_LAYOUT and FL_DELEG leases bypassed conflict
checks (a hack for nfsd). Adds new ->lm_open_conflict() lease_manager
operation so userland delegations get proper conflict checking while
nfsd can continue its own conflict handling
- Fix LOOKUP_CACHED path lookups incorrectly falling through to the
slow path. After legitimize_links() calls were conditionally elided,
the routine would always fail with LOOKUP_CACHED regardless of
whether there were any links. Now the flag is checked at the two
callsites before calling legitimize_links()
- Fix bug in media fd allocation in media_request_alloc()
- Fix mismatched API calls in ecryptfs_mknod(): was calling
end_removing() instead of end_creating() after
ecryptfs_start_creating_dentry()
- Fix dentry reference count leak in ecryptfs_mkdir(): a dget() of the
lower parent dir was added but never dput()'d, causing BUG during
lower filesystem unmount due to the still-in-use dentry
* tag 'vfs-6.19-rc5.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
pidfs: protect PIDFD_GET_* ioctls() via ifdef
ecryptfs: Release lower parent dentry after creating dir
ecryptfs: Fix improper mknod pairing of start_creating()/end_removing()
get rid of bogus __user in struct xattr_args::value
VFS: fix __start_dirop() kernel-doc warnings
fs: Describe @isnew parameter in ilookup5_nowait()
fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED
netfs: Fix early read unlock of page with EOF in middle
filelock: allow lease_managers to dictate what qualifies as a conflict
filelock: add lease_dispose_list() helper
iomap: replace folio_batch allocation with stack allocation
media: mc: fix potential use-after-free in media_request_alloc()
|
|
We originally protected PIDFD_GET_<ns-type>_NAMESPACE ioctls() through
ifdefs and recent rework made it possible to drop them. There was an
oversight though. When the relevant namespace is turned off ns->ops will
be NULL so even though opening a file descriptor is perfectly legitimate
it would fail during inode eviction when the file was closed.
The simple fix would be to check ns->ops for NULL and continue allow to
retrieve namespace fds from pidfds but we don't allow retrieving them
when the relevant namespace type is turned off. So keep the
simplification but add the ifdefs back in.
Link: https://lore.kernel.org/20251222214907.GA189632@quark
Link: https://patch.msgid.link/20251224-ununterbrochen-gagen-ea949b83f8f2@brauner
Fixes: a71e4f103aed ("pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls")
Tested-by: Brendan Jackman <jackmanb@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"A set of NFSD fixes for stable that arrived after the merge window:
- Remove an invalid NFS status code
- Fix an fstests failure when using pNFS
- Fix a UAF in v4_end_grace()
- Fix the administrative interface used to revoke NFSv4 state
- Fix a memory leak reported by syzbot"
* tag 'nfsd-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: net ref data still needs to be freed even if net hasn't startup
nfsd: check that server is running in unlock_filesystem
nfsd: use correct loop termination in nfsd4_revoke_states()
nfsd: provide locking for v4_end_grace
NFSD: Fix permission check for read access to executable-only files
NFSD: Remove NFSERR_EAGAIN
|
|
If a filesystem is missing its data reloc tree, we get something like
this in dmesg:
BTRFS warning (device loop11): failed to read root (objectid=4): -2
BTRFS error (device loop11): open_ctree failed: -2
objectid is BTRFS_DEV_TREE_OBJECTID, but this should actually be the
value of BTRFS_DATA_RELOC_TREE_OBJECTID.
btrfs_read_roots() prints location.objectid on failure, but this isn't
set when reading the data reloc tree. Set location.objectid to the
correct value on failure, so that the error message makes sense.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Coverity reported a NULL pointer dereference issue (CID 1666756) in
do_abort_log_replay(). When btrfs_alloc_path() fails in
replay_one_buffer(), wc->subvol_path is NULL, but btrfs_abort_log_replay()
calls do_abort_log_replay() which unconditionally dereferences
wc->subvol_path when attempting to print debug information. Fix this by
adding a NULL check before dereferencing wc->subvol_path in
do_abort_log_replay().
Fixes: 2753e4917624 ("btrfs: dump detailed info and specific messages on log replay failures")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
Currently we only enforcing the free space tree for bs < ps cases, but
with the recently added bs > ps support, we lack the free space tree
enforcing, causing explicit v1 cache mount option to fail on bs > ps
cases:
# mount -o space_cache=v1 /dev/test/scratch1 /mnt/btrfs/
mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/mapper/test-scratch1, missing codepage or helper program, or other error.
dmesg(1) may have more information after failed mount system call.
# dmesg -t | tail -n7
BTRFS: device fsid ac14a6fa-4ec9-449e-aec9-7d1777bfdc06 devid 1 transid 11 /dev/mapper/test-scratch1 (253:3) scanned by mount (2849)
BTRFS info (device dm-3): first mount of filesystem ac14a6fa-4ec9-449e-aec9-7d1777bfdc06
BTRFS info (device dm-3): using crc32c checksum algorithm
BTRFS warning (device dm-3): support for block size 8192 with page size 4096 is experimental, some features may be missing
BTRFS warning (device dm-3): space cache v1 is being deprecated and will be removed in a future release, please use -o space_cache=v2
BTRFS warning (device dm-3): v1 space cache is not supported for page size 4096 with sectorsize 8192
BTRFS error (device dm-3): open_ctree failed: -22
[FIX]
Just enable the same free space tree for bs > ps cases, aligning the
behavior to bs < ps cases.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
Since the introduction of btrfs bs < ps support, v1 cache was never on
the plan due to its hard coded PAGE_SIZE usage, and the future plan to
properly deprecate it.
However for bs < ps cases, even if 'nospace_cache,clear_cache' mount
option is specified, it's never respected and free space tree is always
enabled:
mkfs.btrfs -f -O ^bgt,fst $dev
mount $dev $mnt -o clear_cache,nospace_cache
umount $mnt
btrfs ins dump-super $dev
...
compat_ro_flags 0x3
( FREE_SPACE_TREE |
FREE_SPACE_TREE_VALID )
...
This means a different behavior compared to bs >= ps cases.
[CAUSE]
The forcing usage of v2 space cache is done inside
btrfs_set_free_space_cache_settings(), however it never checks if we're
even using space cache but always enabling v2 cache.
[FIX]
Instead unconditionally enable v2 cache, only forcing v2 cache if the
old v1 cache is required.
Now v2 space cache can be properly disabled on bs < ps cases:
mkfs.btrfs -f -O ^bgt,fst $dev
mount $dev $mnt -o clear_cache,nospace_cache
umount $mnt
btrfs ins dump-super $dev
...
compat_ro_flags 0x0
...
Fixes: 9f73f1aef98b ("btrfs: force v2 space cache usage for subpage mount")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In btrfs_read_locked_inode() we are calling btrfs_init_file_extent_tree()
while holding a path with a read locked leaf from a subvolume tree, and
btrfs_init_file_extent_tree() may do a GFP_KERNEL allocation, which can
trigger reclaim.
This can create a circular lock dependency which lockdep warns about with
the following splat:
[6.1433] ======================================================
[6.1574] WARNING: possible circular locking dependency detected
[6.1583] 6.18.0+ #4 Tainted: G U
[6.1591] ------------------------------------------------------
[6.1599] kswapd0/117 is trying to acquire lock:
[6.1606] ffff8d9b6333c5b8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1625]
but task is already holding lock:
[6.1633] ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60
[6.1646]
which lock already depends on the new lock.
[6.1657]
the existing dependency chain (in reverse order) is:
[6.1667]
-> #2 (fs_reclaim){+.+.}-{0:0}:
[6.1677] fs_reclaim_acquire+0x9d/0xd0
[6.1685] __kmalloc_cache_noprof+0x59/0x750
[6.1694] btrfs_init_file_extent_tree+0x90/0x100
[6.1702] btrfs_read_locked_inode+0xc3/0x6b0
[6.1710] btrfs_iget+0xbb/0xf0
[6.1716] btrfs_lookup_dentry+0x3c5/0x8e0
[6.1724] btrfs_lookup+0x12/0x30
[6.1731] lookup_open.isra.0+0x1aa/0x6a0
[6.1739] path_openat+0x5f7/0xc60
[6.1746] do_filp_open+0xd6/0x180
[6.1753] do_sys_openat2+0x8b/0xe0
[6.1760] __x64_sys_openat+0x54/0xa0
[6.1768] do_syscall_64+0x97/0x3e0
[6.1776] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[6.1784]
-> #1 (btrfs-tree-00){++++}-{3:3}:
[6.1794] lock_release+0x127/0x2a0
[6.1801] up_read+0x1b/0x30
[6.1808] btrfs_search_slot+0x8e0/0xff0
[6.1817] btrfs_lookup_inode+0x52/0xd0
[6.1825] __btrfs_update_delayed_inode+0x73/0x520
[6.1833] btrfs_commit_inode_delayed_inode+0x11a/0x120
[6.1842] btrfs_log_inode+0x608/0x1aa0
[6.1849] btrfs_log_inode_parent+0x249/0xf80
[6.1857] btrfs_log_dentry_safe+0x3e/0x60
[6.1865] btrfs_sync_file+0x431/0x690
[6.1872] do_fsync+0x39/0x80
[6.1879] __x64_sys_fsync+0x13/0x20
[6.1887] do_syscall_64+0x97/0x3e0
[6.1894] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[6.1903]
-> #0 (&delayed_node->mutex){+.+.}-{3:3}:
[6.1913] __lock_acquire+0x15e9/0x2820
[6.1920] lock_acquire+0xc9/0x2d0
[6.1927] __mutex_lock+0xcc/0x10a0
[6.1934] __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1944] btrfs_evict_inode+0x20b/0x4b0
[6.1952] evict+0x15a/0x2f0
[6.1958] prune_icache_sb+0x91/0xd0
[6.1966] super_cache_scan+0x150/0x1d0
[6.1974] do_shrink_slab+0x155/0x6f0
[6.1981] shrink_slab+0x48e/0x890
[6.1988] shrink_one+0x11a/0x1f0
[6.1995] shrink_node+0xbfd/0x1320
[6.1002] balance_pgdat+0x67f/0xc60
[6.1321] kswapd+0x1dc/0x3e0
[6.1643] kthread+0xff/0x240
[6.1965] ret_from_fork+0x223/0x280
[6.1287] ret_from_fork_asm+0x1a/0x30
[6.1616]
other info that might help us debug this:
[6.1561] Chain exists of:
&delayed_node->mutex --> btrfs-tree-00 --> fs_reclaim
[6.1503] Possible unsafe locking scenario:
[6.1110] CPU0 CPU1
[6.1411] ---- ----
[6.1707] lock(fs_reclaim);
[6.1998] lock(btrfs-tree-00);
[6.1291] lock(fs_reclaim);
[6.1581] lock(&delayed_node->mutex);
[6.1874]
*** DEADLOCK ***
[6.1716] 2 locks held by kswapd0/117:
[6.1999] #0: ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60
[6.1294] #1: ffff8d998344b0e0 (&type->s_umount_key#40){++++}- {3:3}, at: super_cache_scan+0x37/0x1d0
[6.1596]
stack backtrace:
[6.1183] CPU: 11 UID: 0 PID: 117 Comm: kswapd0 Tainted: G U 6.18.0+ #4 PREEMPT(lazy)
[6.1185] Tainted: [U]=USER
[6.1186] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[6.1187] Call Trace:
[6.1187] <TASK>
[6.1189] dump_stack_lvl+0x6e/0xa0
[6.1192] print_circular_bug.cold+0x17a/0x1c0
[6.1194] check_noncircular+0x175/0x190
[6.1197] __lock_acquire+0x15e9/0x2820
[6.1200] lock_acquire+0xc9/0x2d0
[6.1201] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1204] __mutex_lock+0xcc/0x10a0
[6.1206] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1208] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1211] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1213] __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1215] btrfs_evict_inode+0x20b/0x4b0
[6.1217] ? lock_acquire+0xc9/0x2d0
[6.1220] evict+0x15a/0x2f0
[6.1222] prune_icache_sb+0x91/0xd0
[6.1224] super_cache_scan+0x150/0x1d0
[6.1226] do_shrink_slab+0x155/0x6f0
[6.1228] shrink_slab+0x48e/0x890
[6.1229] ? shrink_slab+0x2d2/0x890
[6.1231] shrink_one+0x11a/0x1f0
[6.1234] shrink_node+0xbfd/0x1320
[6.1236] ? shrink_node+0xa2d/0x1320
[6.1236] ? shrink_node+0xbd3/0x1320
[6.1239] ? balance_pgdat+0x67f/0xc60
[6.1239] balance_pgdat+0x67f/0xc60
[6.1241] ? finish_task_switch.isra.0+0xc4/0x2a0
[6.1246] kswapd+0x1dc/0x3e0
[6.1247] ? __pfx_autoremove_wake_function+0x10/0x10
[6.1249] ? __pfx_kswapd+0x10/0x10
[6.1250] kthread+0xff/0x240
[6.1251] ? __pfx_kthread+0x10/0x10
[6.1253] ret_from_fork+0x223/0x280
[6.1255] ? __pfx_kthread+0x10/0x10
[6.1257] ret_from_fork_asm+0x1a/0x30
[6.1260] </TASK>
This is because:
1) The fsync task is holding an inode's delayed node mutex (for a
directory) while calling __btrfs_update_delayed_inode() and that needs
to do a search on the subvolume's btree (therefore read lock some
extent buffers);
2) The lookup task, at btrfs_lookup(), triggered reclaim with the
GFP_KERNEL allocation done by btrfs_init_file_extent_tree() while
holding a read lock on a subvolume leaf;
3) The reclaim triggered kswapd which is doing inode eviction for the
directory inode the fsync task is using as an argument to
btrfs_commit_inode_delayed_inode() - but in that call chain we are
trying to read lock the same leaf that the lookup task is holding
while calling btrfs_init_file_extent_tree() and doing the GFP_KERNEL
allocation.
Fix this by calling btrfs_init_file_extent_tree() after we don't need the
path anymore and release it in btrfs_read_locked_inode().
Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/linux-btrfs/6e55113a22347c3925458a5d840a18401a38b276.camel@linux.intel.com/
Fixes: 8679d2687c35 ("btrfs: initialize inode::file_extent_tree after i_mode has been set")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[POTENTIAL BUG]
If the system page size is 4K and fs block size is 8K, and max_inline
mount option is set to 6K, we can inline a 6K sized data extent.
Then a encoded write submitted a compressed extent which is at file
offset 0, and the compressed length is 6K, which is allowed to be inlined.
Now a read beyond page boundary is triggered inside write_extent_buffer()
from insert_inline_extent().
[CAUSE]
Currently the function __cow_file_range_inline() can only accept a
single folio.
For regular compressed write path, we always allocate the compressed
folios using the minimal order matching the block size, thus the
@compressed_folio should always cover a full fs block thus it is fine.
But for encoded writes, they allocate page size folios, this means we
can hit a case where the compressed data is smaller than block size but
still larger than page size, in that case __cow_file_range_inline() will
be called with @compressed_size larger than a page.
In that case we will trigger a read beyond the folio inside
insert_inline_extent().
Thankfully this is not that common, as the default max_inline is only
2048 bytes, smaller than PAGE_SIZE, and bs > ps support is still
experimental.
[FIX]
We need to either allow insert_inline_extent() to accept a page array to
properly support such case, or reject such inline extent.
The latter is a much simpler solution, and considering bs > ps will stay
as a corner case and non-default max_inline will be even rarer, I don't
think we really need to fulfill such niche.
So just reject any inline extent that's larger than PAGE_SIZE, and add
an extra ASSERT() to insert_inline_extent() to catch such beyond-boundary
access.
Fixes: ec20799064c8 ("btrfs: enable encoded read/write/send for bs > ps cases")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix potential deadlock due to mismatching transaction states when
waiting for the current transaction
- fix squota accounting with nested snapshots
- fix quota inheritance of qgroups with multiple parent qgroups
- fix NULL inode pointer in evict tracepoint
- fix writes beyond end of file on systems with 64K page size and 4K
block size
- fix logging of inodes after exchange rename
- fix use after free when using ref_tracker feature
- space reservation fixes
* tag 'for-6.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix reservation leak in some error paths when inserting inline extent
btrfs: do not free data reservation in fallback from inline due to -ENOSPC
btrfs: fix use-after-free warning in btrfs_get_or_create_delayed_node()
btrfs: always detect conflicting inodes when logging inode refs
btrfs: fix beyond-EOF write handling
btrfs: fix deadlock in wait_current_trans() due to ignored transaction type
btrfs: fix NULL dereference on root when tracing inode eviction
btrfs: qgroup: update all parent qgroups when doing quick inherit
btrfs: fix qgroup_snapshot_quick_inherit() squota bug
|
|
When the NFSD instance doesn't to startup, the net ref data memory is
not properly reclaimed, which triggers the memory leak issue reported
by syzbot [1].
To avoid the problem reported in [1], the net ref data memory reclamation
action is moved outside of nfsd_net_up when the net is shutdown.
[1]
unreferenced object 0xffff88812a39dfc0 (size 64):
backtrace (crc a2262fc6):
percpu_ref_init+0x94/0x1e0 lib/percpu-refcount.c:76
nfsd_create_serv+0xbe/0x260 fs/nfsd/nfssvc.c:605
nfsd_nl_listener_set_doit+0x62/0xb00 fs/nfsd/nfsctl.c:1882
genl_family_rcv_msg_doit+0x11e/0x190 net/netlink/genetlink.c:1115
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x2fd/0x440 net/netlink/genetlink.c:1210
BUG: memory leak
Reported-by: syzbot+6ee3b889bdeada0a6226@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6ee3b889bdeada0a6226
Fixes: 39972494e318 ("nfsd: update percpu_ref to manage references on nfsd_net")
Cc: stable@vger.kernel.org
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If we are trying to unlock the filesystem via an administrative
interface and nfsd isn't running, it crashes the server. This
happens currently because nfsd4_revoke_states() access state
structures (eg., conf_id_hashtbl) that has been freed as a part
of the server shutdown.
[ 59.465072] Call trace:
[ 59.465308] nfsd4_revoke_states+0x1b4/0x898 [nfsd] (P)
[ 59.465830] write_unlock_fs+0x258/0x440 [nfsd]
[ 59.466278] nfsctl_transaction_write+0xb0/0x120 [nfsd]
[ 59.466780] vfs_write+0x1f0/0x938
[ 59.467088] ksys_write+0xfc/0x1f8
[ 59.467395] __arm64_sys_write+0x74/0xb8
[ 59.467746] invoke_syscall.constprop.0+0xdc/0x1e8
[ 59.468177] do_el0_svc+0x154/0x1d8
[ 59.468489] el0_svc+0x40/0xe0
[ 59.468767] el0t_64_sync_handler+0xa0/0xe8
[ 59.469138] el0t_64_sync+0x1ac/0x1b0
Ensure this can't happen by taking the nfsd_mutex and checking that
the server is still up, and then holding the mutex across the call to
nfsd4_revoke_states().
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: 1ac3629bf0125 ("nfsd: prepare for supporting admin-revocation of state")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The loop in nfsd4_revoke_states() stops one too early because
the end value given is CLIENT_HASH_MASK where it should be
CLIENT_HASH_SIZE.
This means that an admin request to drop all locks for a filesystem will
miss locks held by clients which hash to the maximum possible hash value.
Fixes: 1ac3629bf012 ("nfsd: prepare for supporting admin-revocation of state")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Writing to v4_end_grace can race with server shutdown and result in
memory being accessed after it was freed - reclaim_str_hashtbl in
particularly.
We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
held while client_tracking_op->init() is called and that can wait for
an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
deadlock.
nfsd4_end_grace() is also called by the landromat work queue and this
doesn't require locking as server shutdown will stop the work and wait
for it before freeing anything that nfsd4_end_grace() might access.
However, we must be sure that writing to v4_end_grace doesn't restart
the work item after shutdown has already waited for it. For this we
add a new flag protected with nn->client_lock. It is set only while it
is safe to make client tracking calls, and v4_end_grace only schedules
work while the flag is set with the spinlock held.
So this patch adds a nfsd_net field "client_tracking_active" which is
set as described. Another field "grace_end_forced", is set when
v4_end_grace is written. After this is set, and providing
client_tracking_active is set, the laundromat is scheduled.
This "grace_end_forced" field bypasses other checks for whether the
grace period has finished.
This resolves a race which can result in use-after-free.
Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/T/#t
Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neil@brown.name>
Tested-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Commit abc02e5602f7 ("NFSD: Support write delegations in LAYOUTGET")
added NFSD_MAY_OWNER_OVERRIDE to the access flags passed from
nfsd4_layoutget() to fh_verify(). This causes LAYOUTGET to fail for
executable-only files, and causes xfstests generic/126 to fail on
pNFS SCSI.
To allow read access to executable-only files, what we really want is:
1. The "permissions" portion of the access flags (the lower 6 bits)
must be exactly NFSD_MAY_READ
2. The "hints" portion of the access flags (the upper 26 bits) can
contain any combination of NFSD_MAY_OWNER_OVERRIDE and
NFSD_MAY_READ_IF_EXEC
Fixes: abc02e5602f7 ("NFSD: Support write delegations in LAYOUTGET")
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881.
None of these RFCs have an NFS status code that match the numeric
value "11".
Based on the meaning of the EAGAIN errno, I presume the use of this
status in NFSD means NFS4ERR_DELAY. So replace the one usage of
nfserr_eagain, and remove it from NFSD's NFS status conversion
tables.
As far as I can tell, NFSERR_EAGAIN has existed since the pre-git
era, but was not actually used by any code until commit f4e44b393389
("NFSD: delay unmount source's export after inter-server copy
completed."), at which time it become possible for NFSD to return
a status code of 11 (which is not valid NFS protocol).
Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull smb server fixes from Steve French:
- Fix memory leak
- Fix two refcount leaks
- Fix error path in create_smb2_pipe
* tag 'v6.19-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd:
smb/server: fix refcount leak in smb2_open()
smb/server: fix refcount leak in parse_durable_handle_context()
smb/server: call ksmbd_session_rpc_close() on error path in create_smb2_pipe()
ksmbd: Fix memory leak in get_file_all_info()
|
|
Pull smb client fixes from Steve French:
- Fix array out of bounds error in copy_file_range
- Add tracepoint to help debug ioctl failures
* tag 'v6.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix UBSAN array-index-out-of-bounds in smb2_copychunk_range
smb3 client: add missing tracepoint for unsupported ioctls
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"A set of NFSD fixes that arrived just a bit late for the 6.19 merge
window.
Regression fix:
- Avoid unnecessarily breaking a timestamp delegation
Stable fixes:
- Fix a crasher in nlm4svc_proc_test()
- Fix nfsd_file reference leak during write delegation
- Fix error flow in client_states_open()"
* tag 'nfsd-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: Drop the client reference in client_states_open()
nfsd: use ATTR_DELEG in nfsd4_finalize_deleg_timestamps()
nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
lockd: fix vfs_test_lock() calls
|
|
struct copychunk_ioctl_req::ChunkCount is annotated with
__counted_by_le() as the number of elements in Chunks[].
smb2_copychunk_range reuses ChunkCount to store the number of chunks
sent in the current iteration. If a later iteration populates more
chunks than a previous one, the stale smaller value trips UBSAN.
Set ChunkCount to chunk_count (allocated capacity) before populating
Chunks[].
Fixes: cc26f593dc19 ("smb: move copychunk definitions to common/smb2pdu.h")
Link: https://lore.kernel.org/linux-cifs/CAH2r5ms9AWLy8WZ04Cpq5XOeVK64tcrUQ6__iMW+yk1VPzo1BA@mail.gmail.com
Tested-by: Youling Tang <tangyouling@kylinos.cn>
Acked-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In debugging a recent problem with an xfstest, noticed that we weren't
tracing cases where the ioctl was not supported. Add dynamic tracepoint:
"trace-cmd record -e smb3_unsupported_ioctl"
and then after running an app which calls unsupported ioctl,
"trace-cmd show"would display e.g.
xfs_io-7289 [012] ..... 1205.137765: smb3_unsupported_ioctl: xid=19 fid=0x4535bb84 ioctl cmd=0x801c581f
Acked-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When ksmbd_vfs_getattr() fails, the reference count of ksmbd_file
must be released.
Suggested-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When the command is a replay operation and -ENOEXEC is returned,
the refcount of ksmbd_file must be released.
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When ksmbd_iov_pin_rsp() fails, we should call ksmbd_session_rpc_close().
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In get_file_all_info(), if vfs_getattr() fails, the function returns
immediately without freeing the allocated filename, leading to a memory
leak.
Fix this by freeing the filename before returning in this error case.
Fixes: 5614c8c487f6a ("ksmbd: replace generic_fillattr with vfs_getattr")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Pull smb client fix from Steve French:
- Fix potential memory leak
* tag 'v6.19-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix memory and information leak in smb3_reconfigure()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Introduce DMA Rust helpers to avoid build errors when !CONFIG_HAS_DMA
- Remove unnecessary (and hence incorrect) endian conversion in the
Rust PCI driver sample code
- Fix memory leak in the unwind path of debugfs_change_name()
- Support non-const struct software_node pointers in
SOFTWARE_NODE_REFERENCE(), after introducing _Generic()
- Avoid NULL pointer dereference in the unwind path of
simple_xattrs_free()
* tag 'driver-core-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
fs/kernfs: null-ptr deref in simple_xattrs_free()
software node: Also support referencing non-constant software nodes
debugfs: Fix memleak in debugfs_change_name().
samples: rust: fix endianness issue in rust_driver_pci
rust: dma: add helpers for architectures without CONFIG_HAS_DMA
|
|
Pull smb server fixes from Steve French:
- Fix parsing of SMB1 negotiate request by adjusting offsets affected
by the removal of the RFC1002 length field from the SMB header
- Update minimum PDU size macros for both SMB1 and SMB2
- Rename smb2_get_msg function to smb_get_msg to better reflect its
role in handling both SMB1 and SMB2 requests
* tag 'v6.19-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd:
smb/server: fix minimum SMB2 PDU size
smb/server: fix minimum SMB1 PDU size
ksmbd: rename smb2_get_msg to smb_get_msg
ksmbd: Fix to handle removal of rfc1002 header from smb_hdr
|
|
In error path, call drop_client() to drop the reference
obtained by get_nfsdfs_clp().
Fixes: 78599c42ae3c ("nfsd4: add file to display list of client's opens")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
When finalizing timestamps that have never been updated and preparing to
release the delegation lease, the notify_change() call can trigger a
delegation break, and fail to update the timestamps. When this happens,
there will be messages like this in dmesg:
[ 2709.375785] Unable to update timestamps on inode 00:39:263: -11
Since this code is going to release the lease just after updating the
timestamps, breaking the delegation is undesirable. Fix this by setting
ATTR_DELEG in ia_valid, in order to avoid the delegation break.
Fixes: e5e9b24ab8fa ("nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites
fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if
the client already has a SHARE_ACCESS_READ open from a previous OPEN
operation, this action overwrites the existing pointer without
releasing its reference, orphaning the previous reference.
Additionally, the function originally stored the same nfsd_file
pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with
only a single reference. When put_deleg_file() runs, it clears
fi_rdeleg_file and calls nfs4_file_put_access() to release the file.
However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when
the fi_access[O_RDONLY] counter drops to zero. If another READ open
exists on the file, the counter remains elevated and the nfsd_file
reference from the delegation is never released. This potentially
causes open conflicts on that file.
Then, on server shutdown, these leaks cause __nfsd_file_cache_purge()
to encounter files with an elevated reference count that cannot be
cleaned up, ultimately triggering a BUG() in kmem_cache_destroy()
because there are still nfsd_file objects allocated in that cache.
Fixes: e7a8ebc305f2 ("NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Usage of vfs_test_lock() is somewhat confused. Documentation suggests
it is given a "lock" but this is not the case. It is given a struct
file_lock which contains some details of the sort of lock it should be
looking for.
In particular passing a "file_lock" containing fl_lmops or fl_ops is
meaningless and possibly confusing.
This is particularly problematic in lockd. nlmsvc_testlock() receives
an initialised "file_lock" from xdr-decode, including manager ops and an
owner. It then mistakenly passes this to vfs_test_lock() which might
replace the owner and the ops. This can lead to confusion when freeing
the lock.
The primary role of the 'struct file_lock' passed to vfs_test_lock() is
to report a conflicting lock that was found, so it makes more sense for
nlmsvc_testlock() to pass "conflock", which it uses for returning the
conflicting lock.
With this change, freeing of the lock is not confused and code in
__nlm4svc_proc_test() and __nlmsvc_proc_test() can be simplified.
Documentation for vfs_test_lock() is improved to reflect its real
purpose, and a WARN_ON_ONCE() is added to avoid a similar problem in the
future.
Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Closes: https://lore.kernel.org/all/20251021130506.45065-1-okorniev@redhat.com
Signed-off-by: NeilBrown <neil@brown.name>
Fixes: 20fa19027286 ("nfs: add export operations")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"A set of NFSD fixes that arrived just a bit late for the 6.19 merge
window.
Regression fixes:
- Mark variable __maybe_unused to avoid W=1 build break
Stable fixes:
- NFSv4 file creation neglects setting ACL
- Clear TIME_DELEG in the suppattr_exclcreat bitmap
- Clear SECLABEL in the suppattr_exclcreat bitmap
- Fix memory leak in nfsd_create_serv error paths
- Bound check rq_pages index in inline path
- Return 0 on success from svc_rdma_copy_inline_range
- Use rc_pageoff for memcpy byte offset
- Avoid NULL deref on zero length gss_token in gss_read_proxy_verf"
* tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: NFSv4 file creation neglects setting ACL
NFSD: Clear TIME_DELEG in the suppattr_exclcreat bitmap
NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap
nfsd: fix memory leak in nfsd_create_serv error paths
nfsd: Mark variable __maybe_unused to avoid W=1 build break
svcrdma: bound check rq_pages index in inline path
svcrdma: return 0 on success from svc_rdma_copy_inline_range
svcrdma: use rc_pageoff for memcpy byte offset
SUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
"Junbeom reported that synchronous reads could hit unintended EIOs
under memory pressure due to incorrect error propagation in
z_erofs_decompress_queue(), where earlier physical clusters in the
same decompression queue may be served for another readahead.
This addresses the issue by decompressing each physical cluster
independently as long as disk I/Os succeed, rather than being impacted
by the error status of previous physical clusters in the same queue.
Summary:
- Fix unexpected EIOs under memory pressure caused by recent
incorrect error propagation logic"
* tag 'erofs-for-6.19-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix unexpected EIO under memory pressure
|
|
In smb3_reconfigure(), if smb3_sync_session_ctx_passwords() fails, the
function returns immediately without freeing and erasing the newly
allocated new_password and new_password2. This causes both a memory leak
and a potential information leak.
Fix this by calling kfree_sensitive() on both password buffers before
returning in this error case.
Fixes: 0f0e357902957 ("cifs: during remount, make sure passwords are in sync")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix a mkdir-induced usage count imbalance that tripped a umount_check()
BUG while unmounting the lower filesystem. Commit f046fbb4d81d
("ecryptfs: use new start_creating/start_removing APIs") added a new
dget() of the lower parent dir, in ecryptfs_mkdir(), but did not dput()
the dentry before returning from that function.
The BUG output as seen while running the eCryptfs test suite:
$ ./run_tests.sh -b 131072 -c safe,destructive -f ext4 -K -t lp-926292.sh
...
Running eCryptfs filesystem tests on ext4
lp-926292
------------[ cut here ]------------
BUG: Dentry ffff8e6692d11988{i=c,n=ECRYPTFS_FNEK_ENCRYPTED.FXZuRGZL7QAFtER.JeA46DtdKqkkQx9H2Vpmv234J5CU8YSsrUwZJK4AbXbrN5WkZ348wnqstovKKxA-} still in use (1) [unmount of ext4 loop0]
WARNING: CPU: 7 PID: 950 at fs/dcache.c:1590 umount_check+0x5e/0x80
Modules linked in: md5 libmd5 ecryptfs encrypted_keys ext4 crc16 mbcache jbd2
CPU: 7 UID: 0 PID: 950 Comm: umount Not tainted 6.18.0-rc1-00013-gf046fbb4d81d #17 PREEMPT(full)
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
RIP: 0010:umount_check+0x5e/0x80
Code: 88 38 06 00 00 48 8b 40 28 4c 8b 08 48 8b 46 68 48 85 c0 74 04 48 8b 50 38 51 48 c7 c7 60 32 9c b5 48 89 f1 e8 43 5e ca ff 90 <0f> 0b 90 90 58 31 c0 e9 46 9d 6c 00 41 83 f8 01 75 b8 eb a3 66 66
RSP: 0018:ffffa19940c4bdd0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8e6692fad4c0 RCX: 0000000000000000
RDX: 0000000000000004 RSI: ffffa19940c4bc70 RDI: 00000000ffffffff
RBP: ffffffffb4eb5930 R08: 00000000ffffdfff R09: 0000000000000001
R10: 00000000ffffdfff R11: ffffffffb5c8a9e0 R12: ffff8e6692fad4c0
R13: ffff8e6692fad4c0 R14: ffff8e6692d11a40 R15: ffff8e6692d11988
FS: 00007f6b4b491800(0000) GS:ffff8e670506e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6b4b5f8d40 CR3: 0000000114eb7001 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
<TASK>
d_walk+0xfd/0x370
shrink_dcache_for_umount+0x4d/0x140
generic_shutdown_super+0x20/0x160
kill_block_super+0x1a/0x40
ext4_kill_sb+0x22/0x40 [ext4]
deactivate_locked_super+0x33/0xa0
cleanup_mnt+0xba/0x150
task_work_run+0x5c/0xa0
exit_to_user_mode_loop+0xac/0xb0
do_syscall_64+0x2ab/0xfa0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6b4b6c2a2b
Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 b9 83 0d 00 f7 d8
RSP: 002b:00007ffcd5b8b498 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 000055b84af0b9e0 RCX: 00007f6b4b6c2a2b
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055b84af0bdf0
RBP: 00007ffcd5b8b570 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000103 R11: 0000000000000246 R12: 000055b84af0bae0
R13: 0000000000000000 R14: 000055b84af0bdf0 R15: 0000000000000000
</TASK>
---[ end trace 0000000000000000 ]---
EXT4-fs (loop0): unmounting filesystem 00d9ea41-f61e-43d0-a449-6be03e7e8428.
EXT4-fs (loop0): sb orphan head is 12
sb_info orphan list:
inode loop0:12 at ffff8e66950e1df0: mode 40700, nlink 0, next 0
Assertion failure in ext4_put_super() at fs/ext4/super.c:1345: 'list_empty(&sbi->s_orphan)'
Fixes: f046fbb4d81d ("ecryptfs: use new start_creating/start_removing APIs")
Signed-off-by: Tyler Hicks <code@tyhicks.com>
Link: https://patch.msgid.link/20251223194153.2818445-3-code@tyhicks.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The ecryptfs_start_creating_dentry() function must be paired with the
end_creating() function. Fix ecryptfs_mknod() so that end_creating() is
properly called in the return path, instead of end_removing().
Fixes: f046fbb4d81d ("ecryptfs: use new start_creating/start_removing APIs")
Signed-off-by: Tyler Hicks <code@tyhicks.com>
Link: https://patch.msgid.link/20251223194153.2818445-2-code@tyhicks.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Sphinx report kernel-doc warnings:
WARNING: ./fs/namei.c:2853 function parameter 'state' not described in '__start_dirop'
WARNING: ./fs/namei.c:2853 expecting prototype for start_dirop(). Prototype was for __start_dirop() instead
Fix them up.
Fixes: ff7c4ea11a05c8 ("VFS: add start_creating_killable() and start_removing_killable()")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20251219024620.22880-3-bagasdotme@gmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Sphinx reports kernel-doc warning:
WARNING: ./fs/inode.c:1607 function parameter 'isnew' not described in 'ilookup5_nowait'
Describe the parameter.
Fixes: a27628f4363435 ("fs: rework I_NEW handling to operate without fences")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20251219024620.22880-2-bagasdotme@gmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Otherwise the slowpath can be taken by the caller, defeating the flag.
This regressed after calls to legitimize_links() started being
conditionally elided and stems from the routine always failing
after seeing the flag, regardless if there were any links.
In order to address both the bug and the weird semantics make it illegal
to call legitimize_links() with LOOKUP_CACHED and handle the problem at
the two callsites.
Fixes: 7c179096e77eca21 ("fs: add predicts based on nd->depth")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251220054023.142134-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The read result collection for buffered reads seems to run ahead of the
completion of subrequests under some circumstances, as can be seen in the
following log snippet:
9p_client_res: client 18446612686390831168 response P9_TREAD tag 0 err 0
...
netfs_sreq: R=00001b55[1] DOWN TERM f=192 s=0 5fb2/5fb2 s=5 e=0
...
netfs_collect_folio: R=00001b55 ix=00004 r=4000-5000 t=4000/5fb2
netfs_folio: i=157f3 ix=00004-00004 read-done
netfs_folio: i=157f3 ix=00004-00004 read-unlock
netfs_collect_folio: R=00001b55 ix=00005 r=5000-5fb2 t=5000/5fb2
netfs_folio: i=157f3 ix=00005-00005 read-done
netfs_folio: i=157f3 ix=00005-00005 read-unlock
...
netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=c
netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=8
...
netfs_sreq: R=00001b55[2] ZERO SUBMT f=000 s=5fb2 0/4e s=0 e=0
netfs_sreq: R=00001b55[2] ZERO TERM f=102 s=5fb2 4e/4e s=5 e=0
The 'cto=5fb2' indicates the collected file pos we've collected results to
so far - but we still have 0x4e more bytes to go - so we shouldn't have
collected folio ix=00005 yet. The 'ZERO' subreq that clears the tail
happens after we unlock the folio, allowing the application to see the
uncleared tail through mmap.
The problem is that netfs_read_unlock_folios() will unlock a folio in which
the amount of read results collected hits EOF position - but the ZERO
subreq lies beyond that and so happens after.
Fix this by changing the end check to always be the end of the folio and
never the end of the file.
In the future, I should look at clearing to the end of the folio here rather
than adding a ZERO subreq to do this. On the other hand, the ZERO subreq can
run in parallel with an async READ subreq. Further, the ZERO subreq may still
be necessary to, say, handle extents in a ceph file that don't have any
backing store and are thus implicitly all zeros.
This can be reproduced by creating a file, the size of which doesn't align
to a page boundary, e.g. 24998 (0x5fb2) bytes and then doing something
like:
xfs_io -c "mmap -r 0 0x6000" -c "madvise -d 0 0x6000" \
-c "mread -v 0 0x6000" /xfstest.test/x
The last 0x4e bytes should all be 00, but if the tail hasn't been cleared
yet, you may see rubbish there. This can be reproduced with kafs by
modifying the kernel to disable the call to netfs_read_subreq_progress()
and to stop afs_issue_read() from doing the async call for NETFS_READAHEAD.
Reproduction can be made easier by inserting an mdelay(100) in
netfs_issue_read() for the ZERO-subreq case.
AFS and CIFS are normally unlikely to show this as they dispatch READ ops
asynchronously, which allows the ZERO-subreq to finish first. 9P's READ op is
completely synchronous, so the ZERO-subreq will always happen after. It isn't
seen all the time, though, because the collection may be done in a worker
thread.
Reported-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Link: https://lore.kernel.org/r/8622834.T7Z3S40VBb@weasel/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/938162.1766233900@warthog.procyon.org.uk
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Tested-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Acked-by: Dominique Martinet <asmadeus@codewreck.org>
Suggested-by: Dominique Martinet <asmadeus@codewreck.org>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: v9fs@lists.linux.dev
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
There exists a null pointer dereference in simple_xattrs_free() as
part of the __kernfs_new_node() routine. Within __kernfs_new_node(),
err_out4 calls simple_xattr_free(), but kn->iattr may be NULL if
__kernfs_setattr() was never called. As a result, the first argument to
simple_xattrs_free() may be NULL + 0x38, and no NULL check is done
internally, causing an incorrect pointer dereference.
Add a check to ensure kn->iattr is not NULL, meaning __kernfs_setattr()
has been called and kn->iattr is allocated. Note that struct kernfs_node
kn is allocated with kmem_cache_zalloc, so we can assume kn->iattr will
be NULL if not allocated.
An alternative fix could be to not call simple_xattrs_free() at all. As
was previously discussed during the initial patch, simple_xattrs_free()
is not strictly needed and is included to be consistent with
kernfs_free_rcu(), which also helps the function maintain correctness if
changes are made in __kernfs_new_node().
Reported-by: syzbot+6aaf7f48ae034ab0ea97@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6aaf7f48ae034ab0ea97
Fixes: 382b1e8f30f7 ("kernfs: fix memory leak of kernfs_iattrs in __kernfs_new_node")
Signed-off-by: Will Rosenberg <whrosenb@asu.edu>
Link: https://patch.msgid.link/20251217060107.4171558-1-whrosenb@asu.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The minimum SMB2 PDU size should be updated to the size of
`struct smb2_pdu` (that is, the size of `struct smb2_hdr` + 2).
Suggested-by: David Howells <dhowells@redhat.com>
Suggested-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Since the RFC1002 header has been removed from `struct smb_hdr`,
the minimum SMB1 PDU size should be updated as well.
Fixes: 83bfbd0bb902 ("cifs: Remove the RFC1002 header from smb_hdr")
Suggested-by: David Howells <dhowells@redhat.com>
Suggested-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
With the removal of the RFC1002 length field from the SMB header,
smb2_get_msg is now used to get the smb1 request from the request buffer.
Since this function is no longer exclusive to smb2 and now supports smb1
as well, This patch rename it to smb_get_msg to better reflect its usage.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The commit that removed the RFC1002 header from struct smb_hdr didn't also
fix the places in ksmbd that use it in order to provide graceful rejection
of SMB1 protocol requests.
Fixes: 83bfbd0bb902 ("cifs: Remove the RFC1002 header from smb_hdr")
Reported-by: Namjae Jeon <linkinjeon@kernel.org>
Link: https://lore.kernel.org/r/CAKYAXd9Ju4MFkkH5Jxfi1mO0AWEr=R35M3vQ_Xa7Yw34JoNZ0A@mail.gmail.com/
Cc: ChenXiaoSong <chenxiaosong.chenxiaosong@linux.dev>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
erofs readahead could fail with ENOMEM under the memory pressure because
it tries to alloc_page with GFP_NOWAIT | GFP_NORETRY, while GFP_KERNEL
for a regular read. And if readahead fails (with non-uptodate folios),
the original request will then fall back to synchronous read, and
`.read_folio()` should return appropriate errnos.
However, in scenarios where readahead and read operations compete,
read operation could return an unintended EIO because of an incorrect
error propagation.
To resolve this, this patch modifies the behavior so that, when the
PCL is for read(which means pcl.besteffort is true), it attempts actual
decompression instead of propagating the privios error except initial EIO.
- Page size: 4K
- The original size of FileA: 16K
- Compress-ratio per PCL: 50% (Uncompressed 8K -> Compressed 4K)
[page0, page1] [page2, page3]
[PCL0]---------[PCL1]
- functions declaration:
. pread(fd, buf, count, offset)
. readahead(fd, offset, count)
- Thread A tries to read the last 4K
- Thread B tries to do readahead 8K from 4K
- RA, besteffort == false
- R, besteffort == true
<process A> <process B>
pread(FileA, buf, 4K, 12K)
do readahead(page3) // failed with ENOMEM
wait_lock(page3)
if (!uptodate(page3))
goto do_read
readahead(FileA, 4K, 8K)
// Here create PCL-chain like below:
// [null, page1] [page2, null]
// [PCL0:RA]-----[PCL1:RA]
...
do read(page3) // found [PCL1:RA] and add page3 into it,
// and then, change PCL1 from RA to R
...
// Now, PCL-chain is as below:
// [null, page1] [page2, page3]
// [PCL0:RA]-----[PCL1:R]
// try to decompress PCL-chain...
z_erofs_decompress_queue
err = 0;
// failed with ENOMEM, so page 1
// only for RA will not be uptodated.
// it's okay.
err = decompress([PCL0:RA], err)
// However, ENOMEM propagated to next
// PCL, even though PCL is not only
// for RA but also for R. As a result,
// it just failed with ENOMEM without
// trying any decompression, so page2
// and page3 will not be uptodated.
** BUG HERE ** --> err = decompress([PCL1:R], err)
return err as ENOMEM
...
wait_lock(page3)
if (!uptodate(page3))
return EIO <-- Return an unexpected EIO!
...
Fixes: 2349d2fa02db ("erofs: sunset unneeded NOFAILs")
Cc: stable@vger.kernel.org
Reviewed-by: Jaewook Kim <jw5454.kim@samsung.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Junbeom Yeom <junbeom.yeom@samsung.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Pull xfs fixes from Carlos Maiolino:
"This contains a few fixes for zoned devices support, an UAF and a
compiler warning, and some cleaning up"
* tag 'xfs-fixes-6.19-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix the zoned RT growfs check for zone alignment
xfs: validate that zoned RT devices are zone aligned
xfs: fix XFS_ERRTAG_FORCE_ZERO_RANGE for zoned file system
xfs: fix a memory leak in xfs_buf_item_init()
xfs: fix stupid compiler warning
xfs: fix a UAF problem in xattr repair
xfs: ignore discard return value
|
|
syzbot reported memleak in debugfs_change_name(). [0]
When lookup_noperm_unlocked() fails, new_name is leaked.
Let's fix it by reusing kfree_const() at the end of
debugfs_change_name().
[0]:
BUG: memory leak
unreferenced object 0xffff8881110bb308 (size 8):
comm "syz.0.17", pid 6090, jiffies 4294942958
hex dump (first 8 bytes):
2e 00 00 00 00 00 00 00 ........
backtrace (crc ecfc7064):
kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline]
slab_post_alloc_hook mm/slub.c:4953 [inline]
slab_alloc_node mm/slub.c:5258 [inline]
__do_kmalloc_node mm/slub.c:5651 [inline]
__kmalloc_node_track_caller_noprof+0x3b2/0x670 mm/slub.c:5759
__kmemdup_nul mm/util.c:64 [inline]
kstrdup+0x3c/0x80 mm/util.c:84
kstrdup_const+0x63/0x80 mm/util.c:104
kvasprintf_const+0xca/0x110 lib/kasprintf.c:48
debugfs_change_name+0xf6/0x5d0 fs/debugfs/inode.c:854
cfg80211_dev_rename+0xd8/0x110 net/wireless/core.c:149
nl80211_set_wiphy+0x102/0x1770 net/wireless/nl80211.c:3844
genl_family_rcv_msg_doit+0x11e/0x190 net/netlink/genetlink.c:1115
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x2fd/0x440 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x93/0x1d0 net/netlink/af_netlink.c:2550
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x3a3/0x4f0 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x335/0x6b0 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:718 [inline]
__sock_sendmsg net/socket.c:733 [inline]
____sys_sendmsg+0x562/0x5a0 net/socket.c:2608
___sys_sendmsg+0xc8/0x130 net/socket.c:2662
__sys_sendmsg+0xc7/0x140 net/socket.c:2694
Fixes: 833d2b3a072f7 ("Add start_renaming_two_dentries()")
Reported-by: syzbot+3d7ca9c802c547f8550a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69369d82.a70a0220.38f243.009f.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251208094551.46184-1-kuniyu@google.com
[ Fix minor typo in commit message. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|