summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-06-27xfs: fix unmount hang with unflushable inodes stuck in the AILDave Chinner
Unmount of a shutdown filesystem can hang with stale inode cluster buffers in the AIL like so: [95964.140623] Call Trace: [95964.144641] __schedule+0x699/0xb70 [95964.154003] schedule+0x64/0xd0 [95964.156851] xfs_ail_push_all_sync+0x9b/0xf0 [95964.164816] xfs_unmount_flush_inodes+0x41/0x70 [95964.168698] xfs_unmountfs+0x7f/0x170 [95964.171846] xfs_fs_put_super+0x3b/0x90 [95964.175216] generic_shutdown_super+0x77/0x160 [95964.178060] kill_block_super+0x1b/0x40 [95964.180553] xfs_kill_sb+0x12/0x30 [95964.182796] deactivate_locked_super+0x38/0x100 [95964.185735] deactivate_super+0x41/0x50 [95964.188245] cleanup_mnt+0x9f/0x160 [95964.190519] __cleanup_mnt+0x12/0x20 [95964.192899] task_work_run+0x89/0xb0 [95964.195221] resume_user_mode_work+0x4f/0x60 [95964.197931] syscall_exit_to_user_mode+0x76/0xb0 [95964.201003] do_syscall_64+0x74/0x130 $ pstree -N mnt |grep umount |-check-parallel---nsexec---run_test.sh---753---umount It always seems to be generic/753 that triggers this, and repeating a quick group test run triggers it every 10-15 iterations. Hence it generally triggers once up every 30-40 minutes of test time. just running generic/753 by itself or concurrently with a limited group of tests doesn't reproduce this issue at all. Tracing on a hung system shows the AIL repeating every 50ms a log force followed by an attempt to push pinned, aborted inodes from the AIL (trimmed for brevity): xfs_log_force: lsn 0x1c caller xfsaild+0x18e xfs_log_force: lsn 0x0 caller xlog_cil_flush+0xbd xfs_log_force: lsn 0x1c caller xfs_log_force+0x77 xfs_ail_pinned: lip 0xffff88826014afa0 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED xfs_ail_pinned: lip 0xffff88814000a708 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED xfs_ail_pinned: lip 0xffff88810b850c80 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED xfs_ail_pinned: lip 0xffff88810b850af0 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED xfs_ail_pinned: lip 0xffff888165cf0a28 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED xfs_ail_pinned: lip 0xffff88810b850bb8 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED .... The inode log items are marked as aborted, which means that either: a) a transaction commit has occurred, seen an error or shutdown, and called xfs_trans_free_items() to abort the items. This should happen before any pinning of log items occurs. or b) a dirty transaction has been cancelled. This should also happen before any pinning of log items occurs. or c) AIL insertion at journal IO completion is marked as aborted. In this case, the log item is pinned by the CIL until journal IO completes and hence needs to be unpinned. This is then done after the ->iop_committed() callback is run, so the pin count should be balanced correctly. Yet none of these seemed to be occurring. Further tracing indicated this: d) Shutdown during CIL pushing resulting in log item completion being called from checkpoint abort processing. Items are unpinned and released without serialisation against each other, journal IO completion or transaction commit completion. In this case, we may still have a transaction commit in flight that holds a reference to a xfs_buf_log_item (BLI) after CIL insertion. e.g. a synchronous transaction will flush the CIL before the transaction is torn down. The concurrent CIL push then aborts insertion it and drops the commit/AIL reference to the BLI. This can leave the transaction commit context with the last reference to the BLI which is dropped here: xfs_trans_free_items() ->iop_release xfs_buf_item_release xfs_buf_item_put if (XFS_LI_ABORTED) xfs_trans_ail_delete xfs_buf_item_relse() Unlike the journal completion ->iop_unpin path, this path does not run stale buffer completion process when it drops the last reference, hence leaving the stale inodes attached to the buffer sitting the AIL. There are no other references to those inodes, so there is no other mechanism to remove them from the AIL. Hence unmount hangs. The buffer lock context for stale buffers is passed to the last BLI reference. This is normally the last BLI unpin on journal IO completion. The unpin then processes the stale buffer completion and releases the buffer lock. However, if the final unpin from journal IO completion (or CIL push abort) does not hold the last reference to the BLI, there -must- still be a transaction context that references the BLI, and so that context must perform the stale buffer completion processing before the buffer is unlocked and the BLI torn down. The fix for this is to rework the xfs_buf_item_relse() path to run stale buffer completion processing if it drops the last reference to the BLI. We still hold the buffer locked, so the buffer owner and lock context is the same as if we passed the BLI and buffer to the ->iop_unpin() context to finish stale process on journal commit. However, we have to be careful here. In a shutdown state, we can be freeing dirty BLIs from xfs_buf_item_put() via xfs_trans_brelse() and xfs_trans_bdetach(). The existing code handles this case by considering shutdown state as "aborted", but in doing so largely masks the failure to clean up stale BLI state from the xfs_buf_item_relse() path. i.e regardless of the shutdown state and whether the item is in the AIL, we must finish the stale buffer cleanup if we are are dropping the last BLI reference from the ->iop_relse path in transaction commit context. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-06-27xfs: factor out stale buffer item completionDave Chinner
The stale buffer item completion handling is currently only done from BLI unpinning. We need to perform this function from where-ever the last reference to the BLI is dropped, so first we need to factor this code out into a helper. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-06-27xfs: rearrange code in xfs_buf_item.cDave Chinner
The code to initialise, release and free items is all the way down the bottom of the file. Upcoming fixes need to these functions earlier in the file, so move them to the top. There is one code change in this move - the parameter to xfs_buf_item_relse() is changed from the xfs_buf to the xfs_buf_log_item - the thing that the function is releasing. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-06-27xfs: add tracepoints for stale pinned inode state debugDave Chinner
I needed more insight into how stale inodes were getting stuck on the AIL after a forced shutdown when running fsstress. These are the tracepoints I added for that purpose. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-06-27xfs: avoid dquot buffer pin deadlockDave Chinner
On shutdown when quotas are enabled, the shutdown can deadlock trying to unpin the dquot buffer buf_log_item like so: [ 3319.483590] task:kworker/20:0H state:D stack:14360 pid:1962230 tgid:1962230 ppid:2 task_flags:0x4208060 flags:0x00004000 [ 3319.493966] Workqueue: xfs-log/dm-6 xlog_ioend_work [ 3319.498458] Call Trace: [ 3319.500800] <TASK> [ 3319.502809] __schedule+0x699/0xb70 [ 3319.512672] schedule+0x64/0xd0 [ 3319.515573] schedule_timeout+0x30/0xf0 [ 3319.528125] __down_common+0xc3/0x200 [ 3319.531488] __down+0x1d/0x30 [ 3319.534186] down+0x48/0x50 [ 3319.540501] xfs_buf_lock+0x3d/0xe0 [ 3319.543609] xfs_buf_item_unpin+0x85/0x1b0 [ 3319.547248] xlog_cil_committed+0x289/0x570 [ 3319.571411] xlog_cil_process_committed+0x6d/0x90 [ 3319.575590] xlog_state_shutdown_callbacks+0x52/0x110 [ 3319.580017] xlog_force_shutdown+0x169/0x1a0 [ 3319.583780] xlog_ioend_work+0x7c/0xb0 [ 3319.587049] process_scheduled_works+0x1d6/0x400 [ 3319.591127] worker_thread+0x202/0x2e0 [ 3319.594452] kthread+0x20c/0x240 The CIL push has seen the deadlock, so it has aborted the push and is running CIL checkpoint completion to abort all the items in the checkpoint. This calls ->iop_unpin(remove = true) to clean up the log items in the checkpoint. When a buffer log item is unpined like this, it needs to lock the buffer to run io completion to correctly fail the buffer and run all the required completions to fail attached log items as well. In this case, the attempt to lock the buffer on unpin is hanging because the buffer is already locked. I suspected a leaked XFS_BLI_HOLD state because of XFS_BLI_STALE handling changes I was testing, so I went looking for pin events on HOLD buffers and unpin events on locked buffer. That isolated this one buffer with these two events: xfs_buf_item_pin: dev 251:6 daddr 0xa910 bbcount 0x2 hold 2 pincount 0 lock 0 flags DONE|KMEM recur 0 refcount 1 bliflags HOLD|DIRTY|LOGGED liflags DIRTY .... xfs_buf_item_unpin: dev 251:6 daddr 0xa910 bbcount 0x2 hold 4 pincount 1 lock 0 flags DONE|KMEM recur 0 refcount 1 bliflags DIRTY liflags ABORTED Firstly, bbcount = 0x2, which means it is not a single sector structure. That rules out every xfs_trans_bhold() case except one: dquot buffers. Then hung task dumping gave this trace: [ 3197.312078] task:fsync-tester state:D stack:12080 pid:2051125 tgid:2051125 ppid:1643233 task_flags:0x400000 flags:0x00004002 [ 3197.323007] Call Trace: [ 3197.325581] <TASK> [ 3197.327727] __schedule+0x699/0xb70 [ 3197.334582] schedule+0x64/0xd0 [ 3197.337672] schedule_timeout+0x30/0xf0 [ 3197.350139] wait_for_completion+0xbd/0x180 [ 3197.354235] __flush_workqueue+0xef/0x4e0 [ 3197.362229] xlog_cil_force_seq+0xa0/0x300 [ 3197.374447] xfs_log_force+0x77/0x230 [ 3197.378015] xfs_qm_dqunpin_wait+0x49/0xf0 [ 3197.382010] xfs_qm_dqflush+0x55/0x460 [ 3197.385663] xfs_qm_dquot_isolate+0x29e/0x4d0 [ 3197.389977] __list_lru_walk_one+0x141/0x220 [ 3197.398867] list_lru_walk_one+0x10/0x20 [ 3197.402713] xfs_qm_shrink_scan+0x6a/0x100 [ 3197.406699] do_shrink_slab+0x18a/0x350 [ 3197.410512] shrink_slab+0xf7/0x430 [ 3197.413967] drop_slab+0x97/0xf0 [ 3197.417121] drop_caches_sysctl_handler+0x59/0xc0 [ 3197.421654] proc_sys_call_handler+0x18b/0x280 [ 3197.426050] proc_sys_write+0x13/0x20 [ 3197.429750] vfs_write+0x2b8/0x3e0 [ 3197.438532] ksys_write+0x7e/0xf0 [ 3197.441742] __x64_sys_write+0x1b/0x30 [ 3197.445363] x64_sys_call+0x2c72/0x2f60 [ 3197.449044] do_syscall_64+0x6c/0x140 [ 3197.456341] entry_SYSCALL_64_after_hwframe+0x76/0x7e Yup, another test run by check-parallel is running drop_caches concurrently and the dquot shrinker for the hung filesystem is running. That's trying to flush a dirty dquot from reclaim context, and it waiting on a log force to complete. xfs_qm_dqflush is called with the dquot buffer held locked, and so we've called xfs_log_force() with that buffer locked. Now the log force is waiting for a workqueue flush to complete, and that workqueue flush is waiting of CIL checkpoint processing to finish. The CIL checkpoint processing is aborting all the log items it has, and that requires locking aborted buffers to cancel them. Now, normally this isn't a problem if we are issuing a log force to unpin an object, because the ->iop_unpin() method wakes pin waiters first. That results in the pin waiter finishing off whatever it was doing, dropping the lock and then xfs_buf_item_unpin() can lock the buffer and fail it. However, xfs_qm_dqflush() is waiting on the -dquot- unpin event, not the dquot buffer unpin event, and so it never gets woken and so does not drop the buffer lock. Inodes do not have this problem, as they can only be written from one spot (->iop_push) whilst dquots can be written from multiple places (memory reclaim, ->iop_push, xfs_dq_dqpurge, and quotacheck). The reason that the dquot buffer has an attached buffer log item is that it has been recently allocated. Initialisation of the dquot buffer logs the buffer directly, thereby pinning it in memory. We then modify the dquot in a separate operation, and have memory reclaim racing with a shutdown and we trigger this deadlock. check-parallel reproduces this reliably on 1kB FSB filesystems with quota enabled because it does all of these things concurrently without having to explicitly write tests to exercise these corner case conditions. xfs_qm_dquot_logitem_push() doesn't have this deadlock because it checks if the dquot is pinned before locking the dquot buffer and skipping it if it is pinned. This means the xfs_qm_dqunpin_wait() log force in xfs_qm_dqflush() never triggers and we unlock the buffer safely allowing a concurrent shutdown to fail the buffer appropriately. xfs_qm_dqpurge() could have this problem as it is called from quotacheck and we might have allocated dquot buffers when recording the quota updates. This can be fixed by calling xfs_qm_dqunpin_wait() before we lock the dquot buffer. Because we hold the dquot locked, nothing will be able to add to the pin count between the unpin_wait and the dqflush callout, so this now makes xfs_qm_dqpurge() safe against this race. xfs_qm_dquot_isolate() can also be fixed this same way but, quite frankly, we shouldn't be doing IO in memory reclaim context. If the dquot is pinned or dirty, simply rotate it and let memory reclaim come back to it later, same as we do for inodes. This then gets rid of the nasty issue in xfs_qm_flush_one() where quotacheck writeback races with memory reclaim flushing the dquots. We can lift xfs_qm_dqunpin_wait() up into this code, then get rid of the "can't get the dqflush lock" buffer write to cycle the dqlfush lock and enable it to be flushed again. checking if the dquot is pinned and returning -EAGAIN so that the dquot walk will revisit the dquot again later. Finally, with xfs_qm_dqunpin_wait() lifted into all the callers, we can remove it from the xfs_qm_dqflush() code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-06-27xfs: catch stale AGF/AGF metadataDave Chinner
There is a race condition that can trigger in dmflakey fstests that can result in asserts in xfs_ialloc_read_agi() and xfs_alloc_read_agf() firing. The asserts look like this: XFS: Assertion failed: pag->pagf_freeblks == be32_to_cpu(agf->agf_freeblks), file: fs/xfs/libxfs/xfs_alloc.c, line: 3440 ..... Call Trace: <TASK> xfs_alloc_read_agf+0x2ad/0x3a0 xfs_alloc_fix_freelist+0x280/0x720 xfs_alloc_vextent_prepare_ag+0x42/0x120 xfs_alloc_vextent_iterate_ags+0x67/0x260 xfs_alloc_vextent_start_ag+0xe4/0x1c0 xfs_bmapi_allocate+0x6fe/0xc90 xfs_bmapi_convert_delalloc+0x338/0x560 xfs_map_blocks+0x354/0x580 iomap_writepages+0x52b/0xa70 xfs_vm_writepages+0xd7/0x100 do_writepages+0xe1/0x2c0 __writeback_single_inode+0x44/0x340 writeback_sb_inodes+0x2d0/0x570 __writeback_inodes_wb+0x9c/0xf0 wb_writeback+0x139/0x2d0 wb_workfn+0x23e/0x4c0 process_scheduled_works+0x1d4/0x400 worker_thread+0x234/0x2e0 kthread+0x147/0x170 ret_from_fork+0x3e/0x50 ret_from_fork_asm+0x1a/0x30 I've seen the AGI variant from scrub running on the filesysetm after unmount failed due to systemd interference: XFS: Assertion failed: pag->pagi_freecount == be32_to_cpu(agi->agi_freecount) || xfs_is_shutdown(pag->pag_mount), file: fs/xfs/libxfs/xfs_ialloc.c, line: 2804 ..... Call Trace: <TASK> xfs_ialloc_read_agi+0xee/0x150 xchk_perag_drain_and_lock+0x7d/0x240 xchk_ag_init+0x34/0x90 xchk_inode_xref+0x7b/0x220 xchk_inode+0x14d/0x180 xfs_scrub_metadata+0x2e2/0x510 xfs_ioc_scrub_metadata+0x62/0xb0 xfs_file_ioctl+0x446/0xbf0 __se_sys_ioctl+0x6f/0xc0 __x64_sys_ioctl+0x1d/0x30 x64_sys_call+0x1879/0x2ee0 do_syscall_64+0x68/0x130 ? exc_page_fault+0x62/0xc0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Essentially, it is the same problem. When _flakey_drop_and_remount() loads the drop-writes table, it makes all writes silently fail. Writes are reported to the fs as completed successfully, but they are not issued to the backing store. The filesystem sees the successful write completion and marks the metadata buffer clean and removes it from the AIL. If this happens at the same time as memory pressure is occuring, the now-clean AGF and/or AGI buffers can be reclaimed from memory. Shortly afterwards, but before _flakey_drop_and_remount() runs unmount, background writeback is kicked and it tries to allocate blocks for the dirty pages in memory. This then tries to access the AGF buffer we just turfed out of memory. It's not found, so it gets read in from disk. This is all fine, except for the fact that the last writeback of the AGF did not actually reach disk. The AGF on disk is stale compared to the in-memory state held by the perag, and so they don't match and the assert fires. Then other operations on that inode hang because the task was killed whilst holding inode locks. e.g: Workqueue: xfs-conv/dm-12 xfs_end_io Call Trace: <TASK> __schedule+0x650/0xb10 schedule+0x6d/0xf0 schedule_preempt_disabled+0x15/0x30 rwsem_down_write_slowpath+0x31a/0x5f0 down_write+0x43/0x60 xfs_ilock+0x1a8/0x210 xfs_trans_alloc_inode+0x9c/0x240 xfs_iomap_write_unwritten+0xe3/0x300 xfs_end_ioend+0x90/0x130 xfs_end_io+0xce/0x100 process_scheduled_works+0x1d4/0x400 worker_thread+0x234/0x2e0 kthread+0x147/0x170 ret_from_fork+0x3e/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> and it's all down hill from there. Memory pressure is one way to trigger this, another is to run "echo 3 > /proc/sys/vm/drop_caches" randomly while tests are running. Regardless of how it is triggered, this effectively takes down the system once umount hangs because it's holding a sb->s_umount lock exclusive and now every sync(1) call gets stuck on it. Fix this by replacing the asserts with a corruption detection check and a shutdown. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-06-27xfs: xfs_ifree_cluster vs xfs_iflush_shutdown_abort deadlockDave Chinner
Lock order of xfs_ifree_cluster() is cluster buffer -> try ILOCK -> IFLUSHING, except for the last inode in the cluster that is triggering the free. In that case, the lock order is ILOCK -> cluster buffer -> IFLUSHING. xfs_iflush_cluster() uses cluster buffer -> try ILOCK -> IFLUSHING, so this can safely run concurrently with xfs_ifree_cluster(). xfs_inode_item_precommit() uses ILOCK -> cluster buffer, but this cannot race with xfs_ifree_cluster() so being in a different order will not trigger a deadlock. xfs_reclaim_inode() during a filesystem shutdown uses ILOCK -> IFLUSHING -> cluster buffer via xfs_iflush_shutdown_abort(), and this deadlocks against xfs_ifree_cluster() like so: sysrq: Show Blocked State task:kworker/10:37 state:D stack:12560 pid:276182 tgid:276182 ppid:2 flags:0x00004000 Workqueue: xfs-inodegc/dm-3 xfs_inodegc_worker Call Trace: <TASK> __schedule+0x650/0xb10 schedule+0x6d/0xf0 schedule_timeout+0x8b/0x180 schedule_timeout_uninterruptible+0x1e/0x30 xfs_ifree+0x326/0x730 xfs_inactive_ifree+0xcb/0x230 xfs_inactive+0x2c8/0x380 xfs_inodegc_worker+0xaa/0x180 process_scheduled_works+0x1d4/0x400 worker_thread+0x234/0x2e0 kthread+0x147/0x170 ret_from_fork+0x3e/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> task:fsync-tester state:D stack:12160 pid:2255943 tgid:2255943 ppid:3988702 flags:0x00004006 Call Trace: <TASK> __schedule+0x650/0xb10 schedule+0x6d/0xf0 schedule_timeout+0x31/0x180 __down_common+0xbe/0x1f0 __down+0x1d/0x30 down+0x48/0x50 xfs_buf_lock+0x3d/0xe0 xfs_iflush_shutdown_abort+0x51/0x1e0 xfs_icwalk_ag+0x386/0x690 xfs_reclaim_inodes_nr+0x114/0x160 xfs_fs_free_cached_objects+0x19/0x20 super_cache_scan+0x17b/0x1a0 do_shrink_slab+0x180/0x350 shrink_slab+0xf8/0x430 drop_slab+0x97/0xf0 drop_caches_sysctl_handler+0x59/0xc0 proc_sys_call_handler+0x189/0x280 proc_sys_write+0x13/0x20 vfs_write+0x33d/0x3f0 ksys_write+0x7c/0xf0 __x64_sys_write+0x1b/0x30 x64_sys_call+0x271d/0x2ee0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x76/0x7e We can't change the lock order of xfs_ifree_cluster() - XFS_ISTALE and XFS_IFLUSHING are serialised through to journal IO completion by the cluster buffer lock being held. There's quite a few asserts in the code that check that XFS_ISTALE does not occur out of sync with buffer locking (e.g. in xfs_iflush_cluster). There's also a dependency on the inode log item being removed from the buffer before XFS_IFLUSHING is cleared, also with asserts that trigger on this. Further, we don't have a requirement for the inode to be locked when completing or aborting inode flushing because all the inode state updates are serialised by holding the cluster buffer lock across the IO to completion. We can't check for XFS_IRECLAIM in xfs_ifree_mark_inode_stale() and skip the inode, because there is no guarantee that the inode will be reclaimed. Hence it *must* be marked XFS_ISTALE regardless of whether reclaim is preparing to free that inode. Similarly, we can't check for IFLUSHING before locking the inode because that would result in dirty inodes not being marked with ISTALE in the event of racing with XFS_IRECLAIM. Hence we have to address this issue from the xfs_reclaim_inode() side. It is clear that we cannot hold the inode locked here when calling xfs_iflush_shutdown_abort() because it is the inode->buffer lock order that causes the deadlock against xfs_ifree_cluster(). Hence we need to drop the ILOCK before aborting the inode in the shutdown case. Once we've aborted the inode, we can grab the ILOCK again and then immediately reclaim it as it is now guaranteed to be clean. Note that dropping the ILOCK in xfs_reclaim_inode() means that it can now be locked by xfs_ifree_mark_inode_stale() and seen whilst in this state. This is safe because we have left the XFS_IFLUSHING flag on the inode and so xfs_ifree_mark_inode_stale() will simply set XFS_ISTALE and move to the next inode. An ASSERT check in this path needs to be tweaked to take into account this new shutdown interaction. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-06-26Merge tag 'bcachefs-2025-06-26' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: - Lots of small check/repair fixes, primarily in subvol loop and directory structure loop (when involving snapshots). - Fix a few 6.16 regressions: rare UAF in the foreground allocator path when taking a transaction restart from the transaction bump allocator, and some small fallout from the change to log the error being corrected in the journal when repairing errors, also some fallout from the btree node read error logging improvements. (Alan, Bharadwaj) - New option: journal_rewind This lets the entire filesystem be reset to an earlier point in time. Note that this is only a disaster recovery tool, and right now there are major caveats to using it (discards should be disabled, in particular), but it successfully restored the filesystem of one of the users who was bit by the subvolume deletion bug and didn't have backups. I'll likely be making some changes to the discard path in the future to make this a reliable recovery tool. - Some new btree iterator tracepoints, for tracking down some livelock-ish behaviour we've been seeing in the main data write path. * tag 'bcachefs-2025-06-26' of git://evilpiepirate.org/bcachefs: (51 commits) bcachefs: Plumb correct ip to trans_relock_fail tracepoint bcachefs: Ensure we rewind to run recovery passes bcachefs: Ensure btree node scan runs before checking for scanned nodes bcachefs: btree_root_unreadable_and_scan_found_nothing should not be autofix bcachefs: fix bch2_journal_keys_peek_prev_min() underflow bcachefs: Use wait_on_allocator() when allocating journal bcachefs: Check for bad write buffer key when moving from journal bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blocked bcachefs: Fix range in bch2_lookup_indirect_extent() error path bcachefs: fix spurious error_throw bcachefs: Add missing bch2_err_class() to fileattr_set() bcachefs: Add missing key type checks to check_snapshot_exists() bcachefs: Don't log fsck err in the journal if doing repair elsewhere bcachefs: Fix *__bch2_trans_subbuf_alloc() error path bcachefs: Fix missing newlines before ero bcachefs: fix spurious error in read_btree_roots() bcachefs: fsck: Fix oops in key_visible_in_snapshot() bcachefs: fsck: fix unhandled restart in topology repair bcachefs: fsck: Fix check_directory_structure when no check_dirents bcachefs: Fix restart handling in btree_node_scrub_work() ...
2025-06-26NFSv4/flexfiles: Fix handling of NFS level errors in I/OTrond Myklebust
Allow the flexfiles error handling to recognise NFS level errors (as opposed to RPC level errors) and handle them separately. The main motivator is the NFSERR_PERM errors that get returned if the NFS client connects to the data server through a port number that is lower than 1024. In that case, the client should disconnect and retry a READ on a different data server, or it should retry a WRITE after reconnecting. Reviewed-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-06-26cifs: Fix reading into an ITER_FOLIOQ from the smbdirect codeDavid Howells
When performing a file read from RDMA, smbd_recv() prints an "Invalid msg type 4" error and fails the I/O. This is due to the switch-statement there not handling the ITER_FOLIOQ handed down from netfslib. Fix this by collapsing smbd_recv_buf() and smbd_recv_page() into smbd_recv() and just using copy_to_iter() instead of memcpy(). This future-proofs the function too, in case more ITER_* types are added. Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: Tom Talpey <tom@talpey.com> cc: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-26cifs: Fix the smbd_response slab to allow usercopyDavid Howells
The handling of received data in the smbdirect client code involves using copy_to_iter() to copy data from the smbd_reponse struct's packet trailer to a folioq buffer provided by netfslib that encapsulates a chunk of pagecache. If, however, CONFIG_HARDENED_USERCOPY=y, this will result in the checks then performed in copy_to_iter() oopsing with something like the following: CIFS: Attempting to mount //172.31.9.1/test CIFS: VFS: RDMA transport established usercopy: Kernel memory exposure attempt detected from SLUB object 'smbd_response_0000000091e24ea1' (offset 81, size 63)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102! ... RIP: 0010:usercopy_abort+0x6c/0x80 ... Call Trace: <TASK> __check_heap_object+0xe3/0x120 __check_object_size+0x4dc/0x6d0 smbd_recv+0x77f/0xfe0 [cifs] cifs_readv_from_socket+0x276/0x8f0 [cifs] cifs_read_from_socket+0xcd/0x120 [cifs] cifs_demultiplex_thread+0x7e9/0x2d50 [cifs] kthread+0x396/0x830 ret_from_fork+0x2b8/0x3b0 ret_from_fork_asm+0x1a/0x30 The problem is that the smbd_response slab's packet field isn't marked as being permitted for usercopy. Fix this by passing parameters to kmem_slab_create() to indicate that copy_to_iter() is permitted from the packet region of the smbd_response slab objects, less the header space. Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Reported-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/acb7f612-df26-4e2a-a35d-7cd040f513e1@samba.org/ Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Stefan Metzmacher <metze@samba.org> Tested-by: Stefan Metzmacher <metze@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-26smb: client: fix potential deadlock when reconnecting channelsPaulo Alcantara
Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order and prevent the following deadlock from happening ====================================================== WARNING: possible circular locking dependency detected 6.16.0-rc3-build2+ #1301 Tainted: G S W ------------------------------------------------------ cifsd/6055 is trying to acquire lock: ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200 but task is already holding lock: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&ret_buf->chan_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_setup_session+0x81/0x4b0 cifs_get_smb_ses+0x771/0x900 cifs_mount_get_session+0x7e/0x170 cifs_mount+0x92/0x2d0 cifs_smb3_do_mount+0x161/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&ret_buf->ses_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_match_super+0x101/0x320 sget+0xab/0x270 cifs_smb3_do_mount+0x1e0/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}: check_noncircular+0x95/0xc0 check_prev_add+0x115/0x2f0 validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_signal_cifsd_for_reconnect+0x134/0x200 __cifs_reconnect+0x8f/0x500 cifs_handle_standard+0x112/0x280 cifs_demultiplex_thread+0x64d/0xbc0 kthread+0x2f7/0x310 ret_from_fork+0x2a/0x230 ret_from_fork_asm+0x1a/0x30 other info that might help us debug this: Chain exists of: &tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ret_buf->chan_lock); lock(&ret_buf->ses_lock); lock(&ret_buf->chan_lock); lock(&tcp_ses->srv_lock); *** DEADLOCK *** 3 locks held by cifsd/6055: #0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200 #1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200 #2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 Cc: linux-cifs@vger.kernel.org Reported-by: David Howells <dhowells@redhat.com> Fixes: d7d7a66aacd6 ("cifs: avoid use of global locks for high contention data") Reviewed-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-26bcachefs: Plumb correct ip to trans_relock_fail tracepointKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-26bcachefs: Ensure we rewind to run recovery passesKent Overstreet
Fix a 6.16 regression from the recovery pass rework, which introduced a bug where calling bch2_run_explicit_recovery_pass() would only return the error code to rewind recovery for the first call that scheduled that recovery pass. If the error code from the first call was swallowed (because it was called by an asynchronous codepath), subsequent calls would go "ok, this pass is already marked as needing to run" and return 0. Fixing this ensures that check_topology bails out to run btree_node_scan before doing any repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-26bcachefs: Ensure btree node scan runs before checking for scanned nodesKent Overstreet
Previously, calling bch2_btree_has_scanned_nodes() when btree node scan hadn't actually run would erroniously return false - causing us to think a btree was entirely gone. This fixes a 6.16 regression from moving the scheduling of btree node scan out of bch2_btree_lost_data() (fixing the bug where we'd schedule it persistently in the superblock) and only scheduling it when check_toploogy() is asking for scanned btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-26bcachefs: btree_root_unreadable_and_scan_found_nothing should not be autofixKent Overstreet
Autofix is specified in btree_gc.c if it's not an important btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-25Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull mount fixes from Al Viro: "Several mount-related fixes" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: userns and mnt_idmap leak in open_tree_attr(2) attach_recursive_mnt(): do not lock the covering tree when sliding something under it replace collect_mounts()/drop_collected_mounts() with a safer variant
2025-06-25fuse: fix runtime warning on truncate_folio_batch_exceptionals()Haiyue Wang
The WARN_ON_ONCE is introduced on truncate_folio_batch_exceptionals() to capture whether the filesystem has removed all DAX entries or not. And the fix has been applied on the filesystem xfs and ext4 by the commit 0e2f80afcfa6 ("fs/dax: ensure all pages are idle prior to filesystem unmount"). Apply the missed fix on filesystem fuse to fix the runtime warning: [ 2.011450] ------------[ cut here ]------------ [ 2.011873] WARNING: CPU: 0 PID: 145 at mm/truncate.c:89 truncate_folio_batch_exceptionals+0x272/0x2b0 [ 2.012468] Modules linked in: [ 2.012718] CPU: 0 UID: 1000 PID: 145 Comm: weston Not tainted 6.16.0-rc2-WSL2-STABLE #2 PREEMPT(undef) [ 2.013292] RIP: 0010:truncate_folio_batch_exceptionals+0x272/0x2b0 [ 2.013704] Code: 48 63 d0 41 29 c5 48 8d 1c d5 00 00 00 00 4e 8d 6c 2a 01 49 c1 e5 03 eb 09 48 83 c3 08 49 39 dd 74 83 41 f6 44 1c 08 01 74 ef <0f> 0b 49 8b 34 1e 48 89 ef e8 10 a2 17 00 eb df 48 8b 7d 00 e8 35 [ 2.014845] RSP: 0018:ffffa47ec33f3b10 EFLAGS: 00010202 [ 2.015279] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 2.015884] RDX: 0000000000000000 RSI: ffffa47ec33f3ca0 RDI: ffff98aa44f3fa80 [ 2.016377] RBP: ffff98aa44f3fbf0 R08: ffffa47ec33f3ba8 R09: 0000000000000000 [ 2.016942] R10: 0000000000000001 R11: 0000000000000000 R12: ffffa47ec33f3ca0 [ 2.017437] R13: 0000000000000008 R14: ffffa47ec33f3ba8 R15: 0000000000000000 [ 2.017972] FS: 000079ce006afa40(0000) GS:ffff98aade441000(0000) knlGS:0000000000000000 [ 2.018510] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.018987] CR2: 000079ce03e74000 CR3: 000000010784f006 CR4: 0000000000372eb0 [ 2.019518] Call Trace: [ 2.019729] <TASK> [ 2.019901] truncate_inode_pages_range+0xd8/0x400 [ 2.020280] ? timerqueue_add+0x66/0xb0 [ 2.020574] ? get_nohz_timer_target+0x2a/0x140 [ 2.020904] ? timerqueue_add+0x66/0xb0 [ 2.021231] ? timerqueue_del+0x2e/0x50 [ 2.021646] ? __remove_hrtimer+0x39/0x90 [ 2.022017] ? srso_alias_untrain_ret+0x1/0x10 [ 2.022497] ? psi_group_change+0x136/0x350 [ 2.023046] ? _raw_spin_unlock+0xe/0x30 [ 2.023514] ? finish_task_switch.isra.0+0x8d/0x280 [ 2.024068] ? __schedule+0x532/0xbd0 [ 2.024551] fuse_evict_inode+0x29/0x190 [ 2.025131] evict+0x100/0x270 [ 2.025641] ? _atomic_dec_and_lock+0x39/0x50 [ 2.026316] ? __pfx_generic_delete_inode+0x10/0x10 [ 2.026843] __dentry_kill+0x71/0x180 [ 2.027335] dput+0xeb/0x1b0 [ 2.027725] __fput+0x136/0x2b0 [ 2.028054] __x64_sys_close+0x3d/0x80 [ 2.028469] do_syscall_64+0x6d/0x1b0 [ 2.028832] ? clear_bhb_loop+0x30/0x80 [ 2.029182] ? clear_bhb_loop+0x30/0x80 [ 2.029533] ? clear_bhb_loop+0x30/0x80 [ 2.029902] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 2.030423] RIP: 0033:0x79ce03d0d067 [ 2.030820] Code: b8 ff ff ff ff e9 3e ff ff ff 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 c3 a7 f8 ff [ 2.032354] RSP: 002b:00007ffef0498948 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 2.032939] RAX: ffffffffffffffda RBX: 00007ffef0498960 RCX: 000079ce03d0d067 [ 2.033612] RDX: 0000000000000003 RSI: 0000000000001000 RDI: 000000000000000d [ 2.034289] RBP: 00007ffef0498a30 R08: 000000000000000d R09: 0000000000000000 [ 2.034944] R10: 00007ffef0498978 R11: 0000000000000246 R12: 0000000000000001 [ 2.035610] R13: 00007ffef0498960 R14: 000079ce03e09ce0 R15: 0000000000000003 [ 2.036301] </TASK> [ 2.036532] ---[ end trace 0000000000000000 ]--- Link: https://lkml.kernel.org/r/20250621171507.3770-1-haiyuewa@163.com Fixes: bde708f1a65d ("fs/dax: always remove DAX page-cache entries when breaking layouts") Signed-off-by: Haiyue Wang <haiyuewa@163.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-25fs/proc/task_mmu: fix PAGE_IS_PFNZERO detection for the huge zero folioDavid Hildenbrand
is_zero_pfn() does not work for the huge zero folio. Fix it by using is_huge_zero_pmd(). This can cause the PAGEMAP_SCAN ioctl against /proc/pid/pagemap to present pages as PAGE_IS_PRESENT rather than as PAGE_IS_PFNZERO. Found by code inspection. Link: https://lkml.kernel.org/r/20250617143532.2375383-1-david@redhat.com Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs") Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-25smb: client: remove \t from TP_printk statementsStefan Metzmacher
The generate '[FAILED TO PARSE]' strings in trace-cmd report output like this: rm-5298 [001] 6084.533748493: smb3_exit_err: [FAILED TO PARSE] xid=972 func_name=cifs_rmdir rc=-39 rm-5298 [001] 6084.533959234: smb3_enter: [FAILED TO PARSE] xid=973 func_name=cifs_closedir rm-5298 [001] 6084.533967630: smb3_close_enter: [FAILED TO PARSE] xid=973 fid=94489281833 tid=1 sesid=96758029877361 rm-5298 [001] 6084.534004008: smb3_cmd_enter: [FAILED TO PARSE] tid=1 sesid=96758029877361 cmd=6 mid=566 rm-5298 [001] 6084.552248232: smb3_cmd_done: [FAILED TO PARSE] tid=1 sesid=96758029877361 cmd=6 mid=566 rm-5298 [001] 6084.552280542: smb3_close_done: [FAILED TO PARSE] xid=973 fid=94489281833 tid=1 sesid=96758029877361 rm-5298 [001] 6084.552316034: smb3_exit_done: [FAILED TO PARSE] xid=973 func_name=cifs_closedir Cc: stable@vger.kernel.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-25smb: client: let smbd_post_send_iter() respect the peers max_send_size and ↵Stefan Metzmacher
transmit all data We should not send smbdirect_data_transfer messages larger than the negotiated max_send_size, typically 1364 bytes, which means 24 bytes of the smbdirect_data_transfer header + 1340 payload bytes. This happened when doing an SMB2 write with more than 1340 bytes (which is done inline as it's below rdma_readwrite_threshold). It means the peer resets the connection. When testing between cifs.ko and ksmbd.ko something like this is logged: client: CIFS: VFS: RDMA transport re-established siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 CIFS: VFS: \\carina Send error in SessSetup = -11 smb2_reconnect: 12 callbacks suppressed CIFS: VFS: reconnect tcon failed rc = -11 CIFS: VFS: reconnect tcon failed rc = -11 CIFS: VFS: reconnect tcon failed rc = -11 CIFS: VFS: SMB: Zero rsize calculated, using minimum value 65536 and: CIFS: VFS: RDMA transport re-established siw: got TERMINATE. layer 1, type 2, code 2 CIFS: VFS: smbd_recv:1894 disconnected siw: got TERMINATE. layer 1, type 2, code 2 The ksmbd dmesg is showing things like: smb_direct: Recv error. status='local length error (1)' opcode=128 smb_direct: disconnected smb_direct: Recv error. status='local length error (1)' opcode=128 ksmbd: smb_direct: disconnected ksmbd: sock_read failed: -107 As smbd_post_send_iter() limits the transmitted number of bytes we need loop over it in order to transmit the whole iter. Reviewed-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Tested-by: Meetakshi Setiya <msetiya@microsoft.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: <stable+noautosel@kernel.org> # sp->max_send_size should be info->max_send_size in backports Fixes: 3d78fe73fa12 ("cifs: Build the RDMA SGE list directly from an iterator") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-24bcachefs: fix bch2_journal_keys_peek_prev_min() underflowKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24bcachefs: Use wait_on_allocator() when allocating journalKent Overstreet
wait_on_allocator() emits debug info when we hang trying to allocate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24bcachefs: Check for bad write buffer key when moving from journalKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blockedAlan Huang
Reported-by: syzbot+d540192e763531d307ff@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24userns and mnt_idmap leak in open_tree_attr(2)Al Viro
Once want_mount_setattr() has returned a positive, it does require finish_mount_kattr() to release ->mnt_userns. Failing do_mount_setattr() does not change that. As the result, we can end up leaking userns and possibly mnt_idmap as well. Fixes: c4a16820d901 ("fs: add open_tree_attr()") Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-24fuse: fix fuse_fill_write_pages() upper bound calculationJoanne Koong
This fixes a bug in commit 63c69ad3d18a ("fuse: refactor fuse_fill_write_pages()") where max_pages << PAGE_SHIFT is mistakenly used as the calculation for the max_pages upper limit but there's the possibility that copy_folio_from_iter_atomic() may copy over bytes from the iov_iter that are less than the full length of the folio, which would lead to exceeding max_pages. This commit fixes it by adding a 'ap->num_folios < max_folios' check. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/20250614000114.910380-1-joannelkoong@gmail.com Fixes: 63c69ad3d18a ("fuse: refactor fuse_fill_write_pages()") Tested-by: Brian Foster <bfoster@redhat.com> Reported-by: Brian Foster <bfoster@redhat.com> Closes: https://lore.kernel.org/linux-fsdevel/aEq4haEQScwHIWK6@bfoster/ Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-23Merge tag 'f2fs-for-6.16-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: - fix double-unlock introduced by the recent folio conversion - fix stale page content beyond EOF complained by xfstests/generic/363 * tag 'f2fs-for-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: fix to zero post-eof page f2fs: Fix __write_node_folio() conversion
2025-06-23Merge tag 'for-6.16-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Fixes: - fix invalid inode pointer dereferences during log replay - fix a race between renames and directory logging - fix shutting down delayed iput worker - fix device byte accounting when dropping chunk - in zoned mode, fix offset calculations for DUP profile when conventional and sequential zones are used together Regression fixes: - fix possible double unlock of extent buffer tree (xarray conversion) - in zoned mode, fix extent buffer refcount when writing out extents (xarray conversion) Error handling fixes and updates: - handle unexpected extent type when replaying log - check and warn if there are remaining delayed inodes when putting a root - fix assertion when building free space tree - handle csum tree error with mount option 'rescue=ibadroot' Other: - error message updates: add prefix to all scrub related messages, include other information in messages" * tag 'for-6.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix alloc_offset calculation for partly conventional block groups btrfs: handle csum tree error with rescue=ibadroots correctly btrfs: fix race between async reclaim worker and close_ctree() btrfs: fix assertion when building free space tree btrfs: don't silently ignore unexpected extent type when replaying log btrfs: fix invalid inode pointer dereferences during log replay btrfs: fix double unlock of buffer_tree xarray when releasing subpage eb btrfs: update superblock's device bytes_used when dropping chunk btrfs: fix a race between renames and directory logging btrfs: scrub: add prefix for the error messages btrfs: warn if leaking delayed_nodes in btrfs_put_root() btrfs: fix delayed ref refcount leak in debug assertion btrfs: include root in error message when unlinking inode btrfs: don't drop a reference if btrfs_check_write_meta_pointer() fails
2025-06-23attach_recursive_mnt(): do not lock the covering tree when sliding something ↵Al Viro
under it If we are propagating across the userns boundary, we need to lock the mounts added there. However, in case when something has already been mounted there and we end up sliding a new tree under that, the stuff that had been there before should not get locked. IOW, lock_mnt_tree() should be called before we reparent the preexisting tree on top of what we are adding. Fixes: 3bd045cc9c4b ("separate copying and locking mount tree on cross-userns copies") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-23replace collect_mounts()/drop_collected_mounts() with a safer variantAl Viro
collect_mounts() has several problems - one can't iterate over the results directly, so it has to be done with callback passed to iterate_mounts(); it has an oopsable race with d_invalidate(); it creates temporary clones of mounts invisibly for sync umount (IOW, you can have non-lazy umount succeed leaving filesystem not mounted anywhere and yet still busy). A saner approach is to give caller an array of struct path that would pin every mount in a subtree, without cloning any mounts. * collect_mounts()/drop_collected_mounts()/iterate_mounts() is gone * collect_paths(where, preallocated, size) gives either ERR_PTR(-E...) or a pointer to array of struct path, one for each chunk of tree visible under 'where' (i.e. the first element is a copy of where, followed by (mount,root) for everything mounted under it - the same set collect_mounts() would give). Unlike collect_mounts(), the mounts are *not* cloned - we just get pinning references to the roots of subtrees in the caller's namespace. Array is terminated by {NULL, NULL} struct path. If it fits into preallocated array (on-stack, normally), that's where it goes; otherwise it's allocated by kmalloc_array(). Passing 0 as size means that 'preallocated' is ignored (and expected to be NULL). * drop_collected_paths(paths, preallocated) is given the array returned by an earlier call of collect_paths() and the preallocated array passed to that call. All mount/dentry references are dropped and array is kfree'd if it's not equal to 'preallocated'. * instead of iterate_mounts(), users should just iterate over array of struct path - nothing exotic is needed for that. Existing users (all in audit_tree.c) are converted. [folded a fix for braino reported by Venkat Rao Bagalkote <venkat88@linux.ibm.com>] Fixes: 80b5dce8c59b0 ("vfs: Add a function to lazily unmount all mounts from any dentry") Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-23NFSv4/pNFS: Fix a race to wake on NFS_LAYOUT_DRAINBenjamin Coddington
We found a few different systems hung up in writeback waiting on the same page lock, and one task waiting on the NFS_LAYOUT_DRAIN bit in pnfs_update_layout(), however the pnfs_layout_hdr's plh_outstanding count was zero. It seems most likely that this is another race between the waiter and waker similar to commit ed0172af5d6f ("SUNRPC: Fix a race to wake a sync task"). Fix it up by applying the advised barrier. Fixes: 880265c77ac4 ("pNFS: Avoid a live lock condition in pnfs_update_layout()") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-06-23nfs: Clean up /proc/net/rpc/nfs when nfs_fs_proc_net_init() fails.Kuniyuki Iwashima
syzbot reported a warning below [1] following a fault injection in nfs_fs_proc_net_init(). [0] When nfs_fs_proc_net_init() fails, /proc/net/rpc/nfs is not removed. Later, rpc_proc_exit() tries to remove /proc/net/rpc, and the warning is logged as the directory is not empty. Let's handle the error of nfs_fs_proc_net_init() properly. [0]: FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 1 UID: 0 PID: 6120 Comm: syz.2.27 Not tainted 6.16.0-rc1-syzkaller-00010-g2c4a1f3fe03e #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) should_fail_ex (lib/fault-inject.c:73 lib/fault-inject.c:174) should_failslab (mm/failslab.c:46) kmem_cache_alloc_noprof (mm/slub.c:4178 mm/slub.c:4204) __proc_create (fs/proc/generic.c:427) proc_create_reg (fs/proc/generic.c:554) proc_create_net_data (fs/proc/proc_net.c:120) nfs_fs_proc_net_init (fs/nfs/client.c:1409) nfs_net_init (fs/nfs/inode.c:2600) ops_init (net/core/net_namespace.c:138) setup_net (net/core/net_namespace.c:443) copy_net_ns (net/core/net_namespace.c:576) create_new_namespaces (kernel/nsproxy.c:110) unshare_nsproxy_namespaces (kernel/nsproxy.c:218 (discriminator 4)) ksys_unshare (kernel/fork.c:3123) __x64_sys_unshare (kernel/fork.c:3190) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) </TASK> [1]: remove_proc_entry: removing non-empty directory 'net/rpc', leaking at least 'nfs' WARNING: CPU: 1 PID: 6120 at fs/proc/generic.c:727 remove_proc_entry+0x45e/0x530 fs/proc/generic.c:727 Modules linked in: CPU: 1 UID: 0 PID: 6120 Comm: syz.2.27 Not tainted 6.16.0-rc1-syzkaller-00010-g2c4a1f3fe03e #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 RIP: 0010:remove_proc_entry+0x45e/0x530 fs/proc/generic.c:727 Code: 3c 02 00 0f 85 85 00 00 00 48 8b 93 d8 00 00 00 4d 89 f0 4c 89 e9 48 c7 c6 40 ba a2 8b 48 c7 c7 60 b9 a2 8b e8 33 81 1d ff 90 <0f> 0b 90 90 e9 5f fe ff ff e8 04 69 5e ff 90 48 b8 00 00 00 00 00 RSP: 0018:ffffc90003637b08 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88805f534140 RCX: ffffffff817a92c8 RDX: ffff88807da99e00 RSI: ffffffff817a92d5 RDI: 0000000000000001 RBP: ffff888033431ac0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888033431a00 R13: ffff888033431ae4 R14: ffff888033184724 R15: dffffc0000000000 FS: 0000555580328500(0000) GS:ffff888124a62000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f71733743e0 CR3: 000000007f618000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> sunrpc_exit_net+0x46/0x90 net/sunrpc/sunrpc_syms.c:76 ops_exit_list net/core/net_namespace.c:200 [inline] ops_undo_list+0x2eb/0xab0 net/core/net_namespace.c:253 setup_net+0x2e1/0x510 net/core/net_namespace.c:457 copy_net_ns+0x2a6/0x5f0 net/core/net_namespace.c:574 create_new_namespaces+0x3ea/0xa90 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:218 ksys_unshare+0x45b/0xa40 kernel/fork.c:3121 __do_sys_unshare kernel/fork.c:3192 [inline] __se_sys_unshare kernel/fork.c:3190 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:3190 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x490 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa1a6b8e929 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fff3a090368 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00007fa1a6db5fa0 RCX: 00007fa1a6b8e929 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000080 RBP: 00007fa1a6c10b39 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fa1a6db5fa0 R14: 00007fa1a6db5fa0 R15: 0000000000000001 </TASK> Fixes: d47151b79e32 ("nfs: expose /proc/net/sunrpc/nfs in net namespaces") Reported-by: syzbot+a4cc4ac22daa4a71b87c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a4cc4ac22daa4a71b87c Tested-by: syzbot+a4cc4ac22daa4a71b87c@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-06-23smb: client: fix regression with native SMB symlinksPaulo Alcantara
Some users and customers reported that their backup/copy tools started to fail when the directory being copied contained symlink targets that the client couldn't parse - even when those symlinks weren't followed. Fix this by allowing lstat(2) and readlink(2) to succeed even when the client can't resolve the symlink target, restoring old behavior. Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Reported-by: Remy Monsen <monsen@monsen.cc> Closes: https://lore.kernel.org/r/CAN+tdP7y=jqw3pBndZAGjQv0ObFq8Q=+PUDHgB36HdEz9QA6FQ@mail.gmail.com Reported-by: Pierguido Lambri <plambri@redhat.com> Fixes: 12b466eb52d9 ("cifs: Fix creating and resolving absolute NT-style symlinks") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-23fs: export anon_inode_make_secure_inode() and fix secretmem LSM bypassShivank Garg
Export anon_inode_make_secure_inode() to allow KVM guest_memfd to create anonymous inodes with proper security context. This replaces the current pattern of calling alloc_anon_inode() followed by inode_init_security_anon() for creating security context manually. This change also fixes a security regression in secretmem where the S_PRIVATE flag was not cleared after alloc_anon_inode(), causing LSM/SELinux checks to be bypassed for secretmem file descriptors. As guest_memfd currently resides in the KVM module, we need to export this symbol for use outside the core kernel. In the future, guest_memfd might be moved to core-mm, at which point the symbols no longer would have to be exported. When/if that happens is still unclear. Fixes: 2bfe15c52612 ("mm: create security context for memfd_secret inodes") Suggested-by: David Hildenbrand <david@redhat.com> Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Shivank Garg <shivankg@amd.com> Link: https://lore.kernel.org/20250620070328.803704-3-shivankg@amd.com Acked-by: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-22Merge tag 'x86_urgent_for_v6.16_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Make sure the array tracking which kernel text positions need to be alternatives-patched doesn't get mishandled by out-of-order modifications, leading to it overflowing and causing page faults when patching - Avoid an infinite loop when early code does a ranged TLB invalidation before the broadcast TLB invalidation count of how many pages it can flush, has been read from CPUID - Fix a CONFIG_MODULES typo - Disable broadcast TLB invalidation when PTI is enabled to avoid an overflow of the bitmap tracking dynamic ASIDs which need to be flushed when the kernel switches between the user and kernel address space - Handle the case of a CPU going offline and thus reporting zeroes when reading top-level events in the resctrl code * tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives: Fix int3 handling failure from broken text_poke array x86/mm: Fix early boot use of INVPLGB x86/its: Fix an ifdef typo in its_alloc() x86/mm: Disable INVLPGB when PTI is enabled x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem
2025-06-22Merge tag 'v6.16-rc2-smb3-client-fixes-v2' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Multichannel channel allocation fix for Kerberos mounts - Two reconnect fixes - Fix netfs_writepages crash with smbdirect/RDMA - Directory caching fix - Three minor cleanup fixes - Log error when close cached dirs fails * tag 'v6.16-rc2-smb3-client-fixes-v2' of git://git.samba.org/sfrench/cifs-2.6: smb: minor fix to use SMB2_NTLMV2_SESSKEY_SIZE for auth_key size smb: minor fix to use sizeof to initialize flags_string buffer smb: Use loff_t for directory position in cached_dirents smb: Log an error when close_all_cached_dirs fails cifs: Fix prepare_write to negotiate wsize if needed smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma() smb: client: fix first command failure during re-negotiation cifs: Remove duplicate fattr->cf_dtype assignment from wsl_to_fattr() function smb: fix secondary channel creation issue with kerberos by populating hostname when adding channels
2025-06-22bcachefs: Fix range in bch2_lookup_indirect_extent() error pathKent Overstreet
Before calling bch2_indirect_extent_missing_error(), we have to calculate the missing range, which is the intersection of the reflink pointer and the non-indirect-extent we found. The calculation didn't take into account that the returned extent may span the iter position, leading to an infinite loop when we (unnecessarily) resized the extent we were returning to one that didn't extend past the offset we were looking up. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-22bcachefs: fix spurious error_throwKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-22bcachefs: Add missing bch2_err_class() to fileattr_set()Kent Overstreet
Make sure we return a standard error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-21Merge tag 'nfsd-6.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Two fixes for commits in the nfsd-6.16 merge - One fix for the recently-added NFSD netlink facility - One fix for a remote SunRPC crasher * tag 'nfsd-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: sunrpc: handle SVC_GARBAGE during svc auth processing as auth error nfsd: use threads array as-is in netlink interface SUNRPC: Cleanup/fix initial rq_pages allocation NFSD: Avoid corruption of a referring call list
2025-06-21Merge tag 'erofs-for-6.16-rc3-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Use the mounter’s credentials for file-backed mounts to resolve Android SELinux permission issues - Remove the unused trace event `erofs_destroy_inode` - Error out on crafted out-of-file-range encoded extents - Remove an incorrect check for encoded extents * tag 'erofs-for-6.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: remove a superfluous check for encoded extents erofs: refuse crafted out-of-file-range encoded extents erofs: remove unused trace event erofs_destroy_inode erofs: impersonate the opener's credentials when accessing backing file
2025-06-21smb: minor fix to use SMB2_NTLMV2_SESSKEY_SIZE for auth_key sizeBharath SM
Replaced hardcoded value 16 with SMB2_NTLMV2_SESSKEY_SIZE in the auth_key definition and memcpy call. Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-21smb: minor fix to use sizeof to initialize flags_string bufferBharath SM
Replaced hardcoded length with sizeof(flags_string). Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-21smb: Use loff_t for directory position in cached_direntsBharath SM
Change the pos field in struct cached_dirents from int to loff_t to support large directory offsets. This avoids overflow and matches kernel conventions for directory positions. Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-21smb: Log an error when close_all_cached_dirs failsPaul Aurich
Under low-memory conditions, close_all_cached_dirs() can't move the dentries to a separate list to dput() them once the locks are dropped. This will result in a "Dentry still in use" error, so add an error message that makes it clear this is what happened: [ 495.281119] CIFS: VFS: \\otters.example.com\share Out of memory while dropping dentries [ 495.281595] ------------[ cut here ]------------ [ 495.281887] BUG: Dentry ffff888115531138{i=78,n=/} still in use (2) [unmount of cifs cifs] [ 495.282391] WARNING: CPU: 1 PID: 2329 at fs/dcache.c:1536 umount_check+0xc8/0xf0 Also, bail out of looping through all tcons as soon as a single allocation fails, since we're already in trouble, and kmalloc() attempts for subseqeuent tcons are likely to fail just like the first one did. Signed-off-by: Paul Aurich <paul@darkrain42.org> Acked-by: Bharath SM <bharathsm@microsoft.com> Suggested-by: Ruben Devos <rdevos@oxya.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-21cifs: Fix prepare_write to negotiate wsize if neededDavid Howells
Fix cifs_prepare_write() to negotiate the wsize if it is unset. Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Paulo Alcantara <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-21smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma()Stefan Metzmacher
This fixes the following problem: [ 749.901015] [ T8673] run fstests cifs/001 at 2025-06-17 09:40:30 [ 750.346409] [ T9870] ================================================================== [ 750.346814] [ T9870] BUG: KASAN: slab-out-of-bounds in smb_set_sge+0x2cc/0x3b0 [cifs] [ 750.347330] [ T9870] Write of size 8 at addr ffff888011082890 by task xfs_io/9870 [ 750.347705] [ T9870] [ 750.348077] [ T9870] CPU: 0 UID: 0 PID: 9870 Comm: xfs_io Kdump: loaded Not tainted 6.16.0-rc2-metze.02+ #1 PREEMPT(voluntary) [ 750.348082] [ T9870] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 750.348085] [ T9870] Call Trace: [ 750.348086] [ T9870] <TASK> [ 750.348088] [ T9870] dump_stack_lvl+0x76/0xa0 [ 750.348106] [ T9870] print_report+0xd1/0x640 [ 750.348116] [ T9870] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 750.348120] [ T9870] ? kasan_complete_mode_report_info+0x26/0x210 [ 750.348124] [ T9870] kasan_report+0xe7/0x130 [ 750.348128] [ T9870] ? smb_set_sge+0x2cc/0x3b0 [cifs] [ 750.348262] [ T9870] ? smb_set_sge+0x2cc/0x3b0 [cifs] [ 750.348377] [ T9870] __asan_report_store8_noabort+0x17/0x30 [ 750.348381] [ T9870] smb_set_sge+0x2cc/0x3b0 [cifs] [ 750.348496] [ T9870] smbd_post_send_iter+0x1990/0x3070 [cifs] [ 750.348625] [ T9870] ? __pfx_smbd_post_send_iter+0x10/0x10 [cifs] [ 750.348741] [ T9870] ? update_stack_state+0x2a0/0x670 [ 750.348749] [ T9870] ? cifs_flush+0x153/0x320 [cifs] [ 750.348870] [ T9870] ? cifs_flush+0x153/0x320 [cifs] [ 750.348990] [ T9870] ? update_stack_state+0x2a0/0x670 [ 750.348995] [ T9870] smbd_send+0x58c/0x9c0 [cifs] [ 750.349117] [ T9870] ? __pfx_smbd_send+0x10/0x10 [cifs] [ 750.349231] [ T9870] ? unwind_get_return_address+0x65/0xb0 [ 750.349235] [ T9870] ? __pfx_stack_trace_consume_entry+0x10/0x10 [ 750.349242] [ T9870] ? arch_stack_walk+0xa7/0x100 [ 750.349250] [ T9870] ? stack_trace_save+0x92/0xd0 [ 750.349254] [ T9870] __smb_send_rqst+0x931/0xec0 [cifs] [ 750.349374] [ T9870] ? kernel_text_address+0x173/0x190 [ 750.349379] [ T9870] ? kasan_save_stack+0x39/0x70 [ 750.349382] [ T9870] ? kasan_save_track+0x18/0x70 [ 750.349385] [ T9870] ? __kasan_slab_alloc+0x9d/0xa0 [ 750.349389] [ T9870] ? __pfx___smb_send_rqst+0x10/0x10 [cifs] [ 750.349508] [ T9870] ? smb2_mid_entry_alloc+0xb4/0x7e0 [cifs] [ 750.349626] [ T9870] ? cifs_call_async+0x277/0xb00 [cifs] [ 750.349746] [ T9870] ? cifs_issue_write+0x256/0x610 [cifs] [ 750.349867] [ T9870] ? netfs_do_issue_write+0xc2/0x340 [netfs] [ 750.349900] [ T9870] ? netfs_advance_write+0x45b/0x1270 [netfs] [ 750.349929] [ T9870] ? netfs_write_folio+0xd6c/0x1be0 [netfs] [ 750.349958] [ T9870] ? netfs_writepages+0x2e9/0xa80 [netfs] [ 750.349987] [ T9870] ? do_writepages+0x21f/0x590 [ 750.349993] [ T9870] ? filemap_fdatawrite_wbc+0xe1/0x140 [ 750.349997] [ T9870] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 750.350002] [ T9870] smb_send_rqst+0x22e/0x2f0 [cifs] [ 750.350131] [ T9870] ? __pfx_smb_send_rqst+0x10/0x10 [cifs] [ 750.350255] [ T9870] ? local_clock_noinstr+0xe/0xd0 [ 750.350261] [ T9870] ? kasan_save_alloc_info+0x37/0x60 [ 750.350268] [ T9870] ? __kasan_check_write+0x14/0x30 [ 750.350271] [ T9870] ? _raw_spin_lock+0x81/0xf0 [ 750.350275] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10 [ 750.350278] [ T9870] ? smb2_setup_async_request+0x293/0x580 [cifs] [ 750.350398] [ T9870] cifs_call_async+0x477/0xb00 [cifs] [ 750.350518] [ T9870] ? __pfx_smb2_writev_callback+0x10/0x10 [cifs] [ 750.350636] [ T9870] ? __pfx_cifs_call_async+0x10/0x10 [cifs] [ 750.350756] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10 [ 750.350760] [ T9870] ? __kasan_check_write+0x14/0x30 [ 750.350763] [ T9870] ? __smb2_plain_req_init+0x933/0x1090 [cifs] [ 750.350891] [ T9870] smb2_async_writev+0x15ff/0x2460 [cifs] [ 750.351008] [ T9870] ? sched_clock_noinstr+0x9/0x10 [ 750.351012] [ T9870] ? local_clock_noinstr+0xe/0xd0 [ 750.351018] [ T9870] ? __pfx_smb2_async_writev+0x10/0x10 [cifs] [ 750.351144] [ T9870] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 750.351150] [ T9870] ? _raw_spin_unlock+0xe/0x40 [ 750.351154] [ T9870] ? cifs_pick_channel+0x242/0x370 [cifs] [ 750.351275] [ T9870] cifs_issue_write+0x256/0x610 [cifs] [ 750.351554] [ T9870] ? cifs_issue_write+0x256/0x610 [cifs] [ 750.351677] [ T9870] netfs_do_issue_write+0xc2/0x340 [netfs] [ 750.351710] [ T9870] netfs_advance_write+0x45b/0x1270 [netfs] [ 750.351740] [ T9870] ? rolling_buffer_append+0x12d/0x440 [netfs] [ 750.351769] [ T9870] netfs_write_folio+0xd6c/0x1be0 [netfs] [ 750.351798] [ T9870] ? __kasan_check_write+0x14/0x30 [ 750.351804] [ T9870] netfs_writepages+0x2e9/0xa80 [netfs] [ 750.351835] [ T9870] ? __pfx_netfs_writepages+0x10/0x10 [netfs] [ 750.351864] [ T9870] ? exit_files+0xab/0xe0 [ 750.351867] [ T9870] ? do_exit+0x148f/0x2980 [ 750.351871] [ T9870] ? do_group_exit+0xb5/0x250 [ 750.351874] [ T9870] ? arch_do_signal_or_restart+0x92/0x630 [ 750.351879] [ T9870] ? exit_to_user_mode_loop+0x98/0x170 [ 750.351882] [ T9870] ? do_syscall_64+0x2cf/0xd80 [ 750.351886] [ T9870] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 750.351890] [ T9870] do_writepages+0x21f/0x590 [ 750.351894] [ T9870] ? __pfx_do_writepages+0x10/0x10 [ 750.351897] [ T9870] filemap_fdatawrite_wbc+0xe1/0x140 [ 750.351901] [ T9870] __filemap_fdatawrite_range+0xba/0x100 [ 750.351904] [ T9870] ? __pfx___filemap_fdatawrite_range+0x10/0x10 [ 750.351912] [ T9870] ? __kasan_check_write+0x14/0x30 [ 750.351916] [ T9870] filemap_write_and_wait_range+0x7d/0xf0 [ 750.351920] [ T9870] cifs_flush+0x153/0x320 [cifs] [ 750.352042] [ T9870] filp_flush+0x107/0x1a0 [ 750.352046] [ T9870] filp_close+0x14/0x30 [ 750.352049] [ T9870] put_files_struct.part.0+0x126/0x2a0 [ 750.352053] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10 [ 750.352058] [ T9870] exit_files+0xab/0xe0 [ 750.352061] [ T9870] do_exit+0x148f/0x2980 [ 750.352065] [ T9870] ? __pfx_do_exit+0x10/0x10 [ 750.352069] [ T9870] ? __kasan_check_write+0x14/0x30 [ 750.352072] [ T9870] ? _raw_spin_lock_irq+0x8a/0xf0 [ 750.352076] [ T9870] do_group_exit+0xb5/0x250 [ 750.352080] [ T9870] get_signal+0x22d3/0x22e0 [ 750.352086] [ T9870] ? __pfx_get_signal+0x10/0x10 [ 750.352089] [ T9870] ? fpregs_assert_state_consistent+0x68/0x100 [ 750.352101] [ T9870] ? folio_add_lru+0xda/0x120 [ 750.352105] [ T9870] arch_do_signal_or_restart+0x92/0x630 [ 750.352109] [ T9870] ? __pfx_arch_do_signal_or_restart+0x10/0x10 [ 750.352115] [ T9870] exit_to_user_mode_loop+0x98/0x170 [ 750.352118] [ T9870] do_syscall_64+0x2cf/0xd80 [ 750.352123] [ T9870] ? __kasan_check_read+0x11/0x20 [ 750.352126] [ T9870] ? count_memcg_events+0x1b4/0x420 [ 750.352132] [ T9870] ? handle_mm_fault+0x148/0x690 [ 750.352136] [ T9870] ? _raw_spin_lock_irq+0x8a/0xf0 [ 750.352140] [ T9870] ? __kasan_check_read+0x11/0x20 [ 750.352143] [ T9870] ? fpregs_assert_state_consistent+0x68/0x100 [ 750.352146] [ T9870] ? irqentry_exit_to_user_mode+0x2e/0x250 [ 750.352151] [ T9870] ? irqentry_exit+0x43/0x50 [ 750.352154] [ T9870] ? exc_page_fault+0x75/0xe0 [ 750.352160] [ T9870] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 750.352163] [ T9870] RIP: 0033:0x7858c94ab6e2 [ 750.352167] [ T9870] Code: Unable to access opcode bytes at 0x7858c94ab6b8. [ 750.352175] [ T9870] RSP: 002b:00007858c9248ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000022 [ 750.352179] [ T9870] RAX: fffffffffffffdfe RBX: 00007858c92496c0 RCX: 00007858c94ab6e2 [ 750.352182] [ T9870] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 750.352184] [ T9870] RBP: 00007858c9248d10 R08: 0000000000000000 R09: 0000000000000000 [ 750.352185] [ T9870] R10: 0000000000000000 R11: 0000000000000246 R12: fffffffffffffde0 [ 750.352187] [ T9870] R13: 0000000000000020 R14: 0000000000000002 R15: 00007ffc072d2230 [ 750.352191] [ T9870] </TASK> [ 750.352195] [ T9870] [ 750.395206] [ T9870] Allocated by task 9870 on cpu 0 at 750.346406s: [ 750.395523] [ T9870] kasan_save_stack+0x39/0x70 [ 750.395532] [ T9870] kasan_save_track+0x18/0x70 [ 750.395536] [ T9870] kasan_save_alloc_info+0x37/0x60 [ 750.395539] [ T9870] __kasan_slab_alloc+0x9d/0xa0 [ 750.395543] [ T9870] kmem_cache_alloc_noprof+0x13c/0x3f0 [ 750.395548] [ T9870] mempool_alloc_slab+0x15/0x20 [ 750.395553] [ T9870] mempool_alloc_noprof+0x135/0x340 [ 750.395557] [ T9870] smbd_post_send_iter+0x63e/0x3070 [cifs] [ 750.395694] [ T9870] smbd_send+0x58c/0x9c0 [cifs] [ 750.395819] [ T9870] __smb_send_rqst+0x931/0xec0 [cifs] [ 750.395950] [ T9870] smb_send_rqst+0x22e/0x2f0 [cifs] [ 750.396081] [ T9870] cifs_call_async+0x477/0xb00 [cifs] [ 750.396232] [ T9870] smb2_async_writev+0x15ff/0x2460 [cifs] [ 750.396359] [ T9870] cifs_issue_write+0x256/0x610 [cifs] [ 750.396492] [ T9870] netfs_do_issue_write+0xc2/0x340 [netfs] [ 750.396544] [ T9870] netfs_advance_write+0x45b/0x1270 [netfs] [ 750.396576] [ T9870] netfs_write_folio+0xd6c/0x1be0 [netfs] [ 750.396608] [ T9870] netfs_writepages+0x2e9/0xa80 [netfs] [ 750.396639] [ T9870] do_writepages+0x21f/0x590 [ 750.396643] [ T9870] filemap_fdatawrite_wbc+0xe1/0x140 [ 750.396647] [ T9870] __filemap_fdatawrite_range+0xba/0x100 [ 750.396651] [ T9870] filemap_write_and_wait_range+0x7d/0xf0 [ 750.396656] [ T9870] cifs_flush+0x153/0x320 [cifs] [ 750.396787] [ T9870] filp_flush+0x107/0x1a0 [ 750.396791] [ T9870] filp_close+0x14/0x30 [ 750.396795] [ T9870] put_files_struct.part.0+0x126/0x2a0 [ 750.396800] [ T9870] exit_files+0xab/0xe0 [ 750.396803] [ T9870] do_exit+0x148f/0x2980 [ 750.396808] [ T9870] do_group_exit+0xb5/0x250 [ 750.396813] [ T9870] get_signal+0x22d3/0x22e0 [ 750.396817] [ T9870] arch_do_signal_or_restart+0x92/0x630 [ 750.396822] [ T9870] exit_to_user_mode_loop+0x98/0x170 [ 750.396827] [ T9870] do_syscall_64+0x2cf/0xd80 [ 750.396832] [ T9870] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 750.396836] [ T9870] [ 750.397150] [ T9870] The buggy address belongs to the object at ffff888011082800 which belongs to the cache smbd_request_0000000008f3bd7b of size 144 [ 750.397798] [ T9870] The buggy address is located 0 bytes to the right of allocated 144-byte region [ffff888011082800, ffff888011082890) [ 750.398469] [ T9870] [ 750.398800] [ T9870] The buggy address belongs to the physical page: [ 750.399141] [ T9870] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11082 [ 750.399148] [ T9870] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) [ 750.399155] [ T9870] page_type: f5(slab) [ 750.399161] [ T9870] raw: 000fffffc0000000 ffff888022d65640 dead000000000122 0000000000000000 [ 750.399165] [ T9870] raw: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000 [ 750.399169] [ T9870] page dumped because: kasan: bad access detected [ 750.399172] [ T9870] [ 750.399505] [ T9870] Memory state around the buggy address: [ 750.399863] [ T9870] ffff888011082780: fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 750.400247] [ T9870] ffff888011082800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 750.400618] [ T9870] >ffff888011082880: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 750.400982] [ T9870] ^ [ 750.401370] [ T9870] ffff888011082900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 750.401774] [ T9870] ffff888011082980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 750.402171] [ T9870] ================================================================== [ 750.402696] [ T9870] Disabling lock debugging due to kernel taint [ 750.403202] [ T9870] BUG: unable to handle page fault for address: ffff8880110a2000 [ 750.403797] [ T9870] #PF: supervisor write access in kernel mode [ 750.404204] [ T9870] #PF: error_code(0x0003) - permissions violation [ 750.404581] [ T9870] PGD 5ce01067 P4D 5ce01067 PUD 5ce02067 PMD 78aa063 PTE 80000000110a2021 [ 750.404969] [ T9870] Oops: Oops: 0003 [#1] SMP KASAN PTI [ 750.405394] [ T9870] CPU: 0 UID: 0 PID: 9870 Comm: xfs_io Kdump: loaded Tainted: G B 6.16.0-rc2-metze.02+ #1 PREEMPT(voluntary) [ 750.406510] [ T9870] Tainted: [B]=BAD_PAGE [ 750.406967] [ T9870] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 750.407440] [ T9870] RIP: 0010:smb_set_sge+0x15c/0x3b0 [cifs] [ 750.408065] [ T9870] Code: 48 83 f8 ff 0f 84 b0 00 00 00 48 ba 00 00 00 00 00 fc ff df 4c 89 e1 48 c1 e9 03 80 3c 11 00 0f 85 69 01 00 00 49 8d 7c 24 08 <49> 89 04 24 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 0f [ 750.409283] [ T9870] RSP: 0018:ffffc90005e2e758 EFLAGS: 00010246 [ 750.409803] [ T9870] RAX: ffff888036c53400 RBX: ffffc90005e2e878 RCX: 1ffff11002214400 [ 750.410323] [ T9870] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff8880110a2008 [ 750.411217] [ T9870] RBP: ffffc90005e2e798 R08: 0000000000000001 R09: 0000000000000400 [ 750.411770] [ T9870] R10: ffff888011082800 R11: 0000000000000000 R12: ffff8880110a2000 [ 750.412325] [ T9870] R13: 0000000000000000 R14: ffffc90005e2e888 R15: ffff88801a4b6000 [ 750.412901] [ T9870] FS: 0000000000000000(0000) GS:ffff88812bc68000(0000) knlGS:0000000000000000 [ 750.413477] [ T9870] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 750.414077] [ T9870] CR2: ffff8880110a2000 CR3: 000000005b0a6005 CR4: 00000000000726f0 [ 750.414654] [ T9870] Call Trace: [ 750.415211] [ T9870] <TASK> [ 750.415748] [ T9870] smbd_post_send_iter+0x1990/0x3070 [cifs] [ 750.416449] [ T9870] ? __pfx_smbd_post_send_iter+0x10/0x10 [cifs] [ 750.417128] [ T9870] ? update_stack_state+0x2a0/0x670 [ 750.417685] [ T9870] ? cifs_flush+0x153/0x320 [cifs] [ 750.418380] [ T9870] ? cifs_flush+0x153/0x320 [cifs] [ 750.419055] [ T9870] ? update_stack_state+0x2a0/0x670 [ 750.419624] [ T9870] smbd_send+0x58c/0x9c0 [cifs] [ 750.420297] [ T9870] ? __pfx_smbd_send+0x10/0x10 [cifs] [ 750.420936] [ T9870] ? unwind_get_return_address+0x65/0xb0 [ 750.421456] [ T9870] ? __pfx_stack_trace_consume_entry+0x10/0x10 [ 750.421954] [ T9870] ? arch_stack_walk+0xa7/0x100 [ 750.422460] [ T9870] ? stack_trace_save+0x92/0xd0 [ 750.422948] [ T9870] __smb_send_rqst+0x931/0xec0 [cifs] [ 750.423579] [ T9870] ? kernel_text_address+0x173/0x190 [ 750.424056] [ T9870] ? kasan_save_stack+0x39/0x70 [ 750.424813] [ T9870] ? kasan_save_track+0x18/0x70 [ 750.425323] [ T9870] ? __kasan_slab_alloc+0x9d/0xa0 [ 750.425831] [ T9870] ? __pfx___smb_send_rqst+0x10/0x10 [cifs] [ 750.426548] [ T9870] ? smb2_mid_entry_alloc+0xb4/0x7e0 [cifs] [ 750.427231] [ T9870] ? cifs_call_async+0x277/0xb00 [cifs] [ 750.427882] [ T9870] ? cifs_issue_write+0x256/0x610 [cifs] [ 750.428909] [ T9870] ? netfs_do_issue_write+0xc2/0x340 [netfs] [ 750.429425] [ T9870] ? netfs_advance_write+0x45b/0x1270 [netfs] [ 750.429882] [ T9870] ? netfs_write_folio+0xd6c/0x1be0 [netfs] [ 750.430345] [ T9870] ? netfs_writepages+0x2e9/0xa80 [netfs] [ 750.430809] [ T9870] ? do_writepages+0x21f/0x590 [ 750.431239] [ T9870] ? filemap_fdatawrite_wbc+0xe1/0x140 [ 750.431652] [ T9870] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 750.432041] [ T9870] smb_send_rqst+0x22e/0x2f0 [cifs] [ 750.432586] [ T9870] ? __pfx_smb_send_rqst+0x10/0x10 [cifs] [ 750.433108] [ T9870] ? local_clock_noinstr+0xe/0xd0 [ 750.433482] [ T9870] ? kasan_save_alloc_info+0x37/0x60 [ 750.433855] [ T9870] ? __kasan_check_write+0x14/0x30 [ 750.434214] [ T9870] ? _raw_spin_lock+0x81/0xf0 [ 750.434561] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10 [ 750.434903] [ T9870] ? smb2_setup_async_request+0x293/0x580 [cifs] [ 750.435394] [ T9870] cifs_call_async+0x477/0xb00 [cifs] [ 750.435892] [ T9870] ? __pfx_smb2_writev_callback+0x10/0x10 [cifs] [ 750.436388] [ T9870] ? __pfx_cifs_call_async+0x10/0x10 [cifs] [ 750.436881] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10 [ 750.437237] [ T9870] ? __kasan_check_write+0x14/0x30 [ 750.437579] [ T9870] ? __smb2_plain_req_init+0x933/0x1090 [cifs] [ 750.438062] [ T9870] smb2_async_writev+0x15ff/0x2460 [cifs] [ 750.438557] [ T9870] ? sched_clock_noinstr+0x9/0x10 [ 750.438906] [ T9870] ? local_clock_noinstr+0xe/0xd0 [ 750.439293] [ T9870] ? __pfx_smb2_async_writev+0x10/0x10 [cifs] [ 750.439786] [ T9870] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 750.440143] [ T9870] ? _raw_spin_unlock+0xe/0x40 [ 750.440495] [ T9870] ? cifs_pick_channel+0x242/0x370 [cifs] [ 750.440989] [ T9870] cifs_issue_write+0x256/0x610 [cifs] [ 750.441492] [ T9870] ? cifs_issue_write+0x256/0x610 [cifs] [ 750.441987] [ T9870] netfs_do_issue_write+0xc2/0x340 [netfs] [ 750.442387] [ T9870] netfs_advance_write+0x45b/0x1270 [netfs] [ 750.442969] [ T9870] ? rolling_buffer_append+0x12d/0x440 [netfs] [ 750.443376] [ T9870] netfs_write_folio+0xd6c/0x1be0 [netfs] [ 750.443768] [ T9870] ? __kasan_check_write+0x14/0x30 [ 750.444145] [ T9870] netfs_writepages+0x2e9/0xa80 [netfs] [ 750.444541] [ T9870] ? __pfx_netfs_writepages+0x10/0x10 [netfs] [ 750.444936] [ T9870] ? exit_files+0xab/0xe0 [ 750.445312] [ T9870] ? do_exit+0x148f/0x2980 [ 750.445672] [ T9870] ? do_group_exit+0xb5/0x250 [ 750.446028] [ T9870] ? arch_do_signal_or_restart+0x92/0x630 [ 750.446402] [ T9870] ? exit_to_user_mode_loop+0x98/0x170 [ 750.446762] [ T9870] ? do_syscall_64+0x2cf/0xd80 [ 750.447132] [ T9870] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 750.447499] [ T9870] do_writepages+0x21f/0x590 [ 750.447859] [ T9870] ? __pfx_do_writepages+0x10/0x10 [ 750.448236] [ T9870] filemap_fdatawrite_wbc+0xe1/0x140 [ 750.448595] [ T9870] __filemap_fdatawrite_range+0xba/0x100 [ 750.448953] [ T9870] ? __pfx___filemap_fdatawrite_range+0x10/0x10 [ 750.449336] [ T9870] ? __kasan_check_write+0x14/0x30 [ 750.449697] [ T9870] filemap_write_and_wait_range+0x7d/0xf0 [ 750.450062] [ T9870] cifs_flush+0x153/0x320 [cifs] [ 750.450592] [ T9870] filp_flush+0x107/0x1a0 [ 750.450952] [ T9870] filp_close+0x14/0x30 [ 750.451322] [ T9870] put_files_struct.part.0+0x126/0x2a0 [ 750.451678] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10 [ 750.452033] [ T9870] exit_files+0xab/0xe0 [ 750.452401] [ T9870] do_exit+0x148f/0x2980 [ 750.452751] [ T9870] ? __pfx_do_exit+0x10/0x10 [ 750.453109] [ T9870] ? __kasan_check_write+0x14/0x30 [ 750.453459] [ T9870] ? _raw_spin_lock_irq+0x8a/0xf0 [ 750.453787] [ T9870] do_group_exit+0xb5/0x250 [ 750.454082] [ T9870] get_signal+0x22d3/0x22e0 [ 750.454406] [ T9870] ? __pfx_get_signal+0x10/0x10 [ 750.454709] [ T9870] ? fpregs_assert_state_consistent+0x68/0x100 [ 750.455031] [ T9870] ? folio_add_lru+0xda/0x120 [ 750.455347] [ T9870] arch_do_signal_or_restart+0x92/0x630 [ 750.455656] [ T9870] ? __pfx_arch_do_signal_or_restart+0x10/0x10 [ 750.455967] [ T9870] exit_to_user_mode_loop+0x98/0x170 [ 750.456282] [ T9870] do_syscall_64+0x2cf/0xd80 [ 750.456591] [ T9870] ? __kasan_check_read+0x11/0x20 [ 750.456897] [ T9870] ? count_memcg_events+0x1b4/0x420 [ 750.457280] [ T9870] ? handle_mm_fault+0x148/0x690 [ 750.457616] [ T9870] ? _raw_spin_lock_irq+0x8a/0xf0 [ 750.457925] [ T9870] ? __kasan_check_read+0x11/0x20 [ 750.458297] [ T9870] ? fpregs_assert_state_consistent+0x68/0x100 [ 750.458672] [ T9870] ? irqentry_exit_to_user_mode+0x2e/0x250 [ 750.459191] [ T9870] ? irqentry_exit+0x43/0x50 [ 750.459600] [ T9870] ? exc_page_fault+0x75/0xe0 [ 750.460130] [ T9870] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 750.460570] [ T9870] RIP: 0033:0x7858c94ab6e2 [ 750.461206] [ T9870] Code: Unable to access opcode bytes at 0x7858c94ab6b8. [ 750.461780] [ T9870] RSP: 002b:00007858c9248ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000022 [ 750.462327] [ T9870] RAX: fffffffffffffdfe RBX: 00007858c92496c0 RCX: 00007858c94ab6e2 [ 750.462653] [ T9870] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 750.462969] [ T9870] RBP: 00007858c9248d10 R08: 0000000000000000 R09: 0000000000000000 [ 750.463290] [ T9870] R10: 0000000000000000 R11: 0000000000000246 R12: fffffffffffffde0 [ 750.463640] [ T9870] R13: 0000000000000020 R14: 0000000000000002 R15: 00007ffc072d2230 [ 750.463965] [ T9870] </TASK> [ 750.464285] [ T9870] Modules linked in: siw ib_uverbs ccm cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs softdog vboxsf vboxguest cpuid intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_vsec polyval_clmulni ghash_clmulni_intel sha1_ssse3 aesni_intel rapl i2c_piix4 i2c_smbus joydev input_leds mac_hid sunrpc binfmt_misc kvm_intel kvm irqbypass sch_fq_codel efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci dmi_sysfs ip_tables x_tables autofs4 hid_generic vboxvideo usbhid drm_vram_helper psmouse vga16fb vgastate drm_ttm_helper serio_raw hid ahci libahci ttm pata_acpi video wmi [last unloaded: vboxguest] [ 750.467127] [ T9870] CR2: ffff8880110a2000 cc: Tom Talpey <tom@talpey.com> cc: linux-cifs@vger.kernel.org Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Tom Talpey <tom@talpey.com> Fixes: c45ebd636c32 ("cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-21smb: client: fix first command failure during re-negotiationzhangjian
after fabc4ed200f9, server_unresponsive add a condition to check whether client need to reconnect depending on server->lstrp. When client failed to reconnect for some time and abort connection, server->lstrp is updated for the last time. In the following scene, server->lstrp is too old. This cause next command failure in re-negotiation rather than waiting for re-negotiation done. 1. mount -t cifs -o username=Everyone,echo_internal=10 //$server_ip/export /mnt 2. ssh $server_ip "echo b > /proc/sysrq-trigger &" 3. ls /mnt 4. sleep 21s 5. ssh $server_ip "service firewalld stop" 6. ls # return EHOSTDOWN If the interval between 5 and 6 is too small, 6 may trigger sending negotiation request. Before backgrounding cifsd thread try to receive negotiation response from server in cifs_readv_from_socket, server_unresponsive may trigger cifs_reconnect which cause 6 to be failed: ls thread ---------------- smb2_negotiate server->tcpStatus = CifsInNegotiate compound_send_recv wait_for_compound_request cifsd thread ---------------- cifs_readv_from_socket server_unresponsive server->tcpStatus == CifsInNegotiate && jiffies > server->lstrp + 20s cifs_reconnect cifs_abort_connection: mid_state = MID_RETRY_NEEDED ls thread ---------------- cifs_sync_mid_result return EAGAIN smb2_negotiate return EHOSTDOWN Though server->lstrp means last server response time, it is updated in cifs_abort_connection and cifs_get_tcp_session. We can also update server->lstrp before switching into CifsInNegotiate state to avoid failure in 6. Fixes: 7ccc1465465d ("smb: client: fix hang in wait_for_response() for negproto") Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Acked-by: Meetakshi Setiya <msetiya@microsoft.com> Signed-off-by: zhangjian <zhangjian496@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-20erofs: remove a superfluous check for encoded extentsGao Xiang
It is possible when an inode is split into segments for multi-threaded compression, and the tail extent of a segment could also be small. Fixes: 1d191b4ca51d ("erofs: implement encoded extent metadata") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250620153108.1368029-1-hsiangkao@linux.alibaba.com