Unmount of a shutdown filesystem can hang with stale inode cluster
buffers in the AIL like so:
[95964.140623] Call Trace:
[95964.144641] __schedule+0x699/0xb70
[95964.154003] schedule+0x64/0xd0
[95964.156851] xfs_ail_push_all_sync+0x9b/0xf0
[95964.164816] xfs_unmount_flush_inodes+0x41/0x70
[95964.168698] xfs_unmountfs+0x7f/0x170
[95964.171846] xfs_fs_put_super+0x3b/0x90
[95964.175216] generic_shutdown_super+0x77/0x160
[95964.178060] kill_block_super+0x1b/0x40
[95964.180553] xfs_kill_sb+0x12/0x30
[95964.182796] deactivate_locked_super+0x38/0x100
[95964.185735] deactivate_super+0x41/0x50
[95964.188245] cleanup_mnt+0x9f/0x160
[95964.190519] __cleanup_mnt+0x12/0x20
[95964.192899] task_work_run+0x89/0xb0
[95964.195221] resume_user_mode_work+0x4f/0x60
[95964.197931] syscall_exit_to_user_mode+0x76/0xb0
[95964.201003] do_syscall_64+0x74/0x130
$ pstree -N mnt |grep umount
|-check-parallel---nsexec---run_test.sh---753---umount
It always seems to be generic/753 that triggers this, and repeating
a quick group test run triggers it every 10-15 iterations. Hence it
generally triggers once every 30-40 minutes of test time. Just
running generic/753 by itself or concurrently with a limited group
of tests doesn't reproduce this issue at all.
Tracing on a hung system shows the AIL repeating, every 50ms, a log
force followed by an attempt to push pinned, aborted inodes from the
AIL (trimmed for brevity):
xfs_log_force: lsn 0x1c caller xfsaild+0x18e
xfs_log_force: lsn 0x0 caller xlog_cil_flush+0xbd
xfs_log_force: lsn 0x1c caller xfs_log_force+0x77
xfs_ail_pinned: lip 0xffff88826014afa0 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED
xfs_ail_pinned: lip 0xffff88814000a708 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED
xfs_ail_pinned: lip 0xffff88810b850c80 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED
xfs_ail_pinned: lip 0xffff88810b850af0 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED
xfs_ail_pinned: lip 0xffff888165cf0a28 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED
xfs_ail_pinned: lip 0xffff88810b850bb8 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED
....
The inode log items are marked as aborted, which means that either:
a) a transaction commit has occurred, seen an error or shutdown, and
called xfs_trans_free_items() to abort the items. This should happen
before any pinning of log items occurs.
or
b) a dirty transaction has been cancelled. This should also happen
before any pinning of log items occurs.
or
c) AIL insertion at journal IO completion is marked as aborted. In
this case, the log item is pinned by the CIL until journal IO
completes and hence needs to be unpinned. This is then done after
the ->iop_committed() callback is run, so the pin count should be
balanced correctly.
Yet none of these seemed to be occurring. Further tracing indicated
this:
d) Shutdown during CIL pushing resulting in log item completion
being called from checkpoint abort processing. Items are unpinned
and released without serialisation against each other, journal IO
completion or transaction commit completion.
In this case, we may still have a transaction commit in flight that
holds a reference to an xfs_buf_log_item (BLI) after CIL insertion.
For example, a synchronous transaction will flush the CIL before the
transaction is torn down. The concurrent CIL push then aborts the
insertion and drops the commit/AIL reference to the BLI. This can
leave the transaction commit context with the last reference to the
BLI, which is dropped here:
  xfs_trans_free_items()
    ->iop_release
      xfs_buf_item_release
        xfs_buf_item_put
          if (XFS_LI_ABORTED)
            xfs_trans_ail_delete
          xfs_buf_item_relse()
Unlike the journal completion ->iop_unpin path, this path does not
run stale buffer completion processing when it drops the last
reference, hence leaving the stale inodes attached to the buffer
sitting in the AIL. There are no other references to those inodes, so
there is no other mechanism to remove them from the AIL. Hence
unmount hangs.
The buffer lock context for stale buffers is passed to the last BLI
reference. This is normally the last BLI unpin on journal IO
completion. The unpin then processes the stale buffer completion and
releases the buffer lock. However, if the final unpin from journal
IO completion (or CIL push abort) does not hold the last reference
to the BLI, there -must- still be a transaction context that
references the BLI, and so that context must perform the stale
buffer completion processing before the buffer is unlocked and the
BLI torn down.
The fix for this is to rework the xfs_buf_item_relse() path to run
stale buffer completion processing if it drops the last reference to
the BLI. We still hold the buffer locked, so the buffer owner and
lock context is the same as if we passed the BLI and buffer to the
->iop_unpin() context to finish stale processing on journal commit.
However, we have to be careful here. In a shutdown state, we can be
freeing dirty BLIs from xfs_buf_item_put() via xfs_trans_brelse()
and xfs_trans_bdetach(). The existing code handles this case by
treating the shutdown state as "aborted", but in doing so it
largely masks the failure to clean up stale BLI state from the
xfs_buf_item_relse() path. i.e. regardless of the shutdown state and
whether the item is in the AIL, we must finish the stale buffer
cleanup if we are dropping the last BLI reference from the
->iop_release path in transaction commit context.
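The ownership rule above (whoever drops the last BLI reference runs the stale buffer completion, whichever path that happens to be) can be modelled in userspace. This is a minimal sketch under stated assumptions: `struct bli_model`, `bli_put()`, `bli_unpin()` and `bli_release()` are hypothetical names that do not mirror the real `struct xfs_buf_log_item` or its helpers; the only point is that both paths funnel into one put that finishes stale processing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a buffer log item (BLI).
 * Not the kernel structure: only the state this sketch needs. */
struct bli_model {
	int  refcount;              /* CIL/AIL and transaction references */
	bool stale;                 /* buffer was invalidated (stale) */
	bool buf_locked;            /* lock is handed to the last reference */
	int  stale_completions;     /* how many times stale completion ran */
	int  attached_stale_inodes; /* stale inodes still sitting in the "AIL" */
};

/* Stale buffer completion: detach stale inodes, then drop the buffer lock. */
static void bli_stale_completion(struct bli_model *bli)
{
	bli->attached_stale_inodes = 0;
	bli->stale_completions++;
	bli->buf_locked = false;
}

/* Single put helper: the caller dropping the last reference finishes
 * stale processing, regardless of which path it came from. */
static bool bli_put(struct bli_model *bli)
{
	assert(bli->refcount > 0);
	if (--bli->refcount > 0)
		return false;
	if (bli->stale)
		bli_stale_completion(bli);
	return true;
}

/* Journal IO completion or CIL push abort (->iop_unpin analogue). */
static bool bli_unpin(struct bli_model *bli)
{
	return bli_put(bli);
}

/* Transaction commit teardown (->iop_release analogue). */
static bool bli_release(struct bli_model *bli)
{
	return bli_put(bli);
}
```

Routing both paths through one put keeps the "last reference owns the buffer lock" rule in a single place, so the order in which the CIL abort and the transaction teardown run no longer decides whether stale inodes are removed.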
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The stale buffer item completion handling is currently only done
from BLI unpinning. We need to perform this function from wherever
the last reference to the BLI is dropped, so first we need to
factor this code out into a helper.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The code to initialise, release and free items is all the way down
at the bottom of the file. Upcoming fixes need these functions
earlier in the file, so move them to the top.
There is one code change in this move - the parameter to
xfs_buf_item_relse() is changed from the xfs_buf to the
xfs_buf_log_item - the thing that the function is releasing.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
I needed more insight into how stale inodes were getting stuck on
the AIL after a forced shutdown when running fsstress. These are the
tracepoints I added for that purpose.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
On shutdown when quotas are enabled, the shutdown can deadlock
trying to unpin the dquot buffer buf_log_item like so:
[ 3319.483590] task:kworker/20:0H state:D stack:14360 pid:1962230 tgid:1962230 ppid:2 task_flags:0x4208060 flags:0x00004000
[ 3319.493966] Workqueue: xfs-log/dm-6 xlog_ioend_work
[ 3319.498458] Call Trace:
[ 3319.500800] <TASK>
[ 3319.502809] __schedule+0x699/0xb70
[ 3319.512672] schedule+0x64/0xd0
[ 3319.515573] schedule_timeout+0x30/0xf0
[ 3319.528125] __down_common+0xc3/0x200
[ 3319.531488] __down+0x1d/0x30
[ 3319.534186] down+0x48/0x50
[ 3319.540501] xfs_buf_lock+0x3d/0xe0
[ 3319.543609] xfs_buf_item_unpin+0x85/0x1b0
[ 3319.547248] xlog_cil_committed+0x289/0x570
[ 3319.571411] xlog_cil_process_committed+0x6d/0x90
[ 3319.575590] xlog_state_shutdown_callbacks+0x52/0x110
[ 3319.580017] xlog_force_shutdown+0x169/0x1a0
[ 3319.583780] xlog_ioend_work+0x7c/0xb0
[ 3319.587049] process_scheduled_works+0x1d6/0x400
[ 3319.591127] worker_thread+0x202/0x2e0
[ 3319.594452] kthread+0x20c/0x240
The CIL push has seen the shutdown, so it has aborted the push and
is running CIL checkpoint completion to abort all the items in the
checkpoint. This calls ->iop_unpin(remove = true) to clean up the
log items in the checkpoint.
When a buffer log item is unpinned like this, it needs to lock the
buffer to run IO completion to correctly fail the buffer and run all
the required completions to fail attached log items as well. In this
case, the attempt to lock the buffer on unpin is hanging because the
buffer is already locked.
I suspected a leaked XFS_BLI_HOLD state because of XFS_BLI_STALE
handling changes I was testing, so I went looking for
pin events on HOLD buffers and unpin events on locked buffers. That
isolated this one buffer with these two events:
xfs_buf_item_pin: dev 251:6 daddr 0xa910 bbcount 0x2 hold 2 pincount 0 lock 0 flags DONE|KMEM recur 0 refcount 1 bliflags HOLD|DIRTY|LOGGED liflags DIRTY
....
xfs_buf_item_unpin: dev 251:6 daddr 0xa910 bbcount 0x2 hold 4 pincount 1 lock 0 flags DONE|KMEM recur 0 refcount 1 bliflags DIRTY liflags ABORTED
Firstly, bbcount = 0x2, which means it is not a single sector
structure. That rules out every xfs_trans_bhold() case except one:
dquot buffers.
Then hung task dumping gave this trace:
[ 3197.312078] task:fsync-tester state:D stack:12080 pid:2051125 tgid:2051125 ppid:1643233 task_flags:0x400000 flags:0x00004002
[ 3197.323007] Call Trace:
[ 3197.325581] <TASK>
[ 3197.327727] __schedule+0x699/0xb70
[ 3197.334582] schedule+0x64/0xd0
[ 3197.337672] schedule_timeout+0x30/0xf0
[ 3197.350139] wait_for_completion+0xbd/0x180
[ 3197.354235] __flush_workqueue+0xef/0x4e0
[ 3197.362229] xlog_cil_force_seq+0xa0/0x300
[ 3197.374447] xfs_log_force+0x77/0x230
[ 3197.378015] xfs_qm_dqunpin_wait+0x49/0xf0
[ 3197.382010] xfs_qm_dqflush+0x55/0x460
[ 3197.385663] xfs_qm_dquot_isolate+0x29e/0x4d0
[ 3197.389977] __list_lru_walk_one+0x141/0x220
[ 3197.398867] list_lru_walk_one+0x10/0x20
[ 3197.402713] xfs_qm_shrink_scan+0x6a/0x100
[ 3197.406699] do_shrink_slab+0x18a/0x350
[ 3197.410512] shrink_slab+0xf7/0x430
[ 3197.413967] drop_slab+0x97/0xf0
[ 3197.417121] drop_caches_sysctl_handler+0x59/0xc0
[ 3197.421654] proc_sys_call_handler+0x18b/0x280
[ 3197.426050] proc_sys_write+0x13/0x20
[ 3197.429750] vfs_write+0x2b8/0x3e0
[ 3197.438532] ksys_write+0x7e/0xf0
[ 3197.441742] __x64_sys_write+0x1b/0x30
[ 3197.445363] x64_sys_call+0x2c72/0x2f60
[ 3197.449044] do_syscall_64+0x6c/0x140
[ 3197.456341] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Yup, another test run by check-parallel is running drop_caches
concurrently and the dquot shrinker for the hung filesystem is
running. That's trying to flush a dirty dquot from reclaim context,
and it is waiting on a log force to complete. xfs_qm_dqflush is called
with the dquot buffer held locked, and so we've called
xfs_log_force() with that buffer locked.
Now the log force is waiting for a workqueue flush to complete, and
that workqueue flush is waiting for CIL checkpoint processing to
finish.
The CIL checkpoint processing is aborting all the log items it has,
and that requires locking aborted buffers to cancel them.
Now, normally this isn't a problem if we are issuing a log force
to unpin an object, because the ->iop_unpin() method wakes pin
waiters first. That results in the pin waiter finishing off whatever
it was doing, dropping the lock and then xfs_buf_item_unpin() can
lock the buffer and fail it.
However, xfs_qm_dqflush() is waiting on the -dquot- unpin event, not
the dquot buffer unpin event, and so it never gets woken and so does
not drop the buffer lock.
Inodes do not have this problem, as they can only be written from
one spot (->iop_push) whilst dquots can be written from multiple
places (memory reclaim, ->iop_push, xfs_qm_dqpurge, and quotacheck).
The reason that the dquot buffer has an attached buffer log item is
that it has been recently allocated. Initialisation of the dquot
buffer logs the buffer directly, thereby pinning it in memory. We
then modify the dquot in a separate operation, and if memory reclaim
races with a shutdown, we trigger this deadlock.
check-parallel reproduces this reliably on 1kB FSB filesystems with
quota enabled because it does all of these things concurrently
without having to explicitly write tests to exercise these corner
case conditions.
xfs_qm_dquot_logitem_push() doesn't have this deadlock because it
checks if the dquot is pinned before locking the dquot buffer and
skips it if it is pinned. This means the xfs_qm_dqunpin_wait()
log force in xfs_qm_dqflush() never triggers and we unlock the
buffer safely, allowing a concurrent shutdown to fail the buffer
appropriately.
xfs_qm_dqpurge() could have this problem as it is called from
quotacheck and we might have allocated dquot buffers when recording
the quota updates. This can be fixed by calling
xfs_qm_dqunpin_wait() before we lock the dquot buffer. Because we
hold the dquot locked, nothing will be able to add to the pin count
between the unpin_wait and the dqflush callout, so this now makes
xfs_qm_dqpurge() safe against this race.
xfs_qm_dquot_isolate() can also be fixed this same way but, quite
frankly, we shouldn't be doing IO in memory reclaim context. If the
dquot is pinned or dirty, simply rotate it and let memory reclaim
come back to it later, same as we do for inodes.
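The reclaim-side policy in the paragraph above reduces to a small decision function. This is a sketch only: `enum lru_result`, `struct dquot_model` and `dquot_isolate_model()` are hypothetical, with the enum values named after the kernel's `enum lru_status` return codes; the real `xfs_qm_dquot_isolate()` also deals with reference counts, locking and freeing.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins named after the kernel's list_lru walk results. */
enum lru_result { LRU_REMOVED, LRU_ROTATE, LRU_SKIP };

/* Hypothetical snapshot of the dquot state reclaim looks at. */
struct dquot_model {
	bool lock_contended; /* dquot lock could not be taken without waiting */
	bool pinned;         /* dquot is pinned by the journal */
	bool dirty;          /* dquot has changes not yet written back */
};

/* Reclaim never waits on a log force or issues IO from this context. */
static enum lru_result dquot_isolate_model(const struct dquot_model *dqp)
{
	if (dqp->lock_contended)
		return LRU_SKIP;     /* someone else owns it; try again later */
	if (dqp->pinned || dqp->dirty)
		return LRU_ROTATE;   /* let the AIL/writeback clean it first */
	return LRU_REMOVED;          /* clean and unpinned: safe to free */
}
```

Because a pinned or dirty dquot is simply rotated, reclaim never takes the dquot buffer lock and then blocks in a log force, which is the lock-then-wait sequence that deadlocks against the CIL abort.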
This then gets rid of the nasty issue in xfs_qm_flush_one() where
quotacheck writeback races with memory reclaim flushing the dquots.
We can lift xfs_qm_dqunpin_wait() up into this code, then get rid of
the "can't get the dqflush lock" buffer write that cycles the dqflush
lock so the dquot can be flushed again. Checking whether the dquot is
pinned and returning -EAGAIN means the dquot walk will revisit the
dquot later.
Finally, with xfs_qm_dqunpin_wait() lifted into all the callers,
we can remove it from the xfs_qm_dqflush() code.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
There is a race condition that can trigger in dmflakey fstests that
can result in asserts in xfs_ialloc_read_agi() and
xfs_alloc_read_agf() firing. The asserts look like this:
XFS: Assertion failed: pag->pagf_freeblks == be32_to_cpu(agf->agf_freeblks), file: fs/xfs/libxfs/xfs_alloc.c, line: 3440
.....
Call Trace:
<TASK>
xfs_alloc_read_agf+0x2ad/0x3a0
xfs_alloc_fix_freelist+0x280/0x720
xfs_alloc_vextent_prepare_ag+0x42/0x120
xfs_alloc_vextent_iterate_ags+0x67/0x260
xfs_alloc_vextent_start_ag+0xe4/0x1c0
xfs_bmapi_allocate+0x6fe/0xc90
xfs_bmapi_convert_delalloc+0x338/0x560
xfs_map_blocks+0x354/0x580
iomap_writepages+0x52b/0xa70
xfs_vm_writepages+0xd7/0x100
do_writepages+0xe1/0x2c0
__writeback_single_inode+0x44/0x340
writeback_sb_inodes+0x2d0/0x570
__writeback_inodes_wb+0x9c/0xf0
wb_writeback+0x139/0x2d0
wb_workfn+0x23e/0x4c0
process_scheduled_works+0x1d4/0x400
worker_thread+0x234/0x2e0
kthread+0x147/0x170
ret_from_fork+0x3e/0x50
ret_from_fork_asm+0x1a/0x30
I've seen the AGI variant from scrub running on the filesystem
after unmount failed due to systemd interference:
XFS: Assertion failed: pag->pagi_freecount == be32_to_cpu(agi->agi_freecount) || xfs_is_shutdown(pag->pag_mount), file: fs/xfs/libxfs/xfs_ialloc.c, line: 2804
.....
Call Trace:
<TASK>
xfs_ialloc_read_agi+0xee/0x150
xchk_perag_drain_and_lock+0x7d/0x240
xchk_ag_init+0x34/0x90
xchk_inode_xref+0x7b/0x220
xchk_inode+0x14d/0x180
xfs_scrub_metadata+0x2e2/0x510
xfs_ioc_scrub_metadata+0x62/0xb0
xfs_file_ioctl+0x446/0xbf0
__se_sys_ioctl+0x6f/0xc0
__x64_sys_ioctl+0x1d/0x30
x64_sys_call+0x1879/0x2ee0
do_syscall_64+0x68/0x130
? exc_page_fault+0x62/0xc0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Essentially, it is the same problem. When _flakey_drop_and_remount()
loads the drop-writes table, it makes all writes silently fail. Writes
are reported to the fs as completed successfully, but they are not
issued to the backing store. The filesystem sees the successful
write completion and marks the metadata buffer clean and removes it
from the AIL.
If this happens at the same time as memory pressure is occurring,
the now-clean AGF and/or AGI buffers can be reclaimed from memory.
Shortly afterwards, but before _flakey_drop_and_remount() runs
unmount, background writeback is kicked and it tries to allocate
blocks for the dirty pages in memory. This then tries to access the
AGF buffer we just turfed out of memory. It's not found, so it gets
read in from disk.
This is all fine, except for the fact that the last writeback of the
AGF did not actually reach disk. The AGF on disk is stale compared
to the in-memory state held by the perag, and so they don't match
and the assert fires.
Then other operations on that inode hang because the task was killed
whilst holding inode locks, e.g.:
Workqueue: xfs-conv/dm-12 xfs_end_io
Call Trace:
<TASK>
__schedule+0x650/0xb10
schedule+0x6d/0xf0
schedule_preempt_disabled+0x15/0x30
rwsem_down_write_slowpath+0x31a/0x5f0
down_write+0x43/0x60
xfs_ilock+0x1a8/0x210
xfs_trans_alloc_inode+0x9c/0x240
xfs_iomap_write_unwritten+0xe3/0x300
xfs_end_ioend+0x90/0x130
xfs_end_io+0xce/0x100
process_scheduled_works+0x1d4/0x400
worker_thread+0x234/0x2e0
kthread+0x147/0x170
ret_from_fork+0x3e/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
and it's all downhill from there.
Memory pressure is one way to trigger this; another is to run "echo
3 > /proc/sys/vm/drop_caches" randomly while tests are running.
Regardless of how it is triggered, this effectively takes down the
system once umount hangs, because it holds sb->s_umount
exclusively and now every sync(1) call gets stuck on it.
Fix this by replacing the asserts with a corruption detection check
and a shutdown.
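Replacing an assert with a corruption check and a shutdown roughly means "compare, and on mismatch fail the operation and stop the filesystem" instead of crashing. The sketch below models that pattern in userspace. `struct mount_model`, `struct perag_model` and `check_agf_freeblks()` are hypothetical; the on-disk value is passed in host byte order for simplicity (the kernel converts it with `be32_to_cpu()`). XFS does define `EFSCORRUPTED` as `EUCLEAN`, which the sketch reuses.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117              /* Linux value; fallback for other libcs */
#endif
#define EFSCORRUPTED EUCLEAN     /* same mapping XFS uses */

/* Hypothetical stand-ins for the mount and per-AG structures. */
struct mount_model { bool shutdown; };
struct perag_model { uint32_t freeblks; bool initialised; };

/* Validate cached per-AG state against the value just read from disk.
 * On mismatch: shut down and return an error rather than asserting. */
static int check_agf_freeblks(struct mount_model *mp,
			      const struct perag_model *pag,
			      uint32_t disk_freeblks)
{
	if (pag->initialised && pag->freeblks != disk_freeblks) {
		mp->shutdown = true;     /* stop further modifications */
		return -EFSCORRUPTED;    /* caller fails the operation */
	}
	return 0;
}
```

The caller sees an ordinary error it can propagate, so writeback or scrub fails cleanly instead of killing the task while it holds inode locks.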
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Lock order of xfs_ifree_cluster() is cluster buffer -> try ILOCK
-> IFLUSHING, except for the last inode in the cluster that is
triggering the free. In that case, the lock order is ILOCK ->
cluster buffer -> IFLUSHING.
xfs_iflush_cluster() uses cluster buffer -> try ILOCK -> IFLUSHING,
so this can safely run concurrently with xfs_ifree_cluster().
xfs_inode_item_precommit() uses ILOCK -> cluster buffer, but this
cannot race with xfs_ifree_cluster() so being in a different order
will not trigger a deadlock.
xfs_reclaim_inode() during a filesystem shutdown uses ILOCK ->
IFLUSHING -> cluster buffer via xfs_iflush_shutdown_abort(), and
this deadlocks against xfs_ifree_cluster() like so:
sysrq: Show Blocked State
task:kworker/10:37 state:D stack:12560 pid:276182 tgid:276182 ppid:2 flags:0x00004000
Workqueue: xfs-inodegc/dm-3 xfs_inodegc_worker
Call Trace:
<TASK>
__schedule+0x650/0xb10
schedule+0x6d/0xf0
schedule_timeout+0x8b/0x180
schedule_timeout_uninterruptible+0x1e/0x30
xfs_ifree+0x326/0x730
xfs_inactive_ifree+0xcb/0x230
xfs_inactive+0x2c8/0x380
xfs_inodegc_worker+0xaa/0x180
process_scheduled_works+0x1d4/0x400
worker_thread+0x234/0x2e0
kthread+0x147/0x170
ret_from_fork+0x3e/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
task:fsync-tester state:D stack:12160 pid:2255943 tgid:2255943 ppid:3988702 flags:0x00004006
Call Trace:
<TASK>
__schedule+0x650/0xb10
schedule+0x6d/0xf0
schedule_timeout+0x31/0x180
__down_common+0xbe/0x1f0
__down+0x1d/0x30
down+0x48/0x50
xfs_buf_lock+0x3d/0xe0
xfs_iflush_shutdown_abort+0x51/0x1e0
xfs_icwalk_ag+0x386/0x690
xfs_reclaim_inodes_nr+0x114/0x160
xfs_fs_free_cached_objects+0x19/0x20
super_cache_scan+0x17b/0x1a0
do_shrink_slab+0x180/0x350
shrink_slab+0xf8/0x430
drop_slab+0x97/0xf0
drop_caches_sysctl_handler+0x59/0xc0
proc_sys_call_handler+0x189/0x280
proc_sys_write+0x13/0x20
vfs_write+0x33d/0x3f0
ksys_write+0x7c/0xf0
__x64_sys_write+0x1b/0x30
x64_sys_call+0x271d/0x2ee0
do_syscall_64+0x68/0x130
entry_SYSCALL_64_after_hwframe+0x76/0x7e
We can't change the lock order of xfs_ifree_cluster() - XFS_ISTALE
and XFS_IFLUSHING are serialised through to journal IO completion
by the cluster buffer lock being held.
There are quite a few asserts in the code that check that XFS_ISTALE
does not occur out of sync with buffer locking (e.g. in
xfs_iflush_cluster). There's also a dependency on the inode log item
being removed from the buffer before XFS_IFLUSHING is cleared, also
with asserts that trigger on this.
Further, we don't have a requirement for the inode to be locked when
completing or aborting inode flushing because all the inode state
updates are serialised by holding the cluster buffer lock across the
IO to completion.
We can't check for XFS_IRECLAIM in xfs_ifree_mark_inode_stale() and
skip the inode, because there is no guarantee that the inode will be
reclaimed. Hence it *must* be marked XFS_ISTALE regardless of
whether reclaim is preparing to free that inode. Similarly, we can't
check for IFLUSHING before locking the inode because that would
result in dirty inodes not being marked with ISTALE in the event of
racing with XFS_IRECLAIM.
Hence we have to address this issue from the xfs_reclaim_inode()
side. It is clear that we cannot hold the inode locked here when
calling xfs_iflush_shutdown_abort() because it is the inode->buffer
lock order that causes the deadlock against xfs_ifree_cluster().
Hence we need to drop the ILOCK before aborting the inode in the
shutdown case. Once we've aborted the inode, we can grab the ILOCK
again and then immediately reclaim it as it is now guaranteed to be
clean.
Note that dropping the ILOCK in xfs_reclaim_inode() means that it
can now be locked by xfs_ifree_mark_inode_stale() and seen whilst in
this state. This is safe because we have left the XFS_IFLUSHING flag
on the inode and so xfs_ifree_mark_inode_stale() will simply set
XFS_ISTALE and move to the next inode. An ASSERT check in this path
needs to be tweaked to take into account this new shutdown
interaction.
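The fix keeps the blocking order "cluster buffer before ILOCK" intact on the reclaim side by dropping the ILOCK around the abort. A tiny lock-rank tracker makes the difference between the two sequences visible. Everything here is hypothetical modelling (`struct lock_tracker`, the rank values, the two `reclaim_shutdown_*()` functions); it records ordering only and takes no real locks. The trylock exception that xfs_ifree_cluster() uses for the triggering inode is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks from the documented order: cluster buffer, then ILOCK. */
enum lock_rank { RANK_NONE = 0, RANK_CLUSTER_BUF = 1, RANK_ILOCK = 2 };

struct lock_tracker {
	bool held[3];   /* indexed by enum lock_rank */
	int violations; /* blocking acquisitions that invert the order */
};

static void track_lock(struct lock_tracker *t, enum lock_rank r)
{
	/* Blocking on a lower-ranked lock while holding a higher-ranked one
	 * is the inversion that can deadlock against xfs_ifree_cluster(). */
	for (int i = (int)r + 1; i <= RANK_ILOCK; i++)
		if (t->held[i])
			t->violations++;
	t->held[r] = true;
}

static void track_unlock(struct lock_tracker *t, enum lock_rank r)
{
	assert(t->held[r]);
	t->held[r] = false;
}

/* Old shutdown reclaim: ILOCK -> (IFLUSHING) -> cluster buffer. */
static void reclaim_shutdown_old(struct lock_tracker *t)
{
	track_lock(t, RANK_ILOCK);
	track_lock(t, RANK_CLUSTER_BUF);   /* abort the flush */
	track_unlock(t, RANK_CLUSTER_BUF);
	track_unlock(t, RANK_ILOCK);
}

/* Reworked: drop ILOCK, abort under the buffer lock, relock and reclaim. */
static void reclaim_shutdown_new(struct lock_tracker *t)
{
	track_lock(t, RANK_ILOCK);         /* sees IFLUSHING + shutdown */
	track_unlock(t, RANK_ILOCK);       /* IFLUSHING stays set */
	track_lock(t, RANK_CLUSTER_BUF);   /* abort the flush */
	track_unlock(t, RANK_CLUSTER_BUF);
	track_lock(t, RANK_ILOCK);         /* inode is now clean */
	track_unlock(t, RANK_ILOCK);
}
```

The old sequence records one inversion and the reworked one records none; the window where the ILOCK is dropped is safe only because IFLUSHING remains set, as the text explains.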
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Pull bcachefs fixes from Kent Overstreet:
- Lots of small check/repair fixes, primarily in subvol loop and
directory structure loop (when involving snapshots).
- Fix a few 6.16 regressions: rare UAF in the foreground allocator path
when taking a transaction restart from the transaction bump
allocator, and some small fallout from the change to log the error
being corrected in the journal when repairing errors, also some
fallout from the btree node read error logging improvements.
(Alan, Bharadwaj)
- New option: journal_rewind
This lets the entire filesystem be reset to an earlier point in time.
Note that this is only a disaster recovery tool, and right now there
are major caveats to using it (discards should be disabled, in
particular), but it successfully restored the filesystem of one of
the users who was bit by the subvolume deletion bug and didn't have
backups. I'll likely be making some changes to the discard path in
the future to make this a reliable recovery tool.
- Some new btree iterator tracepoints, for tracking down some
livelock-ish behaviour we've been seeing in the main data write path.
* tag 'bcachefs-2025-06-26' of git://evilpiepirate.org/bcachefs: (51 commits)
bcachefs: Plumb correct ip to trans_relock_fail tracepoint
bcachefs: Ensure we rewind to run recovery passes
bcachefs: Ensure btree node scan runs before checking for scanned nodes
bcachefs: btree_root_unreadable_and_scan_found_nothing should not be autofix
bcachefs: fix bch2_journal_keys_peek_prev_min() underflow
bcachefs: Use wait_on_allocator() when allocating journal
bcachefs: Check for bad write buffer key when moving from journal
bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blocked
bcachefs: Fix range in bch2_lookup_indirect_extent() error path
bcachefs: fix spurious error_throw
bcachefs: Add missing bch2_err_class() to fileattr_set()
bcachefs: Add missing key type checks to check_snapshot_exists()
bcachefs: Don't log fsck err in the journal if doing repair elsewhere
bcachefs: Fix *__bch2_trans_subbuf_alloc() error path
bcachefs: Fix missing newlines before ero
bcachefs: fix spurious error in read_btree_roots()
bcachefs: fsck: Fix oops in key_visible_in_snapshot()
bcachefs: fsck: fix unhandled restart in topology repair
bcachefs: fsck: Fix check_directory_structure when no check_dirents
bcachefs: Fix restart handling in btree_node_scrub_work()
...
|
|
Allow the flexfiles error handling to recognise NFS level errors (as
opposed to RPC level errors) and handle them separately. The main
motivator is the NFSERR_PERM errors that get returned if the NFS client
connects to the data server through a port number that is lower than
1024. In that case, the client should disconnect and retry a READ on a
different data server, or it should retry a WRITE after reconnecting.
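Separating NFS-level from RPC-level errors amounts to classifying each failure by layer before choosing a recovery action. The sketch below is a hypothetical model of that decision, not the flexfiles code: the enums and `ds_classify()` are invented for illustration, and only the READ/WRITE actions for NFSERR_PERM come from the text above. `NFSERR_PERM_MODEL` uses the value 1, which NFS3ERR_PERM and NFS4ERR_PERM share; other NFS-level errors fall back to the default handling as an assumption.

```c
#include <assert.h>

/* Hypothetical model of a failed data server (DS) I/O. */
enum ds_err_level { DS_ERR_RPC, DS_ERR_NFS };
enum ds_op { DS_READ, DS_WRITE };
enum ds_action {
	DS_ACTION_DEFAULT,         /* existing (RPC-level) error handling */
	DS_ACTION_READ_OTHER_DS,   /* disconnect, retry the READ on another DS */
	DS_ACTION_RECONNECT_RETRY, /* disconnect, reconnect, retry the WRITE */
};

#define NFSERR_PERM_MODEL 1        /* NFS3ERR_PERM / NFS4ERR_PERM */

static enum ds_action ds_classify(enum ds_err_level level, int status,
				  enum ds_op op)
{
	if (level != DS_ERR_NFS)
		return DS_ACTION_DEFAULT;  /* RPC errors keep the old path */
	if (status == NFSERR_PERM_MODEL)
		return op == DS_READ ? DS_ACTION_READ_OTHER_DS
				     : DS_ACTION_RECONNECT_RETRY;
	return DS_ACTION_DEFAULT;          /* assumption for other NFS errors */
}
```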
Reviewed-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
When performing a file read from RDMA, smbd_recv() prints an "Invalid msg
type 4" error and fails the I/O. This is due to the switch-statement there
not handling the ITER_FOLIOQ handed down from netfslib.
Fix this by collapsing smbd_recv_buf() and smbd_recv_page() into
smbd_recv() and just using copy_to_iter() instead of memcpy(). This
future-proofs the function too, in case more ITER_* types are added.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Talpey <tom@talpey.com>
cc: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The handling of received data in the smbdirect client code involves using
copy_to_iter() to copy data from the smbd_response struct's packet trailer
to a folioq buffer provided by netfslib that encapsulates a chunk of
pagecache.
If, however, CONFIG_HARDENED_USERCOPY=y, this will result in the checks
then performed in copy_to_iter() oopsing with something like the following:
CIFS: Attempting to mount //172.31.9.1/test
CIFS: VFS: RDMA transport established
usercopy: Kernel memory exposure attempt detected from SLUB object 'smbd_response_0000000091e24ea1' (offset 81, size 63)!
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:102!
...
RIP: 0010:usercopy_abort+0x6c/0x80
...
Call Trace:
<TASK>
__check_heap_object+0xe3/0x120
__check_object_size+0x4dc/0x6d0
smbd_recv+0x77f/0xfe0 [cifs]
cifs_readv_from_socket+0x276/0x8f0 [cifs]
cifs_read_from_socket+0xcd/0x120 [cifs]
cifs_demultiplex_thread+0x7e9/0x2d50 [cifs]
kthread+0x396/0x830
ret_from_fork+0x2b8/0x3b0
ret_from_fork_asm+0x1a/0x30
The problem is that the smbd_response slab's packet field isn't marked as
being permitted for usercopy.
Fix this by passing parameters to kmem_cache_create() to indicate that
copy_to_iter() is permitted from the packet region of the smbd_response
slab objects, less the header space.
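A slab cache can declare one usercopy whitelist region, given as an offset and size within each object (the `useroffset`/`usersize` arguments of `kmem_cache_create_usercopy()`, or the same-named fields of `struct kmem_cache_args` on recent kernels). The arithmetic is "start of the packet field plus the header, for the packet size minus the header". The sketch below uses a made-up object layout (`struct response_model`, `PKT_HDR_SIZE`) rather than the real smbd_response, and `copy_allowed()` only approximates the hardened-usercopy containment rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object: bookkeeping fields, then a packet buffer whose
 * first PKT_HDR_SIZE bytes hold a protocol header. */
struct response_model {
	int   type;
	void *list_next;
	char  packet[];              /* flexible array: payload follows */
};
#define PKT_HDR_SIZE 24              /* assumed protocol header size */

struct usercopy_region { size_t offset, size; };

/* Whitelist only the payload: skip the struct fields and packet header. */
static struct usercopy_region response_region(size_t packet_size)
{
	struct usercopy_region r = {
		.offset = offsetof(struct response_model, packet) + PKT_HDR_SIZE,
		.size   = packet_size - PKT_HDR_SIZE,
	};
	return r;
}

/* A copy is allowed only if [off, off + len) lies inside the region. */
static bool copy_allowed(struct usercopy_region r, size_t off, size_t len)
{
	return off >= r.offset && len <= r.size &&
	       off - r.offset <= r.size - len;
}
```

The comparison is written as `off - r.offset <= r.size - len` so that neither side can overflow once the two preceding checks have passed.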
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Reported-by: Stefan Metzmacher <metze@samba.org>
Link: https://lore.kernel.org/r/acb7f612-df26-4e2a-a35d-7cd040f513e1@samba.org/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Tested-by: Stefan Metzmacher <metze@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order
and prevent the following deadlock from happening:
======================================================
WARNING: possible circular locking dependency detected
6.16.0-rc3-build2+ #1301 Tainted: G S W
------------------------------------------------------
cifsd/6055 is trying to acquire lock:
ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200
but task is already holding lock:
ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&ret_buf->chan_lock){+.+.}-{3:3}:
validate_chain+0x1cf/0x270
__lock_acquire+0x60e/0x780
lock_acquire.part.0+0xb4/0x1f0
_raw_spin_lock+0x2f/0x40
cifs_setup_session+0x81/0x4b0
cifs_get_smb_ses+0x771/0x900
cifs_mount_get_session+0x7e/0x170
cifs_mount+0x92/0x2d0
cifs_smb3_do_mount+0x161/0x460
smb3_get_tree+0x55/0x90
vfs_get_tree+0x46/0x180
do_new_mount+0x1b0/0x2e0
path_mount+0x6ee/0x740
do_mount+0x98/0xe0
__do_sys_mount+0x148/0x180
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (&ret_buf->ses_lock){+.+.}-{3:3}:
validate_chain+0x1cf/0x270
__lock_acquire+0x60e/0x780
lock_acquire.part.0+0xb4/0x1f0
_raw_spin_lock+0x2f/0x40
cifs_match_super+0x101/0x320
sget+0xab/0x270
cifs_smb3_do_mount+0x1e0/0x460
smb3_get_tree+0x55/0x90
vfs_get_tree+0x46/0x180
do_new_mount+0x1b0/0x2e0
path_mount+0x6ee/0x740
do_mount+0x98/0xe0
__do_sys_mount+0x148/0x180
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}:
check_noncircular+0x95/0xc0
check_prev_add+0x115/0x2f0
validate_chain+0x1cf/0x270
__lock_acquire+0x60e/0x780
lock_acquire.part.0+0xb4/0x1f0
_raw_spin_lock+0x2f/0x40
cifs_signal_cifsd_for_reconnect+0x134/0x200
__cifs_reconnect+0x8f/0x500
cifs_handle_standard+0x112/0x280
cifs_demultiplex_thread+0x64d/0xbc0
kthread+0x2f7/0x310
ret_from_fork+0x2a/0x230
ret_from_fork_asm+0x1a/0x30
other info that might help us debug this:
Chain exists of:
&tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&ret_buf->chan_lock);
                               lock(&ret_buf->ses_lock);
                               lock(&ret_buf->chan_lock);
  lock(&tcp_ses->srv_lock);
*** DEADLOCK ***
3 locks held by cifsd/6055:
#0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200
#1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200
#2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200
Cc: linux-cifs@vger.kernel.org
Reported-by: David Howells <dhowells@redhat.com>
Fixes: d7d7a66aacd6 ("cifs: avoid use of global locks for high contention data")
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix a 6.16 regression from the recovery pass rework, which introduced a
bug where calling bch2_run_explicit_recovery_pass() would only return
the error code to rewind recovery for the first call that scheduled that
recovery pass.
If the error code from the first call was swallowed (because it was
called by an asynchronous codepath), subsequent calls would go "ok, this
pass is already marked as needing to run" and return 0.
Fixing this ensures that check_topology bails out to run btree_node_scan
before doing any repair.
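The regression is a "report only on first schedule" pattern: the rewind error is tied to setting the scheduled bit rather than to whether the pass has actually run. A userspace model of the buggy and fixed behaviour follows. `struct recovery_model`, both functions and `ERR_RESTART_RECOVERY` are hypothetical; the real bch2_run_explicit_recovery_pass() also considers where recovery currently is and persists state in the superblock.

```c
#include <assert.h>

#define ERR_RESTART_RECOVERY 1001  /* hypothetical "rewind recovery" code */

struct recovery_model {
	unsigned long scheduled;   /* bitmap: passes marked as needing to run */
	unsigned long completed;   /* bitmap: passes that have actually run */
};

/* Buggy: only the call that first schedules the pass asks for a rewind,
 * so a caller whose error was swallowed leaves later callers with 0. */
static int run_explicit_pass_buggy(struct recovery_model *r, int pass)
{
	unsigned long bit = 1UL << pass;

	if (r->scheduled & bit)
		return 0;
	r->scheduled |= bit;
	return -ERR_RESTART_RECOVERY;
}

/* Fixed: keep asking for a rewind until the pass has really run. */
static int run_explicit_pass_fixed(struct recovery_model *r, int pass)
{
	unsigned long bit = 1UL << pass;

	r->scheduled |= bit;
	return (r->completed & bit) ? 0 : -ERR_RESTART_RECOVERY;
}
```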
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, calling bch2_btree_has_scanned_nodes() when btree node
scan hadn't actually run would erroneously return false, causing us to
think a btree was entirely gone.
This fixes a 6.16 regression from moving the scheduling of btree node
scan out of bch2_btree_lost_data() (fixing the bug where we'd schedule
it persistently in the superblock) and only scheduling it when
check_topology() is asking for scanned btree nodes.
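The failure mode is answering "no scanned nodes" from state that was never populated. A minimal model contrasts that with making the scan run first. All names (`struct scan_model`, `run_node_scan()`, the two query functions, `NR_BTREES`) are hypothetical, and the "scan" just copies from a fake on-disk table.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_BTREES 4                /* hypothetical number of btrees */

struct scan_model {
	const int *on_disk;        /* nodes a scan would find, per btree */
	int  found[NR_BTREES];     /* results, valid only after a scan */
	bool scan_ran;
};

/* Stand-in for btree node scan: populate results from the "disk". */
static void run_node_scan(struct scan_model *s)
{
	for (int i = 0; i < NR_BTREES; i++)
		s->found[i] = s->on_disk[i];
	s->scan_ran = true;
}

/* Buggy: reads empty results if the scan never ran, so a btree looks gone. */
static bool has_scanned_nodes_buggy(const struct scan_model *s, int btree)
{
	return s->found[btree] > 0;
}

/* Fixed: ensure the scan has run before answering. */
static bool has_scanned_nodes_fixed(struct scan_model *s, int btree)
{
	if (!s->scan_ran)
		run_node_scan(s);
	return s->found[btree] > 0;
}
```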
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Autofix is specified in btree_gc.c if it's not an important btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Pull mount fixes from Al Viro:
"Several mount-related fixes"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
userns and mnt_idmap leak in open_tree_attr(2)
attach_recursive_mnt(): do not lock the covering tree when sliding something under it
replace collect_mounts()/drop_collected_mounts() with a safer variant
|
|
A WARN_ON_ONCE was added to truncate_folio_batch_exceptionals() to
catch filesystems that have not removed all DAX entries.
The corresponding fix was applied to xfs and ext4 in commit
0e2f80afcfa6 ("fs/dax: ensure all pages are idle prior to filesystem
unmount").
Apply the missing fix to fuse to fix the following runtime warning:
[ 2.011450] ------------[ cut here ]------------
[ 2.011873] WARNING: CPU: 0 PID: 145 at mm/truncate.c:89 truncate_folio_batch_exceptionals+0x272/0x2b0
[ 2.012468] Modules linked in:
[ 2.012718] CPU: 0 UID: 1000 PID: 145 Comm: weston Not tainted 6.16.0-rc2-WSL2-STABLE #2 PREEMPT(undef)
[ 2.013292] RIP: 0010:truncate_folio_batch_exceptionals+0x272/0x2b0
[ 2.013704] Code: 48 63 d0 41 29 c5 48 8d 1c d5 00 00 00 00 4e 8d 6c 2a 01 49 c1 e5 03 eb 09 48 83 c3 08 49 39 dd 74 83 41 f6 44 1c 08 01 74 ef <0f> 0b 49 8b 34 1e 48 89 ef e8 10 a2 17 00 eb df 48 8b 7d 00 e8 35
[ 2.014845] RSP: 0018:ffffa47ec33f3b10 EFLAGS: 00010202
[ 2.015279] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2.015884] RDX: 0000000000000000 RSI: ffffa47ec33f3ca0 RDI: ffff98aa44f3fa80
[ 2.016377] RBP: ffff98aa44f3fbf0 R08: ffffa47ec33f3ba8 R09: 0000000000000000
[ 2.016942] R10: 0000000000000001 R11: 0000000000000000 R12: ffffa47ec33f3ca0
[ 2.017437] R13: 0000000000000008 R14: ffffa47ec33f3ba8 R15: 0000000000000000
[ 2.017972] FS: 000079ce006afa40(0000) GS:ffff98aade441000(0000) knlGS:0000000000000000
[ 2.018510] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.018987] CR2: 000079ce03e74000 CR3: 000000010784f006 CR4: 0000000000372eb0
[ 2.019518] Call Trace:
[ 2.019729] <TASK>
[ 2.019901] truncate_inode_pages_range+0xd8/0x400
[ 2.020280] ? timerqueue_add+0x66/0xb0
[ 2.020574] ? get_nohz_timer_target+0x2a/0x140
[ 2.020904] ? timerqueue_add+0x66/0xb0
[ 2.021231] ? timerqueue_del+0x2e/0x50
[ 2.021646] ? __remove_hrtimer+0x39/0x90
[ 2.022017] ? srso_alias_untrain_ret+0x1/0x10
[ 2.022497] ? psi_group_change+0x136/0x350
[ 2.023046] ? _raw_spin_unlock+0xe/0x30
[ 2.023514] ? finish_task_switch.isra.0+0x8d/0x280
[ 2.024068] ? __schedule+0x532/0xbd0
[ 2.024551] fuse_evict_inode+0x29/0x190
[ 2.025131] evict+0x100/0x270
[ 2.025641] ? _atomic_dec_and_lock+0x39/0x50
[ 2.026316] ? __pfx_generic_delete_inode+0x10/0x10
[ 2.026843] __dentry_kill+0x71/0x180
[ 2.027335] dput+0xeb/0x1b0
[ 2.027725] __fput+0x136/0x2b0
[ 2.028054] __x64_sys_close+0x3d/0x80
[ 2.028469] do_syscall_64+0x6d/0x1b0
[ 2.028832] ? clear_bhb_loop+0x30/0x80
[ 2.029182] ? clear_bhb_loop+0x30/0x80
[ 2.029533] ? clear_bhb_loop+0x30/0x80
[ 2.029902] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 2.030423] RIP: 0033:0x79ce03d0d067
[ 2.030820] Code: b8 ff ff ff ff e9 3e ff ff ff 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 c3 a7 f8 ff
[ 2.032354] RSP: 002b:00007ffef0498948 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 2.032939] RAX: ffffffffffffffda RBX: 00007ffef0498960 RCX: 000079ce03d0d067
[ 2.033612] RDX: 0000000000000003 RSI: 0000000000001000 RDI: 000000000000000d
[ 2.034289] RBP: 00007ffef0498a30 R08: 000000000000000d R09: 0000000000000000
[ 2.034944] R10: 00007ffef0498978 R11: 0000000000000246 R12: 0000000000000001
[ 2.035610] R13: 00007ffef0498960 R14: 000079ce03e09ce0 R15: 0000000000000003
[ 2.036301] </TASK>
[ 2.036532] ---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20250621171507.3770-1-haiyuewa@163.com
Fixes: bde708f1a65d ("fs/dax: always remove DAX page-cache entries when breaking layouts")
Signed-off-by: Haiyue Wang <haiyuewa@163.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
is_zero_pfn() does not work for the huge zero folio. Fix it by using
is_huge_zero_pmd().
This can cause the PAGEMAP_SCAN ioctl against /proc/pid/pagemap to
present pages as PAGE_IS_PRESENT rather than as PAGE_IS_PFNZERO.
Found by code inspection.
Link: https://lkml.kernel.org/r/20250617143532.2375383-1-david@redhat.com
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The smb3_* trace events generate '[FAILED TO PARSE]' strings in trace-cmd report output, like this:
rm-5298 [001] 6084.533748493: smb3_exit_err: [FAILED TO PARSE] xid=972 func_name=cifs_rmdir rc=-39
rm-5298 [001] 6084.533959234: smb3_enter: [FAILED TO PARSE] xid=973 func_name=cifs_closedir
rm-5298 [001] 6084.533967630: smb3_close_enter: [FAILED TO PARSE] xid=973 fid=94489281833 tid=1 sesid=96758029877361
rm-5298 [001] 6084.534004008: smb3_cmd_enter: [FAILED TO PARSE] tid=1 sesid=96758029877361 cmd=6 mid=566
rm-5298 [001] 6084.552248232: smb3_cmd_done: [FAILED TO PARSE] tid=1 sesid=96758029877361 cmd=6 mid=566
rm-5298 [001] 6084.552280542: smb3_close_done: [FAILED TO PARSE] xid=973 fid=94489281833 tid=1 sesid=96758029877361
rm-5298 [001] 6084.552316034: smb3_exit_done: [FAILED TO PARSE] xid=973 func_name=cifs_closedir
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
transmit all data
We should not send smbdirect_data_transfer messages larger than
the negotiated max_send_size, typically 1364 bytes, i.e. a 24-byte
smbdirect_data_transfer header plus 1340 bytes of payload.
This happened when doing an SMB2 write of more than 1340 bytes
(which is sent inline, as it is below rdma_readwrite_threshold),
and it causes the peer to reset the connection.
When testing between cifs.ko and ksmbd.ko, something like this
is logged:
client:
CIFS: VFS: RDMA transport re-established
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
CIFS: VFS: \\carina Send error in SessSetup = -11
smb2_reconnect: 12 callbacks suppressed
CIFS: VFS: reconnect tcon failed rc = -11
CIFS: VFS: reconnect tcon failed rc = -11
CIFS: VFS: reconnect tcon failed rc = -11
CIFS: VFS: SMB: Zero rsize calculated, using minimum value 65536
and:
CIFS: VFS: RDMA transport re-established
siw: got TERMINATE. layer 1, type 2, code 2
CIFS: VFS: smbd_recv:1894 disconnected
siw: got TERMINATE. layer 1, type 2, code 2
The ksmbd dmesg is showing things like:
smb_direct: Recv error. status='local length error (1)' opcode=128
smb_direct: disconnected
smb_direct: Recv error. status='local length error (1)' opcode=128
ksmbd: smb_direct: disconnected
ksmbd: sock_read failed: -107
As smbd_post_send_iter() limits the number of bytes transmitted,
we need to loop over it in order to transmit the whole iter.
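A minimal sketch of that loop, assuming a hypothetical post_one_message() that, like smbd_post_send_iter(), sends at most max_send_size minus the 24-byte header per call and reports how much payload it consumed. This is an illustration of the chunking arithmetic, not the cifs.ko code:

```c
#include <assert.h>
#include <stddef.h>

/* Size of the smbdirect_data_transfer header, per the text above. */
#define SMBD_HDR_SIZE 24

/* Hypothetical sender: consumes at most one message worth of payload. */
static size_t post_one_message(size_t remaining, size_t max_send_size)
{
	size_t max_payload = max_send_size - SMBD_HDR_SIZE;

	return remaining < max_payload ? remaining : max_payload;
}

/* Loop until the whole payload has been sent; returns the message count. */
static unsigned int send_all(size_t len, size_t max_send_size)
{
	unsigned int messages = 0;

	assert(max_send_size > SMBD_HDR_SIZE);
	while (len > 0) {
		len -= post_one_message(len, max_send_size);
		messages++;
	}
	return messages;
}
```

With the typical max_send_size of 1364, a 4096-byte write becomes four messages carrying 1340, 1340, 1340 and 76 payload bytes.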
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Meetakshi Setiya <msetiya@microsoft.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: <stable+noautosel@kernel.org> # sp->max_send_size should be info->max_send_size in backports
Fixes: 3d78fe73fa12 ("cifs: Build the RDMA SGE list directly from an iterator")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
wait_on_allocator() emits debug info when we hang trying to allocate.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reported-by: syzbot+d540192e763531d307ff@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Once want_mount_setattr() has returned a positive value, it does require
finish_mount_kattr() to release ->mnt_userns. A failing do_mount_setattr()
does not change that.
As a result, we can end up leaking userns and possibly mnt_idmap as
well.
Fixes: c4a16820d901 ("fs: add open_tree_attr()")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This fixes a bug in commit 63c69ad3d18a ("fuse: refactor
fuse_fill_write_pages()") where max_pages << PAGE_SHIFT is mistakenly
used as the calculation for the max_pages upper limit. However,
copy_folio_from_iter_atomic() may copy fewer bytes from the iov_iter
than the full length of the folio, so a byte-based limit alone can let
the number of folios exceed max_pages.
This commit fixes it by adding an 'ap->num_folios < max_folios' check.
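A userspace model of the limit check, under the assumption that each loop iteration fills one folio slot and copies[] records how many bytes each copy actually transferred (a short copy transfers less than a page). The function and constant names are hypothetical, not the fuse code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* Returns how many folio slots were used to hold the copied data. */
static size_t fill_write_folios(const size_t *copies, size_t n,
				size_t max_folios, bool check_folio_count)
{
	size_t bytes = 0, num_folios = 0;
	size_t byte_limit = max_folios * SKETCH_PAGE_SIZE;

	/* The byte limit alone is not enough when copies are short. */
	while (num_folios < n && bytes < byte_limit &&
	       (!check_folio_count || num_folios < max_folios)) {
		bytes += copies[num_folios];
		num_folios++;
	}
	return num_folios;
}
```

With five 100-byte short copies and max_folios = 2, the byte limit (8192) is never reached, so without the folio-count check all five folios are used; with it, the loop stops at two.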
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/20250614000114.910380-1-joannelkoong@gmail.com
Fixes: 63c69ad3d18a ("fuse: refactor fuse_fill_write_pages()")
Tested-by: Brian Foster <bfoster@redhat.com>
Reported-by: Brian Foster <bfoster@redhat.com>
Closes: https://lore.kernel.org/linux-fsdevel/aEq4haEQScwHIWK6@bfoster/
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
- fix double-unlock introduced by the recent folio conversion
- fix stale page content beyond EOF complained by xfstests/generic/363
* tag 'f2fs-for-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: fix to zero post-eof page
f2fs: Fix __write_node_folio() conversion
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Fixes:
- fix invalid inode pointer dereferences during log replay
- fix a race between renames and directory logging
- fix shutting down delayed iput worker
- fix device byte accounting when dropping chunk
- in zoned mode, fix offset calculations for DUP profile when
conventional and sequential zones are used together
Regression fixes:
- fix possible double unlock of extent buffer tree (xarray
conversion)
- in zoned mode, fix extent buffer refcount when writing out extents
(xarray conversion)
Error handling fixes and updates:
- handle unexpected extent type when replaying log
- check and warn if there are remaining delayed inodes when putting a
root
- fix assertion when building free space tree
- handle csum tree error with mount option 'rescue=ibadroot'
Other:
- error message updates: add prefix to all scrub related messages,
include other information in messages"
* tag 'for-6.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: fix alloc_offset calculation for partly conventional block groups
btrfs: handle csum tree error with rescue=ibadroots correctly
btrfs: fix race between async reclaim worker and close_ctree()
btrfs: fix assertion when building free space tree
btrfs: don't silently ignore unexpected extent type when replaying log
btrfs: fix invalid inode pointer dereferences during log replay
btrfs: fix double unlock of buffer_tree xarray when releasing subpage eb
btrfs: update superblock's device bytes_used when dropping chunk
btrfs: fix a race between renames and directory logging
btrfs: scrub: add prefix for the error messages
btrfs: warn if leaking delayed_nodes in btrfs_put_root()
btrfs: fix delayed ref refcount leak in debug assertion
btrfs: include root in error message when unlinking inode
btrfs: don't drop a reference if btrfs_check_write_meta_pointer() fails
|
|
under it
If we are propagating across the userns boundary, we need to lock the
mounts added there. However, if something has already been mounted
there and we end up sliding a new tree under it, the stuff that had
been there before should not get locked.
IOW, lock_mnt_tree() should be called before we reparent the
preexisting tree on top of what we are adding.
Fixes: 3bd045cc9c4b ("separate copying and locking mount tree on cross-userns copies")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
collect_mounts() has several problems - one can't iterate over the results
directly, so it has to be done with a callback passed to iterate_mounts();
it has an oopsable race with d_invalidate(); it creates temporary clones
of mounts invisibly for sync umount (IOW, you can have non-lazy umount
succeed, leaving the filesystem not mounted anywhere and yet still busy).
A saner approach is to give the caller an array of struct path that would pin
every mount in a subtree, without cloning any mounts.
* collect_mounts()/drop_collected_mounts()/iterate_mounts() is gone
* collect_paths(where, preallocated, size) gives either ERR_PTR(-E...) or
a pointer to array of struct path, one for each chunk of tree visible under
'where' (i.e. the first element is a copy of where, followed by (mount,root)
for everything mounted under it - the same set collect_mounts() would give).
Unlike collect_mounts(), the mounts are *not* cloned - we just get pinning
references to the roots of subtrees in the caller's namespace.
Array is terminated by {NULL, NULL} struct path. If it fits into
preallocated array (on-stack, normally), that's where it goes; otherwise
it's allocated by kmalloc_array(). Passing 0 as size means that 'preallocated'
is ignored (and expected to be NULL).
* drop_collected_paths(paths, preallocated) is given the array returned
by an earlier call of collect_paths() and the preallocated array passed to that
call. All mount/dentry references are dropped and array is kfree'd if it's not
equal to 'preallocated'.
* instead of iterate_mounts(), users should just iterate over array
of struct path - nothing exotic is needed for that. Existing users (all in
audit_tree.c) are converted.
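The ownership rules for the returned array (preallocated buffer if it fits, heap otherwise, {NULL, NULL} terminator, freed only if it is not the preallocated buffer) can be sketched in userspace C. struct path_sketch, collect_sketch() and drop_sketch() are hypothetical stand-ins; reference counting is omitted and calloc() replaces kmalloc_array():

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct path (mount, dentry). */
struct path_sketch { void *mnt; void *dentry; };

/*
 * Copy n entries plus a {NULL, NULL} terminator into 'prealloc' if they fit,
 * otherwise into a heap array. size == 0 means prealloc is ignored.
 * Returns NULL on allocation failure (the kernel returns an ERR_PTR).
 */
static struct path_sketch *collect_sketch(const struct path_sketch *src, size_t n,
					  struct path_sketch *prealloc, size_t size)
{
	struct path_sketch *res = prealloc;

	if (n + 1 > size) {
		res = calloc(n + 1, sizeof(*res));
		if (!res)
			return NULL;
	}
	for (size_t i = 0; i < n; i++)
		res[i] = src[i];	/* the kernel would take references here */
	res[n] = (struct path_sketch){ NULL, NULL };
	return res;
}

static void drop_sketch(struct path_sketch *paths, struct path_sketch *prealloc)
{
	/* the kernel would drop mount/dentry references here */
	if (paths != prealloc)
		free(paths);
}

/* Callers simply walk the array up to the terminator. */
static size_t count_paths(const struct path_sketch *p)
{
	size_t n = 0;

	while (p[n].mnt)
		n++;
	return n;
}
```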
[folded a fix for braino reported by Venkat Rao Bagalkote <venkat88@linux.ibm.com>]
Fixes: 80b5dce8c59b0 ("vfs: Add a function to lazily unmount all mounts from any dentry")
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We found a few different systems hung in writeback waiting on the same
page lock, with one task waiting on the NFS_LAYOUT_DRAIN bit in
pnfs_update_layout(), even though the pnfs_layout_hdr's plh_outstanding
count was zero.
It seems most likely that this is another race between the waiter and waker
similar to commit ed0172af5d6f ("SUNRPC: Fix a race to wake a sync task").
Fix it up by applying the advised barrier.
Fixes: 880265c77ac4 ("pNFS: Avoid a live lock condition in pnfs_update_layout()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
syzbot reported a warning below [1] following a fault injection in
nfs_fs_proc_net_init(). [0]
When nfs_fs_proc_net_init() fails, /proc/net/rpc/nfs is not removed.
Later, rpc_proc_exit() tries to remove /proc/net/rpc, and the warning
is logged as the directory is not empty.
Let's handle the error of nfs_fs_proc_net_init() properly.
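One common way to handle such an error is goto-based unwinding, where a failing step undoes the steps that already succeeded, so nothing is left behind for a later rpc_proc_exit() to trip over. This is not necessarily how the actual patch is structured: create_dir(), create_file() and remove_dir() are hypothetical and are not the procfs API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proc entries created during init. */
struct proc_sketch { bool dir; bool file; };

static int create_dir(struct proc_sketch *p) { p->dir = true; return 0; }
static void remove_dir(struct proc_sketch *p) { p->dir = false; }

static int create_file(struct proc_sketch *p, bool inject_fault)
{
	if (inject_fault)
		return -1;	/* simulated allocation failure */
	p->file = true;
	return 0;
}

static int proc_net_init_sketch(struct proc_sketch *p, bool inject_fault)
{
	int err = create_dir(p);

	if (err)
		return err;
	err = create_file(p, inject_fault);
	if (err)
		goto out_remove_dir;
	return 0;

out_remove_dir:
	remove_dir(p);		/* undo partial setup so the parent stays empty */
	return err;
}
```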
[0]:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 1 UID: 0 PID: 6120 Comm: syz.2.27 Not tainted 6.16.0-rc1-syzkaller-00010-g2c4a1f3fe03e #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:123)
should_fail_ex (lib/fault-inject.c:73 lib/fault-inject.c:174)
should_failslab (mm/failslab.c:46)
kmem_cache_alloc_noprof (mm/slub.c:4178 mm/slub.c:4204)
__proc_create (fs/proc/generic.c:427)
proc_create_reg (fs/proc/generic.c:554)
proc_create_net_data (fs/proc/proc_net.c:120)
nfs_fs_proc_net_init (fs/nfs/client.c:1409)
nfs_net_init (fs/nfs/inode.c:2600)
ops_init (net/core/net_namespace.c:138)
setup_net (net/core/net_namespace.c:443)
copy_net_ns (net/core/net_namespace.c:576)
create_new_namespaces (kernel/nsproxy.c:110)
unshare_nsproxy_namespaces (kernel/nsproxy.c:218 (discriminator 4))
ksys_unshare (kernel/fork.c:3123)
__x64_sys_unshare (kernel/fork.c:3190)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
</TASK>
[1]:
remove_proc_entry: removing non-empty directory 'net/rpc', leaking at least 'nfs'
WARNING: CPU: 1 PID: 6120 at fs/proc/generic.c:727 remove_proc_entry+0x45e/0x530 fs/proc/generic.c:727
Modules linked in:
CPU: 1 UID: 0 PID: 6120 Comm: syz.2.27 Not tainted 6.16.0-rc1-syzkaller-00010-g2c4a1f3fe03e #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
RIP: 0010:remove_proc_entry+0x45e/0x530 fs/proc/generic.c:727
Code: 3c 02 00 0f 85 85 00 00 00 48 8b 93 d8 00 00 00 4d 89 f0 4c 89 e9 48 c7 c6 40 ba a2 8b 48 c7 c7 60 b9 a2 8b e8 33 81 1d ff 90 <0f> 0b 90 90 e9 5f fe ff ff e8 04 69 5e ff 90 48 b8 00 00 00 00 00
RSP: 0018:ffffc90003637b08 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88805f534140 RCX: ffffffff817a92c8
RDX: ffff88807da99e00 RSI: ffffffff817a92d5 RDI: 0000000000000001
RBP: ffff888033431ac0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff888033431a00
R13: ffff888033431ae4 R14: ffff888033184724 R15: dffffc0000000000
FS: 0000555580328500(0000) GS:ffff888124a62000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f71733743e0 CR3: 000000007f618000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
sunrpc_exit_net+0x46/0x90 net/sunrpc/sunrpc_syms.c:76
ops_exit_list net/core/net_namespace.c:200 [inline]
ops_undo_list+0x2eb/0xab0 net/core/net_namespace.c:253
setup_net+0x2e1/0x510 net/core/net_namespace.c:457
copy_net_ns+0x2a6/0x5f0 net/core/net_namespace.c:574
create_new_namespaces+0x3ea/0xa90 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:218
ksys_unshare+0x45b/0xa40 kernel/fork.c:3121
__do_sys_unshare kernel/fork.c:3192 [inline]
__se_sys_unshare kernel/fork.c:3190 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:3190
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x490 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa1a6b8e929
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff3a090368 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007fa1a6db5fa0 RCX: 00007fa1a6b8e929
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000080
RBP: 00007fa1a6c10b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa1a6db5fa0 R14: 00007fa1a6db5fa0 R15: 0000000000000001
</TASK>
Fixes: d47151b79e32 ("nfs: expose /proc/net/sunrpc/nfs in net namespaces")
Reported-by: syzbot+a4cc4ac22daa4a71b87c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a4cc4ac22daa4a71b87c
Tested-by: syzbot+a4cc4ac22daa4a71b87c@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Some users and customers reported that their backup/copy tools started
to fail when the directory being copied contained symlink targets that
the client couldn't parse - even when those symlinks weren't followed.
Fix this by allowing lstat(2) and readlink(2) to succeed even when the
client can't resolve the symlink target, restoring old behavior.
Cc: linux-cifs@vger.kernel.org
Cc: stable@vger.kernel.org
Reported-by: Remy Monsen <monsen@monsen.cc>
Closes: https://lore.kernel.org/r/CAN+tdP7y=jqw3pBndZAGjQv0ObFq8Q=+PUDHgB36HdEz9QA6FQ@mail.gmail.com
Reported-by: Pierguido Lambri <plambri@redhat.com>
Fixes: 12b466eb52d9 ("cifs: Fix creating and resolving absolute NT-style symlinks")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Export anon_inode_make_secure_inode() to allow KVM guest_memfd to create
anonymous inodes with proper security context. This replaces the current
pattern of calling alloc_anon_inode() followed by
inode_init_security_anon() for creating security context manually.
This change also fixes a security regression in secretmem where the
S_PRIVATE flag was not cleared after alloc_anon_inode(), causing
LSM/SELinux checks to be bypassed for secretmem file descriptors.
As guest_memfd currently resides in the KVM module, we need to export this
symbol for use outside the core kernel. In the future, guest_memfd might be
moved to core-mm, at which point the symbols would no longer need to be
exported. When/if that happens is still unclear.
Fixes: 2bfe15c52612 ("mm: create security context for memfd_secret inodes")
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/20250620070328.803704-3-shivankg@amd.com
Acked-by: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure the array tracking which kernel text positions need to be
alternatives-patched doesn't get mishandled by out-of-order
modifications, leading to it overflowing and causing page faults when
patching
- Avoid an infinite loop when early code does a ranged TLB invalidation
before the number of pages that broadcast TLB invalidation can flush
has been read from CPUID
- Fix a CONFIG_MODULES typo
- Disable broadcast TLB invalidation when PTI is enabled to avoid an
overflow of the bitmap tracking dynamic ASIDs which need to be
flushed when the kernel switches between the user and kernel address
space
- Handle the case of a CPU going offline and thus reporting zeroes when
reading top-level events in the resctrl code
* tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Fix int3 handling failure from broken text_poke array
x86/mm: Fix early boot use of INVPLGB
x86/its: Fix an ifdef typo in its_alloc()
x86/mm: Disable INVLPGB when PTI is enabled
x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Multichannel channel allocation fix for Kerberos mounts
- Two reconnect fixes
- Fix netfs_writepages crash with smbdirect/RDMA
- Directory caching fix
- Three minor cleanup fixes
- Log error when close cached dirs fails
* tag 'v6.16-rc2-smb3-client-fixes-v2' of git://git.samba.org/sfrench/cifs-2.6:
smb: minor fix to use SMB2_NTLMV2_SESSKEY_SIZE for auth_key size
smb: minor fix to use sizeof to initialize flags_string buffer
smb: Use loff_t for directory position in cached_dirents
smb: Log an error when close_all_cached_dirs fails
cifs: Fix prepare_write to negotiate wsize if needed
smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma()
smb: client: fix first command failure during re-negotiation
cifs: Remove duplicate fattr->cf_dtype assignment from wsl_to_fattr() function
smb: fix secondary channel creation issue with kerberos by populating hostname when adding channels
|
|
Before calling bch2_indirect_extent_missing_error(), we have to
calculate the missing range, which is the intersection of the reflink
pointer and the non-indirect-extent we found.
The calculation didn't take into account that the returned extent may
span the iter position, leading to an infinite loop when we
(unnecessarily) resized the extent we were returning to one that didn't
extend past the offset we were looking up.
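The intersection step itself can be sketched as follows. Because the extent that was found may start before the reflink pointer (that is, span the lookup position), both ends have to be clamped. struct sector_range and range_intersect() are illustrative names, not bcachefs helpers, and the sketch does not model the resizing that caused the loop:

```c
#include <assert.h>
#include <stdint.h>

/* Half-open range [start, end) in sectors. */
struct sector_range { uint64_t start, end; };

/* Intersection of two half-open ranges; an empty result has start == end. */
static struct sector_range range_intersect(struct sector_range a,
					   struct sector_range b)
{
	struct sector_range r = {
		.start = a.start > b.start ? a.start : b.start,
		.end   = a.end   < b.end   ? a.end   : b.end,
	};

	if (r.end < r.start)
		r.end = r.start;	/* disjoint: return an empty range */
	return r;
}
```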
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Make sure we return a standard error code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Two fixes for commits in the nfsd-6.16 merge
- One fix for the recently-added NFSD netlink facility
- One fix for a remote SunRPC crasher
* tag 'nfsd-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
sunrpc: handle SVC_GARBAGE during svc auth processing as auth error
nfsd: use threads array as-is in netlink interface
SUNRPC: Cleanup/fix initial rq_pages allocation
NFSD: Avoid corruption of a referring call list
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Use the mounter’s credentials for file-backed mounts to resolve
Android SELinux permission issues
- Remove the unused trace event `erofs_destroy_inode`
- Error out on crafted out-of-file-range encoded extents
- Remove an incorrect check for encoded extents
* tag 'erofs-for-6.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: remove a superfluous check for encoded extents
erofs: refuse crafted out-of-file-range encoded extents
erofs: remove unused trace event erofs_destroy_inode
erofs: impersonate the opener's credentials when accessing backing file
|
|
Replaced hardcoded value 16 with SMB2_NTLMV2_SESSKEY_SIZE
in the auth_key definition and memcpy call.
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Replaced hardcoded length with sizeof(flags_string).
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Change the pos field in struct cached_dirents from int to loff_t
to support large directory offsets. This avoids overflow and
matches kernel conventions for directory positions.
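As an illustration, assuming a 32-bit int and using int64_t as a userspace stand-in for loff_t (a signed 64-bit type in the kernel), a position held in the wider type can advance past INT_MAX without overflowing. The struct below is a hypothetical reduction of struct cached_dirents to the one field discussed:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's signed 64-bit loff_t. */
typedef int64_t loff_sketch_t;

struct cached_dirents_sketch {
	loff_sketch_t pos;	/* previously 'int', which tops out at INT_MAX */
};

/* Advance the directory position by n entries. */
static void advance_pos(struct cached_dirents_sketch *cd, loff_sketch_t n)
{
	cd->pos += n;
}
```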
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Under low-memory conditions, close_all_cached_dirs() can't move the
dentries to a separate list to dput() them once the locks are dropped.
This will result in a "Dentry still in use" error, so add an error
message that makes it clear this is what happened:
[ 495.281119] CIFS: VFS: \\otters.example.com\share Out of memory while dropping dentries
[ 495.281595] ------------[ cut here ]------------
[ 495.281887] BUG: Dentry ffff888115531138{i=78,n=/} still in use (2) [unmount of cifs cifs]
[ 495.282391] WARNING: CPU: 1 PID: 2329 at fs/dcache.c:1536 umount_check+0xc8/0xf0
Also, bail out of looping through all tcons as soon as a single
allocation fails, since we're already in trouble, and kmalloc() attempts
for subsequent tcons are likely to fail just like the first one did.
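The bail-out behaviour can be modelled as a loop that stops at the first failed allocation instead of trying every remaining tcon. This is a hypothetical sketch: alloc_ok[] simulates whether each kmalloc() would succeed, and close_all_sketch() is an illustrative name rather than the cifs function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the number of tcons processed before stopping. */
static size_t close_all_sketch(const bool *alloc_ok, size_t ntcons,
			       bool *logged_oom)
{
	size_t i;

	*logged_oom = false;
	for (i = 0; i < ntcons; i++) {
		if (!alloc_ok[i]) {
			*logged_oom = true;	/* "Out of memory while dropping dentries" */
			break;			/* later allocations would likely fail too */
		}
	}
	return i;
}
```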
Signed-off-by: Paul Aurich <paul@darkrain42.org>
Acked-by: Bharath SM <bharathsm@microsoft.com>
Suggested-by: Ruben Devos <rdevos@oxya.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix cifs_prepare_write() to negotiate the wsize if it is unset.
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This fixes the following problem:
[ 749.901015] [ T8673] run fstests cifs/001 at 2025-06-17 09:40:30
[ 750.346409] [ T9870] ==================================================================
[ 750.346814] [ T9870] BUG: KASAN: slab-out-of-bounds in smb_set_sge+0x2cc/0x3b0 [cifs]
[ 750.347330] [ T9870] Write of size 8 at addr ffff888011082890 by task xfs_io/9870
[ 750.347705] [ T9870]
[ 750.348077] [ T9870] CPU: 0 UID: 0 PID: 9870 Comm: xfs_io Kdump: loaded Not tainted 6.16.0-rc2-metze.02+ #1 PREEMPT(voluntary)
[ 750.348082] [ T9870] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 750.348085] [ T9870] Call Trace:
[ 750.348086] [ T9870] <TASK>
[ 750.348088] [ T9870] dump_stack_lvl+0x76/0xa0
[ 750.348106] [ T9870] print_report+0xd1/0x640
[ 750.348116] [ T9870] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 750.348120] [ T9870] ? kasan_complete_mode_report_info+0x26/0x210
[ 750.348124] [ T9870] kasan_report+0xe7/0x130
[ 750.348128] [ T9870] ? smb_set_sge+0x2cc/0x3b0 [cifs]
[ 750.348262] [ T9870] ? smb_set_sge+0x2cc/0x3b0 [cifs]
[ 750.348377] [ T9870] __asan_report_store8_noabort+0x17/0x30
[ 750.348381] [ T9870] smb_set_sge+0x2cc/0x3b0 [cifs]
[ 750.348496] [ T9870] smbd_post_send_iter+0x1990/0x3070 [cifs]
[ 750.348625] [ T9870] ? __pfx_smbd_post_send_iter+0x10/0x10 [cifs]
[ 750.348741] [ T9870] ? update_stack_state+0x2a0/0x670
[ 750.348749] [ T9870] ? cifs_flush+0x153/0x320 [cifs]
[ 750.348870] [ T9870] ? cifs_flush+0x153/0x320 [cifs]
[ 750.348990] [ T9870] ? update_stack_state+0x2a0/0x670
[ 750.348995] [ T9870] smbd_send+0x58c/0x9c0 [cifs]
[ 750.349117] [ T9870] ? __pfx_smbd_send+0x10/0x10 [cifs]
[ 750.349231] [ T9870] ? unwind_get_return_address+0x65/0xb0
[ 750.349235] [ T9870] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 750.349242] [ T9870] ? arch_stack_walk+0xa7/0x100
[ 750.349250] [ T9870] ? stack_trace_save+0x92/0xd0
[ 750.349254] [ T9870] __smb_send_rqst+0x931/0xec0 [cifs]
[ 750.349374] [ T9870] ? kernel_text_address+0x173/0x190
[ 750.349379] [ T9870] ? kasan_save_stack+0x39/0x70
[ 750.349382] [ T9870] ? kasan_save_track+0x18/0x70
[ 750.349385] [ T9870] ? __kasan_slab_alloc+0x9d/0xa0
[ 750.349389] [ T9870] ? __pfx___smb_send_rqst+0x10/0x10 [cifs]
[ 750.349508] [ T9870] ? smb2_mid_entry_alloc+0xb4/0x7e0 [cifs]
[ 750.349626] [ T9870] ? cifs_call_async+0x277/0xb00 [cifs]
[ 750.349746] [ T9870] ? cifs_issue_write+0x256/0x610 [cifs]
[ 750.349867] [ T9870] ? netfs_do_issue_write+0xc2/0x340 [netfs]
[ 750.349900] [ T9870] ? netfs_advance_write+0x45b/0x1270 [netfs]
[ 750.349929] [ T9870] ? netfs_write_folio+0xd6c/0x1be0 [netfs]
[ 750.349958] [ T9870] ? netfs_writepages+0x2e9/0xa80 [netfs]
[ 750.349987] [ T9870] ? do_writepages+0x21f/0x590
[ 750.349993] [ T9870] ? filemap_fdatawrite_wbc+0xe1/0x140
[ 750.349997] [ T9870] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.350002] [ T9870] smb_send_rqst+0x22e/0x2f0 [cifs]
[ 750.350131] [ T9870] ? __pfx_smb_send_rqst+0x10/0x10 [cifs]
[ 750.350255] [ T9870] ? local_clock_noinstr+0xe/0xd0
[ 750.350261] [ T9870] ? kasan_save_alloc_info+0x37/0x60
[ 750.350268] [ T9870] ? __kasan_check_write+0x14/0x30
[ 750.350271] [ T9870] ? _raw_spin_lock+0x81/0xf0
[ 750.350275] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10
[ 750.350278] [ T9870] ? smb2_setup_async_request+0x293/0x580 [cifs]
[ 750.350398] [ T9870] cifs_call_async+0x477/0xb00 [cifs]
[ 750.350518] [ T9870] ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[ 750.350636] [ T9870] ? __pfx_cifs_call_async+0x10/0x10 [cifs]
[ 750.350756] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10
[ 750.350760] [ T9870] ? __kasan_check_write+0x14/0x30
[ 750.350763] [ T9870] ? __smb2_plain_req_init+0x933/0x1090 [cifs]
[ 750.350891] [ T9870] smb2_async_writev+0x15ff/0x2460 [cifs]
[ 750.351008] [ T9870] ? sched_clock_noinstr+0x9/0x10
[ 750.351012] [ T9870] ? local_clock_noinstr+0xe/0xd0
[ 750.351018] [ T9870] ? __pfx_smb2_async_writev+0x10/0x10 [cifs]
[ 750.351144] [ T9870] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 750.351150] [ T9870] ? _raw_spin_unlock+0xe/0x40
[ 750.351154] [ T9870] ? cifs_pick_channel+0x242/0x370 [cifs]
[ 750.351275] [ T9870] cifs_issue_write+0x256/0x610 [cifs]
[ 750.351554] [ T9870] ? cifs_issue_write+0x256/0x610 [cifs]
[ 750.351677] [ T9870] netfs_do_issue_write+0xc2/0x340 [netfs]
[ 750.351710] [ T9870] netfs_advance_write+0x45b/0x1270 [netfs]
[ 750.351740] [ T9870] ? rolling_buffer_append+0x12d/0x440 [netfs]
[ 750.351769] [ T9870] netfs_write_folio+0xd6c/0x1be0 [netfs]
[ 750.351798] [ T9870] ? __kasan_check_write+0x14/0x30
[ 750.351804] [ T9870] netfs_writepages+0x2e9/0xa80 [netfs]
[ 750.351835] [ T9870] ? __pfx_netfs_writepages+0x10/0x10 [netfs]
[ 750.351864] [ T9870] ? exit_files+0xab/0xe0
[ 750.351867] [ T9870] ? do_exit+0x148f/0x2980
[ 750.351871] [ T9870] ? do_group_exit+0xb5/0x250
[ 750.351874] [ T9870] ? arch_do_signal_or_restart+0x92/0x630
[ 750.351879] [ T9870] ? exit_to_user_mode_loop+0x98/0x170
[ 750.351882] [ T9870] ? do_syscall_64+0x2cf/0xd80
[ 750.351886] [ T9870] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.351890] [ T9870] do_writepages+0x21f/0x590
[ 750.351894] [ T9870] ? __pfx_do_writepages+0x10/0x10
[ 750.351897] [ T9870] filemap_fdatawrite_wbc+0xe1/0x140
[ 750.351901] [ T9870] __filemap_fdatawrite_range+0xba/0x100
[ 750.351904] [ T9870] ? __pfx___filemap_fdatawrite_range+0x10/0x10
[ 750.351912] [ T9870] ? __kasan_check_write+0x14/0x30
[ 750.351916] [ T9870] filemap_write_and_wait_range+0x7d/0xf0
[ 750.351920] [ T9870] cifs_flush+0x153/0x320 [cifs]
[ 750.352042] [ T9870] filp_flush+0x107/0x1a0
[ 750.352046] [ T9870] filp_close+0x14/0x30
[ 750.352049] [ T9870] put_files_struct.part.0+0x126/0x2a0
[ 750.352053] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10
[ 750.352058] [ T9870] exit_files+0xab/0xe0
[ 750.352061] [ T9870] do_exit+0x148f/0x2980
[ 750.352065] [ T9870] ? __pfx_do_exit+0x10/0x10
[ 750.352069] [ T9870] ? __kasan_check_write+0x14/0x30
[ 750.352072] [ T9870] ? _raw_spin_lock_irq+0x8a/0xf0
[ 750.352076] [ T9870] do_group_exit+0xb5/0x250
[ 750.352080] [ T9870] get_signal+0x22d3/0x22e0
[ 750.352086] [ T9870] ? __pfx_get_signal+0x10/0x10
[ 750.352089] [ T9870] ? fpregs_assert_state_consistent+0x68/0x100
[ 750.352101] [ T9870] ? folio_add_lru+0xda/0x120
[ 750.352105] [ T9870] arch_do_signal_or_restart+0x92/0x630
[ 750.352109] [ T9870] ? __pfx_arch_do_signal_or_restart+0x10/0x10
[ 750.352115] [ T9870] exit_to_user_mode_loop+0x98/0x170
[ 750.352118] [ T9870] do_syscall_64+0x2cf/0xd80
[ 750.352123] [ T9870] ? __kasan_check_read+0x11/0x20
[ 750.352126] [ T9870] ? count_memcg_events+0x1b4/0x420
[ 750.352132] [ T9870] ? handle_mm_fault+0x148/0x690
[ 750.352136] [ T9870] ? _raw_spin_lock_irq+0x8a/0xf0
[ 750.352140] [ T9870] ? __kasan_check_read+0x11/0x20
[ 750.352143] [ T9870] ? fpregs_assert_state_consistent+0x68/0x100
[ 750.352146] [ T9870] ? irqentry_exit_to_user_mode+0x2e/0x250
[ 750.352151] [ T9870] ? irqentry_exit+0x43/0x50
[ 750.352154] [ T9870] ? exc_page_fault+0x75/0xe0
[ 750.352160] [ T9870] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.352163] [ T9870] RIP: 0033:0x7858c94ab6e2
[ 750.352167] [ T9870] Code: Unable to access opcode bytes at 0x7858c94ab6b8.
[ 750.352175] [ T9870] RSP: 002b:00007858c9248ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000022
[ 750.352179] [ T9870] RAX: fffffffffffffdfe RBX: 00007858c92496c0 RCX: 00007858c94ab6e2
[ 750.352182] [ T9870] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 750.352184] [ T9870] RBP: 00007858c9248d10 R08: 0000000000000000 R09: 0000000000000000
[ 750.352185] [ T9870] R10: 0000000000000000 R11: 0000000000000246 R12: fffffffffffffde0
[ 750.352187] [ T9870] R13: 0000000000000020 R14: 0000000000000002 R15: 00007ffc072d2230
[ 750.352191] [ T9870] </TASK>
[ 750.352195] [ T9870]
[ 750.395206] [ T9870] Allocated by task 9870 on cpu 0 at 750.346406s:
[ 750.395523] [ T9870] kasan_save_stack+0x39/0x70
[ 750.395532] [ T9870] kasan_save_track+0x18/0x70
[ 750.395536] [ T9870] kasan_save_alloc_info+0x37/0x60
[ 750.395539] [ T9870] __kasan_slab_alloc+0x9d/0xa0
[ 750.395543] [ T9870] kmem_cache_alloc_noprof+0x13c/0x3f0
[ 750.395548] [ T9870] mempool_alloc_slab+0x15/0x20
[ 750.395553] [ T9870] mempool_alloc_noprof+0x135/0x340
[ 750.395557] [ T9870] smbd_post_send_iter+0x63e/0x3070 [cifs]
[ 750.395694] [ T9870] smbd_send+0x58c/0x9c0 [cifs]
[ 750.395819] [ T9870] __smb_send_rqst+0x931/0xec0 [cifs]
[ 750.395950] [ T9870] smb_send_rqst+0x22e/0x2f0 [cifs]
[ 750.396081] [ T9870] cifs_call_async+0x477/0xb00 [cifs]
[ 750.396232] [ T9870] smb2_async_writev+0x15ff/0x2460 [cifs]
[ 750.396359] [ T9870] cifs_issue_write+0x256/0x610 [cifs]
[ 750.396492] [ T9870] netfs_do_issue_write+0xc2/0x340 [netfs]
[ 750.396544] [ T9870] netfs_advance_write+0x45b/0x1270 [netfs]
[ 750.396576] [ T9870] netfs_write_folio+0xd6c/0x1be0 [netfs]
[ 750.396608] [ T9870] netfs_writepages+0x2e9/0xa80 [netfs]
[ 750.396639] [ T9870] do_writepages+0x21f/0x590
[ 750.396643] [ T9870] filemap_fdatawrite_wbc+0xe1/0x140
[ 750.396647] [ T9870] __filemap_fdatawrite_range+0xba/0x100
[ 750.396651] [ T9870] filemap_write_and_wait_range+0x7d/0xf0
[ 750.396656] [ T9870] cifs_flush+0x153/0x320 [cifs]
[ 750.396787] [ T9870] filp_flush+0x107/0x1a0
[ 750.396791] [ T9870] filp_close+0x14/0x30
[ 750.396795] [ T9870] put_files_struct.part.0+0x126/0x2a0
[ 750.396800] [ T9870] exit_files+0xab/0xe0
[ 750.396803] [ T9870] do_exit+0x148f/0x2980
[ 750.396808] [ T9870] do_group_exit+0xb5/0x250
[ 750.396813] [ T9870] get_signal+0x22d3/0x22e0
[ 750.396817] [ T9870] arch_do_signal_or_restart+0x92/0x630
[ 750.396822] [ T9870] exit_to_user_mode_loop+0x98/0x170
[ 750.396827] [ T9870] do_syscall_64+0x2cf/0xd80
[ 750.396832] [ T9870] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.396836] [ T9870]
[ 750.397150] [ T9870] The buggy address belongs to the object at ffff888011082800
which belongs to the cache smbd_request_0000000008f3bd7b of size 144
[ 750.397798] [ T9870] The buggy address is located 0 bytes to the right of
allocated 144-byte region [ffff888011082800, ffff888011082890)
[ 750.398469] [ T9870]
[ 750.398800] [ T9870] The buggy address belongs to the physical page:
[ 750.399141] [ T9870] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11082
[ 750.399148] [ T9870] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
[ 750.399155] [ T9870] page_type: f5(slab)
[ 750.399161] [ T9870] raw: 000fffffc0000000 ffff888022d65640 dead000000000122 0000000000000000
[ 750.399165] [ T9870] raw: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000
[ 750.399169] [ T9870] page dumped because: kasan: bad access detected
[ 750.399172] [ T9870]
[ 750.399505] [ T9870] Memory state around the buggy address:
[ 750.399863] [ T9870] ffff888011082780: fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 750.400247] [ T9870] ffff888011082800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 750.400618] [ T9870] >ffff888011082880: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 750.400982] [ T9870]                          ^
[ 750.401370] [ T9870] ffff888011082900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 750.401774] [ T9870] ffff888011082980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 750.402171] [ T9870] ==================================================================
[ 750.402696] [ T9870] Disabling lock debugging due to kernel taint
[ 750.403202] [ T9870] BUG: unable to handle page fault for address: ffff8880110a2000
[ 750.403797] [ T9870] #PF: supervisor write access in kernel mode
[ 750.404204] [ T9870] #PF: error_code(0x0003) - permissions violation
[ 750.404581] [ T9870] PGD 5ce01067 P4D 5ce01067 PUD 5ce02067 PMD 78aa063 PTE 80000000110a2021
[ 750.404969] [ T9870] Oops: Oops: 0003 [#1] SMP KASAN PTI
[ 750.405394] [ T9870] CPU: 0 UID: 0 PID: 9870 Comm: xfs_io Kdump: loaded Tainted: G B 6.16.0-rc2-metze.02+ #1 PREEMPT(voluntary)
[ 750.406510] [ T9870] Tainted: [B]=BAD_PAGE
[ 750.406967] [ T9870] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 750.407440] [ T9870] RIP: 0010:smb_set_sge+0x15c/0x3b0 [cifs]
[ 750.408065] [ T9870] Code: 48 83 f8 ff 0f 84 b0 00 00 00 48 ba 00 00 00 00 00 fc ff df 4c 89 e1 48 c1 e9 03 80 3c 11 00 0f 85 69 01 00 00 49 8d 7c 24 08 <49> 89 04 24 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 0f
[ 750.409283] [ T9870] RSP: 0018:ffffc90005e2e758 EFLAGS: 00010246
[ 750.409803] [ T9870] RAX: ffff888036c53400 RBX: ffffc90005e2e878 RCX: 1ffff11002214400
[ 750.410323] [ T9870] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff8880110a2008
[ 750.411217] [ T9870] RBP: ffffc90005e2e798 R08: 0000000000000001 R09: 0000000000000400
[ 750.411770] [ T9870] R10: ffff888011082800 R11: 0000000000000000 R12: ffff8880110a2000
[ 750.412325] [ T9870] R13: 0000000000000000 R14: ffffc90005e2e888 R15: ffff88801a4b6000
[ 750.412901] [ T9870] FS: 0000000000000000(0000) GS:ffff88812bc68000(0000) knlGS:0000000000000000
[ 750.413477] [ T9870] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 750.414077] [ T9870] CR2: ffff8880110a2000 CR3: 000000005b0a6005 CR4: 00000000000726f0
[ 750.414654] [ T9870] Call Trace:
[ 750.415211] [ T9870] <TASK>
[ 750.415748] [ T9870] smbd_post_send_iter+0x1990/0x3070 [cifs]
[ 750.416449] [ T9870] ? __pfx_smbd_post_send_iter+0x10/0x10 [cifs]
[ 750.417128] [ T9870] ? update_stack_state+0x2a0/0x670
[ 750.417685] [ T9870] ? cifs_flush+0x153/0x320 [cifs]
[ 750.418380] [ T9870] ? cifs_flush+0x153/0x320 [cifs]
[ 750.419055] [ T9870] ? update_stack_state+0x2a0/0x670
[ 750.419624] [ T9870] smbd_send+0x58c/0x9c0 [cifs]
[ 750.420297] [ T9870] ? __pfx_smbd_send+0x10/0x10 [cifs]
[ 750.420936] [ T9870] ? unwind_get_return_address+0x65/0xb0
[ 750.421456] [ T9870] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 750.421954] [ T9870] ? arch_stack_walk+0xa7/0x100
[ 750.422460] [ T9870] ? stack_trace_save+0x92/0xd0
[ 750.422948] [ T9870] __smb_send_rqst+0x931/0xec0 [cifs]
[ 750.423579] [ T9870] ? kernel_text_address+0x173/0x190
[ 750.424056] [ T9870] ? kasan_save_stack+0x39/0x70
[ 750.424813] [ T9870] ? kasan_save_track+0x18/0x70
[ 750.425323] [ T9870] ? __kasan_slab_alloc+0x9d/0xa0
[ 750.425831] [ T9870] ? __pfx___smb_send_rqst+0x10/0x10 [cifs]
[ 750.426548] [ T9870] ? smb2_mid_entry_alloc+0xb4/0x7e0 [cifs]
[ 750.427231] [ T9870] ? cifs_call_async+0x277/0xb00 [cifs]
[ 750.427882] [ T9870] ? cifs_issue_write+0x256/0x610 [cifs]
[ 750.428909] [ T9870] ? netfs_do_issue_write+0xc2/0x340 [netfs]
[ 750.429425] [ T9870] ? netfs_advance_write+0x45b/0x1270 [netfs]
[ 750.429882] [ T9870] ? netfs_write_folio+0xd6c/0x1be0 [netfs]
[ 750.430345] [ T9870] ? netfs_writepages+0x2e9/0xa80 [netfs]
[ 750.430809] [ T9870] ? do_writepages+0x21f/0x590
[ 750.431239] [ T9870] ? filemap_fdatawrite_wbc+0xe1/0x140
[ 750.431652] [ T9870] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.432041] [ T9870] smb_send_rqst+0x22e/0x2f0 [cifs]
[ 750.432586] [ T9870] ? __pfx_smb_send_rqst+0x10/0x10 [cifs]
[ 750.433108] [ T9870] ? local_clock_noinstr+0xe/0xd0
[ 750.433482] [ T9870] ? kasan_save_alloc_info+0x37/0x60
[ 750.433855] [ T9870] ? __kasan_check_write+0x14/0x30
[ 750.434214] [ T9870] ? _raw_spin_lock+0x81/0xf0
[ 750.434561] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10
[ 750.434903] [ T9870] ? smb2_setup_async_request+0x293/0x580 [cifs]
[ 750.435394] [ T9870] cifs_call_async+0x477/0xb00 [cifs]
[ 750.435892] [ T9870] ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[ 750.436388] [ T9870] ? __pfx_cifs_call_async+0x10/0x10 [cifs]
[ 750.436881] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10
[ 750.437237] [ T9870] ? __kasan_check_write+0x14/0x30
[ 750.437579] [ T9870] ? __smb2_plain_req_init+0x933/0x1090 [cifs]
[ 750.438062] [ T9870] smb2_async_writev+0x15ff/0x2460 [cifs]
[ 750.438557] [ T9870] ? sched_clock_noinstr+0x9/0x10
[ 750.438906] [ T9870] ? local_clock_noinstr+0xe/0xd0
[ 750.439293] [ T9870] ? __pfx_smb2_async_writev+0x10/0x10 [cifs]
[ 750.439786] [ T9870] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 750.440143] [ T9870] ? _raw_spin_unlock+0xe/0x40
[ 750.440495] [ T9870] ? cifs_pick_channel+0x242/0x370 [cifs]
[ 750.440989] [ T9870] cifs_issue_write+0x256/0x610 [cifs]
[ 750.441492] [ T9870] ? cifs_issue_write+0x256/0x610 [cifs]
[ 750.441987] [ T9870] netfs_do_issue_write+0xc2/0x340 [netfs]
[ 750.442387] [ T9870] netfs_advance_write+0x45b/0x1270 [netfs]
[ 750.442969] [ T9870] ? rolling_buffer_append+0x12d/0x440 [netfs]
[ 750.443376] [ T9870] netfs_write_folio+0xd6c/0x1be0 [netfs]
[ 750.443768] [ T9870] ? __kasan_check_write+0x14/0x30
[ 750.444145] [ T9870] netfs_writepages+0x2e9/0xa80 [netfs]
[ 750.444541] [ T9870] ? __pfx_netfs_writepages+0x10/0x10 [netfs]
[ 750.444936] [ T9870] ? exit_files+0xab/0xe0
[ 750.445312] [ T9870] ? do_exit+0x148f/0x2980
[ 750.445672] [ T9870] ? do_group_exit+0xb5/0x250
[ 750.446028] [ T9870] ? arch_do_signal_or_restart+0x92/0x630
[ 750.446402] [ T9870] ? exit_to_user_mode_loop+0x98/0x170
[ 750.446762] [ T9870] ? do_syscall_64+0x2cf/0xd80
[ 750.447132] [ T9870] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.447499] [ T9870] do_writepages+0x21f/0x590
[ 750.447859] [ T9870] ? __pfx_do_writepages+0x10/0x10
[ 750.448236] [ T9870] filemap_fdatawrite_wbc+0xe1/0x140
[ 750.448595] [ T9870] __filemap_fdatawrite_range+0xba/0x100
[ 750.448953] [ T9870] ? __pfx___filemap_fdatawrite_range+0x10/0x10
[ 750.449336] [ T9870] ? __kasan_check_write+0x14/0x30
[ 750.449697] [ T9870] filemap_write_and_wait_range+0x7d/0xf0
[ 750.450062] [ T9870] cifs_flush+0x153/0x320 [cifs]
[ 750.450592] [ T9870] filp_flush+0x107/0x1a0
[ 750.450952] [ T9870] filp_close+0x14/0x30
[ 750.451322] [ T9870] put_files_struct.part.0+0x126/0x2a0
[ 750.451678] [ T9870] ? __pfx__raw_spin_lock+0x10/0x10
[ 750.452033] [ T9870] exit_files+0xab/0xe0
[ 750.452401] [ T9870] do_exit+0x148f/0x2980
[ 750.452751] [ T9870] ? __pfx_do_exit+0x10/0x10
[ 750.453109] [ T9870] ? __kasan_check_write+0x14/0x30
[ 750.453459] [ T9870] ? _raw_spin_lock_irq+0x8a/0xf0
[ 750.453787] [ T9870] do_group_exit+0xb5/0x250
[ 750.454082] [ T9870] get_signal+0x22d3/0x22e0
[ 750.454406] [ T9870] ? __pfx_get_signal+0x10/0x10
[ 750.454709] [ T9870] ? fpregs_assert_state_consistent+0x68/0x100
[ 750.455031] [ T9870] ? folio_add_lru+0xda/0x120
[ 750.455347] [ T9870] arch_do_signal_or_restart+0x92/0x630
[ 750.455656] [ T9870] ? __pfx_arch_do_signal_or_restart+0x10/0x10
[ 750.455967] [ T9870] exit_to_user_mode_loop+0x98/0x170
[ 750.456282] [ T9870] do_syscall_64+0x2cf/0xd80
[ 750.456591] [ T9870] ? __kasan_check_read+0x11/0x20
[ 750.456897] [ T9870] ? count_memcg_events+0x1b4/0x420
[ 750.457280] [ T9870] ? handle_mm_fault+0x148/0x690
[ 750.457616] [ T9870] ? _raw_spin_lock_irq+0x8a/0xf0
[ 750.457925] [ T9870] ? __kasan_check_read+0x11/0x20
[ 750.458297] [ T9870] ? fpregs_assert_state_consistent+0x68/0x100
[ 750.458672] [ T9870] ? irqentry_exit_to_user_mode+0x2e/0x250
[ 750.459191] [ T9870] ? irqentry_exit+0x43/0x50
[ 750.459600] [ T9870] ? exc_page_fault+0x75/0xe0
[ 750.460130] [ T9870] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.460570] [ T9870] RIP: 0033:0x7858c94ab6e2
[ 750.461206] [ T9870] Code: Unable to access opcode bytes at 0x7858c94ab6b8.
[ 750.461780] [ T9870] RSP: 002b:00007858c9248ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000022
[ 750.462327] [ T9870] RAX: fffffffffffffdfe RBX: 00007858c92496c0 RCX: 00007858c94ab6e2
[ 750.462653] [ T9870] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 750.462969] [ T9870] RBP: 00007858c9248d10 R08: 0000000000000000 R09: 0000000000000000
[ 750.463290] [ T9870] R10: 0000000000000000 R11: 0000000000000246 R12: fffffffffffffde0
[ 750.463640] [ T9870] R13: 0000000000000020 R14: 0000000000000002 R15: 00007ffc072d2230
[ 750.463965] [ T9870] </TASK>
[ 750.464285] [ T9870] Modules linked in: siw ib_uverbs ccm cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs softdog vboxsf vboxguest cpuid intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_vsec polyval_clmulni ghash_clmulni_intel sha1_ssse3 aesni_intel rapl i2c_piix4 i2c_smbus joydev input_leds mac_hid sunrpc binfmt_misc kvm_intel kvm irqbypass sch_fq_codel efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci dmi_sysfs ip_tables x_tables autofs4 hid_generic vboxvideo usbhid drm_vram_helper psmouse vga16fb vgastate drm_ttm_helper serio_raw hid ahci libahci ttm pata_acpi video wmi [last unloaded: vboxguest]
[ 750.467127] [ T9870] CR2: ffff8880110a2000
cc: Tom Talpey <tom@talpey.com>
cc: linux-cifs@vger.kernel.org
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Fixes: c45ebd636c32 ("cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
After fabc4ed200f9, server_unresponsive() checks server->lstrp to decide
whether the client needs to reconnect. When the client fails to reconnect
for some time and aborts the connection, server->lstrp is updated for the
last time. In the scenario below, server->lstrp is therefore stale, which
causes the next command to fail during re-negotiation instead of waiting
for re-negotiation to complete.
1. mount -t cifs -o username=Everyone,echo_interval=10 //$server_ip/export /mnt
2. ssh $server_ip "echo b > /proc/sysrq-trigger &"
3. ls /mnt
4. sleep 21s
5. ssh $server_ip "service firewalld stop"
6. ls # returns EHOSTDOWN
If the interval between steps 5 and 6 is small enough, step 6 may trigger
sending a negotiate request. Before the background cifsd thread tries to
receive the negotiate response from the server in cifs_readv_from_socket(),
server_unresponsive() may trigger cifs_reconnect(), which causes step 6 to
fail:
ls thread
----------------
smb2_negotiate
  server->tcpStatus = CifsInNegotiate
  compound_send_recv
    wait_for_compound_request

cifsd thread
----------------
cifs_readv_from_socket
  server_unresponsive
    server->tcpStatus == CifsInNegotiate && jiffies > server->lstrp + 20s
    cifs_reconnect
      cifs_abort_connection: mid_state = MID_RETRY_NEEDED

ls thread
----------------
cifs_sync_mid_result return EAGAIN
smb2_negotiate return EHOSTDOWN
Although server->lstrp records the time of the last server response, it is
also updated in cifs_abort_connection() and cifs_get_tcp_session(). Likewise,
update server->lstrp before switching into the CifsInNegotiate state to avoid
the failure in step 6.
Fixes: 7ccc1465465d ("smb: client: fix hang in wait_for_response() for negproto")
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Acked-by: Meetakshi Setiya <msetiya@microsoft.com>
Signed-off-by: zhangjian <zhangjian496@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This can happen when an inode is split into segments for multi-threaded
compression, since the tail extent of a segment can also be small.
Fixes: 1d191b4ca51d ("erofs: implement encoded extent metadata")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250620153108.1368029-1-hsiangkao@linux.alibaba.com
|