diff options
| author | Dave Chinner <dchinner@redhat.com> | 2024-11-13 11:11:40 +0100 | 
|---|---|---|
| committer | Carlos Maiolino <cem@kernel.org> | 2024-11-22 11:24:51 +0100 | 
| commit | a8581099604dfa609a34a3fac8ef5af0d300d2c1 (patch) | |
| tree | 76979bdb525900f50cf28f62c94f59bb68d93aad /scripts/bpf_doc.py | |
| parent | c9c293240e4351aa2678186cd88a08141fc6ce9e (diff) | |
xfs: prevent mount and log shutdown race
I recently had an fstests hang where there were two internal tasks
stuck like so:
[ 6559.010870] task:kworker/24:45   state:D stack:12152 pid:631308 tgid:631308 ppid:2      flags:0x00004000
[ 6559.016984] Workqueue: xfs-buf/dm-2 xfs_buf_ioend_work
[ 6559.020349] Call Trace:
[ 6559.022002]  <TASK>
[ 6559.023426]  __schedule+0x650/0xb10
[ 6559.025734]  schedule+0x6d/0xf0
[ 6559.027835]  schedule_timeout+0x31/0x180
[ 6559.030582]  wait_for_common+0x10c/0x1e0
[ 6559.033495]  wait_for_completion+0x1d/0x30
[ 6559.036463]  __flush_workqueue+0xeb/0x490
[ 6559.039479]  ? mempool_alloc_slab+0x15/0x20
[ 6559.042537]  xlog_cil_force_seq+0xa1/0x2f0
[ 6559.045498]  ? bio_alloc_bioset+0x1d8/0x510
[ 6559.048578]  ? submit_bio_noacct+0x2f2/0x380
[ 6559.051665]  ? xlog_force_shutdown+0x3b/0x170
[ 6559.054819]  xfs_log_force+0x77/0x230
[ 6559.057455]  xlog_force_shutdown+0x3b/0x170
[ 6559.060507]  xfs_do_force_shutdown+0xd4/0x200
[ 6559.063798]  ? xfs_buf_rele+0x1bd/0x580
[ 6559.066541]  xfs_buf_ioend_handle_error+0x163/0x2e0
[ 6559.070099]  xfs_buf_ioend+0x61/0x200
[ 6559.072728]  xfs_buf_ioend_work+0x15/0x20
[ 6559.075706]  process_scheduled_works+0x1d4/0x400
[ 6559.078814]  worker_thread+0x234/0x2e0
[ 6559.081300]  kthread+0x147/0x170
[ 6559.083462]  ? __pfx_worker_thread+0x10/0x10
[ 6559.086295]  ? __pfx_kthread+0x10/0x10
[ 6559.088771]  ret_from_fork+0x3e/0x50
[ 6559.091153]  ? __pfx_kthread+0x10/0x10
[ 6559.093624]  ret_from_fork_asm+0x1a/0x30
[ 6559.096227]  </TASK>
[ 6559.109304] Workqueue: xfs-cil/dm-2 xlog_cil_push_work
[ 6559.112673] Call Trace:
[ 6559.114333]  <TASK>
[ 6559.115760]  __schedule+0x650/0xb10
[ 6559.118084]  schedule+0x6d/0xf0
[ 6559.120175]  schedule_timeout+0x31/0x180
[ 6559.122776]  ? call_rcu+0xee/0x2f0
[ 6559.125034]  __down_common+0xbe/0x1f0
[ 6559.127470]  __down+0x1d/0x30
[ 6559.129458]  down+0x48/0x50
[ 6559.131343]  ? xfs_buf_item_unpin+0x8d/0x380
[ 6559.134213]  xfs_buf_lock+0x3d/0xe0
[ 6559.136544]  xfs_buf_item_unpin+0x8d/0x380
[ 6559.139253]  xlog_cil_committed+0x287/0x520
[ 6559.142019]  ? sched_clock+0x10/0x30
[ 6559.144384]  ? sched_clock_cpu+0x10/0x190
[ 6559.147039]  ? psi_group_change+0x48/0x310
[ 6559.149735]  ? _raw_spin_unlock+0xe/0x30
[ 6559.152340]  ? finish_task_switch+0xbc/0x310
[ 6559.155163]  xlog_cil_process_committed+0x6d/0x90
[ 6559.158265]  xlog_state_shutdown_callbacks+0x53/0x110
[ 6559.161564]  ? xlog_cil_push_work+0xa70/0xaf0
[ 6559.164441]  xlog_state_release_iclog+0xba/0x1b0
[ 6559.167483]  xlog_cil_push_work+0xa70/0xaf0
[ 6559.170260]  process_scheduled_works+0x1d4/0x400
[ 6559.173286]  worker_thread+0x234/0x2e0
[ 6559.175779]  kthread+0x147/0x170
[ 6559.177933]  ? __pfx_worker_thread+0x10/0x10
[ 6559.180748]  ? __pfx_kthread+0x10/0x10
[ 6559.183231]  ret_from_fork+0x3e/0x50
[ 6559.185601]  ? __pfx_kthread+0x10/0x10
[ 6559.188092]  ret_from_fork_asm+0x1a/0x30
[ 6559.190692]  </TASK>
This is an ABBA deadlock where buffer IO completion is triggering a
forced shutdown with the buffer lock held. It is waiting for the CIL
to flush as part of the log force. The CIL flush is blocked doing
shutdown processing of all it's objects, trying to unpin a buffer
item. That requires taking the buffer lock....
For the CIL to be doing shutdown processing, the log must be marked
with XLOG_IO_ERROR, but that doesn't happen until after the log
force is issued. Hence for xfs_do_force_shutdown() to be forcing
the log on a shut down log, we must have had a racing
xlog_force_shutdown and xfs_force_shutdown like so:
p0			p1			CIL push
   			<holds buffer lock>
xlog_force_shutdown
  xfs_log_force
   test_and_set_bit(XLOG_IO_ERROR)
   						xlog_state_release_iclog()
						  sees XLOG_IO_ERROR
						  xlog_state_shutdown_callbacks
						    ....
						    xfs_buf_item_unpin
						    xfs_buf_lock
						    <blocks on buffer p1 holds>
   			xfs_force_shutdown
			  xfs_set_shutdown(mp) wins
			    xlog_force_shutdown
			      xfs_log_force
			        <blocks on CIL push>
  xfs_set_shutdown(mp) fails
  <shuts down rest of log>
The deadlock can be mitigated by avoiding the log force on the
second pass through xlog_force_shutdown. Do this by adding another
atomic state bit (XLOG_OP_PENDING_SHUTDOWN) that is set on entry to
xlog_force_shutdown() but doesn't mark the log as shutdown.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Diffstat (limited to 'scripts/bpf_doc.py')
0 files changed, 0 insertions, 0 deletions
