author      Leo Martins <loemra.dev@gmail.com>       2025-07-21 10:49:16 -0700
committer   David Sterba <dsterba@suse.com>          2025-08-07 17:07:15 +0200
commit      ad580dfa388fabb52af033e3f8cc5d04be985e54
tree        c38912c063f2613005a47df74aec905efbe917a9
parent      0a32e4f0025a74c70dcab4478e9b29c22f5ecf2f
btrfs: fix subpage deadlock in try_release_subpage_extent_buffer()
There is a potential deadlock in try_release_subpage_extent_buffer():
the irq-safe xarray spin lock fs_info->buffer_tree is acquired before
the irq-unsafe eb->refs_lock.

This leads to the following potential race:
// T1 (random eb->refs user)        // T2 (release folio)

spin_lock(&eb->refs_lock);
// interrupt
end_bbio_meta_write()
  btrfs_meta_folio_clear_writeback()
                                    btree_release_folio()
                                      folio_test_writeback() // false
                                      try_release_extent_buffer()
                                        try_release_subpage_extent_buffer()
                                          xa_lock_irq(&fs_info->buffer_tree)
                                          spin_lock(&eb->refs_lock); // blocked; held by T1
    buffer_tree_clear_mark()
      xas_lock_irqsave() // blocked; held by T2
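In code form, the inversion is that the release path nests the
irq-unsafe lock inside the irq-safe one. A minimal sketch of the
pre-fix shape, using the names from the trace above (simplified, not
the verbatim kernel source):

	xa_lock_irq(&fs_info->buffer_tree);       /* irq-safe xarray lock */
	xa_for_each_range(&fs_info->buffer_tree, index, eb, start, end) {
		spin_lock(&eb->refs_lock);        /* irq-unsafe lock, nested inside */
		/* ... check the ref count and maybe release the eb ... */
		spin_unlock(&eb->refs_lock);
	}
	xa_unlock_irq(&fs_info->buffer_tree);

Because buffer_tree_clear_mark() takes the same xarray lock with
xas_lock_irqsave() from the writeback completion path, the xarray lock
is HARDIRQ-safe, and nesting the HARDIRQ-unsafe eb->refs_lock under it
creates exactly the dependency lockdep reports below.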
I believe the spin lock can safely be replaced by an rcu_read_lock.
The xa_for_each loop does not need the spin lock, as it is already
internally protected by the rcu_read_lock. The extent buffer is also
protected by the rcu_read_lock, so it won't be freed before we take
the eb->refs_lock and check the ref count.

The rcu_read_lock is taken and released on every iteration, just like
the spin lock, which means we are not protected against concurrent
insertions into the xarray. This is fine, because we rely on
folio->private to detect whether there are any ebs remaining in the
folio.

There is already some precedent for this in find_extent_buffer_nolock(),
which loads an extent buffer from the xarray with only the rcu_read_lock
held.
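Putting that together, a minimal sketch of what the reworked loop
could look like under the assumptions above (simplified from, and not
necessarily identical to, the actual change in fs/btrfs/extent_io.c):

	rcu_read_lock();
	xa_for_each_range(&fs_info->buffer_tree, index, eb, start, end) {
		/* The eb cannot be freed while we are under RCU, so it is
		 * safe to take refs_lock without the xarray lock held. */
		spin_lock(&eb->refs_lock);
		rcu_read_unlock();

		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb)) {
			spin_unlock(&eb->refs_lock);
			rcu_read_lock();
			continue;
		}

		/* release_extent_buffer() drops eb->refs_lock for us. */
		release_extent_buffer(eb);
		rcu_read_lock();
	}
	rcu_read_unlock();

Success is judged afterwards by whether folio->private has been
cleared, so racing with concurrent xarray insertions is harmless.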
lockdep warning:
=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.16.0-0_fbk701_debug_rc0_123_g4c06e63b9203 #1 Tainted: G E N
-----------------------------------------------------
kswapd0/66 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff000011ffd600 (&eb->refs_lock){+.+.}-{3:3}, at: try_release_extent_buffer+0x18c/0x560
and this task is already holding:
ffff0000c1d91b88 (&buffer_xa_class){-.-.}-{3:3}, at: try_release_extent_buffer+0x13c/0x560
which would create a new lock dependency:
(&buffer_xa_class){-.-.}-{3:3} -> (&eb->refs_lock){+.+.}-{3:3}
but this new dependency connects a HARDIRQ-irq-safe lock:
(&buffer_xa_class){-.-.}-{3:3}
... which became HARDIRQ-irq-safe at:
lock_acquire+0x178/0x358
_raw_spin_lock_irqsave+0x60/0x88
buffer_tree_clear_mark+0xc4/0x160
end_bbio_meta_write+0x238/0x398
btrfs_bio_end_io+0x1f8/0x330
btrfs_orig_write_end_io+0x1c4/0x2c0
bio_endio+0x63c/0x678
blk_update_request+0x1c4/0xa00
blk_mq_end_request+0x54/0x88
virtblk_request_done+0x124/0x1d0
blk_mq_complete_request+0x84/0xa0
virtblk_done+0x130/0x238
vring_interrupt+0x130/0x288
__handle_irq_event_percpu+0x1e8/0x708
handle_irq_event+0x98/0x1b0
handle_fasteoi_irq+0x264/0x7c0
generic_handle_domain_irq+0xa4/0x108
gic_handle_irq+0x7c/0x1a0
do_interrupt_handler+0xe4/0x148
el1_interrupt+0x30/0x50
el1h_64_irq_handler+0x14/0x20
el1h_64_irq+0x6c/0x70
_raw_spin_unlock_irq+0x38/0x70
__run_timer_base+0xdc/0x5e0
run_timer_softirq+0xa0/0x138
handle_softirqs.llvm.13542289750107964195+0x32c/0xbd0
____do_softirq.llvm.17674514681856217165+0x18/0x28
call_on_irq_stack+0x24/0x30
__irq_exit_rcu+0x164/0x430
irq_exit_rcu+0x18/0x88
el1_interrupt+0x34/0x50
el1h_64_irq_handler+0x14/0x20
el1h_64_irq+0x6c/0x70
arch_local_irq_enable+0x4/0x8
do_idle+0x1a0/0x3b8
cpu_startup_entry+0x60/0x80
rest_init+0x204/0x228
start_kernel+0x394/0x3f0
__primary_switched+0x8c/0x8958
to a HARDIRQ-irq-unsafe lock:
(&eb->refs_lock){+.+.}-{3:3}
... which became HARDIRQ-irq-unsafe at:
...
lock_acquire+0x178/0x358
_raw_spin_lock+0x4c/0x68
free_extent_buffer_stale+0x2c/0x170
btrfs_read_sys_array+0x1b0/0x338
open_ctree+0xeb0/0x1df8
btrfs_get_tree+0xb60/0x1110
vfs_get_tree+0x8c/0x250
fc_mount+0x20/0x98
btrfs_get_tree+0x4a4/0x1110
vfs_get_tree+0x8c/0x250
do_new_mount+0x1e0/0x6c0
path_mount+0x4ec/0xa58
__arm64_sys_mount+0x370/0x490
invoke_syscall+0x6c/0x208
el0_svc_common+0x14c/0x1b8
do_el0_svc+0x4c/0x60
el0_svc+0x4c/0x160
el0t_64_sync_handler+0x70/0x100
el0t_64_sync+0x168/0x170
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&eb->refs_lock);
                               local_irq_disable();
                               lock(&buffer_xa_class);
                               lock(&eb->refs_lock);
  <Interrupt>
    lock(&buffer_xa_class);

 *** DEADLOCK ***
2 locks held by kswapd0/66:
#0: ffff800085506e40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe8/0xe50
#1: ffff0000c1d91b88 (&buffer_xa_class){-.-.}-{3:3}, at: try_release_extent_buffer+0x13c/0x560
Link: https://www.kernel.org/doc/Documentation/locking/lockdep-design.rst#:~:text=Multi%2Dlock%20dependency%20rules%3A
Fixes: 19d7f65f032f ("btrfs: convert the buffer_radix to an xarray")
CC: stable@vger.kernel.org # 6.16+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>