Age | Commit message (Collapse) | Author |
|
This fixes two deadlocks:
1.pcpu_alloc_mutex involved one as pointed by syzbot[1]
2.recursion deadlock.
The root cause is that we hold the bc lock during alloc_percpu, fix it
by following the pattern used by __btree_node_mem_alloc().
[1] https://lore.kernel.org/all/66f97d9a.050a0220.6bad9.001d.GAE@google.com/T/
Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The transaction is going to abort, so there will be no cycle involving
this transaction anymore.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When the cycle doesn't involve the initiator of the cycle detection,
we might choose a transaction that is not involved in the cycle to abort.
It shouldn't be that since it won't break the cycle, this patch
therefore chooses the transaction in the cycle to abort.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This patch introduces a helper function called lock_graph_pop_from,
it pops the graph from i.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If the transaction chose itself as a victim before and restarted, it
might request a no fail lock request this time. But it might be added to
others' lock graph and be chose as the victim again, it's no longer safe
without additional check. We can also convert the cycle detector to be
fully RCU-based to solve that unsoundness, but the latency added to trans_put
and additional memory required may not worth it.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This reverts commit 62448afee714354a26db8a0f3c644f58628f0792.
six_lock_tryupgrade fails only if there is an intent lock held,
it won't fail no matter how many read locks are held.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
New helper for dropping all write locks; which is distinct from the
helper the transaction commit path uses, which is faster and only
touches updates.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If the btree_path's lock seq is wrong, the next bch2_trans_relock()
operation is guaranteed to fail and we take an unnecessary transaction
restart.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
fix some spurious lockdep splats
Reported-by: syzbot+e088be3c2d5c05aaac35@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We no longer track individual btree node locks with lockdep, so this
will never be enabled.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new helper to disable lockdep tracking entirely for a given class.
This is needed for bcachefs, which takes too many btree node locks for
lockdep to track. Instead, we have a single lockdep_map for "btree_trans
has any btree nodes locked", which makes more since given that we have
centralized lock management and a cycle detector.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
proper lock ordering is: fs_reclaim -> btree node locks
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a field for tracking whether a transaction object holds btree locks,
and assertions to verify state.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In the key cache fill path, we use path_upgrade() on a path that isn't
uptodate yet but should be locked.
This change makes bch2_btree_path_upgrade() slightly looser so we can
use it in key cache upgrade, instead of the __ version.
Also, make the related assert - that path->uptodate implies nodes_locked
- slightly clearer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Factor out slowpath into a separate helper
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Long form version of bch2_btree_path_to_text() - useful in error
messages and tracepoints.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
dropping read locks in bch2_btree_node_lock_write_nofail() dates from
before we had the cycle detector; we can now tell the cycle detector
directly when taking a lock may not fail because we can't handle
transaction restarts.
This is needed for adding should_be_locked asserts.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If a path doesn't have any active references, we shouldn't downgrade it;
it'll either be reused, possibly with intent refs again, or dropped at
bch2_trans_begin() time.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fixes: e6a2566f7a00 ("bcachefs: Better journal tracepoints")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: smatch
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We now include backtraces for every thread involved in the cycle.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- Some tweaks to greatly reduce locking overhead for the list of btree
transactions, so that it can always be enabled: leave btree_trans
objects on the list when they're on the percpu single item freelist,
and only check for duplicates in the same process when
CONFIG_BCACHEFS_DEBUG is enabled
- don't zero out the full btree_trans() unless we allocated it from
the mempool
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Upcoming patches are going to be changing trans->paths to a
reallocatable buffer. We need to guard against use after free when it's
used by other threads; this introduces RCU protection to those paths and
changes them to check for trans->paths == NULL
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
path->idx is now a code smell: we should be using path_idx_t, since it's
stable across btree path reallocation.
This is also a bit faster, using the same loop counter vs. fetching
path->idx from each path we iterate over.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
These were for extra info in tracepoints for debugging a specialized
issue - we do not want to bloat btree_path for this, at least in release
builds.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In the CI, we're seeing tests failing due to excessive would_deadlock
transaction restarts - the tracepoint now includes the lock cycle that
occured.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The SRCU read lock that btree_trans takes exists to make it safe for
bch2_trans_relock() to deref pointers to btree nodes/key cache items we
don't have locked, but as a side effect it blocks reclaim from freeing
those items.
Thus, it's important to not hold it for too long: we need to
differentiate between bch2_trans_unlock() calls that will be only for a
short duration, and ones that will be for an unbounded duration.
This introduces bch2_trans_unlock_long(), to be used mainly by the data
move paths.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We should only be downgrading locks on success - otherwise, our
transaction restarts won't be getting the correct locks and we'll
livelock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
clang had a few more warnings about enum conversion, and also didn't
like the opts.c initializer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- endianness fixes
- mark some things static
- fix a few __percpu annotations
- fix silent enum conversions
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes a deadlock:
01305 WARNING: possible circular locking dependency detected
01305 6.3.0-ktest-gf4de9bee61af #5305 Tainted: G W
01305 ------------------------------------------------------
01305 cat/14658 is trying to acquire lock:
01305 ffffffc00982f460 (fs_reclaim){+.+.}-{0:0}, at: __kmem_cache_alloc_node+0x48/0x278
01305
01305 but task is already holding lock:
01305 ffffff8011aaf040 (&lock->wait_lock){+.+.}-{2:2}, at: bch2_check_for_deadlock+0x4b8/0xa58
01305
01305 which lock already depends on the new lock.
01305
01305
01305 the existing dependency chain (in reverse order) is:
01305
01305 -> #2 (&lock->wait_lock){+.+.}-{2:2}:
01305 _raw_spin_lock+0x54/0x70
01305 __six_lock_wakeup+0x40/0x1b0
01305 six_unlock_ip+0xe8/0x248
01305 bch2_btree_key_cache_scan+0x720/0x940
01305 shrink_slab.constprop.0+0x284/0x770
01305 shrink_node+0x390/0x828
01305 balance_pgdat+0x390/0x6d0
01305 kswapd+0x2e4/0x718
01305 kthread+0x184/0x1a8
01305 ret_from_fork+0x10/0x20
01305
01305 -> #1 (&c->lock#2){+.+.}-{3:3}:
01305 __mutex_lock+0x104/0x14a0
01305 mutex_lock_nested+0x30/0x40
01305 bch2_btree_key_cache_scan+0x5c/0x940
01305 shrink_slab.constprop.0+0x284/0x770
01305 shrink_node+0x390/0x828
01305 balance_pgdat+0x390/0x6d0
01305 kswapd+0x2e4/0x718
01305 kthread+0x184/0x1a8
01305 ret_from_fork+0x10/0x20
01305
01305 -> #0 (fs_reclaim){+.+.}-{0:0}:
01305 __lock_acquire+0x19d0/0x2930
01305 lock_acquire+0x1dc/0x458
01305 fs_reclaim_acquire+0x9c/0xe0
01305 __kmem_cache_alloc_node+0x48/0x278
01305 __kmalloc_node_track_caller+0x5c/0x278
01305 krealloc+0x94/0x180
01305 bch2_printbuf_make_room.part.0+0xac/0x118
01305 bch2_prt_printf+0x150/0x1e8
01305 bch2_btree_bkey_cached_common_to_text+0x170/0x298
01305 bch2_btree_trans_to_text+0x244/0x348
01305 print_cycle+0x7c/0xb0
01305 break_cycle+0x254/0x528
01305 bch2_check_for_deadlock+0x59c/0xa58
01305 bch2_btree_deadlock_read+0x174/0x200
01305 full_proxy_read+0x94/0xf0
01305 vfs_read+0x15c/0x3a8
01305 ksys_read+0xb8/0x148
01305 __arm64_sys_read+0x48/0x60
01305 invoke_syscall.constprop.0+0x64/0x138
01305 do_el0_svc+0x84/0x138
01305 el0_svc+0x34/0x80
01305 el0t_64_sync_handler+0xb0/0xb8
01305 el0t_64_sync+0x14c/0x150
01305
01305 other info that might help us debug this:
01305
01305 Chain exists of:
01305 fs_reclaim --> &c->lock#2 --> &lock->wait_lock
01305
01305 Possible unsafe locking scenario:
01305
01305 CPU0 CPU1
01305 ---- ----
01305 lock(&lock->wait_lock);
01305 lock(&c->lock#2);
01305 lock(&lock->wait_lock);
01305 lock(fs_reclaim);
01305
01305 *** DEADLOCK ***
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes a spurious assert in the btree node read path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new helper for the common pattern of:
- trans_unlock()
- do something
- trans_relock()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_btree_trans_to_text() is used on btree_trans objects that are owned
by different threads - when printing out deadlock cycles - so we need a
safe version of trans_for_each_path(), else we race with seeing a
btree_path that was just allocated and not fully initialized:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
six_lock_pcpu_alloc() is an unsafe interface: it's not safe to allocate
or free the percpu reader count on an existing lock that's in use, the
only safe time to allocate percpu readers is when the lock is first
being initialized.
This patch adds a flags parameter to six_lock_init(), and instead of
six_lock_pcpu_free() we now expose six_lock_exit(), which does the same
thing but is less likely to be misused.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This moves a helper out of the bcachefs code that shouldn't have been
there, since it touches six lock internals.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes some confusion in the lockdep code due to initializing btree
node/key cache locks with the same lockdep key, but different names.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a new helper, bch2_trans_mutex_lock(), for locking a mutex -
dropping and retaking btree locks as needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we unlock in order to submit IO, the next relock event is likely to
fail if submit_bio() blocked - we shouldn't those events in our _fail
stats, since those are expected events and shouldn't cause test
failures.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This uses the new _ip() interface to six locks and hooks it up to
btree_path->ip_allocated, when available.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- Marking a non-static function as inline doesn't actually work and is
now causing problems - drop that
- Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages
with bcachefs (filesystem name)
- Userspace doesn't have real percpu variables (maybe we can get this
fixed someday), put an #ifdef around bch2_disk_reservation_add()
fastpath
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|