summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-22bcachefs: Run bch2_fs_counters_init() earlierKent Overstreet
We need counters to be initialized before initializing shrinkers - the shrinker callbacks will update those counters. This fixes a segfault in userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: btree_err() now uses bch2_print_string_as_lines()Kent Overstreet
We've seen long error messages get truncated here, so convert to the new bch2_print_string_as_lines(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve bch2_fsck_err()Kent Overstreet
- factor out fsck_err_get() - if the "bcachefs (%s):" prefix has already been applied, don't duplicate it - convert to printbufs instead of static char arrays - tidy up control flow a bit - use bch2_print_string_as_lines(), to avoid messages getting truncated Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_print_string_as_lines()Kent Overstreet
This adds a helper for printing a large buffer one line at a time, to avoid the 1k printk limit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_btree_node_relock_notrace()Kent Overstreet
Most of the node_relock_fail trace events are generated from bch2_btree_path_verify_level(), when debugcheck_iterators is enabled - but we're not interested in these trace events, they don't indicate that we're in a slowpath. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_btree_cache_scan() improvementKent Overstreet
We're still seeing OOM issues caused by the btree node cache shrinker not sufficiently freeing memory: thus, this patch changes the shrinker to not exit if __GFP_FS was not supplied. Instead, tweak btree node memory allocation so that we never invoke memory reclaim while holding the btree node cache lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix blocking with locks heldKent Overstreet
This is a major oopsy - we should always be unlocking before calling closure_sync(), else we'll cause a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: btree_update_nodes_written() needs BTREE_INSERT_USE_RESERVEKent Overstreet
This fixes an obvious deadlock - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix error handling in bch2_btree_update_start()Kent Overstreet
We were checking for -EAGAIN, but we're not returned that when we didn't pass a closure to wait with - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve bch2_btree_trans_to_text()Kent Overstreet
This is just a formatting/readability improvement. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill normalize_read_intent_locks()Kent Overstreet
Before we had the deadlock cycle detector, we didn't want to be holding read locks when taking intent locks, because blocking on an intent lock while holding a read lock was a lock ordering violation that could cause a deadlock. With the cycle detector this is no longer an issue, so this code can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Ensure bch2_btree_node_lock_write_nofail() never failsKent Overstreet
In order for bch2_btree_node_lock_write_nofail() to never produce a deadlock, we must ensure we're never holding read locks when using it. Fortunately, it's only used from code paths where any read locks may be safely dropped. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Delete old deadlock avoidance codeKent Overstreet
This deletes our old lock ordering based deadlock avoidance code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Print deadlock cycle in debugfsKent Overstreet
In the event that we're not finished debugging the cycle detector, this adds a new file to debugfs that shows what the cycle detector finds, if anything. By comparing this with btree_transactions, which shows held locks for every btree_transaction, we'll be able to determine if it's the cycle detector that's buggy or something else. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Deadlock cycle detectorKent Overstreet
We've outgrown our own deadlock avoidance strategy. The btree iterator API provides an interface where the user doesn't need to concern themselves with lock ordering - different btree iterators can be traversed in any order. Without special care, this will lead to deadlocks. Our previous strategy was to define a lock ordering internally, and whenever we attempt to take a lock and trylock() fails, we'd check if the current btree transaction is holding any locks that cause a lock ordering violation. If so, we'd issue a transaction restart, and then bch2_trans_begin() would re-traverse all previously used iterators, but in the correct order. That approach had some issues, though. - Sometimes we'd issue transaction restarts unnecessarily, when no deadlock would have actually occured. Lock ordering restarts have become our primary cause of transaction restarts, on some workloads totally 20% of actual transaction commits. - To avoid deadlock or livelock, we'd often have to take intent locks when we only wanted a read lock: with the lock ordering approach, it is actually illegal to hold _any_ read lock while blocking on an intent lock, and this has been causing us unnecessary lock contention. - It was getting fragile - the various lock ordering rules are not trivial, and we'd been seeing occasional livelock issues related to this machinery. So, since bcachefs is already a relational database masquerading as a filesystem, we're stealing the next traditional database technique and switching to a cycle detector for avoiding deadlocks. When we block taking a btree lock, after adding ourself to the waitlist but before sleeping, we do a DFS of btree transactions waiting on other btree transactions, starting with the current transaction and walking our held locks, and transactions blocking on our held locks. If we find a cycle, we emit a transaction restart. Occasionally (e.g. the btree split path) we can not allow the lock() operation to fail, so if necessary we'll tell another transaction that it has to fail. Result: trans_restart_would_deadlock events are reduced by a factor of 10 to 100, and we'll be able to delete a whole bunch of grotty, fragile code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix bch2_btree_node_upgrade()Kent Overstreet
Previously, if we were trying to upgrade from a read to an intent lock but we held an additional read lock via another btree_path, bch2_btree_node_upgrade() would always fail, in six_lock_tryupgrade(). This patch factors out the code that __bch2_btree_node_lock_write() uses to temporarily drop extra read locks, so that six_lock_tryupgrade() can succeed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add a debug assertKent Overstreet
Chasing down a strange locking bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Wakeup now takes lock on behalf of waiterKent Overstreet
This brings back an important optimization, to avoid touching the wait lists an extra time, while preserving the property that a thread is on a lock waitlist iff it is waiting - it is never removed from the waitlist until it has the lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Fix a lost wakeupKent Overstreet
There was a lost wakeup between a read unlock in percpu mode and a write lock. The unlock path unlocks, then executes a barrier, then checks for waiters; correspondingly, the lock side should set the wait bit and execute a barrier, then attempt to take the lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Enable lockdepKent Overstreet
Now that we have lockdep_set_no_check_recursion(), we can enable lockdep checking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Add start_time to six_lock_waiterKent Overstreet
This is needed by the cycle detector in bcachefs - we need a way to iterater over waitlist entries while dropping and retaking the waitlist lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: six_lock_waiter()Kent Overstreet
This allows passing in the wait list entry - to be used for a deadlock cycle detector. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Simplify wait listsKent Overstreet
This switches to a single list of waiters, instead of separate lists for read and intent, and switches write locks to also use the wait lists instead of being handled differently. Also, removal from the wait list is now done by the process waiting on the lock, not the process doing the wakeup. This is needed for the new deadlock cycle detector - we need tasks to stay on the waitlist until they've successfully acquired the lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add private error codes for ENOSPCKent Overstreet
Continuing the saga of introducing private dedicated error codes for each error path, this patch converts ENOSPC to error codes that are subtypes of ENOSPC. We've recently had a test failure where we got -ENOSPC where we shouldn't have, and didn't have enough information to tell where it came from, so this patch will solve that problem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Errcodes can now subtype standard error codesKent Overstreet
The next patch is going to be adding private error codes for all the places we return -ENOSPC. Additionally, this patch updates return paths at all module boundaries to call bch2_err_class(), to return the standard error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make an assertion more informativeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: All held locks must be in a btree pathKent Overstreet
With the new deadlock cycle detector, it's critical that all held locks be marked in a btree_path, because that's what the cycle detector traverses - any locks that aren't correctly marked will cause deadlocks. This changes the btree_path to allocate some btree_paths for the new nodes, since until the final update is done we otherwise don't have a path referencing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_btree_path_upgrade() now emits transaction restartKent Overstreet
Centralizing the transaction restart/tracepoint in bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits old and new locks_want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a manual trigger for lock wakeupsKent Overstreet
Spotted a lockup once that appeared to be a lost wakeup. Adding a manual trigger for lock wakeups will make it easy to tell if that's what it is next time it occurs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix sb_field_counters formattingKent Overstreet
We have counters with longer names now, so adjust the tabstop - also, make sure there's always a space printed between the name and the number. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Re-enable hash_redo_key()Kent Overstreet
When subvolumes & snapshots were rolled out, hash_redo_key() was disabled due to some new complications - namely, bch2_hash_set() works at the subvolume level, and fsck does not run in a defined subvolume, instead working at the snapshot ID level. This patch splits out bch2_hash_set_snapshot() from bch2_hash_set(), and makes one small tweak for fsck: - Normally, bch2_hash_set() (and other dirent code) needs to know what subvolume we're in, because dirents that point to other subvolumes should only be visible in the subvolume they were created in, not other snapshots. We can't check that in fsck, so we just assume that all dirents are visible. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill journal_keys->journal_seq_baseKent Overstreet
This removes an optimization that didn't actually save us any memory, due to alignment, but did make the code more complicated than it needed to be. We were also seeing a bug where journal_seq_base wasn't getting correctly initailized, so hopefully it'll fix that too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix redundant transaction restartKent Overstreet
Little bit of tidying up, this makes the counters a little bit clearer as to what's happening. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Ensure intent locks are marked before taking write locksKent Overstreet
Locks must be correctly marked for the cycle detector to work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Avoid using btree_node_lock_nopath()Kent Overstreet
With the upcoming cycle detector, we have to be careful about using btree_node_lock_nopath - in particular, using it to take write locks can cause deadlocks. All held locks need to be tracked in a btree_path, so that the cycle detector knows about them - unless we know that we cannot cause deadlocks for other reasons: e.g. we are only taking read locks, or we're in very early fsck (topology repair). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix usage of six lock's percpu mode, key cache versionKent Overstreet
Similar to "bcachefs: Fix usage of six lock's percpu mode", six locks have a percpu mode, but we can't switch between percpu and non percpu modes while a lock is in use: threads attempting to take a read lock may race, and we'll end up with the read count permanently off. Fixing this the "correct" way, in six_lock_pcpu_(alloc|free) would require an RCU barrier, and we don't want to do that - instead, we have to permanently segragate percpu and non percpu objects, including when on freelists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor bkey_cached_alloc() pathKent Overstreet
Clean up the arguments passed and make them more consistent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert more locking code to btree_bkey_cached_commonKent Overstreet
Ideally, all the code in btree_locking.c should be converted, but then we'd want to convert btree_path to point to btree_key_cached_common too, and then we'd be in for a much bigger cleanup - but a bit of incremental cleanup will still be helpful for the next patches. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: btree_bkey_cached_common->cachedKent Overstreet
Add a type descriptor to btree_bkey_cached_common - there's no reason not to since we've got padding that was otherwise unused, and this is a nice cleanup (and helpful in later patches). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix six_lock_readers_add()Kent Overstreet
Have to be careful with bit fields - when subtracting, this was overflowing into the write_locking bit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_btree_node_lock_write_nofail()Kent Overstreet
Taking a write lock will be able to fail, with the new cycle detector - unless we pass it nofail, which is possible but not preferred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New locking functionsKent Overstreet
In the future, with the new deadlock cycle detector, we won't be using bare six_lock_* anymore: lock wait entries will all be embedded in btree_trans, and we will need a btree_trans context whenever locking a btree node. This patch plumbs a btree_trans to the few places that need it, and adds two new locking functions - btree_node_lock_nopath, which may fail returning a transaction restart, and - btree_node_lock_nopath_nofail, to be used in places where we know we cannot deadlock (i.e. because we're holding no other locks). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Mark write locks before taking lockKent Overstreet
six locks are unfair: while a thread is blocked trying to take a write lock, new read locks will fail. The new deadlock cycle detector makes use of our existing lock tracing, so we need to tell it we're holding a write lock before we take the lock for it to work correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Delete time_stats for lock contended timesKent Overstreet
Since we've now got time_stats for lock hold times (per btree transaction), we don't need this anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't leak lock pcpu counts memoryKent Overstreet
This fixes a small memory leak. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Delete six_lock_pcpu_free_rcu()Kent Overstreet
Didn't have any users, and wasn't a good idea to begin with - delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add persistent counters for all tracepointsKent Overstreet
Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix bch2_btree_update_start() to return ↵Kent Overstreet
-BCH_ERR_journal_reclaim_would_deadlock On failure to get a journal pre-reservation because we're called from journal reclaim we're not supposed to return a transaction restart error - this fixes a livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve bch2_btree_node_relock()Kent Overstreet
This moves the IS_ERR_OR_NULL() check to the inline part, since that's a fast path event. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve trans_restart_journal_preres_get tracepointKent Overstreet
It now includes journal_flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>