summaryrefslogtreecommitdiff
path: root/fs/bcachefs/btree_key_cache.c
AgeCommit message (Collapse)Author
2023-10-22bcachefs: Delete redundant tracepointKent Overstreet
We were emitting two trace events on transaction restart in this code path - delete the redundant one. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Btree key cache coherencyKent Overstreet
- Updates to non key cache iterators will now be transparently redirected to the key cache for cached btrees. - Except when creating new keys: then the update goes to underlying btree For for iterating over a cached btree to work, we need to ensure that if a key exists in the key cache, it also exists in the btree - otherwise the iterator code will skip past it and not check the key cache. Otherwise, for consistency, all updates should go to the same place - the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: BTREE_ITER_WITH_KEY_CACHEKent Overstreet
This is the start of cache coherency with the btree key cache - this adds a btree iterator flag that causes lookups to also check the key cache when we're iterating over the btree (not iterating over the key cache). Note that we could still race with another thread creating at item in the key cache and updating it, since we aren't holding the key cache locked if it wasn't found. The next patch for the update path will address this by causing the transaction to restart if the key cache is found to be dirty. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve btree_key_cache_flush_pos()Kent Overstreet
btree_key_cache_flush_pos() uses BTREE_ITER_CACHED_NOFILL - but it wasn't checking for !ck->valid. It does check for the entry being dirty, so it shouldn't matter, but this refactor it a bit and adds and assertion. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Tracepoint improvementsKent Overstreet
This improves the transaction restart tracepoints - adding distinct tracepoints for all the locations and reasons a transaction might have been restarted, and ensures that there's a tracepoint for every transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Log & error message improvementsKent Overstreet
- Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work in userspace - We don't need to print the bcachefs: or the filesystem name prefix in userspace - Improve a few error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Switch to __func__for recording where btree_trans was initializedKent Overstreet
Symbol decoding, via %ps, isn't supported in userspace - this will also be faster when we're using trans->fn in the fast path, as with the new BCH_JSET_ENTRY_log journal messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add error messages for memory allocation failuresKent Overstreet
This adds some missing diagnostics from rare but annoying to debug runtime allocation failure paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix some shutdown path bugsKent Overstreet
This fixes some bugs when we hit an error very early in the filesystem startup path, before most things have been initialized. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: BTREE_ITER_FILTER_SNAPSHOTSKent Overstreet
For snapshots, we need to implement btree lookups that return the first key that's an ancestor of the snapshot ID the lookup is being done in - and filter out keys in unrelated snapshots. This patch adds the btree iterator flag BTREE_ITER_FILTER_SNAPSHOTS which does that filtering. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Subvolumes, snapshotsKent Overstreet
This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22Revert "bcachefs: Add more assertions for locking btree iterators out of order"Kent Overstreet
Figured out the bug we were chasing, and it had nothing to do with locking btree iterators/paths out of order. This reverts commit ff08733dd298c969aec7c7828095458f73fd5374. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add more assertions for locking btree iterators out of orderKent Overstreet
btree_path_traverse_all() traverses btree iterators in sorted order, and thus shouldn't see transaction restarts due to potential deadlocks - but sometimes we do. This patch adds some more assertions and tracks some more state to help track this down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: btree_pathKent Overstreet
This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill BTREE_ITER_NEED_PEEKKent Overstreet
This was used for an optimization that hasn't existing in quite awhile - iter->uptodate will probably be going away as well. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Further reduce iter->trans usageKent Overstreet
This is prep work for splitting btree_path out from btree_iter - btree_path will not have a pointer to btree_trans. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Reduce iter->trans usageKent Overstreet
Disfavoured, and should go away. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill BTREE_INSERT_NOUNLOCKKent Overstreet
With the recent transaction restart changes, it's no longer needed - all transaction commits have BTREE_INSERT_NOUNLOCK semantics. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: trans->restartedKent Overstreet
Start tracking when btree transactions have been restarted - and assert that we're always calling bch2_trans_begin() immediately after transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use bch2_trans_do() in bch2_btree_key_cache_journal_flush()Kent Overstreet
We're working to standardize handling of transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't downgrade in traverse()Kent Overstreet
Downgrading of btree iterators is something that should only happen explicitly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Tighten up btree_iter locking assertionsKent Overstreet
We weren't correctly verifying that we had interior node intent locks - this patch also fixes bugs uncovered by the new assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: BTREE_UPDATE_INTERNAL_SNAPSHOT_NODEKent Overstreet
Add a new flag to control assertions about updating to internal snapshot nodes, that normally should not be written to - to be used in an upcoming patch. Also do some renaming - trigger_flags is now update_flags. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Change bch2_btree_key_cache_count() to exclude dirty keysKent Overstreet
We're seeing livelocks that appear to be due to bch2_btree_key_cache_scan repeatedly scanning and blocking other tasks from using the key cache lock - we probably shouldn't be reporting objects that can't actually be freed yet. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix key cache assertionKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix an out of bounds readKent Overstreet
bch2_varint_decode() can read up to 7 bytes past the end of the buffer, which means we need to allocate slightly larger key cache buffers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a deadlock on journal reclaimKent Overstreet
Flushing the btree key cache needs to use allocation reserves - journal reclaim depends on flushing the btree key cache for making forward progress, and the allocator and copygc depend on journal reclaim making forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't flush btree writes more aggressively because of btree key cacheKent Overstreet
We need to flush the btree key cache when it's too dirty, because otherwise the shrinker won't be able to reclaim memory - this is done by journal reclaim. But journal reclaim also kicks btree node writes: this meant that btree node writes were getting kicked much too often just because we needed to flush btree key cache keys. This patch splits journal pins into two different lists, and teaches journal reclaim to not flush btree node writes when it only needs to flush key cache keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a startup raceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Be more careful about JOURNAL_RES_GET_RESERVEDKent Overstreet
JOURNAL_RES_GET_RESERVED should only be used for updatse that need to be done to free up space in the journal. In particular, when we're flushing keys from the key cache, if we're flushing them out of order we shouldn't be using it, since we're using up our remaining space in the journal without dropping a pin that will let us make forward progress. With this patch, BTREE_INSERT_JOURNAL_RECLAIM without BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on journal reclaim if we're already in journal reclaim. This means we need to propagate these errors up to journal reclaim, indicating that flushing a journal pin should be retried in the future. This is prep work for a patch to change the way journal reclaim works, to split out flushing key cache keys because the btree key cache is too dirty from journal reclaim because we need space in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split out bpos_cmp() and bkey_cmp()Kent Overstreet
With snapshots, we're going to need to differentiate between comparisons that should and shouldn't include the snapshot field. bpos_cmp is now the comparison function that does include the snapshot field, used by core btree code. Upper level filesystem code generally does _not_ want to compare against the snapshot field - that code wants keys to compare as equal even when one of them is in an ancestor snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: btree key cache locking improvementsKent Overstreet
The btree key cache mutex was becoming a significant bottleneck - it was mainly used to protect the lists of dirty, clean and freed cached keys. This patch eliminates the dirty and clean lists - instead, when we need to scan for keys to drop from the cache we iterate over the rhashtable, and thus we're able to remove most uses of that lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: btree_iter_set_dontneed()Kent Overstreet
This is a bit clearer than using bch2_btree_iter_free(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix locking in bch2_btree_iter_traverse_cached()Kent Overstreet
bch2_btree_iter_traverse() is supposed to ensure we have the correct type of lock - it was downgrading if necessary, but if we entered with a read lock it wasn't upgrading to an intent lock, oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't use BTREE_INSERT_USE_RESERVE so muchKent Overstreet
Previously, we were using BTREE_INSERT_RESERVE in a lot of places where it no longer makes sense. - we now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_RESERVE just because we're holding open_buckets pinned anymore. - We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress. This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add some cond_rescheds() in shutdown pathKent Overstreet
Particularly on emergency shutdown we can end up having to clean up a lot of dirty cached btree keys here. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix some spurious gcc warningsKent Overstreet
These only come up when building in userspace, for some reason. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_trans_get_iter() no longer returns errorsKent Overstreet
Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Change a BUG_ON() to a fatal errorKent Overstreet
In the btree key cache code, failing to flush a dirty key is a serious error, but it doesn't need to be a BUG_ON(), we can stop the filesystem instead. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix error in filesystem initializationKent Overstreet
The rhashtable code doesn't like when we destroy an rhashtable that was never initialized Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Move journal reclaim to a kthreadKent Overstreet
This is to make tracing easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Ensure journal reclaim runs when btree key cache is too dirtyKent Overstreet
Ensuring the key cache isn't too dirty is critical for ensuring that the shrinker can reclaim memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve btree key cache shrinkerKent Overstreet
The shrinker should start scanning for entries that can be freed oldest to newest - this way, we can avoid scanning a lot of entries that are too new to be freed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a kmem_cache for btree_key_cache objectsKent Overstreet
We allocate a lot of these, and we're seeing sporading OOMs - this will help with tracking those down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a shrinker for the btree key cacheKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix btree key cache shutdownKent Overstreet
On emergency shutdown, we might still have dirty keys in the btree key cache that need to be cleaned up properly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add accounting for dirty btree nodes/keysKent Overstreet
This lets us improve journal reclaim, so that it now tries to make sure no more than 3/4s of the btree node cache and btree key cache are dirty - ensuring the shrinkers can free memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: More inlinining in the btree key cache codeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve tracing for transaction restartsKent Overstreet
We have a bug where we can get stuck with a process spinning in transaction restarts - need more information. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use cached iterators for inode updatesKent Overstreet
This switches inode updates to use cached btree iterators - which should be a nice performance boost, since lock contention on the inodes btree can be a bottleneck on multithreaded workloads. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>