summaryrefslogtreecommitdiff
path: root/fs/bcachefs/recovery.c
AgeCommit message (Collapse)Author
2023-10-22bcachefs: Convert fsck errors to errcode.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use bch2_err_str() in error messagesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: We can handle missing btree roots for all alloc btreesKent Overstreet
We can rebuild alloc info if these btree roots are missing - no need to bail out and say the filesystem is unrecoverable Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix subvol/snapshot deleting in recoveryKent Overstreet
fsck doesn't want to run while we're cleaning up deleted snapshots - if that work needs to be done, we want it to have finished before fsck runs, otherwise fsck will get confused when it finds multiple keys in the same snapshot ID equivalence class (i.e. the mechanism that snapshot deletion uses for cleaning up redundant keys). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Printbuf reworkKent Overstreet
This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix btree_and_journal_iterKent Overstreet
We had a bug where btree_and_journal_iter would return the same key twice - after deleting it (perhaps because it was present in both the btree and the journal?) This reworks btree_and_journal_iter to track the current position, much like btree_paths, which makes the logic considerably simpler and more robust. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix for cmd_list_journalKent Overstreet
cmd_list_journal wasn't correctly listing the most recent journal entries as blacklisted - because in the recovery path when just reading the journal, we were failing to add those to the blacklist table. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix journal_keys_search() overheadKent Overstreet
Previously, on every btree_iter_peek() operation we were searching the journal keys, doing a full binary search - which was slow. This patch fixes that by saving our position in the journal keys, so that we only do a full binary search when moving our position backwards or a large jump forwards. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Always print when doing journal replay in fsckKent Overstreet
This logging improvement helps see when the previous fsck pass has completed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: LRU repair tweaksKent Overstreet
- Drop old unneeded parameter for whether we're in initial GC - which was from when btree updates had to be done differently before we went RW. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix journal_iters_fix()Kent Overstreet
journal_iters_fix() was incorrectly rewinding iterators past keys they had already returned, leading to those keys being double counted in the bch2_gc() path - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Go RW before bch2_check_lrus()Kent Overstreet
btree updates before going RW are expensive if they're in random order, since they use the list of keys for journal replay to insert, which is just a gap buffer. This patch improves the bucket invalidate path so that if bch2_check_lrus() hasn't finished it only prints warnings instead of doing an emergency shutdown, which means we can now set BCH_FS_MAY_GO_RW before bch2_check_lrus(). Also, the filesystem state bits are reorganized a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill old rebuild_replicas optionKent Overstreet
This option was useful when the replicas mechism was new and still being debugged, but hasn't been used in ages - let's delete it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix for getting stuck in journal replayKent Overstreet
In journal replay, we weren't immediately dropping journal pins when we start doing updates that ewern't from journal replay - leading to journal reclaim getting stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Introduce bch2_journal_keys_peek_(upto|slot)()Kent Overstreet
When many journal replay keys have been overwritten, bch2_journal_keys_peek() was taking excessively long to scan before it found a key to return. Fix this by introducing bch2_journal_keys_peek_upto() which takes a parameter for the end of the range we want, so that we can terminate the search much sooner, and replace all uses of bch2_journal_keys_peek() with peek_upto() or peek_slot(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix a few warnings on 32 bitKent Overstreet
These showed up when building for mips. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use a genradix for reading journal entriesKent Overstreet
Previously, the journal read path used a linked list for storing the journal entries we read from disk. But there's been a bug that's been causing journal_flush_delay to incorrectly be set to 0, leading to far more journal entries than is normal being written out, which then means filesystems are no longer able to start due to the O(n^2) behaviour of inserting into/searching that linked list. Fix this by switching to a radix tree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Refactor journal_keys_sort() to return an error codeKent Overstreet
When there weren't any keys in the journal there's no need to allocate the buffer - but doing that causes a spurious -ENOMEM. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fold bucket_state in to BCH_DATA_TYPES()Kent Overstreet
Previously, we were missing accounting for buckets in need_gc_gens and need_discard states. This matters because buckets in those states need other btree operations done before they can be used, so they can't be conuted when checking current number of free buckets against the allocation watermark. Also, we weren't directly counting free buckets at all. Now, data type 0 == BCH_DATA_free, and free buckets are counted; this means we can get rid of the separate (poorly defined) count of unavailable buckets. This is a new on disk format version, with upgrade and fsck required for the accounting changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: More improvements for alloc info checksKent Overstreet
- Move checks for whether the device & bucket are valid from the .key_invalid method to bch2_check_alloc_key(). This is because .key_invalid() is called on keys that may no longer exist (post journal replay), which is a problem when removing/resizing devices. - We weren't checking the need_discard btree to ensure that every set bucket has a corresponding alloc key. This refactors the code for checking the freespace btree, so that it now checks both. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Gap buffer for journal keysKent Overstreet
Btree updates before we go RW work by inserting into the array of keys that journal replay will insert - but inserting into a flat array is O(n), meaning if btree_gc needs to update many alloc keys, we're O(n^2). Fortunately, the updates btree_gc does happens in sequential order, which means a gap buffer works nicely here - this patch implements a gap buffer for journal keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill main in-memory bucket arrayKent Overstreet
All code using the in-memory bucket array, excluding GC, has now been converted to use the alloc btree directly - so we can finally delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fsck for need_discard & freespace btreesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill allocator threads & freelistsKent Overstreet
Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Freespace, need_discard btreesKent Overstreet
This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Introduce a separate journal watermark for copygcKent Overstreet
Since journal reclaim -> btree key cache flushing may require the allocation of new btree nodes, it has an implicit dependency on copygc in order to make forward progress - so we should avoid blocking copygc unless the journal is really close to full. This introduces watermarks to replace our single MAY_GET_UNRESERVED bit in the journal, and adds a watermark for copygc and plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_journal_log_msg()Kent Overstreet
This adds bch2_journal_log_msg(), which just logs a message to the journal, and uses it to mark startup and when journal replay finishes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Heap allocate printbufsKent Overstreet
This patch changes printbufs dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack. The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Normal update/commit path now works before going RWKent Overstreet
This improves __bch2_trans_commit - early in the recovery process, when we're running btree_gc and before we want to go RW, it now uses bch2_journal_key_insert() to add the update to the list of updates for journal replay to do, instead of btree_gc having to use separate interfaces depending on whether we're running at bringup or, later, runtime. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add .to_text() methods for all superblock sectionsKent Overstreet
This patch improves the superblock .to_text() methods and adds methods for all types that were missing them. It also improves printbufs by allowing them to specfiy what units we want to be printing in, and adds new wrapper methods for unifying our kernel and userspace environments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: opts.read_journal_onlyKent Overstreet
Add an option that tells recovery to only read the journal, to be used by the list_journal command. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Delete some flag bits that are no longer usedKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Print a better message for mark and sweep passKent Overstreet
Btree gc, aka mark and sweep, checks allocations - so let's just print that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: btree_gc no longer uses main in-memory bucket arrayKent Overstreet
This changes the btree_gc code to only use the second bucket array, the one dedicated to GC. On completion, it compares what's in its in memory bucket array to the allocation information in the btree and writes it directly, instead of updating the main in-memory bucket array and writing that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: run_one_trigger() now checks journal keysKent Overstreet
Previously, when doing updates and running triggers before journal replay completes, triggers would see the incorrect key for the old key being overwritten - this patch updates the trigger code to check the journal keys when necessary, needed for the upcoming allocator rewrite. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22Revert "bcachefs: Delete some obsolete journal_seq_blacklist code"Kent Overstreet
This reverts commit f95b61228efd04c9c158123da5827c96e9773b29. It turns out, we're seeing filesystems in the wild end up with blacklisted btree node bsets - this should not be happening, and until we understand why and fix it we need to keep this code around. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Log & error message improvementsKent Overstreet
- Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work in userspace - We don't need to print the bcachefs: or the filesystem name prefix in userspace - Improve a few error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add verbose log messages for journal readKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use kvmalloc() for array of sorted keys in journal replayKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Simplify journal replayKent Overstreet
With BTREE_ITER_WITH_JOURNAL, there's no longer any restrictions on the order we have to replay keys from the journal in, and we can also start up journal reclaim right away - and delete a bunch of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: BTREE_ITER_WITH_JOURNALKent Overstreet
This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is automatically enabled when initializing a btree iterator before journal replay has completed - it overlays the contents of the journal with the btree. This lets us delete bch2_btree_and_journal_walk() and just use the normal btree iterator interface instead - which also lets us delete a significant amount of duplicated code. Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch - we're redoing the binary search over keys in the journal every time we call bch2_btree_iter_peek(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Tweak journal reclaim orderKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Make sure BCH_FS_FSCK_DONE gets setKent Overstreet
If we're not running fsck we still want to set BCH_FS_FSCK_DONE, so that bch2_fsck_err() calls are interpreted as bch2_inconsistent_error() calls(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix race between btree updates & journal replayKent Overstreet
Add a flag to indicate whether a journal replay key has been overwritten, and set/test it with appropriate btree locks held. This fixes a race between the allocator - invalidating buckets, and doing btree updates - and journal replay, which before this patch could clobber the allocator thread's update with an older version of the key from the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_journal_entry_to_text()Kent Overstreet
This adds a _to_text() pretty printer for journal entries - including every subtype - which will shortly be used by the 'bcachefs list_journal' subcommand. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Journal replay does't resort main list of keysKent Overstreet
The upcoming BTREE_ITER_WITH_JOURNAL patch will require journal keys to stay in sorted order, so the btree iterator code can overlay them over btree keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Run scan_old_btree_nodes after version upgradeKent Overstreet
In the recovery path, we scan for old btree nodes if we don't have certain compat bits set. If we do this, we should be doing it after we upgraded to the newest on disk format. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Delete some obsolete journal_seq_blacklist codeKent Overstreet
Since metadata version bcachefs_metadata_version_btree_ptr_sectors_written, we haven't needed the journal seq blacklist mechanism for ignoring blacklisted btree node writes - we now only need it for ignoring journal entries that were written after the newest flush journal entry, and then we only need to keep those blacklist entries around until journal replay is finished. That means we can delete the code for scanning btree nodes to GC journal_seq_blacklist entries. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_journal_key_insert() no longer transfers ownershipKent Overstreet
bch2_journal_key_insert() used to assume that the key passed to it was allocated with kmalloc(), and on success took ownership. This patch deletes that behaviour, making it more similar to bch2_trans_update()/bch2_trans_commit(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't start allocator threads too earlyKent Overstreet
If the allocator threads start before journal replay has finished replaying alloc keys, journal replay might overwrite the allocator's btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>