summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2023-10-22bcachefs: Improve some fsck error messagesKent Overstreet
We have string names for d_type; use it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Go emergency RO when i_blocks underflowsKent Overstreet
This improves some of our warnings and assertions - they imply possible filesystem inconsistencies, so they should be calling bch2_fs_inconsistent(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Ensure sysfs show fns print a newlineKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill old rebuild_replicas optionKent Overstreet
This option was useful when the replicas mechism was new and still being debugged, but hasn't been used in ages - let's delete it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: In fsck, pass BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE when deleting ↵Kent Overstreet
dirents A user reported an error where we hit an assertion due to deleting a key in an internal snapshot node, when deleting a dirent that points to a nonexisting inode. We try to avoid doing updates to keys for internal snapshot nodes, but upon inspection of the places where we remove dirents in fsck it appears BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE is correct for all of them: either the target dirent doesn't exist, or it's a directory with multiple dirents pointing to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix for getting stuck in journal replayKent Overstreet
In journal replay, we weren't immediately dropping journal pins when we start doing updates that ewern't from journal replay - leading to journal reclaim getting stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve error logging in fsck.cKent Overstreet
This adds error logging to a bunch of functions in fsck.c - in fsck, reduntant error messages is probably better than not enough. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix inode_backpointer_exists()Kent Overstreet
If the dirent an inode points to doesn't exist, we shouldn't be returning an error - just 0/false. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve bch2_lru_delete() error messagesKent Overstreet
When we detect a filesystem inconsistency, we should include the relevent keys in the error message. This patch adds a parameter to pass the key with the lru entry to bch2_lru_delete(), so that it can be printed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Introduce bch2_journal_keys_peek_(upto|slot)()Kent Overstreet
When many journal replay keys have been overwritten, bch2_journal_keys_peek() was taking excessively long to scan before it found a key to return. Fix this by introducing bch2_journal_keys_peek_upto() which takes a parameter for the end of the range we want, so that we can terminate the search much sooner, and replace all uses of bch2_journal_keys_peek() with peek_upto() or peek_slot(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve error message when alloc key doesn't match lru entryKent Overstreet
Error messages should always print out the full key when available - this gives us a starting point when looking through the journal to debug what went wrong. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Ensure buckets have io_time[READ] setKent Overstreet
It's an error if a bucket is in state BCH_DATA_cached but not on the LRU btree - i.e io_time[READ] == 0 - so, make sure it's set before adding it. Also, make some of the LRU code a bit clearer and more direct. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use bch2_trans_inconsistent_on() in more placesKent Overstreet
This gets us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve bch2_open_buckets_to_text()Kent Overstreet
This patch updates bch2_open_buckets_to_text() to include the device and bucket the open_bucket owns. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix CPU usage in journal read pathKent Overstreet
In journal_entry_add(), we were repeatedly scanning the journal entries radix tree to scan for old entries that can be freed, with O(n^2) behaviour. This patch tweaks things to remember the previous last_seq, so we don't have to scan for entries to free from the start. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix a null ptr derefKent Overstreet
We start doing allocations before the GC thread is created, which means we need to check for that to avoid a null ptr deref. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't trigger extra assertions in journal replayKent Overstreet
We now pass a rw argument to .key_invalid methods so they can trigger assertions for updates but not on existing keys. We shouldn't trigger these extra assertions in journal replay - this patch changes the transaction commit path accordingly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Minor device removal fixesKent Overstreet
- We weren't clearing the LRU btree - bch2_alloc_read() runs before bch2_check_alloc_key() deletes alloc keys for devices/buckets that don't exists, so it needs to check for that - bch2_check_lrus() needs to check that buckets exists - improve some error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix a few warnings on 32 bitKent Overstreet
These showed up when building for mips. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_btree_delete_extent_at()Kent Overstreet
New helper, for deleting extents. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't skip triggers in fcollapse()Kent Overstreet
With backpointers this doesn't work anymore - backpointers always need to be updated to point to the new extent position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Initialize ec work structs earlyKent Overstreet
We need to ensure that work structs in bch_fs always get initialized - otherwise an error in filesystem initialization can pop a warning in the workqueue code when we try to cancel a work struct that wasn't initialized. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use a genradix for reading journal entriesKent Overstreet
Previously, the journal read path used a linked list for storing the journal entries we read from disk. But there's been a bug that's been causing journal_flush_delay to incorrectly be set to 0, leading to far more journal entries than is normal being written out, which then means filesystems are no longer able to start due to the O(n^2) behaviour of inserting into/searching that linked list. Fix this by switching to a radix tree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Refactor journal_keys_sort() to return an error codeKent Overstreet
When there weren't any keys in the journal there's no need to allocate the buffer - but doing that causes a spurious -ENOMEM. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fold bucket_state in to BCH_DATA_TYPES()Kent Overstreet
Previously, we were missing accounting for buckets in need_gc_gens and need_discard states. This matters because buckets in those states need other btree operations done before they can be used, so they can't be conuted when checking current number of free buckets against the allocation watermark. Also, we weren't directly counting free buckets at all. Now, data type 0 == BCH_DATA_free, and free buckets are counted; this means we can get rid of the separate (poorly defined) count of unavailable buckets. This is a new on disk format version, with upgrade and fsck required for the accounting changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add a sysfs attr for triggering discardsKent Overstreet
We're currently debugging an issue with discards not getting run; this patch adds a manual trigger so we can then watch the tracepoint while it runs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Topology repair fixesKent Overstreet
- We were failing to start topology repair, because we hadn't set the superblock flag indicating it needed to run - set_node_min() forget to update the btree node's key - bch2_gc_alloc_reset() didn't reset data type, leading to inserting an invalid key that was empty but had nonzero data type Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use bch2_trans_inconsistent() moreKent Overstreet
This gets us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Move alloc assertion to .key_invalid()Kent Overstreet
.key_invalid is a better place for this assertion. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve btree_bad_header()Kent Overstreet
In the future printbufs will be mempool-ified, so we shouldn't be using more than one at a time if we don't have to. This also fixes an extra trailing newline. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Check for read_time == 0 in bch2_alloc_v4_invalid()Kent Overstreet
We've been seeing this error in fsck and we weren't able to track down where it came from - but now that .key_invalid methods take a rw argument, we can safely check for this. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: fsck: Work around transaction restartsKent Overstreet
In check_extents() and check_dirents(), we're working towards only handling transaction restarts in one place, at the top level - but we're not there yet. check_i_sectors() and check_subdir_count() handle transaction restarts locally, which means the iterator for the dirent/extent is left unlocked (should_be_locked == 0), leading to asserts popping when we go to do updates. This patch hacks around this for now, until we can delete the offending code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add rw to .key_invalid()Kent Overstreet
This adds a new parameter to .key_invalid() methods for whether the key is being read or written; the idea being that methods can do more aggressive checks when a key is newly created and being written, when we wouldn't want to delete the key because of those checks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: More improvements for alloc info checksKent Overstreet
- Move checks for whether the device & bucket are valid from the .key_invalid method to bch2_check_alloc_key(). This is because .key_invalid() is called on keys that may no longer exist (post journal replay), which is a problem when removing/resizing devices. - We weren't checking the need_discard btree to ensure that every set bucket has a corresponding alloc key. This refactors the code for checking the freespace btree, so that it now checks both. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Silence spurious copygc err when shutting downKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Convert .key_invalid methods to printbufsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Gap buffer for journal keysKent Overstreet
Btree updates before we go RW work by inserting into the array of keys that journal replay will insert - but inserting into a flat array is O(n), meaning if btree_gc needs to update many alloc keys, we're O(n^2). Fortunately, the updates btree_gc does happens in sequential order, which means a gap buffer works nicely here - this patch implements a gap buffer for journal keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't normalize to pages in btree cache shrinkerKent Overstreet
This behavior dates from the early, early days of bcache, and upon further delving appears to not make any sense. The shrinker only works in terms of 'objects' of unknown size; normalizing to pages only had the effect of changing the batch size, which we could do directly - if we wanted; we probably don't. Normalizing to pages meant our batch size was very small, which seems to have been keeping us from doing as much shrinking as we should be under heavy memory pressure; this patch appears to alleviate some OOMs we've been seeing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add a tracepoint for superblock writesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: gc mark fn fixes, cleanupsKent Overstreet
mark_stripe_bucket() was busted; it was using @new unitialized. Also, clean up all the gc mark functions, and convert them to the same style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't write partially-initialized superblocksKent Overstreet
This neatly avoids bugs where we fail partway through initializing a new filesystem, if we just don't write out partly-initialized state. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve read_from_stale_dirty_pointer() messageKent Overstreet
With printbufs, it's now easy to build up multi-line log messages and emit them with one call, which is good because it prevents multiple multi-line log messages from getting Interspersed in the log buffer; this patch also improves the formatting and converts it to latest style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use crc_is_compressed()Kent Overstreet
Trivial cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix pr_buf() callsKent Overstreet
In a few places we were passing a variable to pr_buf() for the format string - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill struct bucket_markKent Overstreet
This switches struct bucket to using a lock, instead of cmpxchg. And now that the protected members no longer need to fit into a u64, we can expand the sector counts to 32 bits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill main in-memory bucket arrayKent Overstreet
All code using the in-memory bucket array, excluding GC, has now been converted to use the alloc btree directly - so we can finally delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_dev_usage_update() no longer depends on bucket_markKent Overstreet
This is one of the last steps in getting rid of the main in-memory bucket array. This changes bch2_dev_usage_update() to take bkey_alloc_unpacked instead of bucket_mark, and for the places where we are in fact working with bucket_mark and don't have bkey_alloc_unpacked, we add a wrapper that takes bucket_mark and converts to bkey_alloc_unpacked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fsck for need_discard & freespace btreesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New bucket invalidate pathKent Overstreet
In the old allocator code, preparing an existing empty bucket was part of the same code path that invalidated buckets containing cached data. In the new allocator code this is no longer the case: the main allocator path finds empty buckets (via the new freespace btree), and can't allocate buckets that contain cached data. We now need a separate code path to invalidate buckets containing cached data when we're low on empty buckets, which this patch implements. When the number of free buckets decreases that triggers the new invalidate path to run, which uses the LRU btree to pick cached data buckets to invalidate until we're above our watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New discard implementationKent Overstreet
In the old allocator code, buckets would be discarded just prior to being used - this made sense in bcache where we were discarding buckets just after invalidating the cached data they contain, but in a filesystem where we typically have more free space we want to be discarding buckets when they become empty. This patch implements the new behaviour - it checks the need_discard btree for buckets awaiting discards, and then clears the appropriate bit in the alloc btree, which moves the buckets to the freespace btree. Additionally, discards are now enabled by default. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>