Age | Commit message | Author
2023-10-22 | bcachefs: data jobs, including rebalance wait for copygc. | Daniel Hill
move_ratelimit() now has a bool that specifies whether we want to wait for copygc to finish. When copygc is running, we're probably low on free buckets; instead of consuming the remaining buckets, we want to wait for copygc to finish. This should help with performance and runaway bucket fragmentation.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Redo data_update interface | Kent Overstreet
This patch significantly cleans up and simplifies the data_update interface. Instead of only being able to specify a single pointer (by device) to rewrite, we can now specify any or all of the pointers in the original extent to be rewritten, as a bitmask.
data_cmd is no more: the various pred functions now just return true if the extent should be moved/updated. All the data_update path does is rewrite existing replicas or add new ones.
This fixes a bug with background compression on replicated filesystems, where rebalance -> data_update would drop the wrong old replica, and then keep trying to recompress an extent pointer, failing each time to drop the right replica. Oops.
Now, the data update path doesn't look at the io options to decide which pointers to keep and which to drop; it only goes off of the data_update_options passed to it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
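As a rough illustration of the bitmask idea (a sketch, not the bcachefs code): the types `sketch_extent` and `sketch_update_opts`, the `rewrite_ptrs` field, and both functions below are hypothetical. A predicate only answers yes/no and records which pointers to rewrite; the update path then acts on that mask alone, without consulting io options.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PTRS 16

/* Hypothetical, simplified extent: one device index per pointer. */
struct sketch_extent {
	unsigned nr_ptrs;
	unsigned dev[SKETCH_MAX_PTRS];
};

/* Hypothetical update options: bit i set means "rewrite pointer i". */
struct sketch_update_opts {
	uint32_t rewrite_ptrs;
};

/*
 * New-style predicate: return true if the extent should be updated, and
 * record which pointers to rewrite (here: every pointer on @dev).
 */
static bool sketch_pred_evacuate_dev(const struct sketch_extent *e, unsigned dev,
				     struct sketch_update_opts *opts)
{
	opts->rewrite_ptrs = 0;
	for (unsigned i = 0; i < e->nr_ptrs; i++)
		if (e->dev[i] == dev)
			opts->rewrite_ptrs |= 1U << i;
	return opts->rewrite_ptrs != 0;
}

/* The update path only consults the mask: collect the devices to rewrite. */
static unsigned sketch_ptrs_to_rewrite(const struct sketch_extent *e,
				       const struct sketch_update_opts *opts,
				       unsigned *out)
{
	unsigned n = 0;

	for (unsigned i = 0; i < e->nr_ptrs; i++)
		if (opts->rewrite_ptrs & (1U << i))
			out[n++] = e->dev[i];
	return n;
}
```

Because the mask names pointers by position, the caller decides exactly which replica is replaced; nothing downstream has to re-derive that choice.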
2023-10-22 | bcachefs: Fix bch2_check_alloc_key() | Kent Overstreet
bch2_check_alloc_key() was failing to check buckets that didn't have alloc keys yet (because they'd never been used) - they still need to be added to the freespace btree.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Improve bch2_check_alloc_info | Kent Overstreet
- In check_alloc_key(), we were previously re-initializing iterators for the need_discard and freespace btrees for every alloc key we checked. This caused us to redo lookups into the journal keys every time, since those lookups are cached in struct btree_iter. This patch initializes the iterators once in bch2_check_alloc_info() and passes them into check_alloc_key().
- Make the looping in bch2_check_alloc_info() more consistent/efficient.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Use BTREE_INSERT_LAZY_RW in bch2_check_alloc_info() | Kent Overstreet
This runs before we go rw for journal replay, but after we're allowed to go rw. It might be time to consider killing BTREE_INSERT_LAZY_RW, though.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Bucket invalidate path improvements | Kent Overstreet
- invalidate_one_bucket() now returns 1 when we don't have any buckets on this device to invalidate, ensuring we don't spin.
- The tracepoint invocation is moved to after the transaction commit, and we now include the number of cached sectors in the tracepoint.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
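A minimal sketch of why returning 1 prevents spinning, assuming a hypothetical `sketch_dev` that only tracks how many buckets can still be invalidated. The caller's loop stops as soon as the per-bucket helper reports there is nothing left, instead of retrying indefinitely.

```c
/* Hypothetical device state: buckets that could still be invalidated. */
struct sketch_dev {
	unsigned invalidatable;
};

/* Returns 0 after invalidating one bucket, 1 if there is nothing to invalidate. */
static int sketch_invalidate_one_bucket(struct sketch_dev *ca)
{
	if (!ca->invalidatable)
		return 1;
	ca->invalidatable--;
	return 0;
}

/* Invalidate up to @nr buckets; returns how many were actually invalidated. */
static unsigned sketch_invalidate_buckets(struct sketch_dev *ca, unsigned nr)
{
	unsigned done = 0;

	while (done < nr) {
		/* A return of 1 means "nothing left here": stop instead of spinning. */
		if (sketch_invalidate_one_bucket(ca))
			break;
		done++;
	}
	return done;
}
```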
2023-10-22 | bcachefs: Don't BUG_ON() inode link count underflow | Kent Overstreet
This switches that assertion to a bch2_trans_inconsistent() call, as it should be.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Always descend to leaf nodes in btree_gc | Kent Overstreet
If a btree node is unreadable, topology repair is what fixes it, and topology repair is kicked off by btree_gc; so btree_gc needs to touch every node and verify that it can be read.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: fix __dev_available(). | Daniel Hill
__dev_available() now calculates available buckets correctly. Previously it would almost always return 0 when there was cached data.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Fix assertion in topology repair | Kent Overstreet
If we were at the end of the node, when breaking out of the loop we'd pop the assertion on line 446 when cur wasn't NULL.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Make verbose option settable at runtime | Kent Overstreet
-o verbose is very useful, and we're starting to use it more for runtime debug statements, so making it possible to enable it at runtime is a no-brainer.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Improve "copygc requested to run" error message | Kent Overstreet
This improves the "copygc requested to run but no buckets found" error message to show the device that copygc was requested to run on. We'll definitely need to improve this further.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Pull out data_update.c | Kent Overstreet
This is the start of reorganizing the data IO paths. The plan is to also break io.c apart into data_read.c and data_write.c, and to rename migrate_write to the data_update path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Split out dev_buckets_free() | Kent Overstreet
Previously, dev_buckets_available() only counted buckets that are eligible to be allocated right now, i.e. buckets that don't have cached data and don't need a discard, gc gens, etc. But most users of this function (copygc, allocator striping) want to know how many buckets can be allocated from without moving data around, which means we should also be counting buckets with cached data, etc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
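To make the distinction concrete, here is a sketch with hypothetical per-device counters (`sketch_dev_usage` and both functions are invented for illustration). Exactly which bucket states the real dev_buckets_free() counts is an assumption here, based only on the description above.

```c
/* Hypothetical per-device bucket counters. */
struct sketch_dev_usage {
	unsigned long free;          /* empty and ready to allocate */
	unsigned long cached;        /* only cached data: can be invalidated */
	unsigned long need_discard;  /* empty, but must be discarded first */
	unsigned long need_gc_gens;  /* empty, but waiting on gc gens */
};

/* Old meaning: buckets that can be allocated right now, with no further work. */
static unsigned long sketch_buckets_available_now(const struct sketch_dev_usage *u)
{
	return u->free;
}

/* New meaning (assumed): buckets usable without moving any live data. */
static unsigned long sketch_buckets_free(const struct sketch_dev_usage *u)
{
	return u->free + u->cached + u->need_discard + u->need_gc_gens;
}
```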
2023-10-22 | bcachefs: btree key cache pcpu freedlist | Kent Overstreet
Originally, the btree key cache code would always allocate new entries by reusing from the recently-freed list, if that list wasn't empty. That behaviour was dropped for lock contention reasons. However, entries stranded on the freed list seem to have been contributing to some of our OOM issues, because long-running btree transactions prevent them from being freed.
This patch re-adds allocating from the freed list, and also adds percpu buffers to solve the lock contention issues; the new percpu freed lists will improve the evict paths, too.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
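The general technique (a small per-CPU stash in front of a shared, locked freed list) can be sketched in userspace as follows. Everything here is hypothetical and simplified: the CPU index is passed explicitly, a C11 `atomic_flag` spinlock stands in for the kernel lock, and the real key cache's rules for when freed entries may safely be reused are not modelled.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NR_CPUS    4
#define SKETCH_PCPU_SLOTS 8

struct sketch_entry {
	struct sketch_entry *next;   /* link on the shared freed list */
};

/* Per-CPU stash of freed entries, used without taking the shared lock. */
struct sketch_pcpu_freed {
	unsigned nr;
	struct sketch_entry *objs[SKETCH_PCPU_SLOTS];
};

struct sketch_cache {
	atomic_flag lock;               /* protects @freed */
	struct sketch_entry *freed;     /* shared freed list */
	struct sketch_pcpu_freed pcpu[SKETCH_NR_CPUS];
};

static void sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;   /* spin until the flag was previously clear */
}

static void sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void sketch_cache_init(struct sketch_cache *c)
{
	memset(c, 0, sizeof(*c));
	atomic_flag_clear(&c->lock);
}

/* Free: stash locally if there is room, otherwise spill to the shared list. */
static void sketch_free(struct sketch_cache *c, int cpu, struct sketch_entry *e)
{
	struct sketch_pcpu_freed *f = &c->pcpu[cpu];

	if (f->nr < SKETCH_PCPU_SLOTS) {
		f->objs[f->nr++] = e;
		return;
	}
	sketch_lock(&c->lock);
	e->next = c->freed;
	c->freed = e;
	sketch_unlock(&c->lock);
}

/* Alloc: reuse a freed entry if one exists; NULL means allocate a fresh one. */
static struct sketch_entry *sketch_alloc(struct sketch_cache *c, int cpu)
{
	struct sketch_pcpu_freed *f = &c->pcpu[cpu];
	struct sketch_entry *e;

	if (f->nr)
		return f->objs[--f->nr];

	sketch_lock(&c->lock);
	e = c->freed;
	if (e)
		c->freed = e->next;
	sketch_unlock(&c->lock);
	return e;
}
```

The common case touches only the local stash, so the shared lock is taken only when a stash overflows or runs dry.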
2023-10-22 | bcachefs: Make IO in flight by copygc/rebalance configurable | Kent Overstreet
This adds a new option, move_bytes_in_flight, for configuring the amount of IO in flight by copygc/rebalance - users with many devices in their filesystem will want to increase this. In the future we should be smarter about this, but this is an easy improvement.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
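One way to picture an in-flight limit like move_bytes_in_flight is a byte budget that submissions draw from and completions return to. The sketch below is hypothetical (the names and the rule of always admitting one IO when idle are assumptions), not how bcachefs implements it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mover context that caps the bytes of IO in flight. */
struct sketch_move_ctxt {
	uint64_t in_flight;       /* bytes submitted but not yet completed */
	uint64_t max_in_flight;   /* the configured limit */
};

/* Try to submit an IO of @bytes; false means the caller must wait for completions. */
static bool sketch_move_try_submit(struct sketch_move_ctxt *ctxt, uint64_t bytes)
{
	/* Always admit one IO when idle, so an IO larger than the limit can't stall forever. */
	if (ctxt->in_flight && ctxt->in_flight + bytes > ctxt->max_in_flight)
		return false;
	ctxt->in_flight += bytes;
	return true;
}

/* Called when an IO finishes: return its bytes to the budget. */
static void sketch_move_complete(struct sketch_move_ctxt *ctxt, uint64_t bytes)
{
	ctxt->in_flight -= bytes;
}
```

A larger limit lets more IOs be outstanding at once, which is why filesystems with many devices benefit from raising it.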
2023-10-22 | bcachefs: Check for extents with too many ptrs | Kent Overstreet
We have a hardcoded maximum on the number of pointers in an extent, which some other data structures rely on (notably bch_devs_list), but we weren't actually checking for it. Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix refcount leak in bch2_do_invalidates() | Kent Overstreet
If we fail to queue the work item because it's already in process, we need to drop the ref we just took.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
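The ref-handling pattern looks roughly like the sketch below. In the kernel, `queue_work()` returns false when the work item is already queued; `sketch_queue_work()` mimics that, and the other names are hypothetical stand-ins for the real refcount and work item.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the writes refcount and the invalidate work item. */
struct sketch_fs {
	int writes_ref;      /* refs currently held */
	bool work_pending;   /* is the work item already queued? */
};

/* Like queue_work(): returns false if the work was already pending. */
static bool sketch_queue_work(struct sketch_fs *c)
{
	if (c->work_pending)
		return false;
	c->work_pending = true;
	return true;
}

static void sketch_kick_invalidates(struct sketch_fs *c)
{
	c->writes_ref++;              /* once queued, the work item owns this ref */
	if (!sketch_queue_work(c))
		c->writes_ref--;      /* not queued: drop the ref we just took */
}

/* The work function drops the ref it owns when it finishes. */
static void sketch_invalidate_work_fn(struct sketch_fs *c)
{
	c->work_pending = false;
	c->writes_ref--;
}
```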
2023-10-22 | bcachefs: Always use percpu_ref_tryget_live() on c->writes | Kent Overstreet
If we're trying to get a ref and the refcount has been killed, it means we're doing an emergency shutdown - we always want tryget_live().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Improve checksum error messages | Kent Overstreet
We're seeing checksum errors in the bch2_rechecksum_bio() path - give it a better error message to help track this down.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Improve an error message | Kent Overstreet
When inserting a key type that's not valid for a given btree, we should print out which btree we were inserting into.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Fix assertion in bch2_dev_list_add_dev() | Kent Overstreet
We were only allowing 4 devices in a dev_list, not 16.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
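A minimal sketch of the kind of check involved, assuming a hypothetical `sketch_devs_list` sized for 16 entries as the message states. Bounding the assertion by the array's actual size keeps the check and the capacity from drifting apart.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical simplified device list with room for 16 devices. */
struct sketch_devs_list {
	uint8_t nr;
	uint8_t devs[16];
};

static void sketch_dev_list_add_dev(struct sketch_devs_list *l, unsigned dev)
{
	/* Assert against the real capacity, not a separate hardcoded constant. */
	assert(l->nr < SKETCH_ARRAY_SIZE(l->devs));
	l->devs[l->nr++] = (uint8_t)dev;
}
```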
2023-10-22 | bcachefs: Increase max size for btree_trans bump allocator | Kent Overstreet
With backpointers, alloc keys have gotten bigger, so we need more memory here. We'll probably need something more sophisticated than a bump allocator eventually, but let's see if we can avoid that just yet.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Add a persistent counter for bucket discards | Kent Overstreet
Like the previous patch for bucket invalidates, add another counter for a core allocator path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix btree node read retries | Kent Overstreet
b->written wasn't being reset to 0 in the btree node read retry path, so decryption and validation of previously read bsets weren't re-run - ouch.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Add a persistent counter for bucket invalidation | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Call bch2_do_invalidates() when going read write | Kent Overstreet
Like bch2_do_discards(), we should check if this needs to be done when going rw. Also, add some sysfs code for debugging bucket invalidation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Improved human readable integer parsing | Kent Overstreet
Printbufs recently switched to using string_get_size() for printing integers in human-readable units. This updates __bch2_strtoh() to parse numbers printed by string_get_size(); we now have to handle floating point numbers and new unit suffixes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
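A hypothetical parser for inputs like `1.50 KiB` or `4k`, handling a fractional part and an optional `i`/`B` suffix, might look like this. It assumes every unit letter means a power of 1024, which may not match what __bch2_strtoh() actually does, and it omits overflow checks for brevity.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parse e.g. "1.50 KiB" or "4k" into bytes. Returns 0 on success, -1 on error. */
static int sketch_strtoh(const char *s, uint64_t *res)
{
	static const char units[] = "KMGTPE";   /* assumed: each step is x1024 */
	uint64_t whole = 0, frac = 0, frac_div = 1, mult = 1;
	const char *p = s;

	if (!isdigit((unsigned char)*p))
		return -1;
	while (isdigit((unsigned char)*p))
		whole = whole * 10 + (uint64_t)(*p++ - '0');

	if (*p == '.') {                         /* optional fractional part */
		p++;
		while (isdigit((unsigned char)*p)) {
			if (frac_div < 1000000) {        /* keep at most 6 digits */
				frac = frac * 10 + (uint64_t)(*p - '0');
				frac_div *= 10;
			}
			p++;
		}
	}

	if (*p == ' ')                           /* optional space before the unit */
		p++;

	if (*p && *p != 'B') {
		const char *u = strchr(units, toupper((unsigned char)*p));

		if (!u)
			return -1;
		for (ptrdiff_t i = 0; i <= u - units; i++)
			mult *= 1024;
		p++;
		if (*p == 'i')                   /* binary-prefix form, e.g. "KiB" */
			p++;
	}
	if (*p == 'B')
		p++;
	if (*p)
		return -1;                       /* trailing garbage */

	/* Overflow checking omitted for brevity. */
	*res = whole * mult + frac * mult / frac_div;
	return 0;
}
```

Keeping the fraction as an integer numerator/denominator pair avoids floating point, which kernel code generally can't use.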
2023-10-22 | bcachefs: Fix freespace initialization | Kent Overstreet
bch2_dev_freespace_init() was using __bch2_trans_do() incorrectly, and calling bch2_bucket_do_index() with a stale alloc key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Printbuf rework | Kent Overstreet
This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Fix btree node read error path | Kent Overstreet
We were forgetting to clear the read_in_flight flag - oops. This also stops it from calling bch2_fatal_error() before topology repair has had a chance to do its thing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix btree_and_journal_iter | Kent Overstreet
We had a bug where btree_and_journal_iter would return the same key twice after deleting it (perhaps because it was present in both the btree and the journal?). This reworks btree_and_journal_iter to track the current position, much like btree_paths, which makes the logic considerably simpler and more robust.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
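The "track the current position" approach can be sketched as a merge of two sorted sources in which the journal wins ties and a deletion advances the position past both copies, so a key present in both sources is never returned twice. The key type and iterator below are hypothetical simplifications, not the bcachefs structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical key: a position and whether it is a deletion. */
struct sketch_key {
	int pos;
	bool deleted;
};

struct sketch_merge_iter {
	const struct sketch_key *btree, *journal;
	size_t btree_nr, journal_nr;
	size_t b, j;   /* next unconsumed index in each source */
	int pos;       /* current position: the next key returned is >= pos */
};

/* Return the next live key at or after it->pos, or NULL at the end. */
static const struct sketch_key *sketch_merge_peek(struct sketch_merge_iter *it)
{
	for (;;) {
		/* Skip anything in either source that is behind the current position. */
		while (it->b < it->btree_nr && it->btree[it->b].pos < it->pos)
			it->b++;
		while (it->j < it->journal_nr && it->journal[it->j].pos < it->pos)
			it->j++;

		const struct sketch_key *bk = it->b < it->btree_nr ? &it->btree[it->b] : NULL;
		const struct sketch_key *jk = it->j < it->journal_nr ? &it->journal[it->j] : NULL;
		const struct sketch_key *k;

		if (!bk && !jk)
			return NULL;
		/* The journal is newer, so it wins when both have the same position. */
		k = (jk && (!bk || jk->pos <= bk->pos)) ? jk : bk;

		if (!k->deleted)
			return k;
		it->pos = k->pos + 1;   /* deleted: move past this position in both sources */
	}
}

/* Advance past the key just returned, in both sources at once. */
static void sketch_merge_advance(struct sketch_merge_iter *it, const struct sketch_key *k)
{
	it->pos = k->pos + 1;
}
```

Because advancing is expressed as moving one shared position forward, the two source cursors can't get out of step with each other.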
2023-10-22 | bcachefs: Fix for cmd_list_journal | Kent Overstreet
cmd_list_journal wasn't correctly listing the most recent journal entries as blacklisted, because in the recovery path, when only reading the journal, we were failing to add them to the blacklist table.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Also log overwrites in journal | Kent Overstreet
Lately we've been doing a lot of debugging by looking at the journal to see what was changed, and by what code path. This patch adds a new journal entry type for recording overwrites, so that we don't have to search backwards through the journal to see what was being overwritten in order to work out what the triggers were supposed to be doing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Refactor journal entry adding | Kent Overstreet
This moves copying the payload out of bch2_journal_add_entry(), so that we can use it for journal_transaction_name(). This is also prep work for journalling overwrites.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Add some missing error messages | Kent Overstreet
bch2_opt_parse() was failing to generate error messages in its error paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix memory corruption in encryption path | Kent Overstreet
When do_encrypt() was passed a vmalloc address and the buffer spanned more than a single page, we were encrypting/decrypting completely different pages than the ones intended.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
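The fix implies handling the buffer one page at a time, since consecutive vmalloc addresses need not map to consecutive physical pages. The sketch below only computes page-bounded segments; in the kernel each segment's page would be looked up separately (for example with vmalloc_to_page()). The names and the 4096-byte page size are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed page size */

struct sketch_seg {
	uintptr_t addr;   /* start of this segment */
	size_t len;       /* bytes in this segment; never crosses a page boundary */
};

/* Split [addr, addr + len) into per-page segments; returns the number written. */
static size_t sketch_split_by_page(uintptr_t addr, size_t len,
				   struct sketch_seg *segs, size_t max_segs)
{
	size_t n = 0;

	while (len && n < max_segs) {
		size_t offset = addr & (SKETCH_PAGE_SIZE - 1);
		size_t chunk = SKETCH_PAGE_SIZE - offset;   /* room left in this page */

		if (chunk > len)
			chunk = len;
		segs[n].addr = addr;
		segs[n].len = chunk;
		n++;
		addr += chunk;
		len -= chunk;
	}
	return n;
}
```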
2023-10-22 | bcachefs: bch2_trans_reset_updates() | Kent Overstreet
Factor out a new helper.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix error checking in bch2_fs_alloc() | Kent Overstreet
One of the init calls was followed by a ; instead of a ?:, so errors from the calls after it got dropped - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
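For readers unfamiliar with the GNU C `x ?: y` operator (it yields `x` if `x` is nonzero, otherwise `y`, evaluating `x` once), this hypothetical sketch shows how a stray `;` silently drops errors from a chain of init calls. It requires GCC or Clang; the init functions are invented for illustration.

```c
/*
 * GNU C: "x ?: y" evaluates to x if x is nonzero, otherwise to y,
 * and evaluates x only once. Requires GCC or Clang.
 */
static int calls;   /* counts how many init steps actually ran */

/* Hypothetical init steps: 0 on success, negative error code on failure. */
static int sketch_init_a(void) { calls++; return 0; }
static int sketch_init_b(void) { calls++; return 0; }
static int sketch_init_c(void) { calls++; return -12; /* e.g. -ENOMEM */ }

static int sketch_fs_alloc_fixed(void)
{
	/* Stops at the first nonzero result and returns it. */
	return sketch_init_a() ?:
	       sketch_init_b() ?:
	       sketch_init_c();
}

static int sketch_fs_alloc_buggy(void)
{
	int ret = sketch_init_a() ?:
		  sketch_init_b();   /* a ';' ended the chain here... */
	sketch_init_c();             /* ...so this step still runs, but its error is lost */
	return ret;
}
```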
2023-10-22 | bcachefs: Print message on btree node read retry success | Kent Overstreet
Right now, we print an error message on btree node read error, and we print that we're retrying, but we don't explicitly say if the retry succeeded - this makes things a little clearer.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix journal_keys_search() overhead | Kent Overstreet
Previously, on every btree_iter_peek() operation we were searching the journal keys, doing a full binary search - which was slow. This patch fixes that by saving our position in the journal keys, so that we only do a full binary search when moving our position backwards or a large jump forwards.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
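A sketch of the saved-position idea over a hypothetical sorted int array. The threshold `SKETCH_SCAN_MAX` for what counts as a "large jump forwards" is an assumption, as are all the names; the real code searches journal keys, not ints.

```c
#include <stddef.h>

/* Hypothetical sorted keys plus a saved search position. */
struct sketch_keys {
	const int *k;
	size_t nr;
	size_t pos;   /* result of the last search; hint for the next one */
};

#define SKETCH_SCAN_MAX 8   /* assumed limit before falling back to bsearch */

/* Full binary search: index of the first key >= search. */
static size_t sketch_bsearch(const struct sketch_keys *keys, int search)
{
	size_t l = 0, r = keys->nr;

	while (l < r) {
		size_t m = l + (r - l) / 2;

		if (keys->k[m] < search)
			l = m + 1;
		else
			r = m;
	}
	return l;
}

/* Index of the first key >= search, reusing the saved position when possible. */
static size_t sketch_keys_search(struct sketch_keys *keys, int search)
{
	size_t i = keys->pos, steps = 0;

	/* Moving backwards: the saved position doesn't help. */
	if (i > keys->nr || (i > 0 && keys->k[i - 1] >= search))
		return keys->pos = sketch_bsearch(keys, search);

	/* Everything before i is < search, so scan forward from i. */
	while (i < keys->nr && keys->k[i] < search) {
		if (++steps > SKETCH_SCAN_MAX)   /* large jump forwards */
			return keys->pos = sketch_bsearch(keys, search);
		i++;
	}
	return keys->pos = i;
}
```

Iterators mostly move forward in small steps, so the common case costs a comparison or two instead of a full binary search.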
2023-10-22 | bcachefs: Always print when doing journal replay in fsck | Kent Overstreet
This logging improvement helps see when the previous fsck pass has completed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Rename group to label for remaining strings. | Daniel Hill
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Fix encryption path on arm | Kent Overstreet
flush_dcache_page() is not a noop on arm, but we were using virt_to_page() instead of vmalloc_to_page() for an address on the kernel stack, which is vmalloc memory, leading to an oops in flush_dcache_page().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Switch to key_type_user, not logon | Kent Overstreet
The only difference between key_type_logon and key_type_user is that key_type_logon keys can't be read by userspace. However, userspace has actually been adding keys to both the logon and user keychains, because userspace fsck requires the keychain interface, so we might as well just use user and drop the logon keychain.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: LRU repair tweaks | Kent Overstreet
- Drop an old, unneeded parameter for whether we're in initial GC, left over from when btree updates had to be done differently before we went RW.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Delete bch_writepage | Kent Overstreet
Per Dave Chinner and the xfs folks, .writepage is no longer needed, and it's better not to define it if .writepages is the intended path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Make bch_option compatible with Rust ffi | Brett Holman
Rust FFI lacks support for unnamed structs and unions. The space they saved in bch_option is not significant.
Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Put btree_trans_verify_sorted() behind debug_check_iterators | Kent Overstreet
This is pretty expensive, and we've tested sufficiently with it now that it doesn't need to be on by default.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix extent merging | Kent Overstreet
When merging extents, we have to check that we won't overflow size fields in any CRC entries, but the check for this was wrong: the loop it was in wasn't keeping a pointer to the (packed, encoded) CRC field. Fix this by moving the check to its own loop.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>