summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-22bcachefs: Convert .key_invalid methods to printbufsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Gap buffer for journal keysKent Overstreet
Btree updates before we go RW work by inserting into the array of keys that journal replay will insert - but inserting into a flat array is O(n), meaning if btree_gc needs to update many alloc keys, we're O(n^2). Fortunately, the updates btree_gc does happens in sequential order, which means a gap buffer works nicely here - this patch implements a gap buffer for journal keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't normalize to pages in btree cache shrinkerKent Overstreet
This behavior dates from the early, early days of bcache, and upon further delving appears to not make any sense. The shrinker only works in terms of 'objects' of unknown size; normalizing to pages only had the effect of changing the batch size, which we could do directly - if we wanted; we probably don't. Normalizing to pages meant our batch size was very small, which seems to have been keeping us from doing as much shrinking as we should be under heavy memory pressure; this patch appears to alleviate some OOMs we've been seeing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add a tracepoint for superblock writesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: gc mark fn fixes, cleanupsKent Overstreet
mark_stripe_bucket() was busted; it was using @new unitialized. Also, clean up all the gc mark functions, and convert them to the same style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't write partially-initialized superblocksKent Overstreet
This neatly avoids bugs where we fail partway through initializing a new filesystem, if we just don't write out partly-initialized state. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve read_from_stale_dirty_pointer() messageKent Overstreet
With printbufs, it's now easy to build up multi-line log messages and emit them with one call, which is good because it prevents multiple multi-line log messages from getting Interspersed in the log buffer; this patch also improves the formatting and converts it to latest style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use crc_is_compressed()Kent Overstreet
Trivial cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix pr_buf() callsKent Overstreet
In a few places we were passing a variable to pr_buf() for the format string - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill struct bucket_markKent Overstreet
This switches struct bucket to using a lock, instead of cmpxchg. And now that the protected members no longer need to fit into a u64, we can expand the sector counts to 32 bits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill main in-memory bucket arrayKent Overstreet
All code using the in-memory bucket array, excluding GC, has now been converted to use the alloc btree directly - so we can finally delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_dev_usage_update() no longer depends on bucket_markKent Overstreet
This is one of the last steps in getting rid of the main in-memory bucket array. This changes bch2_dev_usage_update() to take bkey_alloc_unpacked instead of bucket_mark, and for the places where we are in fact working with bucket_mark and don't have bkey_alloc_unpacked, we add a wrapper that takes bucket_mark and converts to bkey_alloc_unpacked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fsck for need_discard & freespace btreesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New bucket invalidate pathKent Overstreet
In the old allocator code, preparing an existing empty bucket was part of the same code path that invalidated buckets containing cached data. In the new allocator code this is no longer the case: the main allocator path finds empty buckets (via the new freespace btree), and can't allocate buckets that contain cached data. We now need a separate code path to invalidate buckets containing cached data when we're low on empty buckets, which this patch implements. When the number of free buckets decreases that triggers the new invalidate path to run, which uses the LRU btree to pick cached data buckets to invalidate until we're above our watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New discard implementationKent Overstreet
In the old allocator code, buckets would be discarded just prior to being used - this made sense in bcache where we were discarding buckets just after invalidating the cached data they contain, but in a filesystem where we typically have more free space we want to be discarding buckets when they become empty. This patch implements the new behaviour - it checks the need_discard btree for buckets awaiting discards, and then clears the appropriate bit in the alloc btree, which moves the buckets to the freespace btree. Additionally, discards are now enabled by default. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill allocator threads & freelistsKent Overstreet
Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Freespace, need_discard btreesKent Overstreet
This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: KEY_TYPE_alloc_v4Kent Overstreet
This introduces a new alloc key which doesn't use varints. Soon we'll be adding backpointers and storing them in alloc keys, which means our pack/unpack workflow for alloc keys won't really work - we'll need to be mutating alloc keys in place. Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that converts older types of alloc keys to v4 if needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: LRU btreeKent Overstreet
This implements new persistent LRUs, to be used for buckets containing cached data, as well as stripes ordered by time when a block became empty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: KEY_TYPE_setKent Overstreet
A new empty key type, to be used when using a btree as a set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch_sb_field_journal_v2Kent Overstreet
Add a new superblock field which represents journal buckets as ranges: also move code for the superblock journal fields to journal_sb.c. This also reworks the code for resizing the journal to write the new superblock before using the new journal buckets, and thus be a bit safer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Run btree updates after write out of write_pointKent Overstreet
In the write path, after the write to the block device(s) complete we have to punt to process context to do the btree update. Instead of using the work item embedded in op->cl, this patch switches to a per write-point work item. This helps with two different issues: - lock contention: btree updates to the same writepoint will (usually) be updating the same alloc keys - context switch overhead: when we're bottlenecked on btree updates, having a thread (running out of a work item) checking the write point for completed ops is cheaper than queueing up a new work item and waking up a kworker. In an arbitrary benchmark, 4k random writes with fio running inside a VM, this patch resulted in a 10% improvement in total iops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_btree_update_start() refactoringKent Overstreet
This simplifies the logic in bch2_btree_update_start() a bit, handling the unlock/block logic more locally. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Introduce a separate journal watermark for copygcKent Overstreet
Since journal reclaim -> btree key cache flushing may require the allocation of new btree nodes, it has an implicit dependency on copygc in order to make forward progress - so we should avoid blocking copygc unless the journal is really close to full. This introduces watermarks to replace our single MAY_GET_UNRESERVED bit in the journal, and adds a watermark for copygc and plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Copygc allocations shouldn't be nowaitKent Overstreet
We don't actually want copygc allocations to be nowait - an allocation for copygc might fail and then later succeed due to a bucket needing to wait on journal commit, or to be discarded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix bch2_journal_pin_set()Kent Overstreet
When bch2_journal_pin_set() is updating an existing pin, we shouldn't call bch2_journal_reclaim_fast() after dropping the old pin and before dropping the new pin - that could reclaim the entry we're trying to pin. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: x-macroize alloc_reserve enumKent Overstreet
This makes an array of strings available, like our other enums. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Run overwrite triggers before insertKent Overstreet
For backpointers, we'll need to delete old backpointers before adding new backpointers - otherwise we'll run into spurious duplicate backpointer errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Move deletion of refcount=0 indirect extents to their triggersKent Overstreet
For backpointers, we need to switch the order triggers are run in: we need to run triggers for deletions/overwrites before triggers for inserts. To avoid breaking the reflink triggers, this patch moves deleting of indirect extents with refcount=0 to their triggers, instead of doing it when we update those keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve bch2_bkey_ptrs_to_text()Kent Overstreet
Print bucket:offset when the filesystem is online; this makes debugging easier when correlating with alloc updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_trans_log_msg()Kent Overstreet
Add a new helper for logging messages to the journal - a new debugging tool, an alternative to trace_printk(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use darray for extra_journal_entriesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: btree_path_make_mut() clears should_be_lockedKent Overstreet
This fixes a bug where __bch2_btree_node_update_key() wasn't clearing should_be_locked, leading to bch2_btree_path_traverse() always failing - all callers of btree_path_make_mut() want should_be_locked cleared, so do it there. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add a missing btree_path_set_dirty() callsKent Overstreet
bch2_btree_iter_next_node() was mucking with other btree_path state without setting path->update to be consistent with the fact that the path is very much no longer uptodate - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix error path in bch2_snapshot_set_equiv()Kent Overstreet
We weren't properly catching errors from snapshot_live() - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Work around a journal self-deadlockKent Overstreet
bch2_journal_space_available -> bch2_journal_halt() self deadlocks on journal lock; work around this by dropping/retaking journal lock before we call bch2_fatal_error(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Heap code fixKent Overstreet
When deleting an entry from a heap that was at entry h->used - 1, we'd end up calling heap_sift() on an entry outside the heap - the entry we just removed - which would end up re-adding it to the heap and deleting something we didn't want to delete. Oops... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix an unitialized var warning in userspaceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add printf format attribute to bch2_pr_buf()Kent Overstreet
This tells the compiler to check printf format strings, and catches a few bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Reset journal flush delay to default value if zeroedKent Overstreet
We've been seeing a very strange bug where journal flush & reclaim delay end up getting inexplicably zeroed, in the superblock. We're now validating all the options in bch2_validate_super(), and 0 is no longer a valid value for those options, but we need to be careful not to prevent people's filesystems from mounting because of the new validation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Change journal_io.c assertion to error messageKent Overstreet
Something funny is going on with the new code for restoring the journal write point, and it's hard to reproduce. We do want to debug this because resuming writing to the journal in the wrong spot could be something serious. For now, replace the assertion with an error message and revert to old behaviour when it happens. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Make minimum journal_flush_delay nonzeroKent Overstreet
We're seeing a very strange bug where journal_flush_delay sometimes gets set to 0 in the superblock. Together with the preceding patch, this should help us track it down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Better superblock opt validationKent Overstreet
This moves validation of superblock options to bch2_sb_validate(), so they'll be checked in the write path as well. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: x-macro metadata version enumKent Overstreet
Now we've got strings for metadata versions - this changes bch2_sb_to_text() and our mount log message to use it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix large key cache keysKent Overstreet
Previously, we'd go into an infinite loop when attempting to cache a bkey in the key cache larger than 128 u64s - since we were only using a u8 for the size field, it'd get rounded up to 256 then truncated to 0. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Convert some WARN_ONs to WARN_ON_ONCEKent Overstreet
These warnings are symptomatic of something else going wrong, we don't want them spamming up the logs as that'll make it harder to find the real issue. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Restore journal write point at startupKent Overstreet
This patch tweaks the journal recovery path so that we start writing right after where we left off, instead of the next empty bucket. This is partly prep work for supporting zoned devices, but it's also good to do in general to avoid the journal completely filling up and getting stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: darraysKent Overstreet
Inspired by CCAN darray - simple, stupid resizable (dynamic) arrays. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix BTREE_TRIGGER_WANTS_OLD_AND_NEWKent Overstreet
BTREE_TRIGGER_WANTS_OLD_AND_NEW didn't work correctly when the old and new key were both alloc keys, but different versions - it required old and new key type to be identical, and this bug is a problem for the new allocator rewrite. This patch fixes it by checking if the old and new key have the same trigger functions - the different versions of alloc (and inode) keys have the same trigger functions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Move trigger fns to bkey_opsKent Overstreet
This replaces the switch statements in bch2_mark_key(), bch2_trans_mark_key() with new bkey methods - prep work for the next patch, which fixes BTREE_TRIGGER_WANTS_OLD_AND_NEW. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>