summaryrefslogtreecommitdiff
path: root/fs/bcachefs/movinggc.c
AgeCommit message (Collapse)Author
2025-03-06bcachefs: copygc now skips non-rw devicesKent Overstreet
There's no point in doing copygc on non-rw devices: the fragmentation doesn't matter if we're not writing to them, and we may not have anywhere to put the data on our other devices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-26bcachefs: Fix trace_copygcKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Rename btree_iter_peek_upto() -> btree_iter_peek_max()Kent Overstreet
We'll be introducing btree_iter_peek_prev_min(), so rename for consistency. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: copygc_enabled, rebalance_enabled now opts.h optionsKent Overstreet
They can now be set at mount time Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04bcachefs: Kill alloc_v4.fragmentation_lruKent Overstreet
The fragmentation_lru field hasn't been needed since we reworked the LRU btrees to use the btree write buffer; previously it was used to resolve collisions, but the revised LRU btree uses the backpointer (the bucket) as part of the key. It should have been deleted at the time of the LRU rework; since it wasn't, that left places for bugs to hide, in check/repair. This fixes LRU fsck on a filesystem image helpfully provided by a user who disappeared before I could get his name for the reported-by. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-24bcachefs: Fix failure to flush moves before sleeping in copygcKent Overstreet
This fixes an apparent deadlock - rebalance would get stuck trying to take nocow locks because they weren't being released by copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Improve copygc_wait_to_text()Kent Overstreet
printing the raw values can occasionally be very useful Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Enable automatic shrinking for rhashtablesKent Overstreet
Since the key cache shrinker walks the rhashtable, a mostly empty rhashtable leads to really nasty reclaim performance issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_trans_unlock() must always be followed by relock() or begin()Kent Overstreet
We're about to add new asserts for btree_trans locking consistency, and part of that requires that aren't using the btree_trans while it's unlocked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet
Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Improve bch2_fatal_error()Kent Overstreet
error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: for_each_member_device() now declares loop iterKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: for_each_btree_key() now declares loop iterKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: for_each_btree_key_upto() -> for_each_btree_key_old_upto()Kent Overstreet
And for_each_btree_key2_upto -> for_each_btree_key_upto Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: darray_for_each() now declares loop iterKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_err_(fn|msg) check if should printKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: copygc shouldn't try moving buckets on errorDaniel Hill
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: copygc should wakeup on shutdown if disabledDaniel Hill
Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: remove dead bch2_evacuate_bucket()Daniel Hill
Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_write_buffer_flush() -> bch2_btree_write_buffer_tryflush()Kent Overstreet
More accurate naming. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Clean up btree write buffer write ref handlingKent Overstreet
__bch2_btree_write_buffer_flush() now assumes a write ref is already held (as called by the transaction commit path); and the wrappers bch2_write_buffer_flush() and flush_sync() take an explicit write ref. This means internally the write buffer code can always use BTREE_INSERT_NOCHECK_RW, instead of in the previous code passing flags around and hoping the NOCHECK_RW flag was always carried around correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: New bucket sector count helpersKent Overstreet
This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(), prep work for separately accounting stripe sectors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Extra kthread_should_stop() calls for copygcKent Overstreet
This fixes a bug where going read-only was taking longer than it should have due to copygc forgetting to check kthread_should_stop() Additionally: fix a missing is_kthread check in bch2_move_ratelimit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04bcachefs: fix odebug warn and lockdep splat due to on-stack rhashtableBrian Foster
Guenter Roeck reports a lockdep splat and DEBUG_OBJECTS_WORK related warning when bch2_copygc_thread() initializes its rhashtable. The lockdep splat relates to a warning print caused by the fact that the rhashtable exists on the stack but is not annotated as so. This is something that could be addressed by INIT_WORK_ONSTACK(), but rhashtable doesn't expose that control and probably isnt worth the churn for just one user. Instead, dynamically allocate the buckets_in_flight structure and avoid the splat that way. Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04bcachefs: Data move path now uses bch2_trans_unlock_long()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04bcachefs: Ensure copygc does not spinKent Overstreet
If copygc does no work - finds no fragmented buckets - wait for a bit of IO to happen. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: move: move_stats refactoringKent Overstreet
data_progress_list is gone - it was redundant with moving_context_list The upcoming rebalance rewrite is going to have it using two different move_stats objects with the same moving_context, depending on whether it's scanning or using the rebalance_work btree - this patch plumbs stats around a bit differently so that will work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: moving_context now owns a btree_transKent Overstreet
btree_trans and moving_context are used together, and having the moving_context owns the transaction object reduces some plumbing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Heap allocate btree_transKent Overstreet
We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix W=12 build errorsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix -Wcompare-distinct-pointer-types in bch2_copygc_get_buckets()Nathan Chancellor
When building bcachefs for 32-bit ARM, there is a warning when using max() to compare an expression involving 'size_t' with an 'unsigned long' literal: fs/bcachefs/movinggc.c:159:21: error: comparison of distinct pointer types ('typeof (16UL) *' (aka 'unsigned long *') and 'typeof (buckets_in_flight->nr / 4) *' (aka 'unsigned int *')) [-Werror,-Wcompare-distinct-pointer-types] 159 | size_t nr_to_get = max(16UL, buckets_in_flight->nr / 4); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/minmax.h:76:19: note: expanded from macro 'max' 76 | #define max(x, y) __careful_cmp(x, y, >) | ^~~~~~~~~~~~~~~~~~~~~~ include/linux/minmax.h:38:24: note: expanded from macro '__careful_cmp' 38 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~~~~~~~ include/linux/minmax.h:28:4: note: expanded from macro '__safe_cmp' 28 | (__typecheck(x, y) && __no_side_effects(x, y)) | ^~~~~~~~~~~~~~~~~ include/linux/minmax.h:22:28: note: expanded from macro '__typecheck' 22 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~ 1 error generated. On 64-bit architectures, size_t is 'unsigned long', so there is no warning when comparing these two expressions. Use max_t(size_t, ...) for this situation, eliminating the warning. Fixes: dd49018737d4 ("bcachefs: Rhashtable based buckets_in_flight for copygc") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Break up io.cKent Overstreet
More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert more code to bch_err_msg()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix for bch2_copygc() spuriously returning -EEXISTKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill BTREE_INSERT_USE_RESERVEKent Overstreet
Now that we have journal watermarks and alloc watermarks unified, BTREE_INSERT_USE_RESERVE is redundant and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill JOURNAL_WATERMARKKent Overstreet
This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards specifying watermarks once in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a missing rhashtable_destroy() callKent Overstreet
Fixes https://lore.kernel.org/linux-bcachefs/784c3e6a-75bd-e6ca-535a-43b3e1daf643@kernel.dk/T/#mbf7caf005f960018eba23b58795d06c06c947411 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rename enum alloc_reserve -> bch_watermarkKent Overstreet
This is prep work for consolidating with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert -ENOENT to private error codesKent Overstreet
As with previous conversions, replace -ENOENT uses with more informative private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_bkey_get_iter() helpersKent Overstreet
Introduce new helpers for a common pattern: bch2_trans_iter_init(); bch2_btree_iter_peek_slot(); - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type - bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Mark bch2_copygc() noinlineKent Overstreet
This works around a "stack from too large" error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill bch2_verify_bucket_evacuated()Kent Overstreet
With backpointers, it's now impossible for bch2_evacuate_bucket() to be completely reliable: it can race with an extent being partially overwritten or split, which needs a new write buffer flush for the backpointer to be seen. This shouldn't be a real issue in practice; the previous patch added a new tracepoint so we'll be able to see more easily if it is. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rhashtable based buckets_in_flight for copygcKent Overstreet
Previously, copygc used a fifo for tracking buckets in flight - this had the disadvantage of being fixed size, since we pass references to elements into the move code. This restructures it to be a hash table and linked list, since with erasure coding we need to be able to pipeline across an arbitrary number of buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improved copygc wait debuggingKent Overstreet
This just adds a line for how long copygc has been waiting to sysfs copygc_wait, helpful for debugging why copygc isn't running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix an assert in copygc thread shutdown pathKent Overstreet
We're not supposed to have nested (locked) btree_trans on the stack: this means copygc shutdown needs to exit our btree_trans before exiting the move_ctxt, which calls bch2_write(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_bucket_is_movable() -> BTREE_ITER_CACHEDKent Overstreet
BTREE_ITER_CACHED should really be the default for cached btrees - this is an easy mistake to make. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improved copygc pipeliningKent Overstreet
This improves copygc pipelining across multiple buckets: we now track each in flight bucket we're evacuating, with separate moving_contexts. This means that whereas previously we had to wait for outstanding moves to complete to ensure we didn't try to evacuate the same bucket twice, we can now just check buckets we want to evacuate against the pending list. This also mean we can run the verify_bucket_evacuated() check without killing pipelining - meaning it can now always be enabled, not just on debug builds. This is going to be important for the upcoming erasure coding work, where moving IOs that are being erasure coded will now skip the initial replication step; instead the IOs will wait on the stripe to complete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Mark stripe buckets with correct data typeKent Overstreet
Currently, we don't use bucket data type for tracking whether buckets are part of a stripe; parity buckets are BCH_DATA_parity, but data buckets in a stripe are BCH_DATA_user. There's a separate counter, buckets_ec, outside the BCH_DATA_TYPES system for tracking number of buckets on a device that are part of a stripe. The trouble with this approach is that it's too coarse grained, and we need better information on fragmentation for debugging copygc. With this patch, data buckets in a stripe are now tracked as BCH_DATA_stripe buckets. This doesn't yet differentiate between erasure coded and non-erasure coded data in a stripe bucket, nor do we yet track empty data buckets in stripes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_copygc_wait_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fragmentation LRUKent Overstreet
Now that we have much more efficient updates to the LRU btree, this patch adds a new LRU that indexes buckets by fragmentation. This means copygc no longer has to scan every bucket to find buckets that need to be evacuated. Changes: - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to the bucket's position in the fragmentation LRU. We add a new field for this instead of calculating it as needed because we may make the fragmentation LRU optional; this field indicates whether a bucket is on the fragmentation LRU. Also, zoned devices will introduce variable bucket sizes; explicitly recording the LRU position will be safer for them. - A new copygc path for using the fragmentation LRU instead of scanning every bucket and building up an in-memory heap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>