summaryrefslogtreecommitdiff
path: root/fs/bcachefs/alloc_background.h
AgeCommit message (Collapse)Author
2024-12-21bcachefs: Don't recurse in check_discard_freespace_keyKent Overstreet
When calling check_discard_freeespace_key from the allocator, we can't repair without recursing - run it asynchronously instead. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: struct bkey_validate_contextKent Overstreet
Add a new parameter to bkey validate functions, and use it to improve invalid bkey error messages: we can now print the btree and depth it came from, or if it came from the journal, or is a btree root. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: try_alloc_bucket() now uses bch2_check_discard_freespace_key()Kent Overstreet
check_discard_freespace_key() was doing all the same checks as try_alloc_bucket(), but with repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-24bcachefs: fix shift oob in alloc_lru_idx_fragmentationJeongjun Park
The size of a.data_type is set abnormally large, causing shift-out-of-bounds. To fix this, we need to add validation on a.data_type in alloc_lru_idx_fragmentation(). Reported-by: syzbot+7f45fa9805c40db3f108@syzkaller.appspotmail.com Fixes: 260af1562ec1 ("bcachefs: Kill alloc_v4.fragmentation_lru") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21bcachefs: bch2_dev_rcu_noerror()Kent Overstreet
bch2_dev_rcu() now properly errors if the device is invalid Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21bcachefs: bch2_dev_remove_alloc() -> alloc_background.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-16bcachefs: avoid overflowing LRU_TIME_BITS for cached data lruKent Overstreet
Reported-by: syzbot+510b0b28f8e6de64d307@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()Kent Overstreet
bkey_fsck_err() was added as an interface that looks like fsck_err(), but previously all it did was ensure that the appropriate error counter was incremented in the superblock. This is a cleanup and bugfix patch that converts it to a wrapper around fsck_err(). This is needed to fix an issue with the upgrade path to disk_accounting_v3, where the "silent fix" error list now includes bkey_fsck errors; fsck_err() handles this in a unified way, and since we need to change printing of bkey fsck errors from the caller to the inner bkey_fsck_err() calls, this ends up being a pretty big change. Als,, rename .invalid() methods to .validate(), for clarity, while we're changing the function signature anyways (to drop the printbuf argument). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-07bcachefs: Add a comment for bucket helper typesKent Overstreet
We've had bugs in the past with incorrect integer conversions in disk accounting code, which is why bucket helpers now always return s64s; add a comment explaining this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-07bcachefs: Don't rely on implicit unsigned -> signed integer conversionKent Overstreet
implicit integer conversion is a fertile source of bugs, and we really would rather not have the min()/max() macros doing it implicitly. bcachefs appears to be the only place in the kernel where this happens, so let's fix it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Fix missing BTREE_TRIGGER_bucket_invalidate flagKent Overstreet
This fixes an accounting mismatch for cached data. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Disk space accounting rewriteKent Overstreet
Main part of the disk accounting rewrite. This is a wholesale rewrite of the existing disk space accounting, which relies on percepu counters that are sharded by journal buffer, and rolled up and added to each journal write. With the new scheme, every set of counters is a distinct key in the accounting btree; this fixes scaling limitations of the old scheme, where counters took up space in each journal entry and required multiple percpu counters. Now, in memory accounting requires a single set of percpu counters - not multiple for each in flight journal buffer - and in the future we'll probably also have counters that don't use in memory percpu counters, they're not strictly required. An accounting update is now a normal btree update, using the btree write buffer path. At transaction commit time, we apply accounting updates to the in memory counters, which are percpu counters indexed in an eytzinger tree by the accounting key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: BCH_DATA_unstripedKent Overstreet
Add a new pseudo data type, to track buckets that are members of a stripe, but have unstriped data in them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: bch_alloc->stripe_sectorsKent Overstreet
Add a separate counter to bch_alloc_v4 for amount of striped data; this lets us separately track striped and unstriped data in a bucket, which lets us see when erasure coding has failed to update extents with stripe pointers, and also find buckets to continue updating if we crash mid way through creating a new stripe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-25bcachefs: Discard, invalidate workers are now per deviceKent Overstreet
There's no reason for discards to be single threaded across all devices; this will improve performance on multi device setups. Additionally, making them per-device simplifies the refcounting on bch_dev->io_ref; we now hold it for the duration that the discard path is running, which fixes a race between the discard path and device removal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Guard against overflowing LRU_TIME_BITSKent Overstreet
LRUs only have 48 bits for the time field (i.e. LRU order); thus we need overflow checks and guards. Reported-by: syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: s/bkey_invalid_flags/bch_validate_flagsKent Overstreet
We're about to start using bch_validate_flags for superblock section validation - it's no longer bkey specific. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_dev_bucket_exists() uses bch2_dev_rcu()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: simplify bch2_trans_start_alloc_update()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: __mark_pointer now takes bch_alloc_v4Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_usage_update_m()Kent Overstreet
by using bucket_m_to_alloc() more, we can get some nice code cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: alloc_data_type_set()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Fix type of flags parameter for some ->trigger() implementationsNathan Chancellor
When building with clang's -Wincompatible-function-pointer-types-strict (a warning designed to catch potential kCFI failures at build time), there are several warnings along the lines of: fs/bcachefs/bkey_methods.c:118:2: error: incompatible function pointer types initializing 'int (*)(struct btree_trans *, enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, enum btree_iter_update_trigger_flags)' with an expression of type 'int (struct btree_trans *, enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, unsigned int)' [-Werror,-Wincompatible-function-pointer-types-strict] 118 | BCH_BKEY_TYPES() | ^~~~~~~~~~~~~~~~ fs/bcachefs/bcachefs_format.h:394:2: note: expanded from macro 'BCH_BKEY_TYPES' 394 | x(inode, 8) \ | ^~~~~~~~~~~~~~~~~~~~~~~~~~ fs/bcachefs/bkey_methods.c:117:41: note: expanded from macro 'x' 117 | #define x(name, nr) [KEY_TYPE_##name] = bch2_bkey_ops_##name, | ^~~~~~~~~~~~~~~~~~~~ <scratch space>:277:1: note: expanded from here 277 | bch2_bkey_ops_inode | ^~~~~~~~~~~~~~~~~~~ fs/bcachefs/inode.h:26:13: note: expanded from macro 'bch2_bkey_ops_inode' 26 | .trigger = bch2_trigger_inode, \ | ^~~~~~~~~~~~~~~~~~ There are several functions that did not have their flags parameter converted to 'enum btree_iter_update_trigger_flags' in the recent unification, which will cause kCFI failures at runtime because the types, while ABI compatible (hence no warning from the non-strict version of this warning), do not match exactly. Fix up these functions (as well as a few other obvious functions that should have it, even if there are no warnings currently) to resolve the warnings and potential kCFI runtime failures. Fixes: 31e4ef3280c8 ("bcachefs: iter/update/trigger/str_hash flag cleanup") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bucket_data_type_mismatch()Kent Overstreet
We're working on potentially unifying bch2_check_bucket_ref() and bch2_check_fix_ptrs() - or at least eliminating gratuitious differences. Most immediately, there's a bunch of cleanups to be done regarding BCH_DATA_stripe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: member helper cleanupsKent Overstreet
Some renaming for better consistency bch2_member_exists -> bch2_member_alive bch2_dev_exists -> bch2_member_exists bch2_dev_exsits2 -> bch2_dev_exists bch_dev_locked -> bch2_dev_locked bch_dev_bkey_exists -> bch2_dev_bkey_exists new helper - bch2_dev_safe Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bucket_valid()Kent Overstreet
cut out a branch from doing it the obvious way Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06bcachefs: Fix assert in bch2_alloc_v4_invalid()Kent Overstreet
Reported-by: syzbot+10827fa6b176e1acf1d0@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Split out discard fastpathKent Overstreet
Buckets usually can't be discarded until the transaction that made them empty has been committed in the journal. Tracing has indicated that we're queuing the discard worker excessively, only for it to skip over many buckets that are still waiting on a journal commit, discarding only one or two buckets per iteration. We want to switch to only queuing the discard worker after a journal flush write, but there's an important optimization we need to preserve: if a bucket becomes empty and it was never committed in the journal while it was in use, we want to discard it and reuse it right away - since overwriting it before the previous writes are flushed from the device cache eans those writes only cost bus bandwidth. So, this patch implements a fast path for buckets that can be discarded right away. We need new locking between the two discard workers; the new list of buckets being discarded provides that locking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Combine .trans_trigger, .atomic_triggerKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: unify alloc triggerKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: move bch2_mark_alloc() to alloc_background.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: trans_mark now takes bkey_sKent Overstreet
Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: New bucket sector count helpersKent Overstreet
This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(), prep work for separately accounting stripe sectors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04bcachefs: Ensure copygc does not spinKent Overstreet
If copygc does no work - finds no fragmented buckets - wait for a bit of IO to happen. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: Enumerate fsck errorsKent Overstreet
This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Correctly initialize new buckets on device resizeKent Overstreet
bch2_dev_resize() was never updated for the allocator rewrite with persistent freelists, and it wasn't noticed because the tests weren't running fsck - oops. Fix this by running bch2_dev_freespace_init() for the new buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill bch2_bucket_gens_read()Kent Overstreet
This folds bch2_bucket_gens_read() into bch2_alloc_read(), doing the version check there. This is prep work for enumarating all recovery passes: we need some cleanup first to make calling all the recovery passes consistent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Change check for invalid key typesKent Overstreet
As part of the forward compatibility patch series, we need to allow for new key types without complaining loudly when running an old version. This patch changes the flags parameter of bkey_invalid to an enum, and adds a new flag to indicate we're being called from the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rename enum alloc_reserve -> bch_watermarkKent Overstreet
This is prep work for consolidating with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bkey_ops.min_val_sizeKent Overstreet
This adds a new field to bkey_ops for the minimum size of the value, which standardizes that check and also enforces the new rule (previously done somewhat ad-hoc) that we can extend value types by adding new fields on to the end. To make that work we do _not_ initialize min_val_size with sizeof, instead we initialize it to the size of the first version of those values. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: RESERVE_stripeKent Overstreet
Rework stripe creation path - new algorithm for deciding when to create new stripes or reuse existing stripes. We add a new allocation watermark, RESERVE_stripe, above RESERVE_none. Then we always try to create a new stripe by doing RESERVE_stripe allocations; if this fails, we reuse an existing stripe and allocate buckets for it with the reserve watermark for the given write (RESERVE_none or RESERVE_movinggc). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Mark stripe buckets with correct data typeKent Overstreet
Currently, we don't use bucket data type for tracking whether buckets are part of a stripe; parity buckets are BCH_DATA_parity, but data buckets in a stripe are BCH_DATA_user. There's a separate counter, buckets_ec, outside the BCH_DATA_TYPES system for tracking number of buckets on a device that are part of a stripe. The trouble with this approach is that it's too coarse grained, and we need better information on fragmentation for debugging copygc. With this patch, data buckets in a stripe are now tracked as BCH_DATA_stripe buckets. This doesn't yet differentiate between erasure coded and non-erasure coded data in a stripe bucket, nor do we yet track empty data buckets in stripes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fragmentation LRUKent Overstreet
Now that we have much more efficient updates to the LRU btree, this patch adds a new LRU that indexes buckets by fragmentation. This means copygc no longer has to scan every bucket to find buckets that need to be evacuated. Changes: - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to the bucket's position in the fragmentation LRU. We add a new field for this instead of calculating it as needed because we may make the fragmentation LRU optional; this field indicates whether a bucket is on the fragmentation LRU. Also, zoned devices will introduce variable bucket sizes; explicitly recording the LRU position will be safer for them. - A new copygc path for using the fragmentation LRU instead of scanning every bucket and building up an in-memory heap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Change bkey_invalid() rw param to flagsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improved nocow lockingKent Overstreet
This improves the nocow lock table so that hash table entries have multiple locks, and locks specify which bucket they're for - i.e. we can now resolve hash collisions. This is important because the allocator has to skip buckets that are locked in the nocow lock table, and previously hash collisions would cause it to spuriously skip unlocked buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bucket_gens btreeKent Overstreet
To improve mount times, add a btree for just bucket gens, 256 of them per key: this means we'll have to scan drastically less metadata at startup. This adds - trigger for keeping it in sync with the all btree - initialization code, for filesystems from previous versions - new path for reading bucket gens - new fsck code And a new on disk format version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New on disk format: BackpointersKent Overstreet
This patch adds backpointers: we now have a reverse index from device and offset on that device (specifically, offset within a bucket) back to btree nodes and (non cached) data extents. The first 40 backpointers within a bucket are stored in the alloc key; after that backpointers spill over to the next backpointers btree. This is to help avoid performance regressions from additional btree updates on large streaming workloads. This patch adds all the code for creating, checking and repairing backpointers. The next patch in the series is going to use backpointers for copygc - finally getting rid of the need to scan all extents to do copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Better inlining for bch2_alloc_to_v4_mutKent Overstreet
This separates out the slowpath into a separate function, and inlines bch2_alloc_v4_mut into bch2_trans_start_alloc_update(), the main place it's called. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: More style fixesKent Overstreet
Fixes for various checkpatch errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix should_invalidate_buckets()Kent Overstreet
Like bch2_copygc_wait_amount, should_invalidate_buckets() needs to try to ensure that there are always more buckets free than the largest reserve. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>