summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-22bcachefs: Fix bch2_btree_node_insert_fits()Kent Overstreet
It should be checking for the recently added flag btree_node_needs_rewrite. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Ensure we only allocate one EC bucket per writepointKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a race with BCH_WRITE_SKIP_CLOSURE_PUTKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't let copygc buckets be stolen by other threadsKent Overstreet
And assorted other copygc fixes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Delete unused argumentsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix an error pathKent Overstreet
We were missing a 'goto retry' and continuing on with an error pointer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor replicas codeKent Overstreet
Awhile back the mechanism for garbage collecting unused replicas entries was significantly improved, but some cleanup was missed - this patch does that now. This is also prep work for a patch to account for erasure coded parity blocks separately - we need to consolidate the logic for checking/marking the various replicas entries from one bkey into a single function. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't restrict copygc writes to the same deviceKent Overstreet
This no longer makes any sense, since copygc is now one thread per filesystem, not per device, with a single write point. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add bch2_blk_status_to_str()Kent Overstreet
We define our own BLK_STS_REMOVED, so we need our own to_str helper too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a faulty assertionKent Overstreet
Now that updates to interior nodes are journalled, we shouldn't be checking topology of interior nodes until we've finished replaying updates to that node. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Wrap write path in memalloc_nofs_save()Kent Overstreet
This fixes a lockdep splat where we're allocating memory with vmalloc in the compression bounce path, which doesn't always obey GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add an option for rebuilding the replicas sectionKent Overstreet
There is a bug where we cnan end up clearing the data_has field in the superblock members section, which causes us to skip reading the journal and thus journal replay fails. This option tells the recovery path to not trust those fields. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make copygc thread globalKent Overstreet
Per device copygc threads don't move data to different devices and they make fragmentation works - they don't make much sense anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Drop extra pointers when marking data as in a stripeKent Overstreet
We ideally want the buckets used for the extra initial replicas to be reused right away. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix extent_ptr_durability() calculation for erasure coded dataKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use x-macros for data typesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix short buffered writesKent Overstreet
In the buffered write path, we have to check for short writes that write to the full page, where the page wasn't UpToDate; when this happens, the page is partly garbage, so we have to zero it out and revert that part of the write. This check was wrong - we reverted total from copied, but didn't revert the iov_iter, probably also leading to corrupted writes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Allow existing stripes to be updated with new data bucketsKent Overstreet
This solves internal fragmentation within stripes. We already have copygc, which evacuates buckets that are partially or mostly empty, but it's up to the ec code that manages stripes to deal with stripes that have empty buckets in them. This patch changes the path for creating new stripes to check if there's existing stripes with empty buckets - and if so, update them with new data buckets instead of creating new stripes. TODO: improve the disk space accounting so that we can only use this (more expensive path) when we have too much fragmentation in existing stripes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor stripe creationKent Overstreet
Prep work for the patch to update existing stripes with new data blocks. This moves allocating new stripes into ec.c, and also sets up the data structures so that we can handly only allocating some of the blocks in a stripe. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Move stripe creation to workqueueKent Overstreet
This is mainly to solve a lock ordering issue, and also simplifies the code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve stripe triggers/heap codeKent Overstreet
Soon we'll be able to modify existing stripes - replacing empty blocks with new blocks and new p/q blocks. This patch updates the trigger code to handle pointers changing in an existing stripe; also, it significantly improves how the stripes heap works, which means we can get rid of the stripe creation/deletion lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rework triggers interfaceKent Overstreet
The trigger for stripe keys is shortly going to need both the old and the new key passed to the trigger - this patch does that rework. For now, this just changes the in memory triggers, and this doesn't change how extent triggers work. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill BTREE_TRIGGER_NOOVERWRITESKent Overstreet
This is prep work for reworking the triggers machinery - we have triggers that need to know both the old and the new key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Mark btree nodes as needing rewrite when not all replicas are RWKent Overstreet
This fixes a bug where recovery fails when one of the devices is read only. Also - consolidate the "must rewrite this node to insert it" behind a new btree node flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use blk_status_to_str()Kent Overstreet
Improved error messages are always a good thing Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't cap ios in dio write path at 2 MBKent Overstreet
It appears this was erronious, a different bug was responsible Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor dio write code to reinit bch_write_opKent Overstreet
This fixes a bug where the BCH_WRITE_SKIP_CLOSURE_PUT was set incorrectly, causing the completion to be delivered multiple times. oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix bch2_extent_can_insert() not being calledKent Overstreet
It's supposed to check whether we're splitting a compressed extent and if so get a bigger disk reservation - hence this fixes a "disk usage increased by x without a reservaiton" bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a null ptr deref in bch2_btree_iter_traverse_one()Kent Overstreet
We use sentinal values that aren't NULL to indicate there's a btree node at a higher level; occasionally, this may result in btree_iter_up_until_good_node() stopping at one of those sentinal values. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Track sectors of erasure coded dataKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use btree reserve when appropriateKent Overstreet
Whenever we're doing an update that has pointers, that generally means we need to do the update in order to release open bucket references - so we should be using the btree open bucket reserve. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a kthread_should_stop() check to allocator threadKent Overstreet
Turns out it's possible during shutdown for the allocator to get stuck spinning on bch2_invalidate_buckets() without hitting any of the other checks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Change bch2_dump_bset() to also print key valuesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a deadlock in the RO pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix incorrect gfp checkKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix lock ordering with new btree cache codeKent Overstreet
The code that checks lock ordering was recently changed to go off of the pos of the btree node, rather than the iterator, but the btree cache code didn't update to handle iterators that point to cached bkeys. Oops Also, update various debug code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: delete a slightly faulty assertionKent Overstreet
state lock isn't held at startup Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Increase size of btree node reserveKent Overstreet
Also tweak the allocator to be more aggressive about keeping it full. The recent changes to make updates to interior nodes transactional (and thus generate updates to the alloc btree) all put more stress on the btree node reserves. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Give bkey_cached_key same attributes as bposKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use cached iterators for alloc btreeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Btree key cacheKent Overstreet
This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Implement a new gc that only recalcs oldest genKent Overstreet
Full mark and sweep gc doesn't (yet?) work with the new btree key cache code, but it also blocks updates to interior btree nodes for the duration and isn't really necessary in practice; we aren't currently attempting to repair errors in allocation info at runtime. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Turn c->state_lock into an rwsemKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add an internal option for reading entire journalKent Overstreet
To be used the debug tool that dumps the contents of the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't deadlock when btree node reuse changes lock orderingKent Overstreet
Btree node lock ordering is based on the logical key. However, 'struct btree' may be reused for a different btree node under memory pressure. This patch uses the new six lock callback to check if a btree node is no longer the node we wanted to lock before blocking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a deadlockKent Overstreet
__bch2_btree_node_lock() was incorrectly using iter->pos as a proxy for btree node lock ordering, this caused an off by one error that was triggered by bch2_btree_node_get_sibling() getting the previous node. This refactors the code to compare against btree node keys directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor btree insert pathKent Overstreet
This splits out the journalling code from the btree update code; prep work for the btree key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Always give out journal pre-res if we already have oneKent Overstreet
This is better than skipping the journal pre-reservation if we already have one - we should still acount for the journal reservation we're going to have to get. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: More open bucketsKent Overstreet
We need a larger open bucket reserve now that the btree interior update path holds onto open bucket references; filesystems with many high through devices may need more open buckets now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't allocate memory under the btree cache lockKent Overstreet
The btree cache lock is needed for reclaiming from the btree node cache, and memory allocation can potentially spin and sleep (for 100 ms at a time), so.. don't do that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>