linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-12-21	bcachefs: fix O(n^2) issue with whiteouts in journal keys	Kent Overstreet
	The journal_keys array can't be substantially modified after we go RW, because lookups need to be able to check it locklessly - thus we're limited on what we can do when a key in the journal has been overwritten. This is a problem when there's many overwrites to skip over for peek() operations. To fix this, add tracking of ranges of overwrites: we create a range entry when there's more than one contiguous whiteout. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: btree_and_journal_iter: don't iterate over too many whiteouts when ↵	Kent Overstreet
	prefetching To help ameloriate issues with peek operations having to skip over deletions in the journal - just bail out if all we're doing is prefetching btree nodes. Since btree node prefetching runs every time we iterate to a new node, and has to sequentially scan ahead, this avoids another O(n^2). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: journal keys: sort keys for interior nodes first	Kent Overstreet
	There's an unavoidable issue with btree lookups when we're overlaying journal keys and the journal has many deletions for keys present in the btree - peek operations will have to iterate over all those deletions to find the next live key to return. This is mainly a problem for lookups in interior nodes, if we have to traverse to a leaf. Looking up an insert position in a leaf (for journal replay) doesn't have to find the next live key, but walking down the btree does. So to ameloriate this, change journal key sort ordering so that we replay keys from roots and interior nodes first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: kill bch2_journal_entries_free()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Implement bch2_btree_iter_prev_min()	Kent Overstreet
	A user contributed a filessytem dump, where the dump was actually corrupted (due to being taken while the filesystem was online), but which exposed an interesting bug in fsck - reconstruct_inode(). When itearting in BTREE_ITER_filter_snapshots mode, it's required to give an end position for the iteration and it can't span inode numbers; continuing into the next inode might mean we start seeing keys from a different snapshot tree, that the is_ancestor() checks always filter, thus we're never able to return a key and stop iterating. Backwards iteration never implemented the end position because nothing else needed it - except for reconstuct_inode(). Additionally, backwards iteration is now able to overlay keys from the journal, which will be useful if we ever decide to start doing journal replay in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Rename btree_iter_peek_upto() -> btree_iter_peek_max()	Kent Overstreet
	We'll be introducing btree_iter_peek_prev_min(), so rename for consistency. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14	bcachefs: Coalesce accounting keys before journal replay	Kent Overstreet
	This fixes a performance regression in journal replay; without colaescing accounting keys we have multiple keys at the same position, which means journal_keys_peek_upto() has to skip past many overwritten keys - turning journal replay into an O(n^2) algorithm. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14	bcachefs: Accumulate accounting keys in journal replay	Kent Overstreet
	Until accounting keys hit the btree, they are deltas, not new versions of the existing key; this means we have to teach journal replay to accumulate them. Additionally, the journal doesn't track precisely which entries have been flushed to the btree; it only tracks a range of entries that may possibly still need to be flushed. That means we need to compare accounting keys against the version in the btree and only flush updates that are newer. There's another wrinkle with the write buffer: if the write buffer starts flushing accounting keys before journal replay has finished flushing accounting keys, journal replay will see the version number from the new updates and updates from the journal will be lost. To avoid this, journal replay has to flush accounting keys first, and we'll be adding a flag so that write buffer flush knows to hold accounting keys until then. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08	bcachefs: bch2_journal_keys_dump()	Kent Overstreet
	debug helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03	bcachefs: bch2_shoot_down_journal_keys()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31	bcachefs: Be careful about btree node splits during journal replay	Kent Overstreet
	Don't pick a pivot that's going to be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10	bcachefs: btree node prefetching in check_topology	Kent Overstreet
	btree_and_journal_iter is old code that we want to get rid of, but we're not ready to yet. lack of btree node prefetching is, it turns out, a real performance issue for fsck on spinning rust, so - add it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10	bcachefs: btree_and_journal_iter.trans	Kent Overstreet
	we now always have a btree_trans when using a btree_and_journal_iter; prep work for adding prefetching to btree_and_journal_iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24	bcachefs: Proper refcounting for journal_keys	Kent Overstreet
	The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: btree_journal_iter.c	Kent Overstreet
	Split out a new file from recovery.c for managing the list of keys we read from the journal: before journal replay finishes the btree iterator code needs to be able to iterate over and return keys from the journal as well, so there's a fair bit of code here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>