linux.git - Linus' kernel tree

Age	Commit message	Author
2024-01-01	bcachefs: Don't rejournal keys in key cache flush	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Fix userspace bch2_prt_datetime()	Kent Overstreet
	ctime_r() outputs a newline, which we don't want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
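For illustration, here is a minimal userspace sketch of the fix described above: call POSIX ctime_r() and strip the trailing newline it appends. The helper name ctime_no_newline() is hypothetical; this is not the actual bch2_prt_datetime() code.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <time.h>

/*
 * Format @t like ctime_r(), but without the trailing '\n' that
 * ctime_r() appends. @buf must hold at least 26 bytes, per POSIX.
 * Returns @buf on success, NULL on failure.
 */
static char *ctime_no_newline(time_t t, char *buf)
{
	if (!ctime_r(&t, buf))
		return NULL;

	size_t len = strlen(buf);
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';
	return buf;
}
```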
2024-01-01	bcachefs: Kill BTREE_ITER_ALL_LEVELS	Kent Overstreet
	As discussed in the previous patch, BTREE_ITER_ALL_LEVELS appears to be racy with concurrent interior node updates - and perhaps it is fixable, but it's tricky and unnecessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: backpointers fsck no longer uses BTREE_ITER_ALL_LEVELS	Kent Overstreet
	It appears that BTREE_ITER_ALL_LEVELS is racy with concurrent interior node btree updates; unfortunate, but not terribly surprising, since it's a difficult problem (that was the original reason for gc_lock). BTREE_ITER_ALL_LEVELS will probably be deleted in a subsequent patch; this changes backpointers fsck to instead walk keys at one level of the btree at a time. This fixes the tiering_drop_alloc test, which stopped working with the patch to not flush the journal after journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Improve btree_path_downgrade tracepoint	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Rename BTREE_INSERT flags	Kent Overstreet
	BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: bch_str_hash_flags_t	Kent Overstreet
	Create a separate enum for str_hash flags - instead of abusing the btree_insert_flags enum - and create a __bitwise typedef for sparse typechecking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
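A minimal sketch of the sparse __bitwise pattern for a flags typedef. The __CHECKER__ guard follows the usual kernel convention; the type name example_hash_flags_t, the flag macros, and must_create() are hypothetical and are not the real bch_str_hash_flags_t definitions. With a normal compiler the annotations expand to nothing, so this also builds as plain C.

```c
/*
 * Under sparse (__CHECKER__ defined), __bitwise makes the typedef a
 * distinct type, so mixing it with plain integers or with another flags
 * type produces a warning. For a normal compiler it expands to nothing.
 */
#ifdef __CHECKER__
#define __bitwise	__attribute__((bitwise))
#define __force		__attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef unsigned __bitwise example_hash_flags_t;

/* Hypothetical flag values; __force marks the deliberate int -> flags cast. */
#define EXAMPLE_HASH_SET_MUST_CREATE	((__force example_hash_flags_t) (1U << 0))
#define EXAMPLE_HASH_SET_MUST_REPLACE	((__force example_hash_flags_t) (1U << 1))

/* Bitwise ops between two values of the same __bitwise type are allowed. */
static int must_create(example_hash_flags_t flags)
{
	return (flags & EXAMPLE_HASH_SET_MUST_CREATE) != 0;
}
```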
2024-01-01	bcachefs: Kill dead BTREE_INSERT flags	Kent Overstreet
	BTREE_INSERT_NOWAIT and BTREE_INSERT_GC_LOCK_HELD are no longer used, and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Fix redundant variable initialization	Kent Overstreet
	path->level was being read, but never used. Reported-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Avoid dropping/retaking write locks in bch2_btree_write_buffer_flush_one()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Make journal replay more efficient	Kent Overstreet
	Journal replay now first attempts to replay keys in sorted order, similar to how the btree write buffer flush path works. Any keys that cannot be replayed due to journal deadlock are then left for later and replayed in journal order, unpinning journal entries as we go. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
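The two-pass replay order can be sketched as follows. Everything here is a simplified, hypothetical model: struct jkey, its would_block field (a stand-in for "inserting this key now would deadlock on the journal"), and replay_keys() are illustrative names, not bcachefs code, and no real btree insert or journal unpinning takes place.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for a key read from the journal. */
struct jkey {
	unsigned long pos;   /* btree position: sort key for the first pass */
	unsigned long seq;   /* journal sequence number the key came from */
	bool would_block;    /* simulates "can't insert now without journal deadlock" */
	bool replayed;
};

static int cmp_pos(const void *_l, const void *_r)
{
	const struct jkey *l = _l, *r = _r;

	if (l->pos != r->pos)
		return l->pos < r->pos ? -1 : 1;
	return (l->seq > r->seq) - (l->seq < r->seq);
}

static int cmp_seq(const void *_l, const void *_r)
{
	const struct jkey *l = _l, *r = _r;

	if (l->seq != r->seq)
		return l->seq < r->seq ? -1 : 1;
	return (l->pos > r->pos) - (l->pos < r->pos);
}

/*
 * Records the replay order (by pos) in @order; returns how many keys
 * were deferred to the second, journal-order pass.
 */
static size_t replay_keys(struct jkey *keys, size_t nr, unsigned long *order)
{
	size_t n = 0, deferred = 0;

	/* Pass 1: btree order, for locality; skip keys that would block. */
	qsort(keys, nr, sizeof(*keys), cmp_pos);
	for (size_t i = 0; i < nr; i++) {
		if (keys[i].would_block) {
			deferred++;
			continue;
		}
		keys[i].replayed = true;
		order[n++] = keys[i].pos;
	}

	/*
	 * Pass 2: the remaining keys in journal order, so the oldest journal
	 * entries are completed first and could be unpinned as we go.
	 */
	qsort(keys, nr, sizeof(*keys), cmp_seq);
	for (size_t i = 0; i < nr; i++) {
		if (!keys[i].replayed) {
			keys[i].replayed = true;
			order[n++] = keys[i].pos;
		}
	}
	return deferred;
}
```

With keys at positions 30, 10, 20, 5 (journal seqs 1, 3, 2, 4) where 10 and 5 would block, pass 1 replays 20 then 30, and pass 2 replays 10 then 5 in journal order.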
2024-01-01	bcachefs: Go rw before journal replay	Kent Overstreet
	This gets us slightly nicer log messages. It also clarifies synchronization of c->journal_keys: after we go RW it's in use by multiple threads (so that the btree iterator code can overlay keys from the journal), so it has to be prepped before that point. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Kill BTREE_UPDATE_PREJOURNAL	Kent Overstreet
	With the previous patch that reworks BTREE_INSERT_JOURNAL_REPLAY, we can now switch the btree write buffer to use it for flushing. This has the advantage that transaction commits don't need to take a journal reservation at all. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: BTREE_INSERT_JOURNAL_REPLAY now "don't init trans->journal_res"	Kent Overstreet
	This slightly changes how trans->journal_res works, in preparation for changing the btree write buffer flush path to use it. Now, BTREE_INSERT_JOURNAL_REPLAY means "don't take a journal reservation; trans->journal_res.seq already refers to the journal sequence number to pin". Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Clear k->needs_whiteout earlier in commit path	Kent Overstreet
	The upcoming btree write buffer rework is going to use the journal itself as the first stage of the write buffer; this is a cleanup to make sure k->needs_whiteout is initialized before keys hit the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: track_event_change()	Kent Overstreet
	This introduces a new helper for connecting time_stats to state changes, e.g. when taking journal reservations is blocked for some reason. We use this to separately track the different reasons the journal might be blocked, e.g. the journal being full or the journal pin FIFO being full. Also do some cleanup and improvements on the time stats code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
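A minimal sketch of connecting time statistics to a boolean state, assuming the caller re-evaluates a condition such as "journal full" and reports it each time. The struct, the function name track_state_change(), and passing the current time in explicitly (which keeps the example deterministic) are assumptions for illustration; the real bcachefs helper's signature may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal time-stats accumulator for one condition. */
struct event_stats {
	uint64_t total_ns;   /* accumulated time spent in the "on" state */
	uint64_t count;      /* number of off -> on transitions */
	uint64_t start_ns;   /* when the current "on" period began */
	bool     on;
};

/*
 * Call whenever the tracked condition is re-evaluated. Only actual
 * state changes are recorded; returns true if the state changed.
 */
static bool track_state_change(struct event_stats *s, bool v, uint64_t now_ns)
{
	if (v == s->on)
		return false;

	if (v) {
		/* Entering the blocked state: remember when it started. */
		s->start_ns = now_ns;
		s->count++;
	} else {
		/* Leaving it: account the time spent blocked. */
		s->total_ns += now_ns - s->start_ns;
	}
	s->on = v;
	return true;
}
```

Keeping one such accumulator per reason (journal full, pin FIFO full, and so on) is what lets the different causes of blocking be reported separately.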
2024-01-01	bcachefs: Journal pins must always have a flush_fn	Kent Overstreet
	flush_fn is how we identify journal pins in debugfs - this is a debugging aid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Add an assertion in bch2_journal_pin_set()	Kent Overstreet
	Previously, bch2_journal_pin_set() would silently ignore a request to pin a journal sequence number that was no longer dirty, because it was used internally by bch2_journal_pin_copy() which could race with the src pin being flushed. Split these apart so that we can properly assert that @seq is a currently dirty journal sequence number - this is almost always a bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Include average write size in sysfs journal_debug	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Fix warning when building in userspace	Kent Overstreet
	bch_err() doesn't reference the fs arg in userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Print old version when scanning for old metadata	Kent Overstreet
	Also: we should be using bch2_fs_read_write_early() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Fix locking when checking freespace btree	Kent Overstreet
	On transaction restart, we weren't re-validating the hole we saw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Check for unlinked inodes not on deleted list	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: kill INODE_LOCK, use lock_two_nondirectories()	Kent Overstreet
	In an ideal world, we'd have a common helper that could be used for sorting a list of inodes into the correct lock order, and then the same lock ordering could be used for any type of inode lock, not just i_rwsem. But the lock ordering rules for i_rwsem are a bit complicated, so - abandon that dream for now and do it the more standard way. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
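The standard way to lock two objects without deadlocking is to always take them in a fixed global order; as far as we know, the kernel's lock_two_nondirectories() does this by ordering the two inodes by address. A userspace analogue with pthread mutexes, using hypothetical lock_two()/unlock_two() helpers, might look like this:

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Lock two mutexes in a fixed global order (by address), so that two
 * threads locking the same pair with arguments in opposite order cannot
 * deadlock. NULL or identical arguments are allowed.
 */
static void lock_two(pthread_mutex_t *a, pthread_mutex_t *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		pthread_mutex_t *tmp = a;
		a = b;
		b = tmp;
	}
	if (a)
		pthread_mutex_lock(a);
	if (b && b != a)
		pthread_mutex_lock(b);
}

/* Unlock order doesn't matter for deadlock avoidance. */
static void unlock_two(pthread_mutex_t *a, pthread_mutex_t *b)
{
	if (a)
		pthread_mutex_unlock(a);
	if (b && b != a)
		pthread_mutex_unlock(b);
}
```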
2024-01-01	bcachefs: Improved backpointer messages in fsck	Kent Overstreet
	When we have a key to print, we should print it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Add extra verbose logging for ro path	Kent Overstreet
	Also log time waiting for c->writes references to be dropped; this will help in debugging why unmounts are taking longer than they should. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Flush fsck errors before running twice	Kent Overstreet
	It's confusing if we run fsck a second time (in debug mode, to verify the second run is clean), but errors are still ratelimited from the first run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: make RO snapshots actually RO	Kent Overstreet
	Add checks to all the VFS paths for "are we in a RO snapshot?". Note - we don't check this when setting inode options via our xattr interface, since those generally only affect data placement, not contents of data. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
2024-01-01	bcachefs: bch_sb_field_downgrade	Kent Overstreet
	Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix }; that is, a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix. The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version and when downgrading from it, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing the two for every journal entry write to maintain compatibility is going to be too expensive and impractical. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
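How such a downgrade list might be consulted can be sketched as follows: when downgrading from version `from` to version `to`, every entry whose version lies in (to, from] applies, and their requirements are OR-ed together. The struct layout, the 64-bit masks, and downgrade_requirements() are simplifying assumptions, not the on-disk bch_sb_field_downgrade format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one downgrade entry. */
struct downgrade_entry {
	uint16_t version;          /* version being downgraded past */
	uint64_t recovery_passes;  /* bitmask of passes that must run */
	uint64_t errors_silent;    /* bitmask of errors to fix silently (simplified) */
};

/*
 * Going from version @from down to @to crosses every entry with a
 * version in (to, from]; collect everything those entries require.
 */
static void downgrade_requirements(const struct downgrade_entry *e, size_t nr,
				   uint16_t from, uint16_t to,
				   uint64_t *passes, uint64_t *errors)
{
	*passes = 0;
	*errors = 0;

	for (size_t i = 0; i < nr; i++)
		if (e[i].version > to && e[i].version <= from) {
			*passes |= e[i].recovery_passes;
			*errors |= e[i].errors_silent;
		}
}
```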
2024-01-01	bcachefs: bch_sb.recovery_passes_required	Kent Overstreet
	Add two new superblock fields. Since the main section of the superblock is now full, we have to add a new variable-length section for them, bch_sb_field_ext: - recovery_passes_required: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed. These are to improve upgrading and downgrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Add persistent identifiers for recovery passes	Kent Overstreet
	The next patch will start to refer to recovery passes from the superblock; naturally, we now need identifiers that don't change, since the existing enum is in the order in which they are run and is not fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
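One way to keep run order and on-disk identifiers independent is an x-macro that lists passes in run order alongside a stable number, plus helpers that translate bitmasks between the two numberings. The pass list, the numbers, and the helper names below are hypothetical.

```c
#include <stdint.h>

/*
 * Passes listed in run order, each with a stable id. Reordering lines
 * changes the run order but not the ids that get stored on disk.
 */
#define EXAMPLE_RECOVERY_PASSES()		\
	x(journal_replay,	0)		\
	x(check_inodes,		2)		\
	x(check_dirents,	1)

enum recovery_pass {		/* in-memory numbering: run order */
#define x(n, id)	RECOVERY_PASS_##n,
	EXAMPLE_RECOVERY_PASSES()
#undef x
	RECOVERY_PASS_NR
};

enum recovery_pass_stable {	/* on-disk numbering: never renumbered */
#define x(n, id)	RECOVERY_PASS_STABLE_##n = id,
	EXAMPLE_RECOVERY_PASSES()
#undef x
};

static const uint8_t pass_to_stable[] = {
#define x(n, id)	[RECOVERY_PASS_##n] = id,
	EXAMPLE_RECOVERY_PASSES()
#undef x
};

/* Run-order bitmask -> stable-id bitmask (for writing the superblock). */
static uint64_t passes_to_stable(uint64_t v)
{
	uint64_t ret = 0;

	for (unsigned i = 0; i < RECOVERY_PASS_NR; i++)
		if (v & (1ULL << i))
			ret |= 1ULL << pass_to_stable[i];
	return ret;
}

/* Stable-id bitmask -> run-order bitmask (for reading it back). */
static uint64_t passes_from_stable(uint64_t v)
{
	uint64_t ret = 0;

	for (unsigned i = 0; i < RECOVERY_PASS_NR; i++)
		if (v & (1ULL << pass_to_stable[i]))
			ret |= 1ULL << i;
	return ret;
}
```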
2024-01-01	bcachefs: prt_bitflags_vector()	Kent Overstreet
	similar to prt_bitflags(), but for ulong arrays Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
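A sketch of printing the names of set bits in a multi-word bitmap, using a plain char buffer where bcachefs's prt_*() helpers write to a struct printbuf. print_bitflags_vector() and BITS_PER_ULONG are defined here for illustration and are not the real prt_bitflags_vector() signature.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BITS_PER_ULONG	(sizeof(unsigned long) * CHAR_BIT)

/*
 * Write the names of the set bits in @v (an array of unsigned longs
 * covering @nr_bits bits) to @out, comma separated. @names has one entry
 * per bit. Output is truncated, but still NUL-terminated, if @out is too small.
 */
static void print_bitflags_vector(char *out, size_t size,
				  const char * const names[],
				  const unsigned long *v, size_t nr_bits)
{
	size_t len = 0;
	int first = 1;

	if (size)
		out[0] = '\0';

	for (size_t i = 0; i < nr_bits; i++) {
		/* Word i / BITS_PER_ULONG, bit i % BITS_PER_ULONG within it. */
		if (!(v[i / BITS_PER_ULONG] & (1UL << (i % BITS_PER_ULONG))))
			continue;

		int n = snprintf(out + len, size - len, "%s%s",
				 first ? "" : ",", names[i]);
		if (n < 0 || (size_t)n >= size - len)
			return;
		len += n;
		first = 0;
	}
}
```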
2024-01-01	bcachefs: move BCH_SB_ERRS() to sb-errors_types.h	Kent Overstreet
	we need BCH_SB_ERR_MAX in bcachefs.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: fix buffer overflow in nocow write path	Kent Overstreet
	BCH_REPLICAS_MAX isn't the actual maximum number of pointers in an extent, it's the maximum number of dirty pointers. We don't have a real restriction on the number of cached pointers, and we don't want a fixed size array here anyways - so switch to DARRAY_PREALLOCATED(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-and-tested-by: Daniel J Blueman <daniel@quora.org>
2024-01-01	bcachefs: DARRAY_PREALLOCATED()	Kent Overstreet
	Add support to darray for preallocating some number of elements. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
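A minimal sketch of a growable array with inline preallocated storage, in the spirit of DARRAY_PREALLOCATED(): the first few elements live inside the struct, and the array only moves to the heap once it outgrows them, with growth kept in a separate slow-path function. Names are hypothetical, the element type is fixed to int for brevity, and malloc()/realloc() stand in for the kernel's kvmalloc().

```c
#include <stdlib.h>
#include <string.h>

#define PREALLOC_NR	4

struct int_darray {
	int	*data;			/* points at prealloc[] or a heap buffer */
	size_t	nr, size;
	int	prealloc[PREALLOC_NR];
};

static void darray_init(struct int_darray *d)
{
	d->data = d->prealloc;
	d->nr	= 0;
	d->size = PREALLOC_NR;
}

/* Slow path: only reached when the array is full. */
static int darray_grow(struct int_darray *d)
{
	size_t new_size = d->size * 2;
	int *p;

	if (d->data == d->prealloc) {
		/* Leaving inline storage: allocate and copy, don't realloc(). */
		p = malloc(new_size * sizeof(*p));
		if (p)
			memcpy(p, d->data, d->nr * sizeof(*p));
	} else {
		p = realloc(d->data, new_size * sizeof(*p));
	}
	if (!p)
		return -1;
	d->data = p;
	d->size = new_size;
	return 0;
}

static int darray_push(struct int_darray *d, int v)
{
	if (d->nr == d->size && darray_grow(d))
		return -1;
	d->data[d->nr++] = v;
	return 0;
}

static void darray_exit(struct int_darray *d)
{
	if (d->data != d->prealloc)
		free(d->data);
	darray_init(d);
}
```

Because data can point into the struct itself, a struct like this must not be copied or moved by value while it is still using the inline storage.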
2024-01-01	bcachefs: Switch darray to kvmalloc()	Kent Overstreet
	We sometimes use darrays for quite large buffers - the btree write buffer in particular needs large buffers, since it must be sized to hold all the write buffer keys outstanding in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Factor out darray resize slowpath	Kent Overstreet
	Move the slowpath (actually growing the darray) to an out-of-line function; also, add some helpers for the upcoming btree write buffer rewrite. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: fix setting version_upgrade_complete	Kent Overstreet
	If a superblock write hasn't happened (i.e. we never had to go rw), then c->sb.version will be out of date w.r.t. c->disk_sb.sb->version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: fix invalid free in dio write path	Kent Overstreet
	It turns out iterate_iovec() mutates __iov, so we need to save our own copy. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: Marcin Mirosław <marcin@mejor.pl>
2024-01-01	bcachefs: Fix extents iteration + snapshots interaction	Kent Overstreet
	peek_upto() checks against the end position and bails out before the FILTER_SNAPSHOTS checks; this is because if we end up at a different inode number than the original search key, none of the keys we see might be visible in the current snapshot - we might be looking at an inode in a completely different subvolume. But this is broken, because when we're iterating over extents we're checking against the extent start position to decide when to bail out, and the extent start position isn't monotonically increasing until after we've run FILTER_SNAPSHOTS. Fix this by adding a simple inode number check where the old bailout check was, and moving the main check to the correct position. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
2023-12-26	bcachefs: Fix promotes	Kent Overstreet
	The recent work to fix data moves w.r.t. durability broke promotes, because it caused us to bail out when the extent minus the pointers being dropped still has enough pointers to satisfy the current number of replicas. Disable this check when we're adding cached replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-21	bcachefs: Fix leakage of internal error code	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-21	bcachefs: Fix insufficient disk reservation with compression + snapshots	Kent Overstreet
	When overwriting and splitting existing extents, we weren't correctly accounting for a 3 way split of a compressed extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-19	bcachefs: fix BCH_FSCK_ERR enum	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-19	bcachefs: Fix bch2_alloc_sectors_start_trans() error handling	Kent Overstreet
	When we fail to allocate because of insufficient open buckets, we don't want to retry from the full set of devices - we just want to retry in blocking mode. But if the retry in blocking mode fails with a different error code, we end up squashing the -BCH_ERR_open_buckets_empty error with an error that makes us think we won't be able to allocate (insufficient_devices), which is incorrect when we didn't try to allocate from the full set of devices, and causes the write to fail. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
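The error-handling rule described above can be modelled in a few lines. The error codes and struct alloc_sim (which simply supplies the outcome of each attempt) are hypothetical stand-ins; this is a sketch of the rule, not the actual patch.

```c
/* Hypothetical error codes standing in for -BCH_ERR_* values. */
enum {
	ERR_OPEN_BUCKETS_EMPTY		= 1,
	ERR_INSUFFICIENT_DEVICES	= 2,
};

/* Simulated outcomes of each allocation attempt. */
struct alloc_sim {
	int nonblocking_ret;	/* result of the first, nonblocking attempt */
	int blocking_ret;	/* result of the retry in blocking mode */
};

static int alloc_sectors(const struct alloc_sim *sim)
{
	int ret = sim->nonblocking_ret;

	if (ret == -ERR_OPEN_BUCKETS_EMPTY) {
		/* Out of open buckets: retry in blocking mode, same device set. */
		int ret2 = sim->blocking_ret;

		/*
		 * We never tried the full device set, so an unrelated error
		 * from the retry (e.g. insufficient devices) must not replace
		 * the real reason we retried.
		 */
		if (ret2 && ret2 != -ERR_OPEN_BUCKETS_EMPTY)
			return ret;
		return ret2;
	}
	return ret;
}
```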
2023-12-19	bcachefs: guard against overflow in btree node split	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-19	bcachefs: btree_node_u64s_with_format() takes nr keys	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-17	bcachefs: print explicit recovery pass message only once	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-14	bcachefs: improve modprobe support by providing softdeps	Daniel Hill
	We need to help modprobe load architecture-specific modules so that we don't fall back to generic software implementations; this should help performance when bcachefs is built as a module. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-14	bcachefs: fix invalid memory access in bch2_fs_alloc() error path	Thomas Bertschinger
	When bch2_fs_alloc() gets an error before calling bch2_fs_btree_iter_init(), bch2_fs_btree_iter_exit() makes an invalid memory access because btree_trans_list is uninitialized. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Fixes: 6bd68ec266ad ("bcachefs: Heap allocate btree_trans") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
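A common way to avoid this class of bug is to initialize everything the teardown path touches before any step that can fail, so that a shared exit path is safe at any point. The sketch below uses a minimal userspace list_head and hypothetical fs_alloc()/fs_exit() functions; it illustrates the pattern and does not claim to be the actual fix.

```c
#include <stdlib.h>

/* Minimal intrusive list head, in the style of the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical object whose list is walked by the exit path. */
struct fs {
	struct list_head trans_list;
	void *other;
};

static void fs_exit(struct fs *c)
{
	/* With an uninitialized (zeroed) head, next would be NULL here. */
	for (struct list_head *p = c->trans_list.next;
	     p != &c->trans_list;
	     p = p->next) {
		/* free entries here (none exist in this sketch) */
	}
	free(c->other);
	free(c);
}

static struct fs *fs_alloc(int fail_early)
{
	struct fs *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;

	/* Initialize what fs_exit() touches before anything can fail. */
	INIT_LIST_HEAD(&c->trans_list);

	if (fail_early)
		goto err;

	c->other = malloc(16);
	if (!c->other)
		goto err;
	return c;
err:
	fs_exit(c);
	return NULL;
}
```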