linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2024-01-05	bcachefs: Split brain detection	Kent Overstreet
	Use the new bch_member->seq, sb->write_time fields to detect split brain and kick out devices when necessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: better error message in btree_node_write_work()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Improve the nopromote tracepoint	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: check for failure to downgrade	Kent Overstreet
	With the upcoming member seq patch, it's now critical that we don't ever write to a superblock that hasn't been version downgraded - failure to update member seq fields will cause split brain detection to fire erroniously. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: btree write buffer now slurps keys from journal	Kent Overstreet
	Previosuly, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: BCH_ERR_opt_parse_error	Kent Overstreet
	Continuing the project of replacing generic error codes with more specific ones. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: Make journal replay more efficient	Kent Overstreet
	Journal replay now first attempts to replay keys in sorted order, similar to how the btree write buffer flush path works. Any keys that can not be replayed due to journal deadlock are then left for later and replayed in journal order, unpinning journal entries as we go. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: bch_sb_field_downgrade	Kent Overstreet
	Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix } that is - a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix. The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version, and also from downgrading from the new version, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing them for every jounal entry write to maintain compatibility is going to be too expensive and impractical. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01	bcachefs: bch_sb.recovery_passes_required	Kent Overstreet
	Add two new superblock fields. Since the main section of the superblock is now fully, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_requried: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and dwongrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28	bcachefs: Add missing validation for jset_entry_data_usage	Kent Overstreet
	Validation was completely missing for replicas entries in the journal (not the superblock replicas section) - we can't have replicas entries pointing to invalid devices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-25	bcachefs: Data update path won't accidentaly grow replicas	Kent Overstreet
	Previously, there was a bug where if an extent had greater durability than required (because we needed to move a durability=1 pointer and ended up putting it on a durability 2 device), we would submit a write for replicas=2 - the durability of the pointer being rewritten - instead of the number of replicas required to bring it back up to the data_replicas option. This, plus the allocation path sometimes allocating on a greater durability device than requested, meant that extents could continue having more and more replicas added as they were being rewritten. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05	bcachefs: Replace ERANGE with private error codes	Kent Overstreet
	We avoid using standard error codes: private, per-callsite error codes make debugging easier. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01	bcachefs: bch_sb_field_errors	Kent Overstreet
	Add a new superblock section to keep counts of errors seen since filesystem creation: we'll be addingcounters for every distinct fsck error. The new superblock section has entries of the for [ id, count, time_of_last_error ]; this is intended to let us see what errors are occuring - and getting fixed - via show-super output. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31	bcachefs: Guard against unknown compression options	Kent Overstreet
	Since compression options now include compression level, proper validation is a bit more involved. This adds bch2_compression_opt_valid(), and plumbs it around appropriately. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: New superblock section members_v2	Hunter Shaffer
	members_v2 has dynamically resizable entries so that we can extend bch_member. The members can no longer be accessed with simple array indexing Instead members_v2_get is used to find a member's exact location within the array and returns a copy of that member. Alternatively member_v2_get_mut retrieves a mutable point to a member. Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: More minor smatch fixes	Kent Overstreet
	- fix a few uninitialized return values - return a proper error code in lookup_lostfound() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: trace_read_nopromote()	Kent Overstreet
	Add a tracepoint to print the reason a read wasn't promoted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Break up io.c	Kent Overstreet
	More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Convert btree_err_type to normal error codes	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Make topology repair a normal recovery pass	Kent Overstreet
	This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and explicitly running a specific recovery pass - this is a more general replacement for how we were running topology repair before. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: bch2_run_explicit_recovery_pass()	Kent Overstreet
	This introduces bch2_run_explicit_recovery_pass() and uses it for when fsck detects that we need to re-run dead snaphots cleanup, and makes dead snapshot cleanup more like a normal recovery pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: overlapping_extents_found()	Kent Overstreet
	This improves the repair path for overlapping extents - we now verify that we find in the btree the overlapping extents that the algorithm detected, and fail the fsck run with a more useful error if it doesn't match. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Convert more -EROFS to private error codes	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: BCH_ERR_fsck -> EINVAL	Kent Overstreet
	When we return errors outside of bcachefs, we need to return a standard error code - fix this for BCH_ERR_fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix more lockdep splats in debug.c	Kent Overstreet
	Similar to previous fixes, we can't incur page faults while holding btree locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Use unlikely() in bch2_err_matches()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Convert -ENOENT to private error codes	Kent Overstreet
	As with previous conversions, replace -ENOENT uses with more informative private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: BTREE_ID_snapshot_tree	Kent Overstreet
	This adds a new btree which gets us a persistent per-snapshot-tree identifier. - BTREE_ID_snapshot_trees - KEY_TYPE_snapshot_tree - bch_snapshot now has a field that points to a snapshot_tree This is going to be used to designate one snapshot ID/subvolume out of a given tree of snapshots as the "main" subvolume, so that we can do quota accounting in that subvolume and not the rest. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: bch2_bkey_get_empty_slot()	Kent Overstreet
	Add a new helper for allocating a new slot in a btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Private error codes: ENOMEM	Kent Overstreet
	This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix stripe reuse path	Kent Overstreet
	It's possible that we reuse a stripe that doesn't have quite the same configuration as the stripe_head we're allocating from. In that case, we have to make sure that the new stripe uses the settings from the stripe we resue, not the stripe head, and make sure the buffer is allocated correctly. This fixes the ec_mixed_tiers test. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: RESERVE_stripe	Kent Overstreet
	Rework stripe creation path - new algorithm for deciding when to create new stripes or reuse existing stripes. We add a new allocation watermark, RESERVE_stripe, above RESERVE_none. Then we always try to create a new stripe by doing RESERVE_stripe allocations; if this fails, we reuse an existing stripe and allocate buckets for it with the reserve watermark for the given write (RESERVE_none or RESERVE_movinggc). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: don't block reads if we're promoting	Daniel Hill
	The promote path calls data_update_init() and now that we take locks here, there's potential for promote to block our read path, just error when we can't take the lock instead of blocking. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Data update support for unwritten extents	Kent Overstreet
	The data update path requires special support for unwritten extents - we still need to be able to move them, but there's no need to read or write anything. This patch adds a new error code to tell bch2_move_extent() that we're short circuiting the read, and adds bch2_update_unwritten_extent() to create a reservation then call __bch2_data_update_index_update(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: New on disk format: Backpointers	Kent Overstreet
	This patch adds backpointers: we now have a reverse index from device and offset on that device (specifically, offset within a bucket) back to btree nodes and (non cached) data extents. The first 40 backpointers within a bucket are stored in the alloc key; after that backpointers spill over to the next backpointers btree. This is to help avoid performance regressions from additional btree updates on large streaming workloads. This patch adds all the code for creating, checking and repairing backpointers. The next patch in the series is going to use backpointers for copygc - finally getting rid of the need to scan all extents to do copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Btree write buffer	Kent Overstreet
	This adds a new method of doing btree updates - a straight write buffer, implemented as a flat fixed size array. This is only useful when we don't need to read from the btree in order to do the update, and when reading is infrequent - perfect for the LRU btree. This will make LRU btree updates fast enough that we'll be able to use it for persistently indexing buckets by fragmentation, which will be a massive boost to copygc performance. Changes: - A new btree_insert_type enum, for btree_insert_entries. Specifies btree, btree key cache, or btree write buffer. - bch2_trans_update_buffered(): updates via the btree write buffer don't need a btree path, so we need a new update path. - Transaction commit path changes: The update to the btree write buffer both mutates global, and can fail if there isn't currently room. Therefore we do all write buffer updates in the transaction all at once, and also if it fails we have to revert filesystem usage counter changes. If there isn't room we flush the write buffer in the transaction commit error path and retry. - A new persistent option, for specifying the number of entries in the write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Convert EAGAIN errors to private error codes	Kent Overstreet
	More error code cleanup, for better error messages and debugability. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Convert EROFS errors to private error codes	Kent Overstreet
	More error code improvements - this gets us more useful error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix for long running btree transactions & key cache	Kent Overstreet
	While a btree transaction is running, we hold a SRCU read lock on the btree key cache that prevents btree key cache keys from being freed - this is so that relock() operations won't access freed memory. The downside of this is that long running btree transactions prevent memory from being freed from the key cache. This adds a check in bch2_trans_begin() - if the transaction has been running longer than 1 second, drop and retake the SRCU read lock and zero out pointers to unlock key cache paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: More errcode cleanup	Kent Overstreet
	We shouldn't be overloading standard error codes now that we have provisions for bcachefs-specific errorcodes: this patch converts super.c and super-io.c to per error site errcodes, with a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Kill btree_insert_ret enum	Kent Overstreet
	Replace with standard bcachefs-private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Add error path to btree_split()	Kent Overstreet
	The next patch in the series is (finally!) going to change btree splits (and interior updates in general) to not take intent locks all the way up to the root - instead only locking the nodes they'll need to modify. However, this will be introducing a race since if we're not holding a write lock on a btree node it can be written out by another thread, and then we might not have enough space for a new bset entry. We can handle this by retrying - we just need to introduce a new error path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Delete old deadlock avoidance code	Kent Overstreet
	This deletes our old lock ordering based deadlock avoidance code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Deadlock cycle detector	Kent Overstreet
	We've outgrown our own deadlock avoidance strategy. The btree iterator API provides an interface where the user doesn't need to concern themselves with lock ordering - different btree iterators can be traversed in any order. Without special care, this will lead to deadlocks. Our previous strategy was to define a lock ordering internally, and whenever we attempt to take a lock and trylock() fails, we'd check if the current btree transaction is holding any locks that cause a lock ordering violation. If so, we'd issue a transaction restart, and then bch2_trans_begin() would re-traverse all previously used iterators, but in the correct order. That approach had some issues, though. - Sometimes we'd issue transaction restarts unnecessarily, when no deadlock would have actually occured. Lock ordering restarts have become our primary cause of transaction restarts, on some workloads totally 20% of actual transaction commits. - To avoid deadlock or livelock, we'd often have to take intent locks when we only wanted a read lock: with the lock ordering approach, it is actually illegal to hold _any_ read lock while blocking on an intent lock, and this has been causing us unnecessary lock contention. - It was getting fragile - the various lock ordering rules are not trivial, and we'd been seeing occasional livelock issues related to this machinery. So, since bcachefs is already a relational database masquerading as a filesystem, we're stealing the next traditional database technique and switching to a cycle detector for avoiding deadlocks. When we block taking a btree lock, after adding ourself to the waitlist but before sleeping, we do a DFS of btree transactions waiting on other btree transactions, starting with the current transaction and walking our held locks, and transactions blocking on our held locks. If we find a cycle, we emit a transaction restart. Occasionally (e.g. the btree split path) we can not allow the lock() operation to fail, so if necessary we'll tell another transaction that it has to fail. Result: trans_restart_would_deadlock events are reduced by a factor of 10 to 100, and we'll be able to delete a whole bunch of grotty, fragile code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Add private error codes for ENOSPC	Kent Overstreet
	Continuing the saga of introducing private dedicated error codes for each error path, this patch converts ENOSPC to error codes that are subtypes of ENOSPC. We've recently had a test failure where we got -ENOSPC where we shouldn't have, and didn't have enough information to tell where it came from, so this patch will solve that problem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Errcodes can now subtype standard error codes	Kent Overstreet
	The next patch is going to be adding private error codes for all the places we return -ENOSPC. Additionally, this patch updates return paths at all module boundaries to call bch2_err_class(), to return the standard error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: BTREE_ITER_NO_NODE -> BCH_ERR codes	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Add distinct error code for key_cache_upgrade	Kent Overstreet
	This aids in debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Convert fsck errors to errcode.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: EINTR -> BCH_ERR_transaction_restart	Kent Overstreet
	Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>