summaryrefslogtreecommitdiff
path: root/fs/bcachefs
AgeCommit message (Collapse)Author
2023-11-28bcachefs: Extra kthread_should_stop() calls for copygcKent Overstreet
This fixes a bug where going read-only was taking longer than it should have due to copygc forgetting to check kthread_should_stop() Additionally: fix a missing is_kthread check in bch2_move_ratelimit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Convert gc_alloc_start() to for_each_btree_key2()Kent Overstreet
This eliminates some SRCU warnings: for_each_btree_key2() runs every loop iteration in a distinct transaction context. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Fix race between btree writes and metadata dropKent Overstreet
btree writes update the btree node key after every write, in order to update sectors_written, and they also might need to drop pointers if one of the writes failed in a replicated btree node. But the btree node might also have had a pointer dropped while the write was in flight, by bch2_dev_metadata_drop(), and thus there was a bug where the btree node write would ovewrite the btree node's key with what it had at the start of the write. Fix this by dropping pointers not currently in the btree node key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: move journal seq assertionKent Overstreet
journal_cur_seq() can legitimately be used outside of the journal lock, where this assert can race Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: -EROFS doesn't count as move_extent_start_failKent Overstreet
The automated tests check if we've hit too many slowpath/error path events and fail the test - if we're just shutting down, that naturally shouldn't count. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: trace_move_extent_start_fail() now includes errcodeKent Overstreet
Renamed from trace_move_extent_alloc_mem_fail, because there are other reasons we colud fail (disk space allocation failure). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Fix split_race livelockKent Overstreet
bch2_btree_update_start() calculates which nodes are going to have to be split/rewritten, so that we know how many nodes to reserve and how deep in the tree we have to take locks. But btree node merges require inserting two keys into the parent node, not just splits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Fix bucket data type for stripe bucketsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Add missing validation for jset_entry_data_usageKent Overstreet
Validation was completely missing for replicas entries in the journal (not the superblock replicas section) - we can't have replicas entries pointing to invalid devices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Fix zstd compress workspace sizeKent Overstreet
zstd apparently lies about the size of the compression workspace it requires; if we double it compression succeeds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-25bcachefs: bpos is misaligned on big endianKent Overstreet
bkey embeds a bpos that is misaligned on big endian; this is so that bch2_bkey_swab() works correctly without having to differentiate between packed and non-packed keys (a debatable design decision). This means it can't have the __aligned() tag on big endian. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-25bcachefs: Fix ec + durability calculationKent Overstreet
Durability of an erasure coded pointer doesn't add the device durability; durability is the same for any extent in that stripe so the calculation only comes from the stripe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-25bcachefs: Data update path won't accidentaly grow replicasKent Overstreet
Previously, there was a bug where if an extent had greater durability than required (because we needed to move a durability=1 pointer and ended up putting it on a durability 2 device), we would submit a write for replicas=2 - the durability of the pointer being rewritten - instead of the number of replicas required to bring it back up to the data_replicas option. This, plus the allocation path sometimes allocating on a greater durability device than requested, meant that extents could continue having more and more replicas added as they were being rewritten. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: deallocate_extra_replicas()Kent Overstreet
When allocating from devices with different durability, we might end up with more replicas than required; this changes bch2_alloc_sectors_start() to check for this, and drop replicas that aren't needed to hit the number of replicas requested. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Proper refcounting for journal_keysKent Overstreet
The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: preserve device path as device nameBrian Foster
Various userspace scripts/tools may expect mount entries in /proc/mounts to reflect the device path names used to mount the associated filesystem. bcachefs seems to normalize the device path to the underlying device name based on the block device. This confuses tools like fstests when the test devices might be lvm or device-mapper based. The default behavior for show_vfsmnt() appers to be to use the string passed to alloc_vfsmnt(), so tweak bcachefs to copy the path at device superblock read time and to display it via ->show_devname(). Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Fix an endianness conversionKent Overstreet
cpu_to_le32(), not le32_to_cpu() - fixes a sparse complaint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Start gc, copygc, rebalance threads after initing writes refKent Overstreet
This fixes a bug where copygc would occasionally race with going read-write and die, thinking we were read only, because it couldn't take a ref on c->writes. It's not necessary for copygc (or rebalance, or copygc) to take write refs; they could run with BCH_TRANS_COMMIT_nocheck_rw, but this is an easier fix that making sure that flag is passed correctly everywhere. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Don't stop copygc thread on device resizeKent Overstreet
copygc no longer has to scan the buckets, so it's no longer a problem if the number of buckets is changing while it's running. This also fixes a bug where we forgot to restart copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Make sure bch2_move_ratelimit() also waits for move_opsKent Overstreet
This adds move_ctxt_wait_event_timeout(), which can sleep for a timeout while also issueing pending moves as reads complete. Co-developed-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: bch2_moving_ctxt_flush_all()Kent Overstreet
Introduce a new helper to flush all move IOs, and use it in a few places where we should have been. The new helper also drops btree locks before waiting on outstanding move writes, avoiding potential deadlocks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Put erasure coding behind an EXPERIMENTAL kconfig optionKent Overstreet
We still have disk space accounting changes coming for erasure coding, and the changes won't be as strictly backwards compatible as they'd ought to be - specifically, we need to start accounting striped data under a separate counter in bch_alloc (which describes buckets). A fsck will suffice for upgrading/downgrading, but since erasure coding is the most incomplete major feature of bcachefs it still makes sense to put behind a separate kconfig option, so that users are fully aware. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24closures: CLOSURE_CALLBACK() to fix type punningKent Overstreet
Control flow integrity is now checking that type signatures match on indirect function calls. That breaks closures, which embed a work_struct in a closure in such a way that a closure_fn may also be used as a workqueue fn by the underlying closure code. So we have to change closure fns to take a work_struct as their argument - but that results in a loss of clarity, as closure fns have different semantics from normal workqueue functions (they run owning a ref on the closure, which must be released with continue_at() or closure_return()). Thus, this patc introduces CLOSURE_CALLBACK() and closure_type() macros as suggested by Kees, to smooth things over a bit. Suggested-by: Kees Cook <keescook@chromium.org> Cc: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-16bcachefs: Fix missing locking for dentry->d_parent accessKent Overstreet
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: six locks: Fix lost wakeupKent Overstreet
In percpu reader mode, trylock() for read had a lost wakeup: on failure to get the lock, we may have caused a writer to fail to get the lock, because we temporarily elevated the reader count. We need to check for waiters after decrementing the read count - not before. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: Fix no_data_io mode checksum checkKent Overstreet
In no_data_io mode, we expect data checksums to be wrong - don't want to spew the log with them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: Fix bch2_check_nlinks() for snapshotsKent Overstreet
When searching the link table for the matching inode, we were searching for a specific - incorrect - snapshot ID as well, causing us to fail to find the inode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: Don't decrease BTREE_ITER_MAX when LOCKDEP=yKent Overstreet
Running with fewer max btree paths doesn't work anymore when replication is enabled - as we've added e.g. the freespace and bucket gens btrees, we naturally end up needing more btree paths. This is an issue with lockdep, we end up taking more locks than lockdep will track (the MAX_LOCKD_DEPTH constant). But bcachefs as merged does not yet support lockdep anyways, so we can leave that for later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: Disable debug log statementsKent Overstreet
The journal read path had some informational log statements preperatory for ZNS support - they're not of interest to users, so we can turn them off. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: Fix missing transaction commitKent Overstreet
In may_delete_deleted_inode(), there's a corner case when a snapshot was taken while we had an unlinked inode: we don't want to delete the inode in the internal (shared) snapshot node, since it might have been reattached in a descendent snapshot. Instead we propagate the key to any snapshot leaves it doesn't exist in, so that it can be deleted there if necessary, and then clear the unlinked flag in the internal node. But we forgot to commit after clearing the unlinked flag, causing us to go into an infinite loop. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: Fix error path in bch2_mount()Kent Overstreet
This fixes a bug discovered by generic/388 where sb->s_fs_info was NULL while the superblock was still active - the error path was entirely fubar, and was trying to do something unclear and unecessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: Fix potential sleeping during mountDaniel J Blueman
During mount, bcachefs mount option processing may sleep while allocating a string buffer. Fix this by reference counting in order to take the atomic path. Signed-off-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: Fix iterator leak in may_delete_deleted_inode()Kent Overstreet
may_delete_deleted_inode() was returning without exiting a btree iterator, eventually causing propagate_key_to_snaphot_leaves() to go into an infinite loop hitting btree_trans_too_many_iters(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: Kill journal pre-reservationsKent Overstreet
This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Check for nonce offset inconsistency in data_update pathKent Overstreet
We've rarely been seeing a nonce offset inconsistency that doesn't show up in tests: this adds some extra verification code to the data update path that prints out more relevant info when it occurs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Make sure to drop/retake btree locks before reclaimKent Overstreet
We really don't want to be invoking memory reclaim with btree locks held: even aside from (solvable, but tricky) recursion issues, it can cause painful to diagnose performance edge cases. This fixes a recently reported issue in btree_key_can_insert_cached(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: Mateusz Guzik <mjguzik@gmail.com> Fixes: https://lore.kernel.org/linux-bcachefs/CAGudoHEsb_hGRMeWeXh+UF6po0qQuuq_NKSEo+s1sEb6bDLjpA@mail.gmail.com/T/
2023-11-13bcachefs: btree_trans->write_lockedKent Overstreet
As prep work for the next patch to fix a key cache reclaim issue, we need to start tracking whether we're currently holding write locks - so that we can release and retake the before calling into memory reclaim. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Run btree key cache shrinker less aggressivelyKent Overstreet
The btree key cache maintains lists of items that have been freed, but can't yet be reclaimed because a bch2_trans_relock() call might find them - we're waiting for SRCU readers to release. Previously, we wouldn't count these items against the number we're attempting to scan for, which would mean we'd evict more live key cache entries - doing quite a bit of potentially unecessary work. With recent work to make sure we don't hold SRCU locks for too long, it should be safe to count all the items on the freelists against number to scan - even if we can't reclaim them yet, we will be able to soon. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Split out btree_key_cache_types.hKent Overstreet
More consistent organization. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Guard against insufficient devices to create stripesKent Overstreet
We can't create stripes if we don't have enough devices - this manifested as an integer underflow bug later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Fix null ptr deref in bch2_backpointer_get_node()Kent Overstreet
bch2_btree_iter_peek_node() can return a NULL ptr (when the tree is shorter than the search depth); handle this with an early return. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: https://lore.kernel.org/linux-bcachefs/5fc3c28b-c232-4ec7-b0ac-4ef220ddf976@moroto.mountain/T/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Fix multiple -Warray-bounds warningsGustavo A. R. Silva
Transform zero-length array `entries` into a proper flexible-array member in `struct journal_seq_blacklist_table`; and fix the following -Warray-bounds warnings: fs/bcachefs/journal_seq_blacklist.c:148:26: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:150:30: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:154:27: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:176:27: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:177:27: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:297:34: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:298:34: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:300:31: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] This results in no differences in binary output. This helps with the ongoing efforts to globally enable -Warray-bounds. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Use DECLARE_FLEX_ARRAY() helper and fix multiple -Warray-bounds ↵Gustavo A. R. Silva
warnings Transform zero-length array `s` into a proper flexible-array member in `struct snapshot_table` via the DECLARE_FLEX_ARRAY() helper; and fix tons of the following -Warray-bounds warnings: fs/bcachefs/snapshot.h:36:21: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] fs/bcachefs/snapshot.h:36:21: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] fs/bcachefs/snapshot.c:135:70: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] fs/bcachefs/snapshot.h:36:21: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] fs/bcachefs/snapshot.h:36:21: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] fs/bcachefs/snapshot.h:36:21: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] This helps with the ongoing efforts to globally enable -Warray-bounds. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Use correct fgf_t type as function argumentKent Overstreet
This quiets a sparse complaint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: make bch2_target_to_text_sb staticJiapeng Chong
The bch2_target_to_text_sb are not used outside the file disk_groups.c, so the modification is defined as static. fs/bcachefs/disk_groups.c:583:6: warning: no previous prototype for ‘bch2_target_to_text_sb’. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7144 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-11-07Merge tag 'bcachefs-2023-11-5' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull more bcachefs updates from Kent Overstreet: "Here's the second big bcachefs pull request. This brings your tree up to date with my master branch, which is what existing bcachefs users are currently running. New features: - rebalance_work btree (and metadata version 1.3): the rebalance thread no longer has to scan to find extents that need processing - big scalability improvement. - sb_errors superblock section: this adds counters for each fsck error type, since filesystem creation, along with the date of the most recent error. It'll get us better bug reports (since users do not typically report errors that fsck was able to fix), and I might add telemetry for this in the future. Fixes include: - multiple snapshot deletion fixes - members_v2 fixups - deleted_inodes btree fixes - copygc thread no longer spins when a device is full but has no fragmented buckets (i.e. rebalance needs to move data around instead) - a fix for a memory reclaim issue with the btree key cache: we're now careful not to hold the srcu read lock that blocks key cache reclaim for too long - an early allocator locking fix, from Brian - endianness fixes, from Brian - CONFIG_BCACHEFS_DEBUG_TRANSACTIONS no longer defaults to y, a big performance improvement on multithreaded workloads" * tag 'bcachefs-2023-11-5' of https://evilpiepirate.org/git/bcachefs: (70 commits) bcachefs: Improve stripe checksum error message bcachefs: Simplify, fix bch2_backpointer_get_key() bcachefs: kill thing_it_points_to arg to backpointer_not_found() bcachefs: bch2_ec_read_extent() now takes btree_trans bcachefs: bch2_stripe_to_text() now prints ptr gens bcachefs: Don't iterate over journal entries just for btree roots bcachefs: Break up bch2_journal_write() bcachefs: Replace ERANGE with private error codes bcachefs: bkey_copy() is no longer a macro bcachefs: x-macro-ify inode flags enum bcachefs: Convert bch2_fs_open() to darray bcachefs: Move __bch2_members_v2_get_mut to sb-members.h bcachefs: bch2_prt_datetime() bcachefs: CONFIG_BCACHEFS_DEBUG_TRANSACTIONS no longer defaults to y bcachefs: Add a comment for BTREE_INSERT_NOJOURNAL usage bcachefs: rebalance_work btree is not a snapshots btree bcachefs: Add missing printk newlines bcachefs: Fix recovery when forced to use JSET_NO_FLUSH journal entry bcachefs: .get_parent() should return an error pointer bcachefs: Fix bch2_delete_dead_inodes() ...
2023-11-05bcachefs: Improve stripe checksum error messageKent Overstreet
We now include the name of the device in the error message - and also increment the number of checksum errors on that device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05bcachefs: Simplify, fix bch2_backpointer_get_key()Kent Overstreet
- backpointer_not_found() checks backpointers_no_use_write_buffer, no need to do it inbackpointer_get_key(). - always use backpointer_get_node() for pointers to nodes: backpointer_get_key() was sometimes returning the key from the root node unlocked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05bcachefs: kill thing_it_points_to arg to backpointer_not_found()Kent Overstreet
This can be calculated locally. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05bcachefs: bch2_ec_read_extent() now takes btree_transKent Overstreet
We're not supposed to have more than one btree_trans at a time in a given thread - that causes recursive locking deadlocks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>