path: root/fs/bcachefs
AgeCommit message (Collapse)Author
18 hoursbcachefs: Add missing snapshots_seen_add_inorder()Kent Overstreet
This fixes an infinite loop when repairing "extent past end of inode", when the extent is in an older snapshot than the inode that needs repair. Without the snapshots_seen_add_inorder() call we keep trying to delete the same extent, even though it's no longer visible in the inode's snapshot. Fixes: 63d6e9311999 ("bcachefs: bch2_fpunch_snapshot()") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 hoursbcachefs: Fix write buffer flushing from open journal entryKent Overstreet
When flushing the btree write buffer, we pull write buffer keys directly from the journal instead of letting the journal write path copy them to the write buffer. When flushing from the currently open journal buffer, we have to block new reservations and wait for outstanding reservations to complete. Recheck the reservation state after blocking new reservations: previously, we were checking the reservation count from before calling __journal_block(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 daysbcachefs: btree_node_scan: don't re-read before initializing found_btree_nodeKent Overstreet
If the btree node is encrypted, this caused us to initialize found_btree_node from the encrypted header. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 daysbcachefs: Fix bch2_maybe_casefold() when CONFIG_UTF8=nKent Overstreet
maybe_casefold() shouldn't have been nooped, just bch2_casefold(). Fixes: 94426e4201fb ("bcachefs: opts.casefold_disabled") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 daysbcachefs: Fix build when CONFIG_UNICODE=nKent Overstreet
94426e4201fb, which added the killswitch for casefolding, accidentally removed some of the ifdefs we need to avoid build errors. It appears we need better build testing for different configurations; it took two weeks for the robots to catch this one. Fixes: 94426e4201fb ("bcachefs: opts.casefold_disabled") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 daysbcachefs: Fix reference to invalid bucket in copygcKent Overstreet
Use bch2_dev_bucket_tryget() instead of bch2_dev_tryget() before checking the bucket bitmap. Reported-by: syzbot+3168625f36f4a539237e@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 daysbcachefs: Don't build aux search tree when still repairing nodeKent Overstreet
bch2_btree_node_drop_keys_outside_node() will (re)build aux search trees, because it's also called by topology repair. bch2_btree_node_read_done() was calling it before validating individual keys; invalid ones have to be dropped. If we call drop_keys_outside_node() first, then bch2_bset_build_aux_tree() doesn't run because the node already has an aux search tree - which was invalidated by the repair. Reported-by: syzbot+c5e7a66b3b23ae65d44f@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 daysbcachefs: Tweak threshold for allocator triggering discardsKent Overstreet
The allocator path has an "if we're really low on free buckets, check if we should issue discards" check - tweak this to also trigger discards if more than 1/128th of the device is in need_discard state. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
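A minimal sketch of the threshold described above, assuming the two conditions are simply OR-ed together. The function name, parameters, and the low-watermark value are hypothetical and are not taken from the bcachefs source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the bcachefs implementation.
 * nr_buckets:      total buckets on the device
 * nr_free:         buckets currently free
 * nr_need_discard: buckets waiting in need_discard state
 * low_watermark:   illustrative "really low on free buckets" cutoff
 */
bool should_trigger_discards(uint64_t nr_buckets, uint64_t nr_free,
			     uint64_t nr_need_discard, uint64_t low_watermark)
{
	/* Existing rule per the commit message: free buckets are really low. */
	if (nr_free < low_watermark)
		return true;

	/* Added rule: more than 1/128th of the device awaits discard. */
	return nr_need_discard > nr_buckets / 128;
}
```

With 12800 buckets the second rule fires once more than 100 buckets are in need_discard state, regardless of how many buckets are free.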
9 daysbcachefs: Fix triggering of discard by the journal pathKent Overstreet
It becomes possible to do discards after a journal flush, which naturally the journal code is responsible for. A prior refactoring seems to have broken this - which went unnoticed because the foreground allocator path can also trigger discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
12 daysbcachefs: io_read: remove from async obj list in rbio_done()Kent Overstreet
Previously, only split rbios allocated in io_read.c would be removed from the async obj list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-08bcachefs: Don't set BCH_FS_error on transaction restartKent Overstreet
This started showing up more when we started logging the error being corrected in the journal - but __bch2_fsck_err() could return transaction restarts before that. Setting BCH_FS_error incorrectly causes recovery passes to not be cleared, among other issues. Fixes: b43f72492768 ("bcachefs: Log fsck errors in the journal") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-07bcachefs: Fix additional misalignment in journal space calculationsKent Overstreet
Additional fix on top of f54b2a80d0df bcachefs: Fix misaligned bucket check in journal space calculations Make sure that when we calculate space for the next entry it's not misaligned: we need to round_down() to filesystem block size in multiple places (next entry size calculation as well as total space available). Reported-by: Ondřej Kraus <neverberlerfellerer@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
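The alignment rule above can be sketched as follows, assuming a power-of-two filesystem block size measured in sectors. The helper names and the way the next entry size is capped are illustrative assumptions, not the journal code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of align; align must be a power of two. */
uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/*
 * Hypothetical calculation: round both the total space available and the
 * next entry size down to the block size, so neither is misaligned.
 */
uint64_t next_entry_sectors(uint64_t sectors_free, uint64_t entry_max,
			    uint64_t block_sectors)
{
	uint64_t total = round_down_pow2(sectors_free, block_sectors);
	uint64_t next  = entry_max < total ? entry_max : total;

	return round_down_pow2(next, block_sectors);
}
```

Rounding in both places matters: with 8-sector blocks, 64 free sectors and a 21-sector cap would otherwise yield a 21-sector entry even though the total was aligned.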
2025-07-07bcachefs: Don't schedule non persistent passes persistentlyKent Overstreet
if (!(in_recovery && (flags & RUN_RECOVERY_PASS_nopersistent))) should have been if (!in_recovery && !(flags & RUN_RECOVERY_PASS_nopersistent)) But the !in_recovery part was also wrong: the assumption is that if we're in recovery we'll just rewind and run the recovery pass immediately, but we're not able to do so if we've already gone RW and the pass must be run before we go RW. In that case, we need to schedule it in the superblock so it can be run on the next mount attempt. Scheduling it persistently is fine, because it'll be cleared in the superblock immediately when the pass completes successfully. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
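To see why the two conditions quoted above are not equivalent, here is a small sketch that evaluates both. The flag value and function names are illustrative assumptions. Note that the commit message goes further and says the !in_recovery test itself was wrong, so neither function below represents the final code.

```c
#include <assert.h>
#include <stdbool.h>

#define PASS_NOPERSISTENT (1U << 0)	/* illustrative value, not the kernel's */

/* The condition as originally written: !(A && B), i.e. !A || !B. */
bool persist_as_written(bool in_recovery, unsigned flags)
{
	return !(in_recovery && (flags & PASS_NOPERSISTENT));
}

/* The condition the message says was intended: !A && !B. */
bool persist_as_intended(bool in_recovery, unsigned flags)
{
	return !in_recovery && !(flags & PASS_NOPERSISTENT);
}
```

The two disagree whenever exactly one of in_recovery and the nopersistent flag is set: the written form schedules persistently in both cases, including for a pass flagged nopersistent outside recovery.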
2025-07-05bcachefs: Fix bch2_btree_transactions_read() synchronizationKent Overstreet
Since we're accessing btree_trans objects owned by another thread, we need to guard against using pointers to freed key cache entries: we need our own srcu read lock, and we should skip a btree_trans if it didn't hold the srcu lock (and thus it might have pointers to freed key cache entries).

00693 Mem abort info:
00693   ESR = 0x0000000096000005
00693   EC = 0x25: DABT (current EL), IL = 32 bits
00693   SET = 0, FnV = 0
00693   EA = 0, S1PTW = 0
00693   FSC = 0x05: level 1 translation fault
00693 Data abort info:
00693   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
00693   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
00693   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
00693 user pgtable: 4k pages, 39-bit VAs, pgdp=000000012e650000
00693 [000000008fb96218] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
00693 Internal error: Oops: 0000000096000005 [#1] SMP
00693 Modules linked in:
00693 CPU: 0 UID: 0 PID: 4307 Comm: cat Not tainted 6.16.0-rc2-ktest-g9e15af94fd86 #27578 NONE
00693 Hardware name: linux,dummy-virt (DT)
00693 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00693 pc : six_lock_counts+0x20/0xe8
00693 lr : bch2_btree_bkey_cached_common_to_text+0x38/0x130
00693 sp : ffffff80ca98bb60
00693 x29: ffffff80ca98bb60 x28: 000000008fb96200 x27: 0000000000000007
00693 x26: ffffff80eafd06b8 x25: 0000000000000000 x24: ffffffc080d75a60
00693 x23: ffffff80eafd0000 x22: ffffffc080bdfcc0 x21: ffffff80eafd0210
00693 x20: ffffff80c192ff08 x19: 000000008fb96200 x18: 00000000ffffffff
00693 x17: 0000000000000000 x16: 0000000000000000 x15: 00000000ffffffff
00693 x14: 0000000000000000 x13: ffffff80ceb5a29a x12: 20796220646c6568
00693 x11: 72205d3e303c5b20 x10: 0000000000000020 x9 : ffffffc0805fb6b0
00693 x8 : 0000000000000020 x7 : 0000000000000000 x6 : 0000000000000020
00693 x5 : ffffff80ceb5a29c x4 : 0000000000000001 x3 : 000000000000029c
00693 x2 : 0000000000000000 x1 : ffffff80ef66c000 x0 : 000000008fb96200
00693 Call trace:
00693  six_lock_counts+0x20/0xe8 (P)
00693  bch2_btree_bkey_cached_common_to_text+0x38/0x130
00693  bch2_btree_trans_to_text+0x260/0x2a8
00693  bch2_btree_transactions_read+0xac/0x1e8
00693  full_proxy_read+0x74/0xd8
00693  vfs_read+0x90/0x300
00693  ksys_read+0x6c/0x108
00693  __arm64_sys_read+0x20/0x30
00693  invoke_syscall.constprop.0+0x54/0xe8
00693  do_el0_svc+0x44/0xc8
00693  el0_svc+0x18/0x58
00693  el0t_64_sync_handler+0x104/0x130
00693  el0t_64_sync+0x154/0x158
00693 Code: 910003fd f9423c22 f90017e2 d2800002 (f9400c01)
00693 ---[ end trace 0000000000000000 ]---

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-05bcachefs: btree read retry fixesKent Overstreet
Fix btree node read retries after validate errors: __btree_err() is the wrong place to flag a topology error: that is done by btree_lost_data(). Additionally, some calls to bch2_bkey_pick_read_device() were not updated in the 6.16 rework for improved log messages; we were failing to signal that we still had a retry. Cc: Nikita Ofitserov <himikof@gmail.com> Cc: Alan Huang <mmpgouride@gmail.com> Reported-and-tested-by: Edoardo Codeglia <bcachefs@404.blue> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-05bcachefs: btree node scan no longer uses btree cacheKent Overstreet
Previously, btree node scan used the btree node cache to check if btree nodes were readable, but this is subject to interference from threads scanning different devices trying to read the same node - and more critically, nodes that we already attempted and failed to read before kicking off scan. Instead, we now allocate a 'struct btree' that does not live in the btree node cache, and call bch2_btree_node_read_done() directly. Cc: Nikita Ofitserov <himikof@gmail.com> Reviewed-by: Nikita Ofitserov <himikof@gmail.com> Reported-and-tested-by: Edoardo Codeglia <bcachefs@404.blue> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-04bcachefs: Tweak btree cache helpers for use by btree node scanKent Overstreet
btree node scan needs to not use the btree node cache: that causes interference from prior failed reads and parallel workers. Instead we need to allocate btree nodes that don't live in the btree cache, so that we can call bch2_btree_node_read_done() directly. This patch tweaks the low level helpers so they don't touch the btree cache lists. Cc: Nikita Ofitserov <himikof@gmail.com> Reviewed-by: Nikita Ofitserov <himikof@gmail.com> Reported-and-tested-by: Edoardo Codeglia <bcachefs@404.blue> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-04bcachefs: Fix btree for nonexistent tree depthKent Overstreet
The fix for when we should increase tree depth in journal replay was entirely bogus. We should only increase the tree depth in journal replay when recovering from btree node scan, and then only for keys found by btree node scan. This needs additional work - we should be shooting down existing interior node pointers when recovering from scan; they shouldn't be showing up here. Fixes: b47a82ff4772 ("bcachefs: Only run 'increase_depth' for keys from btree node csan") Cc: Alan Huang <mmpgouride@gmail.com> Reported-by: syzbot+8deb6ff4415db67a9f18@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-04bcachefs: Fix bch2_io_failures_to_text()Kent Overstreet
This wasn't updated when we added tracking for btree validate errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-04bcachefs: bch2_fpunch_snapshot()Kent Overstreet
Add a new version of fpunch for operating on a snapshot ID, not a subvolume - and use it for "extent past end of inode" repair. Previously, repair would try to delete everything at once, but deleting too many extents at once can overflow the btree_trans bump allocator, as well as causing other problems - the new helper properly uses bch2_extent_trim_atomic(). Reported-and-tested-by: Edoardo Codeglia <bcachefs@404.blue> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-01bcachefs: opts.casefold_disabledKent Overstreet
Add an option for completely disabling casefolding on a filesystem, as a workaround for overlayfs. This should only be needed as a temporary workaround, until the overlayfs fix arrives. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-01bcachefs: Work around deadlock to btree node rewrites in journal replayKent Overstreet
Don't mark btree nodes for rewrites, if they are or would be degraded, if journal replay hasn't finished, to avoid a deadlock. This is because btree node rewrites generate more updates for the interior updates (alloc, backpointers), and if those updates touch new nodes and generate more rewrites - we can only have so many interior btree updates in flight before we deadlock on open_buckets. The biggest cause is that we don't use the btree write buffer (for the backpointer updates) - this needs some real thought on locking in order to fix. The problem with this workaround (not doing the rewrite for degraded nodes in journal replay) is that those degraded nodes persist, and we don't want that (this is a real bug when a btree node write completes with fewer replicas than we wanted and leaves a degraded node due to device _removal_, i.e. the device went away mid write). It's less of a bug here, but still a problem because we don't yet have a way of tracking degraded data - we need another index (all extents/btree nodes, by replicas entry) in order to fix properly (re-replicate degraded data at the earliest possible time). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-30bcachefs: Fix incorrect transaction restart handlingAlan Huang
Reported-by: syzbot+cc7567f096079cb4146f@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-29bcachefs: fix btree_trans_peek_prev_journal()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-27bcachefs: mark invalid_btree_id autofixBharadwaj Raju
Checking for invalid IDs was introduced in 9e7cfb35e266 ("bcachefs: Check for invalid btree IDs") to prevent an invalid shift later, but since 141526548052 ("bcachefs: Bad btree roots are now autofix") which made btree_root_bkey_invalid autofix, the fsck_err_on call didn't do anything. We can mark this err type (invalid_btree_id) autofix as well, so it gets handled. Reported-by: syzbot+029d1989099aa5ae3e89@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=029d1989099aa5ae3e89 Fixes: 141526548052 ("bcachefs: Bad btree roots are now autofix") Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-26bcachefs: Plumb correct ip to trans_relock_fail tracepointKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-26bcachefs: Ensure we rewind to run recovery passesKent Overstreet
Fix a 6.16 regression from the recovery pass rework, which introduced a bug where calling bch2_run_explicit_recovery_pass() would only return the error code to rewind recovery for the first call that scheduled that recovery pass. If the error code from the first call was swallowed (because it was called by an asynchronous codepath), subsequent calls would go "ok, this pass is already marked as needing to run" and return 0. Fixing this ensures that check_topology bails out to run btree_node_scan before doing any repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
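The regression described above can be illustrated with a toy model. Everything here is an assumption made for illustration: the state struct, the error value, and the rule that a pass earlier than the current one requires a rewind. It is not the code behind bch2_run_explicit_recovery_pass().

```c
#include <assert.h>
#include <stdint.h>

#define ERR_RESTART_RECOVERY 1000	/* illustrative error value */

/* Toy recovery state: bitmask of scheduled passes, plus the current pass. */
struct recovery_state {
	uint64_t passes_to_run;
	unsigned curr_pass;
};

/* Buggy shape: only the call that first schedules the pass reports a rewind. */
int run_pass_first_call_only(struct recovery_state *r, unsigned pass)
{
	if (r->passes_to_run & (1ULL << pass))
		return 0;	/* "already marked as needing to run" */

	r->passes_to_run |= 1ULL << pass;
	return pass < r->curr_pass ? -ERR_RESTART_RECOVERY : 0;
}

/* Fixed shape: every caller learns that recovery has to rewind. */
int run_pass_every_call(struct recovery_state *r, unsigned pass)
{
	r->passes_to_run |= 1ULL << pass;
	return pass < r->curr_pass ? -ERR_RESTART_RECOVERY : 0;
}
```

In the buggy shape, if the first caller is asynchronous and drops the return value, a later synchronous caller sees 0 and carries on with repair instead of bailing out.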
2025-06-26bcachefs: Ensure btree node scan runs before checking for scanned nodesKent Overstreet
Previously, calling bch2_btree_has_scanned_nodes() when btree node scan hadn't actually run would erroneously return false - causing us to think a btree was entirely gone. This fixes a 6.16 regression from moving the scheduling of btree node scan out of bch2_btree_lost_data() (fixing the bug where we'd schedule it persistently in the superblock) and only scheduling it when check_topology() is asking for scanned btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-26bcachefs: btree_root_unreadable_and_scan_found_nothing should not be autofixKent Overstreet
Autofix is specified in btree_gc.c if it's not an important btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24bcachefs: fix bch2_journal_keys_peek_prev_min() underflowKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24bcachefs: Use wait_on_allocator() when allocating journalKent Overstreet
wait_on_allocator() emits debug info when we hang trying to allocate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24bcachefs: Check for bad write buffer key when moving from journalKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blockedAlan Huang
Reported-by: syzbot+d540192e763531d307ff@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-22bcachefs: Fix range in bch2_lookup_indirect_extent() error pathKent Overstreet
Before calling bch2_indirect_extent_missing_error(), we have to calculate the missing range, which is the intersection of the reflink pointer and the non-indirect-extent we found. The calculation didn't take into account that the returned extent may span the iter position, leading to an infinite loop when we (unnecessarily) resized the extent we were returning to one that didn't extend past the offset we were looking up. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
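The "missing range" computation above is an interval intersection. A minimal sketch, assuming half-open [start, end) offsets and a hypothetical struct range type that does not exist in bcachefs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open range [start, end), for illustration only. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Writes the overlap of a and b to *out; returns false if they are disjoint. */
bool range_intersect(struct range a, struct range b, struct range *out)
{
	uint64_t start = a.start > b.start ? a.start : b.start;
	uint64_t end   = a.end < b.end ? a.end : b.end;

	if (start >= end)
		return false;

	out->start = start;
	out->end   = end;
	return true;
}
```

An extent found at [50, 150) intersected with a reflink pointer covering [100, 200) gives [100, 150); because the found extent starts before the lookup position, the start has to be clamped rather than taken from the extent as-is.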
2025-06-22bcachefs: fix spurious error_throwKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-22bcachefs: Add missing bch2_err_class() to fileattr_set()Kent Overstreet
Make sure we return a standard error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-19bcachefs: Add missing key type checks to check_snapshot_exists()Kent Overstreet
For now we only have one key type in these btrees, but forward compatibility means we do have to check. Reported-by: syzbot+b4cb4a6988aced0cec4b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-19bcachefs: Don't log fsck err in the journal if doing repair elsewhereKent Overstreet
This fixes exceeding the bump allocator limit when the allocator finds many buckets that need repair - they're repaired asynchronously, which means that every error logged a message in the bump allocator, without committing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-19bcachefs: Fix *__bch2_trans_subbuf_alloc() error pathKent Overstreet
Don't change buf->size on error - this would usually be a transaction restart, but it could also be -ENOMEM (when we've exceeded the bump allocator max). Fixes: 247abee6ae6d ("bcachefs: btree_trans_subbuf") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
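The rule above is the usual "commit state only on success" pattern. A userspace sketch, assuming a hypothetical struct subbuf and realloc() in place of the transaction bump allocator:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a growable sub-buffer; not the kernel struct. */
struct subbuf {
	void	*data;
	size_t	size;
};

/* Grow buf to new_size; on any failure buf->data and buf->size are untouched. */
int subbuf_grow(struct subbuf *buf, size_t new_size, size_t max_size)
{
	void *p;

	if (new_size > max_size)
		return -ENOMEM;		/* over the allocator limit */

	p = realloc(buf->data, new_size);
	if (!p)
		return -ENOMEM;		/* old allocation is still valid */

	/* Only now is it safe to publish the new pointer and size. */
	buf->data = p;
	buf->size = new_size;
	return 0;
}
```

If the size were updated before the allocation succeeded, a caller retrying after the error would believe it owned more memory than it does.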
2025-06-17bcachefs: Fix missing newlines before eroKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-17bcachefs: fix spurious error in read_btree_roots()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-17bcachefs: fsck: Fix oops in key_visible_in_snapshot()Kent Overstreet
The normal fsck code doesn't call key_visible_in_snapshot() with an empty list of snapshot IDs seen (the current snapshot ID will always be on the list), but str_hash_repair_key() -> bch2_get_snapshot_overwrites() can, and that's totally fine as long as we check for it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-17bcachefs: fsck: fix unhandled restart in topology repairKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-17bcachefs: fsck: Fix check_directory_structure when no check_direntsKent Overstreet
check_directory_structure runs after check_dirents, so it expects that it won't see any inodes with missing backpointers - normally. But online fsck can't run check_dirents yet, or the user might only be running a specific pass, so we need to be careful that this isn't an error. If an inode is unreachable, that's handled by a separate pass. Also, add a new 'bch2_inode_has_backpointer()' helper, since we were doing this inconsistently. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-17bcachefs: Fix restart handling in btree_node_scrub_work()Kent Overstreet
btree node scrub was sometimes failing to rewrite nodes with errors; bch2_btree_node_rewrite() can return a transaction restart and we weren't checking - the lockrestart_do() needs to wrap the entire operation. And there's a better helper it should've been using, bch2_btree_node_rewrite_key(), which makes all this more convenient. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
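The point about wrapping the entire operation can be sketched with a generic retry loop. The error value, the callback type, and both function names are hypothetical, and the real lockrestart_do() macro does more than loop (this sketch only shows the control flow).

```c
#include <assert.h>

#define ERR_TRANSACTION_RESTART 2000	/* illustrative error value */

typedef int (*op_fn)(void *ctx);

/* Re-run the whole operation for as long as it asks to be restarted. */
int retry_on_restart(op_fn op, void *ctx)
{
	int ret;

	do {
		ret = op(ctx);
	} while (ret == -ERR_TRANSACTION_RESTART);

	return ret;
}

/* Example operation: requests a restart until the counter reaches zero. */
int flaky_op(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -ERR_TRANSACTION_RESTART;
	}
	return 0;
}
```

If only part of the operation sits inside the loop, a restart returned by a later step escapes to the caller, which matches the symptom of scrub sometimes failing to rewrite nodes.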
2025-06-16bcachefs: Fix bch2_read_bio_to_text()Kent Overstreet
We can only pass negative error codes to bch2_err_str(); if it's a positive integer it's not an error and we trip an assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
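A sketch of the guard this implies, using strerror() as a userspace stand-in for an error-string helper that only accepts negative codes. The function name and the returned strings are assumptions, not what bch2_read_bio_to_text() prints.

```c
#include <assert.h>
#include <string.h>

/* Only negative values are errors; anything else must not reach the lookup. */
const char *describe_ret(int ret)
{
	if (ret < 0)
		return strerror(-ret);

	return ret ? "not an error (positive value)" : "success";
}
```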
2025-06-16bcachefs: fsck: Fix check_path_loop() + snapshotsKent Overstreet
A path exists in a particular snapshot: we should do the pathwalk in the snapshot ID of the inode we started from, _not_ change snapshot ID as we walk inodes and dirents. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16bcachefs: fsck: check_subdir_count logs pathKent Overstreet
We can easily go from inode number -> path now, which makes for more useful log messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16bcachefs: fsck: additional diagnostics for reattach_inode()Kent Overstreet
Log the inode's new path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16bcachefs: fsck: check_directory_structure runs in reverse orderKent Overstreet
When we find a directory connectivity problem, we should do the repair in the oldest snapshot that has the issue - so that we don't end up duplicating work or making a real mess of things. Oldest snapshot IDs have the highest integer value, so - just walk inodes in reverse order. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
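Since the message states that the oldest snapshots have the highest IDs, visiting an ascending-sorted set of IDs in reverse yields oldest first. A minimal sketch with a plain array standing in for the inodes btree; the function name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy IDs from an ascending-sorted array into out[], highest (oldest) first. */
void visit_oldest_first(const uint32_t *ids_ascending, size_t nr, uint32_t *out)
{
	/* i-- > 0 walks nr-1 down to 0 without underflowing the unsigned index. */
	for (size_t i = nr; i-- > 0;)
		*out++ = ids_ascending[i];
}
```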