path: root/fs/bcachefs/move.c
Age | Commit message | Author
2025-06-02 | bcachefs: bch_err_throw() | Kent Overstreet
Add a tracepoint for any time we return an error and unwind. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
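A minimal userspace sketch of the idea, assuming every error return is routed through a single macro so that one hook sees each error as it is thrown. Here a counter and a stderr message stand in for the tracepoint. The names err_throw, demo_err_throw and DEMO_ERR_* are hypothetical and are not the bcachefs API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error codes standing in for the BCH_ERR_* namespace. */
enum demo_err {
	DEMO_ERR_nospc = 1,
	DEMO_ERR_transaction_restart,
};

/* Stand-in for the tracepoint: record what was thrown and where. */
static int         last_err;
static const char *last_fn;
static unsigned    nr_throws;

static int demo_err_throw(const char *fn, int err)
{
	last_err = err;
	last_fn  = fn;
	nr_throws++;
	fprintf(stderr, "%s: error %d\n", fn, err);
	return -err;
}

/* Every error return goes through this macro, so one hook sees them all. */
#define err_throw(e) demo_err_throw(__func__, DEMO_ERR_##e)

static int demo_alloc_sectors(unsigned want, unsigned avail)
{
	if (want > avail)
		return err_throw(nospc);
	return 0;
}
```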
2025-06-01 | bcachefs: Replace rcu_read_lock() with guards | Kent Overstreet
The new guard(), scoped_guard() allow for more natural code. Some of the uses with creative flow control have been left. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
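The kernel's guard() helpers in <linux/cleanup.h> are built on the compiler's cleanup attribute, which makes the unlock run automatically when the guard variable leaves scope, including on early returns. Below is a rough userspace sketch of the guard() half only. It uses a fake lock counter in place of rcu_read_lock(), and demo_guard() and its helpers are hypothetical names. It requires GCC or Clang.

```c
#include <assert.h>

/* Fake lock so the sketch runs single-threaded; tracks nesting depth. */
static int lock_depth;

static void demo_lock(void)   { lock_depth++; }
static void demo_unlock(void) { assert(lock_depth > 0); lock_depth--; }

/* Cleanup handler: the compiler calls it with a pointer to the guard variable. */
static void demo_guard_release(int *unused)
{
	(void)unused;
	demo_unlock();
}

#define DEMO_CONCAT2(a, b) a##b
#define DEMO_CONCAT(a, b)  DEMO_CONCAT2(a, b)

/* Lock now; unlock automatically when the enclosing scope is left. */
#define demo_guard()							\
	int DEMO_CONCAT(demo_guard_var_, __LINE__)			\
		__attribute__((cleanup(demo_guard_release), unused)) =	\
		(demo_lock(), 0)

static int demo_lookup(int key)
{
	demo_guard();
	assert(lock_depth == 1);	/* held for the rest of the function */
	if (key < 0)
		return -1;		/* early return still unlocks */
	return key * 2;
}
```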
2025-05-31 | bcachefs: Don't unlock trans before data_update_init() | Kent Overstreet
data_update_init() does need to do btree operations, delay doing the unlock-before-io. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 | bcachefs: Catch data_update_done events in trace_io_move_start_fail | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 | bcachefs: io_move_evacuate_bucket tracepoint, counter | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 | bcachefs: trace_io_move_pred | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 | bcachefs: async objs now support bch_write_ops | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 | bcachefs: bch2_check_bucket_backpointer_mismatch() | Kent Overstreet
Detect buckets with missing backpointers, and run repair on demand. __bch2_move_data_phys() now calls bch2_check_bucket_backpointer_mismatch() as it walks buckets, which checks for missing backpointers by comparing backpointers against bucket sector counts. When missing backpointers are detected, we kick off bch2_check_extents_to_backpointers() asynchronously - right away if we're trying to evacuate, or with a threshold if we're just running copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
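The check described above works in two steps. First it compares the sectors accounted for by a bucket's backpointers against the bucket's own sector count. Then it decides whether to start repair immediately (when evacuating) or only once a threshold is reached (when running copygc). The sketch below is simplified, and its structs, function names and threshold parameter are hypothetical. The real code walks btree keys, not arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bucket state. */
struct demo_bucket {
	unsigned dirty_sectors;	/* live sectors according to the alloc info */
};

/* Hypothetical backpointer: which bucket it points into, and how much. */
struct demo_bp {
	unsigned bucket;
	unsigned sectors;
};

/* True if the backpointers account for fewer sectors than the bucket holds. */
static bool demo_bucket_bp_missing(const struct demo_bucket *b, unsigned bucket,
				   const struct demo_bp *bps, size_t nr)
{
	unsigned bp_sectors = 0;

	for (size_t i = 0; i < nr; i++)
		if (bps[i].bucket == bucket)
			bp_sectors += bps[i].sectors;

	return bp_sectors < b->dirty_sectors;
}

/* Evacuate: repair right away. Copygc: wait until enough has accumulated. */
static bool demo_should_start_repair(bool evacuating, unsigned mismatched_sectors,
				     unsigned threshold)
{
	return evacuating || mismatched_sectors >= threshold;
}
```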
2025-05-21 | bcachefs: do_rebalance_scan() now only updates bch_extent_rebalance | Kent Overstreet
This ensures that our pending rebalance work accounting is accurate quickly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 | bcachefs: kill move_bucket_in_flight | Kent Overstreet
Small cleanup/simplification, and prep work for the next patch, which will add checking if buckets don't get evacuated because they're missing backpointers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 | bcachefs: kill dead code in move_data_phys() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 | bcachefs: bch2_move_data_btree() can now walk roots | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 | bcachefs: bch2_move_data_btree() can move btree nodes | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 | bcachefs: plumb btree_id through move_pred_fd | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 | bcachefs: Plumb target parameter through btree_node_rewrite_pos() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 | bcachefs: export bch2_move_data_phys() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 | bcachefs: move_data_phys: stats are not required | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 | bcachefs: Data move can read from poisoned extents | Kent Overstreet
Now, if an extent is poisoned we can move it even if there was a checksum error. We'll have to give it a new checksum, but the poison bit means that userspace will still see the appropriate error when they try to read it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
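The sketch below illustrates the decision described above, using a toy checksum and a simplified extent struct (all names hypothetical). An unpoisoned extent with a checksum error is left in place. A poisoned extent is moved, given a fresh checksum, and keeps its poison bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extent metadata relevant to the decision. */
struct demo_extent {
	uint32_t csum;		/* checksum stored for the data */
	bool     poisoned;	/* reads must return an error to userspace */
};

/* Toy checksum for illustration only; real code uses e.g. crc32c. */
static uint32_t demo_csum(const unsigned char *buf, unsigned len)
{
	uint32_t c = 0;

	for (unsigned i = 0; i < len; i++)
		c = c * 31 + buf[i];
	return c;
}

/* Returns 0 if the extent may be moved, -1 if it must be left in place. */
static int demo_move_extent(struct demo_extent *e, const unsigned char *data,
			    unsigned len)
{
	uint32_t actual = demo_csum(data, len);

	if (actual != e->csum && !e->poisoned)
		return -1;	/* checksum error and not poisoned: leave it */

	e->csum = actual;	/* new checksum matching the data we carry over */
	/* e->poisoned stays set, so userspace reads still fail */
	return 0;
}
```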
2025-05-07 | bcachefs: Filter out harmless EROFS error messages | Kent Overstreet
These just indicate that we're shutting down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 | bcachefs: Kill btree_iter.trans | Kent Overstreet
This was planned to be done ages ago, now finally completed; there are places where we have quite a few btree_trans objects on the stack, so this reduces stack usage somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-26 | bcachefs: Fix btree iter flags in data move (2) | Kent Overstreet
Data move -> move_get_io_opts -> bch2_get_update_rebalance_opts requires a not_extents iterator; this fixes the path where we're walking the extents btree and chase a reflink pointer into the reflink btree. bch2_lookup_indirect_extent() requires working with an extents iterator (due to peek_slot() semantics), so we implement bch2_lookup_indirect_extent_for_move(). This is simplified because there's no need to report indirect_extent_missing_errors here, that can be deferred until fsck or when a user reads that data. Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 | bcachefs: Fix btree iter flags in data move | Kent Overstreet
Rebalance requires a not_extents iterator. This wasn't hit before because all_snapshots disables is_extents on snapshot btrees - but has no effect on the reflink btree. Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 | bcachefs: Fix offset_into_extent in data move path | Kent Overstreet
Fixes the following: [ 17.607394] kernel BUG at fs/bcachefs/reflink.c:261! [ 17.608316] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 17.608485] CPU: 0 UID: 0 PID: 564 Comm: bch-rebalance/3 Tainted: G OE 6.14.0-rc6-arch1-gfcb0bd9609d2 #7 0efd7a8f4a00afeb2c5fb6e7ecb1aec8ddcbb1e1 [ 17.608616] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 17.608736] Hardware name: Micro-Star International Co., Ltd. MS-7D75/MAG B650 TOMAHAWK WIFI (MS-7D75), BIOS 1.74 08/01/2023 [ 17.608855] RIP: 0010:bch2_lookup_indirect_extent+0x252/0x290 [bcachefs] [ 17.609006] Code: 00 00 00 00 e8 7f 51 f5 ff 89 c3 85 c0 74 52 48 8b 7d b0 4c 89 ee e8 4d 4b f4 ff 48 63 d3 48 89 d0 31 d2 e9 2e ff ff ff 0f 0b <0f> 0b 48 8b 7d b0 4c 89 ee 48 89 55 a8 e8 2c 4b f4 ff 4c 8b 55 a8 [ 17.609136] RSP: 0018:ffffa3714455f850 EFLAGS: 00010246 [ 17.609261] RAX: 0000000000000080 RBX: ffff895891098790 RCX: 0000000000000000 [ 17.609387] RDX: 0000000000000080 RSI: ffffa3714455fa90 RDI: ffff895889550000 [ 17.609511] RBP: ffffa3714455f8c0 R08: ffff895891098790 R09: 0000000000000001 [ 17.609637] R10: ffffa3714455f8d8 R11: ffffa3714455f950 R12: ffffa3714455fa58 [ 17.609763] R13: ffff895891098790 R14: ffffa3714455fa58 R15: ffff895889550000 [ 17.609888] FS: 0000000000000000(0000) GS:ffff896757c00000(0000) knlGS:0000000000000000 [ 17.610015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 17.610143] CR2: 0000716b8cda2750 CR3: 0000000914e22000 CR4: 0000000000f50ef0 [ 17.610272] PKRU: 55555554 [ 17.610403] Call Trace: [ 17.610535] <TASK> [ 17.610662] ? __die_body.cold+0x19/0x27 [ 17.610791] ? die+0x2e/0x50 [ 17.610918] ? do_trap+0xca/0x110 [ 17.611049] ? do_error_trap+0x6a/0x90 [ 17.611178] ? bch2_lookup_indirect_extent+0x252/0x290 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.611331] ? exc_invalid_op+0x50/0x70 [ 17.611468] ? bch2_lookup_indirect_extent+0x252/0x290 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.611620] ? asm_exc_invalid_op+0x1a/0x20 [ 17.611757] ? 
bch2_lookup_indirect_extent+0x252/0x290 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.611911] ? bch2_move_data_btree+0x58a/0x6c0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612084] bch2_move_data_btree+0x58a/0x6c0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612256] ? __pfx_rebalance_pred+0x10/0x10 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612431] ? bch2_move_extent+0x3d7/0x6e0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612607] ? __bch2_move_data+0xea/0x200 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612782] __bch2_move_data+0xea/0x200 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612959] ? __pfx_rebalance_pred+0x10/0x10 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.613149] do_rebalance+0x517/0x8d0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.613342] ? local_clock_noinstr+0xd/0xd0 [ 17.613518] ? local_clock+0x15/0x30 [ 17.613693] ? __bch2_trans_get+0x152/0x300 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.613890] ? __pfx_bch2_rebalance_thread+0x10/0x10 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.614090] bch2_rebalance_thread+0x66/0xb0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] The offset_into_extent bit was copied from the read path, but it's unnecessary here, where we always want to read and move the entire indirect extent, and it causes the assertion pop - because we're using a non-extents iterator, which always points to the end of the reflink pointer. Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 | bcachefs: trace_io_move_write_fail | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 | bcachefs: BCH_READ_data_update -> bch_read_bio.data_update | Kent Overstreet
Read flags are codepath dependent and change as they're passed around, while the fields in rbio._state are mostly fixed properties of that particular object. Losing track of BCH_READ_data_update would be bad, and previously it was not obvious if it was always correctly set in the rbio, so this is a safety cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
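This hypothetical sketch shows why a fixed property is safer as a field than as a flag bit. Per-call flags get recomputed as a read is retried or split, so a property stored only in them can silently be dropped. A field set once at init survives. The flag names and struct layout are illustrative and do not reflect bcachefs's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-call flags: recomputed as a read is split or retried. */
enum {
	DEMO_READ_retry_if_stale = 1U << 0,
	DEMO_READ_may_promote    = 1U << 1,
};

struct demo_rbio {
	unsigned flags;		/* how the current call should behave */
	bool     data_update;	/* what this rbio is; set once at init */
};

static void demo_rbio_init(struct demo_rbio *rbio, unsigned flags, bool data_update)
{
	rbio->flags       = flags;
	rbio->data_update = data_update;
}

/* A retry path builds fresh flags; the fixed field is untouched. */
static void demo_read_retry(struct demo_rbio *rbio)
{
	rbio->flags = DEMO_READ_retry_if_stale;
}
```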
2025-03-16 | bcachefs: BCH_ERR_data_read_buffer_too_small | Kent Overstreet
Now that the read path uses proper error codes, we can get rid of the weird rbio->hole signalling to the move path that the read didn't happen. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16 | bcachefs: rebalance, copygc status also print stacktrace | Kent Overstreet
These are commonly needed when debugging, and printing them saves us from having to ask users to dig for them. Also, rebalance_status now includes pending rebalance work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 | bcachefs: Kill a bit of dead code | Kent Overstreet
Found with CC=clang W=1 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 | bcachefs: bcachefs_metadata_version_stripe_backpointers | Kent Overstreet
Stripes now have backpointers. This is needed for proper scrub - stripe checksums need to be verified, separately from extents within the stripe, since a block may not be full of live extents but it's still needed for reconstruct. And this will be needed for (efficient) evacuate/repair paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 | bcachefs: Convert migrate to move_data_phys() | Kent Overstreet
Iterating over backpointers on a specific device is potentially much cheaper than walking all filesystem data. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 | bcachefs: Read/move path counter work | Kent Overstreet
Reorganize counters a bit, grouping related counters together. New counters:
 - io_read_inline
 - io_read_hole
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 | bcachefs: Scrub | Kent Overstreet
Add a new data op to walk all data and metadata in a filesystem, checking if it can be read successfully, and on error repairing from another copy if possible.
 - New helper: bch2_dev_idx_is_online(), so that we can bail out and report to userspace when we're unable to scrub because the device is offline
 - data_update_opts, which controls the data move path, now understands scrub: data is only read, not written. The read path is responsible for rewriting on read error, as with other reads.
 - scrub_pred skips data extents that don't have checksums
 - bch_ioctl_data has a new scrub member, which has a data_types field for data types to check - i.e. all data types, or only metadata.
 - Add new entries to bch_move_stats so that we can report numbers for corrected and uncorrected errors
 - Add a new enum to bch_ioctl_data_event for explicitly reporting completion and return code (i.e. device offline)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
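The selection rule in the list above can be sketched with a simplified key type and a data_types bitmask (names hypothetical). The predicate skips keys whose type was not requested. It also skips data extents with no checksummed replica, because a read cannot detect corruption in them. Treating btree nodes as always checkable is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical data_types bits, loosely modeled on the ioctl's field. */
enum { DEMO_TYPE_user = 1U << 0, DEMO_TYPE_btree = 1U << 1 };

struct demo_ptr { unsigned csum_type; };	/* 0 means "no checksum" */

struct demo_key {
	bool            is_btree;	/* metadata (btree node) vs. user data */
	unsigned        nr_ptrs;
	struct demo_ptr ptrs[4];
};

/* Select keys to scrub: requested type, and something to verify against. */
static bool demo_scrub_pred(const struct demo_key *k, unsigned data_types)
{
	if (!(data_types & (k->is_btree ? DEMO_TYPE_btree : DEMO_TYPE_user)))
		return false;
	if (k->is_btree)
		return true;	/* assumption of this sketch */
	for (unsigned i = 0; i < k->nr_ptrs; i++)
		if (k->ptrs[i].csum_type)
			return true;
	return false;		/* unchecksummed data: nothing to verify */
}
```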
2025-03-14 | bcachefs: __bch2_move_data_phys() now uses bch2_btree_node_rewrite_pos() | Kent Overstreet
Kill most of the separate logic for btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 | bcachefs: bch2_move_data_phys() | Kent Overstreet
Add a more general version of bch2_evacuate_bucket - to be used for scrub. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 | bcachefs: cleanup redundant code around data_update_op initialization | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 | bcachefs: bch2_update_unwritten_extent() no longer depends on wbio | Kent Overstreet
Prep work for improving bch2_data_update_init(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 | bcachefs: data_update now embeds bch_read_bio | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 | bcachefs: rbio_init() cleanup | Kent Overstreet
Move more initialization to rbio_init(), to assist in further cleanups. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 | bcachefs: x-macroize BCH_READ flags | Kent Overstreet
Will be adding a bch2_read_bio_to_text(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
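X-macros keep a flag enum and its name table generated from a single list, so a to_text() helper can never print a stale or missing name. The sketch below shows the pattern with a hypothetical three-flag list. The real BCH_READ_* set is different.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag list; the real BCH_READ_* list differs. */
#define DEMO_READ_FLAGS()	\
	x(retry_if_stale)	\
	x(may_promote)		\
	x(user_mapped)

/* Bit numbers: DEMO_READ_BIT_retry_if_stale = 0, ... */
enum demo_read_flag_bits {
#define x(n) DEMO_READ_BIT_##n,
	DEMO_READ_FLAGS()
#undef x
};

/* Masks derived from the bit numbers. */
enum demo_read_flags {
#define x(n) DEMO_READ_##n = 1U << DEMO_READ_BIT_##n,
	DEMO_READ_FLAGS()
#undef x
};

/* Names generated from the same list, so they cannot drift from the enum. */
static const char * const demo_read_flag_names[] = {
#define x(n) #n,
	DEMO_READ_FLAGS()
#undef x
	NULL
};

/* Print set flags as a comma-separated list into buf (size must be > 0). */
static void demo_read_flags_to_text(char *buf, size_t size, unsigned flags)
{
	size_t pos = 0;

	buf[0] = '\0';
	for (unsigned i = 0; demo_read_flag_names[i] && pos < size; i++) {
		if (!(flags & (1U << i)))
			continue;
		int n = snprintf(buf + pos, size - pos, "%s%s",
				 pos ? "," : "", demo_read_flag_names[i]);
		if (n < 0)
			break;
		pos += (size_t)n;	/* pos >= size ends the loop on truncation */
	}
}
```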
2025-03-14 | bcachefs: bch2_data_update_inflight_to_text() | Kent Overstreet
Add a new helper for bch2_moving_ctxt_to_text(), which may be used to debug if moving_ios are getting stuck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-20 | Merge tag 'for-6.14/block-20250118' of git://git.kernel.dk/linux | Linus Torvalds
Pull block updates from Jens Axboe:
 - NVMe pull requests via Keith:
    - Target support for PCI-Endpoint transport (Damien)
    - TCP IO queue spreading fixes (Sagi, Chaitanya)
    - Target handling for "limited retry" flags (Guixen)
    - Poll type fix (Yongsoo)
    - Xarray storage error handling (Keisuke)
    - Host memory buffer free size fix on error (Francis)
 - MD pull requests via Song:
    - Reintroduce md-linear (Yu Kuai)
    - md-bitmap refactor and fix (Yu Kuai)
    - Replace kmap_atomic with kmap_local_page (David Reaver)
 - Quite a few queue freeze and debugfs deadlock fixes
   Ming introduced lockdep support for this in the 6.13 kernel, and it has (unsurprisingly) uncovered quite a few issues
 - Use const attributes for IO schedulers
 - Remove bio ioprio wrappers
 - Fixes for stacked device atomic write support
 - Refactor queue affinity helpers, in preparation for better supporting isolated CPUs
 - Cleanups of loop O_DIRECT handling
 - Cleanup of BLK_MQ_F_* flags
 - Add rotational support for null_blk
 - Various fixes and cleanups
* tag 'for-6.14/block-20250118' of git://git.kernel.dk/linux: (106 commits)
  block: Don't trim an atomic write
  block: Add common atomic writes enable flag
  md/md-linear: Fix a NULL vs IS_ERR() bug in linear_add()
  block: limit disk max sectors to (LLONG_MAX >> 9)
  block: Change blk_stack_atomic_writes_limits() unit_min check
  block: Ensure start sector is aligned for stacking atomic writes
  blk-mq: Move more error handling into blk_mq_submit_bio()
  block: Reorder the request allocation code in blk_mq_submit_bio()
  nvme: fix bogus kzalloc() return check in nvme_init_effects_log()
  md/md-bitmap: move bitmap_{start, end}write to md upper layer
  md/raid5: implement pers->bitmap_sector()
  md: add a new callback pers->bitmap_sector()
  md/md-bitmap: remove the last parameter for bimtap_ops->endwrite()
  md/md-bitmap: factor behind write counters out from bitmap_{start/end}write()
  md: Replace deprecated kmap_atomic() with kmap_local_page()
  md: reintroduce md-linear
  partitions: ldm: remove the initial kernel-doc notation
  blk-cgroup: rwstat: fix kernel-doc warnings in header file
  blk-cgroup: fix kernel-doc warnings in header file
  nbd: fix partial sending
  ...
2024-12-29 | bcachefs: Option changes now get propagated to reflinked data | Kent Overstreet
Now that bch2_move_get_io_opts() re-propagates changed inode io options to bch_extent_rebalance, we can properly support changing IO path options for reflinked data. Changing a per-file IO path option, either via the xattr interface or via the BCHFS_IOC_REINHERIT_ATTRS ioctl, will now trigger a scan (the inode number is marked as needing a scan, via bch2_set_rebalance_needs_scan()), and rebalance will use bch2_move_data(), which will walk the inode number and pick up the new options. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29 | bcachefs: Add write buffer flush param to backpointer_get_key() | Kent Overstreet
In an upcoming patch bch2_backpointer_get_key() will be repairing when it finds a dangling backpointer; it will need to flush the btree write buffer before it can definitively say there's an error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
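The sketch below models the rule described above under a simplifying assumption: a backpointer's deletion may still be sitting in the write buffer (all names are hypothetical). A backpointer whose target is missing is reported as dangling only after a flush has ruled out a pending update.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one backpointer and its target. */
struct demo_state {
	bool     bp_exists;		/* backpointer visible in the btree */
	bool     bp_delete_pending;	/* its deletion still queued in the write buffer */
	bool     target_exists;		/* the extent/node it points to */
	unsigned nr_flushes;
};

enum demo_bp_ret { DEMO_BP_OK, DEMO_BP_GONE, DEMO_BP_DANGLING };

static void demo_write_buffer_flush(struct demo_state *s)
{
	s->nr_flushes++;
	if (s->bp_delete_pending) {
		s->bp_exists = false;
		s->bp_delete_pending = false;
	}
}

/* Only call a backpointer dangling once a flush has ruled out a pending update. */
static enum demo_bp_ret demo_bp_get_key(struct demo_state *s, bool *flushed)
{
	if (!s->bp_exists)
		return DEMO_BP_GONE;
	if (s->target_exists)
		return DEMO_BP_OK;
	if (!*flushed) {
		demo_write_buffer_flush(s);
		*flushed = true;
		return demo_bp_get_key(s, flushed);
	}
	return DEMO_BP_DANGLING;
}
```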
2024-12-23 | block: Delete bio_set_prio() | John Garry
Since commit 43b62ce3ff0a ("block: move bio io prio to a new field"), macro bio_set_prio() does nothing but set bio->bi_ioprio. All other places just set bio->bi_ioprio directly, so replace bio_set_prio() remaining callsites with setting bio->bi_ioprio directly and delete that macro. Signed-off-by: John Garry <john.g.garry@oracle.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241202111957.2311683-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-21 | bcachefs: Don't BUG_ON() inode unpack error | Kent Overstreet
Bkey validation checks that inodes are well-formed and unpack successfully, so an unpack error should always indicate memory corruption or some other kind of hardware bug - but these are still errors we can recover from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 | bcachefs: Advance to next bp on BCH_ERR_backpointer_to_overwritten_btree_node | Kent Overstreet
Don't spin. Fixes: de95cc201a97 ("bcachefs: Kill bch2_get_next_backpointer()") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
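This sketch shows the general shape of the fix, using a toy array of per-backpointer results (names hypothetical). On the benign error the loop advances past the backpointer rather than retrying it. Retrying the same backpointer is what would otherwise spin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BCH_ERR_backpointer_to_overwritten_btree_node. */
enum { DEMO_ERR_bp_to_overwritten_node = 1 };

/*
 * Walk a bucket's backpointers; results[i] is the toy outcome of moving
 * the i-th one (0 = moved). Returns the first hard error, or 0.
 */
static int demo_evacuate(const int *results, size_t nr, size_t *nr_moved)
{
	size_t i = 0;

	*nr_moved = 0;
	while (i < nr) {
		int ret = results[i];

		if (ret == DEMO_ERR_bp_to_overwritten_node) {
			i++;		/* advance: retrying the same bp would spin */
			continue;
		}
		if (ret)
			return ret;
		(*nr_moved)++;
		i++;
	}
	return 0;
}
```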
2024-12-21 | bcachefs: bucket_pos_to_bp_end() | Kent Overstreet
Better helpers for iterating over backpointers within a specific bucket Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
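With start and end helpers in place, iterating over one bucket's backpointers becomes a range scan over sorted keys. The sketch below assumes a hypothetical key space in which a backpointer's position is its sector offset on the device. demo_bucket_to_bp_start() and demo_bucket_to_bp_end() are illustrative stand-ins for helpers like bucket_pos_to_bp_end(), and a sorted array stands in for the backpointers btree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key space: a bp's position is its sector offset on the device. */
static uint64_t demo_bucket_to_bp_start(uint64_t bucket, uint64_t bucket_size)
{
	return bucket * bucket_size;
}

static uint64_t demo_bucket_to_bp_end(uint64_t bucket, uint64_t bucket_size)
{
	return demo_bucket_to_bp_start(bucket + 1, bucket_size) - 1;	/* inclusive */
}

/* First index with pos >= key in a sorted array (what a btree lookup gives). */
static size_t demo_lower_bound(const uint64_t *pos, size_t nr, uint64_t key)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pos[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Count one bucket's backpointers by plain iteration over [start, end]. */
static size_t demo_bucket_nr_bps(const uint64_t *pos, size_t nr,
				 uint64_t bucket, uint64_t bucket_size)
{
	uint64_t end = demo_bucket_to_bp_end(bucket, bucket_size);
	size_t n = 0;

	for (size_t i = demo_lower_bound(pos, nr, demo_bucket_to_bp_start(bucket, bucket_size));
	     i < nr && pos[i] <= end; i++)
		n++;
	return n;
}
```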
2024-12-21 | bcachefs: Fix evacuate_bucket tracepoint | Kent Overstreet
86a494c8eef9 ("bcachefs: Kill bch2_get_next_backpointer()") dropped some things the tracepoint emitted because bch2_evacuate_bucket() no longer looks at the alloc key - but we did want at least some of that. We still no longer look at the alloc key so we can't report on the fragmentation number, but that's a direct function of dirty_sectors and a copygc concern anyways - copygc should get its own tracepoint that includes information from the fragmentation LRU. But we can report on the number of sectors we moved and the bucket size. Co-developed-by: Piotr Zalewski <pZ010001011111@proton.me> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 | bcachefs: Kill bch2_get_next_backpointer() | Kent Overstreet
Since for quite some time backpointers have only been stored in the backpointers btree, not alloc keys (an aborted experiment, support for which has been removed) - we can replace get_next_backpointer() with simple btree iteration. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 | bcachefs: Fix unhandled transaction restart in evacuate_bucket() | Kent Overstreet
Generally, releasing a transaction within a transaction restart means an unhandled transaction restart: but this can happen legitimately within the move code, e.g. when bch2_move_ratelimit() tells us to exit before we've retried. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>