Age | Commit message (Collapse) | Author |
|
fsck_err() can return a transaction restart if passed a transaction
object - this has always been true when it has to drop locks to prompt
for user input, but we're seeing this more now that we're logging the
error being corrected in the journal.
gc_accounting_done() doesn't call fsck_err() from an actual commit loop,
and it doesn't need to be holding btree locks when it calls fsck_err(),
so the easy fix here for the unhandled transaction restart is to just
not pass it the transaction object. We'll miss out on the fancy new
logging, but that's ok.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a tracepoint for any time we return an error and unwind.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The new guard(), scoped_guard() allow for more natural code.
Some of the uses with creative flow control have been left.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It used to be that we had a fixed maximum number of btree paths to work
with - 64.
That's no longer the case, so bch2_extent_atomic_end() doesn't have to
be as strict.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Accounting has gotten quite heavy, and there's lots of redundancy in
accounting updates within a transaction, as we often add/delete multiple
extents that touch the same accountign counters.
This will reduce the amount of data that we journal, and reduce pressure
downstream on the btree write buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There can be a lot of rendundancy in accounting updates within a single
btree transaction.
Split out accounting updates so that they can be deduped, in the next
commit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Refactor a couple of structs that contain flexible arrays in the
middle by replacing them with unions.
So, with these changes, fix the following warnings:
fs/bcachefs/disk_accounting.c:429:51: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
fs/bcachefs/ec_types.h:8:41: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This pops up when buliding in userspace.
The structs aren't actually variable length, but no way to tell the
compiler that...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Before invoking bch2_accounting_mem_mod_locked in
bch2_gc_accounting_done, we already write locked mark_lock,
in bch2_accounting_mem_insert, we lock mark_lock again.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This was planned to be done ages ago, now finally completed; there are
places where we have quite a few btree_trans objects on the stack, so
this reduces stack usage somewhat.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Not all compilers fully initialize these - they're not guaranteed to
because of the union shenanigans.
Fixes: https://github.com/koverstreet/bcachefs/issues/844
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add the new helper printbuf_indent_add_nextline(), and use it in
__bch2_fsck_err() to centralize setting the indentation of multiline
fsck errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We weren't checking that accounting keys have the expected number of
accounters. Originally we probably wanted to be flexible on this, but it
doesn't look like that will be required - accounting is extended by
adding new counter types, not more counters to an existing type.
This means we can drop a BUG_ON() that popped once in automated testing,
and the new validation will make that bug easier to track down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Locking considerations (possibly no longer relevant?) mean that when an
accounting update needs a new superblock replicas entry to be created,
it's deferred to the transaction commit error path.
But accounting updates for gc/fcsk aren't done from the transaction
commit path - so we need to handle
-BCH_ERR_btree_insert_need_mark_replicas locally.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix sort order for disk accounting keys, in order to fix a regression on
mount times.
The typetag is now the most significant byte of the key, meaning disk
accounting keys of the same type now sort together.
This lets us skip over disk accounting keys that aren't mirrored in
memory when reading accounting at startup, instead of having them
interleaved with other counter types.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since we added per-inode counters there's now far too many counters to
show in one shot - if we want this in the future, it'll have to be in
debugfs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a tracepoint for inserting new accounting entries: we're seeing odd
spinning behaviour in accounting read.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The validate late path was iterating over accounting entries in
eytzinger order, which is unnecessarily tricky when we may have to
remove entries.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new parameter to bkey validate functions, and use it to improve
invalid bkey error messages: we can now print the btree and depth it
came from, or if it came from the journal, or is a btree root.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
6.11 had a bug where we'd sometimes create disk accounting keys with
version 0, which causes issues for journal replay - but we don't need to
delete existing accounting keys with version 0.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we rewind recovery to run topology repair, that causes
accounting_read to run twice.
This fixes accounting being double counted.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Accounting keys that reference invalid devices are corrected by fsck,
they shouldn't cause an emergency shutdown.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
this showed up when building in userspace
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prefer bch2_btree_id_to_text() - it prints out the integer ID when
unknown.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Using commit_do() to call alloc_sectors_start_trans() breaks when we're
randomly injecting transaction restarts - the restart in the commit
causes us to leak the lock that alloc_sectorS_start_trans() takes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- Fix failure to validate that accounting replicas entries point to
valid devices: this wasn't a real bug since they'd be cleaned up by
GC, but is still something we should know about
- Fix failure to validate that dev_data_type entries point to valid
devices: this does fix a real bug, since bch2_accounting_read() would
then try to copy the counters to that device and pop an inconsistent
error when the device didn't exist
- Remove accounting entries that are zeroed or invalid: if we're not
validating them we need to get rid of them: they might not exist in
the superblock, so we need the to trigger the superblock mark path
when they're readded.
This fixes the replication.ktest rereplicate test, which was failing
with "superblock not marked for replicas..."
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We had a bug where disk accounting keys didn't always have their version
field set in journal replay; change the BUG_ON() to a WARN(), and
exclude this case since it's now checked for elsewhere (in the bkey
validate function).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
give bversions a more distinct name, to aid in grepping
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes the following bug, where a disk accounting key has an invalid
replicas entry, and we attempt to add it to the superblock:
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): starting version 1.12: rebalance_work_acct_fix opts=metadata_replicas=2,data_replicas=2,foreground_target=ssd,background_target=hdd,nopromote_whole_extents,verbose,fsck,fix_errors=yes
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): recovering from clean shutdown, journal seq 15211644
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): accounting_read...
accounting not marked in superblock replicas
replicas cached: 1/1 [0], fixing
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0]
replicas_v0 (size 88):
user: 2 [3 5] user: 2 [1 4] cached: 1 [2] btree: 2 [1 2] user: 2 [2 5] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 2 [1 2] user: 2 [2 3] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 3] user: 2 [1 5] user: 2 [2 4]
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): inconsistency detected - emergency read only at journal seq 15211644
accounting not marked in superblock replicas
replicas user: 1/1 [3], fixing
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0]
replicas_v0 (size 96):
user: 2 [3 5] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5]
accounting not marked in superblock replicas
replicas user: 1/2 [3 7], fixing
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 7 in entry user: 1/2 [3 7]
replicas_v0 (size 96):
user: 2 [3 7] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5] user: 2 [3 5]
done
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): alloc_read... done
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): stripes_read... done
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): snapshots_read... done
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
accounting read was checking if accounting replicas entries were marked
in the superblock prior to applying accounting from the journal,
which meant that a recently removed device could spuriously trigger a
"not marked in superblocked" error (when journal entries zero out the
offending counter).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Minor refactoring - replace multiple bool arguments with an enum; prep
work for fixing a bug in accounting read.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds another disk accounting counter to track usage per inode
number (any snapshot ID).
This will be used for a couple things:
- It'll give us a way to tell the user how much space a given file ista
consuming in all snapshots; i.e. how much extra space it's consuming
due to snapshot versioning.
- It counts number of extents and total size of extents (both in btree
keyspace sectors and actual disk usage), meaning it gives us average
extent size: that is, it'll let us cheaply find fragmented files that
should be defragmented.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The next patch will be adding a disk accounting counter type which is
not kept in the in-memory eytzinger tree.
As prep, fold __bch2_accounting_mem_mod() into
bch2_accounting_mem_mod_locked() so that we can check for that counter
type and bail out without calling bpos_to_disk_accounting_pos() twice.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bkey_fsck_err() was added as an interface that looks like fsck_err(),
but previously all it did was ensure that the appropriate error counter
was incremented in the superblock.
This is a cleanup and bugfix patch that converts it to a wrapper around
fsck_err(). This is needed to fix an issue with the upgrade path to
disk_accounting_v3, where the "silent fix" error list now includes
bkey_fsck errors; fsck_err() handles this in a unified way, and since we
need to change printing of bkey fsck errors from the caller to the inner
bkey_fsck_err() calls, this ends up being a pretty big change.
Als,, rename .invalid() methods to .validate(), for clarity, while we're
changing the function signature anyways (to drop the printbuf argument).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
forward compat fix
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Implement bch2_accounting_invalid(); check for junk at the end, and
replicas accounting entries in particular need to be checked or we'll
pop asserts later.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The transaction commit path takes mark_lock, so we shouldn't be holding
it; use a bpos as an iterator so that we can drop and retake.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new helper to free zeroed out accounting entries, and use it in
bch2_replicas_gc2(); bch2_replicas_gc2() was killing superblock replicas
entries if their corresponding accounting counters were nonzero, but
that's incorrect - the superblock replicas entry needs to exist if the
accounting entry exists, not if it's nonzero, because we check and
create the replicas entry when creating the new accounting entry - we
don't know when it's becoming nonzero.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Break up the percpu counter allocations into individual allocations for
each disk accounting counter; this fixes an issue on large systems where
we have too many replica entries to for the percpu allocator's max
practical size.
Also, use just one eytzinger tree for the normal set of counters and the
gc counters; this simplifies accounting_gc_done() where we need the same
set of counters to be present in both tables.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_accounting_mem_insert() drops and retakes mark_lock; thus, we need
to check if the entry in question has already been inserted.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new ioctl that can return the new accounting counter types; it
takes as input a bitmask of accounting types to return.
This will be used for returning e.g. compression accounting and
rebalance_work accounting.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
fsck_err() now optionally takes a btree_trans; if the current thread has
one, it is required that it be passed.
The next patch will use this to unlock when waiting for user input.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add counters for how much disk space we're using per btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|