Age | Commit message (Collapse) | Author |
|
move_ratelimit() now has a bool that specifies whether we want to
wait for copygc to finish.
When copygc is running, we're probably low on free buckets instead
of consuming the remaining buckets, we want to wait for copygc to
finish.
This should help with performance, and run away bucket fragmentation.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This patch significantly cleans up and simplifies the data_update
interface. Instead of only being able to specify a single pointer by
device to rewrite, we're now able to specify any or all of the pointers
in the original extent to be rewrited, as a bitmask.
data_cmd is no more: the various pred functions now just return true if
the extent should be moved/updated. All the data_update path does is
rewrite existing replicas, or add new ones.
This fixes a bug where with background compression on replicated
filesystems, where rebalance -> data_update would incorrectly drop the
wrong old replica, and keep trying to recompress an extent pointer and
each time failing to drop the right replica. Oops.
Now, the data update path doesn't look at the io options to decide which
pointers to keep and which to drop - it only goes off of the
data_update_options passed to it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
bch2_check_alloc_key() was failing to check buckets that didn't have
alloc keys yet (because they'd never been used) - they still need to be
added to the freespace btree.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
- In check_alloc_key(), previously we were re-initializing iterators
for the need_discard and freespace btrees for every alloc key we
checked. But this was causing us to redo lookups into the journal
keys every time, since those lookups are cached in struct btree_iter.
This initializes the iterators in bch2_check_alloc_info and passes
them into check_alloc_key().
- Make the looping more consistent/efficient in bch2_check_alloc_info()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This runs before we go rw for journal replay, but after we're allowed to
go rw. It might be time to consider killing BTREE_INSERT_LAZY_RW,
though.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
- invalidate_one_bucket() now returns 1 when we don't have any buckets
on this device to invalidate, ensuring we don't spin
- the tracepoint invocation is moved to after the transaction commit,
and we now include the number of cached sectors in the tracepoint
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This switches that assertion to a bch2_trans_inconsistent() call, as it
should be.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
If a btree node is unreadable, it's the topology repair that fixes that
and it's kicked off by btree_gc, so btree_gc needs to touch every node
and very that they can be read.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
__dev_available() now calculates available buckets correctly. Previously
it would almost always return 0 when we have cached data.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we were at the end of the node, when breaking out of the loop we'd
pop the assertion on line 446 when cur wasn't NULL.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
-o verbose is very useful, and we're starting to use it more for runtime
debug statements - making it possible to enable at runtime is a no
brainer.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This improves the "copygc requested to run but no buckets found" to show
the device that requires copygc to be run on - we'll definitely need to
improve this more.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This is the start of reorganizing the data IO paths. The plan is to also
break apart io.c into data_read.c and data_write.c, and migrate_write
will be renamed to the data_update path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, dev_buckets_available() only counted buckets that are
eligible to be allocated right now - i.e. buckets that don't have cached
data, or need discard, or need gc gens, etc.
But most users of this function want to know how many buckets are
eligible to be allocated from without moving data around - copygc,
allocator striping, which means we should be including cached data
buckets etc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Originally, the btree key cache code would always allocate new entries
by reusing from the recently-freed list, if that list wasn't empty. But
that behaviour was dropped, for lock contention reasons.
But it seems that entries stranded on the freed list have been
contributing to some of our oom issues, because long running btree
transactions will prevent them from being freed.
This patch re-adds allocating from the freed list, but it also adds
percpu buffers to solve the lock contention issues - and the new percpu
freed lists will improve the evict paths, too.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This adds a new option, move_bytes_in_flight, for configuring the amount
of IO in flight by copygc/rebalance - users with many devices in their
filesystem will want to increase this.
In the future we should be smarter about this, but this is an easy
improvement.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We have a hardcoded maximum on number of pointers in an extent that's
used by some other data structures - notably bch_devs_list - but we
weren't actually checking for it. Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
If we fail to queue the work item because it's already in process, we
need to drop the ref we just took.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
If we're trying to get a ref and the refcount has been killed, it means
we're doing an emergency shutdown - we always want tryget_live().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We're seeing checksum errors in the bch2_rechecksum_bio() path - give it
a better error message to help track this down.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
When inserting a key type that's not valid for a given btree, we should
print out which btree we were inserting into.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We were only allowing 4 devices in a dev_list, not 16.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
With backpointers, alloc keys have gotten bigger, so we're needing more
memory here.
We're probably going to need to go with something more sophisticated
than a bump allocator, but - let's see if we can avoid doing that just
yet.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Like the previous patch for bucket invalidates, add another counter for
a core allocator path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
b->written wasn't being reset to 0 in the btree node read retry path,
causing decrypting & validation of previously read bsets to not be
re-run - ouch.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Like bch2_do_discards(), we should check if this needs to be done when
going rw.
Also, add some sysfs code for debugging bucket invalidation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Printbufs recently switched to using string_get_size() for printing
integers in human readable units. This updates __bch2_strtoh() to parse
numbers printed by string_get_size() - we now have to handle floating
point numbers, and new unit suffixes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
bch2_dev_freespace_init() was using __bch2_trans_do() incorrectly, and
calling bch2_bucket_do_index() with a stale alloc key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We were forgetting to clear the read_in_flight flag - oops. This also
fixes it to not call bch2_fatal_error() before topology repair has had a
chance to do its thing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We had a bug where btree_and_journal_iter would return the same key
twice - after deleting it (perhaps because it was present in both the
btree and the journal?)
This reworks btree_and_journal_iter to track the current position, much
like btree_paths, which makes the logic considerably simpler and more
robust.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
cmd_list_journal wasn't correctly listing the most recent journal
entries as blacklisted - because in the recovery path when just reading
the journal, we were failing to add those to the blacklist table.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Lately we've been doing a lot of debugging by looking at the journal to
see what was changed, and by what code path. This patch adds a new
journal entry type for recording overwrites, so that we don't have to
search backwards through the journal to see what was being overwritten
in order to work out what the triggers were supposed to be doing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This takes copying the payload out of bch2_journal_add_entry(), which
means we can use it for journal_transaction_name() - also prep work for
journalling overwrites.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
bch2_opt_parse() was failing to generate error messages in error path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
When do_encrypt() was passed a vmalloc address and the buffer spanned
more than a single page, we were encrypting/decrypting completely
different pages than the ones intended.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Factor out a new helper.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
One of the init calls had a ; instead of a ?:, and errors after that got
dropped - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Right now, we print an error message on btree node read error, and we
print that we're retrying, but we don't explicitly say if the retry
succeeded - this makes things a little clearer.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Previously, on every btree_iter_peek() operation we were searching the
journal keys, doing a full binary search - which was slow.
This patch fixes that by saving our position in the journal keys, so
that we only do a full binary search when moving our position backwards
or a large jump forwards.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This logging improvement helps see when the previous fsck pass has
completed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
flush_dcache_page() is not a noop on arm, but we were using
virt_to_page() instead of vmalloc_to_page() for an address on the kernel
stack - vmalloc memory, leading to an oops in flush_dcache_page().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
The only difference key_type_logon and key_type_user is that
key_type_logon keys can't be read by userspace.
However, userspace has actually been adding keys to both the logon and
user keychains, because userspace fsck requires the keychain interface -
so we might as well just use user and drop the logon keychain.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
- Drop old unneeded parameter for whether we're in initial GC - which
was from when btree updates had to be done differently before we
went RW.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Per Dave Chinner and the xfs folks, .writepage is no longer needed, and
it's better not to define it if .writepages is the intended path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Rust FFI lacks support for unnamed structs and unions. The space
saved in bch_option is not enough to be significant.
Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This is pretty expensive, and we've tested sufficiently with it now that
it doesn't need to be on by default.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
When merging extents, we have to check that we won't overflow size
fields in any CRC entries - but the check for this was wrong, because in
the loop it was in we weren't keeping a pointer to the (packed, encoded)
CRC field.
Fix this by moving it to its own loop.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|