Age | Commit message | Author |
|
System performance is particularly sensitive to journal write latency:
the number of outstanding journal writes is bounded, and we can't issue
journal flushes until other journal writes have completed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This lets us print the exact location in the journal if it was found in
the journal, or correctly print if it was found in the superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This lets us use darray macros on dev_alloc_list (and it will become a
darray eventually, when we increase the maximum number of devices).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When allocating a journal write failed and was then retried after doing
discards, we were failing to count already allocated replicas.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
kill another standard error code use
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new parameter to bkey validate functions, and use it to improve
invalid bkey error messages: we can now print the btree and depth it
came from, or if it came from the journal, or is a btree root.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Remove hard-coded strings by using the helper function str_write_read().
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prefer bch2_btree_id_to_text() - it prints out the integer ID when
unknown.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Needed for improved userspace cmd_list_journal
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If the jset_entry_dev_usage is malformed, and too small, our nr_entries
calculation will be incorrect - just bail out.
Reported-by: syzbot+05d7520be047c9be86e0@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
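The failure mode described above can be sketched in userspace C. This is only an illustration under assumptions: the struct layouts and the name dev_usage_nr_types() are hypothetical and do not match the real on-disk format. It shows why the size must be checked before the subtraction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts: not the real bcachefs on-disk structs. */
struct usage_type {
	uint64_t buckets;
	uint64_t sectors;
	uint64_t fragmented;
};

struct dev_usage_entry {
	uint32_t bytes;		/* total size of this entry as recorded on disk */
	uint32_t dev;
	struct usage_type d[];	/* per-type counters follow the fixed header */
};

/*
 * Return the number of per-type records in the entry, or -1 if the
 * recorded size is smaller than the fixed header. Without the check,
 * the unsigned subtraction would wrap and yield a huge count.
 */
static long dev_usage_nr_types(const struct dev_usage_entry *u)
{
	size_t header = offsetof(struct dev_usage_entry, d);

	assert(u != NULL);

	if (u->bytes < header)
		return -1;	/* malformed: bail out instead of computing */

	return (long)((u->bytes - header) / sizeof(struct usage_type));
}
```

A caller would treat a negative return as a validation failure and skip the entry.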
|
|
bio_kmalloc() may return NULL, which would cause a NULL pointer
dereference. Add a NULL check on the return value of bio_kmalloc() in
journal_read_bucket().
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Fixes: ac10a9611d87 ("bcachefs: Some fixes for building in userspace")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add the __counted_by compiler attribute to the flexible array members
devs to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Increment nr_devs before adding a new device to the devs array and
adjust the array indexes accordingly. Add a helper macro for adding a
new device.
In bch2_journal_read(), explicitly set nr_devs to 0.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
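A minimal userspace sketch of the ordering described above, assuming a hypothetical struct dev_list and helper macro dev_list_add() (neither is the real bcachefs definition). COUNTED_BY is a stand-in for the kernel's __counted_by() and expands to nothing on compilers without the attribute.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the kernel's __counted_by(); empty where unsupported. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

/* Hypothetical device list; not the real bcachefs struct. */
struct dev_list {
	unsigned	nr_devs;
	unsigned char	devs[] COUNTED_BY(nr_devs);
};

/* Bump the count first so the store is within the annotated bounds. */
#define dev_list_add(_l, _dev)				\
do {							\
	(_l)->nr_devs++;				\
	(_l)->devs[(_l)->nr_devs - 1] = (_dev);		\
} while (0)

/* Allocate room for up to capacity devices; the list starts out empty. */
static struct dev_list *dev_list_alloc(unsigned capacity)
{
	struct dev_list *l = malloc(sizeof(*l) + capacity);

	if (l)
		l->nr_devs = 0;	/* explicit: no elements are valid yet */
	return l;
}
```

Writing the element before incrementing the count would index one past the bound that the annotation advertises, which is what a bounds sanitizer would flag.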
|
|
Use jiffies macros instead of using jiffies directly to handle wraparound.
Signed-off-by: Chen Yufan <chenyufan@vivo.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
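For readers unfamiliar with the wraparound problem, here is a userspace sketch of the comparison that the kernel's time_after() performs. The names tick_after() and tick_after_naive() are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * True if tick a is later than tick b, even when the counter wrapped
 * between the two samples. The unsigned difference is reinterpreted as
 * signed, which is valid as long as the two samples are less than half
 * the counter range apart.
 */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Direct comparison, shown for contrast: gives the wrong answer across a wrap. */
static bool tick_after_naive(unsigned long a, unsigned long b)
{
	return a > b;
}
```

With a sample taken at ULONG_MAX - 5 and a second one ten ticks later (after the counter has wrapped to 4), tick_after() still reports the second as later, while the naive comparison does not.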
|
|
bkey_fsck_err() was added as an interface that looks like fsck_err(),
but previously all it did was ensure that the appropriate error counter
was incremented in the superblock.
This is a cleanup and bugfix patch that converts it to a wrapper around
fsck_err(). This is needed to fix an issue with the upgrade path to
disk_accounting_v3, where the "silent fix" error list now includes
bkey_fsck errors; fsck_err() handles this in a unified way, and since we
need to change printing of bkey fsck errors from the caller to the inner
bkey_fsck_err() calls, this ends up being a pretty big change.
Also, rename .invalid() methods to .validate(), for clarity, while we're
changing the function signature anyways (to drop the printbuf argument).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The btree write buffer takes as input keys from the journal, sorts them,
deduplicates them, and flushes them back to the btree in sorted order.
The disk space accounting rewrite is moving accounting to normal btree
keys, with updates (in this case deltas) accumulated in the write buffer
and then flushed to the btree; but this is going to increase the number
of keys handled by the write buffer by perhaps as much as a factor of
3-5.
The overhead from copying around and sorting this many keys would cause
a significant performance regression, but: there is huge locality in
updates to accounting keys that we can take advantage of.
Instead of appending accounting keys to the list of keys to be sorted,
this patch adds an eytzinger search tree of recently seen accounting
keys. We look up the accounting key in the eytzinger search tree and
apply the delta directly, adding it if it doesn't exist, and
periodically prune the eytzinger tree of unused entries.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
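The idea of accumulating deltas in an eytzinger-ordered array can be sketched as follows. This is a simplified illustration, not the patch's code: struct acct and the function names are hypothetical, the array is built once from sorted input, and a miss is simply reported to the caller instead of inserting the key or pruning unused entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct acct {
	uint64_t key;
	int64_t  delta;
};

/*
 * Fill out[1..n] in eytzinger (breadth-first) order from sorted
 * in[0..n-1] by walking the implicit tree in order. Start with
 * pos = 0 and k = 1; returns the number of elements consumed.
 */
static size_t eytz_build(const struct acct *in, struct acct *out,
			 size_t n, size_t pos, size_t k)
{
	if (k <= n) {
		pos = eytz_build(in, out, n, pos, 2 * k);
		out[k] = in[pos++];
		pos = eytz_build(in, out, n, pos, 2 * k + 1);
	}
	return pos;
}

/* Return the 1-based slot holding key, or 0 if it is absent. */
static size_t eytz_find(const struct acct *t, size_t n, uint64_t key)
{
	size_t k = 1;

	while (k <= n) {
		if (t[k].key == key)
			return k;
		k = 2 * k + (t[k].key < key);	/* left child 2k, right 2k+1 */
	}
	return 0;
}

/*
 * Apply delta directly to a cached key. Returns false on a miss so the
 * caller can fall back to another path (here left to the caller).
 */
static bool acct_accumulate(struct acct *t, size_t n,
			    uint64_t key, int64_t delta)
{
	size_t k = eytz_find(t, n, key);

	if (!k)
		return false;
	t[k].delta += delta;
	return true;
}
```

Because repeated updates to the same key only touch one slot, the number of entries that later need sorting and flushing stays proportional to the number of distinct keys rather than the number of updates.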
|
|
Use try_cmpxchg() family of functions instead of
cmpxchg (*ptr, old, new) == old. x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg
(and related move instruction in front of cmpxchg).
Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when
cmpxchg fails. There is no need to re-read the value in the loop.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
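The loop shape this enables can be illustrated in userspace with C11 atomics, whose atomic_compare_exchange_weak() likewise writes the current value back into the expected argument on failure. This is an analogy, not the kernel API; set_bits() is a made-up example function.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically OR mask into *v and return the value seen before the update. */
static unsigned set_bits(atomic_uint *v, unsigned mask)
{
	unsigned old = atomic_load(v);

	/*
	 * On failure, old is refreshed with the current contents of *v,
	 * so the loop body does not need to re-read it.
	 */
	while (!atomic_compare_exchange_weak(v, &old, old | mask))
		;

	return old;
}
```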
|
|
Fixed missing spaces displayed in journal_entry_dev_usage_to_text
while adjusting the display format to improve readability.
before:
```
# bcachefs list_journal -a -t alloc:1:0 /dev/sdb
...
dev_usage: dev=0free: buckets=233180 sectors=0 fragmented=0sb: buckets=13 sectors=6152 fragmented=504journal: buckets=1847 sectors=945664 fragmented=0btree: buckets=20 sectors=10240 fragmented=0user: buckets=1419 sectors=726513 fragmented=15cached: buckets=0 sectors=0 fragmented=0parity: buckets=0 sectors=0 fragmented=0stripe: buckets=0 sectors=0 fragmented=0need_gc_gens: buckets=0 sectors=0 fragmented=0need_discard: buckets=1 sectors=0 fragmented=0
```
after:
```
# bcachefs list_journal -a -t alloc:1:0 /dev/sdb
...
dev_usage: dev=0
free: buckets=233180 sectors=0 fragmented=0
sb: buckets=13 sectors=6152 fragmented=504
journal: buckets=1847 sectors=945664 fragmented=0
btree: buckets=20 sectors=10240 fragmented=0
user: buckets=1419 sectors=726513 fragmented=15
cached: buckets=0 sectors=0 fragmented=0
parity: buckets=0 sectors=0 fragmented=0
stripe: buckets=0 sectors=0 fragmented=0
need_gc_gens: buckets=0 sectors=0 fragmented=0
need_discard: buckets=1 sectors=0 fragmented=0
```
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Closes: https://syzkaller.appspot.com/bug?extid=8996d8f176cf946ef641
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
silly race
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes a rare deadlock when we're doing an emergency shutdown due to
failure to do a journal write.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
printk strings get truncated to 1024 bytes; if we have a long error
message (journal debug info) we need to use a helper.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
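One way such a helper can work is to emit the message one line at a time, so that no single print call carries the whole buffer. The sketch below is a userspace illustration with a hypothetical print_as_lines(); it assumes each individual line fits within the limit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Print buf to out one line per call, splitting on '\n'.
 * Returns the number of lines emitted.
 */
static unsigned print_as_lines(FILE *out, const char *buf)
{
	unsigned nr = 0;

	assert(out != NULL && buf != NULL);

	while (*buf) {
		size_t len = strcspn(buf, "\n");

		fprintf(out, "%.*s\n", (int)len, buf);
		nr++;

		buf += len;
		if (*buf == '\n')
			buf++;	/* skip the separator itself */
	}
	return nr;
}
```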
|
|
We're about to start using bch_validate_flags for superblock section
validation - it's no longer bkey specific.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_dev_bkey_exists() is going away; bch2_dev_have_ref() documents that
we're looking up a device without checking if it's present because we
have a reference to it already.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
On recovery from clean shutdown we don't typically read the journal, but
we still want to avoid overwriting existing entries in the journal for
list_journal debugging.
Thus, add some fields to the member info section so we can remember
where we left off.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since btree_ptr_v2, we no longer require the journal seq blacklist table
for skipping blacklisted bsets (btree node entries); the pointer to a
given node indicates how much data is present.
Therefore there's no longer any need for journal seq blacklist gc to
walk the btree - we can prune entries older than journal last_seq.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
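Pruning by sequence number alone reduces to a single pass over the table. The sketch below assumes a hypothetical struct seq_range with an inclusive end and treats a range as stale when it ends before last_seq; the real table layout and comparison may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blacklist entry: sequence numbers start..end, inclusive. */
struct seq_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Drop every range that lies entirely before last_seq, compacting the
 * array in place. Returns the number of entries kept.
 */
static size_t blacklist_prune(struct seq_range *r, size_t nr,
			      uint64_t last_seq)
{
	size_t dst = 0;

	assert(r != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++)
		if (r[i].end >= last_seq)
			r[dst++] = r[i];

	return dst;
}
```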
|
|
Some renaming for better consistency
bch2_member_exists -> bch2_member_alive
bch2_dev_exists -> bch2_member_exists
bch2_dev_exists2 -> bch2_dev_exists
bch_dev_locked -> bch2_dev_locked
bch_dev_bkey_exists -> bch2_dev_bkey_exists
new helper - bch2_dev_safe
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Be more explicit to the user about what we're doing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_journal_write() was incorrectly waiting on earlier journal writes
synchronously; this usually worked because most of the time we'd be
running in the context of a thread that did a journal_buf_put(), but
sometimes we'd be running out of the same workqueue that completes those
prior journal writes.
Additionally, this makes sure to punt to a workqueue before submitting
preflushes - we really don't want to be calling submit_bio() in the main
transaction commit path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
error messages should always include __func__
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This doesn't need to be a BUG_ON(); the actual serious "things break"
condition is if the whole journal write overruns the available space,
and that has a fatal error, not a BUG_ON(). This check indicates we
screwed something up, but it should be a warning.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
prep work for lifting out of fs/bcachefs/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The journal_write_done() handler was reworked into a loop in commit
746a33c96b7a ("bcachefs: better journal pipelining"). As part of this,
the journal buffer wake was factored into a post-loop branch that
executes if at least one journal buffer has completed.
The journal buffer processing loop iterates on the journal buffer
pointer, however. This means that w refers to the last buffer processed
by the loop, which may or may not be done. This also means that if
multiple buffers are processed by the loop, only the last is awoken.
This lost wakeup behavior has led to stalling problems in various CI
and fstests, such as generic/703.
Lift the wake into the loop so each done buffer sees a wake call as
it is processed.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
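The difference between the two loop shapes can be shown with a small simulation. Everything here is hypothetical: struct jbuf stands in for a journal buffer, and setting woken stands in for waking that buffer's waiters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct jbuf {
	bool done;	/* write for this buffer has completed */
	bool woken;	/* waiters on this buffer were woken */
};

/* Buggy shape: one wake after the loop, on whatever w points at last. */
static unsigned complete_bufs_lost_wakeup(struct jbuf *bufs, unsigned nr)
{
	struct jbuf *w = NULL;
	unsigned completed = 0;

	for (unsigned i = 0; i < nr; i++) {
		w = &bufs[i];
		if (!w->done)
			break;
		completed++;
	}

	if (completed)
		w->woken = true;	/* may be a buffer that is not done */
	return completed;
}

/* Fixed shape: each completed buffer is woken as it is processed. */
static unsigned complete_bufs(struct jbuf *bufs, unsigned nr)
{
	unsigned completed = 0;

	for (unsigned i = 0; i < nr; i++) {
		struct jbuf *w = &bufs[i];

		if (!w->done)
			break;
		w->woken = true;
		completed++;
	}
	return completed;
}
```

With two completed buffers followed by one still in flight, the first version wakes only the in-flight buffer and leaves waiters on both completed buffers asleep; the second wakes exactly the two completed ones.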
|
|
Improved journal pipelining broke journal_noflush_seq(); it implicitly
assumed only the oldest outstanding journal buf could be in flight, but
that's no longer true.
Make this more straightforward by just setting buf->must_flush whenever
we know a journal buf is going to be a flush.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
prep work for replaying the journal backwards
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
All journal_buf bitfield updates must happen under the journal lock -
perhaps we should just switch these to atomic bit flags.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Recently a severe performance regression was discovered, which bisected
to a6548c8b5eb5 ("bcachefs: Avoid flushing the journal in the discard
path").
It turns out the old behaviour, which issued excessive journal flushes,
worked around a performance issue where queueing delays would cause the
journal to not be able to write quickly enough and stall.
The journal flushes masked the issue because they periodically flushed
the device write cache, reducing write latency for non-flush writes.
This patch reworks the journalling code to allow more than one
(non-flush) write to be in flight at a time. With this patch, doing 4k
random writes with an iodepth of 128, we are now able to hit 560k iops to
a Samsung 970 EVO Plus - previously, we were stuck in the ~200k range.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|