summaryrefslogtreecommitdiff
path: root/fs/bcachefs/journal_io.c
AgeCommit message (Collapse)Author
2025-01-26bcachefs: Journal writes are now IOPRIO_CLASS_RTKent Overstreet
System performance is particularly sensitive to journal write latency, the number of outstanding journal writes is bounded and we can't issue journal flushes until other journal writes have completed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Plumb bkey_validate_context to journal_entry_validateKent Overstreet
This lets us print the exact location in the journal if it was found in the journal, or correctly print if it was found in the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Journal write path refactoring, debug improvementsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: dev_alloc_list.devs -> dev_alloc_list.dataKent Overstreet
This lets us use darray macros on dev_alloc_list (and it will become a darray eventually, when we increase the maximum number of devices). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Fix failure to allocate journal write on discard retryKent Overstreet
When allocating a journal write fails, then retries after doing discards, we were failing to count already allocated replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: BCH_ERR_insufficient_journal_devicesKent Overstreet
kill another standard error code use Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Silence "unable to allocate journal write" if we're already ROKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Improve "unable to allocate journal write" messageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: struct bkey_validate_contextKent Overstreet
Add a new parameter to bkey validate functions, and use it to improve invalid bkey error messages: we can now print the btree and depth it came from, or if it came from the journal, or is a btree root. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Use str_write_read() helper functionThorsten Blum
Remove hard-coded strings by using the helper function str_write_read(). Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Avoid bch2_btree_id_str()Kent Overstreet
Prefer bch2_btree_id_to_text() - it prints out the integer ID when unknown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Factor out jset_entry_log_msg_bytes()Kent Overstreet
Needed for improved userspace cmd_list_journal Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-12bcachefs: Fix journal_entry_dev_usage_to_text() overrunKent Overstreet
If the jset_entry_dev_usage is malformed, and too small, our nr_entries calculation will be incorrect - just bail out. Reported-by: syzbot+05d7520be047c9be86e0@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07bcachefs: add check NULL return of bio_kmalloc in journal_read_bucketPei Xiao
bio_kmalloc may return NULL, will cause NULL pointer dereference. Add check NULL return for bio_kmalloc in journal_read_bucket. Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Fixes: ac10a9611d87 ("bcachefs: Some fixes for building in userspace") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix unlocked access to c->disk_sb.sb in bch2_replicas_entry_validate()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: Annotate bch_replicas_entry_{v0,v1} with __counted_by()Thorsten Blum
Add the __counted_by compiler attribute to the flexible array members devs to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Increment nr_devs before adding a new device to the devs array and adjust the array indexes accordingly. Add a helper macro for adding a new device. In bch2_journal_read(), explicitly set nr_devs to 0. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: Convert to use jiffies macrosChen Yufan
Use jiffies macros instead of using jiffies directly to handle wraparound. Signed-off-by: Chen Yufan <chenyufan@vivo.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()Kent Overstreet
bkey_fsck_err() was added as an interface that looks like fsck_err(), but previously all it did was ensure that the appropriate error counter was incremented in the superblock. This is a cleanup and bugfix patch that converts it to a wrapper around fsck_err(). This is needed to fix an issue with the upgrade path to disk_accounting_v3, where the "silent fix" error list now includes bkey_fsck errors; fsck_err() handles this in a unified way, and since we need to change printing of bkey fsck errors from the caller to the inner bkey_fsck_err() calls, this ends up being a pretty big change. Als,, rename .invalid() methods to .validate(), for clarity, while we're changing the function signature anyways (to drop the printbuf argument). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Improve "unable to allocate journal write" messageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Eytzinger accumulation for accounting keysKent Overstreet
The btree write buffer takes as input keys from the journal, sorts them, deduplicates them, and flushes them back to the btree in sorted order. The disk space accounting rewrite is moving accounting to normal btree keys, with update (in this case deltas) accumulated in the write buffer and then flushed to the btree; but this is going to increase the number of keys handled by the write buffer by perhaps as much as a factor of 3x-5x. The overhead from copying around and sorting this many keys would cause a significant performance regression, but: there is huge locality in updates to accounting keys that we can take advantage of. Instead of appending accounting keys to the list of keys to be sorted, this patch adds an eytzinger search tree of recently seen accounting keys. We look up the accounting key in the eytzinger search tree and apply the delta directly, adding it if it doesn't exist, and periodically prune the eytzinger tree of unused entries. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Use try_cmpxchg() family of functions instead of cmpxchg()Uros Bizjak
Use try_cmpxchg() family of functions instead of cmpxchg (*ptr, old, new) == old. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Fix missing spaces in journal_entry_dev_usage_to_textYouling Tang
Fixed missing spaces displayed in journal_entry_dev_usage_to_text while adjusting the display format to improve readability. before: ``` # bcachefs list_journal -a -t alloc:1:0 /dev/sdb ... dev_usage: dev=0free: buckets=233180 sectors=0 fragmented=0sb: buckets=13 sectors=6152 fragmented=504journal: buckets=1847 sectors=945664 fragmented=0btree: buckets=20 sectors=10240 fragmented=0user: buckets=1419 sectors=726513 fragmented=15cached: buckets=0 sectors=0 fragmented=0parity: buckets=0 sectors=0 fragmented=0stripe: buckets=0 sectors=0 fragmented=0need_gc_gens: buckets=0 sectors=0 fragmented=0need_discard: buckets=1 sectors=0 fragmented=0 ``` after: ``` # bcachefs list_journal -a -t alloc:1:0 /dev/sdb ... dev_usage: dev=0 free: buckets=233180 sectors=0 fragmented=0 sb: buckets=13 sectors=6152 fragmented=504 journal: buckets=1847 sectors=945664 fragmented=0 btree: buckets=20 sectors=10240 fragmented=0 user: buckets=1419 sectors=726513 fragmented=15 cached: buckets=0 sectors=0 fragmented=0 parity: buckets=0 sectors=0 fragmented=0 stripe: buckets=0 sectors=0 fragmented=0 need_gc_gens: buckets=0 sectors=0 fragmented=0 need_discard: buckets=1 sectors=0 fragmented=0 ``` Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10bcachefs: Fix missing error check in journal_entry_btree_keys_validate()Kent Overstreet
Closes: https://syzkaller.appspot.com/bug?extid=8996d8f176cf946ef641 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10bcachefs: Fix journal getting stuck on a flush commitKent Overstreet
silly race Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-23bcachefs: Add missing bch2_journal_do_writes() callKent Overstreet
This fixes a rare deadlock when we're doing an emergency shutdown due to failure to do a journal write. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-21bcachefs: Use bch2_print_string_as_lines for long errKent Overstreet
printk strings get truncated to 1024 bytes; if we have a long error message (journal debug info) we need to use a helper. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: s/bkey_invalid_flags/bch_validate_flagsKent Overstreet
We're about to start using bch_validate_flags for superblock section validation - it's no longer bkey specific. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: Invalid devices are now checked for by fsck, not .invalid methodsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: bch2_dev_get_ioref() checks for device not presentKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: bch2_dev_get_ioref2(); journal_io.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in journal_ptrs_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_dev_have_ref()Kent Overstreet
bch2_dev_bkey_exists() is going away; bch2_dev_have_ref() documents that we're looking up a device without checking if it's present because we have a reference to it already. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: x-macroize journal flags enumsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Simplify resuming of journal positionKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch_member.last_journal_bucketKent Overstreet
On recovery from clean shutdown we don't typically read the journal, but we still want to avoid overwriting existing entries in the journal for list_journal debugging. Thus, add some fields to the member info section so we can remember where we left off. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: journal seq blacklist gc no longer has to walk btreeKent Overstreet
Since btree_ptr_v2, we no longer require the journal seq blacklist table for skipping blacklisted bsets (btree node entries); the pointer to a given node indicates how much data is present. Therefore there's no longer any need for journal seq blacklist gc to walk the btree - we can prune entries older than journal last_seq. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: member helper cleanupsKent Overstreet
Some renaming for better consistency bch2_member_exists -> bch2_member_alive bch2_dev_exists -> bch2_member_exists bch2_dev_exsits2 -> bch2_dev_exists bch_dev_locked -> bch2_dev_locked bch_dev_bkey_exists -> bch2_dev_bkey_exists new helper - bch2_dev_safe Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: allow for custom action in fsck error messagesKent Overstreet
Be more explicit to the user about what we're doing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20bcachefs: Fix deadlock in journal write pathKent Overstreet
bch2_journal_write() was incorrectly waiting on earlier journal writes synchronously; this usually worked because most of the time we'd be running in the context of a thread that did a journal_buf_put(), but sometimes we'd be running out of the same workqueue that completes those prior journal writes. Additionally, this makes sure to punt to a workqueue before submitting preflushes - we really don't want to be calling submit_bio() in the main transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Standardize helpers for printing enum strs with bounds checksKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Improve bch2_fatal_error()Kent Overstreet
error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Change "accounting overran journal reservation" to a warningKent Overstreet
This doesn't need to be a BUG_ON(); the actual serious "things break" condition is if the whole journal write overruns the available space, and that has a fatal error, not a BUG_ON(). This check indicates we screwed something up, but it should be a warning. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: pull out time_stats.[ch]Kent Overstreet
prep work for lifting out of fs/bcachefs/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: fix lost journal buf wakeup due to improved pipeliningBrian Foster
The journal_write_done() handler was reworked into a loop in commit 746a33c96b7a ("bcachefs: better journal pipelining"). As part of this, the journal buffer wake was factored into a post-loop branch that executes if at least one journal buffer has completed. The journal buffer processing loop iterates on the journal buffer pointer, however. This means that w refers to the last buffer processed by the loop, which may or may not be done. This also means that if multiple buffers are processed by the loop, only the last is awoken. This lost wakeup behavior has lead to stalling problems in various CI and fstests, such as generic/703. Lift the wake into the loop so each done buffer sees a wake call as it is processed. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Fix bch2_journal_noflush_seq()Kent Overstreet
Improved journal pipelining broke journal_noflush_seq(); it implicitly assumed only the oldest outstanding journal buf could be in flight, but that's no longer true. Make this more straightforward by just setting buf->must_flush whenever we know a journal buf is going to be flush. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: split out ignore_blacklisted, ignore_not_dirtyKent Overstreet
prep work for replaying the journal backwards Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: jset_entry for loops declare loop iterKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Fix journal_buf bitfield accessesKent Overstreet
All jounal_buf bitfield updates must happen under the journal lock - perhaps we should just switch these to atomic bit flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: kill kvpmalloc()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10bcachefs: better journal pipeliningKent Overstreet
Recently a severe performance regression was discovered, which bisected to a6548c8b5eb5 bcachefs: Avoid flushing the journal in the discard path It turns out the old behaviour, which issued excessive journal flushes, worked around a performance issue where queueing delays would cause the journal to not be able to write quickly enough and stall. The journal flushes masked the issue because they periodically flushed the device write cache, reducing write latency for non flushes. This patch reworks the journalling code to allow more than one (non-flush) write to be in flight at a time. With this patch, doing 4k random writes and an iodepth of 128, we are now able to hit 560k iops to a Samsung 970 EVO Plus - previously, we were stuck in the ~200k range. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>