summaryrefslogtreecommitdiff
path: root/fs/bcachefs/error.c
AgeCommit message (Collapse)Author
2024-12-21bcachefs: Plumb bkey_validate_context to journal_entry_validateKent Overstreet
This lets us print the exact location in the journal if it was found in the journal, or correctly print if it was found in the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bkey_fsck_err now respects errors_silentKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch2_inum_to_path()Kent Overstreet
Add a function for walking backpointers to find a path from a given inode number, and convert various error messages to use it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: do_fsck_ask_yn()Kent Overstreet
__bch2_fsck_err() is huge, and badly needs more refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't error out when logging fsck errorKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: BCH_FS_recovery_runningKent Overstreet
If we're autofixing topology errors, we shouldn't shutdown if we're still in recovery. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: struct bkey_validate_contextKent Overstreet
Add a new parameter to bkey validate functions, and use it to improve invalid bkey error messages: we can now print the btree and depth it came from, or if it came from the journal, or is a btree root. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch2_bucket_do_index(): inconsistent_err -> fsck_errKent Overstreet
Factor out a common helper, need_discard_or_freespace_err(), which is now used by both fsck and the runtime checks, and can repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Kill FSCK_NEED_FSCKKent Overstreet
If we find an error that indicates that we need to run fsck, we can specify that directly with run_explicit_recovery_pass(). These are now log_fsck_err() calls: we're just logging in the superblock that an error occurred - and possibly doing an emergency shutdown, depending on policy. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-20bcachefs: Fix __bch2_fsck_err() warningKent Overstreet
We only warn about having a btree_trans that wasn't passed in if we'll be prompting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04bcachefs: Make sure we print error that causes fsck to bail outKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04bcachefs: bkey errors are only AUTOFIX during readKent Overstreet
Newly generated keys, in the transaction commit path or write path, should not be AUTOFIX; those indicate bugs that we need to fail fast for. Fixes: 5612daafb764 ("bcachefs: Fix fsck warnings from bkey validation") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix fsck warnings from bkey validationKent Overstreet
__bch2_fsck_err() warns if the current task has a btree_trans object and it wasn't passed in, because if it has to prompt for user input it has to be able to unlock it. But plumbing the btree_trans through bkey_validate(), as well as transaction restarts, is problematic - so instead make bkey fsck errors FSCK_AUTOFIX, which doesn't need to warn. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()Kent Overstreet
bkey_fsck_err() was added as an interface that looks like fsck_err(), but previously all it did was ensure that the appropriate error counter was incremented in the superblock. This is a cleanup and bugfix patch that converts it to a wrapper around fsck_err(). This is needed to fix an issue with the upgrade path to disk_accounting_v3, where the "silent fix" error list now includes bkey_fsck errors; fsck_err() handles this in a unified way, and since we need to change printing of bkey fsck errors from the caller to the inner bkey_fsck_err() calls, this ends up being a pretty big change. Als,, rename .invalid() methods to .validate(), for clarity, while we're changing the function signature anyways (to drop the printbuf argument). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Unlock trans when waiting for user input in fsckKent Overstreet
We can't hold locks while waiting for user input, that's a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: twf: convert bch2_stdio_redirect_readline() to darrayKent Overstreet
We now read the line from the buffer atomically, which means we have to allow the buffer to grow past STDIO_REDIRECT_BUFSIZE if we're waiting for a full line - this behaviour is necessary for stdio_redirect_readline_timeout() in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: fsck_err() may now take a btree_transKent Overstreet
fsck_err() now optionally takes a btree_trans; if the current thread has one, it is required that it be passed. The next patch will use this to unlock when waiting for user input. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: add might_sleep() annotations for fsck_err()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-20bcachefs: Fix safe errors by defaultKent Overstreet
i.e. the start of automatic self healing: If errors=continue or fix_safe, we now automatically fix simple errors without user intervention. New error action option: fix_safe This replaces the existing errors=ro option, which gets a new slot, i.e. existing errors=ro users now get errors=fix_safe. This is currently only enabled for a limited set of errors - initially just disk accounting; errors we would never not want to fix, and we don't want to require user intervention (i.e. to make sure a bug report gets filed). Errors will still be counted in the superblock, so we (developers) will still know they've been occuring if a bug report gets filed (as bug reports typically include the errors superblock section). Eventually we'll be enabling this for a much wider set of errors, after we've done thorough error injection testing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: allow for custom action in fsck error messagesKent Overstreet
Be more explicit to the user about what we're doing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: On emergency shutdown, print out current journal sequence numberKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Split out recovery_passes.cKent Overstreet
We've grown a fair amount of code for managing recovery passes; tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery. So it's worth splitting out into its own file, this code is pretty different from the code in recovery.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Kill more -EIO error codesKent Overstreet
This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Don't autofix errors we can't fixKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Online fsck can now fix errorsKent Overstreet
BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors. Also; set fix_errors to ask by default when fsck is running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: factor out thread_with_file, thread_with_stdioKent Overstreet
thread_with_stdio now knows how to handle input - fsck can now prompt to fix errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: convert bch_fs_flags to x-macroKent Overstreet
Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_sb.recovery_passes_requiredKent Overstreet
Add two new superblock fields. Since the main section of the superblock is now fully, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_requried: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and dwongrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: Enumerate fsck errorsKent Overstreet
This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: bch_sb_field_errorsKent Overstreet
Add a new superblock section to keep counts of errors seen since filesystem creation: we'll be addingcounters for every distinct fsck error. The new superblock section has entries of the for [ id, count, time_of_last_error ]; this is intended to let us see what errors are occuring - and getting fixed - via show-super output. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: Add IO error counts to bch_memberKent Overstreet
We now track IO errors per device since filesystem creation. IO error counts can be viewed in sysfs, or with the 'bcachefs show-super' command. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Break up io.cKent Overstreet
More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make topology repair a normal recovery passKent Overstreet
This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and explicitly running a specific recovery pass - this is a more general replacement for how we were running topology repair before. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fix_errors option is now a proper enumKent Overstreet
Before, it was parsed as a bool but internally it was really an enum: this lets us pass in all the possible values. But we special case the option parsing: no supplied value is parsed as FSCK_FIX_yes, to match the previous behaviour. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix bch2_fsck_ask_yn()Kent Overstreet
- getline() output includes a newline, without stripping that we were just looping - Make the prompt clearer Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make sure hash info gets initialized in fsckKent Overstreet
We had some bugs with setting/using first_this_inode in the inode walker in the dirents/xattr code. This patch changes to not clear first_this_inode until after initializing the new hash info. Also, we fix an error message to not print on transaction restart, and add a comment to related fsck error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Allow answering y or n to all fsck errors of given typeKent Overstreet
This changes the ask_yn() function used by fsck to accept Y or N, meaning yes or no for all errors of a given type. With this, the user can be prompted only for distinct error types - useful when a filesystem has lots of errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't print out duplicate fsck errorsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve btree node read error pathKent Overstreet
This ensures that failure to read a btree node error is treated as a topology error, and returns the correct error so that the topology repair pass will be run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fixes for building in userspaceKent Overstreet
- Marking a non-static function as inline doesn't actually work and is now causing problems - drop that - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages with bcachefs (filesystem name) - Userspace doesn't have real percpu variables (maybe we can get this fixed someday), put an #ifdef around bch2_disk_reservation_add() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Ensure fsck error is printed before panicKent Overstreet
When errors=panic, we want to make sure we print the error before calling bch2_inconsistent_error(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve bch2_fsck_err()Kent Overstreet
- factor out fsck_err_get() - if the "bcachefs (%s):" prefix has already been applied, don't duplicate it - convert to printbufs instead of static char arrays - tidy up control flow a bit - use bch2_print_string_as_lines(), to avoid messages getting truncated Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert fsck errors to errcode.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Log message improvementsKent Overstreet
Change the error messages in bch2_inconsistent_error() and bch2_fatal_error() so we can distinguish them. Also, prefer bch2_fs_fatal_error() (which also logs an error message) to bch2_fatal_error(), and change a call to bch2_inconsistent_error() to bch2_fatal_error() when we can't continue. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't ratelimit certain fsck errorsKent Overstreet
It's unhelpful if we see "Halting mark and sweep to start topology repair" but we don't see the error that triggered it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: New and improved topology repair codeKent Overstreet
This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use x-macros for more enumsKent Overstreet
This patch standardizes all the enums that have associated string tables (probably more enums should have string tables). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Turn c->state_lock into an rwsemKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fsck_error_lock requires GFP_NOFSKent Overstreet
this fixes a lockdep splat Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add an option for fsck error ratelimitingKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>