summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-22bcachefs: Fix a write buffer flush deadlockKent Overstreet
We're not supposed to block if BTREE_INSERT_JOURNAL_RECLAIM && watermark != BCH_WATERMARK_reclaim. This should really be a separate BTREE_INSERT_NONBLOCK flag - add some comments to that effect, it's not important for this patch. btree write buffer flush depends on this behaviour though - the first loop tries to flush sequentially, which doesn't free up space in the journal optimally. If that can't proceed we bail out and flush in journal order - that won't work if we're blocked instead of returning an error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bcachefs_metadata_version_major_minorKent Overstreet
This introduces major/minor versioning to the superblock version number. Major version number changes indicate incompatible releases; we can move forward to a new major version number, but not backwards. Minor version numbers indicate compatible changes - these add features, but can still be mounted and used by old versions. With the recent patches that make it possible to roll out new btrees and key types without breaking compatibility, we should be able to roll out most new features without incompatible changes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add new assertions for shutdown pathKent Overstreet
We've been seeing assertions pop that indicate the btree node cache or key cache have dirty items when we just did a clean shutdown. Add some more assertions so we can catch this when we're dirtying items. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_xattr_set() now updates ctimeKent Overstreet
Fixes fstests generic/728 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill bch2_xattr_get()Kent Overstreet
Inline it into the only caller Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix try_decrease_writepoints()Kent Overstreet
We were freeing open buckets on the writepoint list, but forgetting to take them off the writepoint list - whoops Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Mark as EXPERIMENTALKent Overstreet
As discussed on list, bcachefs is going to be marked as experimental for a few releases, until the inevitable tide of new bug reports subsides. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Enumerate recovery passesKent Overstreet
Recovery and fsck have many different passes/jobs to do, which always run in the same order - but not all of them run all the time. Some are for fsck, some for unclean shutdown, some for version upgrades. This adds some new structure: a defined list of recovery passes that we can run in a loop, as well as consolidating the log messages. The main benefit is consolidating the "should run this recovery pass" logic, as well as cleaning up the "this recovery pass has finished" state; instead of having a bunch of ad-hoc state bits in c->flags, we've now got c->curr_recovery_pass. By consolidating the "should run this recovery pass" logic, in the future on disk format upgrades will be able to say "upgrading to this version requires x passes to run", instead of forcing all of fsck to run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Stash journal replay params in bch_fsKent Overstreet
For the upcoming enumeration of recovery passes, we need all recovery passes to be called the same way - including journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill bch2_bucket_gens_read()Kent Overstreet
This folds bch2_bucket_gens_read() into bch2_alloc_read(), doing the version check there. This is prep work for enumarating all recovery passes: we need some cleanup first to make calling all the recovery passes consistent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix error path in bch2_journal_flush_device_pins()Kent Overstreet
We need to always call bch2_replicas_gc_end() after we've called bch2_replicas_gc_start(), else we leave state around that needs to be cleaned up. Partial fix for: https://github.com/koverstreet/bcachefs/issues/560 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: version_upgrade is now an enumKent Overstreet
The version_upgrade parameter is now an enum, not a bool, and it's persistent in the superblock: - compatible (default): upgrade to the latest compatible version - incompatible: upgrade to latest incompatible version - none Currently all upgrades are incompatible upgrades, but the next release will introduce major:minor versions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE()Kent Overstreet
Version upgrades are not atomic operations: when we do a version upgrade we need to update the superblock before we start using new features, and then when the upgrade completes we need to update the superblock again. This adds a new superblock field so we can detect and handle incomplete version upgrades. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert more -EROFS to private error codesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Delete redundant log messagesKent Overstreet
Now that we have distinct error codes for different memory allocation failures, the early init log messages are no longer needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Change check for invalid key typesKent Overstreet
As part of the forward compatibility patch series, we need to allow for new key types without complaining loudly when running an old version. This patch changes the flags parameter of bkey_invalid to an enum, and adds a new flag to indicate we're being called from the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Assorted sparse fixesKent Overstreet
- endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor bch_sb_field_ops handlingKent Overstreet
This changes bch_sb_field_ops lookup to match how bkey_ops now works; for an unknown field type we return an empty ops struct. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Allow for unknown key typesKent Overstreet
This adds a new helper for lookups bkey_ops for a given key type, which returns a null bkey_ops for unknown key types; various bkey_ops users are tweaked as well to handle unknown key types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Allow for unknown btree IDsKent Overstreet
We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: flush journal to avoid invalid dev usage entries on recoveryBrian Foster
A crash immediately after device removal can result in an unmountable filesystem due to recovery failure. The following command reliably reproduces on a multi-device fs: bcachefs device remove <dev> && xfs_io -xc shutdown <mnt> The post-crash mount fails with an error similar to the following, reported by fsck: invalid journal entry dev_usage at offset 7994/8034 seq 12: bad dev, fixing This refers to a device usage entry in the journal that refers to the index of the just removed device. Recovery considers this an invalid entry and fails to proceed. Device usage entries are added to journal buffer writes via bch_journal_write() -> bch2_journal_super_entries_add_common(), which means any journal buffer write has content that refers to member devices at the time of the journal write. The device remove sequence already removes metadata references to the device being removed. It then flushes any pins that refer to the device, clears replica entries, removes the in-memory device object and lastly updates the superblock to reflect that the device is no longer present. The problem is that any journal writes that occur during this sequence will include a dev usage entry so long as the device is present. To avoid this problem, we can flush the journal once more after the device entry is removed from the in-core structures, but before the superblock is updated to fully remove the device on-disk. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: mark active journal devices on journal replicas gcBrian Foster
A simple device evacuate, remove, add test loop with concurrent shutdowns occasionally reproduces a problem where the filesystem fails to mount. The mount failure occurs because the filesystem was uncleanly shut down, yet no member device is marked for journal data in the superblock. An fsck detects the problem, restores the mark and allows the mount to proceed without further consistency issues. The reason for the lack of journal data marks is the gc mechanism invoked via bch2_journal_flush_device_pins() runs while the journal happens to be empty. This results in garbage collection of all journal replicas entries. Once the updated replicas table is written to the superblock, the filesystem is put in a transiently unrecoverable state until further journal data is written, because journal recovery expects to find at least one marked journal device whenever the filesystem is not otherwise marked clean (i.e. as on clean unmount). To fix this problem, update the journal replicas gc algorithm to always mark currently active journal replicas entries by writing to the journal. This ensures that only entries for devices that are no longer used for journaling are garbage collected, not just those that don't happen to currently hold journal data. This preserves the journal recovery invariant above and avoids putting the fs into a transiently unrecoverable state. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_version_compatible()Kent Overstreet
This adds a new helper for checking if an on-disk version is compatible with the running version of bcachefs - prep work for introducing major:minor version numbers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_version_to_text()Kent Overstreet
Add a new helper for printing out metadata versions in a standard format. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill BTREE_INSERT_USE_RESERVEKent Overstreet
Now that we have journal watermarks and alloc watermarks unified, BTREE_INSERT_USE_RESERVE is redundant and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a null ptr deref in bch2_fs_alloc() error pathKent Overstreet
This fixes a null ptr deref in bch2_free_pending_node_rewrites() when the list head wasn't initialized. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a format string warningKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill JOURNAL_WATERMARKKent Overstreet
This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards specifying watermarks once in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: BCH_WATERMARK_reclaimKent Overstreet
Add another watermark for journal reclaim - this is needed for the next patches, that unify BCH_WATERMARK with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: struct bch_extent_rebalanceKent Overstreet
This adds the extent entry for extents that rebalance needs to do something with. We're adding this ahead of the main rebalance_work patchset, because adding new extent entries can't be done in a forwards-compatible way. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Expand BTREE_NODE_IDKent Overstreet
We now have 20 bits for the btree ID in the on disk format - sufficient for 1 million distinct btrees. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix btree node write error messageKent Overstreet
Error messages should include the error code, when available. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fsck: Break walk_inode() up into multiple functionsKent Overstreet
Some refactoring, prep work for algorithm improvements related to snapshots. we need to add a bitmap to the list of inodes for "seen this snapshot"; for this bitmap to correctly be available, we'll need to gather the list of inodes first, and later look up the inode for a given snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix leak in backpointers fsckKent Overstreet
We were forgetting to exit a printbuf - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: unregister_shrinker() now safe on not-registered shrinkerKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a missing rhashtable_destroy() callKent Overstreet
Fixes https://lore.kernel.org/linux-bcachefs/784c3e6a-75bd-e6ca-535a-43b3e1daf643@kernel.dk/T/#mbf7caf005f960018eba23b58795d06c06c947411 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve bch2_bkey_make_mut()Kent Overstreet
bch2_bkey_make_mut() now takes the bkey_s_c by reference and points it at the new, mutable key. This helps in some fsck paths that may have multiple repair operations on the same key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Reduce stack frame size of bch2_check_alloc_info()Kent Overstreet
Excessive inlining may (on some versions of gcc?) cause excessive stack usage; this turns off some inlining in bch2_check_alloc_info. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fsck needs BTREE_UPDATE_INTERNAL_SNAPSHOT_NODEKent Overstreet
A few fsck paths weren't using BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve error message for overlapping extentsKent Overstreet
We now print out the full previous extent we overlapping with, to aid in debugging and searching through the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix check_pos_snapshot_overwritten()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rename enum alloc_reserve -> bch_watermarkKent Overstreet
This is prep work for consolidating with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: BCH_ERR_fsck -> EINVALKent Overstreet
When we return errors outside of bcachefs, we need to return a standard error code - fix this for BCH_ERR_fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_trans_mark_pointer() refactoringKent Overstreet
bch2_bucket_backpointer_mod() doesn't need to update the alloc key, we can exit the alloc iter earlier. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix more lockdep splats in debug.cKent Overstreet
Similar to previous fixes, we can't incur page faults while holding btree locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix lockdep splat in bch2_readdirKent Overstreet
dir_emit() can fault (taking mmap_lock); thus we can't be holding btree locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Check for ERR_PTR() from filemap_lock_folio()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New error message helpersKent Overstreet
Add two new helpers for printing error messages with __func__ and bch2_err_str(): - bch_err_fn - bch_err_msg Also kill the old error strings in the recovery path, which were causing us to incorrectly report memory allocation failures - they're not needed anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fiemap: Fix a lockdep splatKent Overstreet
As with the previous patch, we generally can't hold btree locks while copying to userspace, as that may incur a page fault and require mmap_lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: seqmutex; fix a lockdep splatKent Overstreet
We can't be holding btree_trans_lock while copying to user space, which might incur a page fault. To fix this, convert it to a seqmutex so we can unlock/relock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>