path: root/fs/bcachefs
Age | Commit message | Author
2023-10-22 | bcachefs: bch2_assert_pos_locked() | Kent Overstreet
This adds a new assertion to be used by bch2_inode_update_after_write(), which updates the VFS inode based on the update to the btree inode we just did - we require that the btree inode still be locked when we do that update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: path->should_be_locked fixes | Kent Overstreet
- We should only be clearing should_be_locked in btree_path_set_pos() - it's the responsibility of the btree_path code, not the btree_iter code.
- bch2_path_put() needs to pay attention to path->should_be_locked, to ensure we don't drop locks we're supposed to be keeping.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Clean up error reporting in the startup path | Kent Overstreet
It used to be that error reporting in the startup path was done by returning strings describing the error, but that turned out to be a rather silly idea - if there's something we can describe about the error, just print it right away. This converts a good chunk of code to returning error codes, as is more typical style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Return -ENOKEY/EINVAL when mount decryption fails | Chris Webb
bch2_fs_encryption_init() correctly passes back -ENOKEY from request_key() when no unlock key is found, or -EINVAL if superblock decryption fails because of an invalid key. However, these get absorbed into a generic NULL return from bch2_fs_alloc() and later returned to user space as -ENOMEM, leading to a misleading error from mount(1): mount(2) system call failed: Out of memory. Return explicit error pointers out of bch2_fs_alloc() and handle them in both callers, so the user instead sees mount(2) system call failed: Required key not available. when attempting to mount a filesystem which is still locked. Signed-off-by: Chris Webb <chris@arachsys.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
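The error-pointer idiom this commit relies on can be sketched in userspace as follows. This is a minimal illustration, not the bcachefs code: `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` are re-implemented here as stand-ins for the kernel helpers of the same names, and `toy_fs`, `toy_encryption_init()`, `toy_fs_alloc()` and `toy_mount()` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#ifndef ENOKEY
#define ENOKEY 126	/* Linux value; only defined here if the platform lacks it */
#endif

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_fs { int unlocked; };	/* hypothetical filesystem handle */

/* Hypothetical: 0 on success, -ENOKEY if no key, -EINVAL if the key is wrong. */
static int toy_encryption_init(int have_key, int key_valid)
{
	if (!have_key)
		return -ENOKEY;
	if (!key_valid)
		return -EINVAL;
	return 0;
}

/* Returns a valid pointer or an encoded error, never NULL. */
static struct toy_fs *toy_fs_alloc(int have_key, int key_valid)
{
	struct toy_fs *fs = calloc(1, sizeof(*fs));
	int ret;

	if (!fs)
		return ERR_PTR(-ENOMEM);

	ret = toy_encryption_init(have_key, key_valid);
	if (ret) {
		free(fs);
		return ERR_PTR(ret);	/* the specific cause survives */
	}
	fs->unlocked = 1;
	return fs;
}

/* Caller: passes the original errno back instead of guessing -ENOMEM. */
static int toy_mount(int have_key, int key_valid)
{
	struct toy_fs *fs = toy_fs_alloc(have_key, key_valid);

	if (IS_ERR(fs))
		return (int)PTR_ERR(fs);
	free(fs);
	return 0;
}
```

With a bare NULL return the caller can only report one generic error; encoding the errno in the pointer lets the -ENOKEY or -EINVAL reach user space unchanged.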
2023-10-22 | bcachefs: Fix upgrade path for reflink_p fix | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Switch fsync to use bi_journal_seq | Kent Overstreet
Now that we're recording in each inode the journal sequence number of the most recent update, fsync becomes a lot simpler and we can delete all the plumbing for ei_journal_seq. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Kill bucket quantiles sysfs code | Kent Overstreet
We're getting rid of code that uses the in memory bucket array - and we now have better mechanisms for viewing most of what the bucket quantiles code gave us (especially internal fragmentation). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Kill journal buf bloom filter | Kent Overstreet
This was used for recording which inodes have been modified by in flight journal writes, but was broken and has been superseded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Add journal_seq to inode & alloc keys | Kent Overstreet
Add fields to inode & alloc keys that record the journal sequence number when they were most recently modified. For alloc keys, this is needed to know what journal sequence number we have to flush before the bucket can be reused. Currently this is tracked in memory, but we'll be getting rid of the in memory bucket array. For inodes, this is needed for fsync when the inode has been evicted from the vfs cache. Currently we use a bloom filter per outstanding journal buf - but that mechanism has been broken since we added the ability to not issue a flush/fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
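As a rough illustration of why a per-inode journal sequence number simplifies fsync, here is a toy model. It is a sketch under stated assumptions, not the bcachefs implementation: the structs and the `toy_*` functions are hypothetical, and only the field name `bi_journal_seq` is taken from the commit log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical journal: entries up to flushed_seq are durable. */
struct toy_journal {
	uint64_t cur_seq;	/* most recently written entry */
	uint64_t flushed_seq;	/* most recently flushed entry */
	unsigned flushes;	/* how many flushes were actually issued */
};

/* Hypothetical inode carrying the seq of its most recent update. */
struct toy_inode {
	uint64_t bi_journal_seq;
};

/* Every inode update is journalled and remembers its sequence number. */
static void toy_inode_update(struct toy_journal *j, struct toy_inode *inode)
{
	inode->bi_journal_seq = ++j->cur_seq;
}

/* Make everything up to and including seq durable; no-op if it already is. */
static void toy_journal_flush_seq(struct toy_journal *j, uint64_t seq)
{
	assert(seq <= j->cur_seq);
	if (seq <= j->flushed_seq)
		return;
	j->flushed_seq = seq;
	j->flushes++;
}

/* fsync only needs the number stored in the inode itself. */
static void toy_fsync(struct toy_journal *j, const struct toy_inode *inode)
{
	toy_journal_flush_seq(j, inode->bi_journal_seq);
}
```

Because the sequence number lives in the inode key, the same lookup works even when no in-memory state for that inode remains, which is the case the bloom filter was meant to cover.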
2023-10-22 | bcachefs: Update inode on every write | Kent Overstreet
This is going to be a performance regression until we get the btree key cache re-enabled - but it's needed for fixing fsync. Upcoming patches will record the journal_seq an inode was updated at in the inode itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: BTREE_UPDATE_NOJOURNAL | Kent Overstreet
We're going to have btree updates that don't need to be journalled; add a flag for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Fix __remove_dirent() | Kent Overstreet
__lookup_inode() doesn't work for what __remove_dirent() wants - it just wants the first inode at a given inode number, they all have the same hash info. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix check_inodes() | Kent Overstreet
We were starting at the wrong btree position, and thus not actually checking any inodes - oops. Also, make check_key_has_snapshot() a mustfix fsck error, since later fsck code assumes that all keys have valid snapshot IDs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Improve error message in bch2_write_super() | Kent Overstreet
It's helpful to know what the superblock write is for. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix trans_lock_write() | Kent Overstreet
On failure to get a write lock (because we had a conflicting read lock), we need to make sure to upgrade the read lock to an intent lock - or we could end up spinning. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix upgrade_readers() | Kent Overstreet
The bch2_btree_path_upgrade() call was failing and tripping an assert - path->level + 1 is in this case not necessarily exactly what we want, fix it by upgrading exactly the locks we want. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix faulty assertion | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: BTREE_TRIGGER_INSERT now only means insert | Kent Overstreet
This allows triggers to distinguish between a key entering the btree - i.e. being called from the trans commit path - vs. being called on a key that already exists, i.e. by GC. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Convert bch2_mark_key() to take a btree_trans * | Kent Overstreet
This helps to unify the interface between bch2_mark_key() and bch2_trans_mark_key() - and it also gives access to the journal reservation and journal seq in the mark_key path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Assorted ec fixes | Kent Overstreet
- The backpointer that ec_stripe_update_ptrs() uses now needs to include the snapshot ID, which means we have to add the backpointer after getting the snapshot ID for the new extents.
- ec_stripe_update_ptrs() needs to be calling bch2_trans_begin().
- Improve error message in bch2_mark_stripe().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix bch2_mark_update() | Kent Overstreet
When the old or new key doesn't exist, we should still pass in a deleted key with the correct pos. This fixes a bug in the ec code, when bch2_mark_stripe() was looking up the wrong in-memory stripe. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Ensure journal doesn't get stuck in nochanges mode | Kent Overstreet
This tweaks the journal code to always act as if there's space available in nochanges mode, when we're not going to be doing any writes. This helps in recovering filesystems that won't mount because they need journal replay and the journal has gotten stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Improve transaction restart handling in fsck code | Kent Overstreet
The fsck code has been handling transaction restarts locally, to avoid calling fsck_err() multiple times (and asking the user/logging the error multiple times) on transaction restart. However, with our improving assertions about iterator validity, this isn't working anymore - the code wasn't entirely correct, in ways that are fine for now but are going to matter once we start wanting online fsck. This code converts much of the fsck code to handle transaction restarts in a more rigorously correct way - moving restart handling up to the top level of check_dirent, check_xattr and others - at the cost of logging errors multiple times on transaction restart. Fixing the issues with logging errors multiple times is probably going to require memoizing calls to fsck_err() - we'll leave that for future improvements. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix bch2_btree_iter_advance() | Kent Overstreet
Was popping an assertion on !BTREE_ITER_ALL_SNAPSHOTS iters when getting to the end. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Move bch2_evict_subvolume_inodes() to fs.c | Kent Overstreet
This fixes building in userspace - code that's coupled to the kernel VFS interface should live in fs.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Don't do upgrades in nochanges mode | Kent Overstreet
nochanges mode is often used for getting data off of otherwise nonrecoverable filesystems, which is often because of errors hit during fsck. Don't force version upgrade & fsck in nochanges mode, so that it's more likely to mount. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Drop bch2_journal_meta() call when going RW | Kent Overstreet
Back when we relied on the journal sequence number blacklist machinery for consistency between btree and the journal, we needed to ensure a new journal entry was written before any btree writes were done. But this had the side effect of consuming some space in the journal prior to doing journal replay - which could lead to a very wedged filesystem, since we don't yet have a way to grow the journal prior to going RW. Fortunately, the journal sequence number blacklist machinery isn't needed anymore, as btree node pointers now record the number of sectors currently written to that node - that code should all be ripped out. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Add BCH_SUBVOLUME_UNLINKED | Kent Overstreet
Snapshot deletion needs to become a multi step process, where we unlink, then tear down the page cache, then delete the subvolume - the deleting flag is equivalent to an inode with i_nlink = 0. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Improve error messages in trans_mark_reflink_p() | Kent Overstreet
We should always print out the key we were marking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Don't run triggers in fix_reflink_p_key() | Kent Overstreet
It seems some users have reflink pointers which span many indirect extents, from a short window in time when merging of reflink pointers was allowed. Now, we're seeing transaction path overflows in fix_reflink_p(), the code path to clear out the reflink_p fields now used for front/back pad - but, we don't actually need to be running triggers in that path, which is an easy partial fix. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: More general fix for transaction paths overflow | Kent Overstreet
for_each_btree_key() now calls bch2_trans_begin() as needed; that means, we can also call it when we're in danger of overflowing transaction paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix fsck path for reflink pointers | Kent Overstreet
The way __bch2_mark_reflink_p returns errors was clashing with returning the number of sectors processed - we weren't returning FSCK_ERR_EXIT correctly. Fix this by only using the return code for errors, which actually ends up simplifying the overall logic. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
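The general shape of that fix, reserving the return code for errors and reporting progress through a separate output, might look like the sketch below. All names (`toy_mark_one()`, `toy_mark_range()`) and the use of -ENOENT are hypothetical; the real `__bch2_mark_reflink_p` signature is not shown in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: marks up to `remaining` sectors of one extent.
 * The return value is 0 or a negative error only; the sector count
 * travels through *processed, so the two can never be confused.
 */
static int toy_mark_one(uint64_t extent_sectors, uint64_t remaining,
			uint64_t *processed)
{
	assert(processed);
	if (!extent_sectors)
		return -ENOENT;	/* toy error: the extent is missing */
	*processed = extent_sectors < remaining ? extent_sectors : remaining;
	return 0;
}

/* Walks the extents until `total` sectors are marked or an error occurs. */
static int toy_mark_range(const uint64_t *extents, size_t nr,
			  uint64_t total, uint64_t *marked)
{
	uint64_t done = 0;
	size_t i;

	for (i = 0; i < nr && done < total; i++) {
		uint64_t step = 0;
		int ret = toy_mark_one(extents[i], total - done, &step);

		if (ret) {
			*marked = done;	/* progress made before the error */
			return ret;	/* error is passed up untouched */
		}
		done += step;
	}
	*marked = done;
	return 0;
}
```

Keeping the two channels apart means the caller never has to decide whether a given value is a count or a status.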
2023-10-22 | bcachefs: Ensure we flush btree updates in evacuate path | Kent Overstreet
This fixes a possible race where we fail to remove a device because of btree nodes still on it, that are being deleted by in flight btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: bch2_btree_node_rewrite() now returns transaction restarts | Kent Overstreet
We have been getting away from handling transaction restarts locally - convert bch2_btree_node_rewrite() to the newer style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix bch2_btree_iter_next_node() | Kent Overstreet
We were modifying state, then returning -EINTR, causing us to skip nodes - ouch. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
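The class of bug described here can be reproduced in a few lines. The following is a toy model only, with a hypothetical iterator and a traverse step that fails once with -EINTR; it is not the bcachefs iterator code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical iterator over nodes 0..nr-1; pos is the last node returned. */
struct toy_iter {
	int pos;
	int fail_once_at;	/* traverse to this node fails once, then works */
};

/* Hypothetical traverse step that can ask the caller to retry. */
static int toy_traverse(struct toy_iter *it, int pos)
{
	assert(it);
	if (pos == it->fail_once_at) {
		it->fail_once_at = -1;
		return -EINTR;
	}
	return 0;
}

/* Buggy: the position is advanced before we know the step succeeded. */
static int toy_next_buggy(struct toy_iter *it)
{
	it->pos++;
	return toy_traverse(it, it->pos);
}

/* Fixed: the new position is committed only after traverse succeeds. */
static int toy_next_fixed(struct toy_iter *it)
{
	int next = it->pos + 1;
	int ret = toy_traverse(it, next);

	if (ret)
		return ret;
	it->pos = next;
	return 0;
}

/* Visits all nodes, retrying on -EINTR, and counts successful visits. */
static int toy_count_visited(int (*next)(struct toy_iter *), int nr,
			     int fail_at)
{
	struct toy_iter it = { .pos = -1, .fail_once_at = fail_at };
	int visited = 0;

	while (it.pos + 1 < nr) {
		if (next(&it) == -EINTR)
			continue;	/* caller retries the same call */
		visited++;
	}
	return visited;
}
```

In the buggy variant the retry starts from the already advanced position, so the node that triggered the restart is never returned.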
2023-10-22 | bcachefs: Must check for errors from bch2_trans_cond_resched() | Kent Overstreet
But we don't need to call it from outside the btree iterator code anymore, since it's called by bch2_trans_begin() and bch2_btree_path_traverse(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix dev accounting after device add | Kent Overstreet
This is a hacky but effective fix to device usage stats for superblock and journal being wrong on a newly added device (following the comment that already told us how it needed to be done!) Reported-by: Chris Webb <chris@arachsys.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix a transaction path overflow | Kent Overstreet
readdir() in a directory with many subvolumes could overflow transaction paths - this is a simple hack around the issue. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix error handling in bch2_trans_extent_merging | Kent Overstreet
The back merging case wasn't returning errors correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Subvol dirents are now only visible in parent subvol | Kent Overstreet
This changes the on disk format for dirents that point to subvols so that they also record the subvolid of the parent subvol, so that we can filter them out in other subvolumes. This also updates the dirent code to do that filtering, and in particular tweaks the rename code - we need to ensure that there's only ever one dirent (counting multiplicities in different snapshots) that point to a subvolume. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix restart handling in for_each_btree_key() | Kent Overstreet
Code that uses for_each_btree_key often wants transaction restarts to be handled locally and not returned. Originally, we wouldn't return transaction restarts if there was a single iterator in the transaction - the reasoning being if there weren't other iterators being invalidated, and the current iterator was being advanced/retraversed, there weren't any locks or iterators we were required to preserve. But with the btree_path conversion that approach doesn't work anymore - even when we're using for_each_btree_key() with a single iterator there will still be two paths in the transaction, since we now always preserve the path at the pos the iterator was initialized at - the reason being that on restart we often restart from the same place. And it turns out there's now a lot of for_each_btree_key() uses that _do not_ want transaction restarts handled locally, and should be returning them. This patch splits out for_each_btree_key_norestart() and for_each_btree_key_continue_norestart(), and converts existing users as appropriate. for_each_btree_key(), for_each_btree_key_continue(), and for_each_btree_node() now handle transaction restarts themselves by calling bch2_trans_begin() when necessary - and the old hack to not return transaction restarts when there's a single path in the transaction has been deleted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
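The difference between the two loop flavours can be sketched with a toy transaction. This is an illustration under assumptions: `toy_trans`, `toy_trans_begin()`, `toy_peek()` and both walkers are hypothetical, -EINTR stands in for the transaction-restart error, and the real macros are not reproduced here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical transaction: counts begins and has restarts pending. */
struct toy_trans {
	int begins;		/* times the transaction was (re)started */
	int restarts_left;	/* peeks that will still fail with -EINTR */
};

static void toy_trans_begin(struct toy_trans *t)
{
	t->begins++;
}

/* Hypothetical peek at one key: may demand a transaction restart. */
static int toy_peek(struct toy_trans *t, int pos)
{
	assert(t && pos >= 0);
	if (t->restarts_left > 0) {
		t->restarts_left--;
		return -EINTR;
	}
	return 0;
}

/* "norestart" flavour: any restart is returned to the caller. */
static int toy_walk_norestart(struct toy_trans *t, int nr, int *visited)
{
	int pos;

	for (pos = 0; pos < nr; pos++) {
		int ret = toy_peek(t, pos);

		if (ret)
			return ret;
		(*visited)++;
	}
	return 0;
}

/* Restart-handling flavour: begins again and retries the same key. */
static int toy_walk(struct toy_trans *t, int nr, int *visited)
{
	int pos = 0;

	while (pos < nr) {
		int ret = toy_peek(t, pos);

		if (ret == -EINTR) {
			toy_trans_begin(t);
			continue;
		}
		if (ret)
			return ret;
		(*visited)++;
		pos++;
	}
	return 0;
}
```

The restart-handling walker suits standalone scans; the norestart walker suits code running inside a larger transaction whose caller must see the restart and redo its own work.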
2023-10-22 | bcachefs: cached data shouldn't prevent fs from mounting | Kent Overstreet
It's not an error if we don't have cached data - skip BCH_DATA_cached in bch2_have_enough_devs(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Delete dentry when deleting snapshots | Kent Overstreet
This fixes a bug where subsequently doing creates with the same name fails. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix check_path() for snapshots | Kent Overstreet
check_path() wasn't checking the snapshot ID when checking for directory structure loops - so, snapshots would cause us to detect a loop that wasn't actually a loop. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fix for leaking of reflinked extents | Kent Overstreet
When a reflink pointer points to only part of an indirect extent, and then that indirect extent is fragmented (e.g. by copygc), if the reflink pointer only points to one of the fragments we leak a reference. Fix this by storing front/back pad values in reflink pointers - when inserting reflink pointers, we initialize them to cover the full range of the indirect extents we reference. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: New on disk format to fix reflink_p pointers | Kent Overstreet
We had a bug where reflink_p pointers weren't being initialized to 0, and when we started using the second word, things broke badly. This patch revs the on disk format version and adds cleanup code to zero out the second word of reflink_p pointers before we start using it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Handle transaction restarts in bch2_blacklist_entries_gc() | Kent Overstreet
It shouldn't be necessary when we're only using a single iterator and not doing updates, but that's harder to debug at the moment. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: bch2_trans_exit() no longer returns errors | Kent Overstreet
Now that peek_node()/next_node() are converted to return errors directly, we don't need bch2_trans_exit() to return errors - it's cleaner this way and wasn't used much anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: for_each_btree_node() now returns errors directly | Kent Overstreet
This changes for_each_btree_node() to work like for_each_btree_key(), and to that end bch2_btree_iter_peek_node() and next_node() also return error ptrs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Improve reflink repair code | Kent Overstreet
When a reflink pointer points to an indirect extent that doesn't exist, we need to replace it with a KEY_TYPE_error key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>