summaryrefslogtreecommitdiff
path: root/fs/bcachefs/inode.c
AgeCommit message (Collapse)Author
2023-10-22bcachefs: New bpos_cmp(), bkey_cmp() replacementsKent Overstreet
This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve bch2_inode_opts_to_opts()Kent Overstreet
It turns out the *_defined entries of bch_io_opts are only used in one place - in the xattr get path - and there we immediately convert to a bch_opts struct, which also has the *_defined entries. This patch changes bch2_inode_opts_to_opts() to go directly from bch_inode_unpacked to bch_opts, which is a minor simplification and will also let us slim down struct bch_io_opts in another patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a transaction path overflowKent Overstreet
It turns out we need bch2_extent_trim_atomi() even when we're deleting extents one at a time because it's possible for one reflink_p to reference arbitrarily many reflink_v extents. This doesn't normally happen, but the data move path can fragment existing extents in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Assorted checkpatch fixesKent Overstreet
checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inline bch2_inode_pack()Kent Overstreet
It's mainly used from bch2_inode_write(), so inline it there. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add private error codes for ENOSPCKent Overstreet
Continuing the saga of introducing private dedicated error codes for each error path, this patch converts ENOSPC to error codes that are subtypes of ENOSPC. We've recently had a test failure where we got -ENOSPC where we shouldn't have, and didn't have enough information to tell where it came from, so this patch will solve that problem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: EINTR -> BCH_ERR_transaction_restartKent Overstreet
Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't BUG_ON() inode link count underflowKent Overstreet
This switches that assertion to a bch2_trans_inconsistent() call, as it should be. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Printbuf reworkKent Overstreet
This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add rw to .key_invalid()Kent Overstreet
This adds a new parameter to .key_invalid() methods for whether the key is being read or written; the idea being that methods can do more aggressive checks when a key is newly created and being written, when we wouldn't want to delete the key because of those checks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Convert .key_invalid methods to printbufsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_btree_iter_peek_upto()Kent Overstreet
In BTREE_ITER_FILTER_SNAPHOTS mode, we skip over keys in unrelated snapshots. When we hit the end of an inode, if the next inode(s) are in a different subvolume, we could potentially have to skip past many keys before finding a key we can return to the caller, so they can terminate the iteration. This adds a peek_upto() variant to solve this problem, to be used when we know the range we're searching within. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Inode create no longer needs to probe key cacheKent Overstreet
Now that we have full key cache coherency, we can simplify bch2_inode_create(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: btree_id_cached()Kent Overstreet
Add a new helper that returns true if the given btree ID uses the btree key cache. This enables some new cleanups, since the helper can check the options for whether caching is enabled on a given btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Simplify bch2_inode_delete_keys()Kent Overstreet
Had a bug report that implies bch2_inode_delete_keys() returned -EINTR before it completed, so this patch simplifies it and makes the flow control a little more conventional. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix a null ptr deref in bch2_inode_delete_keys()Kent Overstreet
Similarly to bch2_btree_delete_range_trans(), bch2_inode_delete_keys() may sometimes split compressed extents, and needs to pass in a disk reservation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix debug build in userspaceKent Overstreet
This fixes some compiler warnings that only trigger in userspace - dead code, a maybe uninitialed variable, a maybe null ptr passed to printk. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix missing field initializationKent Overstreet
When unpacking v1 inodes, we were failing to initialize the journal_seq field, leading to a BUG_ON() when fsync tries to flush a garbage journal sequence number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_assert_pos_locked()Kent Overstreet
This adds a new assertion to be used by bch2_inode_update_after_write(), which updates the VFS inode based on the update to the btree inode we just did - we require that the btree inode still be locked when we do that update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add journal_seq to inode & alloc keysKent Overstreet
Add fields to inode & alloc keys that record the journal sequence number when they were most recently modified. For alloc keys, this is needed to know what journal sequence number we have to flush before the bucket can be reused. Currently this is tracked in memory, but we'll be getting rid of the in memory bucket array. For inodes, this is needed for fsync when the inode has been evicted from the vfs cache. Currently we use a bloom filter per outstanding journal buf - but that mechanism has been broken since we added the ability to not issue a flush/fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add BCH_SUBVOLUME_UNLINKEDKent Overstreet
Snapshot deletion needs to become a multi step process, where we unlink, then tear down the page cache, then delete the subvolume - the deleting flag is equivalent to an inode with i_nlink = 0. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Plumb through subvolume idKent Overstreet
To implement snapshots, we need every filesystem btree operation (every btree operation without a subvolume) to start by looking up the subvolume and getting the current snapshot ID, with bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode. This patch adds those bch2_subvolume_get_snapshot() calls, and also switches to passing around a subvol_inum instead of just an inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Subvolumes, snapshotsKent Overstreet
This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: btree_pathKent Overstreet
This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add flags field to bch2_inode_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Always check for transaction restartsKent Overstreet
On transaction restart iterators won't be locked anymore - make sure we're always checking for errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add an option for whether inodes use the key cacheKent Overstreet
We probably don't ever want to flip this off in production, but it may be useful for certain kinds of testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add safe versions of varint encode/decodeKent Overstreet
This adds safe versions of bch2_varint_(encode|decode) that don't read or write past the end of the buffer, or varint being encoded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill bch2_btree_iter_peek_cached()Kent Overstreet
It's now been rolled into bch2_btree_iter_peek_slot() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Preallocate transaction memKent Overstreet
This helps avoid transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Check for errors from bch2_trans_update()Kent Overstreet
Upcoming refactoring is going to change bch2_trans_update() to start returning transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix pathalogical behaviour with inode sharding by cpu IDKent Overstreet
If the transactior restarts on a different CPU, it could end up needing to read in a different btree node, which makes another transaction restart more likely... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add an option to control sharding new inode numbersKent Overstreet
We're seeing a bug where inode creates end up spinning in bch2_inode_create - disabling sharding will simplify what we're testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: properly initialize used valuesDan Robertson
- Ensure the second key value in bch_hash_info is initialized to zero if the info type is of type BCH_STR_HASH_SIPHASH. - Initialize the possibly returned value in bch2_inode_create. Assuming bch2_btree_iter_peek returns bkey_s_c_null, the uninitialized value of ret could be returned to the user as an error pointer. - Fix compiler warning in initialization of bkey_s_c_stripe fs/bcachefs/buckets.c:1646:35: warning: suggest braces around initialization of subobject [-Wmissing-braces] struct bkey_s_c_stripe new_s = { NULL }; ^~~~ Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improved check_directory_structure()Kent Overstreet
Now that we have inode backpointers, we can simplify checking directory structure: instead of doing a DFS from the filesystem root and then checking if we found everything, we can iterate over every inode and see if we can go up until we get to the root. This patch also has a number of fixes and simplifications for the inode backpointer checks. Also, it turns out we don't actually need the BCH_INODE_BACKPTR_UNTRUSTED flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Change inode allocation code for snapshotsKent Overstreet
For snapshots, when we allocate a new inode we want to allocate an inode number that isn't in use in any other subvolume. We won't be able to use ITER_SLOTS for this, inode allocation needs to change to use BTREE_ITER_ALL_SNAPSHOTS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inode backpointersKent Overstreet
This patch adds two new inode fields, bi_dir and bi_dir_offset, that point back to the inode's dirent. Since we're only adding fields for a single backpointer, files that have been hardlinked won't necessarily have valid backpointers: we also add a new inode flag, BCH_INODE_BACKPTR_UNTRUSTED, that's set if an inode has ever had multiple links to it. That's ok, because we only really need this functionality for directories, which can never have multiple hardlinks - when we add subvolumes, we'll need a way to enemurate and print subvolumes, and this will let us reconstruct a path to a subvolume root given a subvolume root inode. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Start using bpos.snapshot fieldKent Overstreet
This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve inode deletion codeKent Overstreet
It had some silly redundancies. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Require all btree iterators to be freedKent Overstreet
We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rename BTREE_ID enums for consistency with other enumsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't use inode btree key cache in fsck codeKent Overstreet
We had a cache coherency bug with the btree key cache in the fsck code - this fixes fsck to be consistent about not using it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a shift greater than type sizeKent Overstreet
Found by UBSAN Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_trans_get_iter() no longer returns errorsKent Overstreet
Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_btree_delete_range_trans()Kent Overstreet
This helps reduce stack usage by avoiding multiple btree_trans on the stack. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't use bkey cache for inode update in fsckKent Overstreet
fsck doesn't know about the btree key cache, and non-cached iterators aren't cache coherent (yet?) Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inode delete doesn't need to flush key cache anymoreKent Overstreet
Inode create checks to make sure the slot doesn't exist in the btree key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix btree iterator leakKent Overstreet
this fixes an occasonial btree transaction iterators overflow. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New varintsKent Overstreet
Previous varint implementation used by the inode code was not nearly as fast as it could have been; partly because it was attempting to encode integers up to 96 bits (for timestamps) but this meant that encoding and decoding the length required a table lookup. Instead, we'll just encode timestamps greater than 64 bits as two separate varints; this will make decoding/encoding of inodes significantly faster overall. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improved inode create optimizationKent Overstreet
This shards new inodes into different btree nodes by using the processor ID for the high bits of the new inode number. Much faster than the previous inode create optimization - this also helps with sharding in the other btrees that index by inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>