summaryrefslogtreecommitdiff
path: root/fs/bcachefs
AgeCommit message (Collapse)Author
2024-12-21bcachefs: errcode cleanup: journal errorsKent Overstreet
Instead of throwing standard error codes, we should be throwing dedicated private error codes, this greatly improves debugability. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Use separate rhltable for bch2_inode_or_descendents_is_open()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: BCH_ERR_btree_node_read_error_cachedKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: btree_write_buffer_flush_seq() no longer closes journalKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: discard fastpath now uses bch2_discard_one_bucket()Kent Overstreet
The discard bucket fastpath previously was using its own code for discarding buckets and clearing them in the need_discard btree, which didn't have any of the consistency checks of the main discard path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Bias reads more in favor of faster deviceKent Overstreet
Per reports of performance issues on mixed multi device filesystems where we're issuing too much IO to the spinning rust - tweak this algorithm. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: trivial btree write buffer refactoringKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Can now block journal activity without closing cur entryKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: New backpointers helpersKent Overstreet
- bch2_backpointer_del() - bch2_backpointer_maybe_flush() Kill a bit of open coding and make sure we're properly handling the btree write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: kill bch_backpointer.bucket_offset usageKent Overstreet
bch_backpointer.bucket_offset is going away - it's no longer needed since we no longer store backpointers in alloc keys, the same information is in the key position itself. And we'll be reclaiming the space in bch_backpointer for the bucket generation number. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Fix check_backpointers_to_extents range limitingKent Overstreet
bch2_get_btree_in_memory_pos() will return positions that refer directly to the btree it's checking will fit in memory - i.e. backpointer positions, not buckets. This also means check_bp_exists() no longer has to refer to the device, and we can delete some code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch_backpointer -> bkey_i_backpointerKent Overstreet
Since we no longer store backpointers in alloc keys, there's no reason not to pass around bkey_i_backpointers; this means we don't have to pass the bucket pos separately. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Drop swab code for backpointers in alloc keysKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bucket_pos_to_bp_end()Kent Overstreet
Better helpers for iterating over backpointers within a specific bucket Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: check for backpointers to invalid deviceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: fix bp_pos_to_bucket_nodev_noerrorKent Overstreet
_noerror means don't produce inconsistent errors, so it should be using bch2_dev_rcu_noerror(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Fix evacuate_bucket tracepointKent Overstreet
86a494c8eef9 ("bcachefs: Kill bch2_get_next_backpointer()") dropped some things the tracepoint emitted because bch2_evacuate_bucket() no longer looks at the alloc key - but we did want at least some of that. We still no longer look at the alloc key so we can't report on the fragmentation number, but that's a direct function of dirty_sectors and a copygc concern anyways - copygc should get its own tracepoint that includes information from the fragmentation LRU. But we can report on the number of sectors we moved and the bucket size. Co-developed-by: Piotr Zalewski <pZ010001011111@proton.me> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: fix O(n^2) issue with whiteouts in journal keysKent Overstreet
The journal_keys array can't be substantially modified after we go RW, because lookups need to be able to check it locklessly - thus we're limited on what we can do when a key in the journal has been overwritten. This is a problem when there's many overwrites to skip over for peek() operations. To fix this, add tracking of ranges of overwrites: we create a range entry when there's more than one contiguous whiteout. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: btree_and_journal_iter: don't iterate over too many whiteouts when ↵Kent Overstreet
prefetching To help ameloriate issues with peek operations having to skip over deletions in the journal - just bail out if all we're doing is prefetching btree nodes. Since btree node prefetching runs every time we iterate to a new node, and has to sequentially scan ahead, this avoids another O(n^2). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: journal keys: sort keys for interior nodes firstKent Overstreet
There's an unavoidable issue with btree lookups when we're overlaying journal keys and the journal has many deletions for keys present in the btree - peek operations will have to iterate over all those deletions to find the next live key to return. This is mainly a problem for lookups in interior nodes, if we have to traverse to a leaf. Looking up an insert position in a leaf (for journal replay) doesn't have to find the next live key, but walking down the btree does. So to ameloriate this, change journal key sort ordering so that we replay keys from roots and interior nodes first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: kill bch2_journal_entries_free()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't BUG_ON() when superblock feature wasn't set for compressed dataKent Overstreet
We don't allocate the mempools for compression/decompression unless we need them - but that means there's an inconsistency to check for. Reported-by: syzbot+cb3fbcfb417448cfd278@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't use a shared decompress workspace mempoolKent Overstreet
gzip and zstd require different decompress workspace sizes, and if we start with one and then start using the other at runtime we may not get the correct size Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: compression workspaces should be indexed by opt, not typeKent Overstreet
type includes lz4 and lz4_old, which do not get different compression workspaces, and incompressible, a fake type - BCH_COMPRESSION_OPTS() is the correct enum to use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: add missing BTREE_ITER_intentKent Overstreet
this fixes excessive transaction restarts due to trans_commit having to upgrade Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Kill bch2_get_next_backpointer()Kent Overstreet
Since for quite some time backpointers have only been stored in the backpointers btree, not alloc keys (an aborted experiment, support for which has been removed) - we can replace get_next_backpointer() with simple btree iteration. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Delete backpointers check in try_alloc_bucket()Kent Overstreet
try_alloc_bucket() has a "safety" check, which avoids allocating a bucket if there's any backpointers present. But backpointers are not the source of truth for live data in a bucket, the bucket sector counts are; this check was fairly useless, and we're also deferring backpointers checks from fsck to runtime in the near future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: peek_prev_min(): Search forwards for extents, snapshotsKent Overstreet
With extents and snapshots, for slightly different reasons, we may have to search forwards to find a key that compares equal to iter->pos (i.e. a key that peek_prev() should return, as it returns keys <= iter->pos). peek_slot() does this, and is an easy way to fix this case. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Implement bch2_btree_iter_prev_min()Kent Overstreet
A user contributed a filessytem dump, where the dump was actually corrupted (due to being taken while the filesystem was online), but which exposed an interesting bug in fsck - reconstruct_inode(). When itearting in BTREE_ITER_filter_snapshots mode, it's required to give an end position for the iteration and it can't span inode numbers; continuing into the next inode might mean we start seeing keys from a different snapshot tree, that the is_ancestor() checks always filter, thus we're never able to return a key and stop iterating. Backwards iteration never implemented the end position because nothing else needed it - except for reconstuct_inode(). Additionally, backwards iteration is now able to overlay keys from the journal, which will be useful if we ever decide to start doing journal replay in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: discard_one_bucket() now uses need_discard_or_freespace_err()Kent Overstreet
More conversion of inconsistent errors to fsck errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch2_bucket_do_index(): inconsistent_err -> fsck_errKent Overstreet
Factor out a common helper, need_discard_or_freespace_err(), which is now used by both fsck and the runtime checks, and can repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: try_alloc_bucket() now uses bch2_check_discard_freespace_key()Kent Overstreet
check_discard_freespace_key() was doing all the same checks as try_alloc_bucket(), but with repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: rework bch2_bucket_alloc_freelist() freelist iterationKent Overstreet
Prep work for converting try_alloc_bucket() to use bch2_check_discard_freespace_key(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: kill inconsistent err in invalidate_one_bucket()Kent Overstreet
Change it to a normal fsck_err() - meaning it'll get repaired at runtime when that's flipped on. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't delete reflink pointers to missing indirect extentsKent Overstreet
To avoid tragic loss in the event of transient errors (i.e., a btree node topology error that was later corrected by btree node scan), we can't delete reflink pointers to correct errors. This adds a new error bit to bch_reflink_p, indicating that it is known to point to a missing indirect extent, and the error has already been reported. Indirect extent lookups now use bch2_lookup_indirect_extent(), which on error reports it as a fsck_err() and sets the error bit, and clears it if necessary on succesful lookup. This also gets rid of the bch2_inconsistent_error() call in __bch2_read_indirect_extent, and in the reflink_p trigger: part of the online self healing project. An on disk format change isn't necessary here: setting the error bit will be interpreted by older versions as pointing to a different index, which will also be missing - which is fine. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Reorganize reflink.c a bitKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Reserve 8 bits in bch_reflink_pKent Overstreet
Better repair for reflink pointers, as well as propagating new inode options to indirect extents, are going to require a few extra bits bch_reflink_p: so claim a few from the high end of the destination index. Also add some missing bounds checking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Kill FSCK_NEED_FSCKKent Overstreet
If we find an error that indicates that we need to run fsck, we can specify that directly with run_explicit_recovery_pass(). These are now log_fsck_err() calls: we're just logging in the superblock that an error occurred - and possibly doing an emergency shutdown, depending on policy. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: lru errors are expected when reconstructing allocKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Delete dead code from bch2_discard_one_bucket()Kent Overstreet
alloc key validation ensures that if a bucket is in need_discard state the sector counts are all zero - we don't have to check for that. The NEED_INC_GEN check appears to be dead code, as well: we only see buckets in the need_discard btree, and it's an error if they aren't in the need_discard state. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch2_btree_bit_mod_iter()Kent Overstreet
factor out a new helper, make it handle extents bitset btrees (freespace). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: delete dead codeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Fix shutdown messageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't use page allocator for sb_read_scratchKent Overstreet
Kill another unnecessary dependency on PAGE_SIZE Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Simplify code in bch2_dev_alloc()Youling Tang
- Remove unnecessary variable 'ret'. - Remove unnecessary bch2_dev_free() operations. Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Remove redundant initialization in bch2_vfs_inode_init()Youling Tang
`inode->v.i_ino` has been initialized to `inum.inum`. If `inum.inum` and `bi->bi_inum` are not equal, BUG_ON() is triggered in bch2_inode_update_after_write(). Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Removes NULL pointer checks for __filemap_get_folio return valuesYouling Tang
__filemap_get_folio the return value cannot be NULL, so unnecessary checks are removed. Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Add support for FS_IOC_GETFSSYSFSPATHKent Overstreet
[TEST]: ``` $ cat ioctl_getsysfspath.c #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <sys/ioctl.h> #include <linux/fs.h> #include <unistd.h> int main(int argc, char *argv[]) { int fd; struct fs_sysfs_path sysfs_path = {}; if (argc != 2) { fprintf(stderr, "Usage: %s <path_to_file_or_directory>\n", argv[0]); exit(EXIT_FAILURE); } fd = open(argv[1], O_RDONLY); if (fd == -1) { perror("open"); exit(EXIT_FAILURE); } if (ioctl(fd, FS_IOC_GETFSSYSFSPATH, &sysfs_path) == -1) { perror("ioctl FS_IOC_GETFSSYSFSPATH"); close(fd); exit(EXIT_FAILURE); } printf("FS_IOC_GETFSSYSFSPATH: %s\n", sysfs_path.name); close(fd); return 0; } $ gcc ioctl_getsysfspath.c $ sudo bcachefs format /dev/sda $ sudo mount.bcachefs /dev/sda /mnt $ sudo ./a.out /mnt FS_IOC_GETFSSYSFSPATH: bcachefs/c380b4ab-fbb6-41d2-b805-7a89cae9cadb ``` Original patch link: [1]: https://lore.kernel.org/all/20240207025624.1019754-8-kent.overstreet@linux.dev/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Youling Tang <youling.tang@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Add support for FS_IOC_GETFSUUIDKent Overstreet
Use super_set_uuid() to set `sb->s_uuid_len` to avoid returning `-ENOTTY` with sb->s_uuid_len being 0. Original patch link: [1]: https://lore.kernel.org/all/20240207025624.1019754-2-kent.overstreet@linux.dev/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Correct the description of the '--bucket=size' optionsYouling Tang
Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>