summaryrefslogtreecommitdiff
path: root/fs/bcachefs/super-io.c
AgeCommit message (Collapse)Author
2025-02-28bcachefs: Don't set BCH_FEATURE_incompat_version_field unless requestedKent Overstreet
We shouldn't be setting incompatible bits or the incompatible version field unless explicitly request or allowed - otherwise we break mounting with old kernels or userspace. Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Dropped superblock write is no longer a fatal errorKent Overstreet
Just emit a warning if errors=continue or fix_safe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bcachefs_metadata_version_persistent_inode_cursorsKent Overstreet
Persistent cursors for inode allocation. A free inodes btree would add substantial overhead to inode allocation and freeing - a "next num to allocate" cursor is always going to be faster. We just need it to be persistent, to avoid scanning the inodes btree from the start on startup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29bcachefs: BCH_SB_VERSION_INCOMPATKent Overstreet
We've been getting away from feature bits: they don't have any kind of ordering, and thus it's possible for people to enable weird combinations of features that were never tested or intended to be run. Much better to just give every new feature, compatible or incompatible, a version number. Additionally, we probably won't ever rev the major version number: major version numbers represent incompatible versions, but that doesn't really fit with how we actually roll out incompatible features - we need a better way of rolling out incompatible features. So, this patch adds two new superblock fields: - BCH_SB_VERSION_INCOMPAT - BCH_SB_VERSION_INCOMPAT_ALLOWED BCH_SB_VERSION_INCOMPAT_ALLOWED indicates that incompatible features up to version number x are allowed to be used without user prompting, but it does not by itself deny old versions from mounting. BCH_SB_VERSION_INCOMPAT does deny old versions from mounting, and must be <= BCH_SB_VERSION_INCOMPAT_ALLOWED. BCH_SB_VERSION_INCOMPAT will only be set when a codepath attempts to use an incompatible feature, so as to not unnecessarily break compatibility with old versions. bch2_request_incompat_feature() is the new interface to check if an incompatible feature may be used. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: cryptographic MACs on superblock are not (yet?) supportedKent Overstreet
We should add support for cryptographic macs on the superblock - and it won't be hard, but it'll need an incompatible feature bit (and we have a new incompatible feature versioning scheme coming). For now, just add a guard to avoid a dull ptr deref in gen_poly_key(). Reported-by: syzbot+dd3d9835055dacb66f35@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Don't use page allocator for sb_read_scratchKent Overstreet
Kill another unnecessary dependency on PAGE_SIZE Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Use str_write_read() helper in write_super_endio()Thorsten Blum
Remove hard-coded strings by using the str_write_read() helper function. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-24bcachefs: Fix invalid shift in validate_sb_layout()Gianfranco Trad
Add check on layout->sb_max_size_bits against BCH_SB_LAYOUT_SIZE_BITS_MAX to prevent UBSAN shift-out-of-bounds in validate_sb_layout(). Reported-by: syzbot+089fad5a3a5e77825426@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=089fad5a3a5e77825426 Fixes: 03ef80b469d5 ("bcachefs: Ignore unknown mount options") Tested-by: syzbot+089fad5a3a5e77825426@syzkaller.appspotmail.com Signed-off-by: Gianfranco Trad <gianf.trad@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Don't delete unlinked inodes before logged op resumeKent Overstreet
Previously, check_inode() would delete unlinked inodes if they weren't on the deleted list - this code dating from before there was a deleted list. But, if we crash during a logged op (truncate or finsert/fcollapse) of an unlinked file, logged op resume will get confused if the inode has already been deleted - instead, just add it to the deleted list if it needs to be there; delete_dead_inodes runs after logged op resume. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: assign return error when iterating through layoutDiogo Jahchan Koike
syzbot reported a null ptr deref in __copy_user [0] In __bch2_read_super, when a corrupt backup superblock matches the default opts offset, no error is assigned to ret and the freed superblock gets through, possibly being assigned as the best sb in bch2_fs_open and being later dereferenced, causing a fault. Assign EINVALID to ret when iterating through layout. [0]: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876 Reported-by: syzbot+18a5c5e8a9c856944876@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876 Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: bch2_sb_nr_devices()Kent Overstreet
factoring out a helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: promote_whole_extents is now a normal optionKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-07bcachefs: Make allocator stuck timeout configurable, ratelimit messagesKent Overstreet
Limit these messages to once every 2 minutes to avoid spamming logs; with multiple devices the output can be quite significant. Also, up the default timeout to 30 seconds from 10 seconds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: show none if label is not setHongbo Li
If label is not set, the Label tag in superblock info show '(none)'. ``` [Before] Device index: 0 Label: Version: 1.4: member_seq [After] Device index: 0 Label: (none) Version: 1.4: member_seq ``` Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Fix shift overflow in read_one_super()Kent Overstreet
Reported-by: syzbot+9f74cb4006b83e2a3df1@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: fix the display format for show-superHongbo Li
There are three keys displayed in non-uniform format. Let's fix them. [Before] ``` Label: testbcachefs Version: 1.9: (unknown version) Version upgrade complete: 0.0: (unknown version) ``` [After] ``` Label: testbcachefs Version: 1.9: (unknown version) Version upgrade complete: 0.0: (unknown version) ``` Fixes: 7423330e30ab ("bcachefs: prt_printf() now respects \r\n\t") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Fix setting of downgrade recovery passes/errorsKent Overstreet
bch2_check_version_downgrade() was setting c->sb.version, which bch2_sb_set_downgrade() expects to be at the previous version; and it shouldn't even have been set directly because c->sb.version is updated by write_super(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: Plumb bch_validate_flags to sb_field_ops.validate()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: member helper cleanupsKent Overstreet
Some renaming for better consistency bch2_member_exists -> bch2_member_alive bch2_dev_exists -> bch2_member_exists bch2_dev_exsits2 -> bch2_dev_exists bch_dev_locked -> bch2_dev_locked bch_dev_bkey_exists -> bch2_dev_bkey_exists new helper - bch2_dev_safe Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: prt_printf() now respects \r\n\tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Better write_super() error messagesKent Overstreet
When a superblock write is silently dropped or it's been modified by another process we need to know which device it was. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-07bcachefs: Fix race in bch2_write_super()Kent Overstreet
bch2_write_super() was looping over online devices multiple times - dropping and retaking io_ref each time. This meant it could race with device removal; it could increment the sequence number on a device but fail to write it - and then if the device was re-added, it would get confused the next time around thinking a superblock write was silently dropped. Fix this by taking io_ref once, and stashing pointers to online devices in a darray. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06bcachefs: BCH_SB_LAYOUT_SIZE_BITS_MAXKent Overstreet
Define a constant for the max superblock size, to avoid a too-large shift. Reported-by: syzbot+a8b0fb419355c91dda7f@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06bcachefs: Fix refcount put in sb_field_resize error pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15Merge tag 'bcachefs-2024-04-15' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull yet more bcachefs fixes from Kent Overstreet: "This gets recovery working again for the affected user I've been working with, and I'm still waiting to hear back on other bug reports but should fix it for everyone else who's been having issues with recovery. - Various recovery fixes: - fixes for the btree_insert_entry being resized on path allocation btree_path array recently became dynamically resizable, and btree_insert_entry along with it; this was being observed during journal replay, when write buffer btree updates don't use the write buffer and instead use the normal btree update path - multiple fixes for deadlock in recovery when we need to do lots of btree node merges; excessive merges were clocking up the whole pipeline - write buffer path now correctly does btree node merges when needed - fix failure to go RW when superblock indicates recovery passes needed (i.e. to complete an unfinished upgrade) - Various unsafety fixes - test case contributed by a user who had two drives out of a six drive array write out a whole bunch of garbage after power failure - New (tiny) on disk format feature: since it appears the btree node scan tool will be a more regular thing (crappy hardware, user error) - this adds a 64 bit per-device bitmap of regions that have ever had btree nodes. - A path->should_be_locked fix, from a larger patch series tightening up invariants and assertions around btree transaction and path locking state. This particular fix prevents us from keeping around btree_paths that are no longer needed" * tag 'bcachefs-2024-04-15' of https://evilpiepirate.org/git/bcachefs: (24 commits) bcachefs: set_btree_iter_dontneed also clears should_be_locked bcachefs: fix error path of __bch2_read_super() bcachefs: Check for backpointer bucket_offset >= bucket size bcachefs: bch_member.btree_allocated_bitmap bcachefs: sysfs internal/trigger_journal_flush bcachefs: Fix bch2_btree_node_fill() for !path bcachefs: add safety checks in bch2_btree_node_fill() bcachefs: Interior known are required to have known key types bcachefs: add missing bounds check in __bch2_bkey_val_invalid() bcachefs: Fix btree node merging on write buffer btrees bcachefs: Disable merges from interior update path bcachefs: Run merges at BCH_WATERMARK_btree bcachefs: Fix missing write refs in fs fio paths bcachefs: Fix deadlock in journal replay bcachefs: Go rw if running any explicit recovery passes bcachefs: Standardize helpers for printing enum strs with bounds checks bcachefs: don't queue btree nodes for rewrites during scan bcachefs: fix race in bch2_btree_node_evict() bcachefs: fix unsafety in bch2_stripe_to_text() bcachefs: fix unsafety in bch2_extent_ptr_to_text() ...
2024-04-15bcachefs: fix error path of __bch2_read_super()Chao Yu
In __bch2_read_super(), if kstrdup() fails, it needs to release memory in sb->holder, fix to call bch2_free_super() in the error path. Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-05Merge tag 'vfs-6.9-rc3.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains a few small fixes. This comes with some delay because I wanted to wait on people running their reproducers and the Easter Holidays meant that those replies came in a little later than usual: - Fix handling of preventing writes to mounted block devices. Since last kernel we allow to prevent writing to mounted block devices provided CONFIG_BLK_DEV_WRITE_MOUNTED isn't set and the block device is opened with restricted writes. When we switched to opening block devices as files we altered the mechanism by which we recognize when a block device has been opened with write restrictions. The detection logic assumed that only read-write mounted filesystems would apply write restrictions to their block devices from other openers. That of course is not true since it also makes sense to apply write restrictions for filesystems that are read-only. Fix the detection logic using an FMODE_* bit. We still have a few left since we freed up a couple a while ago. I also picked up a patch to free up four additional FMODE_* bits scheduled for the next merge window. - Fix counting the number of writers to a block device. This just changes the logic to be consistent. - Fix a bug in aio causing a NULL pointer derefernce after we implemented batched processing in aio. - Finally, add the changes we discussed that allows to yield block devices early even though file closing itself is deferred. This also allows us to remove two holder operations to get and release the holder to align lifetime of file and holder of the block device" * tag 'vfs-6.9-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: aio: Fix null ptr deref in aio_complete() wakeup fs,block: yield devices early block: count BLK_OPEN_RESTRICT_WRITES openers block: handle BLK_OPEN_RESTRICT_WRITES correctly
2024-04-03bcachefs: Flag btrees with missing dataKent Overstreet
We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Split out recovery_passes.cKent Overstreet
We've grown a fair amount of code for managing recovery passes; tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery. So it's worth splitting out into its own file, this code is pretty different from the code in recovery.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-27fs,block: yield devices earlyChristian Brauner
Currently a device is only really released once the umount returns to userspace due to how file closing works. That ultimately could cause an old umount assumption to be violated that concurrent umount and mount don't fail. So an exclusively held device with a temporary holder should be yielded before the filesystem is gone. Add a helper that allows callers to do that. This also allows us to remove the two holder ops that Linus wasn't excited about. Link: https://lore.kernel.org/r/20240326-vfs-bdev-end_holder-v1-1-20af85202918@kernel.org Fixes: f3a608827d1f ("bdev: open block device as files") # mainline only Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-19Merge tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Assorted bugfixes. Most are fixes for simple assertion pops; the most significant fix is for a deadlock in recovery when we have to rewrite large numbers of btree nodes to fix errors. This was incorrectly running out of the same workqueue as the core interior btree update path - we now give it its own single threaded workqueue. This was visible to users as "bch2_btree_update_start(): error: BCH_ERR_journal_reclaim_would_deadlock" - and then recovery hanging" * tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix lost wakeup on journal shutdown bcachefs; Fix deadlock in bch2_btree_update_start() bcachefs: ratelimit errors from async_btree_node_rewrite bcachefs: Run check_topology() first bcachefs: Improve bch2_fatal_error() bcachefs: Fix lost transaction restart error bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info bcachefs: fix for building in userspace bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recovery bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init() bcachefs: Improve sysfs internal/btree_updates bcachefs: Split out btree_node_rewrite_worker bcachefs: Fix locking in bch2_alloc_write_key() bcachefs: Avoid extent entry type assertions in .invalid() bcachefs: Fix spurious -BCH_ERR_transaction_restart_nested bcachefs: Fix check_key_has_snapshot() call bcachefs: Change "accounting overran journal reservation" to a warning
2024-03-18bcachefs: Improve bch2_fatal_error()Kent Overstreet
error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-15Merge tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs updates from Kent Overstreet: - Subvolume children btree; this is needed for providing a userspace interface for walking subvolumes, which will come later - Lots of improvements to directory structure checking - Improved journal pipelining, significantly improving performance on high iodepth write workloads - Discard path improvements: the discard path is more efficient, and no longer flushes the journal unnecessarily - Buffered write path can now avoid taking the inode lock - new mm helper: memalloc_flags_{save|restore} - mempool now does kvmalloc mempools * tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs: (128 commits) bcachefs: time_stats: shrink time_stat_buffer for better alignment bcachefs: time_stats: split stats-with-quantiles into a separate structure bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a diet bcachefs: time_stats: add larger units bcachefs: pull out time_stats.[ch] bcachefs: reconstruct_alloc cleanup bcachefs: fix bch_folio_sector padding bcachefs: Fix btree key cache coherency during replay bcachefs: Always flush write buffer in delete_dead_inodes() bcachefs: Fix order of gc_done passes bcachefs: fix deletion of indirect extents in btree_gc bcachefs: Prefer struct_size over open coded arithmetic bcachefs: Kill unused flags argument to btree_split() bcachefs: Check for writing superblocks with nonsense member seq fields bcachefs: fix bch2_journal_buf_to_text() lib/generic-radix-tree.c: Make nodes more reasonably sized bcachefs: copy_(to|from)_user_errcode() bcachefs: Split out bkey_types.h bcachefs: fix lost journal buf wakeup due to improved pipelining bcachefs: intercept mountoption value for bool type ...
2024-03-13bcachefs: Check for writing superblocks with nonsense member seq fieldsKent Overstreet
We're seeing some unmountable filesystems due to split brain detection going awry; it seems we somehow wrote out superblocks where we updated the superblock seq without updating any member seq fields. A given device's superblock should always have the main seq equal to it's member seq field, so this is easy to check for. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: remove redundant assignment to variable retColin Ian King
Variable ret is being assigned a value that is never read, it is being re-assigned a couple of statements later on. The assignment is redundant and can be removed. Cleans up clang scan build warning: fs/bcachefs/super-io.c:806:2: warning: Value stored to 'ret' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: bch2_print_opts()Kent Overstreet
Make sure early error messages get redirected, for kernel-fsck-from-userland. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-11Merge tag 'vfs-6.9.super' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull block handle updates from Christian Brauner: "Last cycle we changed opening of block devices, and opening a block device would return a bdev_handle. This allowed us to implement support for restricting and forbidding writes to mounted block devices. It was accompanied by converting and adding helpers to operate on bdev_handles instead of plain block devices. That was already a good step forward but ultimately it isn't necessary to have special purpose helpers for opening block devices internally that return a bdev_handle. Fundamentally, opening a block device internally should just be equivalent to opening files. So now all internal opens of block devices return files just as a userspace open would. Instead of introducing a separate indirection into bdev_open_by_*() via struct bdev_handle bdev_file_open_by_*() is made to just return a struct file. Opening and closing a block device just becomes equivalent to opening and closing a file. This all works well because internally we already have a pseudo fs for block devices and so opening block devices is simple. There's a few places where we needed to be careful such as during boot when the kernel is supposed to mount the rootfs directly without init doing it. Here we need to take care to ensure that we flush out any asynchronous file close. That's what we already do for opening, unpacking, and closing the initramfs. So nothing new here. The equivalence of opening and closing block devices to regular files is a win in and of itself. But it also has various other advantages. We can remove struct bdev_handle completely. Various low-level helpers are now private to the block layer. Other helpers were simply removable completely. A follow-up series that is already reviewed build on this and makes it possible to remove bdev->bd_inode and allows various clean ups of the buffer head code as well. All places where we stashed a bdev_handle now just stash a file and use simple accessors to get to the actual block device which was already the case for bdev_handle" * tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits) block: remove bdev_handle completely block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access bdev: remove bdev pointer from struct bdev_handle bdev: make struct bdev_handle private to the block layer bdev: make bdev_{release, open_by_dev}() private to block layer bdev: remove bdev_open_by_path() reiserfs: port block device access to file ocfs2: port block device access to file nfs: port block device access to files jfs: port block device access to file f2fs: port block device access to files ext4: port block device access to file erofs: port device access to file btrfs: port device access to file bcachefs: port block device access to file target: port block device access to file s390: port block device access to file nvme: port block device access to file block2mtd: port device access to files bcache: port block device access to files ...
2024-02-25bcachefs: port block device access to fileChristian Brauner
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-18-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-10bcachefs: fix kmemleak in __bch2_read_super error handling pathSu Yue
During xfstest tests, there are some kmemleak reports e.g. generic/051 with if USE_KMEMLEAK=yes: ==================================================================== EXPERIMENTAL kmemleak reported some memory leaks! Due to the way kmemleak works, the leak might be from an earlier test, or something totally unrelated. unreferenced object 0xffff9ef905aaf778 (size 8): comm "mount.bcachefs", pid 169844, jiffies 4295281209 (age 87.040s) hex dump (first 8 bytes): a5 cc cc cc cc cc cc cc ........ backtrace: [<ffffffff87fd9a43>] __kmem_cache_alloc_node+0x1f3/0x2c0 [<ffffffff87f49b66>] kmalloc_trace+0x26/0xb0 [<ffffffffc0a3fefe>] __bch2_read_super+0xfe/0x4e0 [bcachefs] [<ffffffffc0a3ad22>] bch2_fs_open+0x262/0x1710 [bcachefs] [<ffffffffc09c9e24>] bch2_mount+0x4c4/0x640 [bcachefs] [<ffffffff88080c90>] legacy_get_tree+0x30/0x60 [<ffffffff8802c748>] vfs_get_tree+0x28/0xf0 [<ffffffff88061fe5>] path_mount+0x475/0xb60 [<ffffffff880627e5>] __x64_sys_mount+0x105/0x140 [<ffffffff88932642>] do_syscall_64+0x42/0xf0 [<ffffffff88a000e6>] entry_SYSCALL_64_after_hwframe+0x6e/0x76 unreferenced object 0xffff9ef96cdc4fc0 (size 32): comm "mount.bcachefs", pid 169844, jiffies 4295281209 (age 87.040s) hex dump (first 32 bytes): 2f 64 65 76 2f 6d 61 70 70 65 72 2f 74 65 73 74 /dev/mapper/test 2d 31 00 cc cc cc cc cc cc cc cc cc cc cc cc cc -1.............. backtrace: [<ffffffff87fd9a43>] __kmem_cache_alloc_node+0x1f3/0x2c0 [<ffffffff87f4a081>] __kmalloc_node_track_caller+0x51/0x150 [<ffffffff87f3adc2>] kstrdup+0x32/0x60 [<ffffffffc0a3ff1a>] __bch2_read_super+0x11a/0x4e0 [bcachefs] [<ffffffffc0a3ad22>] bch2_fs_open+0x262/0x1710 [bcachefs] [<ffffffffc09c9e24>] bch2_mount+0x4c4/0x640 [bcachefs] [<ffffffff88080c90>] legacy_get_tree+0x30/0x60 [<ffffffff8802c748>] vfs_get_tree+0x28/0xf0 [<ffffffff88061fe5>] path_mount+0x475/0xb60 [<ffffffff880627e5>] __x64_sys_mount+0x105/0x140 [<ffffffff88932642>] do_syscall_64+0x42/0xf0 [<ffffffff88a000e6>] entry_SYSCALL_64_after_hwframe+0x6e/0x76 ==================================================================== The leak happens if bdev_open_by_path() failed to open a block device then it goes label 'out' directly without call of bch2_free_super(). Fix it by going to label 'err' instead of 'out' if bdev_open_by_path() fails. Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull more bcachefs updates from Kent Overstreet: "Some fixes, Some refactoring, some minor features: - Assorted prep work for disk space accounting rewrite - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this makes our trigger context more explicit - A few fixes to avoid excessive transaction restarts on multithreaded workloads: fstests (in addition to ktest tests) are now checking slowpath counters, and that's shaking out a few bugs - Assorted tracepoint improvements - Starting to break up bcachefs_format.h and move on disk types so they're with the code they belong to; this will make room to start documenting the on disk format better. - A few minor fixes" * tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits) bcachefs: Improve inode_to_text() bcachefs: logged_ops_format.h bcachefs: reflink_format.h bcachefs; extents_format.h bcachefs: ec_format.h bcachefs: subvolume_format.h bcachefs: snapshot_format.h bcachefs: alloc_background_format.h bcachefs: xattr_format.h bcachefs: dirent_format.h bcachefs: inode_format.h bcachefs; quota_format.h bcachefs: sb-counters_format.h bcachefs: counters.c -> sb-counters.c bcachefs: comment bch_subvolume bcachefs: bch_snapshot::btime bcachefs: add missing __GFP_NOWARN bcachefs: opts->compression can now also be applied in the background bcachefs: Prep work for variable size btree node buffers bcachefs: grab s_umount only if snapshotting ...
2024-01-21bcachefs: counters.c -> sb-counters.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Print size of superblock with space allocatedKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-10Merge tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs updates from Kent Overstreet: - btree write buffer rewrite: instead of adding keys to the btree write buffer at transaction commit time, we now journal them with a different journal entry type and copy them from the journal to the write buffer just prior to journal write. This reduces the number of atomic operations on shared cachelines in the transaction commit path and is a signicant performance improvement on some workloads: multithreaded 4k random writes went from ~650k iops to ~850k iops. - Bring back optimistic spinning for six locks: the new implementation doesn't use osq locks; instead we add to the lock waitlist as normal, and then spin on the lock_acquired bit in the waitlist entry, _not_ the lock itself. - New ioctls: - BCH_IOCTL_DEV_USAGE_V2, which allows for new data types - BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of fsck but without mounting: useful for transparently using the kernel version of fsck from 'bcachefs fsck' when the kernel version is a better match for the on disk filesystem. - BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported yet, but the passes that are supported are fully featured - errors may be corrected as normal. The new ioctls use the new 'thread_with_file' abstraction for kicking off a kthread that's tied to a file descriptor returned to userspace via the ioctl. - btree_paths within a btree_trans are now dynamically growable, instead of being limited to 64. This is important for the check_directory_structure phase of fsck, and also fixes some issues we were having with btree path overflow in the reflink btree. - Trigger refactoring; prep work for the upcoming disk space accounting rewrite - Numerous bugfixes :) * tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (226 commits) bcachefs: eytzinger0_find() search should be const bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent() bcachefs: fix simulateously upgrading & downgrading bcachefs: Restart recovery passes more reliably bcachefs: bch2_dump_bset() doesn't choke on u64s == 0 bcachefs: improve checksum error messages bcachefs: improve validate_bset_keys() bcachefs: print sb magic when relevant bcachefs: __bch2_sb_field_to_text() bcachefs: %pg is banished bcachefs: Improve would_deadlock trace event bcachefs: fsck_err()s don't need to manually check c->sb.version anymore bcachefs: Upgrades now specify errors to fix, like downgrades bcachefs: no thread_with_file in userspace bcachefs: Don't autofix errors we can't fix bcachefs: add missing bch2_latency_acct() call bcachefs: increase max_active on io_complete_wq bcachefs: add time_stats for btree_node_read_done() bcachefs: don't clear accessed bit in btree node fill bcachefs: Add an option to control btree node prefetching ...
2024-01-08Merge tag 'vfs-6.8.super' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs super updates from Christian Brauner: "This contains the super work for this cycle including the long-awaited series by Jan to make it possible to prevent writing to mounted block devices: - Writing to mounted devices is dangerous and can lead to filesystem corruption as well as crashes. Furthermore syzbot comes with more and more involved examples how to corrupt block device under a mounted filesystem leading to kernel crashes and reports we can do nothing about. Add tracking of writers to each block device and a kernel cmdline argument which controls whether other writeable opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are allowed. Note that this effectively only prevents modification of the particular block device's page cache by other writers. The actual device content can still be modified by other means - e.g. by issuing direct scsi commands, by doing writes through devices lower in the storage stack (e.g. in case loop devices, DM, or MD are involved) etc. But blocking direct modifications of the block device page cache is enough to give filesystems a chance to perform data validation when loading data from the underlying storage and thus prevent kernel crashes. Syzbot can use this cmdline argument option to avoid uninteresting crashes. Also users whose userspace setup does not need writing to mounted block devices can set this option for hardening. We expect that this will be interesting to quite a few workloads. Btrfs is currently opted out of this because they still haven't merged patches we require for this to work from three kernel releases ago. - Reimplement block device freezing and thawing as holder operations on the block device. This allows us to extend block device freezing to all devices associated with a superblock and not just the main device. It also allows us to remove get_active_super() and thus another function that scans the global list of superblocks. Freezing via additional block devices only works if the filesystem chooses to use @fs_holder_ops for these additional devices as well. That currently only includes ext4 and xfs. Earlier releases switched get_tree_bdev() and mount_bdev() to use @fs_holder_ops. The remaining nilfs2 open-coded version of mount_bdev() has been converted to rely on @fs_holder_ops as well. So block device freezing for the main block device will continue to work as before. There should be no regressions in functionality. The only special case is btrfs where block device freezing for the main block device never worked because sb->s_bdev isn't set. Block device freezing for btrfs can be fixed once they can switch to @fs_holder_ops but that can happen whenever they're ready" * tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) block: Fix a memory leak in bdev_open_by_dev() super: don't bother with WARN_ON_ONCE() super: massage wait event mechanism ext4: Block writes to journal device xfs: Block writes to log device fs: Block writes to mounted block devices btrfs: Do not restrict writes to btrfs devices block: Add config option to not allow writing to mounted devices block: Remove blkdev_get_by_*() functions bcachefs: Convert to bdev_open_by_path() fs: handle freezing from multiple devices fs: remove dead check nilfs2: simplify device handling fs: streamline thaw_super_locked ext4: simplify device handling xfs: simplify device handling fs: simplify setup_bdev_super() calls blkdev: comment fs_holder_ops porting: document block device freeze and thaw changes fs: remove unused helper ...
2024-01-05bcachefs: fix simulateously upgrading & downgradingKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: improve checksum error messagesKent Overstreet
new helpers: - bch2_csum_to_text() - bch2_csum_err_msg() standardize our checksum error messages a bit, and print out the checksums a bit more nicely. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: print sb magic when relevantKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: __bch2_sb_field_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: %pg is banishedKent Overstreet
not portable to userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Upgrades now specify errors to fix, like downgradesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>