2024-10-29bcachefs: fix possible null-ptr-deref in __bch2_ec_stripe_head_get()Gaosheng Cui
The function ec_new_stripe_head_alloc() returns nullptr if kzalloc() fails. It is crucial to verify its return value before dereferencing it to avoid a potential nullptr dereference. Fixes: 035d72f72c91 ("bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
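A minimal userspace sketch of the pattern this fix applies: check the allocator's return value before touching the result. The struct and function names below are hypothetical stand-ins, and calloc() plays the role of kzalloc(); this is not the bcachefs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a stripe head; not the bcachefs struct. */
struct stripe_head {
	unsigned disk_label;
	unsigned redundancy;
};

/* Returns NULL on allocation failure, like a kzalloc()-based helper. */
static struct stripe_head *stripe_head_alloc(unsigned disk_label,
					     unsigned redundancy)
{
	struct stripe_head *h = calloc(1, sizeof(*h));

	if (!h)
		return NULL;
	h->disk_label = disk_label;
	h->redundancy = redundancy;
	return h;
}

/* The caller checks for NULL before dereferencing and reports -ENOMEM. */
static int stripe_head_get(unsigned disk_label, unsigned redundancy,
			   struct stripe_head **out)
{
	struct stripe_head *h;

	assert(out);
	h = stripe_head_alloc(disk_label, redundancy);
	if (!h)
		return -ENOMEM;
	*out = h;
	return 0;
}
```

Returning an error code and passing the object back through an out parameter keeps the failure path explicit for every caller.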
2024-10-29bcachefs: Fix deadlock on -ENOSPC w.r.t. partial open bucketsKent Overstreet
Open buckets on the partial list should not count as allocated when we're trying to allocate from the partial list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-29bcachefs: Don't filter partial list buckets in open_buckets_to_text()Kent Overstreet
these are an important source of stranded buckets we need to be able to watch Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-29bcachefs: Don't keep tons of cached pointers aroundKent Overstreet
We had a bug report where the data update path was creating an extent that failed to validate because it had too many pointers; almost all of them were cached.
To fix this, we have:
- want_cached_ptr(), a new helper that checks if we even want a cached pointer (is on appropriate target, device is readable).
- bch2_extent_set_ptr_cached() now only sets a pointer cached if we want it.
- bch2_extent_normalize_by_opts() now ensures that we only have a single cached pointer that we want.
While working on this, it was noticed that this doesn't work well with reflinked data and per-file options. Another patch series is coming that plumbs through additional io path options through bch_extent_rebalance, with improved option handling.
Reported-by: Reed Riley <reed@riley.engineer> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-29bcachefs: init freespace inited bits to 0 in bch2_fs_initializePiotr Zalewski
Initialize the freespace_initialized bits to 0 in each member's flags and update the member's cached version for each device in bch2_fs_initialize(). The bits can be set to 1 before the filesystem is initialized; if the call to bch2_trans_mark_dev_sbs() (just before bch2_fs_freespace_init()) then fails, the bits remain 1, which can later indirectly trigger a BUG condition in bch2_bucket_alloc_freelist() during shutdown. Reported-by: syzbot+2b6a17991a6af64f9489@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2b6a17991a6af64f9489 Fixes: bbe682c76789 ("bcachefs: Ensure devices are always correctly initialized") Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-29bcachefs: Fix unhandled transaction restart in fallocateKent Overstreet
This used to not matter, but now we're being more strict. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-25bcachefs: Fix UAF in bch2_reconstruct_alloc()Kent Overstreet
write_super() -> sb_counters_from_cpu() may reallocate the superblock Reported-by: syzbot+9fc4dac4775d07bcfe34@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-25bcachefs: fix null-ptr-deref in have_stripes()Jeongjun Park
c->btree_roots_known[i].b can be NULL, in which case dereferencing it causes a NULL pointer dereference. Add a check before using it. Reported-by: syzbot+b468b9fef56949c3b528@syzkaller.appspotmail.com Fixes: 7773df19c35f ("bcachefs: metadata version bucket_stripe_sectors") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-24bcachefs: fix shift oob in alloc_lru_idx_fragmentationJeongjun Park
The size of a.data_type is set abnormally large, causing shift-out-of-bounds. To fix this, we need to add validation on a.data_type in alloc_lru_idx_fragmentation(). Reported-by: syzbot+7f45fa9805c40db3f108@syzkaller.appspotmail.com Fixes: 260af1562ec1 ("bcachefs: Kill alloc_v4.fragmentation_lru") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-24bcachefs: Fix invalid shift in validate_sb_layout()Gianfranco Trad
Add check on layout->sb_max_size_bits against BCH_SB_LAYOUT_SIZE_BITS_MAX to prevent UBSAN shift-out-of-bounds in validate_sb_layout(). Reported-by: syzbot+089fad5a3a5e77825426@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=089fad5a3a5e77825426 Fixes: 03ef80b469d5 ("bcachefs: Ignore unknown mount options") Tested-by: syzbot+089fad5a3a5e77825426@syzkaller.appspotmail.com Signed-off-by: Gianfranco Trad <gianf.trad@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
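The check described above can be sketched in userspace as follows: validate the exponent read from disk before using it as a shift count. The limit value and the helper name are assumptions for illustration; the real constant is BCH_SB_LAYOUT_SIZE_BITS_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed limit for illustration only. */
#define SIZE_BITS_MAX 16

/*
 * Shifting a 64-bit value by 64 or more is undefined behaviour in C,
 * so an untrusted exponent is rejected before the shift happens.
 */
static int max_sectors_from_bits(uint8_t size_bits, uint64_t *max_sectors)
{
	assert(max_sectors);

	if (size_bits > SIZE_BITS_MAX)
		return -EINVAL;

	*max_sectors = UINT64_C(1) << size_bits;
	return 0;
}
```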
2024-10-20bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode pathKent Overstreet
This fixes a fsck bug on a very old filesystem (pre mainline merge). Fixes: 72350ee0ea22 ("bcachefs: Kill snapshot arg to fsck_write_inode()") Reported-by: Marcin Mirosław <marcin@mejor.pl> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-20bcachefs: Mark more errors as AUTOFIXKent Overstreet
Reported-by: Marcin Mirosław <marcin@mejor.pl> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-20bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocationsKent Overstreet
kvmalloc() doesn't support allocations > INT_MAX, but vmalloc() does - the limit should be lifted, but we can work around this for now. A user with a 75 TB filesystem reported the following journal replay error: https://github.com/koverstreet/bcachefs/issues/769 In journal replay we have to sort and dedup all the keys from the journal, which means we need a large contiguous allocation. Given that the user has 128GB of ram, the 2GB limit on allocation size has become far too small. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-20bcachefs: Don't use wait_event_interruptible() in recoveryKent Overstreet
Fix a bug where mount was failing with -ERESTARTSYS: https://github.com/koverstreet/bcachefs/issues/741 We only want the interruptible wait when called from fsync. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-20bcachefs: Fix __bch2_fsck_err() warningKent Overstreet
We only warn about having a btree_trans that wasn't passed in if we'll be prompting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: fsck: Improve hash_check_key()Kent Overstreet
hash_check_key() checks and repairs the hash table btrees: dirents and xattrs are open addressing hash tables. We recently had a corruption reported where the hash type on an inode somehow got flipped, which made the existing dirents invisible and allowed new ones to be created with the same name. Now, hash_check_key() can repair duplicates: it will delete one of them, if it has an xattr or dangling dirent, but if it has two valid dirents one of them gets renamed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
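To illustrate the failure mode described above, here is a toy in-memory open addressing table with linear probing. It is only a sketch: the table, both hash functions, and all helper names are hypothetical, and bcachefs keeps these tables in btrees rather than arrays. It shows how switching the hash function hides an existing entry and lets a second entry with the same name be inserted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NSLOTS 8

struct table {
	const char *slot[NSLOTS];	/* NULL means empty */
};

/* Two toy hash functions standing in for two different hash types. */
static unsigned hash_a(const char *s)
{
	unsigned h = 0;

	while (*s)
		h = h * 31 + (unsigned char)*s++;
	return h % NSLOTS;
}

static unsigned hash_b(const char *s)
{
	unsigned h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % NSLOTS;
}

/* Linear probing: start at the hashed slot, stop at the first hole. */
static bool lookup(const struct table *t, unsigned (*hash)(const char *),
		   const char *name)
{
	unsigned i, start;

	assert(t && hash && name);
	start = hash(name);
	for (i = 0; i < NSLOTS; i++) {
		const char *e = t->slot[(start + i) % NSLOTS];

		if (!e)
			return false;
		if (!strcmp(e, name))
			return true;
	}
	return false;
}

/* Insert unless a lookup with the same hash already finds the name. */
static bool insert(struct table *t, unsigned (*hash)(const char *),
		   const char *name)
{
	unsigned i, start = hash(name);

	if (lookup(t, hash, name))
		return false;
	for (i = 0; i < NSLOTS; i++) {
		unsigned idx = (start + i) % NSLOTS;

		if (!t->slot[idx]) {
			t->slot[idx] = name;
			return true;
		}
	}
	return false;	/* table full */
}

/* Count every slot holding the name, regardless of hash function. */
static unsigned count_copies(const struct table *t, const char *name)
{
	unsigned i, n = 0;

	for (i = 0; i < NSLOTS; i++)
		if (t->slot[i] && !strcmp(t->slot[i], name))
			n++;
	return n;
}
```

Inserting "foo" with hash_a and then again with hash_b leaves two copies in the table, which is the kind of duplicate a repair pass has to detect by scanning rather than by lookup.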
2024-10-18bcachefs: bch2_hash_set_or_get_in_snapshot()Kent Overstreet
Add a variant of bch2_hash_set_in_snapshot() that returns the existing key on -EEXIST. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: Repair mismatches in inode hash seed, typeKent Overstreet
Different versions of the same inode (same inode number, different snapshot ID) must have the same hash seed and type - lookups require this, since they see keys from different snapshots simultaneously. To repair we only need to make the inodes consistent, hash_check_key() will do the rest. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: Add hash seed, type to inode_to_text()Kent Overstreet
This helped with discovering some filesystem corruption fsck was having trouble with: the str_hash type had gotten flipped on one snapshot's version of an inode. All versions of a given inode number have the same hash seed and hash type, since lookups will be done with a single hash/seed and type and see dirents/xattrs from multiple snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: INODE_STR_HASH() for bch_inode_unpackedKent Overstreet
Trivial cleanup - add a normal BITMASK() helper for bch_inode_unpacked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: Run in-kernel offline fsck without ratelimit errorsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: skip mount option handle for empty string.Hongbo Li
Option parsing in get_tree splits the options buffer with strsep(), which yields an empty string as the last token. Since commit ea0eeb89b1d5 ("bcachefs: reject unknown mount options") was merged, unknown mount options are not allowed, and the empty string counts as one, which causes an error. This can be reproduced with the following steps:
bcachefs format /dev/loop
mount -t bcachefs -o metadata_target=loop1 /dev/loop1 /mnt/bcachefs/
Fixes: ea0eeb89b1d5 ("bcachefs: reject unknown mount options") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
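A small userspace sketch of the strsep() behaviour behind this fix, assuming a glibc or BSD libc that provides strsep(). The helper name is hypothetical; it only counts options, where the kernel would parse each one.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* exposes strsep() on glibc */
#endif
#include <assert.h>
#include <string.h>

/* Count non-empty comma-separated options; modifies buf in place. */
static int count_options(char *buf)
{
	char *opt;
	int n = 0;

	assert(buf);
	while ((opt = strsep(&buf, ",")) != NULL) {
		/*
		 * strsep() returns "" for an empty buffer and after a
		 * trailing or doubled comma; skip it instead of treating
		 * it as an unknown option.
		 */
		if (!*opt)
			continue;
		n++;
	}
	return n;
}
```

Without the empty-string check, an input such as "metadata_target=loop1," would produce a second, empty token for the parser to reject.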
2024-10-18bcachefs: fix incorrect show_options resultsHongbo Li
When show_options is called in bcachefs, the options buffer is appended to the seq variable. In fact, it requires an additional comma to be appended first. This will affect the remount process when reading existing mount options. Fixes: 9305cf91d05e ("bcachefs: bch2_opts_to_text()") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
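The separator issue can be sketched in userspace like this. The helper is hypothetical and writes into a plain char buffer, whereas the kernel code writes into a seq_file; the point is only that the comma goes in before the appended options.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append an option string to an existing comma-separated list.
 * Returns 0 on success, -1 if the result would not fit in buf.
 */
static int append_options(char *buf, size_t size, const char *opts)
{
	size_t len;
	int n;

	assert(buf && opts);
	len = strlen(buf);
	assert(len < size);

	if (!*opts)
		return 0;	/* nothing to add, so no separator either */

	/* Emit the separator first when the list already has entries. */
	n = snprintf(buf + len, size - len, "%s%s", len ? "," : "", opts);
	if (n < 0 || (size_t)n >= size - len)
		return -1;
	return 0;
}
```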
2024-10-18bcachefs: Fix data corruption on -ENOSPC in buffered write pathKent Overstreet
Found by generic/299: When we have to truncate a write due to -ENOSPC, we may have to read in the folio we're writing to if we're now no longer doing a complete write to a !uptodate folio. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: bch2_folio_reservation_get_partial() is now better behavedKent Overstreet
bch2_folio_reservation_get_partial(), on partial success, will now return a reservation that's aligned to the filesystem blocksize. This is a partial fix for fstests generic/299 - fio verify is badly behaved in the presence of short writes that aren't aligned to its blocksize. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
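As a rough illustration of the alignment described above, the sketch below rounds a partially satisfied reservation down to a whole number of blocks. The helper name is hypothetical, and the block size is assumed to be a power of two; this is not the bcachefs implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round the number of bytes we managed to reserve down to a multiple of
 * the block size, so that a short write never ends in the middle of a
 * block. block_bytes must be a nonzero power of two.
 */
static uint64_t reservation_round_down(uint64_t bytes_reserved,
				       uint32_t block_bytes)
{
	assert(block_bytes && !(block_bytes & (block_bytes - 1)));

	return bytes_reserved & ~((uint64_t)block_bytes - 1);
}
```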
2024-10-18bcachefs: fix disk reservation accounting in bch2_folio_reservation_get()Kent Overstreet
bch2_disk_reservation_put() zeroes out the reservation - oops. This fixes a disk reservation leak when getting a quota reservation returned an error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: ec: fix data type on stripe deletionKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: Don't use commit_do() unnecessarilyKent Overstreet
Using commit_do() to call alloc_sectors_start_trans() breaks when we're randomly injecting transaction restarts - the restart in the commit causes us to leak the lock that alloc_sectors_start_trans() takes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: handle restarts in bch2_bucket_io_time_reset()Kent Overstreet
bch2_bucket_io_time_reset() doesn't need to succeed, which is why it didn't previously retry on transaction restart - but we're now treating these as errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
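The general shape of handling a transaction restart is a retry loop: begin the transaction again and rerun the operation for as long as it reports a restart. The userspace sketch below simulates that with a counter. The error code, struct, and function names are all hypothetical, and bcachefs has its own helpers for this.

```c
#include <assert.h>

/* Hypothetical error code standing in for a transaction restart. */
#define ERR_TRANSACTION_RESTART 2048

/* Simulated transaction: fails with a restart a fixed number of times. */
struct fake_trans {
	int restarts_left;
	int begins;
};

static void fake_trans_begin(struct fake_trans *t)
{
	t->begins++;
}

static int fake_update(struct fake_trans *t)
{
	if (t->restarts_left > 0) {
		t->restarts_left--;
		return -ERR_TRANSACTION_RESTART;
	}
	return 0;
}

/* Retry until the operation returns something other than a restart. */
static int update_with_retry(struct fake_trans *t)
{
	int ret;

	assert(t);
	do {
		fake_trans_begin(t);
		ret = fake_update(t);
	} while (ret == -ERR_TRANSACTION_RESTART);

	return ret;
}
```

With this structure a restart never escapes to the caller, which matters once restarts are treated as errors everywhere else.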
2024-10-18bcachefs: fix restart handling in __bch2_resume_logged_op_finsert()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: fix restart handling in bch2_alloc_write_key()Kent Overstreet
This is ugly: We may discover in alloc_write_key that the data type we calculated is wrong, because BCH_DATA_need_discard is checked/set elsewhere, and the disk accounting counters we calculated need to be updated. But bch2_alloc_key_to_dev_counters(..., BTREE_TRIGGER_gc) is not safe w.r.t. transaction restarts, so we need to propagate the fixup back to our gc state in case we take a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: fix restart handling in bch2_do_invalidates_work()Kent Overstreet
this one is fairly harmless since the invalidate worker will just run again later if it needs to, but still worth fixing Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: fix missing restart handling in bch2_read_retry_nodecode()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: fix restart handling in bch2_fiemap()Kent Overstreet
We were leaking transaction restart errors to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: fix bch2_hash_delete() error pathKent Overstreet
we were exiting an iterator that hadn't been initialized Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: fix restart handling in bch2_rename2()Kent Overstreet
This should be impossible to hit in practice; the first lookup within a transaction won't return a restart due to lock ordering, but we're adding fault injection for transaction restarts and shaking out bugs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-14bcachefs: Fix sysfs warning in fstests generic/730,731Kent Overstreet
sysfs warns if we're removing a symlink from a directory that's no longer in sysfs; this is triggered by fstests generic/730, which simulates hot removal of a block device. This patch is however not a correct fix, since checking kobj->state_in_sysfs on a kobj owned by another subsystem is racy. A better fix would be to add the appropriate check to sysfs_remove_link() - and sysfs_create_link() as well. But kobject_add_internal()/kobject_del() do not as of today have locking that would support that. Note that the block/holder.c code appears to be subject to this race as well. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-13bcachefs: Handle race between stripe reuse, invalidate_stripe_to_devKent Overstreet
When creating a new stripe, we may reuse an existing stripe that has some empty and some nonempty blocks. Generally, the existing stripe won't change underneath us - except for block sector counts, which we copy to the new key in ec_stripe_key_update. But the device removal path can now invalidate stripe pointers to a device, and that can race with stripe reuse. Change ec_stripe_key_update() to check for and resolve this inconsistency. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-13bcachefs: Fix kasan splat in new_stripe_alloc_buckets()Kent Overstreet
Update for BCH_SB_MEMBER_INVALID. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-13bcachefs: Add missing validation for bch_stripe.csum_granularity_bitsKent Overstreet
Reported-by: syzbot+f8c98a50c323635be65d@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-13bcachefs: Fix missing bounds checks in bch2_alloc_read()Kent Overstreet
We were checking that the alloc key was for a valid device, but not a valid bucket. This is the upgrade path from versions prior to bcachefs being mainlined. Reported-by: syzbot+a1b59c8e1a3f022fd301@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
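A sketch of the two-level check described above: a key read from disk must name both a device that exists and a bucket inside that device's range. The struct and helper are hypothetical and simplified; they are not the bcachefs definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device geometry. */
struct dev_geom {
	uint64_t first_bucket;	/* first usable bucket */
	uint64_t nbuckets;	/* total number of buckets on the device */
};

/* Validate the device index first, then the bucket within that device. */
static bool alloc_pos_valid(const struct dev_geom *devs, unsigned nr_devs,
			    unsigned dev, uint64_t bucket)
{
	assert(devs);

	if (dev >= nr_devs)
		return false;

	return bucket >= devs[dev].first_bucket &&
	       bucket <  devs[dev].nbuckets;
}
```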
2024-10-13bcachefs: fix uaf in bch2_dio_write_done()Kent Overstreet
Reported-by: syzbot+19ad84d5133871207377@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-12bcachefs: Improve check_snapshot_exists()Kent Overstreet
Check if we have snapshot_trees or subvolumes that refer to the snapshot node being reconstructed, and use them. With this, the kill_btree_root test that blows away the snapshots btree now passes, and we're able to successfully reconstruct. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-12bcachefs: Fix bkey_nocow_lock()Kent Overstreet
This fixes an assertion pop in nocow_locking.c
00243 kernel BUG at fs/bcachefs/nocow_locking.c:41!
00243 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
00243 Modules linked in:
00243 Hardware name: linux,dummy-virt (DT)
00243 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00244 pc : bch2_bucket_nocow_unlock (/home/testdashboard/linux-7/fs/bcachefs/nocow_locking.c:41)
00244 lr : bkey_nocow_lock (/home/testdashboard/linux-7/fs/bcachefs/data_update.c:79)
00244 sp : ffffff80c82373b0
00244 x29: ffffff80c82373b0 x28: ffffff80e08958c0 x27: ffffff80e0880000
00244 x26: ffffff80c8237a98 x25: 00000000000000a0 x24: ffffff80c8237ab0
00244 x23: 00000000000000c0 x22: 0000000000000008 x21: 0000000000000000
00244 x20: ffffff80c8237a98 x19: 0000000000000018 x18: 0000000000000000
00244 x17: 0000000000000000 x16: 000000000000003f x15: 0000000000000000
00244 x14: 0000000000000008 x13: 0000000000000018 x12: 0000000000000000
00244 x11: 0000000000000000 x10: ffffff80e0880000 x9 : ffffffc0803ac1a4
00244 x8 : 0000000000000018 x7 : ffffff80c8237a88 x6 : ffffff80c8237ab0
00244 x5 : ffffff80e08988d0 x4 : 00000000ffffffff x3 : 0000000000000000
00244 x2 : 0000000000000004 x1 : 0003000000000d1e x0 : ffffff80e08988c0
00244 Call trace:
00244 bch2_bucket_nocow_unlock (/home/testdashboard/linux-7/fs/bcachefs/nocow_locking.c:41)
00245 bch2_data_update_init (/home/testdashboard/linux-7/fs/bcachefs/data_update.c:627 (discriminator 1))
00245 promote_alloc.isra.0 (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:242 /home/testdashboard/linux-7/fs/bcachefs/io_read.c:304)
00245 __bch2_read_extent (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:949)
00246 __bch2_read (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:1215)
00246 bch2_direct_IO_read (/home/testdashboard/linux-7/fs/bcachefs/fs-io-direct.c:132)
00246 bch2_read_iter (/home/testdashboard/linux-7/fs/bcachefs/fs-io-direct.c:201)
00247 aio_read.constprop.0 (/home/testdashboard/linux-7/fs/aio.c:1602)
00247 io_submit_one.constprop.0 (/home/testdashboard/linux-7/fs/aio.c:2003 /home/testdashboard/linux-7/fs/aio.c:2052)
00248 __arm64_sys_io_submit (/home/testdashboard/linux-7/fs/aio.c:2111 /home/testdashboard/linux-7/fs/aio.c:2081 /home/testdashboard/linux-7/fs/aio.c:2081)
00248 invoke_syscall.constprop.0 (/home/testdashboard/linux-7/arch/arm64/include/asm/syscall.h:61 /home/testdashboard/linux-7/arch/arm64/kernel/syscall.c:54)
00248 ========= FAILED TIMEOUT tiering_variable_buckets_replicas in 1200s
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-12bcachefs: Fix accounting replay flagsKent Overstreet
BCH_TRANS_COMMIT_journal_reclaim without BCH_WATERMARK_reclaim means "return an error if low on journal space" - but accounting replay must succeed. Fixes https://github.com/koverstreet/bcachefs/issues/656 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-12bcachefs: Fix invalid shift in member_to_text()Kent Overstreet
Reported-by: syzbot+064ce437a1ad63d3f6ef@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-11bcachefs: Fix bch2_have_enough_devs() for BCH_SB_MEMBER_INVALIDKent Overstreet
This fixes a kasan splat in the ec device removal tests. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-09bcachefs: __wait_for_freeing_inode: Switch to wait_bit_queue_entryKent Overstreet
inode_bit_waitqueue() is changing - this update clears the way for sched changes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-09bcachefs: Check if stuck in journal_res_get()Kent Overstreet
Like how we already do when the allocator seems to be stuck, check if we're waiting too long for a journal reservation and print some debug info. This is specifically to track down https://github.com/koverstreet/bcachefs/issues/656 which is showing up in userspace where we don't have sysfs/debugfs to get the journal debug info. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-09closures: Add closure_wait_event_timeout()Kent Overstreet
Add a closure version of wait_event_timeout(), with the same semantics. The closure version is useful because unlike wait_event(), it allows blocking code to run in the conditional expression. Cc: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>