summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-05-09ext4: fix uninitialized ratelimit_state->lock access in __ext4_fill_super()Baokun Li
In the following concurrency we will access the uninitialized rs->lock: ext4_fill_super ext4_register_sysfs // sysfs registered msg_ratelimit_interval_ms // Other processes modify rs->interval to // non-zero via msg_ratelimit_interval_ms ext4_orphan_cleanup ext4_msg(sb, KERN_INFO, "Errors on filesystem, " __ext4_msg ___ratelimit(&(EXT4_SB(sb)->s_msg_ratelimit_state) if (!rs->interval) // do nothing if interval is 0 return 1; raw_spin_trylock_irqsave(&rs->lock, flags) raw_spin_trylock(lock) _raw_spin_trylock __raw_spin_trylock spin_acquire(&lock->dep_map, 0, 1, _RET_IP_) lock_acquire __lock_acquire register_lock_class assign_lock_key dump_stack(); ratelimit_state_init(&sbi->s_msg_ratelimit_state, 5 * HZ, 10); raw_spin_lock_init(&rs->lock); // init rs->lock here and get the following dump_stack: ========================================================= INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 12 PID: 753 Comm: mount Tainted: G E 6.7.0-rc6-next-20231222 #504 [...] Call Trace: dump_stack_lvl+0xc5/0x170 dump_stack+0x18/0x30 register_lock_class+0x740/0x7c0 __lock_acquire+0x69/0x13a0 lock_acquire+0x120/0x450 _raw_spin_trylock+0x98/0xd0 ___ratelimit+0xf6/0x220 __ext4_msg+0x7f/0x160 [ext4] ext4_orphan_cleanup+0x665/0x740 [ext4] __ext4_fill_super+0x21ea/0x2b10 [ext4] ext4_fill_super+0x14d/0x360 [ext4] [...] ========================================================= Normally interval is 0 until s_msg_ratelimit_state is initialized, so ___ratelimit() does nothing. But registering sysfs precedes initializing rs->lock, so it is possible to change rs->interval to a non-zero value via the msg_ratelimit_interval_ms interface of sysfs while rs->lock is uninitialized, and then a call to ext4_msg triggers the problem by accessing an uninitialized rs->lock. Therefore register sysfs after all initializations are complete to avoid such problems. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240102133730.1098120-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-09NFSD: Force all NFSv4.2 COPY requests to be synchronousChuck Lever
We've discovered that delivering a CB_OFFLOAD operation can be unreliable in some pretty unremarkable situations. Examples include: - The server dropped the connection because it lost a forechannel NFSv4 request and wishes to force the client to retransmit - The GSS sequence number window under-flowed - A network partition occurred When that happens, all pending callback operations, including CB_OFFLOAD, are lost. NFSD does not retransmit them. Moreover, the Linux NFS client does not yet support sending an OFFLOAD_STATUS operation to probe whether an asynchronous COPY operation has finished. Thus, on Linux NFS clients, when a CB_OFFLOAD is lost, asynchronous COPY can hang until manually interrupted. I've tried a couple of remedies, but so far the side-effects are worse than the disease and they have had to be reverted. So temporarily force COPY operations to be synchronous so that the use of CB_OFFLOAD is avoided entirely. This is a fix that can easily be backported to LTS kernels. I am working on client patches that introduce an implementation of OFFLOAD_STATUS. Note that NFSD arbitrarily limits the size of a copy_file_range to 4MB to avoid indefinitely blocking an nfsd thread. A short COPY result is returned in that case, and the client can present a fresh COPY request for the remainder. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-09ext4: remove calls to to set/clear the folio error flagMatthew Wilcox (Oracle)
Nobody checks this flag on ext4 folios, stop setting and clearing it. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: linux-ext4@vger.kernel.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240420025029.2166544-11-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-09f2fs: check validation of fault attrs in f2fs_build_fault_attr()Chao Yu
- It missed to check validation of fault attrs in parse_options(), let's fix to add check condition in f2fs_build_fault_attr(). - Use f2fs_build_fault_attr() in __sbi_store() to clean up code. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-05-09f2fs: fix to limit gc_pin_file_thresholdChao Yu
type of f2fs_inode.i_gc_failures, f2fs_inode_info.i_gc_failures, and f2fs_sb_info.gc_pin_file_threshold is __le16, unsigned int, and u64, so it will cause truncation during comparison and persistence. Unifying variable of these three variables to unsigned short, and add an upper boundary limitation for gc_pin_file_threshold. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-05-09f2fs: remove unused GC_FAILURE_PINChao Yu
After commit 3db1de0e582c ("f2fs: change the current atomic write way"), we removed all GC_FAILURE_ATOMIC usage, let's change i_gc_failures[] array to i_pin_failure for cleanup. Meanwhile, let's define i_current_depth and i_gc_failures as union variable due to they won't be valid at the same time. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-05-09f2fs: use f2fs_{err,info}_ratelimited() for cleanupChao Yu
Commit b1c9d3f833ba ("f2fs: support printk_ratelimited() in f2fs_printk()") missed some cases, cover all remains for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-05-09f2fs: fix block migration when section is not aligned to pow2Wu Bo
As for zoned-UFS, f2fs section size is forced to zone size. And zone size may not aligned to pow2. Fixes: 859fca6b706e ("f2fs: swap: support migrating swapfile in aligned write mode") Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Wu Bo <bo.wu@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-05-09erofs: Zstandard compression supportGao Xiang
Add Zstandard compression as the 4th supported algorithm since it becomes more popular now and some end users have asked this for quite a while [1][2]. Each EROFS physical cluster contains only one valid standard Zstandard frame as described in [3] so that decompression can be performed on a per-pcluster basis independently. Currently, it just leverages multi-call stream decompression APIs with internal sliding window buffers. One-shot or bufferless decompression could be implemented later for even better performance if needed. [1] https://github.com/erofs/erofs-utils/issues/6 [2] https://lore.kernel.org/r/Y08h+z6CZdnS1XBm@B-P7TQMD6M-0146.lan [3] https://www.rfc-editor.org/rfc/rfc8478.txt Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240508234453.17896-1-xiang@kernel.org
2024-05-08bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.hPetr Vorel
Move BCACHEFS_STATFS_MAGIC value to UAPI <linux/magic.h> under BCACHEFS_SUPER_MAGIC definition (use common approach for name) and reuse the definition in bcachefs_format.h BCACHEFS_STATFS_MAGIC. There are other bcachefs magic definitions: BCACHE_MAGIC, BCHFS_MAGIC, which use UUID_INIT() and are used only in libbcachefs. Therefore move only BCACHEFS_STATFS_MAGIC value, which can be used outside of libbcachefs for f_type field in struct statfs in statfs() or fstatfs(). Suggested-by: Su Yue <glass.su@suse.com> Signed-off-by: Petr Vorel <pvorel@suse.cz> Acked-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Improve sysfs internal/btree_cacheKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Allocator prefers not to expand mi.btree_allocated bitmapKent Overstreet
We now have a small bitmap in the member info section of the superblock for "regions that have btree nodes", so that if we ever have to scan for btree nodes in repair we don't have to scan the whole device(s). This tweaks the allocator to prefer allocating from regions that are already marked in this bitmap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Better bucket alloc tracepointsKent Overstreet
Tracepoints are garbage, and perf trace even cuts off some of our fields. Much nicer to just trace a string, and then we can build nicely formatted output with printbufs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Move nocow unlock to bch2_write_endio()Kent Overstreet
This fixes a lifetime issue; bch2_nocow_write_unlock() uses PTR_BUCKET_POS(), which needs the device - but we drop our ref to the device in bch2_write_endio(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in journal_ptrs_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in discard_one_bucket_fast()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in check_alloc_info()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_dev_have_ref()Kent Overstreet
bch2_dev_bkey_exists() is going away; bch2_dev_have_ref() documents that we're looking up a device without checking if it's present because we have a reference to it already. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in data_update_init()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in bkey_pick_read_device()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: pass bch_dev to read_from_stale_dirty_pointer()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_dev_bucket_exists() uses bch2_dev_rcu()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in btree_gc.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_extent_normalize() -> bch2_dev_rcu()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_bkey_has_target() -> bch2_dev_rcu()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: extent_ptr_invalid() -> bch2_dev_rcu()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: ptr_stale() -> dev_ptr_stale()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: extent_ptr_durability() -> bch2_dev_rcu()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_extent_merge() -> bch2_dev_rcu()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: ec_validate_checksums() -> bch2_dev_tryget()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: ob_dev()Kent Overstreet
Wrapper around bch2_dev_have_ref() for open_buckets; we do guarantee that the device an open_bucket points to exists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: move replica_set from bch_dev to bch_fsKent Overstreet
This is needed for the next patch - the write submit path has to be able to allocate a replica bio even when we weren't able to get a ref on the device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Kill bch2_dev_bkey_exists() in backpointer codeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: PTR_BUCKET_POS() now takes bch_devKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_dev_iterate()Kent Overstreet
New helper for getting refs to devices as we iterate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_evacuate_bucket() -> bch2_dev_tryget()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_bucket_ref_update() now takes bch_devKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_trigger_alloc() -> bch2_dev_tryget()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_check_alloc_key() -> bch2_dev_tryget_noerror()Kent Overstreet
More elimination of bch2_dev_bkey_exists() usage. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Convert to bch2_dev_tryget_noerror()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_dev_tryget()Kent Overstreet
Most uses of bch2_dev_bkey_exists() are going away, where we assume that because a key references a device the device most exist - instead, we'll be explicitly checking if the device exists and getting a reference to it. This adds the new helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_have_enough_devs() checks for nonexistent deviceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: journal_replay_entry_early() checks for nonexistent deviceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_dev_btree_bitmap_marked() -> bch2_dev_rcu()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Pass device to bch2_bucket_do_index()Kent Overstreet
Eliminating bch2_dev_bkey_exists() uses and replacing them with proper checks; this one was unnecessary since the caller already has it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Pass device to bch2_alloc_write_key()Kent Overstreet
More elimating bch2_dev_bkey_exists() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_dev_safe() -> bch2_dev_rcu()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Debug asserts for ca->refKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: New helpers for device refcountsKent Overstreet
This will be used in the next patch for adding some new debug mode asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_print_allocator_stuck()Kent Overstreet
If we block on the allocator for more than 10 seconds, print out some useful debugging info. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>