summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-02-29btrfs: fix race between ordered extent completion and fiemapFilipe Manana
For fiemap we recently stopped locking the target extent range for the whole duration of the fiemap call, in order to avoid a deadlock in a scenario where the fiemap buffer happens to be a memory mapped range of the same file. This use case is very unlikely to be useful in practice but it may be triggered by fuzz testing (syzbot, etc). However by not locking the target extent range for the whole duration of the fiemap call we can race with an ordered extent. This happens like this: 1) The fiemap task finishes processing a file extent item that covers the file range [512K, 1M[, and that file extent item is the last item in the leaf currently being processed; 2) And ordered extent for the file range [768K, 2M[, in COW mode, completes (btrfs_finish_one_ordered()) and the file extent item covering the range [512K, 1M[ is trimmed to cover the range [512K, 768K[ and then a new file extent item for the range [768K, 2M[ is inserted in the inode's subvolume tree; 3) The fiemap task calls fiemap_next_leaf_item(), which then calls btrfs_next_leaf() to find the next leaf / item. This finds that the the next key following the one we previously processed (its type is BTRFS_EXTENT_DATA_KEY and its offset is 512K), is the key corresponding to the new file extent item inserted by the ordered extent, which has a type of BTRFS_EXTENT_DATA_KEY and an offset of 768K; 4) Later the fiemap code ends up at emit_fiemap_extent() and triggers the warning: if (cache->offset + cache->len > offset) { WARN_ON(1); return -EINVAL; } Since we get 1M > 768K, because the previously emitted entry for the old extent covering the file range [512K, 1M[ ends at an offset that is greater than the new extent's start offset (768K). This makes fiemap fail with -EINVAL besides triggering the warning that produces a stack trace like the following: [1621.677651] ------------[ cut here ]------------ [1621.677656] WARNING: CPU: 1 PID: 204366 at fs/btrfs/extent_io.c:2492 emit_fiemap_extent+0x84/0x90 [btrfs] [1621.677899] Modules linked in: btrfs blake2b_generic (...) [1621.677951] CPU: 1 PID: 204366 Comm: pool Not tainted 6.8.0-rc5-btrfs-next-151+ #1 [1621.677954] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [1621.677956] RIP: 0010:emit_fiemap_extent+0x84/0x90 [btrfs] [1621.678033] Code: 2b 4c 89 63 (...) [1621.678035] RSP: 0018:ffffab16089ffd20 EFLAGS: 00010206 [1621.678037] RAX: 00000000004fa000 RBX: ffffab16089ffe08 RCX: 0000000000009000 [1621.678039] RDX: 00000000004f9000 RSI: 00000000004f1000 RDI: ffffab16089ffe90 [1621.678040] RBP: 00000000004f9000 R08: 0000000000001000 R09: 0000000000000000 [1621.678041] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000041d78000 [1621.678043] R13: 0000000000001000 R14: 0000000000000000 R15: ffff9434f0b17850 [1621.678044] FS: 00007fa6e20006c0(0000) GS:ffff943bdfa40000(0000) knlGS:0000000000000000 [1621.678046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1621.678048] CR2: 00007fa6b0801000 CR3: 000000012d404002 CR4: 0000000000370ef0 [1621.678053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1621.678055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1621.678056] Call Trace: [1621.678074] <TASK> [1621.678076] ? __warn+0x80/0x130 [1621.678082] ? emit_fiemap_extent+0x84/0x90 [btrfs] [1621.678159] ? report_bug+0x1f4/0x200 [1621.678164] ? handle_bug+0x42/0x70 [1621.678167] ? exc_invalid_op+0x14/0x70 [1621.678170] ? asm_exc_invalid_op+0x16/0x20 [1621.678178] ? emit_fiemap_extent+0x84/0x90 [btrfs] [1621.678253] extent_fiemap+0x766/0xa30 [btrfs] [1621.678339] btrfs_fiemap+0x45/0x80 [btrfs] [1621.678420] do_vfs_ioctl+0x1e4/0x870 [1621.678431] __x64_sys_ioctl+0x6a/0xc0 [1621.678434] do_syscall_64+0x52/0x120 [1621.678445] entry_SYSCALL_64_after_hwframe+0x6e/0x76 There's also another case where before calling btrfs_next_leaf() we are processing a hole or a prealloc extent and we had several delalloc ranges within that hole or prealloc extent. In that case if the ordered extents complete before we find the next key, we may end up finding an extent item with an offset smaller than (or equals to) the offset in cache->offset. So fix this by changing emit_fiemap_extent() to address these three scenarios like this: 1) For the first case, steps listed above, adjust the length of the previously cached extent so that it does not overlap with the current extent, emit the previous one and cache the current file extent item; 2) For the second case where he had a hole or prealloc extent with multiple delalloc ranges inside the hole or prealloc extent's range, and the current file extent item has an offset that matches the offset in the fiemap cache, just discard what we have in the fiemap cache and assign the current file extent item to the cache, since it's more up to date; 3) For the third case where he had a hole or prealloc extent with multiple delalloc ranges inside the hole or prealloc extent's range and the offset of the file extent item we just found is smaller than what we have in the cache, just skip the current file extent item if its range end at or behind the cached extent's end, because we may have emitted (to the fiemap user space buffer) delalloc ranges that overlap with the current file extent item's range. If the file extent item's range goes beyond the end offset of the cached extent, just emit the cached extent and cache a subrange of the file extent item, that goes from the end offset of the cached extent to the end offset of the file extent item. Dealing with those cases in those ways makes everything consistent by reflecting the current state of file extent items in the btree and without emitting extents that have overlapping ranges (which would be confusing and violating expectations). This issue could be triggered often with test case generic/561, and was also hit and reported by Wang Yugui. Reported-by: Wang Yugui <wangyugui@e16-tech.com> Link: https://lore.kernel.org/linux-btrfs/20240223104619.701F.409509F4@e16-tech.com/ Fixes: b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-29f2fs: fix write pointers all the timeJaegeuk Kim
Even if the roll forward recovery stopped due to any error, we have to fix the write pointers in order to mount the disk from the previous checkpoint. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-29f2fs: check number of blocks in a current sectionJaegeuk Kim
In cfd66bb715fd ("f2fs: fix deadloop in foreground GC"), we needed to check the number of blocks in a section instead of the segment. Fixes: cfd66bb715fd ("f2fs: fix deadloop in foreground GC") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-29f2fs: prevent an f2fs_gc loop during disable_checkpointJaegeuk Kim
Don't get stuck in the f2fs_gc loop while disabling checkpoint. Instead, we have a time-based management. Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-29f2fs: introduce FAULT_NO_SEGMENTChao Yu
Use it to simulate no free segment case during block allocation. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-29f2fs: compress: fix to check compress flag w/ .i_sem lockChao Yu
It needs to check compress flag w/ .i_sem lock, otherwise, compressed inode may be disabled after the check condition, it's not needed to set compress option on non-compress inode. Fixes: e1e8debec656 ("f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-29f2fs: fix to handle segment allocation failure correctlyChao Yu
If CONFIG_F2FS_CHECK_FS is off, and for very rare corner case that we run out of free segment, we should not panic kernel, instead, let's handle such error correctly in its caller. Signed-off-by: Chao Yu <chao@kernel.org> Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-29f2fs: stop checkpoint when get a out-of-bounds segmentZhiguo Niu
There is low probability that an out-of-bounds segment will be got on a small-capacity device. In order to prevent subsequent write requests allocating block address from this invalid segment, which may cause unexpected issue, stop checkpoint should be performed. Also introduce a new stop cp reason: STOP_CP_REASON_NO_SEGMENT. Note, f2fs_stop_checkpoint(, false) is complex and it may sleep, so we should move it outside segmap_lock spinlock coverage in get_new_segment(). Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-29f2fs: delete f2fs_get_new_segment() declarationChao Yu
Commit 093749e296e2 ("f2fs: support age threshold based garbage collection") added this declaration, but w/ definition, delete it. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-28nfs: fix regression in handling of fsc= option in NFSv4Chen Hanxiao
Setting the uniquifier for fscache via the fsc= mount option is currently broken in NFSv4. Fix this by passing fscache_uniq to root_fc if possible. Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-02-28pnfs/filelayout: add tracepoint to getdeviceinfoOlga Kornievskaia
While decoding filelayout getdeviceinfo received, print out the information about the location of data servers (IPs). Generic getdeviceinfo tracepoints prints the MDS's ip for the dstaddr. In this patch, separate the MDS's address from the DS's addresses. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-02-28NFSv4.2: fix listxattr maximum XDR buffer sizeJorge Mora
Switch order of operations to avoid creating a short XDR buffer: e.g., buflen = 12, old xdrlen = 12, new xdrlen = 20. Having a short XDR buffer leads to lxa_maxcount be a few bytes less than what is needed to retrieve the whole list when using a buflen as returned by a call with size = 0: buflen = listxattr(path, NULL, 0); buf = malloc(buflen); buflen = listxattr(path, buf, buflen); For a file with one attribute (name = '123456'), the first call with size = 0 will return buflen = 12 ('user.123456\x00'). The second call with size = 12, sends LISTXATTRS with lxa_maxcount = 12 + 8 (cookie) + 4 (array count) = 24. The XDR buffer needs 8 (cookie) + 4 (array count) + 4 (name count) + 6 (name len) + 2 (padding) + 4 (eof) = 28 which is 4 bytes shorter than the lxa_maxcount provided in the call. Fixes: 04a5da690e8f ("NFSv4.2: define limits and sizes for user xattr handling") Signed-off-by: Jorge Mora <mora@netapp.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-02-28NFSv4.2: fix nfs4_listxattr kernel BUG at mm/usercopy.c:102Jorge Mora
A call to listxattr() with a buffer size = 0 returns the actual size of the buffer needed for a subsequent call. When size > 0, nfs4_listxattr() does not return an error because either generic_listxattr() or nfs4_listxattr_nfs4_label() consumes exactly all the bytes then size is 0 when calling nfs4_listxattr_nfs4_user() which then triggers the following kernel BUG: [ 99.403778] kernel BUG at mm/usercopy.c:102! [ 99.404063] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [ 99.408463] CPU: 0 PID: 3310 Comm: python3 Not tainted 6.6.0-61.fc40.aarch64 #1 [ 99.415827] Call trace: [ 99.415985] usercopy_abort+0x70/0xa0 [ 99.416227] __check_heap_object+0x134/0x158 [ 99.416505] check_heap_object+0x150/0x188 [ 99.416696] __check_object_size.part.0+0x78/0x168 [ 99.416886] __check_object_size+0x28/0x40 [ 99.417078] listxattr+0x8c/0x120 [ 99.417252] path_listxattr+0x78/0xe0 [ 99.417476] __arm64_sys_listxattr+0x28/0x40 [ 99.417723] invoke_syscall+0x78/0x100 [ 99.417929] el0_svc_common.constprop.0+0x48/0xf0 [ 99.418186] do_el0_svc+0x24/0x38 [ 99.418376] el0_svc+0x3c/0x110 [ 99.418554] el0t_64_sync_handler+0x120/0x130 [ 99.418788] el0t_64_sync+0x194/0x198 [ 99.418994] Code: aa0003e3 d000a3e0 91310000 97f49bdb (d4210000) Issue is reproduced when generic_listxattr() returns 'system.nfs4_acl', thus calling lisxattr() with size = 16 will trigger the bug. Add check on nfs4_listxattr() to return ERANGE error when it is called with size > 0 and the return value is greater than size. Fixes: 012a211abd5d ("NFSv4.2: hook in the user extended attribute handlers") Signed-off-by: Jorge Mora <mora@netapp.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-02-28NFS: Display the "fsc=" mount option if it is setChen Hanxiao
With this patch, mount command will show fsc=xxx if set: If -o fsc=6666 clientaddr=192.168.122.208,fsc=6666,local_lock=none If only -o fsc clientaddr=192.168.122.208,fsc,local_lock=none Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-02-28pidfd: move struct pidfd_fopsChristian Brauner
Move the pidfd file operations over to their own file in preparation of implementing pidfs and to isolate them from other mostly unrelated functionality in other files. Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-1-f863f58cfce1@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-28xfs: use kvfree() in xlog_cil_free_logvec()Dave Chinner
The xfs_log_vec items are allocated by xlog_kvmalloc(), and so need to be freed with kvfree. This was missed when coverting from the kmem_free() API. Fixes: 49292576136f ("xfs: convert kmem_free() for kvmalloc users to kvfree()") Reported-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-02-28xfs: xfs_btree_bload_prep_block() should use __GFP_NOFAILDave Chinner
This was missed in the conversion from KM* flags. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: 10634530f7ba ("xfs: convert kmem_zalloc() to kzalloc()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-02-27libfs: Drop generic_set_encrypted_ci_d_opsGabriel Krisman Bertazi
No filesystems depend on it anymore, and it is generally a bad idea. Since all dentries should have the same set of dentry operations in case-insensitive capable filesystems, it should be propagated through ->s_d_op. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-11-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-27ubifs: Configure dentry operations at dentry-creation timeGabriel Krisman Bertazi
fscrypt now supports configuring dentry operations at dentry-creation time through the preset sb->s_d_op, instead of at lookup time. Enable this in ubifs, since the lookup-time mechanism is going away. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-10-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-27f2fs: Configure dentry operations at dentry-creation timeGabriel Krisman Bertazi
This was already the case for case-insensitive before commit bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops"), but it was changed to set at lookup-time to facilitate the integration with fscrypt. But it's a problem because dentries that don't get created through ->lookup() won't have any visibility of the operations. Since fscrypt now also supports configuring dentry operations at creation-time, do it for any encrypted and/or casefold volume, simplifying the implementation across these features. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-9-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-27ext4: Configure dentry operations at dentry-creation timeGabriel Krisman Bertazi
This was already the case for case-insensitive before commit bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops"), but it was changed to set at lookup-time to facilitate the integration with fscrypt. But it's a problem because dentries that don't get created through ->lookup() won't have any visibility of the operations. Since fscrypt now also supports configuring dentry operations at creation-time, do it for any encrypted and/or casefold volume, simplifying the implementation across these features. Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-8-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-27libfs: Add helper to choose dentry operations at mount-timeGabriel Krisman Bertazi
In preparation to drop the similar helper that sets d_op at lookup time, add a version to set the right d_op filesystem-wide, through sb->s_d_op. The operations structures are shared across filesystems supporting fscrypt and/or casefolding, therefore we can keep it in common libfs code. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-7-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-27libfs: Merge encrypted_ci_dentry_ops and ci_dentry_opsGabriel Krisman Bertazi
In preparation to get case-insensitive dentry operations from sb->s_d_op again, use the same structure with and without fscrypt. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-6-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-27fscrypt: Factor out a helper to configure the lookup dentryGabriel Krisman Bertazi
Both fscrypt_prepare_lookup_partial and fscrypt_prepare_lookup will set DCACHE_NOKEY_NAME for dentries when the key is not available. Extract out a helper to set this flag in a single place, in preparation to also add the optimization that will disable ->d_revalidate if possible. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-3-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-27ovl: Always reject mounting over case-insensitive directoriesGabriel Krisman Bertazi
overlayfs relies on the filesystem setting DCACHE_OP_HASH or DCACHE_OP_COMPARE to reject mounting over case-insensitive directories. Since commit bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops"), we set ->d_op through a hook in ->d_lookup, which means the root dentry won't have them, causing the mount to accidentally succeed. In v6.7-rc7, the following sequence will succeed to mount, but any dentry other than the root dentry will be a "weird" dentry to ovl and fail with EREMOTE. mkfs.ext4 -O casefold lower.img mount -O loop lower.img lower mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work ovl /mnt Mounting on a subdirectory fails, as expected, because DCACHE_OP_HASH and DCACHE_OP_COMPARE are properly set by ->lookup. Fix by explicitly rejecting superblocks that allow case-insensitive dentries. Yes, this will be solved when we move d_op configuration back to ->s_d_op. Yet, we better have an explicit fix to avoid messing up again. While there, re-sort the entries to have more descriptive error messages first. Fixes: bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops") Acked-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-2-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-27libfs: Attempt exact-match comparison first during casefolded lookupGabriel Krisman Bertazi
Casefolded comparisons are (obviously) way more costly than a simple memcmp. Try the case-sensitive comparison first, falling-back to the case-insensitive lookup only when needed. This allows any exact-match lookup to complete without having to walk the utf8 trie. Note that, for strict mode, generic_ci_d_compare used to reject an invalid UTF-8 string, which would now be considered valid if it exact-matches the disk-name. But, if that is the case, the filesystem is corrupt. More than that, it really doesn't matter in practice, because the name-under-lookup will have already been rejected by generic_ci_d_hash and we won't even get here. The memcmp is safe under RCU because we are operating on str/len instead of dentry->d_name directly, and the caller guarantees their consistency between each other in __d_lookup_rcu_op_compare. Link: https://lore.kernel.org/r/87ttn2sip7.fsf_-_@mailhost.krisman.be Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-27f2fs: introduce get_available_block_count() for cleanupChao Yu
There are very similar codes in inc_valid_block_count() and inc_valid_node_count() which is used for available user block count calculation. This patch introduces a new helper get_available_block_count() to include those common codes, and used it to clean up codes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: support file pinning for zoned devicesDaeho Jeong
Support file pinning with conventional storage area for zoned devices Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: kill heap-based allocationJaegeuk Kim
No one uses this feature. Let's kill it. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: support SEEK_DATA and SEEK_HOLE for compression filesDaeho Jeong
Fix to support SEEK_DATA and SEEK_HOLE for compression files Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: compress: fix to check zstd compress level correctly in mount optionChao Yu
f2fs only support to config zstd compress level w/ a positive number due to layout design, but since commit e0c1b49f5b67 ("lib: zstd: Upgrade to latest upstream zstd version 1.4.10"), zstd supports negative compress level, so that zstd_min_clevel() may return a negative number, then w/ below mount option, .compress_level can be configed w/ a negative number, which is not allowed to f2fs, let's add check condition to avoid it. mount -o compress_algorithm=zstd:4294967295 /dev/sdx /mnt/f2fs Fixes: 00e120b5e4b5 ("f2fs: assign default compression level") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: Use folio in f2fs_read_merkle_tree_pageHuangXiaojia
Use folio in f2fs_read_merkle_tree_page to reduce folio & page converisons from find_get_page_flags and read_mapping_page functions. But the return value should be the exact page. Signed-off-by: HuangXiaojia <huangxiaojia2@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: fix to create selinux label during whiteout initializationChao Yu
generic/700 - output mismatch (see /media/fstests/results//generic/700.out.bad) --- tests/generic/700.out 2023-03-28 10:40:42.735529223 +0000 +++ /media/fstests/results//generic/700.out.bad 2024-02-06 04:37:56.000000000 +0000 @@ -1,2 +1,4 @@ QA output created by 700 +/mnt/scratch_f2fs/f1: security.selinux: No such attribute +/mnt/scratch_f2fs/f2: security.selinux: No such attribute Silence is golden ... (Run 'diff -u /media/fstests/tests/generic/700.out /media/fstests/results//generic/700.out.bad' to see the entire diff) HINT: You _MAY_ be missing kernel fix: 70b589a37e1a xfs: add selinux labels to whiteout inodes Previously, it missed to create selinux labels during whiteout inode initialization, fix this issue. Fixes: 7e01e7ad746b ("f2fs: support RENAME_WHITEOUT") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: separate f2fs_gc_range() to use GC for a rangeDaeho Jeong
Make f2fs_gc_range() an extenal function to use it for GC for a range. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: use BLKS_PER_SEG, BLKS_PER_SEC, and SEGS_PER_SECJaegeuk Kim
No functional change. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27efs: remove SLAB_MEM_SPREAD flag usageChengming Zhou
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which has no any functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224134742.829325-1-chengming.zhou@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27jfs: remove SLAB_MEM_SPREAD flag usageChengming Zhou
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which has no any functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224134925.829677-1-chengming.zhou@linux.dev Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27minix: remove SLAB_MEM_SPREAD flag usageChengming Zhou
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which has no any functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224134935.829715-1-chengming.zhou@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27openpromfs: remove SLAB_MEM_SPREAD flag usageChengming Zhou
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which has no any functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224135028.829910-1-chengming.zhou@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27proc: remove SLAB_MEM_SPREAD flag usageChengming Zhou
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which has no any functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224135048.829987-1-chengming.zhou@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27qnx6: remove SLAB_MEM_SPREAD flag usageChengming Zhou
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which has no any functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224135104.830045-1-chengming.zhou@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27reiserfs: remove SLAB_MEM_SPREAD flag usageChengming Zhou
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which has no any functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224135126.830110-1-chengming.zhou@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27romfs: remove SLAB_MEM_SPREAD flag usageChengming Zhou
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which has no any functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224135143.830142-1-chengming.zhou@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27sysv: remove SLAB_MEM_SPREAD flag usageChengming Zhou
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which has no any functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224135158.830266-1-chengming.zhou@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27vfs: remove SLAB_MEM_SPREAD flag usageChengming Zhou
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which has no any functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224135315.830477-1-chengming.zhou@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27fs/aio: Make io_cancel() generate completions againBart Van Assche
The following patch accidentally removed the code for delivering completions for cancelled reads and writes to user space: "[PATCH 04/33] aio: remove retry-based AIO" (https://lore.kernel.org/all/1363883754-27966-5-git-send-email-koverstreet@google.com/) >From that patch: - if (kiocbIsCancelled(iocb)) { - ret = -EINTR; - aio_complete(iocb, ret, 0); - /* must not access the iocb after this */ - goto out; - } This leads to a leak in user space of a struct iocb. Hence this patch that restores the code that reports to user space that a read or write has been cancelled successfully. Fixes: 41003a7bcfed ("aio: remove retry-based AIO") Cc: Christoph Hellwig <hch@lst.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Sandeep Dhavale <dhavale@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: stable@vger.kernel.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240215204739.2677806-3-bvanassche@acm.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27afs: Fix endless loop in directory parsingDavid Howells
If a directory has a block with only ".__afsXXXX" files in it (from uncompleted silly-rename), these .__afsXXXX files are skipped but without advancing the file position in the dir_context. This leads to afs_dir_iterate() repeating the block again and again. Fix this by making the code that skips the .__afsXXXX file also manually advance the file position. The symptoms are a soft lookup: watchdog: BUG: soft lockup - CPU#3 stuck for 52s! [check:5737] ... RIP: 0010:afs_dir_iterate_block+0x39/0x1fd ... ? watchdog_timer_fn+0x1a6/0x213 ... ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? afs_dir_iterate_block+0x39/0x1fd afs_dir_iterate+0x10a/0x148 afs_readdir+0x30/0x4a iterate_dir+0x93/0xd3 __do_sys_getdents64+0x6b/0xd4 This is almost certainly the actual fix for: https://bugzilla.kernel.org/show_bug.cgi?id=218496 Fixes: 57e9d49c5452 ("afs: Hide silly-rename files from userspace") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/786185.1708694102@warthog.procyon.org.uk Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Markus Suvanto <markus.suvanto@gmail.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27xfs: drop experimental warning for FSDAXShiyang Ruan
FSDAX and reflink can work together now, let's drop this warning. Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-02-26Merge tag 'for-6.8-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A more fixes for recently reported or discovered problems: - fix corner case of send that would generate potentially large stream of zeros if there's a hole at the end of the file - fix chunk validation in zoned mode on conventional zones, it was possible to create chunks that would not be allowed on sequential zones - fix validation of dev-replace ioctl filenames - fix KCSAN warnings about access to block reserve struct members" * tag 'for-6.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve btrfs: fix data races when accessing the reserved amount of block reserves btrfs: send: don't issue unnecessary zero writes for trailing hole btrfs: dev-replace: properly validate device names btrfs: zoned: don't skip block group profile checks on conventional zones
2024-02-26ceph: switch to corrected encoding of max_xattr_size in mdsmapXiubo Li
The addition of bal_rank_mask with encoding version 17 was merged into ceph.git in Oct 2022 and made it into v18.2.0 release normally. A few months later, the much delayed addition of max_xattr_size got merged, also with encoding version 17, placed before bal_rank_mask in the encoding -- but it didn't make v18.2.0 release. The way this ended up being resolved on the MDS side is that bal_rank_mask will continue to be encoded in version 17 while max_xattr_size is now encoded in version 18. This does mean that older kernels will misdecode version 17, but this is also true for v18.2.0 and v18.2.1 clients in userspace. The best we can do is backport this adjustment -- see ceph.git commit 78abfeaff27fee343fb664db633de5b221699a73 for details. [ idryomov: changelog ] Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/64440 Fixes: d93231a6bc8a ("ceph: prevent a client from exceeding the MDS maximum xattr size") Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@ibm.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>