summaryrefslogtreecommitdiff
path: root/fs/ext4/namei.c
AgeCommit message (Collapse)Author
2025-05-20ext4: remove sbi argument from ext4_chksum()Eric Biggers
Since ext4_chksum() no longer uses its sbi argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-2-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-04-13Merge tag 'ext4_for_linus-6.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A few more miscellaneous ext4 bug fixes and cleanups including some syzbot failures and fixing a stale file handing refeencing an inode previously used as a regular file, but which has been deleted and reused as an ea_inode would result in ext4 erroneously considering this a case of fs corruption" * tag 'ext4_for_linus-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix off-by-one error in do_split ext4: make block validity check resistent to sb bh corruption ext4: avoid -Wflex-array-member-not-at-end warning Documentation: ext4: Add fields to ext4_super_block documentation ext4: don't treat fhandle lookup of ea_inode as FS corruption
2025-04-12ext4: fix off-by-one error in do_splitArtem Sadovnikov
Syzkaller detected a use-after-free issue in ext4_insert_dentry that was caused by out-of-bounds access due to incorrect splitting in do_split. BUG: KASAN: use-after-free in ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109 Write of size 251 at addr ffff888074572f14 by task syz-executor335/5847 CPU: 0 UID: 0 PID: 5847 Comm: syz-executor335 Not tainted 6.12.0-rc6-syzkaller-00318-ga9cda7c0ffed #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189 __asan_memcpy+0x40/0x70 mm/kasan/shadow.c:106 ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109 add_dirent_to_buf+0x3d9/0x750 fs/ext4/namei.c:2154 make_indexed_dir+0xf98/0x1600 fs/ext4/namei.c:2351 ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2455 ext4_add_nondir+0x8d/0x290 fs/ext4/namei.c:2796 ext4_symlink+0x920/0xb50 fs/ext4/namei.c:3431 vfs_symlink+0x137/0x2e0 fs/namei.c:4615 do_symlinkat+0x222/0x3a0 fs/namei.c:4641 __do_sys_symlink fs/namei.c:4662 [inline] __se_sys_symlink fs/namei.c:4660 [inline] __x64_sys_symlink+0x7a/0x90 fs/namei.c:4660 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> The following loop is located right above 'if' statement. for (i = count-1; i >= 0; i--) { /* is more than half of this entry in 2nd half of the block? */ if (size + map[i].size/2 > blocksize/2) break; size += map[i].size; move++; } 'i' in this case could go down to -1, in which case sum of active entries wouldn't exceed half the block size, but previous behaviour would also do split in half if sum would exceed at the very last block, which in case of having too many long name files in a single block could lead to out-of-bounds access and following use-after-free. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Cc: stable@vger.kernel.org Fixes: 5872331b3d91 ("ext4: fix potential negative array index in do_split()") Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250404082804.2567-3-a.sadovnikov@ispras.ru Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-27Merge tag 'ext4-for_linus-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Ext4 bug fixes and cleanups, including: - hardening against maliciously fuzzed file systems - backwards compatibility for the brief period when we attempted to ignore zero-width characters - avoid potentially BUG'ing if there is a file system corruption found during the file system unmount - fix free space reporting by statfs when project quotas are enabled and the free space is less than the remaining project quota Also improve performance when replaying a journal with a very large number of revoke records (applicable for Lustre volumes)" * tag 'ext4-for_linus-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (71 commits) ext4: fix OOB read when checking dotdot dir ext4: on a remount, only log the ro or r/w state when it has changed ext4: correct the error handle in ext4_fallocate() ext4: Make sb update interval tunable ext4: avoid journaling sb update on error if journal is destroying ext4: define ext4_journal_destroy wrapper ext4: hash: simplify kzalloc(n * 1, ...) to kzalloc(n, ...) jbd2: add a missing data flush during file and fs synchronization ext4: don't over-report free space or inodes in statvfs ext4: clear DISCARD flag if device does not support discard jbd2: remove jbd2_journal_unfile_buffer() ext4: reorder capability check last ext4: update the comment about mb_optimize_scan jbd2: fix off-by-one while erasing journal ext4: remove references to bh->b_page ext4: goto right label 'out_mmap_sem' in ext4_setattr() ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all() ext4: introduce ITAIL helper jbd2: remove redundant function jbd2_journal_has_csum_v2or3_feature ext4: remove redundant function ext4_has_metadata_csum ...
2025-03-17ext4: remove redundant function ext4_has_metadata_csumEric Biggers
Since commit f2b4fa19647e ("ext4: switch to using the crc32c library"), ext4_has_metadata_csum() is just an alias for ext4_has_feature_metadata_csum(). ext4_has_feature_metadata_csum() is generated by EXT4_FEATURE_RO_COMPAT_FUNCS and uses the regular naming convention for checking a single ext4 feature. Therefore, remove ext4_has_metadata_csum() and update all its callers to use ext4_has_feature_metadata_csum() directly. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/20250207031335.42637-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-16ext4: remove unused input "inode" in ext4_find_dest_deKemeng Shi
Remove unused input "inode" in ext4_find_dest_de. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123162050.2114499-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-16ext4: remove unneeded forward declaration in namei.cKemeng Shi
Remove unneeded forward declaration in namei.c Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123162050.2114499-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-16ext4: add missing brelse() for bh2 in ext4_dx_add_entry()Kemeng Shi
Add missing brelse() for bh2 in ext4_dx_add_entry(). Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123162050.2114499-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13ext4: add ext4_emergency_state() helper functionBaokun Li
Since both SHUTDOWN and EMERGENCY_RO are emergency states of the ext4 file system, and they are checked in similar locations, we have added a helper function, ext4_emergency_state(), to determine whether the current file system is in one of these two emergency states. Then, replace calls to ext4_forced_shutdown() with ext4_emergency_state() in those functions that could potentially trigger write operations. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122114130.229709-4-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-02-27Change inode_operations.mkdir to return struct dentry *NeilBrown
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns. This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed. This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir. To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are: NULL - the directory was created and no other dentry was used ERR_PTR() - an error occurred non-NULL - this other dentry was spliced in This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry. Not all filesystems reliably result in a positive hashed dentry: - NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. Placing the lookup in filesystem code means it only happens when the filesystem has no other option. - kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry. The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-13ext4: introduce linear search for dentriesTheodore Ts'o
This patch addresses an issue where some files in case-insensitive directories become inaccessible due to changes in how the kernel function, utf8_casefold(), generates case-folded strings from the commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points"). There are good reasons why this change should be made; it's actually quite stupid that Unicode seems to think that the characters ❤ and ❤️ should be casefolded. Unfortimately because of the backwards compatibility issue, this commit was reverted in 231825b2e1ff. This problem is addressed by instituting a brute-force linear fallback if a lookup fails on case-folded directory, which does result in a performance hit when looking up files affected by the changing how thekernel treats ignorable Uniode characters, or when attempting to look up non-existent file names. So this fallback can be disabled by setting an encoding flag if in the future, the system administrator or the manufacturer of a mobile handset or tablet can be sure that there was no opportunity for a kernel to insert file names with incompatible encodings. Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-12-22ext4: use inode_set_cached_link()Mateusz Guzik
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20241120112037.822078-3-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-18Merge tag 'ext4_for_linus-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A lot of miscellaneous ext4 bug fixes and cleanups this cycle, most notably in the journaling code, bufered I/O, and compiler warning cleanups" * tag 'ext4_for_linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits) jbd2: Fix comment describing journal_init_common() ext4: prevent an infinite loop in the lazyinit thread ext4: use struct_size() to improve ext4_htree_store_dirent() ext4: annotate struct fname with __counted_by() jbd2: avoid dozens of -Wflex-array-member-not-at-end warnings ext4: use str_yes_no() helper function ext4: prevent delalloc to nodelalloc on remount jbd2: make b_frozen_data allocation always succeed ext4: cleanup variable name in ext4_fc_del() ext4: use string choices helpers jbd2: remove the 'success' parameter from the jbd2_do_replay() function jbd2: remove useless 'block_error' variable jbd2: factor out jbd2_do_replay() jbd2: refactor JBD2_COMMIT_BLOCK process in do_one_pass() jbd2: unified release of buffer_head in do_one_pass() jbd2: remove redundant judgments for check v1 checksum ext4: use ERR_CAST to return an error-valued pointer mm: zero range of eof folio exposed by inode size extension ext4: partial zero eof block on unaligned inode size extension ext4: disambiguate the return value of ext4_dio_write_end_io() ...
2024-11-12ext4: use ERR_CAST to return an error-valued pointerYu Jiaoliang
Instead of directly casting and returning an error-valued pointer, use ERR_CAST to make the error handling more explicit and improve code clarity. Signed-off-by: Yu Jiaoliang <yujiaoliang@vivo.com> Link: https://patch.msgid.link/20240920021440.1959243-1-yujiaoliang@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12ext4: WARN if a full dir leaf block has only one dentryBaokun Li
The maximum length of a filename is 255 and the minimum block size is 1024, so it is always guaranteed that the number of entries is greater than or equal to 2 when do_split() is called. So unless ext4_dx_add_entry() and make_indexed_dir() or some other functions are buggy, 'split == 0' will not occur. Setting 'continued' to 0 in this case masks the problem that the file system has become corrupted, even though it prevents possible out-of-bounds access. Hence WARN_ON_ONCE() is used to check if 'split' is 0, and if it is then warns and returns an error to abort split. Suggested-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20240823160518.GA424729@mit.edu Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241008121152.3771906-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-10-28ext4: Use generic_ci_validate_strict_name helperAndré Almeida
Use the helper function to check the requirements for casefold directories using strict encoding. Suggested-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-2-f443d5814194@igalia.com Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-03ext4: explicitly exit when ext4_find_inline_entry returns an errorThadeu Lima de Souza Cascardo
__ext4_find_entry currently ignores the return of ext4_find_inline_entry, except for returning the bh or NULL when has_inline_data is 1. Even though has_inline_data is set to 1 before calling ext4_find_inline_entry and would only be set to 0 when that function returns NULL, check for an encoded error return explicitly in order to exit. That makes the code more readable, not requiring that one assumes the cases when has_inline_data is 1. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-4-cascardo@igalia.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-03ext4: ext4_search_dir should return a proper errorThadeu Lima de Souza Cascardo
ext4_search_dir currently returns -1 in case of a failure, while it returns 0 when the name is not found. In such failure cases, it should return an error code instead. This becomes even more important when ext4_find_inline_entry returns an error code as well in the next commit. -EFSCORRUPTED seems appropriate as such error code as these failures would be caused by unexpected record lengths and is in line with other instances of ext4_check_dir_entry failures. In the case of ext4_dx_find_entry, the current use of ERR_BAD_DX_DIR was left as is to reduce the risk of regressions. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-2-cascardo@igalia.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26ext4: no need to continue when the number of entries is 1Edward Adam Davis
Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3") Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Reported-and-tested-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com Link: https://patch.msgid.link/tencent_BE7AEE6C7C2D216CB8949CE8E6EE7ECC2C0A@qq.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2024-07-18Merge tag 'ext4_for_linus-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Many cleanups and bug fixes in ext4, especially for the fast commit feature. Also some performance improvements; in particular, improving IOPS and throughput on fast devices running Async Direct I/O by up to 20% by optimizing jbd2_transaction_committed()" * tag 'ext4_for_linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: make sure the first directory block is not a hole ext4: check dot and dotdot of dx_root before making dir indexed ext4: sanity check for NULL pointer after ext4_force_shutdown jbd2: increase maximum transaction size jbd2: drop pointless shrinker batch initialization jbd2: avoid infinite transaction commit loop jbd2: precompute number of transaction descriptor blocks jbd2: make jbd2_journal_get_max_txn_bufs() internal jbd2: avoid mount failed when commit block is partial submitted ext4: avoid writing unitialized memory to disk in EA inodes ext4: don't track ranges in fast_commit if inode has inlined data ext4: fix possible tid_t sequence overflows ext4: use ext4_update_inode_fsync_trans() helper in inode creation ext4: add missing MODULE_DESCRIPTION() jbd2: add missing MODULE_DESCRIPTION() ext4: use memtostr_pad() for s_volume_name jbd2: speed up jbd2_transaction_committed() ext4: make ext4_da_map_blocks() buffer_head unaware ext4: make ext4_insert_delayed_block() insert multi-blocks ext4: factor out a helper to check the cluster allocation state ...
2024-07-10ext4: make sure the first directory block is not a holeBaokun Li
The syzbot constructs a directory that has no dirblock but is non-inline, i.e. the first directory block is a hole. And no errors are reported when creating files in this directory in the following flow. ext4_mknod ... ext4_add_entry // Read block 0 ext4_read_dirblock(dir, block, DIRENT) bh = ext4_bread(NULL, inode, block, 0) if (!bh && (type == INDEX || type == DIRENT_HTREE)) // The first directory block is a hole // But type == DIRENT, so no error is reported. After that, we get a directory block without '.' and '..' but with a valid dentry. This may cause some code that relies on dot or dotdot (such as make_indexed_dir()) to crash. Therefore when ext4_read_dirblock() finds that the first directory block is a hole report that the filesystem is corrupted and return an error to avoid loading corrupted data from disk causing something bad. Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0 Fixes: 4e19d6b65fb4 ("ext4: allow directory holes") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240702132349.2600605-3-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-10ext4: check dot and dotdot of dx_root before making dir indexedBaokun Li
Syzbot reports a issue as follows: ============================================ BUG: unable to handle page fault for address: ffffed11022e24fe PGD 23ffee067 P4D 23ffee067 PUD 0 Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 PID: 5079 Comm: syz-executor306 Not tainted 6.10.0-rc5-g55027e689933 #0 Call Trace: <TASK> make_indexed_dir+0xdaf/0x13c0 fs/ext4/namei.c:2341 ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2451 ext4_rename fs/ext4/namei.c:3936 [inline] ext4_rename2+0x26e5/0x4370 fs/ext4/namei.c:4214 [...] ============================================ The immediate cause of this problem is that there is only one valid dentry for the block to be split during do_split, so split==0 results in out of bounds accesses to the map triggering the issue. do_split unsigned split dx_make_map count = 1 split = count/2 = 0; continued = hash2 == map[split - 1].hash; ---> map[4294967295] The maximum length of a filename is 255 and the minimum block size is 1024, so it is always guaranteed that the number of entries is greater than or equal to 2 when do_split() is called. But syzbot's crafted image has no dot and dotdot in dir, and the dentry distribution in dirblock is as follows: bus dentry1 hole dentry2 free |xx--|xx-------------|...............|xx-------------|...............| 0 12 (8+248)=256 268 256 524 (8+256)=264 788 236 1024 So when renaming dentry1 increases its name_len length by 1, neither hole nor free is sufficient to hold the new dentry, and make_indexed_dir() is called. In make_indexed_dir() it is assumed that the first two entries of the dirblock must be dot and dotdot, so bus and dentry1 are left in dx_root because they are treated as dot and dotdot, and only dentry2 is moved to the new leaf block. That's why count is equal to 1. Therefore add the ext4_check_dx_root() helper function to add more sanity checks to dot and dotdot before starting the conversion to avoid the above issue. Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0 Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240702132349.2600605-2-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-07ext4: Move CONFIG_UNICODE defguards into the code flowGabriel Krisman Bertazi
Instead of a bunch of ifdefs, make the unicode built checks part of the code flow where possible, as requested by Torvalds. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> [eugen.hristev@collabora.com: port to 6.10-rc1] Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com> Link: https://lore.kernel.org/r/20240606073353.47130-7-eugen.hristev@collabora.com Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-07ext4: Reuse generic_ci_match for ci comparisonsGabriel Krisman Bertazi
Instead of reimplementing ext4_match_ci, use the new libfs helper. It also adds a comment explaining why fname->cf_name.name must be checked prior to the encryption hash optimization, because that tripped me before. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com> Link: https://lore.kernel.org/r/20240606073353.47130-5-eugen.hristev@collabora.com Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-07ext4: Simplify the handling of cached casefolded namesGabriel Krisman Bertazi
Keeping it as qstr avoids the unnecessary conversion in ext4_match Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> [eugen.hristev@collabora.com: port to 6.10-rc1] Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com> Link: https://lore.kernel.org/r/20240606073353.47130-2-eugen.hristev@collabora.com Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-02ext4: avoid excessive credit estimate in ext4_tmpfile()Jan Kara
A user with minimum journal size (1024 blocks these days) complained about the following error triggered by generic/697 test in ext4_tmpfile(): run fstests generic/697 at 2024-02-28 05:34:46 JBD2: vfstest wants too many credits credits:260 rsv_credits:0 max:256 EXT4-fs error (device loop0) in __ext4_new_inode:1083: error 28 Indeed the credit estimate in ext4_tmpfile() is huge. EXT4_MAXQUOTAS_INIT_BLOCKS() is 219, then 10 credits from ext4_tmpfile() itself and then ext4_xattr_credits_for_new_inode() adds more credits needed for security attributes and ACLs. Now the EXT4_MAXQUOTAS_INIT_BLOCKS() is in fact unnecessary because we've already initialized quotas with dquot_init() shortly before and so EXT4_MAXQUOTAS_TRANS_BLOCKS() is enough (which boils down to 3 credits). Fixes: af51a2ac36d1 ("ext4: ->tmpfile() support") Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Luis Henriques <lhenriques@suse.de> Tested-by: Disha Goel <disgoel@linux.ibm.com> Link: https://lore.kernel.org/r/20240307115320.28949-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-27ext4: Configure dentry operations at dentry-creation timeGabriel Krisman Bertazi
This was already the case for case-insensitive before commit bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops"), but it was changed to set at lookup-time to facilitate the integration with fscrypt. But it's a problem because dentries that don't get created through ->lookup() won't have any visibility of the operations. Since fscrypt now also supports configuring dentry operations at creation-time, do it for any encrypted and/or casefold volume, simplifying the implementation across these features. Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-8-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-01-11Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull misc filesystem updates from Al Viro: "Misc cleanups (the part that hadn't been picked by individual fs trees)" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: apparmorfs: don't duplicate kfree_link() orangefs: saner arguments passing in readdir guts ocfs2_find_match(): there's no such thing as NULL or negative ->d_parent reiserfs_add_entry(): get rid of pointless namelen checks __ocfs2_add_entry(), ocfs2_prepare_dir_for_insert(): namelen checks ext4_add_entry(): ->d_name.len is never 0 befs: d_obtain_alias(ERR_PTR(...)) will do the right thing affs: d_obtain_alias(ERR_PTR(...)) will do the right thing /proc/sys: use d_splice_alias() calling conventions to simplify failure exits hostfs: use d_splice_alias() calling conventions to simplify failure exits udf_fiiter_add_entry(): check for zero ->d_name.len is bogus... udf: d_obtain_alias(ERR_PTR(...)) will do the right thing... udf: d_splice_alias() will do the right thing on ERR_PTR() inode nfsd: kill stale comment about simple_fill_super() requirements bfs_add_entry(): get rid of pointless ->d_name.len checks nilfs2: d_obtain_alias(ERR_PTR(...)) will do the right thing... zonefs: d_splice_alias() will do the right thing on ERR_PTR() inode
2023-12-21ext4_add_entry(): ->d_name.len is never 0Al Viro
That bogosity goes back to the initial merge of ext3. Once upon a time ext2 used to have a similar check; that got taken out during the switch to page cache (June 2001). ext3 got merged into mainline 5 months later, still using buffer cache for directories; removal of the pointless check in ext2 should've been done as a separate patch, but it hadn't been, so that thing got missed... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-25ext4: don't access the source subdirectory content on same-directory renameAl Viro
We can't really afford locking the source on same-directory rename; currently vfs_rename() tries to do that, but it will have to be changed. The logics in ext4 is lazy and goes looking for ".." in source even in same-directory case. It's not hard to get rid of that, leaving that behaviour only for cross-directory case; that VFS can get locks safely (and will keep doing that after the coming changes). Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-02Merge tag 'ext4_for_linus-6.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Cleanup ext4's multi-block allocator, including adding some unit tests, as well as cleaning how we update the backup superblock after online resizes or updating the label or uuid. Optimize handling of released data blocks in ext4's commit machinery to avoid a potential lock contention on s_md_lock spinlock. Fix a number of ext4 bugs: - fix race between writepages and remount - fix racy may inline data check in dio write - add missed brelse in an error path in update_backups - fix umask handling when ACL support is disabled - fix lost EIO error when a journal commit races with a fsync of the blockdev - fix potential improper i_size when there is a crash right after an O_SYNC direct write. - check extent node for validity before potentially using what might be an invalid pointer - fix potential stale data exposure when writing to an unwritten extent and the file system is nearly out of space - fix potential accounting error around block reservations when writing partial delayed allocation writes to a bigalloc cluster - avoid memory allocation failure when tracking partial delayed allocation writes to a bigalloc cluster - fix various debugging print messages" * tag 'ext4_for_linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (41 commits) ext4: properly sync file size update after O_SYNC direct IO ext4: fix racy may inline data check in dio write ext4: run mballoc test with different layouts setting ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc ext4: add some kunit stub for mballoc kunit test ext4: call ext4_mb_mark_context in ext4_group_add_blocks() ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks() ext4: call ext4_mb_mark_context in ext4_mb_clear_bb ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used ext4: extend ext4_mb_mark_context to support allocation under journal ext4: call ext4_mb_mark_context in ext4_free_blocks_simple ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb ext4: make state in ext4_mb_mark_bb to be bool jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev ext4: apply umask if ACL support is disabled ext4: mark buffer new if it is unwritten to avoid stale data exposure ext4: move 'ix' sanity check to corrent position jbd2: fix printk format type for 'io_block' in do_one_pass() jbd2: print io_block if check data block checksum failed when do recovery ...
2023-10-18ext4: convert to new timestamp accessorsJeff Layton
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-33-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-05ext4: fix incorrect offsetWang Jianjian
The last argument of ext4_check_dir_entry is dentry offset int the file. Luckily this error only results in the wrong offset being printed in the eventual error message. Signed-off-by: Wang Jianjian <wangjianjian0@foxmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/tencent_F992989953734FD5DE3F88ECB2191A856206@qq.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-09-17Merge tag 'ext4_for_linus-6.6-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Regression and bug fixes for ext4" * tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix rec_len verify error ext4: do not let fstrim block system suspend ext4: move setting of trimmed bit into ext4_try_to_trim_range() jbd2: Fix memory leak in journal_init_common() jbd2: Remove page size assumptions buffer: Make bh_offset() work for compound pages
2023-09-14ext4: fix rec_len verify errorShida Zhang
With the configuration PAGE_SIZE 64k and filesystem blocksize 64k, a problem occurred when more than 13 million files were directly created under a directory: EXT4-fs error (device xx): ext4_dx_csum_set:492: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D. EXT4-fs error (device xx): ext4_dx_csum_verify:463: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D. EXT4-fs error (device xx): dx_probe:856: inode #xxxx: block 8188: comm xxxxx: Directory index failed checksum When enough files are created, the fake_dirent->reclen will be 0xffff. it doesn't equal to the blocksize 65536, i.e. 0x10000. But it is not the same condition when blocksize equals to 4k. when enough files are created, the fake_dirent->reclen will be 0x1000. it equals to the blocksize 4k, i.e. 0x1000. The problem seems to be related to the limitation of the 16-bit field when the blocksize is set to 64k. To address this, helpers like ext4_rec_len_{from,to}_disk has already been introduced to complete the conversion between the encoded and the plain form of rec_len. So fix this one by using the helper, and all the other in this file too. Cc: stable@kernel.org Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes") Suggested-by: Andreas Dilger <adilger@dilger.ca> Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20230803060938.1929759-1-zhangshida@kylinos.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-31Merge tag 'ext4_for_linus-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Many ext4 and jbd2 cleanups and bug fixes: - Cleanups in the ext4 remount code when going to and from read-only - Cleanups in ext4's multiblock allocator - Cleanups in the jbd2 setup/mounting code paths - Performance improvements when appending to a delayed allocation file - Miscellaneous syzbot and other bug fixes" * tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits) ext4: fix slab-use-after-free in ext4_es_insert_extent() libfs: remove redundant checks of s_encoding ext4: remove redundant checks of s_encoding ext4: reject casefold inode flag without casefold feature ext4: use LIST_HEAD() to initialize the list_head in mballoc.c ext4: do not mark inode dirty every time when appending using delalloc ext4: rename s_error_work to s_sb_upd_work ext4: add periodic superblock update check ext4: drop dio overwrite only flag and associated warning ext4: add correct group descriptors and reserved GDT blocks to system zone ext4: remove unused function declaration ext4: mballoc: avoid garbage value from err ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple() ext4: change the type of blocksize in ext4_mb_init_cache() ext4: fix unttached inode after power cut with orphan file feature enabled jbd2: correct the end of the journal recovery scan range ext4: ext4_get_{dev}_journal return proper error value ext4: cleanup ext4_get_dev_journal() and ext4_get_journal() jbd2: jbd2_journal_init_{dev,inode} return proper error return value jbd2: drop useless error tag in jbd2_journal_wipe() ...
2023-08-27ext4: remove redundant checks of s_encodingEric Biggers
Now that ext4 does not allow inodes with the casefold flag to be instantiated when unsupported, it's unnecessary to repeatedly check for support later on during random filesystem operations. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230814182903.37267-3-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27ext4: fix unttached inode after power cut with orphan file feature enabledZhihao Cheng
Running generic/475(filesystem consistent tests after power cut) could easily trigger unattached inode error while doing fsck: Unattached zero-length inode 39405. Clear? no Unattached inode 39405 Connect to /lost+found? no Above inconsistence is caused by following process: P1 P2 ext4_create inode = ext4_new_inode_start_handle // itable records nlink=1 ext4_add_nondir err = ext4_add_entry // ENOSPC ext4_append ext4_bread ext4_getblk ext4_map_blocks // returns ENOSPC drop_nlink(inode) // won't be updated into disk inode ext4_orphan_add(handle, inode) ext4_orphan_file_add ext4_journal_stop(handle) jbd2_journal_commit_transaction // commit success >> power cut << ext4_fill_super ext4_load_and_init_journal // itable records nlink=1 ext4_orphan_cleanup ext4_process_orphan if (inode->i_nlink) // true, inode won't be deleted Then, allocated inode will be reserved on disk and corresponds to no dentries, so e2fsck reports 'unattached inode' problem. The problem won't happen if orphan file feature is disabled, because ext4_orphan_add() will update disk inode in orphan list mode. There are several places not updating disk inode while putting inode into orphan area, such as ext4_add_nondir(), ext4_symlink() and whiteout in ext4_rename(). Fix it by updating inode into disk in all error branches of these places. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217605 Fixes: 02f310fcf47f ("ext4: Speedup ext4 orphan inode handling") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230628132011.650383-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-07-29ext4: make ext4_forced_shutdown() take struct super_blockJan Kara
Currently ext4_forced_shutdown() takes struct ext4_sb_info but most callers need to get it from struct super_block anyway. So just pass in struct super_block to save all callers from some boilerplate code. No functional changes. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230616165109.21695-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-07-24ext4: convert to ctime accessor functionsJeff Layton
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-40-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-02ext4: Remove ext4 locking of moved directoryJan Kara
Remove locking of moved directory in ext4_rename2(). We will take care of it in VFS instead. This effectively reverts commit 0813299c586b ("ext4: Fix possible corruption when moving a directory") and followup fixes. CC: Ted Tso <tytso@mit.edu> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-1-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-05-13ext4: improve error handling from ext4_dirhash()Theodore Ts'o
The ext4_dirhash() will *almost* never fail, especially when the hash tree feature was first introduced. However, with the addition of support of encrypted, casefolded file names, that function can most certainly fail today. So make sure the callers of ext4_dirhash() properly check for failures, and reflect the errors back up to their callers. Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu Reported-by: syzbot+394aa8a792cb99dbc837@syzkaller.appspotmail.com Reported-by: syzbot+344aaa8697ebd232bfc8@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=db56459ea4ac4a676ae4b4678f633e55da005a9b Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-17ext4: fix possible double unlock when moving a directoryTheodore Ts'o
Fixes: 0813299c586b ("ext4: Fix possible corruption when moving a directory") Link: https://lore.kernel.org/r/5efbe1b9-ad8b-4a4f-b422-24824d2b775c@kili.mountain Reported-by: Dan Carpenter <error27@gmail.com> Reported-by: syzbot+0c73d1d8b952c5f3d714@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-12Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Bug fixes and regressions for ext4, the most serious of which is a potential deadlock during directory renames that was introduced during the merge window discovered by a combination of syzbot and lockdep" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: zero i_disksize when initializing the bootloader inode ext4: make sure fs error flag setted before clear journal error ext4: commit super block if fs record error when journal record without error ext4, jbd2: add an optimized bmap for the journal inode ext4: fix WARNING in ext4_update_inline_data ext4: move where set the MAY_INLINE_DATA flag is set ext4: Fix deadlock during directory rename ext4: Fix comment about the 64BIT feature docs: ext4: modify the group desc size to 64 ext4: fix another off-by-one fsmap error on 1k block filesystems ext4: fix RENAME_WHITEOUT handling for inline directories ext4: make kobj_type structures constant ext4: fix cgroup writeback accounting with fs-layer encryption
2023-03-07ext4: Fix deadlock during directory renameJan Kara
As lockdep properly warns, we should not be locking i_rwsem while having transactions started as the proper lock ordering used by all directory handling operations is i_rwsem -> transaction start. Fix the lock ordering by moving the locking of the directory earlier in ext4_rename(). Reported-by: syzbot+9d16c39efb5fade84574@syzkaller.appspotmail.com Fixes: 0813299c586b ("ext4: Fix possible corruption when moving a directory") Link: https://syzkaller.appspot.com/bug?extid=9d16c39efb5fade84574 Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230301141004.15087-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07ext4: fix RENAME_WHITEOUT handling for inline directoriesEric Whitney
A significant number of xfstests can cause ext4 to log one or more warning messages when they are run on a test file system where the inline_data feature has been enabled. An example: "EXT4-fs warning (device vdc): ext4_dirblock_csum_set:425: inode #16385: comm fsstress: No space for directory leaf checksum. Please run e2fsck -D." The xfstests include: ext4/057, 058, and 307; generic/013, 051, 068, 070, 076, 078, 083, 232, 269, 270, 390, 461, 475, 476, 482, 579, 585, 589, 626, 631, and 650. In this situation, the warning message indicates a bug in the code that performs the RENAME_WHITEOUT operation on a directory entry that has been stored inline. It doesn't detect that the directory is stored inline, and incorrectly attempts to compute a dirent block checksum on the whiteout inode when creating it. This attempt fails as a result of the integrity checking in get_dirent_tail (usually due to a failure to match the EXT4_FT_DIR_CSUM magic cookie), and the warning message is then emitted. Fix this by simply collecting the inlined data state at the time the search for the source directory entry is performed. Existing code handles the rest, and this is sufficient to eliminate all spurious warning messages produced by the tests above. Go one step further and do the same in the code that resets the source directory entry in the event of failure. The inlined state should be present in the "old" struct, but given the possibility of a race there's no harm in taking a conservative approach and getting that information again since the directory entry is being reread anyway. Fixes: b7ff91fd030d ("ext4: find old entry again if failed to rename whiteout") Cc: stable@kernel.org Signed-off-by: Eric Whitney <enwlinux@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230210173244.679890-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-28Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Improve performance for ext4 by allowing multiple process to perform direct I/O writes to preallocated blocks by using a shared inode lock instead of taking an exclusive lock. In addition, multiple bug fixes and cleanups" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix incorrect options show of original mount_opt and extend mount_opt2 ext4: Fix possible corruption when moving a directory ext4: init error handle resource before init group descriptors ext4: fix task hung in ext4_xattr_delete_inode jbd2: fix data missing when reusing bh which is ready to be checkpointed ext4: update s_journal_inum if it changes after journal replay ext4: fail ext4_iget if special inode unallocated ext4: fix function prototype mismatch for ext4_feat_ktype ext4: remove unnecessary variable initialization ext4: fix inode tree inconsistency caused by ENOMEM ext4: refuse to create ea block when umounted ext4: optimize ea_inode block expansion ext4: remove dead code in updating backup sb ext4: dio take shared inode lock when overwriting preallocated blocks ext4: don't show commit interval if it is zero ext4: use ext4_fc_tl_mem in fast-commit replay path ext4: improve xattr consistency checking and error reporting
2023-02-25ext4: Fix possible corruption when moving a directoryJan Kara
When we are renaming a directory to a different directory, we need to update '..' entry in the moved directory. However nothing prevents moved directory from being modified and even converted from the inline format to the normal format. When such race happens the rename code gets confused and we crash. Fix the problem by locking the moved directory. CC: stable@vger.kernel.org Fixes: 32f7f22c0b52 ("ext4: let ext4_rename handle inline dir") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230126112221.11866-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-01-19fs: port inode_init_owner() to mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->tmpfile() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>