Age  Commit message  Author
2024-09-11  f2fs: get rid of online repaire on corrupted directory  Chao Yu
syzbot reports an f2fs bug as below:

kernel BUG at fs/f2fs/inode.c:896!
RIP: 0010:f2fs_evict_inode+0x1598/0x15c0 fs/f2fs/inode.c:896
Call Trace:
 evict+0x532/0x950 fs/inode.c:704
 dispose_list fs/inode.c:747 [inline]
 evict_inodes+0x5f9/0x690 fs/inode.c:797
 generic_shutdown_super+0x9d/0x2d0 fs/super.c:627
 kill_block_super+0x44/0x90 fs/super.c:1696
 kill_f2fs_super+0x344/0x690 fs/f2fs/super.c:4898
 deactivate_locked_super+0xc4/0x130 fs/super.c:473
 cleanup_mnt+0x41f/0x4b0 fs/namespace.c:1373
 task_work_run+0x24f/0x310 kernel/task_work.c:228
 ptrace_notify+0x2d2/0x380 kernel/signal.c:2402
 ptrace_report_syscall include/linux/ptrace.h:415 [inline]
 ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline]
 syscall_exit_work+0xc6/0x190 kernel/entry/common.c:173
 syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline]
 syscall_exit_to_user_mode+0x279/0x370 kernel/entry/common.c:218
 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0010:f2fs_evict_inode+0x1598/0x15c0 fs/f2fs/inode.c:896

Online repair of a corrupted directory in f2fs_lookup() can generate dirty data/meta while racing with a read-only remount. This may leave a dirty inode behind after the filesystem becomes read-only; checkpoint() skips flushing dirty inodes in read-only mode, which results in the panic above.

Let's get rid of online repair in f2fs_lookup() and leave the work to fsck.f2fs.

Fixes: 510022a85839 ("f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries")
Reported-by: syzbot+ebea2790904673d7c618@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000a7b20f061ff2d56a@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-11  f2fs: prevent atomic file from being dirtied before commit  Daeho Jeong
Keep atomic file clean while updating and make it dirtied during commit in order to avoid unnecessary and excessive inode updates in the previous fix. Fixes: 4bf78322346f ("f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag") Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: get rid of page->index  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert read_node_page() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert __write_node_page() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_write_data_page() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_do_write_data_page() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_set_compressed_page() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_write_end() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_write_begin() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_submit_page_read() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_handle_page_eio() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_read_multi_pages() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert __f2fs_write_meta_page() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_do_write_meta_page() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_write_single_data_page() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_write_inline_data() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_clear_page_cache_dirty_tag() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_vm_page_mkwrite() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06  f2fs: convert f2fs_compress_ctx_add_page() to use folio  Chao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21  f2fs: Use sysfs_emit_at() to simplify code  Christophe JAILLET
This file already uses sysfs_emit(). So be consistent and also use sysfs_emit_at(). This slightly simplifies the code and makes it more readable. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
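The offset-based pattern that this commit adopts can be illustrated with a user-space sketch. Note that `emit_at()`, `show_values()` and `BUF_SIZE` below are hypothetical stand-ins written for this example; the real sysfs_emit_at() is a kernel helper operating on a PAGE_SIZE sysfs buffer, and only the calling pattern (accumulate the running length and pass it back in as the offset) is meant to carry over.

```c
#include <stdarg.h>
#include <stdio.h>

#define BUF_SIZE 64 /* hypothetical; the kernel helper bounds output by PAGE_SIZE */

/*
 * Hypothetical user-space stand-in for an "emit at offset" helper:
 * formats into buf starting at offset `at`, never writes past BUF_SIZE,
 * and returns the number of characters actually stored.
 */
static int emit_at(char *buf, int at, const char *fmt, ...)
{
	va_list args;
	int room, len;

	if (at < 0 || at >= BUF_SIZE)
		return 0;

	room = BUF_SIZE - at;
	va_start(args, fmt);
	len = vsnprintf(buf + at, room, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	/* vsnprintf reports the untruncated length; clamp to what fit. */
	return len >= room ? room - 1 : len;
}

/* Appends each value followed by a space, then a newline, tracking the offset. */
static int show_values(char *buf, const int *vals, int n)
{
	int len = 0;

	for (int i = 0; i < n; i++)
		len += emit_at(buf, len, "%d ", vals[i]);
	len += emit_at(buf, len, "\n");
	return len;
}
```

The caller never computes remaining space by hand; the helper owns the bounds check, which is the readability gain the commit message refers to.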
2024-08-21  f2fs: atomic: fix to forbid dio in atomic_file  Chao Yu
atomic write can only be used via buffered IO, let's fail direct IO on atomic_file and return -EOPNOTSUPP. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21  f2fs: compress: don't redirty sparse cluster during {,de}compress  Yeongjin Gil
In f2fs_do_write_data_page, when the data block is NULL_ADDR, it skips writepage considering that it has been already truncated. This results in an infinite loop as the PAGECACHE_TAG_TOWRITE tag is not cleared during the writeback process for a compressed file including NULL_ADDR in compress_mode=user.

This is the reproduction process:
1. dd if=/dev/zero bs=4096 count=1024 seek=1024 of=testfile
2. f2fs_io compress testfile
3. dd if=/dev/zero bs=4096 count=1 conv=notrunc of=testfile
4. f2fs_io decompress testfile

To prevent the problem, let's check whether the cluster is fully allocated before redirtying its pages.

Fixes: 5fdb322ff2c2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE")
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com>
Tested-by: Jaewook Kim <jw5454.kim@samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
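The "fully allocated" check described in this commit can be modelled in a few lines. This is a simplified sketch, not the patch itself: `SKETCH_NULL_ADDR`, `cluster_fully_allocated()` and `should_redirty_cluster()` are hypothetical names, and a cluster is represented as a plain array of block addresses in which 0 is assumed to mean "no block allocated".

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption for this sketch: 0 marks a hole (no block allocated). */
#define SKETCH_NULL_ADDR 0u

/* Returns true only if every block slot of the cluster is allocated. */
static bool cluster_fully_allocated(const unsigned int *blkaddrs,
				    size_t cluster_size)
{
	for (size_t i = 0; i < cluster_size; i++)
		if (blkaddrs[i] == SKETCH_NULL_ADDR)
			return false;
	return true;
}

/*
 * Sparse clusters are left alone: redirtying pages whose block is a hole
 * would queue writeback that is then skipped, so it could never finish.
 */
static bool should_redirty_cluster(const unsigned int *blkaddrs,
				   size_t cluster_size)
{
	return cluster_fully_allocated(blkaddrs, cluster_size);
}
```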
2024-08-21  f2fs: check discard support for conventional zones  Shin'ichiro Kawasaki
As the helper function f2fs_bdev_support_discard() shows, f2fs checks if the target block devices support discard by calling bdev_max_discard_sectors() and bdev_is_zoned(). This check works well for most cases, but it does not work for conventional zones on zoned block devices. F2fs assumes that zoned block devices support discard, and calls __submit_discard_cmd(). When __submit_discard_cmd() is called for sequential write required zones, it works fine since __submit_discard_cmd() issues zone reset commands instead of discard commands. However, when __submit_discard_cmd() is called for conventional zones, __blkdev_issue_discard() is called even when the devices do not support discard. The inappropriate __blkdev_issue_discard() call was not a problem before the commit 30f1e7241422 ("block: move discard checks into the ioctl handler") because __blkdev_issue_discard() checked if the target devices support discard or not. If not, it returned EOPNOTSUPP. After the commit, __blkdev_issue_discard() no longer checks it. It always returns zero and sets NULL to the given bio pointer. This NULL pointer triggers f2fs_bug_on() in __submit_discard_cmd(). The BUG is recreated with the commands below at the umount step, where /dev/nullb0 is a zoned null_blk with 5GB total size, 128MB zone size and 10 conventional zones. $ mkfs.f2fs -f -m /dev/nullb0 $ mount /dev/nullb0 /mnt $ for ((i=0;i<5;i++)); do dd if=/dev/zero of=/mnt/test bs=65536 count=1600 conv=fsync; done $ umount /mnt To fix the BUG, avoid the inappropriate __blkdev_issue_discard() call. When discard is requested for conventional zones, check if the device supports discard or not. If not, return EOPNOTSUPP. 
Fixes: 30f1e7241422 ("block: move discard checks into the ioctl handler") Cc: stable@vger.kernel.org Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
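The decision this fix describes can be summarised as a small user-space model. The enum names and `choose_discard_action()` are hypothetical and exist only for this sketch; `max_discard_sectors` stands in for the value the kernel obtains from bdev_max_discard_sectors(), with 0 taken to mean that the device does not support discard.

```c
#include <errno.h>

/* Hypothetical model types for this sketch. */
enum zone_kind { ZONE_SEQ_WRITE_REQUIRED, ZONE_CONVENTIONAL };
enum discard_action { ACT_ZONE_RESET, ACT_DISCARD };

/*
 * Sequential-write-required zones are handled with a zone reset.
 * Conventional zones need a real discard, so the device capability
 * must be checked first; without it the request is refused.
 */
static int choose_discard_action(enum zone_kind kind,
				 unsigned int max_discard_sectors,
				 enum discard_action *out)
{
	if (kind == ZONE_SEQ_WRITE_REQUIRED) {
		*out = ACT_ZONE_RESET;
		return 0;
	}
	if (max_discard_sectors == 0)
		return -EOPNOTSUPP;

	*out = ACT_DISCARD;
	return 0;
}
```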
2024-08-21  f2fs: fix to avoid use-after-free in f2fs_stop_gc_thread()  Chao Yu
syzbot reports a f2fs bug as below: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_report+0xe8/0x550 mm/kasan/report.c:491 kasan_report+0x143/0x180 mm/kasan/report.c:601 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:252 [inline] __refcount_add include/linux/refcount.h:184 [inline] __refcount_inc include/linux/refcount.h:241 [inline] refcount_inc include/linux/refcount.h:258 [inline] get_task_struct include/linux/sched/task.h:118 [inline] kthread_stop+0xca/0x630 kernel/kthread.c:704 f2fs_stop_gc_thread+0x65/0xb0 fs/f2fs/gc.c:210 f2fs_do_shutdown+0x192/0x540 fs/f2fs/file.c:2283 f2fs_ioc_shutdown fs/f2fs/file.c:2325 [inline] __f2fs_ioctl+0x443a/0xbe60 fs/f2fs/file.c:4325 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is below race condition, it may cause use-after-free issue in sbi->gc_th pointer. - remount - f2fs_remount - f2fs_stop_gc_thread - kfree(gc_th) - f2fs_ioc_shutdown - f2fs_do_shutdown - f2fs_stop_gc_thread - kthread_stop(gc_th->f2fs_gc_task) : sbi->gc_thread = NULL; We will call f2fs_do_shutdown() in two paths: - for f2fs_ioc_shutdown() path, we should grab sb->s_umount semaphore for fixing. - for f2fs_shutdown() path, it's safe since caller has already grabbed sb->s_umount semaphore. Reported-by: syzbot+1a8e2b31f2ac9bd3d148@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/0000000000005c7ccb061e032b9b@google.com Fixes: 7950e9ac638e ("f2fs: stop gc/discard thread after fs shutdown") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21  f2fs: atomic: fix to truncate pagecache before on-disk metadata truncation  Chao Yu
We should always truncate pagecache while truncating on-disk data. Fixes: a46bebd502fe ("f2fs: synchronize atomic write aborts") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21  f2fs: fix to wait page writeback before setting gcing flag  Chao Yu
Soft IRQ Thread - f2fs_write_end_io - f2fs_defragment_range - set_page_private_gcing - type = WB_DATA_TYPE(page, false); : assign type w/ F2FS_WB_CP_DATA due to page_private_gcing() is true - dec_page_count() w/ wrong type - end_page_writeback() Value of F2FS_WB_CP_DATA reference count may become negative under above race condition, the root cause is we missed to wait page writeback before setting gcing page private flag, let's fix it. Fixes: 2d1fe8a86bf5 ("f2fs: fix to tag gcing flag on page during file defragment") Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21  f2fs: Create COW inode from parent dentry for atomic write  Yeongjin Gil
The i_pino in f2fs_inode_info has the previous parent's i_ino when inode was renamed, which may cause f2fs_ioc_start_atomic_write to fail. If file_wrong_pino is true and i_nlink is 1, then to find a valid pino, we should refer to the dentry from inode. To resolve this issue, let's get parent inode using parent dentry directly. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com> Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com> Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21  f2fs: Require FMODE_WRITE for atomic write ioctls  Jann Horn
The F2FS ioctls for starting and committing atomic writes check for inode_owner_or_capable(), but this does not give LSMs like SELinux or Landlock an opportunity to deny the write access - if the caller's FSUID matches the inode's UID, inode_owner_or_capable() immediately returns true. There are scenarios where LSMs want to deny a process the ability to write particular files, even files that the FSUID of the process owns; but this can currently partially be bypassed using atomic write ioctls in two ways: - F2FS_IOC_START_ATOMIC_REPLACE + F2FS_IOC_COMMIT_ATOMIC_WRITE can truncate an inode to size 0 - F2FS_IOC_START_ATOMIC_WRITE + F2FS_IOC_ABORT_ATOMIC_WRITE can revert changes another process concurrently made to a file Fix it by requiring FMODE_WRITE for these operations, just like for F2FS_IOC_MOVE_RANGE. Since any legitimate caller should only be using these ioctls when intending to write into the file, that seems unlikely to break anything. Fixes: 88b88a667971 ("f2fs: support atomic writes") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
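The difference between the two checks can be shown with a deliberately simplified user-space model. Everything here is hypothetical: `struct model_file`, the `MODEL_FMODE_*` flags and both functions are invented for illustration, the ownership test ignores capabilities, and the `-EBADF` return value is an assumption of this sketch rather than something stated in the commit message.

```c
#include <errno.h>

/* Hypothetical, simplified stand-ins for kernel types and flags. */
#define MODEL_FMODE_READ  0x1u
#define MODEL_FMODE_WRITE 0x2u

struct model_file {
	unsigned int f_mode;    /* how this descriptor was opened */
	unsigned int owner_uid; /* owner of the underlying inode */
};

/* Old behaviour in this model: ownership alone is sufficient. */
static int start_atomic_write_old(const struct model_file *f,
				  unsigned int fsuid)
{
	if (f->owner_uid != fsuid)
		return -EACCES;
	return 0;
}

/*
 * New behaviour in this model: the descriptor must have been opened
 * for writing, so a policy that denied the writable open also blocks
 * the ioctl. Error code is an assumption of the sketch.
 */
static int start_atomic_write_new(const struct model_file *f,
				  unsigned int fsuid)
{
	if (!(f->f_mode & MODEL_FMODE_WRITE))
		return -EBADF;
	if (f->owner_uid != fsuid)
		return -EACCES;
	return 0;
}
```

A file opened read-only by its owner passes the old check but fails the new one, which is the gap the commit closes.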
2024-08-21  f2fs: clean up val{>>,<<}F2FS_BLKSIZE_BITS  Zhiguo Niu
Use F2FS_BYTES_TO_BLK(bytes) and F2FS_BLK_TO_BYTES(blk) for cleanup Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
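As a rough illustration of this cleanup, the sketch below contrasts an open-coded shift with named conversion macros. The `SKETCH_*` macros are stand-ins written for this example, and the shift of 12 bits (4 KiB blocks) is an assumption; the kernel's own definitions may differ in type and value.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB blocks, i.e. a shift of 12 bits. */
#define SKETCH_BLKSIZE_BITS 12

/* Named conversions replace scattered `val >> BITS` / `val << BITS`. */
#define SKETCH_BYTES_TO_BLK(bytes) ((uint64_t)(bytes) >> SKETCH_BLKSIZE_BITS)
#define SKETCH_BLK_TO_BYTES(blk)   ((uint64_t)(blk) << SKETCH_BLKSIZE_BITS)

/* Before: the intent is hidden in the shift direction. */
static uint64_t blocks_open_coded(uint64_t bytes)
{
	return bytes >> SKETCH_BLKSIZE_BITS;
}

/* After: the call site states which unit it converts to. */
static uint64_t blocks_with_macro(uint64_t bytes)
{
	return SKETCH_BYTES_TO_BLK(bytes);
}
```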
2024-08-15  f2fs: fix to use per-inode maxbytes and cleanup  Zhiguo Niu
This is a supplement to commit 6d1451bf7f84 ("f2fs: fix to use per-inode maxbytes") for some missed cases, also cleanup redundant code in f2fs_llseek. Cc: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15  f2fs: use f2fs_get_node_page when write inline data  Zijie Wang
We only need the inode page when writing inline data, so use f2fs_get_node_page() to get it instead of going through dnode_of_data, which eliminates an unnecessary struct. Signed-off-by: Zijie Wang <wangzijie1@honor.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15  f2fs: sysfs: support atgc_enabled  liujinbao1
When we add "atgc" to the fstab table, ATGC is not immediately enabled. There is a 7-day time threshold, and we can use "atgc_enabled" to show whether ATGC is enabled. Signed-off-by: liujinbao1 <liujinbao1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15  Revert "f2fs: use flush command instead of FUA for zoned device"  Wenjie Cheng
This reverts commit c550e25bca660ed2554cbb48d32b82d0bb98e4b1. Commit c550e25bca660ed2554cbb48d32b82d0bb98e4b1 ("f2fs: use flush command instead of FUA for zoned device") used additional flush command to keep write order. Since Commit dd291d77cc90eb6a86e9860ba8e6e38eebd57d12 ("block: Introduce zone write plugging") has enabled the block layer to handle this order issue, there is no need to use flush command. Signed-off-by: Wenjie Cheng <cwjhust@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15  f2fs: get rid of buffer_head use  Chao Yu
Convert to use folio and related functionality. Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15  f2fs: fix to avoid racing in between read and OPU dio write  Chao Yu
If lfs mode is on, a buffered read may race with an OPU dio write as shown below, which may cause the buffered read to hit unwritten data unexpectedly. The same race condition exists for dio read.

(steps in time order, grouped by thread)

Thread A:
- f2fs_file_write_iter
- f2fs_dio_write_iter
- __iomap_dio_rw
- f2fs_iomap_begin
- f2fs_map_blocks
- __allocate_data_block
- allocated blkaddr #x
- iomap_dio_submit_bio

Thread B:
- f2fs_file_read_iter
- filemap_read
- f2fs_read_data_folio
- f2fs_mpage_readpages
- f2fs_map_blocks
  : get blkaddr #x
- f2fs_submit_read_bio

Thread B, IRQ:
- f2fs_read_end_io
  : read IO on blkaddr #x complete

Thread A, IRQ:
- iomap_dio_bio_end_io
  : direct write IO on blkaddr #x complete

In LFS mode, if there is inflight dio, let's wait for its completion. This policy won't cover all race cases, but it is a tradeoff that avoids abusing locks around IO paths.

Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15  f2fs: fix to wait dio completion  Chao Yu
We should wait for all existing dio write IOs to complete before block removal; otherwise, a previous direct write IO may overwrite data in a block that has since been reused by another inode. Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15  f2fs: reduce expensive checkpoint trigger frequency  Chao Yu
We may trigger checkpoints at high frequency in the case below:
1. mkdir /mnt/dir1; set dir1 encrypted
2. touch /mnt/file1; fsync /mnt/file1
3. mkdir /mnt/dir2; set dir2 encrypted
4. touch /mnt/file2; fsync /mnt/file2
...

Although the newly created dirs and files are not related, due to commit bbf156f7afa7 ("f2fs: fix lost xattrs of directories"), we trigger a checkpoint whenever fsync() comes after a new encrypted dir is created.

To avoid this performance regression, let's record an entry including the directory's ino in a global cache whenever we update the directory's xattr data, and then trigger checkpoint() only if the xattr metadata of the target file's parent was updated.

This patch also covers the no-encryption case below:
1) parent is checkpointed
2) set_xattr(dir) w/ new xnid
3) create(file)
4) fsync(file)

Fixes: bbf156f7afa7 ("f2fs: fix lost xattrs of directories")
Reported-by: wangzijie <wangzijie1@honor.com>
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reported-by: Yunlei He <heyunlei@hihonor.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
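The bookkeeping idea in this commit (remember which directories had their xattr data updated, and checkpoint on fsync only when the file's parent is among them) can be sketched in user space as follows. All names here are hypothetical, the fixed-size array is chosen only to keep the example short, and clearing the cache after a checkpoint is an assumption of this sketch rather than something the commit message states.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED_DIRS 8 /* hypothetical capacity for this sketch */

struct xattr_dir_cache {
	unsigned long inos[MAX_TRACKED_DIRS];
	size_t count;
	bool overflowed; /* set when an update could not be recorded */
};

static bool cache_contains(const struct xattr_dir_cache *c, unsigned long ino)
{
	for (size_t i = 0; i < c->count; i++)
		if (c->inos[i] == ino)
			return true;
	return false;
}

/* Called whenever a directory's xattr data is updated. */
static void record_xattr_update(struct xattr_dir_cache *c,
				unsigned long dir_ino)
{
	if (cache_contains(c, dir_ino))
		return;
	if (c->count == MAX_TRACKED_DIRS) {
		c->overflowed = true; /* stay conservative from now on */
		return;
	}
	c->inos[c->count++] = dir_ino;
}

/* fsync only needs a checkpoint if the file's own parent was touched. */
static bool fsync_needs_checkpoint(const struct xattr_dir_cache *c,
				   unsigned long parent_ino)
{
	return c->overflowed || cache_contains(c, parent_ino);
}

/* Assumption: a completed checkpoint makes the recorded entries stale. */
static void checkpoint_done(struct xattr_dir_cache *c)
{
	c->count = 0;
	c->overflowed = false;
}
```

With this shape, creating an unrelated encrypted directory no longer forces a checkpoint for an fsync() on a file whose parent was left untouched.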
2024-08-05  f2fs: atomic: fix to avoid racing w/ GC  Chao Yu
Case #1 (steps in time order, grouped by thread):

SQLite App:
- f2fs_ioc_start_atomic_write
- f2fs_ioc_commit_atomic_write
- f2fs_commit_atomic_write
- filemap_write_and_wait_range
  : write atomic_file's data to cow_inode

Shrinker:
  echo 3 > drop_caches to drop atomic_file's cache.

GC Thread:
- f2fs_gc
- gc_data_segment
- move_data_page
- set_page_dirty

Kworker:
- writepages
- f2fs_do_write_data_page
  : overwrite atomic_file's data to cow_inode

SQLite App:
- f2fs_down_write(&fi->i_gc_rwsem[WRITE])
- __f2fs_commit_atomic_write
- f2fs_up_write(&fi->i_gc_rwsem[WRITE])

Case #2 (steps in time order, grouped by thread):

SQLite App:
- f2fs_ioc_start_atomic_write

Kworker:
- __writeback_single_inode
- do_writepages
- f2fs_write_cache_pages
- f2fs_write_single_data_page
- f2fs_do_write_data_page
  : write atomic_file's data to cow_inode

GC Thread:
- f2fs_gc
- gc_data_segment
- move_data_page
- set_page_dirty

Kworker:
- writepages
- f2fs_do_write_data_page
  : overwrite atomic_file's data to cow_inode

SQLite App:
- f2fs_ioc_commit_atomic_write

In the above cases of racing between atomic_write and GC, previous data in atomic_file may be overwritten to cow_file, resulting in data corruption.

This patch introduces the PAGE_PRIVATE_ATOMIC_WRITE bit flag in page.private and uses it to indicate that the page holds the latest dirty data of the atomic file, which should be written back into cow_file. If the flag is not tagged in a page, we should never write data across files.

Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05  f2fs: fix macro definition stat_inc_cp_count  Julian Sun
The macro stat_inc_cp_count accepts a parameter si, but it was not used, rather the variable sbi was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05  f2fs: fix macro definition on_f2fs_build_free_nids  Julian Sun
The macro on_f2fs_build_free_nids accepts a parameter nmi, but it was not used, rather the variable nm_i was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
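Both macro fixes above address the same class of bug: a macro that accepts a parameter but silently uses a variable captured from the caller's scope. A minimal sketch of that pattern follows; `struct stat_info`, `inc_count_buggy()` and `inc_count_fixed()` are hypothetical names made up for this example and are not the f2fs definitions.

```c
struct stat_info {
	int cp_count;
};

/* Buggy shape: takes `si` but touches whatever `sbi_stat` is in scope. */
#define inc_count_buggy(si) ((sbi_stat)->cp_count++)

/* Fixed shape: operates on its own parameter. */
#define inc_count_fixed(si) ((si)->cp_count++)

/* The caller believes it is updating `wanted`. */
static int demo_buggy(struct stat_info *wanted, struct stat_info *sbi_stat)
{
	inc_count_buggy(wanted); /* actually increments sbi_stat */
	return wanted->cp_count;
}

static int demo_fixed(struct stat_info *wanted, struct stat_info *sbi_stat)
{
	(void)sbi_stat; /* no longer involved */
	inc_count_fixed(wanted);
	return wanted->cp_count;
}
```

The buggy form compiles only because a variable with the expected name happens to exist at the call site, which is why such macros tend to go unnoticed until they are used somewhere with different local names.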
2024-08-05  f2fs: add write priority option based on zone UFS  Liao Yuanhong
Currently, we are using a mix of traditional UFS and zone UFS to support some functionalities that cannot be achieved on zone UFS alone. However, there are some issues with this approach. There exists a significant performance difference between traditional UFS and zone UFS. Under normal usage, we prioritize writes to zone UFS. However, in critical conditions (such as when the entire UFS is almost full), we cannot determine whether data will be written to traditional UFS or zone UFS. This can lead to significant performance fluctuations, which is not conducive to development and testing.

To address this, we have added an option zlu_io_enable under sys with the following three modes:
1) zlu_io_enable == 0: Normal mode, prioritize writing to zone UFS;
2) zlu_io_enable == 1: Zone UFS only mode, only allow writing to zone UFS;
3) zlu_io_enable == 2: Traditional UFS priority mode, prioritize writing to traditional UFS.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Wu Bo <bo.wu@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
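One way to read the three zlu_io_enable modes is as a selection policy, sketched below. This is an interpretation, not the patch: the enum names and `pick_target()` are hypothetical, and the fallback behaviour when the preferred device has no space is an assumption of this sketch, since the commit message only says which device each mode prioritizes.

```c
#include <stdbool.h>

/* Mode values follow the commit message; names are hypothetical. */
enum zlu_mode {
	ZLU_NORMAL = 0,            /* prioritize zone UFS */
	ZLU_ZONE_ONLY = 1,         /* only allow zone UFS */
	ZLU_TRADITIONAL_FIRST = 2, /* prioritize traditional UFS */
};

enum write_target { TARGET_NONE, TARGET_ZONE, TARGET_TRADITIONAL };

/* Assumed fallback: "prioritize" means try the other device second. */
static enum write_target pick_target(enum zlu_mode mode,
				     bool zone_has_space,
				     bool trad_has_space)
{
	switch (mode) {
	case ZLU_ZONE_ONLY:
		return zone_has_space ? TARGET_ZONE : TARGET_NONE;
	case ZLU_TRADITIONAL_FIRST:
		if (trad_has_space)
			return TARGET_TRADITIONAL;
		return zone_has_space ? TARGET_ZONE : TARGET_NONE;
	case ZLU_NORMAL:
	default:
		if (zone_has_space)
			return TARGET_ZONE;
		return trad_has_space ? TARGET_TRADITIONAL : TARGET_NONE;
	}
}
```

Pinning the mode makes the target device predictable even when space is tight, which is the testing benefit the commit message describes.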
2024-08-05  f2fs: avoid potential int overflow in sanity_check_area_boundary()  Nikita Zhandarovich
While calculating the end addresses of main area and segment 0, u32 may be not enough to hold the result without the danger of int overflow. Just in case, play it safe and cast one of the operands to a wider type (u64). Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: fd694733d523 ("f2fs: cover large section in sanity check of super") Cc: stable@vger.kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
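The overflow this commit guards against is easy to reproduce in user space. In the sketch below the function and parameter names are illustrative only (they are not taken from the kernel source), and `uint32_t`/`uint64_t` stand in for the kernel's u32/u64.

```c
#include <stdint.h>

/* All arithmetic stays in 32 bits, so the result wraps modulo 2^32. */
static uint64_t end_addr_narrow(uint32_t start_blkaddr,
				uint32_t segment_count,
				uint32_t log_blocks_per_seg)
{
	uint32_t span = segment_count << log_blocks_per_seg;
	uint32_t end = start_blkaddr + span;

	return end;
}

/* Widening one operand first makes the whole expression 64-bit. */
static uint64_t end_addr_wide(uint32_t start_blkaddr,
			      uint32_t segment_count,
			      uint32_t log_blocks_per_seg)
{
	return (uint64_t)start_blkaddr +
	       ((uint64_t)segment_count << log_blocks_per_seg);
}
```

For example, with a start address of 0x1000, 2^23 segments and 2^9 blocks per segment, the span is exactly 2^32: the narrow version loses it entirely, while the wide version keeps the full value.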
2024-08-05  f2fs: fix several potential integer overflows in file offsets  Nikita Zhandarovich
When dealing with large extents and calculating file offsets by summing up according extent offsets and lengths of unsigned int type, one may encounter possible integer overflow if the values are big enough. Prevent this from happening by expanding one of the addends to (pgoff_t) type. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: d323d005ac4a ("f2fs: support file defragment") Cc: stable@vger.kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05  f2fs: prevent possible int overflow in dir_block_index()  Nikita Zhandarovich
The result of multiplication between values derived from functions dir_buckets() and bucket_blocks() *could* technically reach 2^30 * 2^2 = 2^32. While unlikely to happen, it is prudent to ensure that it will not lead to integer overflow. Thus, use mul_u32_u32() as it's more appropriate to mitigate the issue. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: 3843154598a0 ("f2fs: introduce large directory support") Cc: stable@vger.kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
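The 2^30 * 2^2 = 2^32 case mentioned in this commit can be checked with a short user-space sketch. `mul_u32_u32_sketch()` is a hypothetical stand-in written for this example to mimic a widening 32x32 to 64-bit multiply; it is not the kernel helper itself, and the other function names are illustrative.

```c
#include <stdint.h>

/* User-space stand-in for a widening 32x32 -> 64 multiply. */
static inline uint64_t mul_u32_u32_sketch(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}

/* Product is computed in 32 bits and wraps before being widened. */
static uint64_t blocks_narrow(uint32_t buckets, uint32_t blocks_per_bucket)
{
	uint32_t product = buckets * blocks_per_bucket;

	return product;
}

/* Product is computed in 64 bits from the start. */
static uint64_t blocks_wide(uint32_t buckets, uint32_t blocks_per_bucket)
{
	return mul_u32_u32_sketch(buckets, blocks_per_bucket);
}
```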
2024-08-05  f2fs: clean up data_blkaddr() and get_dnode_addr()  Chao Yu
Introduce a new helper get_dnode_base() to wrap common code from get_dnode_addr() and data_blkaddr() for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05  Merge tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab  Linus Torvalds
Pull slab fix from Vlastimil Babka:

"Since v6.8 we've had a subtle breakage in SLUB with KFENCE enabled, that can cause a crash. It hasn't been found earlier due to quite specific conditions necessary (OOM during kmem_cache_alloc_bulk())"

* tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm, slub: do not call do_slab_free for kfence object
2024-08-04  Linux 6.11-rc2 (tag: v6.11-rc2)  Linus Torvalds
2024-08-04  profiling: remove profile=sleep support  Tetsuo Handa
The kernel sleep profile is no longer working due to a recursive locking bug introduced by commit 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked") Booting with the 'profile=sleep' kernel command line option added or executing # echo -n sleep > /sys/kernel/profiling after boot causes the system to lock up. Lockdep reports kthreadd/3 is trying to acquire lock: ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70 but task is already holding lock: ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370 with the call trace being lock_acquire+0xc8/0x2f0 get_wchan+0x32/0x70 __update_stats_enqueue_sleeper+0x151/0x430 enqueue_entity+0x4b0/0x520 enqueue_task_fair+0x92/0x6b0 ttwu_do_activate+0x73/0x140 try_to_wake_up+0x213/0x370 swake_up_locked+0x20/0x50 complete+0x2f/0x40 kthread+0xfb/0x180 However, since nobody noticed this regression for more than two years, let's remove 'profile=sleep' support based on the assumption that nobody needs this functionality. Fixes: 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked") Cc: stable@vger.kernel.org # v5.16+ Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-08-04  Merge tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  Linus Torvalds
Pull x86 fixes from Thomas Gleixner:

- Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver.

  A recent change in the ACPI code which consolidated code paths moved the invocation of init_freq_invariance_cppc() to a CPU hotplug handler. The first invocation on AMD CPUs ends up enabling a static branch, which deadlocks because the static branch enable tries to acquire cpu_hotplug_lock but that lock is already write-held by the hotplug machinery.

  Use static_branch_enable_cpuslocked() instead and take the hotplug lock for read in the Intel code path, which is invoked from the architecture code outside of the CPU hotplug operations.

- Fix the number of reserved bits in the sev_config structure bit field so that the bitfield does not exceed 64 bits.

- Add missing Zen5 model numbers

- Fix the alignment assumptions of pti_clone_pgtable() and clone_entry_text() on 32-bit:

  The code assumes PMD aligned code sections, but on 32-bit the kernel entry text is not PMD aligned. So depending on the code size and location, which is configuration and compiler dependent, entry text can cross a PMD boundary. As the start is not PMD aligned, adding PMD size to the start address yields an address larger than the end address, which results in partially mapped entry code for user space. That causes endless recursion on the first entry from userspace (usually #PF).

  Cure this by aligning the start address in the addition so it ends up at the next PMD start address.

  clone_entry_text() enforces PMD mapping, but on 32-bit the tail might eventually be PTE mapped, which causes a map fail because the PMD for the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for the clone() invocation which resolves to PTE on 32-bit and PMD on 64-bit.

- Zero the 8-byte case for get_user() on range check failure on 32-bit

  The recent consolidation of the 8-byte get_user() case broke the zeroing in the failure case again. Establish it by clearing ECX before the range check and not afterwards as that obviously can't be reached when the range check fails

* tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
  x86/mm: Fix pti_clone_entry_text() for i386
  x86/mm: Fix pti_clone_pgtable() alignment assumption
  x86/setup: Parse the builtin command line before merging
  x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
  x86/sev: Fix __reserved field in sev_config
  x86/aperfmperf: Fix deadlock on cpu_hotplug_lock
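The alignment problem described for pti_clone_pgtable() in the pull message above comes down to how the loop address is advanced. The sketch below models only that arithmetic: `SKETCH_PMD_SIZE` is assumed to be 2 MiB for illustration (the real value depends on the paging mode), and all function names are hypothetical.

```c
#include <stdint.h>

/* Assumption for this sketch: a PMD maps 2 MiB. */
#define SKETCH_PMD_SIZE 0x200000u

/* Rounds x up to a multiple of align; align must be a power of two. */
static uint32_t round_up_pow2(uint32_t x, uint32_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Naive step: from an unaligned start this can jump past `end`. */
static uint32_t next_addr_naive(uint32_t addr)
{
	return addr + SKETCH_PMD_SIZE;
}

/* Aligned step: always lands on the start of the next PMD. */
static uint32_t next_addr_aligned(uint32_t addr)
{
	return round_up_pow2(addr + 1, SKETCH_PMD_SIZE);
}
```

Take a range from 0x300000 to 0x500000, which crosses the PMD boundary at 0x400000. The naive step goes from 0x300000 straight to 0x500000, so a loop of the form `while (addr < end)` stops without visiting the PMD that starts at 0x400000. The aligned step lands on 0x400000 first, so that PMD is covered.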