Age | Commit message (Collapse) | Author |
|
Syzbot reports a f2fs bug as below:
INFO: task syz-executor328:5856 blocked for more than 144 seconds.
Not tainted 6.15.0-rc6-syzkaller-00208-g3c21441eeffc #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor328 state:D stack:24392 pid:5856 tgid:5832 ppid:5826 task_flags:0x400040 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5382 [inline]
__schedule+0x168f/0x4c70 kernel/sched/core.c:6767
__schedule_loop kernel/sched/core.c:6845 [inline]
schedule+0x165/0x360 kernel/sched/core.c:6860
io_schedule+0x81/0xe0 kernel/sched/core.c:7742
f2fs_balance_fs+0x4b4/0x780 fs/f2fs/segment.c:444
f2fs_map_blocks+0x3af1/0x43b0 fs/f2fs/data.c:1791
f2fs_expand_inode_data+0x653/0xaf0 fs/f2fs/file.c:1872
f2fs_fallocate+0x4f5/0x990 fs/f2fs/file.c:1975
vfs_fallocate+0x6a0/0x830 fs/open.c:338
ioctl_preallocate fs/ioctl.c:290 [inline]
file_ioctl fs/ioctl.c:-1 [inline]
do_vfs_ioctl+0x1b8f/0x1eb0 fs/ioctl.c:885
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0x82/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The root cause is after commit 84b5bb8bf0f6 ("f2fs: modify
f2fs_is_checkpoint_ready logic to allow more data to be written with the
CP disable"), we will get chance to allow f2fs_is_checkpoint_ready() to
return true once below conditions are all true:
1. checkpoint is disabled
2. there are not enough free segments
3. there are enough free blocks
Then it will cause f2fs_balance_fs() to trigger foreground GC.
void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
...
if (!f2fs_is_checkpoint_ready(sbi))
return;
And the testcase mounts f2fs image w/ gc_merge,checkpoint=disable, so deadloop
will happen through below race condition:
- f2fs_do_shutdown - vfs_fallocate - gc_thread_func
- file_start_write
- __sb_start_write(SB_FREEZE_WRITE)
- f2fs_fallocate
- f2fs_expand_inode_data
- f2fs_map_blocks
- f2fs_balance_fs
- prepare_to_wait
- wake_up(gc_wait_queue_head)
- io_schedule
- bdev_freeze
- freeze_super
- sb->s_writers.frozen = SB_FREEZE_WRITE;
- sb_wait_write(sb, SB_FREEZE_WRITE);
- if (sbi->sb->s_writers.frozen >= SB_FREEZE_WRITE) continue;
: cause deadloop
This patch fix to add check condition in f2fs_balance_fs(), so that if
checkpoint is disabled, we will just skip trigger foreground GC to
avoid such deadloop issue.
Meanwhile let's remove f2fs_is_checkpoint_ready() check condition in
f2fs_balance_fs(), since it's redundant, due to the main logic in the
function is to check:
a) whether checkpoint is disabled
b) there is enough free segments
f2fs_balance_fs() still has all logics after f2fs_is_checkpoint_ready()'s
removal.
Reported-by: syzbot+aa5bb5f6860e08a60450@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/682d743a.a00a0220.29bc26.0289.GAE@google.com
Fixes: 84b5bb8bf0f6 ("f2fs: modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable")
Cc: Qi Han <hanqi@vivo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
when performing buffered writes in a large section,
overhead is incurred due to the iteration through
ckpt_valid_blocks within the section.
when SEGS_PER_SEC is 128, this overhead accounts for 20% within
the f2fs_write_single_data_page routine.
as the size of the section increases, the overhead also grows.
to handle this problem ckpt_valid_blocks is
added within the section entries.
Test
insmod null_blk.ko nr_devices=1 completion_nsec=1 submit_queues=8
hw_queue_depth=64 max_sectors=512 bs=4096 memory_backed=1
make_f2fs /dev/block/nullb0
make_f2fs -s 128 /dev/block/nullb0
fio --bs=512k --size=1536M --rw=write --name=1
--filename=/mnt/test_dir/seq_write
--ioengine=io_uring --iodepth=64 --end_fsync=1
before
SEGS_PER_SEC 1
2556MiB/s
SEGS_PER_SEC 128
2145MiB/s
after
SEGS_PER_SEC 1
2556MiB/s
SEGS_PER_SEC 128
2556MiB/s
Signed-off-by: yohan.joung <yohan.joung@sk.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Support to inject a timeout fault into function, currently it only
support to inject timeout to commit_atomic_write flow to reproduce
inconsistent bug, like the bug fixed by commit f098aeba04c9 ("f2fs:
fix to avoid atomicity corruption of atomic file").
By default, the new type fault will inject 1000ms timeout, and the
timeout process can be interrupted by SIGKILL.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
------------[ cut here ]------------
WARNING: CPU: 3 PID: 579 at fs/f2fs/segment.c:2832 new_curseg+0x5e8/0x6dc
pc : new_curseg+0x5e8/0x6dc
Call trace:
new_curseg+0x5e8/0x6dc
f2fs_allocate_data_block+0xa54/0xe28
do_write_page+0x6c/0x194
f2fs_do_write_node_page+0x38/0x78
__write_node_page+0x248/0x6d4
f2fs_sync_node_pages+0x524/0x72c
f2fs_write_checkpoint+0x4bc/0x9b0
__checkpoint_and_complete_reqs+0x80/0x244
issue_checkpoint_thread+0x8c/0xec
kthread+0x114/0x1bc
ret_from_fork+0x10/0x20
get_new_segment() detects inconsistent status in between free_segmap
and free_secmap, let's record such error into super block, and bail
out get_new_segment() instead of continue using the segment.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
A zoned device can has both conventional zones and sequential zones,
so we should not treat first segment of zoned device as first_zoned_segno,
instead, we need to check zone type for each zone during traversing zoned
device to find first_zoned_segno.
Otherwise, for below case, first_zoned_segno will be 0, which could be
wrong.
create_null_blk 512 2 1024 1024
mkfs.f2fs -m /dev/nullb0
Testcase:
export SCRIPTS_PATH=/share/git/scripts
test multiple devices w/ zoned device
for ((i=0;i<8;i++)) do {
zonesize=$((2<<$i))
conzone=$((4096/$zonesize))
seqzone=$((4096/$zonesize))
$SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone
mkfs.f2fs -f -m /dev/vdb -c /dev/nullb0
mount /dev/vdb /mnt/f2fs
touch /mnt/f2fs/file
f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2))
stat /mnt/f2fs/file
df
cat /proc/fs/f2fs/vdb/segment_info
umount /mnt/f2fs
$SCRIPTS_PATH/nullblk_remove.sh 0
} done
test single zoned device
for ((i=0;i<8;i++)) do {
zonesize=$((2<<$i))
conzone=$((4096/$zonesize))
seqzone=$((4096/$zonesize))
$SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone
mkfs.f2fs -f -m /dev/nullb0
mount /dev/nullb0 /mnt/f2fs
touch /mnt/f2fs/file
f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2))
stat /mnt/f2fs/file
df
cat /proc/fs/f2fs/nullb0/segment_info
umount /mnt/f2fs
$SCRIPTS_PATH/nullblk_remove.sh 0
} done
Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert all callers to set/get a folio instead of a page. Removes
five calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
All assignments to this struct member are conversions from a folio
so convert it to be a folio and convert all users. At the same time,
convert data_blkaddr() to take a folio as all callers now have a folio.
Remove eight calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fetch a folio from the pagecache and use it. Removes two calls
to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Get a folio and use it. Saves a call to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert f2fs_get_sum_page() to f2fs_get_sum_folio() and add a
f2fs_get_sum_page() wrapper.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Also convert get_current_nat_page() to get_current_nat_folio().
Removes three hidden calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Get a folio instead of a page. Saves a hidden call to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Get a folio instead of a page. Saves two hidden calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert get_current_sit_page() to get_current_sit_folio() and then
use the folio in build_sit_entries(). Saves a hidden call to
compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Grab a folio instead of a page. Saves four hidden calls to
compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Grab a folio instead of a page. Saves two hidden calls to
compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Grab a folio instead of a page. Saves two hidden calls to
compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Grab a folio instead of a page. Also convert seg_info_to_sit_page() to
seg_info_to_sit_folio() and use a folio in f2fs_flush_sit_entries().
Saves a couple of calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The only caller which passes a page already has a folio, so pass
it in.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This helper returns the inode associated with the f2fs_io_info. That's a
relatively common thing to want, mildly awkward to get and provides one
place to change if we decide to record it directly, or change fio->page
to fio->folio.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[Jaegeuk Kim: fix wrong fio_inode conversion]
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
1. After we start atomic write in a database file, before committing
all data, we'd better not set inode w/ vfs dirty status to avoid
redundant updates, instead, we only set inode w/ atomic dirty status.
2. After we commit all data, before committing metadata, we need to
clear atomic dirty status, and set vfs dirty status to allow vfs flush
dirty inode.
Cc: Daeho Jeong <daehojeong@google.com>
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
During a checkpoint, the current active segment X may not be handled
properly. This occurs when segment X has 0 valid blocks and a non-zero
number of discard blocks, for the following reasons:
locate_dirty_segment() does not mark any active segment as a prefree
segment. As a result, segment X is not included in dirty_segmap[PRE], and
f2fs_clear_prefree_segments() skips it when handling prefree segments.
add_discard_addrs() skips any segment with 0 valid blocks, so segment X is
also skipped.
Consequently, no `struct discard_cmd` is actually created for segment X.
However, the ckpt_valid_map and cur_valid_map of segment X are synced by
seg_info_to_raw_sit() during the current checkpoint process. As a result,
it cannot find the missing discard bits even in subsequent checkpoints.
Consequently, the value of sbi->discard_blks remains non-zero. Thus, when
f2fs is umounted, CP_TRIMMED_FLAG will not be set due to the non-zero
sbi->discard_blks.
Relevant code process:
f2fs_write_checkpoint()
f2fs_flush_sit_entries()
list_for_each_entry_safe(ses, tmp, head, set_list) {
for_each_set_bit_from(segno, bitmap, end) {
...
add_discard_addrs(sbi, cpc, false); // skip segment X due to its 0 valid blocks
...
seg_info_to_raw_sit(); // sync ckpt_valid_map with cur_valid_map for segment X
...
}
}
f2fs_clear_prefree_segments(); // segment X is not included in dirty_segmap[PRE] and is skipped
This issue is easy to reproduce with the following operations:
root # mkfs.f2fs -f /dev/f2fs_dev
root # mount -t f2fs /dev/f2fs_dev /mnt_point
root # dd if=/dev/blk_dev of=/mnt_point/1.bin bs=4k count=256
root # sync
root # rm /mnt_point/1.bin
root # umount /mnt_point
root # dump.f2fs /dev/f2fs_dev | grep "checkpoint state"
Info: checkpoint state = 45 : crc compacted_summary unmount ---- 'trimmed' flag is missing
Since add_discard_addrs() can handle active segments with non-zero valid
blocks, it is reasonable to fix this issue by allowing it to also handle
active segments with 0 valid blocks.
Fixes: b29555505d81 ("f2fs: add key functions for small discards")
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
All its callers now have access to a folio, so pass it in. Removes
an access to page->mapping.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert fio->page to a folio then use it where folio APIs exist.
Removes a reference to page->mapping and a hidden call to
compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert f2fs_wait_on_page_writeback() to f2fs_folio_wait_writeback()
and add a compatibiility wrapper. Replaces five calls to
compound_head() with one.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
commit 4f993264fe29 ("f2fs: introduce discard_unit mount option") introduced
a bug, when we enable discard_unit=section option, it will set
.discard_granularity to BLKS_PER_SEC(), however discard granularity only
supports [1, 512], once section size is not equal to segment size, it will
cause issue_discard_thread() in DPOLICY_BG mode will not select discard entry
w/ any granularity to issue.
Fixes: 4f993264fe29 ("f2fs: introduce discard_unit mount option")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Yohan Joung <yohan.joung@sk.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
syzbot reports a f2fs bug as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2746!
CPU: 0 UID: 0 PID: 5323 Comm: syz.0.0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
RIP: 0010:get_new_segment fs/f2fs/segment.c:2746 [inline]
RIP: 0010:new_curseg+0x1f52/0x1f70 fs/f2fs/segment.c:2876
Call Trace:
<TASK>
__allocate_new_segment+0x1ce/0x940 fs/f2fs/segment.c:3210
f2fs_allocate_new_section fs/f2fs/segment.c:3224 [inline]
f2fs_allocate_pinning_section+0xfa/0x4e0 fs/f2fs/segment.c:3238
f2fs_expand_inode_data+0x696/0xca0 fs/f2fs/file.c:1830
f2fs_fallocate+0x537/0xa10 fs/f2fs/file.c:1940
vfs_fallocate+0x569/0x6e0 fs/open.c:327
do_vfs_ioctl+0x258c/0x2e40 fs/ioctl.c:885
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0x80/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Concurrent pinfile allocation may run out of free section, result in
panic in get_new_segment(), let's expand pin_sem lock coverage to
include f2fs_gc(), so that we can make sure to reclaim enough free
space for following allocation.
In addition, do below changes to enhance error path handling:
- call f2fs_bug_on() only in non-pinfile allocation path in
get_new_segment().
- call reset_curseg_fields() to reset all fields of curseg in
new_curseg()
Fixes: f5a53edcf01e ("f2fs: support aligned pinned file")
Reported-by: syzbot+15669ec8c35ddf6c3d43@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/675cd64e.050a0220.37aaf.00bb.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When testing the atomic write fix patches, the f2fs_bug_on was
triggered as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:935!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 UID: 0 PID: 257 Comm: bash Not tainted 6.13.0-rc1-00033-gc283a70d3497 #5
RIP: 0010:f2fs_evict_inode+0x50f/0x520
Call Trace:
<TASK>
? __die_body+0x65/0xb0
? die+0x9f/0xc0
? do_trap+0xa1/0x170
? f2fs_evict_inode+0x50f/0x520
? f2fs_evict_inode+0x50f/0x520
? handle_invalid_op+0x65/0x80
? f2fs_evict_inode+0x50f/0x520
? exc_invalid_op+0x39/0x50
? asm_exc_invalid_op+0x1a/0x20
? __pfx_f2fs_get_dquots+0x10/0x10
? f2fs_evict_inode+0x50f/0x520
? f2fs_evict_inode+0x2e5/0x520
evict+0x186/0x2f0
prune_icache_sb+0x75/0xb0
super_cache_scan+0x1a8/0x200
do_shrink_slab+0x163/0x320
shrink_slab+0x2fc/0x470
drop_slab+0x82/0xf0
drop_caches_sysctl_handler+0x4e/0xb0
proc_sys_call_handler+0x183/0x280
vfs_write+0x36d/0x450
ksys_write+0x68/0xd0
do_syscall_64+0xc8/0x1a0
? arch_exit_to_user_mode_prepare+0x11/0x60
? irqentry_exit_to_user_mode+0x7e/0xa0
The root cause is: f2fs uses FI_ATOMIC_DIRTIED to indicate dirty
atomic files during commit. If the inode is dirtied during commit,
such as by f2fs_i_pino_write, the vfs inode keeps clean and the
f2fs inode is set to FI_DIRTY_INODE. The FI_DIRTY_INODE flag cann't
be cleared by write_inode later due to the clean vfs inode. Finally,
f2fs_bug_on is triggered due to this inconsistent state when evict.
To reproduce this situation:
- fd = open("/mnt/test.db", O_WRONLY)
- ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE)
- mv /mnt/test.db /mnt/test1.db
- ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE)
- echo 3 > /proc/sys/vm/drop_caches
To fix this problem, clear FI_DIRTY_INODE after commit, then
f2fs_mark_inode_dirty_sync will ensure a consistent dirty state.
Fixes: fccaa81de87e ("f2fs: prevent atomic file from being dirtied before commit")
Signed-off-by: Yunlei He <heyunlei@xiaomi.com>
Signed-off-by: Jianan Huang <huangjianan@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The following two 'check only recovery' processes are very dependent on
the return value of f2fs_recover_fsync_data, especially when the return
value is greater than 0.
1. when device has readonly mode, shown as commit
23738e74472f ("f2fs: fix to restrict mount condition on readonly block device")
2. mount optiont NORECOVERY or DISABLE_ROLL_FORWARD is set, shown as commit
6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount")
However, commit c426d99127b1 ("f2fs: Check write pointer consistency of open zones")
will change the return value unexpectedly, thereby changing the caller's behavior
This patch let the f2fs_recover_fsync_data return correct value,and not do
f2fs_check_and_fix_write_pointer when the device is read-only.
Fixes: c426d99127b1 ("f2fs: Check write pointer consistency of open zones")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
syzbot reported a f2fs bug as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/gc.c:373!
CPU: 0 UID: 0 PID: 5316 Comm: syz.0.0 Not tainted 6.13.0-rc3-syzkaller-00044-gaef25be35d23 #0
RIP: 0010:get_cb_cost fs/f2fs/gc.c:373 [inline]
RIP: 0010:get_gc_cost fs/f2fs/gc.c:406 [inline]
RIP: 0010:f2fs_get_victim+0x68b1/0x6aa0 fs/f2fs/gc.c:912
Call Trace:
<TASK>
__get_victim fs/f2fs/gc.c:1707 [inline]
f2fs_gc+0xc89/0x2f60 fs/f2fs/gc.c:1915
f2fs_ioc_gc fs/f2fs/file.c:2624 [inline]
__f2fs_ioctl+0x4cc9/0xb8b0 fs/f2fs/file.c:4482
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
w/ below testcase, it can reproduce directly:
- dd if=/dev/zero of=/tmp/file bs=1M count=64
- mkfs.f2fs /tmp/file
- mount -t f2fs -o loop,mode=fragment:block /tmp/file /mnt/f2fs
- echo 0 > /sys/fs/f2fs/loop0/min_ssr_sections
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=5
- umount /mnt/f2fs
- for((i=4096;i<16384;i+=512)) do inject.f2fs --sit 0 --blk $i --mb mtime --val -1 /tmp/file; done
- mount -o loop /tmp/file /mnt/f2fs
- f2fs_io gc 0 /mnt/f2fs/file
static unsigned int get_cb_cost()
{
...
mtime = f2fs_get_section_mtime(sbi, segno);
f2fs_bug_on(sbi, mtime == INVALID_MTIME);
...
}
The root cause is: mtime in f2fs_sit_entry can be fuzzed to INVALID_MTIME,
then it will trigger BUG_ON in get_cb_cost() during GC.
Let's change behavior of f2fs_get_section_mtime() as below for fix:
- return INVALID_MTIME only if total valid blocks is zero.
- return INVALID_MTIME - 1 if average mtime calculated is
INVALID_MTIME.
Fixes: b19ee7272208 ("f2fs: introduce f2fs_get_section_mtime")
Reported-by: syzbot+b9972806adbe20a910eb@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/6768c82e.050a0220.226966.0035.GAE@google.com
Cc: liuderong <liuderong@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
New function can process some consecutive blocks at a time.
Function f2fs_invalidate_blocks()->down_write() and up_write()
are very time-consuming, so if f2fs_invalidate_blocks() can
process consecutive blocks at one time, it will save a lot of time.
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This function can process some consecutive blocks at a time.
When using update_sit_entry() to release consecutive blocks,
ensure that the consecutive blocks belong to the same segment.
Because after update_sit_entry_for_realese(), @segno is still
in use in update_sit_entry().
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
No logical changes, just for cleanliness.
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
New function can process some consecutive blocks at a time.
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Piergiorgio reported a bug in bugzilla as below:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330
RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs]
Call Trace:
__issue_discard_cmd+0x1ca/0x350 [f2fs]
issue_discard_thread+0x191/0x480 [f2fs]
kthread+0xcf/0x100
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1a/0x30
w/ below testcase, it can reproduce this bug quickly:
- pvcreate /dev/vdb
- vgcreate myvg1 /dev/vdb
- lvcreate -L 1024m -n mylv1 myvg1
- mount /dev/myvg1/mylv1 /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20
- sync
- rm /mnt/f2fs/file
- sync
- lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1
- umount /mnt/f2fs
The root cause is: it will update discard_max_bytes of mounted lvm
device to zero after creating snapshot on this lvm device, then,
__submit_discard_cmd() will pass parameter @nr_sects w/ zero value
to __blkdev_issue_discard(), it returns a NULL bio pointer, result
in panic.
This patch changes as below for fixing:
1. Let's drop all remained discards in f2fs_unfreeze() if snapshot
of lvm device is created.
2. Checking discard_max_bytes before submitting discard during
__submit_discard_cmd().
Cc: stable@vger.kernel.org
Fixes: 35ec7d574884 ("f2fs: split discard command in prior to block layer")
Reported-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fsync data recovery attempts to check and fix write pointer consistency
of cursegs and all other zones. If the write pointers of cursegs are
unaligned, cursegs are changed to new sections.
If recovery fails, zone write pointers are still checked and fixed,
but the latest checkpoint cannot be written back. Additionally, retry-
mount skips recovery and rolls back to reuse the old cursegs whose
zones are already finished. This can lead to unaligned write later.
This patch addresses the issue by leaving writer pointers untouched if
recovery fails. When retry-mount is performed, cursegs and other zones
are checked and fixed after skipping recovery.
Signed-off-by: Song Feng <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch introduces a new helper log_type_to_seg_type() to convert
log type to segment data type, and uses it to clean up opened codes
in build_curseg(), and also it fixes to convert log type before use
in do_write_page().
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In __get_segment_type(), __get_segment_type_6() may return
CURSEG_COLD_DATA_PINNED or CURSEG_ALL_DATA_ATGC log type, but
following f2fs_get_segment_temp() can only handle persistent
log type, fix it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In the __f2fs_init_atgc_curseg->get_atssr_segment calling,
curseg->segno is NULL_SEGNO, indicating that there is no summary
block that needs to be written.
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This f2fs_bug_on was introduced by commit 2c1905042c8c ("f2fs: check
segment type in __f2fs_replace_block") when there were only 6 curseg types.
After commit d0b9e42ab615 ("f2fs: introduce inmem curseg") was introduced,
the condition should be changed to checking curseg->seg_type.
Fixes: d0b9e42ab615 ("f2fs: introduce inmem curseg")
Signed-off-by: LongPing Wei <weilongping@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
first_zoned_segno() returns a fixed value, let's cache it in
structure f2fs_sb_info to avoid redundant calculation.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Now we reclaim too much space before allocating pinned space for zoned
devices.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When segs_per_sec is larger than 1, section may contain invalid segments,
mtime should be the average value of each valid blocks,
so introduce f2fs_get_section_mtime to
record the average mtime of all valid blocks in a section.
Signed-off-by: liuderong <liuderong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Remove unused parameter segno from f2fs_usable_segs_in_sec.
Signed-off-by: liuderong <liuderong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs: fix to don't panic system for no free segment fault injection
syzbot reports a f2fs bug as below:
F2FS-fs (loop0): inject no free segment in get_new_segment of __allocate_new_segment+0x1ce/0x940 fs/f2fs/segment.c:3167
F2FS-fs (loop0): Stopped filesystem due to reason: 7
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2748!
CPU: 0 UID: 0 PID: 5109 Comm: syz-executor304 Not tainted 6.11.0-rc6-syzkaller-00363-g89f5e14d05b4 #0
RIP: 0010:get_new_segment fs/f2fs/segment.c:2748 [inline]
RIP: 0010:new_curseg+0x1f61/0x1f70 fs/f2fs/segment.c:2836
Call Trace:
__allocate_new_segment+0x1ce/0x940 fs/f2fs/segment.c:3167
f2fs_allocate_new_section fs/f2fs/segment.c:3181 [inline]
f2fs_allocate_pinning_section+0xfa/0x4e0 fs/f2fs/segment.c:3195
f2fs_expand_inode_data+0x5d6/0xbb0 fs/f2fs/file.c:1799
f2fs_fallocate+0x448/0x960 fs/f2fs/file.c:1903
vfs_fallocate+0x553/0x6c0 fs/open.c:334
do_vfs_ioctl+0x2592/0x2e50 fs/ioctl.c:886
__do_sys_ioctl fs/ioctl.c:905 [inline]
__se_sys_ioctl+0x81/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0010:get_new_segment fs/f2fs/segment.c:2748 [inline]
RIP: 0010:new_curseg+0x1f61/0x1f70 fs/f2fs/segment.c:2836
The root cause is when we inject no free segment fault into f2fs,
we should not panic system, fix it.
Fixes: 8b10d3653735 ("f2fs: introduce FAULT_NO_SEGMENT")
Reported-by: syzbot+341e5f32ebafbb46b81c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000f0ee5b0621ab694b@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We need to introduce a valid block ratio threshold not to trigger
excessive GC for zoned deivces. The initial value of it is 95%. So, F2FS
will stop the thread from intiating GC for sections having valid blocks
exceeding the ratio.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Keep atomic file clean while updating and make it dirtied during commit
in order to avoid unnecessary and excessive inode updates in the previous
fix.
Fixes: 4bf78322346f ("f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert to use folio, so that we can get rid of 'page->index' to
prepare for removal of 'index' field in structure page [1].
[1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Convert to use folio, so that we can get rid of 'page->index' to
prepare for removal of 'index' field in structure page [1].
[1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As the helper function f2fs_bdev_support_discard() shows, f2fs checks if
the target block devices support discard by calling
bdev_max_discard_sectors() and bdev_is_zoned(). This check works well
for most cases, but it does not work for conventional zones on zoned
block devices. F2fs assumes that zoned block devices support discard,
and calls __submit_discard_cmd(). When __submit_discard_cmd() is called
for sequential write required zones, it works fine since
__submit_discard_cmd() issues zone reset commands instead of discard
commands. However, when __submit_discard_cmd() is called for
conventional zones, __blkdev_issue_discard() is called even when the
devices do not support discard.
The inappropriate __blkdev_issue_discard() call was not a problem before
the commit 30f1e7241422 ("block: move discard checks into the ioctl
handler") because __blkdev_issue_discard() checked if the target devices
support discard or not. If not, it returned EOPNOTSUPP. After the
commit, __blkdev_issue_discard() no longer checks it. It always returns
zero and sets NULL to the given bio pointer. This NULL pointer triggers
f2fs_bug_on() in __submit_discard_cmd(). The BUG is recreated with the
commands below at the umount step, where /dev/nullb0 is a zoned null_blk
with 5GB total size, 128MB zone size and 10 conventional zones.
$ mkfs.f2fs -f -m /dev/nullb0
$ mount /dev/nullb0 /mnt
$ for ((i=0;i<5;i++)); do dd if=/dev/zero of=/mnt/test bs=65536 count=1600 conv=fsync; done
$ umount /mnt
To fix the BUG, avoid the inappropriate __blkdev_issue_discard() call.
When discard is requested for conventional zones, check if the device
supports discard or not. If not, return EOPNOTSUPP.
Fixes: 30f1e7241422 ("block: move discard checks into the ioctl handler")
Cc: stable@vger.kernel.org
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|