Age | Commit message (Collapse) | Author |
|
It would better split small and large IOs separately in order to get more
consecutive big writes.
The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch changes to use bitmap instead of extent in struct discard_entry
to indicate discard range in one segment, for fragmented space, this
implementation can save memory footprint.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Remove unneeded parameter and simply change flow in
destroy_discard_cmd_control.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Adds to count discard command entry and show the number in debugfs,
also fix to add cost of discard command cache into total comsumed
memory footprint.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Show historical count of flush command and discard command.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
- has_not_enough_free_secs
node_secs: 0 dent_secs: 0 freed:0 free_segments:103 reserved:104
- f2fs_gc
- get_victim_by_default
alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1
- do_garbage_collect
start_segno 3976, end_segno 3977 type 0
- is_alive
nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0
- gc_data_segment 766, segno 3976, block 512/426 not alive
So, this patch fixes subtle corrupted case where node version does not match
to summary version which results in infinite loop by gc.
Reported-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch initiates SSR much eariler, resulting in less FG_GC.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In order to give more spatial locality, this patch changes the block allocation
policy which assigns beginning of partition for small and hot data/node blocks.
In order to do this, we set noheap allocation by default and introduce another
mount option, heap, to reset it back.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch fixes missing increased max cost caused by a patch that we increased
cose of data segments in greedy algorithm.
Cc: <stable@vger.kernel.org> # v4.10+
Fixes: b9cd20619 "f2fs: node segment is prior to data segment selected victim"
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch allow write data to normal file when writting
new checkpoint.
We relax three limitations for write_begin path:
1. data allocation
2. node allocation
3. variables in checkpoint
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In this patch, we change xattr block disk layout as below:
Before:
xattr node block layout
+---------------------------------------------+---------------+-------------+
| node block xattr entries | reserved | node footer |
| 4068 Bytes | 4 Bytes | 24 Bytes |
In memory layout
+--------------------+---------------------------------+--------------------+
| inline xattr | node block xattr entries | reserved |
| 200 Bytes | 4068 Bytes | 4 Bytes |
After:
xattr node block layout
+-------------------------------------------------------------+-------------+
| node block xattr entries | node footer |
| 4072 Bytes | 24 Bytes |
In memory layout
+--------------------+---------------------------------+--------------------+
| inline xattr | node block xattr entries | reserved |
| 200 Bytes | 4072 Bytes | 4 Bytes |
With this change, we don't need to reserve additional space in node block,
just keep reserved space in logical in-memory layout. So that it would help
to enlarge valid free space of xattr node block.
As tested, generic/026 shows max stored xattr entires number increases from
531 to 532 when inline_xattr option is enabled.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
1. don't allocate redundant memory in read_all_xattrs.
2. introduce RESERVED_XATTR_SIZE for cleanup.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Don't track volatile file in dirty inode list, otherwise with data_flush
option, background thread will entry into endless loop for flushing
journal file's pages.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds to show the max number of volatile operations which are
conducting concurrently.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In below concurrent case, allocated nid can be loaded into free nid cache
and be allocated again.
Thread A Thread B
- f2fs_create
- f2fs_new_inode
- alloc_nid
- __insert_nid_to_list(ALLOC_NID_LIST)
- f2fs_balance_fs_bg
- build_free_nids
- __build_free_nids
- scan_nat_page
- add_free_nid
- __lookup_nat_cache
- f2fs_add_link
- init_inode_metadata
- new_inode_page
- new_node_page
- set_node_addr
- alloc_nid_done
- __remove_nid_from_list(ALLOC_NID_LIST)
- __insert_nid_to_list(FREE_NID_LIST)
This patch makes nat cache lookup and free nid list operation being atomical
to avoid this race condition.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Use set_page_private marcro instead of operte page struct
directly
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When doing garbage collection, we try to record segment offset which
locates at next one of last victim, using it as the start offset in
next searching.
But in some corner cases, recorded offset may cross the end of main
segment area, it will cause incorrectly searching in dirty_segmap
bitmap. This patch adds modular operation to avoid this issue.
Reported-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The memory size of f2fs_stat_info also should be calculated.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
After filemap_write_and_wait_range fail, the FI_ATOMIC_FILE flags is removed,
so that f2fs should not increase the stat of atomic_write.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The crc_offset towards or beyond the end of block is wrong,
sanity check it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As discuss with Jaegeuk and Chao,
"Once checkpoint is done, f2fs doesn't need to update there-in filename at all."
The disk-level filename is used only one case,
1. create a file A under a dir
2. sync A
3. godown
4. umount
5. mount (roll_forward)
Only the rename/cross_rename changes the filename, if it happens,
a. between step 1 and 2, the sync A will caused checkpoint, so that,
the roll_forward at step 5 never happens.
b. after step 2, the roll_forward happens, file A will roll forward
to the result as after step 1.
So that, any updating the disk filename is useless, just cleanup it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
free_nid_bitmap and free_nid_count in update_free_nid_bitmap should be
updated atomically, use nid_list_lock cover them to avoid race in
concurrent scenario.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
For f2fs_read_data_pages, the f2fs_mpage_readpages gets "page == NULL",
so that, the prefetchw(&page->flags) is operated on NULL.
Fixes: f1e8866016 ("f2fs: expose f2fs_mpage_readpages")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Clear FI_DATA_EXIST flag atomically in truncate_inline_inode, and
the return value from truncate_inline_inode isn't used, remove it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It's needless of mnt_want_write_file for arguments checking.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The inode_newsize_ok is better than only checking the maxbytes,
eg. the rlimit etc.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If move file range return error, the data copied to user-space is duplicate.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
use a slightly simpler expression to calculate nat block with nid.
Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Inject a fault during f2fs_truncate().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch checks the parameter range passed by ioctl to void that range
exceeds the max_file_blocks limit.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch add a function to punch discard command if one segment
reuse before discard. Split this segment from multi-segments discard
range, and discard the left bigger range.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Let's allocate a bio when issuing discard commands later.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Skip writeback meta pages if cp_mutex lock acquire failed, cp will
flush dirty pages instead.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This case is not caused by fsck.f2fs. User needs to retry mount.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fixes: 3cf4574705 ("f2fs: introduce get_next_page_offset to speed up SEEK_DATA")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The nat entry is listed from the set list for freeing,
it's duplicate to do radix tree lookup again.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
[Jaegeuk Kim: remove unnecessary f2fs_bug_on]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The root device's issue flush trace is missing,
add it and tracing the result from submit.
Fixes d50aaeec90 ("f2fs: show actual device info in tracepoints")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Now f2fs only supports volatile writes for journal db regular file.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The atomic writes only supports regular files for database.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When I forced to enable atomic operations intentionally, I could hit the below
panic, since we didn't clear page->private in f2fs_invalidate_page called by
file truncation.
The panic occurs due to NULL mapping having page->private.
BUG: unable to handle kernel paging request at ffffffffffffffff
IP: drop_buffers+0x38/0xe0
PGD 5d00c067
PUD 5d00e067
PMD 0
CPU: 3 PID: 1648 Comm: fsstress Tainted: G D OE 4.10.0+ #5
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ffff9151952863c0 task.stack: ffffaaec40db4000
RIP: 0010:drop_buffers+0x38/0xe0
RSP: 0018:ffffaaec40db74c8 EFLAGS: 00010292
Call Trace:
? page_referenced+0x8b/0x170
try_to_free_buffers+0xc5/0xe0
try_to_release_page+0x49/0x50
shrink_page_list+0x8bc/0x9f0
shrink_inactive_list+0x1dd/0x500
? shrink_active_list+0x2c0/0x430
shrink_node_memcg+0x5eb/0x7c0
shrink_node+0xe1/0x320
do_try_to_free_pages+0xef/0x2e0
try_to_free_pages+0xe9/0x190
__alloc_pages_slowpath+0x390/0xe70
__alloc_pages_nodemask+0x291/0x2b0
alloc_pages_current+0x95/0x140
__page_cache_alloc+0xc4/0xe0
pagecache_get_page+0xab/0x2a0
grab_cache_page_write_begin+0x20/0x40
get_read_data_page+0x2e6/0x4c0 [f2fs]
? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs]
? truncate_data_blocks_range+0x238/0x2b0 [f2fs]
get_lock_data_page+0x30/0x190 [f2fs]
__exchange_data_block+0xaaf/0xf40 [f2fs]
f2fs_fallocate+0x418/0xd00 [f2fs]
vfs_fallocate+0x157/0x220
SyS_fallocate+0x48/0x80
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Chao Yu: use INMEM_INVALIDATE for better tracing]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs_sync_fs() -> write_checkpoint() calls stat_inc_cp_count(sbi->stat_info),
which needs stat_info allocation.
Otherwise, we can hit:
[254042.598623] ? count_shadow_nodes+0xa0/0xa0
[254042.598633] f2fs_sync_fs+0x65/0xd0 [f2fs]
[254042.598645] f2fs_balance_fs_bg+0xe4/0x1c0 [f2fs]
[254042.598657] f2fs_write_node_pages+0x34/0x1a0 [f2fs]
[254042.598664] ? pagevec_lookup_entries+0x1e/0x30
[254042.598673] do_writepages+0x1e/0x30
[254042.598682] __writeback_single_inode+0x45/0x330
[254042.598688] writeback_single_inode+0xd7/0x190
[254042.598694] write_inode_now+0x86/0xa0
[254042.598699] iput+0x122/0x200
[254042.598709] f2fs_fill_super+0xd4a/0x14d0 [f2fs]
[254042.598717] mount_bdev+0x184/0x1c0
[254042.598934] ? f2fs_commit_super+0x100/0x100 [f2fs]
[254042.599142] f2fs_mount+0x15/0x20 [f2fs]
[254042.599349] mount_fs+0x39/0x160
[254042.599554] ? __alloc_percpu+0x15/0x20
[254042.599759] vfs_kern_mount+0x67/0x110
[254042.599972] do_mount+0x1bb/0xc80
[254042.600175] ? memdup_user+0x42/0x60
[254042.600380] SyS_mount+0x83/0xd0
[254042.600583] entry_SYSCALL_64_fastpath+0x1e/0xad
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When the zone type is BLK_ZONE_TYPE_CONVENTIONAL, the blkstart is
calculated twice.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The parent directory's nlink will change, not the inode.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
After renaming an encrypted file, we have no way to get its
encrypted filename from its dentry.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The previous one was not a proper location to inject an error, since there
is no point to get errors. Instead, we can emulate EIO during truncation,
and the below logic should handle it correctly.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fixes: 22ad0b6ab4 ("f2fs: add bitmaps for empty or full NAT blocks")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|