Age | Commit message (Collapse) | Author |
|
This patch adds a tracepoint for f2fs_read_data_page to trace when page is
readed by user.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch adds a tracepoint for f2fs_write_{meta,node,data}_pages to trace when
pages are fsyncing/flushing.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch adds a tracepoint for f2fs_write_{meta,node,data}_page to trace when
page is writting out.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch adds a tracepoint for f2fs_write_end to trace write op of user.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch adds a tracepoint for f2fs_write_begin to trace write op of user.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
fix the following checkpatch warning:
WARNING: do {} while (0) macros should not be semicolon terminated
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
If the inode page is clean during its inode eviction, it'd better drop the page
to reduce further memory pressure.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch reduces the lock granularity during write_begin.
When the system is under memory pressure, it would be better to reduce
the locking time for the data pages.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch removes grab_cache_page_write_begin for meta pages.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
We don't need to wait on page writeback for these cases.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch splits grab_cache_page_write_begin into grab_cache_page and
wait_on_page_writeback for node pages.
This patch intends to enhance the latency to get node pages by alleviating
unnecessary wait_on_page_writeback.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Previous we do not truncate inline data in inode page when setattr, so following
case could still read the inline data which has already truncated:
1.write inline data
2.ftruncate size to 0
3.ftruncate size to max inline data size
4.read from offset 0
This patch introduces truncate_inline_data() to fix this problem.
change log from v1:
o fix a bug and do not truncate first page data after truncate inline data.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
We have no so such readahead mechanism in ->iterate() path as the one in
->read() path, it cause low performance when we read large directory.
This patch add readahead in f2fs_readdir() for better performance.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
We should set the error number correctly when we fail in recover_dentry(), so
the recover flow could stop for the reason as error number shows instead of
continuing.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
If an amount of data are allocated though fallocate and user writes a couple of
data among the space, f2fs should return the data offset made by user when
SEEK_DATA is requested.
For example, (N: NEW_ADDR by fallocate, X: NEW_ADDR by user)
1) fallocate 0 ~ 10MB
f -> N N N N N N N N N N N N ... N
2) write 4KB at 5MB offset
f -> N N N N N X N N N N N N ... N
3) SEEK_DATA from 0 should return 5MB offset
So, this patch adds a routine to search the first dirty page to handle that.
Then, the SEEK_DATA flow skips NEW_ADDR offsets until any dirty page is found.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
When SEEK_HOLE is requeted, it should return i_size if the hole position is
found outside of i_size.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
In This patch we introduce f2fs_seek_block to support SEEK_{DATA,HOLE} of
lseek(2).
change log from v1:
o fix bug when lseek from middle of page and fix wrong calculation of
PGOFS_OF_NEXT_DNODE macro.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Introduce help function {create,destroy}_flush_cmd_control to clean up
the create/destory flush merge operation.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Split the flush_merge fields from sm_i, and use the new struct flush_cmd_control
to wrap it, so that we can igonre these fileds if flush_merge is disable, and
it alse can the structs more neat.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Introduce help macro ADDRS_PER_PAGE() to get the number of address pointers in
direct node or inode.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
If f2fs_write_data_page is called through the reclaim path, we should submit
the bio right away.
This patch resolves the following issue that Marc Dietrich reported.
"It took me a while to bisect a problem which causes my ARM (tegra2) netbook to
frequently stall for 5-10 seconds when I enable EXA acceleration (opentegra
experimental ddx)."
And this patch fixes that.
Reported-by: Marc Dietrich <marvin24@gmx.de>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch adds two error conditions early in the setxattr operations.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch passes the "flags" field to the low level setxattr functions
to use XATTR_REPLACE in the following patches.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch includes simple clean-ups to reduce unnecessary long variable names.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
We'd better handle inline data case independently in f2fs_bmap().
It can reduce our handling time in f2fs_bmap().
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
If so many dirty dentry blocks are cached, not reached to the flush condition,
we should fall into livelock in balance_dirty_pages.
So, let's consider the mem size for the condition.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
If the disk has some garbage blocks, F2FS is able to face with BUG_ON when
recovering direct node blocks.
This patch detects the error case and avoids that prior to reaching BUG_ON.
Alexey Khoroshilov addressed the potential security issues as follows.
"An ability to trigger a BUG_ON assert by mounting a crafted image is
usually considered as a local denial of service [1-3]. As far as I
understand, the reason is that some kernel data may become inconsistent
that can lead to further problems.
[1] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3353
[2] http://www.openwall.com/lists/oss-security/2011/06/24/4
[3] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2928
etc."
Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch introduces available_nids for alloc_nids() and fixes max_nid for
build_free_nids() and scan_nat_pages().
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
inline get_max_meta_blks is only used in checkpoint.c
Use standard static inline format.
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch introduce raw_nat_from_node_info() to simplfy some codes, and also
use exist function node_info_from_raw_nat() to do the same job.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Add the *remount* handle of flush_merge option, so that the users
can enable flush_merge in the runtime, such as the underlying device
handles the cache_flush command relatively slowly.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Use set_mask_bits() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
FS_IMMUTABLE_FL, FS_APPEND_FL, etc. flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Make recover_inline_xattr() static, because this function is
used only in this file.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch removes list opeations in handling dirty dir inodes.
Previously, F2FS traverses whole the list of dirty dir inodes to check whether
there is an existing inode or not, resulting in heavy CPU overheads.
So this patch removes such the traverse operations by adding FI_DIRTY_DIR to
indicate the inode lies on the list or not.
Through this simple flag, we can remove redundant operations gracefully.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
If it occurs an error, we should call f2fs_unlock_op.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch replace some general codes with redirty_page_for_writepage, which
can be enabled after consideration on additional procedure like counting dirty
pages appropriately.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
The f2fs always scans the next chain of direct node blocks.
But some garbage blocks are able to be remained due to no discard support or
SSR triggers.
This occasionally wreaks recovering wrong inodes that were used or BUG_ONs
due to reallocating node ids as follows.
When mount this f2fs image:
http://linuxtesting.org/downloads/f2fs_fault_image.zip
BUG_ON is triggered in f2fs driver (messages below are generated on
kernel 3.13.2; for other kernels output is similar):
kernel BUG at fs/f2fs/node.c:215!
Call Trace:
[<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
[<ffffffff811446e7>] ? __lock_page+0x67/0x70
[<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
[<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
[<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
[<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
[<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
[<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
[<ffffffff811b8769>] mount_bdev+0x1c9/0x210
[<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
[<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
[<ffffffff811b9497>] mount_fs+0x47/0x1c0
[<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
[<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
[<ffffffff811d6763>] do_mount+0x493/0x910
[<ffffffff811615cb>] ? strndup_user+0x5b/0x80
[<ffffffff811d6c70>] SyS_mount+0x90/0xe0
[<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b
Found by Linux File System Verification project (linuxtesting.org).
Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Enable flush_merge only in f2fs is not read-only, so does the mount
option show.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Put the bio when the flush cmd issued, it also can fix the following
kmemleak:
unreferenced object 0xffff8800270c73c0 (size 200):
comm "f2fs_flush-7:0", pid 27161, jiffies 4312127988 (age 988.503s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 40 07 81 19 01 88 ff ff ........@.......
01 00 00 00 00 00 00 f0 11 14 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81559866>] kmemleak_alloc+0x72/0x96
[<ffffffff81156f7e>] slab_post_alloc_hook+0x28/0x2a
[<ffffffff811595b1>] kmem_cache_alloc+0xec/0x157
[<ffffffff8111924d>] mempool_alloc_slab+0x15/0x17
[<ffffffff81119513>] mempool_alloc+0x71/0x138
[<ffffffff81193548>] bio_alloc_bioset+0x93/0x18c
[<ffffffffa040f857>] issue_flush_thread+0x8d/0x145 [f2fs]
[<ffffffff8107ac16>] kthread+0xba/0xc2
[<ffffffff81571b2c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Directory readahead can throw loud scary but harmless warnings
when multiblock directories are in use a specific pattern of
discontiguous blocks are found in the directory. That is, if a hole
follows a discontiguous block, it will throw a warning like:
XFS (dm-1): xfs_da_do_buf: bno 637 dir: inode 34363923462
XFS (dm-1): [00] br_startoff 637 br_startblock 1917954575 br_blockcount 1 br_state 0
XFS (dm-1): [01] br_startoff 638 br_startblock -2 br_blockcount 1 br_state 0
And dump a stack trace.
This is because the readahead offset increment loop does a double
increment of the block index - it does an increment for the loop
iteration as well as increase the loop counter by the number of
blocks in the extent. As a result, the readahead offset does not get
incremented correctly for discontiguous blocks and hence can ask for
readahead of a directory block from an offset part way through a
directory block. If that directory block is followed by a hole, it
will trigger a mapping warning like the above.
The bad readahead will be ignored, though, because the main
directory block read loop uses the correct mapping offsets rather
than the readahead offset and so will ignore the bad readahead
altogether.
Fix the warning by ensuring that the readahead offset is correctly
incremented.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Reports of a shutdown hang when fsyncing a directory have surfaced,
such as this:
[ 3663.394472] Call Trace:
[ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
[ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
[ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
[ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
[ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
[ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
[ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
If we trigger a shutdown in xlog_cil_push() from xlog_write(), we
will never wake waiters on the current push sequence number, so
anything waiting in xlog_cil_force_lsn() for that push sequence
number to come up will not get woken and hence stall the shutdown.
Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in
the push abort handling, in the log shutdown code when waking all
waiters, and adding a shutdown check in the sequence completion wait
loops to ensure they abort when a wakeup due to a shutdown occurs.
Reported-by: Boris Ranto <branto@redhat.com>
Reported-by: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
truncate_setsize() removes pages from the page cache, and hence
requires page locks to be held. It is not valid to lock a page cache
page inside a transaction context as we can hold page locks when we
we reserve space for a transaction. If we do, then we expose an ABBA
deadlock between log space reservation and page locks.
That is, both the write path and writeback lock a page, then start a
transaction for block allocation, which means they can block waiting
for a log reservation with the page lock held. If we hold a log
reservation and then do something that locks a page (e.g.
truncate_setsize in xfs_setattr_size) then that page lock can block
on the page locked and waiting for a log reservation. If the
transaction that is waiting for the page lock is the only active
transaction in the system that can free log space via a commit,
then writeback will never make progress and so log space will never
free up.
This issue with xfs_setattr_size() was introduced back in 2010 by
commit fa9b227 ("xfs: new truncate sequence") which moved the page
cache truncate from outside the transaction context (what was
xfs_itruncate_data()) to inside the transaction context as a call to
truncate_setsize().
The reason truncate_setsize() was located where in this place was
that we can't shouldn't change the file size until after we are in
the transaction context and the operation will either succeed or
shut down the filesystem on failure. However, block_truncate_page()
already modifies the file contents before we enter the transaction
context, so we can't really fulfill this guarantee in any way. Hence
we may as well ensure that on success or failure, the in-memory
inode and data is truncated away and that the application cleans up
the mess appropriately.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
They do not need to be used outside fs/nfsd/nfs4state.c
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
There is almost nothing left it in, just merge it into the only file
that includes it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
There are no legitimate users outside of fs/nfsd, so move it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
There are no legitimate users outside of fs/nfsd, so move it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The only real user of this header is fs/nfsd/nfsfh.h, so merge the
two. Various lockѕ source files used it to indirectly get other
sunrpc or nfs headers, so fix those up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
It is large, it is used in more than one place, and it is not performance
critical. Let gcc figure out whether it should be inlined...
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
When building without CONFIG_SYSCTL, the compiler saw an unused
label. This moves the label into the #ifdef it is used under.
fs/lockd/svc.c: In function ‘init_nlm’:
fs/lockd/svc.c:626:1: warning: label ‘err_sysctl’ defined but not used [-Wunused-label]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|