summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2015-08-25NFSv4.1/flexfile: ff_layout_remove_mirror can be statickbuild test robot
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4.2/pnfs: Make the layoutstats timer configurableTrond Myklebust
Allow advanced users to set the layoutstats timer in order to lengthen or shorten the period between layoutstat transmissions to the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4.1/flexfile: Ensure uniqueness of mirrors across layout segmentsTrond Myklebust
Keep the full list of mirrors in the struct nfs4_ff_layout_mirror so that they can be shared among the layout segments that use them. Also ensure that we send out only one copy of the layoutstats per mirror. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4.1/flexfiles: Remove mirror backpointer to lseg.Trond Myklebust
When we start sharing mirrors between several lsegs, we won't be able to keep it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4.1/flexfiles: Add refcounting to struct nfs4_ff_layout_mirrorTrond Myklebust
We do want to share mirrors between layout segments, so add a refcount to enable that. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFS41/flexfiles: zero out DS write wccPeng Tao
We do not want to update inode attributes with DS values. Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFS41: remove NFS_LAYOUT_ROC flagPeng Tao
If we return delegation before closing, we fail to do roc check during close because NFS_LAYOUT_ROC is cleared by delegreturn and it causes layouts to be still hanging around after delegreturn + close, which is a voilation against protocol. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4: Add a tracepoint for CB_LAYOUTRECALLTrond Myklebust
Only support for single file layoutrecall for now. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4: Add a tracepoint for CB_GETATTRTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4.1/pnfs: Add a tracepoint for return-on-close eventsTrond Myklebust
Allow tracing of return-on-close. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4: Force a post-op attribute update when holding a delegationTrond Myklebust
If the ctime or mtime or change attribute have changed because of an operation we initiated, we should make sure that we force an attribute update. However we do not want to mark the page cache for revalidation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.0+
2015-08-25Merge branch 'xfs-misc-fixes-for-4.3-3' into for-nextDave Chinner
2015-08-25xfs: fix non-debug build warningsDave Chinner
There seem to be a couple of new set-but-unused build warnings that gcc 4.9.3 is now warning about. These are not regressions, just the compiler being more picky. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-25xfs: collapse allocsize and biosize mount option handlingEric Sandeen
The allocsize and biosize mount options are handled identically, other than allocsize accepting suffixes. suffix_kstrtoint handles bare numbers just fine too, so these can be collapsed. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-25xfs: Fix file type directory corruption for btree directoriesJan Kara
Users have occasionally reported that file type for some directory entries is wrong. This mostly happened after updating libraries some libraries. After some debugging the problem was traced down to xfs_dir2_node_replace(). The function uses args->filetype as a file type to store in the replaced directory entry however it also calls xfs_da3_node_lookup_int() which will store file type of the current directory entry in args->filetype. Thus we fail to change file type of a directory entry to a proper type. Fix the problem by storing new file type in a local variable before calling xfs_da3_node_lookup_int(). cc: <stable@vger.kernel.org> # 3.16 - 4.x Reported-by: Giacomo Comes <comes@naic.edu> Signed-off-by: Jan Kara <jack@suse.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-25xfs: lockdep annotations throw warnings on non-debug buildsDave Chinner
SO, now if we enable lockdep without enabling CONFIG_XFS_DEBUG, the lockdep annotations throw a warning because the assert that uses the lockdep define is not built in: fs/xfs/xfs_inode.c:367:1: warning: 'xfs_lockdep_subclass_ok' defined but not used [-Wunused-function] xfs_lockdep_subclass_ok( So now we need to create an ifdef mess to sort this all out, because we need to handle all the combinations of CONFIG_XFS_DEBUG=[y|n], CONFIG_XFS_WARNING=[y|n] and CONFIG_LOCKDEP=[y|n] appropriately. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-25xfs: Fix uninitialized return value in xfs_alloc_fix_freelist()Jan Kara
xfs_alloc_fix_freelist() can sometimes jump to out_agbp_relse without ever setting value of 'error' variable which is then returned. This can happen e.g. when pag->pagf_init is set but AG is for metadata and we want to allocate user data. Fix the problem by initializing 'error' to 0, which is the desired return value when we decide to skip this group. CC: xfs@oss.sgi.com Coverity-id: 1309714 Signed-off-by: Jan Kara <jack@suse.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-24f2fs: fix to release inode correctlyChao Yu
In following call stack, if unfortunately we lose all chances to truncate inode page in remove_inode_page, eventually we will add the nid allocated previously into free nid cache, this nid is with NID_NEW status and with NEW_ADDR in its blkaddr pointer: - f2fs_create - f2fs_add_link - __f2fs_add_link - init_inode_metadata - new_inode_page - new_node_page - set_node_addr(, NEW_ADDR) - f2fs_init_acl failed - remove_inode_page failed - handle_failed_inode - remove_inode_page failed - iput - f2fs_evict_inode - remove_inode_page failed - alloc_nid_failed cache a nid with valid blkaddr: NEW_ADDR This may not only cause resource leak of previous inode, but also may cause incorrect use of the previous blkaddr which is located in NO.nid node entry when this nid is reused by others. This patch tries to add this inode to orphan list if we fail to truncate inode, so that we can obtain a second chance to release it in orphan recovery flow. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24f2fs: handle f2fs_truncate error correctlyChao Yu
This patch fixes to return error number of f2fs_truncate, so that we can handle the error correctly in callers. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24f2fs: avoid unneeded initializing when converting inline dentryChao Yu
When converting inline dentry, we will zero out target dentry page before duplicating data of inline dentry into target page, it become overhead since inline dentry size is not small. So this patch tries to remove unneeded initializing in the space of target dentry page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24f2fs: atomically set inode->i_flagsZhang Zhen
According to commit 5f16f3225b06 ("ext4: atomically set inode->i_flags in ext4_set_inode_flags()"). Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24f2fs: fix wrong pointer access during try_to_free_nidsJaegeuk Kim
If we release the lock in list_for_each_entry_safe, we can lose the tmp pointer by alloc_nid. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24f2fs: use __GFP_NOFAIL to avoid infinite loopJaegeuk Kim
__GFP_NOFAIL can avoid retrying the whole path of kmem_cache_alloc and bio_alloc. And, it also fixes the use cases of GFP_ATOMIC correctly. Suggested-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-239p: fix return code of read() when count is 0Vincent Bernat
When reading 0 bytes from an empty file on a 9P filesystem, the return code of read() was not 0 as expected due to an unitialized err variable. Tested with this simple program: #include <assert.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> int main(int argc, const char **argv) { assert(argc == 2); char buffer[256]; int fd = open(argv[1], O_RDONLY|O_NOCTTY); assert(fd >= 0); assert(read(fd, buffer, 0) == 0); return 0; } Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2015-08-239p: remove unused option Opt_transFabian Frederick
Commit 8a0dc95fd976 ("9p: transport API reorganization") removed Opt_trans in tokens not in enum. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2015-08-21f2fs: lookup neighbor extent nodes for merging laterChao Yu
In __lookup_extent_tree_ret we will not try to find neighbor nodes if we find the target node, in this condition, we will lost the chance to merge the new mapping with exist extent node later. So our extent cache of inode will be fragmented after overwrite exist file, we can see the number of extent node increases intensively in following test case: dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024 Extent Cache: - Hit Count: L1-1:0 L1-2:0 L2:0 - Hit Ratio: 0% (0 / 3072) - Inner Struct Count: tree: 1, node: 1 dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024 conv=notrunc Extent Cache: - Hit Count: L1-1:2048 L1-2:0 L2:0 - Hit Ratio: 33% (2048 / 6144) - Inner Struct Count: tree: 1, node: 961 This patch fixes to lookup neighbors of target node for further merging. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-21f2fs: split __insert_extent_tree_ret for readabilityChao Yu
This patch splits __insert_extent_tree_ret into __try_merge_extent_node & __insert_extent_tree for code readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-21f2fs: kill dead code in __insert_extent_treeChao Yu
After commit 0f825ee6e873 ("f2fs: add new interfaces for extent tree"), f2fs_init_extent_tree becomes the only caller of __insert_extent_tree, and in f2fs_init_extent_tree, we will only insert extent node in an empty tree, so __try_{back,front}_merge in __insert_extent_tree will never be called. This patch removes these dead codes, besides, rename __insert_extent_tree to __init_extent_tree for readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-21f2fs: adjust showing of extent cache statChao Yu
This patch alters to replace total hit stat with rbtree hit stat, and then adjust showing of extent cache stat: Hit Count: L1-1: for largest node hit count; L1-2: for last cached node hit count; L2: for extent node hit after lookuping in rbtree. Hit Ratio: ratio (hit count / total lookup count) Inner Struct Count: tree count, node count. Before: Extent Hit Ratio: 0 / 2 Extent Tree Count: 3 Extent Node Count: 2 Patched: Exten Cacache: - Hit Count: L1-1:4871 L1-2:2074 L2:208 - Hit Ratio: 1% (7153 / 550751) - Inner Struct Count: tree: 26560, node: 11824 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-21f2fs: add largest/cached stat in extent cacheChao Yu
This patch adds to stat the hit count of largest/cached node for showing in debugfs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-21f2fs: fix incorrect mapping for bmapChao Yu
The test step is like below: 1. touch file 2. truncate -s $((1024*1024)) file 3. fallocate -o 0 -l $((1024*1024)) file 4. fibmap.f2fs file Our result of fibmap.f2fs showed below is not correct: file_pos start_blk end_blk blks 0 -937166132 -937166132 1 4096 -937166132 -937166132 1 8192 -937166132 -937166132 1 12288 -937166132 -937166132 1 16384 -937166132 -937166132 1 20480 -937166132 -937166132 1 ... 1040384 -937166132 -937166132 1 1044480 -937166132 -937166132 1 This is because f2fs_map_blocks will return with no error when meeting a hole or preallocated block, the caller __get_data_block will map the uninitialized variable value to bh->b_blocknr. Unfortunately generic_block_bmap will neither check the return value of get_data() nor check mapping info of buffer_head, result in returning the random block address. After fixing the issue, our result shows correctly: file_pos start_blk end_blk blks 0 0 0 256 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-21f2fs: fix to update cached_en of extent tree properlyFan Li
In f2fs_lookup_extent_tree, et->cached_en was read and updated with only read lock held, it could cause __lookup_extent_tree within return entirely wrong extent_node, if other thread update et->cached_en just before __lookup_extent_tree return. However, there are two things about this patch that need to be noticed: 1. It does no good to arrange the order of concurrent read/write, the result would still be random in such case. 2. It's built on this assumption: the mix up of reads and writes on a single pointer would not make the pointer partially wrong at any time. Please let me know if I'm wrong, thx. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-21f2fs: fix typoJunesung Lee
Fix typo. Signed-off-by: Junesung Lee <junesoung412@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-21btrfs: fix compile when block cgroups are not enabledChris Mason
bio->bi_css and bio->bi_ioc don't exist when block cgroups are not on. This adds an ifdef around them. It's not perfect, but our use of bi_ioc is being removed in the 4.3 merge window. The bi_css usage really should go into bio_clone, but I want to make sure that doesn't introduce problems for other bio_clone use cases. Signed-off-by: Chris Mason <clm@fb.com>
2015-08-21vfs: Test for and handle paths that are unreachable from their mnt_rootEric W. Biederman
In rare cases a directory can be renamed out from under a bind mount. In those cases without special handling it becomes possible to walk up the directory tree to the root dentry of the filesystem and down from the root dentry to every other file or directory on the filesystem. Like division by zero .. from an unconnected path can not be given a useful semantic as there is no predicting at which path component the code will realize it is unconnected. We certainly can not match the current behavior as the current behavior is a security hole. Therefore when encounting .. when following an unconnected path return -ENOENT. - Add a function path_connected to verify path->dentry is reachable from path->mnt.mnt_root. AKA to validate that rename did not do something nasty to the bind mount. To avoid races path_connected must be called after following a path component to it's next path component. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-21dcache: Reduce the scope of i_lock in d_splice_aliasEric W. Biederman
i_lock is only needed until __d_find_any_alias calls dget on the alias dentry. After that the reference to new ensures that dentry_kill and d_delete will not remove the inode from the dentry, and remove the dentry from the inode->d_entry list. The inode i_lock came to be held over the the __d_move calls in d_splice_alias through a series of introduction of locks with increasing smaller scope. First it was the dcache_lock, then it was the dcache_inode_lock, and finally inode->i_lock. Furthermore inode->i_lock is not held over any other calls to d_move or __d_move so it can not provide any meaningful rename protection. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-21dcache: Handle escaped paths in prepend_pathEric W. Biederman
A rename can result in a dentry that by walking up d_parent will never reach it's mnt_root. For lack of a better term I call this an escaped path. prepend_path is called by four different functions __d_path, d_absolute_path, d_path, and getcwd. __d_path only wants to see paths are connected to the root it passes in. So __d_path needs prepend_path to return an error. d_absolute_path similarly wants to see paths that are connected to some root. Escaped paths are not connected to any mnt_root so d_absolute_path needs prepend_path to return an error greater than 1. So escaped paths will be treated like paths on lazily unmounted mounts. getcwd needs to prepend "(unreachable)" so getcwd also needs prepend_path to return an error. d_path is the interesting hold out. d_path just wants to print something, and does not care about the weird cases. Which raises the question what should be printed? Given that <escaped_path>/<anything> should result in -ENOENT I believe it is desirable for escaped paths to be printed as empty paths. As there are not really any meaninful path components when considered from the perspective of a mount tree. So tweak prepend_path to return an empty path with an new error code of 3 when it encounters an escaped path. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-21Merge branch 'superblock-scaling' of ↵Al Viro
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-next Conflicts: include/linux/fs.h
2015-08-20NFSv4.1/pnfs Ensure flexfiles reports all connection related errorsTrond Myklebust
Make sure that we also handle RPC level connection and protocol negotiation errors. Reported-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-20NFSv4.1/pnfs: Ensure the flexfiles layoutstats timers are consistentTrond Myklebust
We want to ensure that the stopwatches for the busy timer and the aggregate timer are consistent. This means that they need to use the same start/stop times. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-20NFS41: fix list splice typePeng Tao
We want to move commiting pages to pages list instead. Otherwise it causes pnfs small writes crash like: [34560.037692] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068 [34560.038557] IP: [<ffffffffa05423d6>] nfs_init_commit+0x26/0x130 [nfs] [34560.039400] PGD 69f5a067 PUD 69f59067 PMD 0 [34560.040207] Oops: 0000 [#1] SMP [34560.041014] Modules linked in: nfsv3(OE) nfs_layout_flexfiles(OE) nfsv4(OE) nfs(OE) fscache(E) rpcsec_gss_krb5(E) xt_addrtype(E) xt_conntrack(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) iptable_filter(E) ip_tables(E) x_tables(E) nf_nat(E) nf_conntrack(E) bridge(E) stp(E) llc(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) ppdev(E) vmw_balloon(E) coretemp(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) cryptd(E) psmouse(E) serio_raw(E) vmw_vmci(E) i2c_piix4(E) shpchp(E) parport_pc(E) parport(E) mac_hid(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) xfs(E) libcrc32c(E) hid_generic(E) usbhid(E) hid(E) e1000(E) mptspi(E) [34560.045106] mptscsih(E) mptbase(E) vmwgfx(E) drm_kms_helper(E) ttm(E) drm(E) autofs4(E) [last unloaded: fscache] [34560.045897] CPU: 0 PID: 130543 Comm: bash Tainted: G OE 4.2.0-rc5-dp-00057-gf993a93 #11 [34560.046699] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014 [34560.047525] task: ffff880031b0a980 ti: ffff880045fec000 task.ti: ffff880045fec000 [34560.048264] RIP: 0010:[<ffffffffa05423d6>] [<ffffffffa05423d6>] nfs_init_commit+0x26/0x130 [nfs] [34560.049000] RSP: 0018:ffff880045fefc18 EFLAGS: 00010246 [34560.049717] RAX: 0000000000000000 RBX: ffff8800208fbc80 RCX: ffff880045fefd50 [34560.050396] RDX: ffff880031c19ec0 RSI: ffff880045fefc88 RDI: ffff8800208fbc80 [34560.051041] RBP: ffff880045fefc28 R08: ffff8800208fbe68 R09: ffff880045fefc88 [34560.051666] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880045fefc78 [34560.052247] R13: ffff880045fefc88 R14: ffff880045fefa90 R15: ffff880045fefd50 [34560.052825] FS: 00007fa02d58c740(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000 [34560.053410] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [34560.053992] CR2: 0000000000000068 CR3: 000000003b37a000 CR4: 00000000001406f0 [34560.054615] Stack: [34560.055200] ffff8800208fbc80 ffff8800208fbc80 ffff880045fefcc8 ffffffffa05c1a5b [34560.055800] ffff880045fefcc8 ffff880045fefd50 0000000045fefcb8 ffff880045fefd40 [34560.056418] ffff8800420608e0 ffffffffa04f3910 0000000100000001 ffff880045fefd50 [34560.057013] Call Trace: [34560.057672] [<ffffffffa05c1a5b>] pnfs_generic_commit_pagelist+0x1cb/0x300 [nfsv4] [34560.058277] [<ffffffffa04f3910>] ? ff_layout_commit_pagelist+0x20/0x20 [nfs_layout_flexfiles] [34560.058907] [<ffffffffa04f3905>] ff_layout_commit_pagelist+0x15/0x20 [nfs_layout_flexfiles] [34560.059557] [<ffffffffa0543fc1>] nfs_generic_commit_list+0xb1/0xf0 [nfs] [34560.060214] [<ffffffffa0543e47>] ? nfs_scan_commit+0x37/0xa0 [nfs] [34560.060825] [<ffffffffa0544081>] nfs_commit_inode+0x81/0x150 [nfs] [34560.061432] [<ffffffffa05443ae>] nfs_wb_all+0x1ae/0x400 [nfs] [34560.062035] [<ffffffffa05380ad>] nfs_getattr+0x33d/0x510 [nfs] [34560.062630] [<ffffffff8122499c>] vfs_getattr_nosec+0x2c/0x40 [34560.063223] [<ffffffff81224a66>] vfs_getattr+0x26/0x30 [34560.063818] [<ffffffff81224b35>] vfs_fstatat+0x65/0xa0 [34560.064413] [<ffffffff81224f3f>] SYSC_newstat+0x1f/0x40 [34560.065016] [<ffffffff8102b176>] ? do_audit_syscall_entry+0x66/0x70 [34560.065626] [<ffffffff8102c773>] ? syscall_trace_enter_phase1+0x113/0x170 [34560.066245] [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19 [34560.066868] [<ffffffff812251ae>] SyS_newstat+0xe/0x10 [34560.067533] [<ffffffff817a5df2>] entry_SYSCALL_64_fastpath+0x16/0x7a [34560.068173] Code: 0f 1f 44 00 00 0f 1f 44 00 00 55 4c 8d 87 e8 01 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 4c 8b 0e 49 8b 41 18 4c 39 ce 48 8b 40 40 <4c> 8b 50 68 74 24 48 8b 87 e8 01 00 00 48 8b 7e 08 4d 89 41 08 [34560.069609] RIP [<ffffffffa05423d6>] nfs_init_commit+0x26/0x130 [nfs] [34560.070295] RSP <ffff880045fefc18> [34560.071008] CR2: 0000000000000068 [34560.073207] ---[ end trace f85f873260977406 ]--- [fixes 27571297a7e(pNFS: Tighten up locking around DS commit buckets)] Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-20pmem, dax: have direct_access use __pmem annotationRoss Zwisler
Update the annotation for the kaddr pointer returned by direct_access() so that it is a __pmem pointer. This is consistent with the PMEM driver and with how this direct_access() pointer is used in the DAX code. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-20dax: update I/O path to do proper PMEM flushingRoss Zwisler
Update the DAX I/O path so that all operations that store data (I/O writes, zeroing blocks, punching holes, etc.) properly synchronize the stores to media using the PMEM API. This ensures that the data DAX is writing is durable on media before the operation completes. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-20f2fs: check the node block address of newly allocated nidJaegeuk Kim
This patch adds a routine which checks the block address of newly allocated nid. If an nid has already allocated by other thread due to subtle data races, it will result in filesystem corruption. So, it needs to check whether its block address was already allocated or not in prior to nid allocation as the last chance. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-20f2fs: go out for insert_inode_locked failureJaegeuk Kim
We should not call unlock_new_inode when insert_inode_locked failed. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-20f2fs: retry gc if one section is not successfully reclaimedJaegeuk Kim
If FG_GC failed to reclaim one section, let's retry with another section from the start, since we can get anoterh good candidate. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-20f2fs: fix to cover lock_op for update_inode_pageJaegeuk Kim
Previously, update_inode_page is not called under f2fs_lock_op. Instead we should call with f2fs_write_inode. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-20f2fs: reuse nids more aggressivelyJaegeuk Kim
If we can reuse nids as many as possible, we can mitigate producing obsolete node pages in the page cache. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-20f2fs: avoid garbage collecting already moved node blocksJaegeuk Kim
If node blocks were already moved, we don't need to move them again. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-20f2fs: handle failed bio allocationJaegeuk Kim
As the below comment of bio_alloc_bioset, f2fs can allocate multiple bios at the same time. So, we can't guarantee that bio is allocated all the time. " * When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be * able to allocate a bio. This is due to the mempool guarantees. To make this * work, callers must never allocate more than 1 bio at a time from this pool. * Callers that need to allocate more than 1 bio must always submit the * previously allocated bio for IO before attempting to allocate a new one. * Failure to do so can cause deadlocks under memory pressure. " Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>