summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-10-03xfs: introduce refcount btree definitionsDarrick J. Wong
Add new per-AG refcount btree definitions to the per-AG structures. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03xfs: define tracepoints for refcount btree activitiesDarrick J. Wong
Define all the tracepoints we need to inspect the refcount btree runtime operation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03xfs: return an error when an inline directory is too smallDarrick J. Wong
If the size of an inline directory is so small that it doesn't even cover the required header size, return an error to userspace instead of ASSERTing and returning 0 like everything's ok. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03vfs: add a FALLOC_FL_UNSHARE mode to fallocate to unshare a range of blocksDarrick J. Wong
Add a new fallocate mode flag that explicitly unshares blocks on filesystems that support such features. The new flag can only be used with an allocate-mode fallocate call. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-10-03ceph: use list_move instead of list_del/list_addWei Yongjun
Using list_move() instead of list_del() + list_add(). Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-03ceph: handle CEPH_SESSION_REJECT messageYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03ceph: avoid accessing / when mounting a subpathYan, Zheng
Accessing / causes failuire if the client has caps that restrict path Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03ceph: fix mandatory flock checkYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03ceph: remove warning when ceph_releasepage() is called on dirty pageNeilBrown
If O_DIRECT writes are racing with buffered writes, then the call to invalidate_inode_pages2_range() can call ceph_releasepage() on dirty pages. Most filesystems hold inode_lock() across O_DIRECT writes so they do not suffer this race, but cephfs deliberately drops the lock, and opens a window for the race. This race can be triggered with the generic/036 test from the xfstests test suite. It doesn't happen every time, but it does happen often. As the possibilty is expected, remove the warning, and instead include the PageDirty() status in the debug message. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-10-03ceph: ignore error from invalidate_inode_pages2_range() in direct writeNeilBrown
This call can fail if there are dirty pages. The preceding call to filemap_write_and_wait_range() will normally remove dirty pages, but as inode_lock() is not held over calls to ceph_direct_read_write(), it could race with non-direct writes and pages could be dirtied immediately after filemap_write_and_wait_range() returns If there are dirty pages, they will be removed by the subsequent call to truncate_inode_pages_range(), so having them here is not a problem. If the 'ret' value is left holding an error, then in the async IO case (aio_req is not NULL) the loop that would normally call ceph_osdc_start_request() will see the error in 'ret' and abort all requests. This doesn't seem like correct behaviour. So use separate 'ret2' instead of overloading 'ret'. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-10-03ceph: fix error handling of start_read()Yan, Zheng
If start_page() fails to add a page to page cache or fails to send OSD request. It should cal put_page() (instead of free_page()) for relevant pages. Besides, start_page() need to cancel fscache readpage if it fails to send OSD request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reported-by: Zhi Zhang <zhang.david2011@gmail.com>
2016-10-03fuse: limit xattr returned sizeMiklos Szeredi
Don't let userspace filesystem give bogus values for the size of xattr and xattr list. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Three sets of overlapping changes. Nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03Merge branch 'xfs-4.9-log-recovery-fixes' into for-nextDave Chinner
2016-10-03Merge branch 'iomap-4.9-dax' into for-nextDave Chinner
2016-10-03Merge branch 'xfs-4.9-delalloc-rework' into for-nextDave Chinner
2016-10-03Merge branch 'xfs-4.9-reflink-prep' into for-nextDave Chinner
2016-10-03Merge branch 'iomap-4.9-misc-fixes-1' into for-nextDave Chinner
2016-10-03xfs: update atime before I/O in xfs_file_dio_aio_readChristoph Hellwig
After the call to __blkdev_direct_IO the final reference to the file might have been dropped by aio_complete already, and the call to file_accessed might cause a use after free. Instead update the access time before the I/O, similar to how we update the time stamps before writes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-and-tested-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-03ext2: fix possible integer truncation in ext2_iomap_beginChristoph Hellwig
For 32-bit architectures we need to cast first_block to u64 before shifting it left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-02UBIFS: improve function-level documentationJulia Lawall
Fix various inconsistencies in the documentation associated with various functions. In the case of fs/ubifs/lprops.c, the second parameter of ubifs_get_lp_stats was renamed from st to lst in commit 84abf972ccff ("UBIFS: add re-mount debugging checks") In the case of fs/ubifs/lpt_commit.c, the excess variables have never existed in the associated functions since the code was introduced into the kernel. The others appear to be straightforward typos. Issues detected using Coccinelle (http://coccinelle.lip6.fr/) Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02ubifs: fix host xattr_len when changing xattrPascal Eberhard
When an extended attribute is changed, xattr_len of host inode is recalculated. ui->data_len is updated before computation and result is wrong. This patch adds a temporary variable to fix computation. To reproduce the issue: ~# > a.txt ~# attr -s an-attr -V a-value a.txt ~# attr -s an-attr -V a-bit-bigger-value a.txt Now host inode xattr_len is wrong. Forcing dbg_check_filesystem() generates the following error: [ 130.620140] UBIFS (ubi0:2): background thread "ubifs_bgt0_2" started, PID 565 [ 131.470790] UBIFS error (ubi0:2 pid 564): check_inodes: inode 646 has xattr size 240, but calculated size is 256 [ 131.481697] UBIFS (ubi0:2): dump of the inode 646 sitting in LEB 29:114688 [ 131.488953] magic 0x6101831 [ 131.492876] crc 0x9fce9091 [ 131.496836] node_type 0 (inode node) [ 131.501193] group_type 1 (in node group) [ 131.505788] sqnum 9278 [ 131.509191] len 160 [ 131.512549] key (646, inode) [ 131.516688] creat_sqnum 9270 [ 131.520133] size 0 [ 131.523264] nlink 1 [ 131.526398] atime 1053025857.0 [ 131.530574] mtime 1053025857.0 [ 131.534714] ctime 1053025906.0 [ 131.538849] uid 0 [ 131.542009] gid 0 [ 131.545140] mode 33188 [ 131.548636] flags 0x1 [ 131.551977] xattr_cnt 1 [ 131.555108] xattr_size 240 [ 131.558420] xattr_names 12 [ 131.561670] compr_type 0x1 [ 131.564983] data len 0 [ 131.568125] UBIFS error (ubi0:2 pid 564): dbg_check_filesystem: file-system check failed with error -22 [ 131.578074] CPU: 0 PID: 564 Comm: mount Not tainted 4.4.12-g3639bea54a #24 [ 131.585352] Hardware name: Generic AM33XX (Flattened Device Tree) [ 131.591918] [<c00151c0>] (unwind_backtrace) from [<c0012acc>] (show_stack+0x10/0x14) [ 131.600177] [<c0012acc>] (show_stack) from [<c01c950c>] (dbg_check_filesystem+0x464/0x4d0) [ 131.608934] [<c01c950c>] (dbg_check_filesystem) from [<c019f36c>] (ubifs_mount+0x14f8/0x2130) [ 131.617991] [<c019f36c>] (ubifs_mount) from [<c00d7088>] (mount_fs+0x14/0x98) [ 131.625572] [<c00d7088>] (mount_fs) from [<c00ed674>] (vfs_kern_mount+0x4c/0xd4) [ 131.633435] [<c00ed674>] (vfs_kern_mount) from [<c00efb5c>] (do_mount+0x988/0xb50) [ 131.641471] [<c00efb5c>] (do_mount) from [<c00f004c>] (SyS_mount+0x74/0xa0) [ 131.648837] [<c00f004c>] (SyS_mount) from [<c000fe20>] (ret_fast_syscall+0x0/0x3c) [ 131.665315] UBIFS (ubi0:2): background thread "ubifs_bgt0_2" stops Signed-off-by: Pascal Eberhard <pascal.eberhard@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02ubifs: Use move variable in ubifs_rename()Richard Weinberger
...to make the code more consistent since we use move already in other places. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02ubifs: Implement RENAME_EXCHANGERichard Weinberger
Adds RENAME_EXCHANGE to UBIFS, the operation itself is completely disjunct from a regular rename() that's why we dispatch very early in ubifs_reaname(). RENAME_EXCHANGE used by the renameat2() system call allows the caller to exchange two paths atomically. Both paths have to exist and have to be on the same filesystem. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02ubifs: Implement RENAME_WHITEOUTRichard Weinberger
Adds RENAME_WHITEOUT support to UBIFS, we implement it in the same way as ext4 and xfs do. For an overview of other ways to implement it please refere to commit 7dcf5c3e4527 ("xfs: add RENAME_WHITEOUT support"). Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-02ubifs: Implement O_TMPFILERichard Weinberger
This patchs adds O_TMPFILE support to UBIFS. A temp file is a reference to an unlinked inode, a user holding the reference can use it. As soon it is being closed all data vanishes. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-01fuse: remove duplicate cs->offset assignmentMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: don't use fuse_ioctl_copy_user() helperMiklos Szeredi
The two invocations share little code. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse_ioctl_copy_user(): don't open-code copy_page_{to,from}_iter()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: Use generic xattr opsSeth Forshee
In preparation for posix acl support, rework fuse to use xattr handlers and the generic setxattr/getxattr/listxattr callbacks. Split the xattr code out into it's own file, and promote symbols to module-global scope as needed. Functionally these changes have no impact, as fuse still uses a single handler for all xattrs which uses the old callbacks. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: get rid of fc->flagsMiklos Szeredi
Only two flags: "default_permissions" and "allow_other". All other flags are handled via bitfields. So convert these two as well. They don't change during the lifetime of the filesystem, so this is quite safe. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: listxattr: verify xattr listMiklos Szeredi
Make sure userspace filesystem is returning a well formed list of xattr names (zero or more nonzero length, null terminated strings). [Michael Theall: only verify in the nonzero size case] Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-10-01fuse: use timespec64Miklos Szeredi
And check for valid nsec value before passing into timespec64_to_jiffies(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: don't use ->d_timeMiklos Szeredi
Store in memory pointed to by ->d_fsdata. Use ->d_init() to allocate the storage. Need to use RCU freeing because the data is used in RCU lookup mode. We could cast ->d_fsdata directly on 64bit archs, but I don't think this is worth the extra complexity. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: Add posix ACL supportSeth Forshee
Add a new INIT flag, FUSE_POSIX_ACL, for negotiating ACL support with userspace. When it is set in the INIT response, ACL support will be enabled. ACL support also implies "default_permissions". When ACL support is enabled, the kernel will cache and have responsibility for enforcing ACLs. ACL xattrs will be passed to userspace, which is responsible for updating the ACLs in the filesystem, keeping the file mode in sync, and inheritance of default ACLs when new filesystem nodes are created. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: handle killpriv in userspace fsMiklos Szeredi
Only userspace filesystem can do the killing of suid/sgid without races. So introduce an INIT flag and negotiate support for this. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: fix killing s[ug]id in setattrMiklos Szeredi
Fuse allowed VFS to set mode in setattr in order to clear suid/sgid on chown and truncate, and (since writeback_cache) write. The problem with this is that it'll potentially restore a stale mode. The poper fix would be to let the filesystems do the suid/sgid clearing on the relevant operations. Possibly some are already doing it but there's no way we can detect this. So fix this by refreshing and recalculating the mode. Do this only if ATTR_KILL_S[UG]ID is set to not destroy performance for writes. This is still racy but the size of the window is reduced. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-10-01fuse: invalidate dir dentry after chmodMiklos Szeredi
Without "default_permissions" the userspace filesystem's lookup operation needs to perform the check for search permission on the directory. If directory does not allow search for everyone (this is quite rare) then userspace filesystem has to set entry timeout to zero to make sure permissions are always performed. Changing the mode bits of the directory should also invalidate the (previously cached) dentry to make sure the next lookup will have a chance of updating the timeout, if needed. Reported-by: Jean-Pierre André <jean-pierre.andre@wanadoo.fr> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-09-30f2fs: introduce update_ckpt_flags to clean upJaegeuk Kim
This patch add update_ckpt_flags() to clean up the flow. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: don't submit irrelevant pageChao Yu
While we call ->writepages, there are two cases: a. we didn't writeout any dirty pages, since they are writebacked by other thread concurrently. b. we writeout dirty pages, and have already submitted bio to block layer. In these cases, we don't need to do additional bio flushing unnecessarily, it may split bio in cache into smaller one. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: fix to commit bio cache after flushing node pagesChao Yu
In sync_node_pages, we won't check and commit last merged pages in private bio cache of f2fs, as these pages were taged as writeback, someone who is waiting for writebacking of the page will be blocked until the cache was committed by someone else. We need to commit node type bio cache to avoid potential deadlock or long delay of waiting writeback. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: introduce get_checkpoint_version for cleanupTiezhu Yang
There exists almost same codes when get the value of pre_version and cur_version in function validate_checkpoint, this patch adds get_checkpoint_version to clean up redundant codes. Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: remove dead variableSheng Yong
Signed-off-by: Sheng Yong <shengyong1@huawei.com> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: remove redundant io plugChao Yu
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: support checkpoint error injectionChao Yu
This patch adds to support checkpoint error injection in f2fs for testing fatal error tolerance, it will be useful that it can simulate abnormal power off by f2fs itself instead of calling godown ioctl by running apps. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: fix to recover old fault injection config in ->remount_fsChao Yu
In ->remount_fs, we didn't recover original fault injection config if we encounter error, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: do fault injection initialization in default_optionsChao Yu
Do fault injection initialization in default_options to keep consistent with other default option configurating. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: remove redundant value definitionYunlei He
This patch remove redundant value definition in build_sit_entries Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: support configuring fault injection per superblockChao Yu
Previously, we only support global fault injection configuration, so that when we configure type/rate of fault injection through sysfs, mount option, it will influence all f2fs partition which is being used. It is not make sence, since it will be not convenient if developer want to test separated partitions with different fault injection rate/type simultaneously, also it's not possible to enable fault injection in one partition and disable fault injection in other one. >From now on, we move global configuration of fault injection in module into per-superblock, hence injection testing can be more flexible. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30f2fs: adjust display format of segment bitChao Yu
Just adjust segment bit info printed in procfs. Before: 1008 5|0 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1009 3|183|0 0 61 20 20 0 0 21 80 c0 2 e4 e 54 0 21 21 17 a 44 d0 28 e4 50 40 30 8 0 2d 32 0 5 b0 80 1 43 2 8e f8 7b 2 25 93 bf e0 73 8e 9a 19 44 60 ff e4 cc e6 8e bf f9 ff 5 3d 31 3d 13 1010 3|1 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 After: 1008 5|0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1009 4|434| ff 7d ff bf d9 3f ff e7 ff bf d7 bf ff bb be ff fb df f7 fb fa bf fb fe bb df dd ff fe ef ff fe ef e2 27 bf ab bf fb df fd bd bf fb db fc ff ff 3f ff ff bf ff 5f db 3f fb fb bf fb bf 4f ff ef 1010 4|422| ff bb fe ff ef d7 ee ff ff fc bf ef 7d eb ec fd fb 3f 97 7f ef ff af ff db ff ff 69 bf ff f6 e7 ff fb f7 7b fb df be ff ff ef f3 fe ff ff df fe f7 fa ff b7 77 be fe fb a9 7f 87 a2 ac c7 ff 75 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>