summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-12-09pNFS/flexfiles: Remove a redundant parameter in ff_layout_encode_ioerr()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-09vfs: refactor clone/dedupe_file_range common functionsDarrick J. Wong
Hoist both the XFS reflink inode state and preparation code and the XFS file blocks compare functions into the VFS so that ocfs2 can take advantage of it for reflink and dedupe. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-12-09fs: try to clone files first in vfs_copy_file_rangeChristoph Hellwig
A clone is a perfectly fine implementation of a file copy, so most file systems just implement the copy that way. Instead of duplicating this logic move it to the VFS. Currently btrfs and XFS implement copies the same way as clones and there is no behavior change for them, cifs only implements clones and grow support for copy_file_range with this patch. NFS implements both, so this will allow copy_file_range to work on servers that only implement CLONE and be lot more efficient on servers that implements CLONE and COPY. Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-12-09Merge tag 'ceph-for-4.9-rc9' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fix from Ilya Dryomov: "A fix for an issue with ->d_revalidate() in ceph, causing frequent kernel crashes. Marked for stable - it goes back to 4.6, but started popping up only in 4.8" * tag 'ceph-for-4.9-rc9' of git://github.com/ceph/ceph-client: ceph: don't set req->r_locked_dir in ceph_d_revalidate
2016-12-09vfs: make generic_readlink() staticMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-09vfs: remove ".readlink = generic_readlink" assignmentsMiklos Szeredi
If .readlink == NULL implies generic_readlink(). Generated by: to_del="\.readlink.*=.*generic_readlink" for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-09vfs: default to generic_readlink()Miklos Szeredi
If i_op->readlink is NULL, but i_op->get_link is set then vfs_readlink() defaults to calling generic_readlink(). The IOP_DEFAULT_READLINK flag indicates that the above conditions are met and the default action can be taken. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-09vfs: replace calling i_op->readlink with vfs_readlink()Miklos Szeredi
Also check d_is_symlink() in callers instead of inode->i_op->readlink because following patches will allow NULL ->readlink for symlinks. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-09proc/self: use generic_readlinkMiklos Szeredi
The /proc/self and /proc/self-thread symlinks have separate but identical functionality for reading and following. This cleanup utilizes generic_readlink to remove the duplication. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-09ecryptfs: use vfs_get_link()Miklos Szeredi
Here again we are copying form one buffer to another, while jumping through hoops to make kernel memory look like userspace memory. For no good reason, since vfs_get_link() provides exactly what is needed. As a bonus, now the security hook for readlink is also called on the underlying inode. Note: this can be called from link-following context. But this is okay: - not in RCU mode - commit e54ad7f1ee26 ("proc: prevent stacking filesystems on top") - ecryptfs is *reading* the underlying symlink not following it, so the right security hook is being called Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Tyler Hicks <tyhicks@canonical.com>
2016-12-09Btrfs: don't WARN() in btrfs_transaction_abort() for IO errorsChris Mason
btrfs_transaction_abort() has a WARN() to help us nail down whatever problem lead to the abort. But most of the time, we're aborting for EIO, and the warning just adds noise. Signed-off-by: Chris Mason <clm@fb.com>
2016-12-09bad_inode: add missing i_op initializersMiklos Szeredi
New inode operations were forgotten to be added to bad_inode. Most of the time the op is checked for NULL before being called but marking the inode bad and the check can race (very unlikely). However in case of ->get_link() only DCACHE_SYMLINK_TYPE is checked before calling the op, so there's no race and will definitely oops when trying to follow links on such a beast. Also remove comments about extinct ops. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-12-09Merge branch 'xfs-4.10-misc-fixes-4' into for-nextDave Chinner
2016-12-09xfs: nuke unused tracepoint definitionsEric Sandeen
This is all unused code, so remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-09xfs: use GPF_NOFS when allocating btree cursorsDarrick J. Wong
Use NOFS for allocating btree cursors, since they can be called under the ilock. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-09xfs: use xfs_vn_setattr_size to check on new sizeEryu Guan
Commit 6552321831dc ("xfs: remove i_iolock and use i_rwsem in the VFS inode instead") introduced a regression that truncate(2) doesn't check on new size, so it succeeds even if the new size exceeds the current resource limit. Because xfs_setattr_size() was used instead of xfs_vn_setattr_size(), and the latter calls xfs_vn_change_ok() first to do sanity check on permission and new size. This is found by truncate03 test from ltp, and the following is a simplified reproducer: #!/bin/bash dev=/dev/sda5 mnt=/mnt/xfs mkfs -t xfs -f $dev mount $dev $mnt # set max file size to 16k ulimit -f 16 truncate -s $((16 * 1024 + 1)) /mnt/xfs/testfile [ $? -eq 0 ] && echo "FAIL: truncate exceeded max file size" ulimit -f unlimited umount $mnt Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-09xfs: deprecate barrier/nobarrier mount optionDave Chinner
We always perform integrity operations now, so these mount options don't do anything. Deprecate them and mark them for removal in in a year. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-09xfs: Always flush caches when integrity is requiredDave Chinner
There is no reason anymore for not issuing device integrity operations when teh filesystem requires ordering or data integrity guarantees. We should always issue cache flushes and FUA writes where necessary and let the underlying storage optimise them as necessary for correct integrity operation. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-09xfs: ignore leaf attr ichdr.count in verifier during log replayEric Sandeen
When we create a new attribute, we first create a shortform attribute, and try to fit the new attribute into it. If that fails, we copy the (empty) attribute into a leaf attribute, and do the copy again. Thus there can be a transient state where we have an empty leaf attribute. If we encounter this during log replay, the verifier will fail. So add a test to ignore this part of the leaf attr verification during log replay. Thanks as usual to dchinner for spotting the problem. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-08pNFS/flexfiles: Fix a deadlock on LAYOUTGETFred Isaman
We encountered a deadlock where the SEQUENCE that accompanied the LAYOUTGET triggered a session drain, while ff_layout_alloc_lseg triggered a GETDEVICEINFO. The GETDEVICEINFO hung waiting for the session drain, while the LAYOUTGET held the slot waiting for alloc_lseg to finish. Avoid this by moving the call to nfs4_find_get_deviceid out of ff_layout_alloc_lseg and into nfs4_ff_layout_prepare_ds. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> [dros@primarydata.com: pNFS/flexfiles: fix races in ff_layout_mirror_valid] Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-08ceph: don't set req->r_locked_dir in ceph_d_revalidateJeff Layton
This function sets req->r_locked_dir which is supposed to indicate to ceph_fill_trace that the parent's i_rwsem is locked for write. Unfortunately, there is no guarantee that the dir will be locked when d_revalidate is called, so we really don't want ceph_fill_trace to do any dcache manipulation from this context. Clear req->r_locked_dir since it's clearly not safe to do that. What we really want to know with d_revalidate is whether the dentry still points to the same inode. ceph_fill_trace installs a pointer to the inode in req->r_target_inode, so we can just compare that to d_inode(dentry) to see if it's the same one after the lookup. Also, since we aren't generally interested in the parent here, we can switch to using a GETATTR to hint that to the MDS, which also means that we only need to reserve one cap. Finally, just remove the d_unhashed check. That's really outside the purview of a filesystem's d_revalidate. If the thing became unhashed while we're checking it, then that's up to the VFS to handle anyway. Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry") Link: http://tracker.ceph.com/issues/18041 Reported-by: Donatas Abraitis <donatas.abraitis@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-07f2fs: fix to access nullified flush_cmd_control pointerJaegeuk Kim
f2fs_sync_file() remount_ro - f2fs_readonly - destroy_flush_cmd_control - f2fs_issue_flush - no fcc pointer! So, this patch doesn't free fcc in this case, but just stop its kernel thread which sends flush commands. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-07vfs: fix put_compat_statfs64() does not handle errorsLi Wang
put_compat_statfs64() does NOT return -1 and setting errno to EOVERFLOW when some variables(like: f_bsize) overflowed in the returned struct. The reason is that the ubuf->f_blocks is __u64 type, it couldn't be 4bits as the judgement in put_comat_statfs64(). Here correct the __u32 variables(in struct compat_statfs64) for comparison. reproducer: step1. mount hugetlbfs with two different pagesize on ppc64 arch. $ hugeadm --pool-pages-max 16M:0 $ hugeadm --create-mount $ mount | grep -i hugetlbfs none on /var/lib/hugetlbfs/pagesize-16MB type hugetlbfs (rw,relatime,seclabel,pagesize=16777216) none on /var/lib/hugetlbfs/pagesize-16GB type hugetlbfs (rw,relatime,seclabel,pagesize=17179869184) step2. compile & run this C program. $ cat statfs64_test.c #define _LARGEFILE64_SOURCE #include <stdio.h> #include <sys/syscall.h> #include <sys/statfs.h> int main() { struct statfs64 sb; int err; err = syscall(SYS_statfs64, "/var/lib/hugetlbfs/pagesize-16GB", sizeof(sb), &sb); if (err) return -1; printf("sizeof f_bsize = %d, f_bsize=%ld\n", sizeof(sb.f_bsize), sb.f_bsize); return 0; } $ gcc -m32 statfs64_test.c $ ./a.out sizeof f_bsize = 4, f_bsize=0 Signed-off-by: Li Wang <liwang@redhat.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-07f2fs: free meta pages if sanity check for ckpt is failedJaegeuk Kim
This fixes missing freeing meta pages in the error case. Tested-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-07f2fs: detect wrong layoutJaegeuk Kim
Previous mkfs.f2fs allows small partition inappropriately, so f2fs should detect that as well. Refer this in f2fs-tools. mkfs.f2fs: detect small partition by overprovision ratio and # of segments Reported-and-Tested-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-07pNFS: Layoutreturn must free the layout after the layout-private dataTrond Myklebust
The layout-private data may depend on the layout and/or the inode still existing when it does post-processing and frees its data, so we need to free them after calling lrp->ld_private.ops->free(). This fixes a mirror list corruption issue in the flexfiles driver. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-07pNFS/flexfiles: Fix ff_layout_add_ds_error_locked()Trond Myklebust
When we're merging an old entry into our new entry, we want to ensure that we add the list entry in the correct place. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-07NFSv4: Add missing nfs_put_lock_context()NeilBrown
Otherwise the lock context won't be freed when we're done with it. From: NeilBrown <neilb@suse.com> Fixes: 5bd3f817 ("NFSv4: change nfs4_select_rw_stateid to take a lock_context inplace of lock_owner") Signed-off-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-07ext2: reject inodes with negative sizeDarrick J. Wong
Don't load an inode with a negative size; this causes integer overflow problems in the VFS. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-12-07Merge branch 'xfs-4.10-misc-fixes-3' into for-nextDave Chinner
2016-12-07xfs: use rhashtable to track buffer cacheLucas Stach
On filesystems with a lot of metadata and in metadata intensive workloads xfs_buf_find() is showing up at the top of the CPU cycles trace. Most of the CPU time is spent on CPU cache misses while traversing the rbtree. As the buffer cache does not need any kind of ordering, but fast lookups a hashtable is the natural data structure to use. The rhashtable infrastructure provides a self-scaling hashtable implementation and allows lookups to proceed while the table is going through a resize operation. This reduces the CPU-time spent for the lookups to 1/3 even for small filesystems with a relatively small number of cached buffers, with possibly much larger gains on higher loaded filesystems. [dchinner: reduce minimum hash size to an acceptable size for large filesystems with many AGs with no active use.] [dchinner: remove stale rbtree asserts.] [dchinner: use xfs_buf_map for compare function argument.] [dchinner: make functions static.] [dchinner: remove redundant comments.] Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-06fuse: fix clearing suid, sgid for chown()Miklos Szeredi
Basically, the pjdfstests set the ownership of a file to 06555, and then chowns it (as root) to a new uid/gid. Prior to commit a09f99eddef4 ("fuse: fix killing s[ug]id in setattr"), fuse would send down a setattr with both the uid/gid change and a new mode. Now, it just sends down the uid/gid change. Technically this is NOTABUG, since POSIX doesn't _require_ that we clear these bits for a privileged process, but Linux (wisely) has done that and I think we don't want to change that behavior here. This is caused by the use of should_remove_suid(), which will always return 0 when the process has CAP_FSETID. In fact we really don't need to be calling should_remove_suid() at all, since we've already been indicated that we should remove the suid, we just don't want to use a (very) stale mode for that. This patch should fix the above as well as simplify the logic. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: a09f99eddef4 ("fuse: fix killing s[ug]id in setattr") Cc: <stable@vger.kernel.org> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-12-06btrfs: opencode chunk locking, remove helpersDavid Sterba
The helpers are trivial and we don't use them consistently. Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: remove root parameter from transaction commit/end routinesJeff Mahoney
Now we only use the root parameter to print the root objectid in a tracepoint. We can use the root parameter from the transaction handle for that. It's also used to join the transaction with async commits, so we remove the comment that it's just for checking. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: split btrfs_wait_marked_extents into normal and tree log functionsJeff Mahoney
btrfs_write_and_wait_marked_extents and btrfs_sync_log both call btrfs_wait_marked_extents, which provides a core loop and then handles errors differently based on whether it's it's a log root or not. This means that btrfs_write_and_wait_marked_extents needs to take a root because btrfs_wait_marked_extents requires one, even though it's only used to determine whether the root is a log root. The log root code won't ever call into the transaction commit code using a log root, so we can factor out the core loop and provide the error handling appropriate to each waiter in new routines. This allows us to eventually remove the root argument from btrfs_commit_transaction, and as a result, btrfs_end_transaction. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: take an fs_info directly when the root is not used otherwiseJeff Mahoney
There are loads of functions in btrfs that accept a root parameter but only use it to obtain an fs_info pointer. Let's convert those to just accept an fs_info pointer directly. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: simplify btrfs_wait_cache_io prototypeJeff Mahoney
With the exception of the one case where btrfs_wait_cache_io is called without a block group, it's called with the same arguments. The root argument is only used in the special case, so let's factor out the core and simplify the call in the normal case to require a trans, block group, and path. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: convert extent-tree tracepoints to use fs_infoJeff Mahoney
The extent-tree tracepoints all operate on the extent root, regardless of which root is passed in. Let's just use the extent root objectid instead. If it turns out that nobody is depending on the format of this tracepoint, we can drop the root printing entirely. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: root->fs_info cleanup, access fs_info->delayed_root directlyJeff Mahoney
This results in btrfs_assert_delayed_root_empty and btrfs_destroy_delayed_inode taking an fs_info instead of a root. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: root->fs_info cleanup, add fs_info convenience variablesJeff Mahoney
In routines where someptr->fs_info is referenced multiple times, we introduce a convenience variable. This makes the code considerably more readable. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: root->fs_info cleanup, update_block_group{,flags}Jeff Mahoney
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: root->fs_info cleanup, lock/unlock_chunksJeff Mahoney
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_sizeJeff Mahoney
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: pull node/sector/stripe sizes out of root and into fs_infoJeff Mahoney
We track the node sizes per-root, but they never vary from the values in the superblock. This patch messes with the 80-column style a bit, but subsequent patches to factor out root->fs_info into a convenience variable fix it up again. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: root->fs_info cleanup, io_ctl_initJeff Mahoney
The io_ctl->root member was only being used to access root->fs_info. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: root->fs_info cleanup, use fs_info->dev_root everywhereJeff Mahoney
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: struct reada_control.root -> reada_control.fs_infoJeff Mahoney
The root is never used. We substitute extent_root in for the reada_find_extent call, since it's only ever used to obtain the node size. This call site will be changed to use fs_info in a later patch. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: struct btrfsic_state->root should be an fs_infoJeff Mahoney
The root member is never used except for obtaining an fs_info pointer. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: alloc_reserved_file_extent trace point should use extent_rootJeff Mahoney
Even though a separate root is passed in, we're still operating on the extent root. Let's use that for the trace point. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06btrfs: btrfs_init_new_device should use fs_info->dev_rootJeff Mahoney
btrfs_init_new_device only uses the root passed in via the ioctl to start the transaction. Nothing else that happens is related to whatever root the user used to initiate the ioctl. We can drop the root requirement and just use fs_info->dev_root instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>