summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2014-05-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "This adds ctime update in the new cached writeback mode and also fixes/simplifies the mtime update handling. Support for rename flags (aka renameat2) is also added to the userspace API" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add renameat2 support fuse: clear MS_I_VERSION fuse: clear FUSE_I_CTIME_DIRTY flag on setattr fuse: trust kernel i_ctime only fuse: remove .update_time fuse: allow ctime flushing to userspace fuse: fuse: add time_gran to INIT_OUT fuse: add .write_inode fuse: clean up fsync fuse: fuse: fallocate: use file_update_time() fuse: update mtime on open(O_TRUNC) in atomic_o_trunc mode fuse: update mtime on truncate(2) fuse: do not use uninitialized i_mode fuse: fix mtime update error in fsync fuse: check fallocate mode fuse: add __exit to fuse_ctl_cleanup
2014-05-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "First, there is a critical fix for the new primary-affinity function that went into -rc1. The second batch of patches from Zheng fix a range of problems with directory fragmentation, readdir, and a few odds and ends for cephfs" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: reserve caps for file layout/lock MDS requests ceph: avoid releasing caps that are being used ceph: clear directory's completeness when creating file libceph: fix non-default values check in apply_primary_affinity() ceph: use fpos_cmp() to compare dentry positions ceph: check directory's completeness before emitting directory entry
2014-05-06xfs: remote attribute overwrite causes transaction overrunDave Chinner
Commit e461fcb ("xfs: remote attribute lookups require the value length") passes the remote attribute length in the xfs_da_args structure on lookup so that CRC calculations and validity checking can be performed correctly by related code. This, unfortunately has the side effect of changing the args->valuelen parameter in cases where it shouldn't. That is, when we replace a remote attribute, the incoming replacement stores the value and length in args->value and args->valuelen, but then the lookup which finds the existing remote attribute overwrites args->valuelen with the length of the remote attribute being replaced. Hence when we go to create the new attribute, we create it of the size of the existing remote attribute, not the size it is supposed to be. When the new attribute is much smaller than the old attribute, this results in a transaction overrun and an ASSERT() failure on a debug kernel: XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 331 Fix this by keeping the remote attribute value length separate to the attribute value length in the xfs_da_args structure. The enables us to pass the length of the remote attribute to be removed without overwriting the new attribute's length. Also, ensure that when we save remote block contexts for a later rename we zero the original state variables so that we don't confuse the state of the attribute to be removes with the state of the new attribute that we just added. [Spotted by Brain Foster.] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-05-06xfs: initialize default acls for ->tmpfile()Brian Foster
The current tmpfile handler does not initialize default ACLs. Doing so within xfs_vn_tmpfile() makes it roughly equivalent to xfs_vn_mknod(), which is already used as a common create handler. xfs_vn_mknod() does not currently have a mechanism to determine whether to link the file into the namespace. Therefore, further abstract xfs_vn_mknod() into a new xfs_generic_create() handler with a tmpfile parameter. This new handler calls xfs_create_tmpfile() and d_tmpfile() on the dentry when called via ->tmpfile(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-05-05xfs: Fix wrong error codes being returnedFrom: Tuomas Tynkkynen
xfs_{compat_,}attrmulti_by_handle could return an errno with incorrect sign in some cases. While at it, make sure ENOMEM is returned instead of E2BIG if kmalloc fails. Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-05-05xfs: remove dquot hintsDave Chinner
group and project quota hints are currently stored on the user dquot. If we are attaching quotas to the inode, then the group and project dquots are stored as hints on the user dquot to save having to look them up again later. The thing is, the hints are not used for that inode for the rest of the life of the inode - the dquots are attached directly to the inode itself - so the only time the hints are used is when an inode first has dquots attached. When the hints on the user dquot don't match the dquots being attache dto the inode, they are then removed and replaced with the new hints. If a user is concurrently modifying files in different group and/or project contexts, then this leads to thrashing of the hints attached to user dquot. If user quotas are not enabled, then hints are never even used. So, if the hints are used to avoid the cost of the lookup, is the cost of the lookup significant enough to justify the hint infrstructure? Maybe it was once, when there was a global quota manager shared between all XFS filesystems and was hash table based. However, lookups are now much simpler, requiring only a single lock and radix tree lookup local to the filesystem and no hash or LRU manipulations to be made. Hence the cost of lookup is much lower than when hints were implemented. Turns out that benchmarks show that, too, with thir being no differnce in performance when doing file creation workloads as a single user with user, group and project quotas enabled - the hints do not make the code go any faster. In fact, removing the hints shows a 2-3% reduction in the time it takes to create 50 million inodes.... So, let's just get rid of the hints and the complexity around them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-05-05xfs: bulletfproof xfs_qm_scall_trunc_qfiles()Eric Sandeen
Coverity noticed that if we sent junk into xfs_qm_scall_trunc_qfiles(), we could get back an uninitialized error value. So sanitize the flags we will accept, and initialize error anyway for good measure. (This bug may have been introduced via c61a9e39). Should resolve Coverity CID 1163872. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-05-05xfs: fix Q_XQUOTARM ioctlEric Sandeen
The Q_XQUOTARM quotactl was not working properly, because we weren't passing around proper flags. The xfs_fs_set_xstate() ioctl handler used the same flags for Q_XQUOTAON/OFF as well as for Q_XQUOTARM, but Q_XQUOTAON/OFF look for XFS_UQUOTA_ACCT, XFS_UQUOTA_ENFD, XFS_GQUOTA_ACCT etc, i.e. quota type + state, while Q_XQUOTARM looks only for the type of quota, i.e. XFS_DQ_USER, XFS_DQ_GROUP etc. Unfortunately these flag spaces overlap a bit, so we got semi-random results for Q_XQUOTARM; i.e. the value for XFS_DQ_USER == XFS_UQUOTA_ACCT, etc. yeargh. Add a new quotactl op vector specifically for the QUOTARM operation, since it operates with a different flag space. This has been broken more or less forever, AFAICT. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-05-05UBIFS: fix remount error pathArtem Bityutskiy
Dan's "smatch" checker found out that there was a bug in the error path of the 'ubifs_remount_rw()' function. Instead of jumping to the "out" label which cleans-things up, we just returned. This patch fixes the problem. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-05-05xfs: fully support v5 format filesystemsDave Chinner
We have had this code in the kernel for over a year now and have shaken all the known issues out of the code over the past few releases. It's now time to remove the experimental warnings during mount and fully support the new filesystem format in production systems. Remove the experimental warning, and add a version number to the initial "mounting filesystem" message to tell use what type of filesystem is being mounted. Also, remove the temporary inode cluster size output at mount time now we know that this code works fine. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-05-03dcache: don't need rcu in shrink_dentry_list()Miklos Szeredi
Since now the shrink list is private and nobody can free the dentry while it is on the shrink list, we can remove RCU protection from this. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-03more graceful recovery in umount_collect()Al Viro
Start with shrink_dcache_parent(), then scan what remains. First of all, BUG() is very much an overkill here; we are holding ->s_umount, and hitting BUG() means that a lot of interesting stuff will be hanging after that point (sync(2), for example). Moreover, in cases when there had been more than one leak, we'll be better off reporting all of them. And more than just the last component of pathname - %pd is there for just such uses... That was the last user of dentry_lru_del(), so kill it off... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-03don't remove from shrink list in select_collect()Al Viro
If we find something already on a shrink list, just increment data->found and do nothing else. Loops in shrink_dcache_parent() and check_submounts_and_drop() will do the right thing - everything we did put into our list will be evicted and if there had been nothing, but data->found got non-zero, well, we have somebody else shrinking those guys; just try again. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-01Merge git://git.kvack.org/~bcrl/aio-fixesLinus Torvalds
Pull aio fixes from Ben LaHaise: "The first change from Anatol fixes a regression where io_destroy() no longer waits for outstanding aios to complete. The second corrects a memory leak in an error path for vectored aio operations. Both of these bug fixes should be queued up for stable as well" * git://git.kvack.org/~bcrl/aio-fixes: aio: fix potential leak in aio_run_iocb(). aio: block io_destroy() until all context requests are completed
2014-05-01dentry_kill(): don't try to remove from shrink listAl Viro
If the victim in on the shrink list, don't remove it from there. If shrink_dentry_list() manages to remove it from the list before we are done - fine, we'll just free it as usual. If not - mark it with new flag (DCACHE_MAY_FREE) and leave it there. Eventually, shrink_dentry_list() will get to it, remove the sucker from shrink list and call dentry_kill(dentry, 0). Which is where we'll deal with freeing. Since now dentry_kill(dentry, 0) may happen after or during dentry_kill(dentry, 1), we need to recognize that (by seeing DCACHE_DENTRY_KILLED already set), unlock everything and either free the sucker (in case DCACHE_MAY_FREE has been set) or leave it for ongoing dentry_kill(dentry, 1) to deal with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-01aio: fix potential leak in aio_run_iocb().Leon Yu
iovec should be reclaimed whenever caller of rw_copy_check_uvector() returns, but it doesn't hold when failure happens right after aio_setup_vectored_rw(). Fix that in a such way to avoid hairy goto. Signed-off-by: Leon Yu <chianglungyu@gmail.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: stable@vger.kernel.org
2014-04-30expand the call of dentry_lru_del() in dentry_kill()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-30new helper: dentry_free()Al Viro
The part of old d_free() that dealt with actual freeing of dentry. Taken out of dentry_kill() into a separate function. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-30fold try_prune_one_dentry()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-30fold d_kill() and d_free()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-29aio: cleanup: flatten kill_ioctx()Benjamin LaHaise
There is no need to have most of the code in kill_ioctx() indented. Flatten it. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2014-04-29aio: report error from io_destroy() when threads race in io_destroy()Benjamin LaHaise
As reported by Anatol Pomozov, io_destroy() fails to report an error when it loses the race to destroy a given ioctx. Since there is a difference in behaviour between the thread that wins the race (which blocks on outstanding io requests) versus lthe thread that loses (which returns immediately), wire up a return code from kill_ioctx() to the io_destroy() syscall. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Anatol Pomozov <anatol.pomozov@gmail.com>
2014-04-28ceph: reserve caps for file layout/lock MDS requestsYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-28ceph: avoid releasing caps that are being usedYan, Zheng
To avoid releasing caps that are being used, encode_inode_release() should send implemented caps to MDS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-28ceph: clear directory's completeness when creating fileYan, Zheng
When creating a file, ceph_set_dentry_offset() puts the new dentry at the end of directory's d_subdirs, then set the dentry's offset based on directory's max offset. The offset does not reflect the real postion of the dentry in directory. Later readdir reply from MDS may change the dentry's position/offset. This inconsistency can cause missing/duplicate entries in readdir result if readdir is partly satisfied by dcache_readdir(). The fix is clear directory's completeness after creating/renaming file. It prevents later readdir from using dcache_readdir(). Fixes: http://tracker.ceph.com/issues/8025 Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-28ceph: use fpos_cmp() to compare dentry positionsYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-28ceph: check directory's completeness before emitting directory entryYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-28fuse: add renameat2 supportMiklos Szeredi
Support RENAME_EXCHANGE and RENAME_NOREPLACE flags on the userspace ABI. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: clear MS_I_VERSIONMiklos Szeredi
Fuse doesn't support i_version (yet). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: clear FUSE_I_CTIME_DIRTY flag on setattrMaxim Patlasov
The patch addresses two use-cases when the flag may be safely cleared: 1. fuse_do_setattr() is called with ATTR_CTIME flag set in attr->ia_valid. In this case attr->ia_ctime bears actual value. In-kernel fuse must send it to the userspace server and then assign the value to inode->i_ctime. 2. fuse_do_setattr() is called with ATTR_SIZE flag set in attr->ia_valid, whereas ATTR_CTIME is not set (truncate(2)). In this case in-kernel fuse must sent "now" to the userspace server and then assign the value to inode->i_ctime. In both cases we could clear I_DIRTY_SYNC, but that needs more thought. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: trust kernel i_ctime onlyMaxim Patlasov
Let the kernel maintain i_ctime locally: update i_ctime explicitly on truncate, fallocate, open(O_TRUNC), setxattr, removexattr, link, rename, unlink. The inode flag I_DIRTY_SYNC serves as indication that local i_ctime should be flushed to the server eventually. The patch sets the flag and updates i_ctime in course of operations listed above. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: remove .update_timeMiklos Szeredi
This implements updating ctime as well as mtime on file_update_time(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: allow ctime flushing to userspaceMaxim Patlasov
The patch extends fuse_setattr_in, and extends the flush procedure (fuse_flush_times()) called on ->write_inode() to send the ctime as well as mtime. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: fuse: add time_gran to INIT_OUTMiklos Szeredi
Allow userspace fs to specify time granularity. This is needed because with writeback_cache mode the kernel is responsible for generating mtime and ctime, but if the underlying filesystem doesn't support nanosecond granularity then the cache will contain a different value from the one stored on the filesystem resulting in a change of times after a cache flush. Make the default granularity 1s. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: add .write_inodeMiklos Szeredi
...and flush mtime from this. This allows us to use the kernel infrastructure for writing out dirty metadata (mtime at this point, but ctime in the next patches and also maybe atime). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: clean up fsyncMiklos Szeredi
Don't need to start I/O twice (once without i_mutex and one within). Also make sure that even if the userspace filesystem doesn't support FSYNC we do all the steps other than sending the message. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: fuse: fallocate: use file_update_time()Miklos Szeredi
in preparation for getting rid of FUSE_I_MTIME_DIRTY. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: update mtime on open(O_TRUNC) in atomic_o_trunc modeMaxim Patlasov
In case of fc->atomic_o_trunc is set, fuse does nothing in fuse_do_setattr() while handling open(O_TRUNC). Hence, i_mtime must be updated explicitly in fuse_finish_open(). The patch also adds extra locking encompassing open(O_TRUNC) operation to avoid races between the truncation and updating i_mtime. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: update mtime on truncate(2)Maxim Patlasov
Handling truncate(2), VFS doesn't set ATTR_MTIME bit in iattr structure; only ATTR_SIZE bit is set. In-kernel fuse must handle the case by setting mtime fields of struct fuse_setattr_in to "now" and set FATTR_MTIME bit even though ATTR_MTIME was not set. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: do not use uninitialized i_modeMaxim Patlasov
When inode is in I_NEW state, inode->i_mode is not initialized yet. Do not use it before fuse_init_inode() is called. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: fix mtime update error in fsyncMiklos Szeredi
Bad case of shadowing. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: check fallocate modeMiklos Szeredi
Don't allow new fallocate modes until we figure out what (if anything) that takes. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28fuse: add __exit to fuse_ctl_cleanupFabian Frederick
fuse_ctl_cleanup is only called by __exit fuse_exit Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28GFS2: lops.c: replace 0 by NULL for pointersFabian Frederick
Sparse warning: fs/gfs2/lops.c:78:29: "warning: Using plain integer as NULL pointer" Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-04-27Merge 3.15-rc3 into staging-nextGreg Kroah-Hartman
2014-04-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: limit the path size in send to PATH_MAX Btrfs: correctly set profile flags on seqlock retry Btrfs: use correct key when repeating search for extent item Btrfs: fix inode caching vs tree log Btrfs: fix possible memory leaks in open_ctree() Btrfs: avoid triggering bug_on() when we fail to start inode caching task Btrfs: move btrfs_{set,clear}_and_info() to ctree.h btrfs: replace error code from btrfs_drop_extents btrfs: Change the hole range to a more accurate value. btrfs: fix use-after-free in mount_subvol()
2014-04-27Merge tag 'driver-core-3.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some kernfs fixes for 3.15-rc3 that resolve some reported problems. Nothing huge, but all needed" * tag 'driver-core-3.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: s390/ccwgroup: Fix memory corruption kernfs: add back missing error check in kernfs_fop_mmap() kernfs: fix a subdir count leak
2014-04-26Btrfs: limit the path size in send to PATH_MAXChris Mason
fs_path_ensure_buf is used to make sure our path buffers for send are big enough for the path names as we construct them. The buffer size is limited to 32K by the length field in the struct. But bugs in the path construction can end up trying to build a huge buffer, and we'll do invalid memmmoves when the buffer length field wraps. This patch is step one, preventing the overflows. Signed-off-by: Chris Mason <clm@fb.com>
2014-04-25Merge tag 'locks-v3.15-2' of git://git.samba.org/jlayton/linuxLinus Torvalds
Pull file locking fixes from Jeff Layton: "File locking related bugfixes for v3.15 (pile #2) - fix for a long-standing bug in __break_lease that can cause soft lockups - renaming of file-private locks to "open file description" locks, and the command macros to more visually distinct names The fix for __break_lease is also in the pile of patches for which Bruce sent a pull request, but I assume that your merge procedure will handle that correctly. For the other patches, I don't like the fact that we need to rename this stuff at this late stage, but it should be settled now (hopefully)" * tag 'locks-v3.15-2' of git://git.samba.org/jlayton/linux: locks: rename FL_FILE_PVT and IS_FILE_PVT to use "*_OFDLCK" instead locks: rename file-private locks to "open file description locks" locks: allow __break_lease to sleep even when break_time is 0
2014-04-25Merge branch 'for-3.15' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd bugfixes from Bruce Fields: "Three small nfsd bugfixes (including one locks.c fix for a bug triggered only from nfsd). Jeff's patches are for long-existing problems that became easier to trigger since the addition of vfs delegation support" * 'for-3.15' of git://linux-nfs.org/~bfields/linux: Revert "nfsd4: fix nfs4err_resource in 4.1 case" nfsd: set timeparms.to_maxval in setup_callback_client locks: allow __break_lease to sleep even when break_time is 0