summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-03-29fold dentry_lock_for_move() into its sole caller and clean it upAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29make non-exchanging __d_move() copy ->d_parent rather than swap themAl Viro
Currently d_move(from, to) does the following: * name/parent of from <- old name/parent of to, from hashed there * to is unhashed * name of to is preserved * if from used to be detached, to gets detached * if from used to be attached, parent of to <- old parent of from. That's both user-visibly bogus and complicates reasoning a lot. Much saner semantics would be * name/parent of from <- name/parent of to, from hashed there. * to is unhashed * name/parent of to is unchanged. The price, of course, is that old parent of from might lose a reference. However, * all potentially cross-directory callers of d_move() have both parents pinned directly; typically, dentries themselves are grabbed only after we have grabbed and locked both parents. IOW, the decrement of old parent's refcount in case of d_move() won't reach zero. * __d_move() from d_splice_alias() is done to detached alias. No refcount decrements in that case * __d_move() from __d_unalias() *can* get the refcount to zero. So let's grab a reference to alias' old parent before calling __d_unalias() and dput() it after we'd dropped rename_lock. That does make d_splice_alias() potentially blocking. However, it has no callers in non-sleepable contexts (and the case where we'd grown that dget/dput pair is _very_ rare, so performance is not an issue). Another thing that needs adjustment is unlocking in the end of __d_move(); folded it in. And cleaned the remnants of bogus ordering from the "lock them in the beginning" counterpart - it's never been right and now (well, for 7 years now) we have that thing always serialized on rename_lock anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29debugfs_lookup(): switch to lookup_one_len_unlocked()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29fold lookup_real() into __lookup_hash()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29split d_path() and friends into a separate fileAl Viro
Those parts of fs/dcache.c are pretty much self-contained. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29dcache.c: trim includesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29fs/dcache: Avoid a try_lock loop in shrink_dentry_list()John Ogness
shrink_dentry_list() holds dentry->d_lock and needs to acquire dentry->d_inode->i_lock. This cannot be done with a spin_lock() operation because it's the reverse of the regular lock order. To avoid ABBA deadlocks it is done with a trylock loop. Trylock loops are problematic in two scenarios: 1) PREEMPT_RT converts spinlocks to 'sleeping' spinlocks, which are preemptible. As a consequence the i_lock holder can be preempted by a higher priority task. If that task executes the trylock loop it will do so forever and live lock. 2) In virtual machines trylock loops are problematic as well. The VCPU on which the i_lock holder runs can be scheduled out and a task on a different VCPU can loop for a whole time slice. In the worst case this can lead to starvation. Commits 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") and 046b961b45f9 ("shrink_dentry_list(): take parent's d_lock earlier") are addressing exactly those symptoms. Avoid the trylock loop by using dentry_kill(). When pruning ancestors, the same code applies that is used to kill a dentry in dput(). This also has the benefit that the locking order is now the same. First the inode is locked, then the parent. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29get rid of trylock loop around dentry_kill()Al Viro
In case when trylock in there fails, deal with it directly in dentry_kill(). Note that in cases when we drop and retake ->d_lock, we need to recheck whether to retain the dentry. Another thing is that dropping/retaking ->d_lock might have ended up with negative dentry turning into positive; that, of course, can happen only once... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29handle move to LRU in retain_dentry()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29dput(): consolidate the "do we need to retain it?" into an inlined helperAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29split the slow part of lock_parent() offAl Viro
Turn the "trylock failed" part into uninlined __lock_parent(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29now lock_parent() can't run into killed dentryAl Viro
all remaining callers hold either a reference or ->i_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29get rid of trylock loop in locking dentries on shrink listAl Viro
In case of trylock failure don't re-add to the list - drop the locks and carefully get them in the right order. For shrink_dentry_list(), somebody having grabbed a reference to dentry means that we can kick it off-list, so if we find dentry being modified under us we don't need to play silly buggers with retries anyway - off the list it is. The locking logics taken out into a helper of its own; lock_parent() is no longer used for dentries that can be killed under us. [fix from Eric Biggers folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29ext4: limit xattr size to INT_MAXEric Biggers
ext4 isn't validating the sizes of xattrs where the value of the xattr is stored in an external inode. This is problematic because ->e_value_size is a u32, but ext4_xattr_get() returns an int. A very large size is misinterpreted as an error code, which ext4_get_acl() translates into a bogus ERR_PTR() for which IS_ERR() returns false, causing a crash. Fix this by validating that all xattrs are <= INT_MAX bytes. This issue has been assigned CVE-2018-1095. https://bugzilla.kernel.org/show_bug.cgi?id=199185 https://bugzilla.redhat.com/show_bug.cgi?id=1560793 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
2018-03-29gfs2: time journal recovery steps accuratelyAbhi Das
This patch spits out the time taken by the various steps in the journal recover process. Previously, the journal recovery time didn't account for finding the journal head in the log which takes up a significant portion of time. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-29xfs: do not log/recover swapext extent owner changes for deleted inodesEric Sandeen
Today if we run xfs_fsr and crash[1], log replay can fail because the recovery code tries to instantiate the donor inode from disk to replay the swapext, but it's been deleted and we get verifier failures when we try to read the inode off disk with i_mode == 0. This fixes both sides: We don't log the swapext change if the inode has been deleted, and we don't try to recover it either. [1] or if systemd doesn't cleanly unmount root, as it is wont to do ... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-29gfs2: Zero out fallocated blocks in fallocate_chunkAndreas Gruenbacher
Instead of zeroing out fallocated blocks in gfs2_iomap_alloc, zero them out in fallocate_chunk, much higher up the call stack. This gets rid of gfs2's abuse of the IOMAP_ZERO flag as well as the gfs2 specific zeronew buffer flag. I can't think of a reason why zeroing out the blocks in gfs2_iomap_alloc would have any benefits: there is no additional locking at that level that would add protection to the newly allocated blocks. While at it, change fallocate over from gs2_block_map to gfs2_iomap_begin. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de>
2018-03-28f2fs: reserve bits for fs-verityEric Biggers
Reserve an F2FS feature flag and inode flag for fs-verity. This is an in-development feature that is planned be discussed at LSF/MM 2018 [1]. It will provide file-based integrity and authenticity for read-only files. Most code will be in a filesystem-independent module, with smaller changes needed to individual filesystems that opt-in to supporting the feature. An early prototype supporting F2FS is available [2]. Reserving the F2FS on-disk bits for fs-verity will prevent users of the prototype from conflicting with other new F2FS features. Note that we're reserving the inode flag in f2fs_inode.i_advise, which isn't really appropriate since it's not a hint or advice. But ->i_advise is already being used to hold the 'encrypt' flag; and F2FS's ->i_flags uses the generic FS_* values, so it seems ->i_flags can't be used for an F2FS-specific flag without additional work to remove the assumption that ->i_flags uses the generic flags namespace. [1] https://marc.info/?l=linux-fsdevel&m=151690752225644 [2] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/log/?h=fs-verity-dev Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-28Merge 4.16-rc7 into char-misc-nextGreg Kroah-Hartman
We want the hyperv fix in here for merging and testing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28fs: move I_DIRTY_INODE to fs.hChristoph Hellwig
And use it in a few more places rather than opencoding the values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) callChristoph Hellwig
I_DIRTY_DATASYNC is a strict superset of I_DIRTY_SYNC semantics, as in mark dirty to be written out by fdatasync as well. So dirtying for both flags makes no sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) callChristoph Hellwig
I_DIRTY_DATASYNC is a strict superset of I_DIRTY_SYNC semantics, as in mark dirty to be written out by fdatasync as well. So dirtying for both flags makes no sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) callsChristoph Hellwig
I_DIRTY_DATASYNC is a strict superset of I_DIRTY_SYNC semantics, as in mark dirty to be written out by fdatasync as well. So dirtying for both flags makes no sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28fs: fold open_check_o_direct into do_dentry_openChristoph Hellwig
do_dentry_open is where we do the actual open of the file, so this is where we should do our O_DIRECT sanity check to cover all potential callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-27f2fs: Add a segment type check in inplace writeYunlei He
This patch add a segment type check in IPU, in case of something wrong with blkadd in dnode. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27f2fs: no need to initialize zero value for GFP_F2FS_ZEROYunlong Song
Since f2fs_inode_info is allocated with flag GFP_F2FS_ZERO, so we do not need to initialize zero value for its member any more. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27f2fs: don't track new nat entry in nat setChao Yu
Nat entry set is used only in checkpoint(), and during checkpoint() we won't flush new nat entry with unallocated address, so we don't need to add new nat entry into nat set, then nat_entry_set::entry_cnt can indicate actual entry count we need to flush in checkpoint(). Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27f2fs: clean up with F2FS_BLK_ALIGNChao Yu
Clean up F2FS_BYTES_TO_BLK(x + F2FS_BLKSIZE - 1) with F2FS_BLK_ALIGN(x). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27rxrpc, afs: Use debug_ids rather than pointers in tracesDavid Howells
In rxrpc and afs, use the debug_ids that are monotonically allocated to various objects as they're allocated rather than pointers as kernel pointers are now hashed making them less useful. Further, the debug ids aren't reused anywhere nearly as quickly. In addition, allow kernel services that use rxrpc, such as afs, to take numbers from the rxrpc counter, assign them to their own call struct and pass them in to rxrpc for both client and service calls so that the trace lines for each will have the same ID tag. Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-27net: Drop pernet_operations::asyncKirill Tkhai
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: Convert nfsd_net_opsKirill Tkhai
These pernet_operations look similar to rpcsec_gss_net_ops, they just create and destroy another caches. So, they also can be async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27treewide: Fix typos in printkMasanari Iida
This patch fixes spelling typos found in printk. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-03-26ext4: add validity checks for bitmap block numbersTheodore Ts'o
An privileged attacker can cause a crash by mounting a crafted ext4 image which triggers a out-of-bounds read in the function ext4_valid_block_bitmap() in fs/ext4/balloc.c. This issue has been assigned CVE-2018-1093. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199181 BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1560782 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-03-26net: Convert nfs4blocklayout_net_opsKirill Tkhai
These pernet_operations create and destroy per-net pipe and dentry, and they seem safe to be marked as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26net: Convert nfs4_dns_resolver_opsKirill Tkhai
These pernet_operations look similar to rpcsec_gss_net_ops, they just create and destroy another cache. Also they create and destroy directory. So, they also look safe to be async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26xfs: clean up xfs_mount allocation and dynamic initializersBrian Foster
Most of the generic data structures embedded in xfs_mount are dynamically initialized immediately after mp is allocated. A few fields are left out and initialized during the xfs_mountfs() sequence, after mp has been attached to the superblock. To clean this up and help prevent premature access of associated fields, refactor xfs_mount allocation and all dependent init calls into a new helper. This self-documents that all low level data structures (i.e., locks, trees, etc.) should be initialized before xfs_mount is attached to the superblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-26treewide: simplify Kconfig dependencies for removed archsArnd Bergmann
A lot of Kconfig symbols have architecture specific dependencies. In those cases that depend on architectures we have already removed, they can be omitted. Acked-by: Kalle Valo <kvalo@codeaurora.org> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-03-26Btrfs: dev-replace: make sure target is identical to source when raid56 ↵Liu Bo
rebuild fails In the last step of scrub_handle_error_block, we try to combine good copies on all possible mirrors, this works fine for raid1 and raid10, but not for raid56 as it's doing parity rebuild. If parity rebuild doesn't get back with correct data which matches its checksum, in case of replace we'd rather write what is stored in the source device than the data calculuated from parity. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26Btrfs: raid56: remove redundant async_missing_raid56Liu Bo
async_missing_raid56() is identical to async_read_rebuild(). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: adjust return values of btrfs_inode_by_nameSu Yue
Previously, btrfs_inode_by_name() returned 0 which left caller to check objectid of location even location if the type was invalid. Let btrfs_inode_by_name() return -EUCLEAN if a corrupted location of a dir entry is found. Removal of label out_err also simplifies the function. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ drop unlikely ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: rename btrfs_close_extra_device to btrfs_free_extra_devidsAnand Jain
This function btrfs_close_extra_devices() is about freeing extra devids which once it may have belonged to this filesystem. So rename it and add the comment. The _devid suffix is appropriate as this function won't handle devices which are outside of the filesytem being mounted. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: Remove root argument from cow_file_range_inlineNikolay Borisov
This argument is always set to the root of the inode, which is also passed. So let's get a reference inside the function and simplify the arg list. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26Btrfs: send: fix typo in TLV_PUTLiu Bo
According to tlv_put()'s prototype, data and attrlen needs to be exchanged in the macro, but seems all callers are already aware of this misorder and are therefore not affected. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: Remove root argument from btrfs_log_dentry_safeNikolay Borisov
Now that nothing uses the root arg of btrfs_log_dentry_safe it can be safely removed. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: Remove root arg from btrfs_log_inode_parentNikolay Borisov
btrfs_log_inode_parent is called from 2 places (btrfs_log_dentry_safe and btrfs_log_new_name) both of which pass inode->root as the root argument and the inode itself. Remove the redundant root argument and get a reference to the root directly from the inode, also remove redundant root != inode->root check from the same function. No functional change. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: Remove redundant comment from btrfs_search_forwardNikolay Borisov
This function always sets keep_locks to 1 and saves the old value of keep_locks which is restored at the end. So there is no way it can be called without keep_locks being set. Remove comment imposing redundant requirement on callers. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: move btrfs_listxattr prototype to xattr.hDavid Sterba
There's a proper header for xattr handlers. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: adjust return type of btrfs_getxattrDavid Sterba
The xattr_handler::get prototype returns int, use it. The only ssize_t exception is the per-inode listxattr handler. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: drop extern from function declarationsDavid Sterba
Extern for functions does not make any difference, there are only a few so let's remove them before it's too late. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26btrfs: drop underscores from exported xattr functionsDavid Sterba
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>