summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-11-03NFSv2: Clean up timespec encodeTrond Myklebust
Simplify the struct iattr timestamp encoding by skipping the step of an intermediate struct timespec. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03NFSv2: Fix a typo in encode_sattr()Trond Myklebust
Encode the mtime correctly. Fixes: 95582b0083883 ("vfs: change inode times to use struct timespec64") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03NFSv4: NFSv4 callbacks also support 64-bit timestampsTrond Myklebust
Convert the NFSv4 callbacks to use struct timestamp64, rather than truncating times to 32-bit values. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03NFSv4: Encode 64-bit timestampsTrond Myklebust
NFSv4 supports 64-bit timestamps, so there is no point in converting the struct iattr timestamps to 32-bits before encoding. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03NFS: Convert struct nfs_fattr to use struct timespec64Trond Myklebust
NFSv4 supports 64-bit times, so we should switch to using struct timespec64 when decoding attributes. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03NFS: If nfs_mountpoint_expiry_timeout < 0, do not expire submountsTrond Myklebust
If we set nfs_mountpoint_expiry_timeout to a negative value, then allow that to imply that we do not expire NFSv4 submounts. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03xfs: cleanup use of the XFS_ALLOC_ flagsChristoph Hellwig
Always set XFS_ALLOC_USERDATA for data fork allocations, and check it in xfs_alloc_is_userdata instead of the current obsfucated check. Also remove the xfs_alloc_is_userdata and xfs_alloc_allow_busy_reuse helpers to make the code a little easier to understand. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03xfs: move extent zeroing to xfs_bmapi_allocateChristoph Hellwig
Move the extent zeroing case there for the XFS_BMAPI_ZERO flag outside the low-level allocator and into xfs_bmapi_allocate, where is still is in transaction context, but outside the very lowlevel code where it doesn't belong. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03xfs: refactor xfs_bmapi_allocateChristoph Hellwig
Avoid duplicate userdata and data fork checks by restructuring the code so we only have a helper for userdata allocations that combines these checks in a straight foward way. That also helps to obsoletes the comments explaining what the code does as it is now clearly obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03xfs: simplify the xfs_iomap_write_direct callingChristoph Hellwig
Move the EOF alignment and checking for the next allocated extent into the callers to avoid the need to pass the byte based offset and count as well as looking at the incoming imap. The added benefit is that the caller can unlock the incoming ilock and the function doesn't have funny unbalanced locking contexts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03xfs: don't log the inode in xfs_fs_map_blocks if itChristoph Hellwig
Even if we are asked for a write layout there is no point in logging the inode unless we actually modified it in some way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03xfs: slightly tweak an assert in xfs_fs_map_blocksChristoph Hellwig
We should never see delalloc blocks for a pNFS layout, write or not. Adjust the assert to check for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03xfs: remove the extsize argument to xfs_eof_alignmentChristoph Hellwig
And move the code dependent on it to the one caller that cares instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03xfs: mark xfs_eof_alignment staticChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03xfs: simplify xfs_iomap_eof_align_last_fsbChristoph Hellwig
By open coding xfs_bmap_last_extent instead of calling it through a double indirection we don't need to handle an error return that can't happen given that we are guaranteed to have the extent list in memory already. Also simplify the calling conventions a little and move the extent list assert from the only caller into the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03bdev: Refresh bdev size for disks without partitioningJan Kara
Currently, block device size in not updated on second and further open for block devices where partition scan is disabled. This is particularly annoying for example for DVD drives as that means block device size does not get updated once the media is inserted into a drive if the device is already open when inserting the media. This is actually always the case for example when pktcdvd is in use. Fix the problem by revalidating block device size on every open even for devices with partition scan disabled. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-03bdev: Factor out bdev revalidation into a common helperJan Kara
Factor out code handling revalidation of bdev on disk change into a common helper. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-03debugfs: remove return value of debugfs_create_atomic_t()Greg Kroah-Hartman
No one checks the return value of debugfs_create_atomic_t(), as it's not needed, so make the return value void, so that no one tries to do so in the future. Link: https://lore.kernel.org/r/20191016130332.GA28240@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-02Merge tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fix from Steve French: "A small smb3 memleak fix" * tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6: fix memory leak in large read decrypt offload
2019-11-02debugfs: remove return value of debugfs_create_x8()Greg Kroah-Hartman
No one checks the return value of debugfs_create_x8(), as it's not needed, so make the return value void, so that no one tries to do so in the future. Link: https://lore.kernel.org/r/20191011132931.1186197-5-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-02io-wq: use kfree_rcu() to simplify the codeYueHaibing
The callback function of call_rcu() just calls kfree(), so we can use kfree_rcu() instead of call_rcu() + callback function. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-01Merge tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Anna Schumaker: "This contains two delegation fixes (with the RCU lock leak fix marked for stable), and three patches to fix destroying the the sunrpc back channel. Stable bugfixes: - Fix an RCU lock leak in nfs4_refresh_delegation_stateid() Other fixes: - The TCP back channel mustn't disappear while requests are outstanding - The RDMA back channel mustn't disappear while requests are outstanding - Destroy the back channel when we destroy the host transport - Don't allow a cached open with a revoked delegation" * tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid() NFSv4: Don't allow a cached open with a revoked delegation SUNRPC: Destroy the back channel when we destroy the host transport SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding SUNRPC: The TCP back channel mustn't disappear while requests are outstanding
2019-11-01Merge tag 'for-linus-20191101' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Two small nvme fixes, one is a fabrics connection fix, the other one a cleanup made possible by that fix (Anton, via Keith) - Fix requeue handling in umb ubd (Anton) - Fix spin_lock_irq() nesting in blk-iocost (Dan) - Three small io_uring fixes: - Install io_uring fd after done with ctx (me) - Clear ->result before every poll issue (me) - Fix leak of shadow request on error (Pavel) * tag 'for-linus-20191101' of git://git.kernel.dk/linux-block: iocost: don't nest spin_lock_irq in ioc_weight_write() io_uring: ensure we clear io_kiocb->result before each issue um-ubd: Entrust re-queue to the upper layers nvme-multipath: remove unused groups_only mode in ana log nvme-multipath: fix possible io hang after ctrl reconnect io_uring: don't touch ctx in setup after ring fd install io_uring: Fix leaked shadow_req
2019-11-01Merge branch 'rh/dioread-nolock-1k' into devTheodore Ts'o
2019-11-01NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()Trond Myklebust
A typo in nfs4_refresh_delegation_stateid() means we're leaking an RCU lock, and always returning a value of 'false'. As the function description states, we were always supposed to return 'true' if a matching delegation was found. Fixes: 12f275cdd163 ("NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-11-01NFSv4: Don't allow a cached open with a revoked delegationTrond Myklebust
If the delegation is marked as being revoked, we must not use it for cached opens. Fixes: 869f9dfa4d6d ("NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-11-01io_uring: set -EINTR directly when a signal wakes up in io_cqring_waitJackie Liu
We didn't use -ERESTARTSYS to tell the application layer to restart the system call, but instead return -EINTR. we can set -EINTR directly when wakeup by the signal, which can help us save an assignment operation and comparison operation. Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-01io_uring: support for generic async request cancelJens Axboe
This adds support for IORING_OP_ASYNC_CANCEL, which will attempt to cancel requests that have been punted to async context and are now in-flight. This works for regular read/write requests to files, as long as they haven't been started yet. For socket based IO (or things like accept4(2)), we can cancel work that is already running as well. To cancel a request, the sqe must have ->addr set to the user_data of the request it wishes to cancel. If the request is cancelled successfully, the original request is completed with -ECANCELED and the cancel request is completed with a result of 0. If the request was already running, the original may or may not complete in error. The cancel request will complete with -EALREADY for that case. And finally, if the request to cancel wasn't found, the cancel request is completed with -ENOENT. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-31quota: Check that quota is not dirty before releaseDmitry Monakhov
There is a race window where quota was redirted once we drop dq_list_lock inside dqput(), but before we grab dquot->dq_lock inside dquot_release() TASK1 TASK2 (chowner) ->dqput() we_slept: spin_lock(&dq_list_lock) if (dquot_dirty(dquot)) { spin_unlock(&dq_list_lock); dquot->dq_sb->dq_op->write_dquot(dquot); goto we_slept if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) { spin_unlock(&dq_list_lock); dquot->dq_sb->dq_op->release_dquot(dquot); dqget() mark_dquot_dirty() dqput() goto we_slept; } So dquot dirty quota will be released by TASK1, but on next we_sleept loop we detect this and call ->write_dquot() for it. XFSTEST: https://github.com/dmonakhov/xfstests/commit/440a80d4cbb39e9234df4d7240aee1d551c36107 Link: https://lore.kernel.org/r/20191031103920.3919-2-dmonakhov@openvz.org CC: stable@vger.kernel.org Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Signed-off-by: Jan Kara <jack@suse.cz>
2019-10-31quota: fix livelock in dquot_writeback_dquotsDmitry Monakhov
Write only quotas which are dirty at entry. XFSTEST: https://github.com/dmonakhov/xfstests/commit/b10ad23566a5bf75832a6f500e1236084083cddc Link: https://lore.kernel.org/r/20191031103920.3919-1-dmonakhov@openvz.org CC: stable@vger.kernel.org Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Signed-off-by: Jan Kara <jack@suse.cz>
2019-10-31xfs: properly serialise fallocate against AIO+DIODave Chinner
AIO+DIO can extend the file size on IO completion, and it holds no inode locks while the IO is in flight. Therefore, a race condition exists in file size updates if we do something like this: aio-thread fallocate-thread lock inode submit IO beyond inode->i_size unlock inode ..... lock inode break layouts if (off + len > inode->i_size) new_size = off + len ..... inode_dio_wait() <blocks> ..... completes inode->i_size updated inode_dio_done() .... <wakes> <does stuff no long beyond EOF> if (new_size) xfs_vn_setattr(inode, new_size) Yup, that attempt to extend the file size in the fallocate code turns into a truncate - it removes the whatever the aio write allocated and put to disk, and reduced the inode size back down to where the fallocate operation ends. Fundamentally, xfs_file_fallocate() not compatible with racing AIO+DIO completions, so we need to move the inode_dio_wait() call up to where the lock the inode and break the layouts. Secondly, storing the inode size and then using it unchecked without holding the ILOCK is not safe; we can only do such a thing if we've locked out and drained all IO and other modification operations, which we don't do initially in xfs_file_fallocate. It should be noted that some of the fallocate operations are compound operations - they are made up of multiple manipulations that may zero data, and so we may need to flush and invalidate the file multiple times during an operation. However, we only need to lock out IO and other space manipulation operations once, as that lockout is maintained until the entire fallocate operation has been completed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-31pipe: Use head and tail pointers for the ring, not cursor and lengthDavid Howells
Convert pipes to use head and tail pointers for the buffer ring rather than pointer and length as the latter requires two atomic ops to update (or a combined op) whereas the former only requires one. (1) The head pointer is the point at which production occurs and points to the slot in which the next buffer will be placed. This is equivalent to pipe->curbuf + pipe->nrbufs. The head pointer belongs to the write-side. (2) The tail pointer is the point at which consumption occurs. It points to the next slot to be consumed. This is equivalent to pipe->curbuf. The tail pointer belongs to the read-side. (3) head and tail are allowed to run to UINT_MAX and wrap naturally. They are only masked off when the array is being accessed, e.g.: pipe->bufs[head & mask] This means that it is not necessary to have a dead slot in the ring as head == tail isn't ambiguous. (4) The ring is empty if "head == tail". A helper, pipe_empty(), is provided for this. (5) The occupancy of the ring is "head - tail". A helper, pipe_occupancy(), is provided for this. (6) The number of free slots in the ring is "pipe->ring_size - occupancy". A helper, pipe_space_for_user() is provided to indicate how many slots userspace may use. (7) The ring is full if "head - tail >= pipe->ring_size". A helper, pipe_full(), is provided for this. Signed-off-by: David Howells <dhowells@redhat.com>
2019-10-31ext2: don't set *count in the case of failure in ext2_try_to_allocate()Chengguang Xu
Currently we set *count to num(value 0) in the failure of block allocation in ext2_try_to_allocate(). Without reservation, we reuse *count(value 0) to retry block allocation and wrong *count will cause only allocating maximum 1 block even though having sufficent free blocks in that block group. Finally, it probably cause significant fragmentation. Link: https://lore.kernel.org/r/20191026090721.23794-1-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Jan Kara <jack@suse.cz>
2019-10-31reiserfs: fix extended attributes on the root directoryJeff Mahoney
Since commit d0a5b995a308 (vfs: Add IOP_XATTR inode operations flag) extended attributes haven't worked on the root directory in reiserfs. This is due to reiserfs conditionally setting the sb->s_xattrs handler array depending on whether it located or create the internal privroot directory. It necessarily does this after the root inode is already read in. The IOP_XATTR flag is set during inode initialization, so it never gets set on the root directory. This commit unconditionally assigns sb->s_xattrs and clears IOP_XATTR on internal inodes. The old return values due to the conditional assignment are handled via open_xa_root, which now returns EOPNOTSUPP as the VFS would have done. Link: https://lore.kernel.org/r/20191024143127.17509-1-jeffm@suse.com CC: stable@vger.kernel.org Fixes: d0a5b995a308 ("vfs: Add IOP_XATTR inode operations flag") Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-10-31Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM changes from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - Dynamic tick (nohz) updates, perhaps most notably changes to force the tick on when needed due to lengthy in-kernel execution on CPUs on which RCU is waiting. - Replace rcu_swap_protected() with rcu_prepace_pointer(). - Torture-test updates. - Linux-kernel memory consistency model updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-30io_uring: ensure we clear io_kiocb->result before each issueJens Axboe
We use io_kiocb->result == -EAGAIN as a way to know if we need to re-submit a polled request, as -EAGAIN reporting happens out-of-line for IO submission failures. This field is cleared when we originally allocate the request, but it isn't reset when we retry the submission from async context. This can cause issues where we think something needs a re-issue, but we're really just reading stale data. Reset ->result whenever we re-prep a request for polled submission. Cc: stable@vger.kernel.org Fixes: 9e645e1105ca ("io_uring: add support for sqe links") Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-30fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer()Paul E. McKenney
This commit replaces the use of rcu_swap_protected() with the more intuitively appealing rcu_replace_pointer() as a step towards removing rcu_swap_protected(). Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4gg6Hw@mail.gmail.com/ Reported-by: Linus Torvalds <torvalds@linux-foundation.org> [ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: David Howells <dhowells@redhat.com> Cc: <linux-afs@lists.infradead.org> Cc: <linux-kernel@vger.kernel.org>
2019-10-30io_uring: io_wq_create() returns an error pointer, not NULLJens Axboe
syzbot reported an issue where we crash at setup time if failslab is used. The issue is that io_wq_create() returns an error pointer on failure, not NULL. Hence io_uring thought the io-wq was setup just fine, but in reality it's a garbage error pointer. Use IS_ERR() instead of a NULL check, and assign ret appropriately. Reported-by: syzbot+221cc24572a2fed23b6b@syzkaller.appspotmail.com Fixes: 561fb04a6a22 ("io_uring: replace workqueue usage with io-wq") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-30Merge tag 'gfs2-v5.4-rc5.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: "Fix remounting (broken in -rc1)." * tag 'gfs2-v5.4-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix initialisation of args for remount
2019-10-30gfs2: make gfs2_fs_parameters staticBen Dooks (Codethink)
The gfs2_fs_parameters is not used outside the unit it is declared in, so make it static. Fixes the following sparse warning: fs/gfs2/ops_fstype.c:1331:39: warning: symbol 'gfs2_fs_parameters' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-10-30gfs2: Some whitespace cleanupsAndreas Gruenbacher
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-10-30gfs2: removed unnecessary semicolonAliasgar Surti
There is use of unnecessary semicolon after switch case. Removed the semicolon. Signed-off-by: Aliasgar Surti <aliasgar.surti500@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-10-30gfs2: Fix initialisation of args for remountAndrew Price
When gfs2 was converted to use fs_context, the initialisation of the mount args structure to the currently active args was lost with the removal of gfs2_remount_fs(), so the checks of the new args on remount became checks against the default values instead of the current ones. This caused unexpected remount behaviour and test failures (xfstests generic/294, generic/306 and generic/452). Reinstate the args initialisation, this time in gfs2_init_fs_context() and conditional upon fc->purpose, as that's the only time we get control before the mount args are parsed in the remount process. Fixes: 1f52aa08d12f ("gfs2: Convert gfs2 to fs_context") Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-10-29io_uring: fix race with canceling timeoutsJens Axboe
If we get -1 from hrtimer_try_to_cancel(), we know that the timer is running. Hence leave all completion to the timeout handler. If we don't, we can corrupt the list and miss a completion. Fixes: 11365043e527 ("io_uring: add support for canceling timeout requests") Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com> Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29ceph: add missing check in d_revalidate snapdir handlingAl Viro
We should not play with dcache without parent locked... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-29ceph: fix RCU case handling in ceph_d_revalidate()Al Viro
For RCU case ->d_revalidate() is called with rcu_read_lock() and without pinning the dentry passed to it. Which means that it can't rely upon ->d_inode remaining stable; that's the reason for d_inode_rcu(), actually. Make sure we don't reload ->d_inode there. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-29ceph: fix use-after-free in __ceph_remove_cap()Luis Henriques
KASAN reports a use-after-free when running xfstest generic/531, with the following trace: [ 293.903362] kasan_report+0xe/0x20 [ 293.903365] rb_erase+0x1f/0x790 [ 293.903370] __ceph_remove_cap+0x201/0x370 [ 293.903375] __ceph_remove_caps+0x4b/0x70 [ 293.903380] ceph_evict_inode+0x4e/0x360 [ 293.903386] evict+0x169/0x290 [ 293.903390] __dentry_kill+0x16f/0x250 [ 293.903394] dput+0x1c6/0x440 [ 293.903398] __fput+0x184/0x330 [ 293.903404] task_work_run+0xb9/0xe0 [ 293.903410] exit_to_usermode_loop+0xd3/0xe0 [ 293.903413] do_syscall_64+0x1a0/0x1c0 [ 293.903417] entry_SYSCALL_64_after_hwframe+0x44/0xa9 This happens because __ceph_remove_cap() may queue a cap release (__ceph_queue_cap_release) which can be scheduled before that cap is removed from the inode list with rb_erase(&cap->ci_node, &ci->i_caps); And, when this finally happens, the use-after-free will occur. This can be fixed by removing the cap from the inode list before being removed from the session list, and thus eliminating the risk of an UAF. Cc: stable@vger.kernel.org Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-29io_uring: support for larger fixed file setsJens Axboe
There's been a few requests for supporting more fixed files than 1024. This isn't really tricky to do, we just need to split up the file table into multiple tables and index appropriately. As we do so, reduce the max single file table to 512. This enables us to do single page allocs always for the tables, which is an improvement over the situation prior. This patch adds support for up to 64K files, which should be enough for everyone. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29io_uring: protect fixed file indexing with array_index_nospec()Jens Axboe
We index the file tables with a user given value. After we check it's within our limits, use array_index_nospec() to prevent any spectre attacks here. Suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29io_uring: add support for IORING_OP_ACCEPTJens Axboe
This allows an application to call accept4() in an async fashion. Like other opcodes, we first try a non-blocking accept, then punt to async context if we have to. Signed-off-by: Jens Axboe <axboe@kernel.dk>