summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2022-04-25fsnotify: allow adding an inode mark without pinning inodeAmir Goldstein
fsnotify_add_mark() and variants implicitly take a reference on inode when attaching a mark to an inode. Make that behavior opt-out with the mark flag FSNOTIFY_MARK_FLAG_NO_IREF. Instead of taking the inode reference when attaching connector to inode and dropping the inode reference when detaching connector from inode, take the inode reference on attach of the first mark that wants to hold an inode reference and drop the inode reference on detach of the last mark that wants to hold an inode reference. Backends can "upgrade" an existing mark to take an inode reference, but cannot "downgrade" a mark with inode reference to release the refernce. This leaves the choice to the backend whether or not to pin the inode when adding an inode mark. This is intended to be used when adding a mark with ignored mask that is used for optimization in cases where group can afford getting unneeded events and reinstate the mark with ignored mask when inode is accessed again after being evicted. Link: https://lore.kernel.org/r/20220422120327.3459282-12-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25dnotify: use fsnotify group lock helpersAmir Goldstein
Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette") nfsd would close open files in direct reclaim context. There is no guarantee that others memory shrinkers don't do the same and no guarantee that future shrinkers won't do that. For example, if overlayfs implements inode cache of fscache would keep open files to cached objects, inode shrinkers could end up closing open files to underlying fs. Direct reclaim from dnotify mark allocation context may try to close open files that have dnotify marks of the same group and hit a deadlock on mark_mutex. Set the FSNOTIFY_GROUP_NOFS flag to prevent going into direct reclaim from allocations under dnotify group lock and use the safe group lock helpers. Link: https://lore.kernel.org/r/20220422120327.3459282-11-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25nfsd: use fsnotify group lock helpersAmir Goldstein
Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette") nfsd would close open files in direct reclaim context and that could cause a deadlock when fsnotify mark allocation went into direct reclaim and nfsd shrinker tried to free existing fsnotify marks. To avoid issues like this in future code, set the FSNOTIFY_GROUP_NOFS flag on nfsd fsnotify group to prevent going into direct reclaim from fsnotify_add_inode_mark(). Link: https://lore.kernel.org/r/20220422120327.3459282-10-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25inotify: use fsnotify group lock helpersAmir Goldstein
inotify inode marks pin the inode so there is no need to set the FSNOTIFY_GROUP_NOFS flag. Link: https://lore.kernel.org/r/20220422120327.3459282-8-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25fsnotify: create helpers for group mark_mutex lockAmir Goldstein
Create helpers to take and release the group mark_mutex lock. Define a flag FSNOTIFY_GROUP_NOFS in fsnotify_group that determines if the mark_mutex lock is fs reclaim safe or not. If not safe, the lock helpers take the lock and disable direct fs reclaim. In that case we annotate the mutex with a different lockdep class to express to lockdep that an allocation of mark of an fs reclaim safe group may take the group lock of another "NOFS" group to evict inodes. For now, converted only the callers in common code and no backend defines the NOFS flag. It is intended to be set by fanotify for evictable marks support. Link: https://lore.kernel.org/r/20220422120327.3459282-7-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25fsnotify: make allow_dups a property of the groupAmir Goldstein
Instead of passing the allow_dups argument to fsnotify_add_mark() as an argument, define the group flag FSNOTIFY_GROUP_DUPS to express the allow_dups behavior and set this behavior at group creation time for all calls of fsnotify_add_mark(). Rename the allow_dups argument to generic add_flags argument for future use. Link: https://lore.kernel.org/r/20220422120327.3459282-6-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25fsnotify: pass flags argument to fsnotify_alloc_group()Amir Goldstein
Add flags argument to fsnotify_alloc_group(), define and use the flag FSNOTIFY_GROUP_USER in inotify and fanotify instead of the helper fsnotify_alloc_user_group() to indicate user allocation. Although the flag FSNOTIFY_GROUP_USER is currently not used after group allocation, we store the flags argument in the group struct for future use of other group flags. Link: https://lore.kernel.org/r/20220422120327.3459282-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25fsnotify: fix wrong lockdep annotationsAmir Goldstein
Commit 6960b0d909cd ("fsnotify: change locking order") changed some of the mark_mutex locks in direct reclaim path to use: mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING); This change is explained: "...It uses nested locking to avoid deadlock in case we do the final iput() on an inode which still holds marks and thus would take the mutex again when calling fsnotify_inode_delete() in destroy_inode()." The problem is that the mutex_lock_nested() is not a nested lock at all. In fact, it has the opposite effect of preventing lockdep from warning about a very possible deadlock. Due to these wrong annotations, a deadlock that was introduced with nfsd filecache in kernel v5.4 went unnoticed in v5.4.y for over two years until it was reported recently by Khazhismel Kumykov, only to find out that the deadlock was already fixed in kernel v5.5. Fix the wrong lockdep annotations. Cc: Khazhismel Kumykov <khazhy@google.com> Fixes: 6960b0d909cd ("fsnotify: change locking order") Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Link: https://lore.kernel.org/r/20220422120327.3459282-4-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25inotify: move control flags from mask to mark flagsAmir Goldstein
The inotify control flags in the mark mask (e.g. FS_IN_ONE_SHOT) are not relevant to object interest mask, so move them to the mark flags. This frees up some bits in the object interest mask. Link: https://lore.kernel.org/r/20220422120327.3459282-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25inotify: show inotify mask flags in proc fdinfoAmir Goldstein
The inotify mask flags IN_ONESHOT and IN_EXCL_UNLINK are not "internal to kernel" and should be exposed in procfs fdinfo so CRIU can restore them. Fixes: 6933599697c9 ("inotify: hide internal kernel bits from fdinfo") Link: https://lore.kernel.org/r/20220422120327.3459282-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faultsCatalin Marinas
Commit a48b73eca4ce ("btrfs: fix potential deadlock in the search ioctl") addressed a lockdep warning by pre-faulting the user pages and attempting the copy_to_user_nofault() in an infinite loop. On architectures like arm64 with MTE, an access may fault within a page at a location different from what fault_in_writeable() probed. Since the sk_offset is rewound to the previous struct btrfs_ioctl_search_header boundary, there is no guaranteed forward progress and search_ioctl() may live-lock. Use fault_in_subpage_writeable() instead of fault_in_writeable() to ensure the permission is checked at the right granularity (smaller than PAGE_SIZE). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: a48b73eca4ce ("btrfs: fix potential deadlock in the search ioctl") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: David Sterba <dsterba@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220423100751.1870771-4-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-25ceph: fix possible NULL pointer dereference for req->r_sessionXiubo Li
The request will be inserted into the ci->i_unsafe_dirops before assigning the req->r_session, so it's possible that we will hit NULL pointer dereference bug here. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/55327 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-04-25ceph: remove incorrect session state checkXiubo Li
Once the session is opened the s->s_ttl will be set, and when receiving a new mdsmap and the MDS map is changed, it will be possibly will close some sessions and open new ones. And then some sessions will be in CLOSING state evening without unmounting. URL: https://tracker.ceph.com/issues/54979 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-04-25ceph: get snap_rwsem read lock in handle_cap_export for ceph_add_capNiels Dossche
ceph_add_cap says in its function documentation that the caller should hold the read lock on the session snap_rwsem. Furthermore, not only ceph_add_cap needs that lock, when it calls to ceph_lookup_snap_realm it eventually calls ceph_get_snap_realm which states via lockdep that snap_rwsem needs to be held. handle_cap_export calls ceph_add_cap without that mdsc->snap_rwsem held. Thus, since ceph_get_snap_realm and ceph_add_cap both need the lock, the common place to acquire that lock is inside handle_cap_export. Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-04-24io_uring: cleanup error-handling around io_req_completeKanchan Joshi
Move common error-handling to io_req_complete, so that various callers avoid repeating that. Few callers (io_tee, io_splice) require slightly different handling. These are changed to use __io_req_complete instead. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220422101048.419942-1-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: add socket(2) supportJens Axboe
Supports both regular socket(2) where a normal file descriptor is instantiated when called, or direct descriptors. Link: https://lore.kernel.org/r/20220412202240.234207-3-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: add fgetxattr and getxattr supportStefan Roesch
This adds support to io_uring for the fgetxattr and getxattr API. Signed-off-by: Stefan Roesch <shr@fb.com> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20220323154420.3301504-5-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: add fsetxattr and setxattr supportStefan Roesch
This adds support to io_uring for the fsetxattr and setxattr API. Signed-off-by: Stefan Roesch <shr@fb.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20220323154420.3301504-4-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24fs: split off do_getxattr from getxattrStefan Roesch
This splits off do_getxattr function from the getxattr function. This will allow io_uring to call it from its io worker. Signed-off-by: Stefan Roesch <shr@fb.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20220323154420.3301504-3-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24fs: split off setxattr_copy and do_setxattr function from setxattrStefan Roesch
This splits of the setup part of the function setxattr in its own dedicated function called setxattr_copy. In addition it also exposes a new function called do_setxattr for making the setxattr call. This makes it possible to call these two functions from io_uring in the processing of an xattr request. Signed-off-by: Stefan Roesch <shr@fb.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20220323154420.3301504-2-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: return an error when cqe is droppedDylan Yudaken
Right now io_uring will not actively inform userspace if a CQE is dropped. This is extremely rare, requiring a CQ ring overflow, as well as a GFP_ATOMIC kmalloc failure. However the consequences could cause for example applications to go into an undefined state, possibly waiting for a CQE that never arrives. Return an error code (EBADR) in these cases. Since this is expected to be incredibly rare, try and avoid as much as possible affecting the hot code paths, and so it only is returned lazily and when there is no other available CQEs. Once the error is returned, reset the error condition assuming the user is either ok with it or will clean up appropriately. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220421091345.2115755-6-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: use constants for cq_overflow bitfieldDylan Yudaken
Prepare to use this bitfield for more flags by using constants instead of magic value 0 Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220421091345.2115755-5-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: rework io_uring_enter to simplify return valueDylan Yudaken
io_uring_enter returns the count submitted preferrably over an error code. In some code paths this check is not required, so reorganise the code so that the check is only done as needed. This is also a prep for returning error codes only in waiting scenarios. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220421091345.2115755-4-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: trace cqe overflowsDylan Yudaken
Trace cqe overflows in io_uring. Print ocqe before the check, so if it is NULL it indicates that it has been dropped. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220421091345.2115755-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: allow re-poll if we made progressJens Axboe
We currently check REQ_F_POLLED before arming async poll for a notification to retry. If it's set, then we don't allow poll and will punt to io-wq instead. This is done to prevent a situation where a buggy driver will repeatedly return that there's space/data available yet we get -EAGAIN. However, if we already transferred data, then it should be safe to rely on poll again. Gate the check on whether or not REQ_F_PARTIAL_IO is also set. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: support MSG_WAITALL for IORING_OP_SEND(MSG)Jens Axboe
Like commit 7ba89d2af17a for recv/recvmsg, support MSG_WAITALL for the send side. If this flag is set and we do a short send, retry for a stream of seqpacket socket. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: add support for IORING_ASYNC_CANCEL_ANYJens Axboe
Rather than match on a specific key, be it user_data or file, allow canceling any request that we can lookup. Works like IORING_ASYNC_CANCEL_ALL in that it cancels multiple requests, but it doesn't key off user_data or the file. Can't be set with IORING_ASYNC_CANCEL_FD, as that's a key selector. Only one may be used at the time. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220418164402.75259-6-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: allow IORING_OP_ASYNC_CANCEL with 'fd' keyJens Axboe
Currently sqe->addr must contain the user_data of the request being canceled. Introduce the IORING_ASYNC_CANCEL_FD flag, which tells the kernel that we're keying off the file fd instead for cancelation. This allows canceling any request that a) uses a file, and b) was assigned the file based on the value being passed in. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220418164402.75259-5-axboe@kernel.dk
2022-04-24io_uring: add support for IORING_ASYNC_CANCEL_ALLJens Axboe
The current cancelation will lookup and cancel the first request it finds based on the key passed in. Add a flag that allows to cancel any request that matches they key. It completes with the number of requests found and canceled, or res < 0 if an error occured. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220418164402.75259-4-axboe@kernel.dk
2022-04-24io_uring: pass in struct io_cancel_data consistentlyJens Axboe
In preparation for being able to not only key cancel off the user_data, pass in the io_cancel_data struct for the various functions that deal with request cancelation. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220418164402.75259-3-axboe@kernel.dk
2022-04-24io_uring: remove dead 'poll_only' argument to io_poll_cancel()Jens Axboe
It's only called from one location, and it always passes in 'false'. Kill the argument, and just pass in 'false' to io_poll_find(). Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220418164402.75259-2-axboe@kernel.dk
2022-04-24io_uring: refactor io_disarm_next() lockingPavel Begunkov
Split timeout handling into removal + failing, so we can reduce spinlocking time and remove another instance of triple nested locking. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0f00d115f9d4c5749028f19623708ad3695512d6.1650458197.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: move timeout locking in io_timeout_cancel()Pavel Begunkov
Move ->timeout_lock grabbing inside of io_timeout_cancel(), so we can do io_req_task_queue_fail() outside of the lock. It's much nicer than relying on triple nested locking. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cde758c2897930d31e205ed8f476d4ec879a8849.1650458197.git.asml.silence@gmail.com [axboe: drop now wrong timeout_lock annotation] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: store SCM state in io_fixed_file->file_ptrJens Axboe
A previous commit removed SCM accounting for non-unix sockets, as those are the only ones that can cause a fixed file reference. While that is true, it also means we're now dereferencing the file as part of the workqueue driven __io_sqe_files_unregister() after the process has exited. This isn't safe for SCM files, as unix gc may have already reaped them when the process exited. KASAN complains about this: [ 12.307040] Freed by task 0: [ 12.307592] kasan_save_stack+0x28/0x4c [ 12.308318] kasan_set_track+0x28/0x38 [ 12.309049] kasan_set_free_info+0x24/0x44 [ 12.309890] ____kasan_slab_free+0x108/0x11c [ 12.310739] __kasan_slab_free+0x14/0x1c [ 12.311482] slab_free_freelist_hook+0xd4/0x164 [ 12.312382] kmem_cache_free+0x100/0x1dc [ 12.313178] file_free_rcu+0x58/0x74 [ 12.313864] rcu_core+0x59c/0x7c0 [ 12.314675] rcu_core_si+0xc/0x14 [ 12.315496] _stext+0x30c/0x414 [ 12.316287] [ 12.316687] Last potentially related work creation: [ 12.317885] kasan_save_stack+0x28/0x4c [ 12.318845] __kasan_record_aux_stack+0x9c/0xb0 [ 12.319976] kasan_record_aux_stack_noalloc+0x10/0x18 [ 12.321268] call_rcu+0x50/0x35c [ 12.322082] __fput+0x2fc/0x324 [ 12.322873] ____fput+0xc/0x14 [ 12.323644] task_work_run+0xac/0x10c [ 12.324561] do_notify_resume+0x37c/0xe74 [ 12.325420] el0_svc+0x5c/0x68 [ 12.326050] el0t_64_sync_handler+0xb0/0x12c [ 12.326918] el0t_64_sync+0x164/0x168 [ 12.327657] [ 12.327976] Second to last potentially related work creation: [ 12.329134] kasan_save_stack+0x28/0x4c [ 12.329864] __kasan_record_aux_stack+0x9c/0xb0 [ 12.330735] kasan_record_aux_stack+0x10/0x18 [ 12.331576] task_work_add+0x34/0xf0 [ 12.332284] fput_many+0x11c/0x134 [ 12.332960] fput+0x10/0x94 [ 12.333524] __scm_destroy+0x80/0x84 [ 12.334213] unix_destruct_scm+0xc4/0x144 [ 12.334948] skb_release_head_state+0x5c/0x6c [ 12.335696] skb_release_all+0x14/0x38 [ 12.336339] __kfree_skb+0x14/0x28 [ 12.336928] kfree_skb_reason+0xf4/0x108 [ 12.337604] unix_gc+0x1e8/0x42c [ 12.338154] unix_release_sock+0x25c/0x2dc [ 12.338895] unix_release+0x58/0x78 [ 12.339531] __sock_release+0x68/0xec [ 12.340170] sock_close+0x14/0x20 [ 12.340729] __fput+0x18c/0x324 [ 12.341254] ____fput+0xc/0x14 [ 12.341763] task_work_run+0xac/0x10c [ 12.342367] do_notify_resume+0x37c/0xe74 [ 12.343086] el0_svc+0x5c/0x68 [ 12.343510] el0t_64_sync_handler+0xb0/0x12c [ 12.344086] el0t_64_sync+0x164/0x168 We have an extra bit we can use in file_ptr on 64-bit, use that to store whether this file is SCM'ed or not, avoiding the need to look at the file contents itself. This does mean that 32-bit will be stuck with SCM for all registered files, just like 64-bit did before the referenced commit. Fixes: 1f59bc0f18cf ("io_uring: don't scm-account for non af_unix sockets") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: kill ctx arg from io_req_put_rsrcPavel Begunkov
The ctx argument of io_req_put_rsrc() is not used, kill it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bb51bf3ff02775b03e6ea21bc79c25d7870d1644.1650311386.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: add a helper for putting rsrc nodesPavel Begunkov
Add a simple helper to encapsulating dropping rsrc nodes references, it's cleaner and will help if we'd change rsrc refcounting or play with percpu_ref_put() [no]inlining. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/63fdd953ac75898734cd50e8f69e95e6664f46fe.1650311386.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: store rsrc node in req instead of refsPavel Begunkov
req->fixed_rsrc_refs keeps a pointer to rsrc node pcpu references, but it's more natural just to store rsrc node directly. There were some reasons for that in the past but not anymore. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cee1c86ec9023f3e4f6ce8940d58c017ef8782f4.1650311386.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: refactor io_assign_file error pathPavel Begunkov
All io_assign_file() callers do error handling themselves, req_set_fail() in the io_assign_file()'s fail path needlessly bloats the kernel and is not the best abstraction to have. Simplify the error path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/eff77fb1eac2b6a90cca5223813e6a396ffedec0.1650311386.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: use right helpers for file assign lockingPavel Begunkov
We have io_ring_submit_[un]lock() functions helping us with conditional ->uring_lock locking, use them in io_file_get_fixed() instead of hand coding. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c9c9ff1e046f6eb68da0a251962a697f8a2275fa.1650311386.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: add data_race annotationsPavel Begunkov
We have several racy reads, mark them with data_race() to demonstrate this fact. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7e56e750d294c70b2a56938bd733386f19f0eb53.1650056133.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: inline io_req_complete_fail_submit()Pavel Begunkov
Inline io_req_complete_fail_submit(), there is only one caller and the name doesn't tell us much. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fe5851af01dcd39fc84b71b8539c7cbe4658fb6d.1650056133.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: refactor io_submit_sqe()Pavel Begunkov
Remove one extra if for non-linked path of io_submit_sqe(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/03183199d1bf494b4a72eca16d792c8a5945acb4.1650056133.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: refactor lazy link failPavel Begunkov
Remove the lazy link fail logic from io_submit_sqe() and hide it into a helper. It simplifies the code and will be needed in next patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6a68aca9cf4492132da1d7c8a09068b74aba3c65.1650056133.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: introduce IO_REQ_LINK_FLAGSPavel Begunkov
Add a macro for all link request flags to avoid duplication. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/df38b883e31e7e0ca4e364d25a0743862961b180.1650056133.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: refactor io_queue_sqe()Pavel Begunkov
io_queue_sqe() is a part of the submission path and we try hard to keep it inlined, so shed some extra bytes from it by moving the error checking part into io_queue_sqe_arm_apoll() and renaming it accordingly. note: io_queue_sqe_arm_apoll() is not inlined, thus the patch doesn't change the number of function calls for the apoll path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9b79edd246336decfaca79b949a15ac69123490d.1650056133.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: rename io_queue_async_work()Pavel Begunkov
Rename io_queue_async_work(). The name is pretty old but now doesn't reflect well what the function is doing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5d4b25c54cccf084f9f2fd63bd4e4fa4515e998e.1650056133.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: inline io_queue_sqe()Pavel Begunkov
Inline io_queue_sqe() as there is only one caller left, and rename __io_queue_sqe(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d5742683b7a7caceb1c054e91e5b9135b0f3b858.1650056133.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: helper for prep+queuing linked timeoutsPavel Begunkov
We try to aggresively inline the submission path, so it's a good idea to not pollute it with colder code. One of them is linked timeout preparation + queue, which can be extracted into a function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ecf74df7ac77389b6d9211211ec4954e91de98ba.1650056133.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: inline io_free_req()Pavel Begunkov
Inline io_free_req() into its only user and remove an underscore prefix from __io_free_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ed114edef5c256a644f4839bb372df70d8df8e3f.1650056133.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24io_uring: kill io_put_req_deferred()Pavel Begunkov
We have several spots where a call to io_fill_cqe_req() is immediately followed by io_put_req_deferred(). Replace them with __io_req_complete_post() and get rid of io_put_req_deferred() and io_fill_cqe_req(). > size ./fs/io_uring.o text data bss dec hex filename 86942 13734 8 100684 1894c ./fs/io_uring.o > size ./fs/io_uring.o text data bss dec hex filename 86438 13654 8 100100 18704 ./fs/io_uring.o Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/10672a538774ac8986bee6468d960527af59169d.1650056133.git.asml.silence@gmail.com [axboe: fold in followup fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>