summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-24io_uring: move non aligned field to the endPavel Begunkov
Move not cache aligned fields down in io_ring_ctx, should change anything, but makes further refactoring easier. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/518e95d7888e9d481b2c5968dcf3f23db9ea47a5.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: add option to remove SQ indirectionPavel Begunkov
Not many aware, but io_uring submission queue has two levels. The first level usually appears as sq_array and stores indexes into the actual SQ. To my knowledge, no one has ever seriously used it, nor liburing exposes it to users. Add IORING_SETUP_NO_SQARRAY, when set we don't bother creating and using the sq_array and SQ heads/tails will be pointing directly into the SQ. Improves memory footprint, in term of both allocations as well as cache usage, and also should make io_get_sqe() less branchy in the end. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0ffa3268a5ef61d326201ff43a233315c96312e0.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: compact SQ/CQ heads/tailsPavel Begunkov
Queues heads and tails cache line aligned. That makes sq, cq taking 4 lines or 5 lines if we include the rest of struct io_rings (e.g. sq_flags is frequently accessed). Since modern io_uring is mostly single threaded, it doesn't make much send to spread them as such, it wastes space and puts additional pressure on caches. Put them all into a single line. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9c8deddf9a7ed32069235a530d1e117fb460bc4c.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: force inline io_fill_cqe_reqPavel Begunkov
There are only 2 callers of io_fill_cqe_req left, and one of them is extremely hot. Force inline the function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ffce4fc5e3521966def848a4d930586dfe33ae11.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: merge iopoll and normal completion pathsPavel Begunkov
io_do_iopoll() and io_submit_flush_completions() are pretty similar, both filling CQEs and then free a list of requests. Don't duplicate it and make iopoll use __io_submit_flush_completions(), which also helps with inlining and other optimisations. For that, we need to first find all completed iopoll requests and splice them from the iopoll list and then pass it down. This adds one extra list traversal, which should be fine as requests will stay hot in cache. CQ locking is already conditional, introduce ->lockless_cq and skip locking for IOPOLL as it's protected by ->uring_lock. We also add a wakeup optimisation for IOPOLL to __io_cq_unlock_post(), so it works just like io_cqring_ev_posted_iopoll(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3840473f5e8a960de35b77292026691880f6bdbc.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: reorder cqring_flush and wakeupsPavel Begunkov
Unlike in the past, io_commit_cqring_flush() doesn't do anything that may need io_cqring_wake() to be issued after, all requests it completes will go via task_work. Do io_commit_cqring_flush() after io_cqring_wake() to clean up __io_cq_unlock_post(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ed32dcfeec47e6c97bd6b18c152ddce5b218403f.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: optimise extra io_get_cqe null checkPavel Begunkov
If the cached cqe check passes in io_get_cqe*() it already means that the cqe we return is valid and non-zero, however the compiler is unable to optimise null checks like in io_fill_cqe_req(). Do a bit of trickery, return success/fail boolean from io_get_cqe*() and store cqe in the cqe parameter. That makes it do the right thing, erasing the check together with the introduced indirection. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/322ea4d3377d3d4efd8ae90ab8ed28a99f518210.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: refactor __io_get_cqe()Pavel Begunkov
Make __io_get_cqe simpler by not grabbing the cqe from refilled cached, but letting io_get_cqe() do it for us. That's cleaner and removes some duplication. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/74dc8fdf2657e438b2e05e1d478a3596924604e9.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: simplify big_cqe handlingPavel Begunkov
Don't keep big_cqe bits of req in a union with hash_node, find a separate space for it. It's bit safer, but also if we keep it always initialised, we can get rid of ugly REQ_F_CQE32_INIT handling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/447aa1b2968978c99e655ba88db536e903df0fe9.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: cqe init hardeningPavel Begunkov
io_kiocb::cqe stores the completion info which we'll memcpy to userspace, and we rely on callbacks and other later steps to populate it with right values. We have never had problems with that, but it would still be safer to zero it on allocation. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b16a3b64dde678686460d3c3792c3ba6d3d1bc7a.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: improve cqe !tracing hot pathPavel Begunkov
While looking at io_fill_cqe_req()'s asm I stumbled on our trace points turning into the chunk below: trace_io_uring_complete(req->ctx, req, req->cqe.user_data, req->cqe.res, req->cqe.flags, req->extra1, req->extra2); io_uring/io_uring.c:898: trace_io_uring_complete(req->ctx, req, req->cqe.user_data, movq 232(%rbx), %rdi # req_44(D)->big_cqe.extra2, _5 movq 224(%rbx), %rdx # req_44(D)->big_cqe.extra1, _6 movl 84(%rbx), %r9d # req_44(D)->cqe.D.81184.flags, _7 movl 80(%rbx), %r8d # req_44(D)->cqe.res, _8 movq 72(%rbx), %rcx # req_44(D)->cqe.user_data, _9 movq 88(%rbx), %rsi # req_44(D)->ctx, _10 ./arch/x86/include/asm/jump_label.h:27: asm_volatile_goto("1:" 1:jmp .L1772 # objtool NOPs this # ... It does a jump_label for actual tracing, but those 6 moves will stay there in the hottest io_uring path. As an optimisation, add a trace_io_uring_complete_enabled() check, which is also uses jump_labels, it tricks the compiler into behaving. It removes the junk without changing anything else int the hot path. Note: apparently, it's not only me noticing it, and people are also working it around. We should remove the check when it's solved generically or rework tracing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/555d8312644b3776f4be7e23f9b92943875c4bc7.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-17io_uring/rsrc: Annotate struct io_mapped_ubuf with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct io_mapped_ubuf. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Jens Axboe <axboe@kernel.dk> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: io-uring@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230817212146.never.853-kees@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-16io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is usedJens Axboe
If we setup the ring with SQPOLL, then that polling thread has its own io-wq setup. This means that if the application uses IORING_REGISTER_IOWQ_AFF to set the io-wq affinity, we should not be setting it for the invoking task, but rather the sqpoll task. Add an sqpoll helper that parks the thread and updates the affinity, and use that one if we're using SQPOLL. Fixes: fe76421d1da1 ("io_uring: allow user configurable IO thread CPU affinity") Cc: stable@vger.kernel.org # 5.10+ Link: https://github.com/axboe/liburing/discussions/884 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11io_uring: simplify io_run_task_work_sig returnPavel Begunkov
Nobody cares about io_run_task_work_sig returning 1, we only check for negative errors. Simplify by keeping to 0/-error returns. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3aec8a532c003d6e50739b969a82989402696170.1691757663.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11io_uring/rsrc: keep one global dummy_ubufPavel Begunkov
We set empty registered buffers to dummy_ubuf as an optimisation. Currently, we allocate the dummy entry for each ring, whenever we can simply have one global instance. We're casting out const on assignment, it's fine as we're not going to change the content of the dummy, the constness gives us an extra layer of protection if sth ever goes wrong. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e4a96dda35ab755914bc43f6781bba0df97ac489.1691757663.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11io_uring: never overflow io_aux_cqePavel Begunkov
Now all callers of io_aux_cqe() set allow_overflow to false, remove the parameter and not allow overflowing auxilary multishot cqes. When CQ is full the function callers and all multishot requests in general are expected to complete the request. That prevents indefinite in-background grows of the overflow list and let's the userspace to handle the backlog at its own pace. Resubmitting a request should also be faster than accounting a bunch of overflows, so it should be better for perf when it happens, but a well behaving userspace should be trying to avoid overflows in any case. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bb20d14d708ea174721e58bb53786b0521e4dd6d.1691757663.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11io_uring: remove return from io_req_cqe_overflow()Pavel Begunkov
Nobody checks io_req_cqe_overflow()'s return, make it return void. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8f2029ad0c22f73451664172d834372608ee0a77.1691757663.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11io_uring: open code io_fill_cqe_req()Pavel Begunkov
io_fill_cqe_req() is only called from one place, open code it, and rename __io_fill_cqe_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f432ce75bb1c94cadf0bd2add4d6aa510bd1fb36.1691757663.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11io_uring/net: don't overflow multishot recvPavel Begunkov
Don't allow overflowing multishot recv CQEs, it might get out of hand, hurt performance, and in the worst case scenario OOM the task. Cc: stable@vger.kernel.org Fixes: b3fdea6ecb55c ("io_uring: multishot recv") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0b295634e8f1b71aa764c984608c22d85f88f75c.1691757663.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11io_uring/net: don't overflow multishot acceptPavel Begunkov
Don't allow overflowing multishot accept CQEs, we want to limit the grows of the overflow list. Cc: stable@vger.kernel.org Fixes: 4e86a2c980137 ("io_uring: implement multishot mode for accept") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7d0d749649244873772623dd7747966f516fe6e2.1691757663.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11io_uring/io-wq: don't gate worker wake up success on wake_up_process()Jens Axboe
All we really care about is finding a free worker. If said worker is already running, it's either starting new work already or it's just finishing up existing work. For the latter, we'll be finding this work item next anyway, and for the former, if the worker does go to sleep, it'll create a new worker anyway as we have pending items. This reduces try_to_wake_up() overhead considerably: 23.16% -10.46% [kernel.kallsyms] [k] try_to_wake_up Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11io_uring/io-wq: reduce frequency of acct->lock acquisitionsJens Axboe
When we check if we have work to run, we grab the acct lock, check, drop it, and then return the result. If we do have work to run, then running the work will again grab acct->lock and get the work item. This causes us to grab acct->lock more frequently than we need to. If we have work to do, have io_acct_run_queue() return with the acct lock still acquired. io_worker_handle_work() is then always invoked with the acct lock already held. In a simple test cases that stats files (IORING_OP_STATX always hits io-wq), we see a nice reduction in locking overhead with this change: 19.32% -12.55% [kernel.kallsyms] [k] __cmpwait_case_32 20.90% -12.07% [kernel.kallsyms] [k] queued_spin_lock_slowpath Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11io_uring/io-wq: don't grab wq->lock for worker activationJens Axboe
The worker free list is RCU protected, and checks for workers going away when iterating it. There's no need to hold the wq->lock around the lookup. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-10io_uring: remove unnecessary forward declarationJens Axboe
We never use io_move_task_work_from_local() before it's defined in the file anyway, so kill the forward declaration. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-10io_uring: have io_file_put() take an io_kiocb rather than the fileJens Axboe
No functional changes in this patch, just a prep patch for needing the request in io_file_put(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-10io_uring/splice: use fput() directlyJens Axboe
No point in using io_file_put() here, as we need to check if it's a fixed file in the caller anyway. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-10io_uring/fdinfo: get rid of ref trygetJens Axboe
The caller holds a reference to the ring itself, so by definition the ring cannot go away. There's no need to play games with tryget for the reference, as we don't need an extra reference at all. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09io_uring: cleanup 'ret' handling in io_iopoll_check()Jens Axboe
We return 0 for success, or -error when there's an error. Move the 'ret' variable into the loop where we are actually using it, to make it clearer that we don't carry this variable forward for return outside of the loop. While at it, also move the need_resched() break condition out of the while check itself, keeping it with the signal pending check. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09io_uring: break iopolling on signalPavel Begunkov
Don't keep spinning iopoll with a signal set. It'll eventually return back, e.g. by virtue of need_resched(), but it's not a nice user experience. Cc: stable@vger.kernel.org Fixes: def596e9557c9 ("io_uring: support for IO polling") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/eeba551e82cad12af30c3220125eb6cb244cc94c.1691594339.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09io_uring: kill io_uring userspace examplesPavel Begunkov
There are tons of io_uring tests and examples in liburing and on the Internet. If you're looking for a benchmark, io_uring-bench.c is just an acutely outdated version of fio/io_uring. And for basic condensed init template for likes of selftests take a peek at io_uring_zerocopy_tx.c. Kill tools/io_uring/, it's a burden keeping it here. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7c740701d3b475dcad8c92602a551044f72176b4.1691543666.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09io_uring: fix false positive KASAN warningsPavel Begunkov
io_req_local_work_add() peeks into the work list, which can be executed in the meanwhile. It's completely fine without KASAN as we're in an RCU read section and it's SLAB_TYPESAFE_BY_RCU. With KASAN though it may trigger a false positive warning because internal io_uring caches are sanitised. Remove sanitisation from the io_uring request cache for now. Cc: stable@vger.kernel.org Fixes: 8751d15426a31 ("io_uring: reduce scheduling due to tw") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c6fbf7a82a341e66a0007c76eefd9d57f2d3ba51.1691541473.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09io_uring: fix drain stalls by invalid SQEPavel Begunkov
cq_extra is protected by ->completion_lock, which io_get_sqe() misses. The bug is harmless as it doesn't happen in real life, requires invalid SQ index array and racing with submission, and only messes up the userspace, i.e. stall requests execution but will be cleaned up on ring destruction. Fixes: 15641e427070f ("io_uring: don't cache number of dropped SQEs") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/66096d54651b1a60534bb2023f2947f09f50ef73.1691538547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09io_uring/rsrc: Remove unused declaration io_rsrc_put_tw()Yue Haibing
Commit 36b9818a5a84 ("io_uring/rsrc: don't offload node free") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230808151058.4572-1-yuehaibing@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09io_uring: annotate the struct io_kiocb slab for appropriate user copyJens Axboe
When compiling the kernel with clang and having HARDENED_USERCOPY enabled, the liburing openat2.t test case fails during request setup: usercopy: Kernel memory overwrite attempt detected to SLUB object 'io_kiocb' (offset 24, size 24)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 3 PID: 413 Comm: openat2.t Tainted: G N 6.4.3-g6995e2de6891-dirty #19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014 RIP: 0010:usercopy_abort+0x84/0x90 Code: ce 49 89 ce 48 c7 c3 68 48 98 82 48 0f 44 de 48 c7 c7 56 c6 94 82 4c 89 de 48 89 c1 41 52 41 56 53 e8 e0 51 c5 00 48 83 c4 18 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 41 57 41 56 RSP: 0018:ffffc900016b3da0 EFLAGS: 00010296 RAX: 0000000000000062 RBX: ffffffff82984868 RCX: 4e9b661ac6275b00 RDX: ffff8881b90ec580 RSI: ffffffff82949a64 RDI: 00000000ffffffff RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000 R10: ffffc900016b3c88 R11: ffffc900016b3c30 R12: 00007ffe549659e0 R13: ffff888119014000 R14: 0000000000000018 R15: 0000000000000018 FS: 00007f862e3ca680(0000) GS:ffff8881b90c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005571483542a8 CR3: 0000000118c11000 CR4: 00000000003506e0 Call Trace: <TASK> ? __die_body+0x63/0xb0 ? die+0x9d/0xc0 ? do_trap+0xa7/0x180 ? usercopy_abort+0x84/0x90 ? do_error_trap+0xc6/0x110 ? usercopy_abort+0x84/0x90 ? handle_invalid_op+0x2c/0x40 ? usercopy_abort+0x84/0x90 ? exc_invalid_op+0x2f/0x40 ? asm_exc_invalid_op+0x16/0x20 ? usercopy_abort+0x84/0x90 __check_heap_object+0xe2/0x110 __check_object_size+0x142/0x3d0 io_openat2_prep+0x68/0x140 io_submit_sqes+0x28a/0x680 __se_sys_io_uring_enter+0x120/0x580 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x55714834de26 Code: ca 01 0f b6 82 d0 00 00 00 8b ba cc 00 00 00 45 31 c0 31 d2 41 b9 08 00 00 00 83 e0 01 c1 e0 04 41 09 c2 b8 aa 01 00 00 0f 05 <c3> 66 0f 1f 84 00 00 00 00 00 89 30 eb 89 0f 1f 40 00 8b 00 a8 06 RSP: 002b:00007ffe549659c8 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa RAX: ffffffffffffffda RBX: 00007ffe54965a50 RCX: 000055714834de26 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000008 R10: 0000000000000000 R11: 0000000000000246 R12: 000055714834f057 R13: 00007ffe54965a50 R14: 0000000000000001 R15: 0000557148351dd8 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- when it tries to copy struct open_how from userspace into the per-command space in the io_kiocb. There's nothing wrong with the copy, but we're missing the appropriate annotations for allowing user copies to/from the io_kiocb slab. Allow copies in the per-command area, which is from the 'file' pointer to when 'opcode' starts. We do have existing user copies there, but they are not all annotated like the one that openat2_prep() uses, copy_struct_from_user(). But in practice opcodes should be allowed to copy data into their per-command area in the io_kiocb. Reported-by: Breno Leitao <leitao@debian.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09io_uring: Add io_uring command support for socketsBreno Leitao
Enable io_uring commands on network sockets. Create two new SOCKET_URING_OP commands that will operate on sockets. In order to call ioctl on sockets, use the file_operations->io_uring_cmd callbacks, and map it to a uring socket function, which handles the SOCKET_URING_OP accordingly, and calls socket ioctls. This patches was tested by creating a new test case in liburing. Link: https://github.com/leitao/liburing/tree/io_uring_cmd Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230627134424.2784797-1-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-17io_uring/cancel: wire up IORING_ASYNC_CANCEL_OP for sync cancelJens Axboe
Allow usage of IORING_ASYNC_CANCEL_OP through the sync cancelation API as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-17io_uring/cancel: support opcode based lookup and cancelationJens Axboe
Add IORING_ASYNC_CANCEL_OP flag for cancelation, which allows the application to target cancelation based on the opcode of the original request. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-17io_uring/cancel: add IORING_ASYNC_CANCEL_USERDATAJens Axboe
Add a flag to explicitly match on user_data in the request for cancelation purposes. This is the default behavior if none of the other match flags are set, but if we ALSO want to match on user_data, then this flag can be set. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-17io_uring: use cancelation match helper for poll and timeout requestsJens Axboe
Get rid of the request vs io_cancel_data checking and just use the exported helper for this. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-17io_uring/cancel: fix sequence matching for IORING_ASYNC_CANCEL_ANYJens Axboe
We always need to check/update the cancel sequence if IORING_ASYNC_CANCEL_ALL is set. Also kill the redundant check for IORING_ASYNC_CANCEL_ANY at the end, if we get here we know it's not set as we would've matched it higher up. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-17io_uring/cancel: abstract out request match helperJens Axboe
We have different match code in a variety of spots. Start the cleanup of this by abstracting out a helper that can be used to check if a given request matches the cancelation criteria outlined in io_cancel_data. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-17io_uring/timeout: always set 'ctx' in io_cancel_dataJens Axboe
In preparation for using a generic handler to match requests for cancelation purposes, ensure that ctx is set in io_cancel_data. The timeout handlers don't check for this as it'll always match, but we'll need it set going forward. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-17io_uring/poll: always set 'ctx' in io_cancel_dataJens Axboe
This isn't strictly necessary for this callsite, as it uses it's internal lookup for this cancelation purpose. But let's be consistent with how it's used in general and set ctx as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-16Linux 6.5-rc2v6.5-rc2Linus Torvalds
2023-07-16Merge tag 'xtensa-20230716' of https://github.com/jcmvbkbc/linux-xtensaLinus Torvalds
Pull xtensa fixes from Max Filippov: - fix interaction between unaligned exception handler and load/store exception handler - fix parsing ISS network interface specification string - add comment about etherdev freeing to ISS network driver * tag 'xtensa-20230716' of https://github.com/jcmvbkbc/linux-xtensa: xtensa: fix unaligned and load/store configuration interaction xtensa: ISS: fix call to split_if_spec xtensa: ISS: add comment about etherdev freeing
2023-07-16Merge tag 'perf_urgent_for_v6.5_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Fix a lockdep warning when the event given is the first one, no event group exists yet but the code still goes and iterates over event siblings * tag 'perf_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix lockdep warning in for_each_sibling_event() on SPR
2023-07-16Merge tag 'objtool_urgent_for_v6.5_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Borislav Petkov: - Mark copy_iovec_from_user() __noclone in order to prevent gcc from doing an inter-procedural optimization and confuse objtool - Initialize struct elf fully to avoid build failures * tag 'objtool_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: iov_iter: Mark copy_iovec_from_user() noclone objtool: initialize all of struct elf
2023-07-16Merge tag 'sched_urgent_for_v6.5_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Remove a cgroup from under a polling process properly - Fix the idle sibling selection * tag 'sched_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/psi: use kernfs polling functions for PSI trigger polling sched/fair: Use recent_used_cpu to test p->cpus_ptr
2023-07-16Merge tag 'pinctrl-v6.5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "I'm mostly on vacation but what would vacation be without a few critical fixes so people can use their gaming laptops when hiding away from the sun (or rain)? - Fix a really annoying interrupt storm in the AMD driver affecting Asus TUF gaming notebooks - Fix device tree parsing in the Renesas driver" * tag 'pinctrl-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: amd: Unify debounce handling into amd_pinconf_set() pinctrl: amd: Drop pull up select configuration pinctrl: amd: Use amd_pinconf_set() for all config options pinctrl: amd: Only use special debounce behavior for GPIO 0 pinctrl: renesas: rzg2l: Handle non-unique subnode names pinctrl: renesas: rzv2m: Handle non-unique subnode names
2023-07-16Merge tag '6.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - Two reconnect fixes: important fix to address inFlight count to leak (which can leak credits), and fix for better handling a deleted share - DFS fix - SMB1 cleanup fix - deferred close fix * tag '6.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix mid leak during reconnection after timeout threshold cifs: is_network_name_deleted should return a bool smb: client: fix missed ses refcounting smb: client: Fix -Wstringop-overflow issues cifs: if deferred close is disabled then close files immediately