summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-01-20fs/adfs: inode: fix adfs_mode2atts()Russell King
Fix adfs_mode2atts() to actually update the file permissions on the media rather than using the current inode mode. Note also that directories do not have read/write permissions stored on the media. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-20fs/adfs: inode: update timestamps to centisecond precisionRussell King
Despite ADFS timestamps having centi-second granularity, and Linux gaining fine-grained timestamp support in v2.5.48, fs/adfs was never updated. Update fs/adfs to centi-second support, and ensure that the inode ctime always reflects what is written in underlying media. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-20io_uring: use labeled array init in io_op_defsPavel Begunkov
Don't rely on implicit ordering of IORING_OP_ and explicitly place them at a right place in io_op_defs. Now former comments are now a part of the code and won't ever outdate. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: optimise sqe-to-req flags translationPavel Begunkov
For each IOSQE_* flag there is a corresponding REQ_F_* flag. And there is a repetitive pattern of their translation: e.g. if (sqe->flags & SQE_FLAG*) req->flags |= REQ_F_FLAG* Use same numeric values/bits for them and copy instead of manual handling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: remove REQ_F_IO_DRAINEDPavel Begunkov
A request can get into the defer list only once, there is no need for marking it as drained, so remove it. This probably was left after extracting __need_defer() for use in timeouts. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: file switch work needs to get flushed on exitJens Axboe
We currently flush early, but if we have something in progress and a new switch is scheduled, we need to ensure to flush after our teardown as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: hide uring_fd in ctxPavel Begunkov
req->ring_fd and req->ring_file are used only during the prep stage during submission, which is is protected by mutex. There is no need to store them per-request, place them in ctx. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: remove extra check in __io_commit_cqringPavel Begunkov
__io_commit_cqring() is almost always called when there is a change in the rings, so the check is rather pessimising. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: optimise use of ctx->drain_nextPavel Begunkov
Move setting ctx->drain_next to the only place it could be set, when it got linked non-head requests. The same for checking it, it's interesting only for a head of a link or a non-linked request. No functional changes here. This removes some code from the common path and also removes REQ_F_DRAIN_LINK flag, as it doesn't need it anymore. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add support for probing opcodesJens Axboe
The application currently has no way of knowing if a given opcode is supported or not without having to try and issue one and see if we get -EINVAL or not. And even this approach is fraught with peril, as maybe we're getting -EINVAL due to some fields being missing, or maybe it's just not that easy to issue that particular command without doing some other leg work in terms of setup first. This adds IORING_REGISTER_PROBE, which fills in a structure with info on what it supported or not. This will work even with sparse opcode fields, which may happen in the future or even today if someone backports specific features to older kernels. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: account fixed file references correctly in batchJens Axboe
We can't assume that the whole batch has fixed files in it. If it's a mix, or none at all, then we can end up doing a ref put that either messes up accounting, or causes an oops if we have no fixed files at all. Also ensure we free requests properly between inflight accounted and normal requests. Fixes: 82c721577011 ("io_uring: extend batch freeing to cover more cases") Reported-by: Dmitrii Dolgov <9erthalion6@gmail.com> Reported-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Dmitrii Dolgov <9erthalion6@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add opcode to issue trace eventJens Axboe
For some test apps at least, user_data is just zeroes. So it's not a good way to tell what the command actually is. Add the opcode to the issue trace point. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add support for IORING_OP_OPENAT2Jens Axboe
Add support for the new openat2(2) system call. It's trivial to do, as we can have openat(2) just be wrapped around it. Suggested-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: remove 'fname' from io_open structureJens Axboe
We only use it internally in the prep functions for both statx and openat, so we don't need it to be persistent across the request. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add 'struct open_how' to the openat request contextJens Axboe
We'll need this for openat2(2) support, remove flags and mode from the existing io_open struct. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: enable option to only trigger eventfd for async completionsJens Axboe
If an application is using eventfd notifications with poll to know when new SQEs can be issued, it's expecting the following read/writes to complete inline. And with that, it knows that there are events available, and don't want spurious wakeups on the eventfd for those requests. This adds IORING_REGISTER_EVENTFD_ASYNC, which works just like IORING_REGISTER_EVENTFD, except it only triggers notifications for events that happen from async completions (IRQ, or io-wq worker completions). Any completions inline from the submission itself will not trigger notifications. Suggested-by: Mark Papadakis <markuspapadakis@icloud.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: change io_ring_ctx bool fields into bit fieldsJens Axboe
In preparation for adding another one, which would make us spill into another long (and hence bump the size of the ctx), change them to bit fields. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: file set registration should use interruptible waitsJens Axboe
If an application attempts to register a set with unbounded requests pending, we can be stuck here forever if they don't complete. We can make this wait interruptible, and just abort if we get signaled. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: Remove unnecessary null checkYueHaibing
Null check kfree is redundant, so remove it. This is detected by coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add support for send(2) and recv(2)Jens Axboe
This adds IORING_OP_SEND for send(2) support, and IORING_OP_RECV for recv(2) support. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: remove extra io_wq_current_is_worker()Pavel Begunkov
io_wq workers use io_issue_sqe() to forward sqes and never io_queue_sqe(). Remove extra check for io_wq_current_is_worker() Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: optimise commit_sqring() for common casePavel Begunkov
It should be pretty rare to not submitting anything when there is something in the ring. No need to keep heuristics for this case. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: optimise head checks in io_get_sqring()Pavel Begunkov
A user may ask to submit more than there is in the ring, and then io_uring will submit as much as it can. However, in the last iteration it will allocate an io_kiocb and immediately free it. It could do better and adjust @to_submit to what is in the ring. And since the ring's head is already checked here, there is no need to do it in the loop, spamming with smp_load_acquire()'s barriers Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: clamp to_submit in io_submit_sqes()Pavel Begunkov
Make io_submit_sqes() to clamp @to_submit itself. It removes duplicated code and prepares for following changes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add support for IORING_SETUP_CLAMPJens Axboe
Some applications like to start small in terms of ring size, and then ramp up as needed. This is a bit tricky to do currently, since we don't advertise the max ring size. This adds IORING_SETUP_CLAMP. If set, and the values for SQ or CQ ring size exceed what we support, then clamp them at the max values instead of returning -EINVAL. Since we return the chosen ring sizes after setup, no further changes are needed on the application side. io_uring already changes the ring sizes if the application doesn't ask for power-of-two sizes, for example. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: extend batch freeing to cover more casesJens Axboe
Currently we only batch free if fixed files are used, no links, no aux data, etc. This extends the batch freeing to only exclude the linked case and fallback case, and make io_free_req_many() handle the other cases just fine. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: wrap multi-req freeing in struct req_batchJens Axboe
This cleans up the code a bit, and it allows us to build on top of the multi-req freeing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: batch getting pcpu referencesPavel Begunkov
percpu_ref_tryget() has its own overhead. Instead getting a reference for each request, grab a bunch once per io_submit_sqes(). ~5% throughput boost for a "submit and wait 128 nops" benchmark. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> __io_req_free_empty() -> __io_req_do_free() Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add IORING_OP_MADVISEJens Axboe
This adds support for doing madvise(2) through io_uring. We assume that any operation can block, and hence punt everything async. This could be improved, but hard to make bullet proof. The async punt ensures it's safe. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add IORING_OP_FADVISEJens Axboe
This adds support for doing fadvise through io_uring. We assume that WILLNEED doesn't block, but that DONTNEED may block. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: allow use of offset == -1 to mean file positionJens Axboe
This behaves like preadv2/pwritev2 with offset == -1, it'll use (and update) the current file position. This obviously comes with the caveat that if the application has multiple read/writes in flight, then the end result will not be as expected. This is similar to threads sharing a file descriptor and doing IO using the current file position. Since this feature isn't easily detectable by doing a read or write, add a feature flags, IORING_FEAT_RW_CUR_POS, to allow applications to detect presence of this feature. Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add non-vectored read/write commandsJens Axboe
For uses cases that don't already naturally have an iovec, it's easier (or more convenient) to just use a buffer address + length. This is particular true if the use case is from languages that want to create a memory safe abstraction on top of io_uring, and where introducing the need for the iovec may impose an ownership issue. For those cases, they currently need an indirection buffer, which means allocating data just for this purpose. Add basic read/write that don't require the iovec. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: improve poll completion performanceJens Axboe
For busy IORING_OP_POLL_ADD workloads, we can have enough contention on the completion lock that we fail the inline completion path quite often as we fail the trylock on that lock. Add a list for deferred completions that we can use in that case. This helps reduce the number of async offloads we have to do, as if we get multiple completions in a row, we'll piggy back on to the poll_llist instead of having to queue our own offload. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: split overflow state into SQ and CQ sideJens Axboe
We currently check ->cq_overflow_list from both SQ and CQ context, which causes some bouncing of that cache line. Add separate bits of state for this instead, so that the SQ side can check using its own state, and likewise for the CQ side. This adds ->sq_check_overflow with the SQ state, and ->cq_check_overflow with the CQ state. If we hit an overflow condition, both of these bits are set. Likewise for overflow flush clear, we clear both bits. For the fast path of just checking if there's an overflow condition on either the SQ or CQ side, we can use our own private bit for this. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add lookup table for various opcode needsJens Axboe
We currently have various switch statements that check if an opcode needs a file, mm, etc. These are hard to keep in sync as opcodes are added. Add a struct io_op_def that holds all of this information, so we have just one spot to update when opcodes are added. This also enables us to NOT allocate req->io if a deferred command doesn't need it, and corrects some mistakes we had in terms of what commands need mm context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: remove two unnecessary function declarationsJens Axboe
__io_free_req() and io_double_put_req() aren't used before they are defined, so we can kill these two forwards. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: move *queue_link_head() from common pathPavel Begunkov
Move io_queue_link_head() to links handling code in io_submit_sqe(), so it wouldn't need extra checks and would have better data locality. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: rename prev to headPavel Begunkov
Calling "prev" a head of a link is a bit misleading. Rename it Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add IOSQE_ASYNCJens Axboe
io_uring defaults to always doing inline submissions, if at all possible. But for larger copies, even if the data is fully cached, that can take a long time. Add an IOSQE_ASYNC flag that the application can set on the SQE - if set, it'll ensure that we always go async for those kinds of requests. Use the io-wq IO_WQ_WORK_CONCURRENT flag to ensure we get the concurrency we desire for this case. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io-wq: support concurrent non-blocking workJens Axboe
io-wq assumes that work will complete fast (and not block), so it doesn't create a new worker when work is enqueued, if we already have at least one worker running. This is done on the assumption that if work is running, then it will complete fast. Add an option to force io-wq to fork a new worker for work queued. This is signaled by setting IO_WQ_WORK_CONCURRENT on the work item. For that case, io-wq will create a new worker, even though workers are already running. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add support for IORING_OP_STATXJens Axboe
This provides support for async statx(2) through io_uring. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20fs: make two stat prep helpers availableJens Axboe
To implement an async stat, we need to provide the flags mapping and the statx user copy. Make them available internally, through fs/internal.h. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: avoid ring quiesce for fixed file set unregister and updateJens Axboe
We currently fully quiesce the ring before an unregister or update of the fixed fileset. This is very expensive, and we can be a bit smarter about this. Add a percpu refcount for the file tables as a whole. Grab a percpu ref when we use a registered file, and put it on completion. This is cheap to do. Upon removal of a file from a set, switch the ref count to atomic mode. When we hit zero ref on the completion side, then we know we can drop the previously registered files. When the old files have been dropped, switch the ref back to percpu mode for normal operation. Since there's a period between doing the update and the kernel being done with it, add a IORING_OP_FILES_UPDATE opcode that can perform the same action. The application knows the update has completed when it gets the CQE for it. Between doing the update and receiving this completion, the application must continue to use the unregistered fd if submitting IO on this particular file. This takes the runtime of test/file-register from liburing from 14s to about 0.7s. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add support for IORING_OP_CLOSEJens Axboe
This works just like close(2), unsurprisingly. We remove the file descriptor and post the completion inline, then offload the actual (potential) last file put to async context. Mark the async part of this work as uncancellable, as we really must guarantee that the latter part of the close is run. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io-wq: add support for uncancellable workJens Axboe
Not all work can be cancelled, some of it we may need to guarantee that it runs to completion. Allow the caller to set IO_WQ_WORK_NO_CANCEL on work that must not be cancelled. Note that the caller work function must also check for IO_WQ_WORK_NO_CANCEL on work that is marked IO_WQ_WORK_CANCEL. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20fs: move filp_close() outside of __close_fd_get_file()Jens Axboe
Just one caller of this, and just use filp_close() there manually. This is important to allow async close/removal of the fd. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add support for IORING_OP_OPENATJens Axboe
This works just like openat(2), except it can be performed async. For the normal case of a non-blocking path lookup this will complete inline. If we have to do IO to perform the open, it'll be done from async context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20fs: make build_open_flags() available internallyJens Axboe
This is a prep patch for supporting non-blocking open from io_uring. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20io_uring: add support for fallocate()Jens Axboe
This exposes fallocate(2) through io_uring. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20Merge branch 'io_uring-5.5' into for-5.6/io_uring-vfsJens Axboe
Pull in compatability fix for the files_update command. * io_uring-5.5: io_uring: fix compat for IORING_REGISTER_FILES_UPDATE