summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-30Merge tag 'sched-urgent-2021-06-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix a small inconsistency (bug) in load tracking, caught by a new warning that several people reported. - Flip CONFIG_SCHED_CORE to default-disabled, and update the Kconfig help text. * tag 'sched-urgent-2021-06-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Disable CONFIG_SCHED_CORE by default sched/fair: Ensure _sum and _avg values stay consistent
2021-06-30Merge tag 'microblaze-v5.14' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds
Pull microblaze updates from Michal Simek: - Remove unused PAGE_UP/DOWN macros - Fix trivial spelling mistake * tag 'microblaze-v5.14' of git://git.monstr.eu/linux-2.6-microblaze: arch: microblaze: Fix spelling mistake "vesion" -> "version" microblaze: Cleanup unused functions
2021-06-30Merge tag 'safesetid-5.14' of git://github.com/micah-morton/linuxLinus Torvalds
Pull SafeSetID update from Micah Morton: "One very minor code cleanup change that marks a variable as __initdata" * tag 'safesetid-5.14' of git://github.com/micah-morton/linux: LSM: SafeSetID: Mark safesetid_initialized as __initdata
2021-06-30Merge tag 'Smack-for-5.14' of git://github.com/cschaufler/smack-nextLinus Torvalds
Pull smack updates from Casey Schaufler: "There is nothing more significant than an improvement to a byte count check in smackfs. All changes have been in next for weeks" * tag 'Smack-for-5.14' of git://github.com/cschaufler/smack-next: Smack: fix doc warning Revert "Smack: Handle io_uring kernel thread privileges" smackfs: restrict bytes count in smk_set_cipso() security/smack/: fix misspellings using codespell tool
2021-06-30Merge tag 'audit-pr-20210629' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Another merge window, another small audit pull request. Four patches in total: one is cosmetic, one removes an unnecessary initialization, one renames some enum values to prevent name collisions, and one converts list_del()/list_add() to list_move(). None of these are earth shattering and all pass the audit-testsuite tests while merging cleanly on top of your tree from earlier today" * tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: remove unnecessary 'ret' initialization audit: remove trailing spaces and tabs audit: Use list_move instead of list_del/list_add audit: Rename enum audit_state constants to avoid AUDIT_DISABLED redefinition audit: add blank line after variable declarations
2021-06-30Merge tag 'selinux-pr-20210629' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull SELinux updates from Paul Moore: - The slow_avc_audit() function is now non-blocking so we can remove the AVC_NONBLOCKING tricks; this also includes the 'flags' variant of avc_has_perm(). - Use kmemdup() instead of kcalloc()+copy when copying parts of the SELinux policydb. - The InfiniBand device name is now passed by reference when possible in the SELinux code, removing a strncpy(). - Minor cleanups including: constification of avtab function args, removal of useless LSM/XFRM function args, SELinux kdoc fixes, and removal of redundant assignments. * tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit() selinux: slow_avc_audit has become non-blocking selinux: Fix kernel-doc selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC lsm_audit,selinux: pass IB device name by reference selinux: Remove redundant assignment to rc selinux: Corrected comment to match kernel-doc comment selinux: delete selinux_xfrm_policy_lookup() useless argument selinux: constify some avtab function arguments selinux: simplify duplicate_policydb_cond_list() by using kmemdup()
2021-06-30ubd: remove dead code in ubd_setup_commonChristoph Hellwig
Remove some leftovers of the fake major number parsing that cause complains from some compilers. Fixes: 2933a1b2c6f3 ("ubd: remove the code to register as the legacy IDE driver") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210628093937.1325608-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30nvme: use return value from blk_execute_rq()Keith Busch
We don't have an nvme status to report if the driver's .queue_rq() returns an error without dispatching the requested nvme command. Check the return value from blk_execute_rq() for all passthrough commands so the caller may know their command was not successful. If the command is from the target passthrough interface and fails to dispatch, synthesize the response back to the host as a internal target error. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-5-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30block: return errors from blk_execute_rq()Keith Busch
The synchronous blk_execute_rq() had not provided a way for its callers to know if its request was successful or not. Return the blk_status_t result of the request. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-4-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30nvme: use blk_execute_rq() for passthrough commandsKeith Busch
The generic blk_execute_rq() knows how to handle polled completions. Use that instead of implementing an nvme specific handler. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-3-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30block: support polling through blk_execute_rqKeith Busch
Poll for completions if the request's hctx is a polling type. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-2-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30block: remove REQ_OP_SCSI_{IN,OUT}Christoph Hellwig
With the legacy IDE driver gone drivers now use either REQ_OP_DRV_* or REQ_OP_SCSI_*, so unify the two concepts of passthrough requests into a single one. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30block: mark blk_mq_init_queue_data staticChristoph Hellwig
All driver uses are gone now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210624081012.256464-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: rewrite loop_exit using idr_for_each_entryChristoph Hellwig
Use idr_for_each_entry to simplify removing all devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: split loop_lookupChristoph Hellwig
loop_lookup has two callers - one wants to do the a find by index and the other wants any unbound loop device. Open code the respective functionality in each caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: don't allow deleting an unspecified loop deviceChristoph Hellwig
Passing a negative index to loop_lookup while return any unbound device. Doing that for a delete does not make much sense, so add check to explicitly reject that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: move loop_ctl_mutex locking into loop_addChristoph Hellwig
Move acquiring and releasing loop_ctl_mutex from the callers into loop_add. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: split loop_control_ioctlChristoph Hellwig
Split loop_control_ioctl into a helper for each command. This keeps the code nicely separated for the upcoming locking changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: don't call loop_lookup before adding a loop deviceChristoph Hellwig
loop_add returns the right error if the slot wasn't available. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: remove the l argument to loop_addChristoph Hellwig
None of the callers cares about the allocated struct loop_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: reduce loop_ctl_mutex coverage in loop_exitChristoph Hellwig
loop_ctl_mutex is only needed to iterate the IDR for removing the loop devices, so reduce the coverage. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: reorder loop_exitChristoph Hellwig
Unregister the misc and blockdevice first to prevent further access, and only then iterate to remove the devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30mmc: initialized disk->minorsChristoph Hellwig
Fix a let hunk from the blk_mq_alloc_disk conversion. Fixes: 281ea6a5bfdc ("mmc: switch to blk_mq_alloc_disk") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210621080144.3655131-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30mmc: switch to blk_mq_alloc_diskChristoph Hellwig
Use the blk_mq_alloc_disk to allocate the request_queue and gendisk together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210616053934.880951-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30mmc: remove an extra blk_{get,put}_queue pairChristoph Hellwig
The gendisk already acquires a reference to the queue when add_disk is called, which dropped on put_disk. So remove the superflous extra refcounting. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210616053934.880951-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30nbd: provide a way for userspace processes to identify device backendsPrasanna Kumar Kalever
Problem: On reconfigure of device, there is no way to defend if the backend storage is matching with the initial backend storage. Say, if an initial connect request for backend "pool1/image1" got mapped to /dev/nbd0 and the userspace process is terminated. A next reconfigure request within NBD_ATTR_DEAD_CONN_TIMEOUT is allowed to use /dev/nbd0 for a different backend "pool1/image2" For example, an operation like below could be dangerous: $ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image /dev/nbd0 $ sudo blkid /dev/nbd0 /dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4" $ sudo pkill -9 rbd-nbd $ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image /dev/nbd0 $ sudo blkid /dev/nbd0 /dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs" Solution: Provide a way for userspace processes to keep some metadata to identify between the device and the backend, so that when a reconfigure request is made, we can compare and avoid such dangerous operations. With this solution, as part of the initial connect request, backend path can be stored in the sysfs per device config, so that on a reconfigure request it's easy to check if the backend path matches with the initial connect backend path. Please note, ioctl interface to nbd will not have these changes, as there won't be any reconfigure. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210429102828.31248-1-prasanna.kalever@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30ubd: use blk_mq_alloc_disk and blk_cleanup_diskChristoph Hellwig
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210614060759.3965724-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30ubd: remove the code to register as the legacy IDE driverChristoph Hellwig
With the legacy IDE driver long deprecated, and modern userspace being much more flexible about dev_t assignments there is no reason to fake a registration as the legacy IDE driver in ubd. This registeration is a little problematic as it registers the same request_queue for multiple gendisks, so just remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20210614060759.3965724-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30null_blk: remove an unused variable assignment in null_add_devChristoph Hellwig
Fix up the recent blk_alloc_disk conversion. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210614060231.3965278-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30mtip32xx: use blk_mq_alloc_disk and blk_cleanup_diskChristoph Hellwig
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210614060343.3965416-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30mtip32xx: simplify sysfs setupChristoph Hellwig
Pass the driver specific attributes directly to device_add_disk instead of manually creating them after the disk registration. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210614060343.3965416-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30Merge tag 'clang-features-v5.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull clang feature updates from Kees Cook: - Add CC_HAS_NO_PROFILE_FN_ATTR in preparation for PGO support in the face of the noinstr attribute, paving the way for PGO and fixing GCOV. (Nick Desaulniers) - x86_64 LTO coverage is expanded to 32-bit x86. (Nathan Chancellor) - Small fixes to CFI. (Mark Rutland, Nathan Chancellor) * tag 'clang-features-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR compiler_attributes.h: cleanups for GCC 4.9+ compiler_attributes.h: define __no_profile, add to noinstr x86, lto: Enable Clang LTO for 32-bit as well CFI: Move function_nocfi() into compiler.h MAINTAINERS: Add Clang CFI section
2021-06-30io_uring: code clean for kiocb_done()Hao Xu
A simple code clean for kiocb_done() Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: spin in iopoll() only when reqs are in a single queueHao Xu
We currently spin in iopoll() when requests to be iopolled are for same file(device), while one device may have multiple hardware queues. given an example: hw_queue_0 | hw_queue_1 req(30us) req(10us) If we first spin on iopolling for the hw_queue_0. the avg latency would be (30us + 30us) / 2 = 30us. While if we do round robin, the avg latency would be (30us + 10us) / 2 = 20us since we reap the request in hw_queue_1 in time. So it's better to do spinning only when requests are in same hardware queue. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: pre-initialise some of req fieldsPavel Begunkov
Most of requests are allocated from an internal cache, so it's waste of time fully initialising them every time. Instead, let's pre-init some of the fields we can during initial allocation (e.g. kmalloc(), see io_alloc_req()) and keep them valid on request recycling. There are four of them in this patch: ->ctx is always stays the same ->link is NULL on free, it's an invariant ->result is not even needed to init, just a precaution ->async_data we now clean in io_dismantle_req() as it's likely to never be allocated. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/892ba0e71309bba9fe9e0142472330bbf9d8f05d.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: refactor io_submit_flush_completionsPavel Begunkov
Don't init req_batch before we actually need it. Also, add a small clean up for req declaration. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ad85512e12bd3a20d521e9782750300970e5afc8.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: optimise hot path restricted checksPavel Begunkov
Move likely/unlikely from io_check_restriction() to specifically ctx->restricted check, because doesn't do what it supposed to and make the common path take an extra jump. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/22bf70d0a543dfc935d7276bdc73081784e30698.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: remove not needed PF_EXITING checkPavel Begunkov
Since cancellation got moved before exit_signals(), there is no one left who can call io_run_task_work() with PF_EXIING set, so remove the check. Note that __io_req_task_submit() still needs a similar check. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f7f305ececb1e6044ea649fb983ca754805bb884.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: mainstream sqpoll task_work runningPavel Begunkov
task_works are widely used, so place io_run_task_work() directly into the main path of io_sq_thread(), and remove it from other places where it's not needed anymore. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/24eb5e35d519c590d3dffbd694b4c61a5fe49029.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: refactor io_arm_poll_handler()Pavel Begunkov
gcc 11 goes a weird path and duplicates most of io_arm_poll_handler() for READ and WRITE cases. Help it and move all pollin vs pollout specific bits under a single if-else, so there is no temptation for this kind of unfolding. before vs after: text data bss dec hex filename 85362 12650 8 98020 17ee4 ./fs/io_uring.o 85186 12650 8 97844 17e34 ./fs/io_uring.o Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1deea0037293a922a0358e2958384b2e42437885.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: reduce latency by reissueing the operationOlivier Langlois
It is quite frequent that when an operation fails and returns EAGAIN, the data becomes available between that failure and the call to vfs_poll() done by io_arm_poll_handler(). Detecting the situation and reissuing the operation is much faster than going ahead and push the operation to the io-wq. Performance improvement testing has been performed with: Single thread, 1 TCP connection receiving a 5 Mbps stream, no sqpoll. 4 measurements have been taken: 1. The time it takes to process a read request when data is already available 2. The time it takes to process by calling twice io_issue_sqe() after vfs_poll() indicated that data was available 3. The time it takes to execute io_queue_async_work() 4. The time it takes to complete a read request asynchronously 2.25% of all the read operations did use the new path. ready data (baseline) avg 3657.94182918628 min 580 max 20098 stddev 1213.15975908162 reissue completion average 7882.67567567568 min 2316 max 28811 stddev 1982.79172973284 insert io-wq time average 8983.82276995305 min 3324 max 87816 stddev 2551.60056552038 async time completion average 24670.4758861127 min 10758 max 102612 stddev 3483.92416873804 Conclusion: On average reissuing the sqe with the patch code is 1.1uSec faster and in the worse case scenario 59uSec faster than placing the request on io-wq On average completion time by reissuing the sqe with the patch code is 16.79uSec faster and in the worse case scenario 73.8uSec faster than async completion. Signed-off-by: Olivier Langlois <olivier@trillion01.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9e8441419bb1b8f3c3fcc607b2713efecdef2136.1624364038.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: add IOPOLL and reserved field checks to IORING_OP_UNLINKATJens Axboe
We can't support IOPOLL with non-pollable request types, and we should check for unused/reserved fields like we do for other request types. Fixes: 14a1143b68ee ("io_uring: add support for IORING_OP_UNLINKAT") Cc: stable@vger.kernel.org Reported-by: Dmitry Kadashev <dkadashev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: add IOPOLL and reserved field checks to IORING_OP_RENAMEATJens Axboe
We can't support IOPOLL with non-pollable request types, and we should check for unused/reserved fields like we do for other request types. Fixes: 80a261fd0032 ("io_uring: add support for IORING_OP_RENAMEAT") Cc: stable@vger.kernel.org Reported-by: Dmitry Kadashev <dkadashev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: refactor io_openat2()Pavel Begunkov
Put do_filp_open() fail path of io_openat2() under a single if, deduplicating put_unused_fd(), making it look better and helping the hot path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f4c84d25c049d0af2adc19c703bbfef607200209.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: simplify struct io_uring_sqe layoutPavel Begunkov
Flatten struct io_uring_sqe, the last union is exactly 64B, so move them out of union { struct { ... }}, and decrease __pad2 size. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2e21ef7aed136293d654450bc3088973a8adc730.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: update sqe layout build checksPavel Begunkov
Add missing BUILD_BUG_SQE_ELEM() for ->buf_group verifying that SQE layout doesn't change. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1f9d21bd74599b856b3a632be4c23ffa184a3ef0.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: fix code style problemsPavel Begunkov
Fix a bunch of problems mostly found by checkpatch.pl Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cfaf9a2f27b43934144fe9422a916bd327099f44.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: refactor io_sq_thread()Pavel Begunkov
Move needs_sched declaration into the block where it's used, so it's harder to misuse/wrongfully reuse. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e4a07db1353ee38b924dd1b45394cf8e746130b4.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30io_uring: don't change sqpoll creds if not neededPavel Begunkov
SQPOLL doesn't need to change creds if it's not submitting requests. Move creds overriding into __io_sq_thread() after checking if there are SQEs pending. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c54368da2357ac539e0a333f7cfff70d5fb045b2.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30vfio/pci: Handle concurrent vma faultsAlex Williamson
io_remap_pfn_range() will trigger a BUG_ON if it encounters a populated pte within the mapping range. This can occur because we map the entire vma on fault and multiple faults can be blocked behind the vma_lock. This leads to traces like the one reported below. We can use our vma_list to test whether a given vma is mapped to avoid this issue. [ 1591.733256] kernel BUG at mm/memory.c:2177! [ 1591.739515] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 1591.747381] Modules linked in: vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O) [ 1591.760536] CPU: 2 PID: 227 Comm: lcore-worker-2 Tainted: G O 5.11.0-rc3+ #1 [ 1591.770735] Hardware name: , BIOS HixxxxFPGA 1P B600 V121-1 [ 1591.778872] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--) [ 1591.786134] pc : remap_pfn_range+0x214/0x340 [ 1591.793564] lr : remap_pfn_range+0x1b8/0x340 [ 1591.799117] sp : ffff80001068bbd0 [ 1591.803476] x29: ffff80001068bbd0 x28: 0000042eff6f0000 [ 1591.810404] x27: 0000001100910000 x26: 0000001300910000 [ 1591.817457] x25: 0068000000000fd3 x24: ffffa92f1338e358 [ 1591.825144] x23: 0000001140000000 x22: 0000000000000041 [ 1591.832506] x21: 0000001300910000 x20: ffffa92f141a4000 [ 1591.839520] x19: 0000001100a00000 x18: 0000000000000000 [ 1591.846108] x17: 0000000000000000 x16: ffffa92f11844540 [ 1591.853570] x15: 0000000000000000 x14: 0000000000000000 [ 1591.860768] x13: fffffc0000000000 x12: 0000000000000880 [ 1591.868053] x11: ffff0821bf3d01d0 x10: ffff5ef2abd89000 [ 1591.875932] x9 : ffffa92f12ab0064 x8 : ffffa92f136471c0 [ 1591.883208] x7 : 0000001140910000 x6 : 0000000200000000 [ 1591.890177] x5 : 0000000000000001 x4 : 0000000000000001 [ 1591.896656] x3 : 0000000000000000 x2 : 0168044000000fd3 [ 1591.903215] x1 : ffff082126261880 x0 : fffffc2084989868 [ 1591.910234] Call trace: [ 1591.914837] remap_pfn_range+0x214/0x340 [ 1591.921765] vfio_pci_mmap_fault+0xac/0x130 [vfio_pci] [ 1591.931200] __do_fault+0x44/0x12c [ 1591.937031] handle_mm_fault+0xcc8/0x1230 [ 1591.942475] do_page_fault+0x16c/0x484 [ 1591.948635] do_translation_fault+0xbc/0xd8 [ 1591.954171] do_mem_abort+0x4c/0xc0 [ 1591.960316] el0_da+0x40/0x80 [ 1591.965585] el0_sync_handler+0x168/0x1b0 [ 1591.971608] el0_sync+0x174/0x180 [ 1591.978312] Code: eb1b027f 540000c0 f9400022 b4fffe02 (d4210000) Fixes: 11c4cd07ba11 ("vfio-pci: Fault mmaps to enable vma tracking") Reported-by: Zeng Tao <prime.zeng@hisilicon.com> Suggested-by: Zeng Tao <prime.zeng@hisilicon.com> Link: https://lore.kernel.org/r/162497742783.3883260.3282953006487785034.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>