summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-12-09i2c: sh_mobile: Mark adapter suspended during suspendGeert Uytterhoeven
When a driver tries to send an I2C message while the adapter is suspended, this typically fails with: i2c-sh_mobile e60b0000.i2c: Transfer request timed out Avoid accessing the adapter while it is suspended by marking it suspended during suspend. This allows the I2C core to catch this, and print a warning: WARNING: CPU: 1 PID: 13 at drivers/i2c/i2c-core.h:54 __i2c_transfer+0x4a4/0x4e4 i2c i2c-6: Transfer while suspended Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-12-09i2c: owl: Add compatible for the Actions Semi S500 I2C controllerCristian Ciocaltea
Add S500 variant to the list of devices supported by the Actions Semi Owl I2C driver. Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-12-09dt-bindings: i2c: owl: Convert Actions Semi Owl binding to a schemaCristian Ciocaltea
Convert the Actions Semi Owl I2C DT binding to a YAML schema for enabling DT validation. Additionally, add a new compatible string corresponding to the I2C controller found in the S500 variant of the Actions Semi Owl SoCs family. Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-12-09Merge tag 'omap-for-v5.11/genpd-rest-signed' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/omap-genpd Remaining genpd changes for omaps for v5.11 This series contains the remaining genpd changes for omap4/5, and dra7 to add the power domain and reset control data to omap-prm driver. We also update several devices to probe without platform data to get us closer to booting omap4/5, and dra7 without platform data. There is also a build fix for the earlier am437x series that I should have applied into a separate branch on top of the am437x breaking commit. It ended here as I was originally planning to send out a single pull request for all the genpd changes, but then decided to break it down to smaller chunks. It's all really a larger single git branch though, so this should be OK and I really did not want to start reorganizing the branch after testing it and having it sit in Linux next. The changes done here are: - Clock driver needs idlest check dropped for IVA for omap4 and dra7 - Add remaining power domain and reset control data to omap-prm driver for omap4/5 and dra7 - Add device tree data for remaining power domains and reset control for omap4/5 and dra7 dts files - Update dss, dsp, iva and gpmc dts files to use genpd and to drop the remaining platform data - Update dss for omap5 to use genpd - Update dra7 iva to to use genpd and to drop the remaining platform data * tag 'omap-for-v5.11/genpd-rest-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2+: Fix am4 only build after genpd changes ARM: dts: Configure power domain for omap5 dss ARM: dts: omap5: add remaining PRM instances soc: ti: omap-prm: omap5: add genpd support for remaining PRM instances ARM: OMAP2+: Drop legacy platform data for dra7 gpmc ARM: dts: Configure interconnect target module for dra7 iva ARM: dts: dra7: add remaining PRM instances soc: ti: omap-prm: dra7: add genpd support for remaining PRM instances clk: ti: dra7: Drop idlest polling from IVA clkctrl clocks ARM: OMAP2+: Drop legacy platform data for omap4 gpmc ARM: OMAP2+: Drop legacy platform data for omap4 iva ARM: dts: Configure power domain for omap4 dsp ARM: dts: Configure power domain for omap4 dss ARM: dts: omap4: add remaining PRM instances soc: ti: omap-prm: omap4: add genpd support for remaining PRM instances clk: ti: omap4: Drop idlest polling from IVA clkctrl clocks Link: https://lore.kernel.org/r/pull-1606806458-694517@atomide.com-4 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-12-09Merge tag 'omap-for-v5.11/genpd-am437x-signed' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/omap-genpd Update am473x to boot without platform data Similar to am335x, we can now update am437x dts files to boot with genpd and simple-pm-bus, and drop the related platform data. To do that, we need to do the following changes for am437x: - Update the clock driver to keep the l3_main clock always on for suspend and resume to work - Add power domain and reset controller data to omap-prm driver - Configure interconnect clocks for system timers as those are now managed separately by the drivers/clocksource drivers - Update control module, wkup_m3, emif, ocmcram, mpuss and l3_noc for device tree data and drop the legacy platform data - Update the interconnect instances to boot with gendp and simple-pm-bus - Drop the remaining platform data for am437x * tag 'omap-for-v5.11/genpd-am437x-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2+: Drop legacy remaining legacy platform data for am4 ARM: dts: Use simple-pm-bus for genpd for am4 l3 ARM: dts: Move am4 l3 noc to a separate node ARM: dts: Use simple-pm-bus for genpd for am4 l4_per ARM: dts: Use simple-pm-bus for genpd for am4 l4_fast ARM: dts: Use simple-pm-bus for genpd for am4 l4_wkup ARM: OMAP2+: Drop legacy platform data for am4 mpuss ARM: OMAP2+: Drop legacy platform data for am4 ocmcram ARM: OMAP2+: Drop legacy platform data for am4 emif ARM: OMAP2+: Drop legacy platform data for am4 wkup_m3 ARM: dts: Configure interconnect target module for am4 wkup_m3 ARM: dts: Configure RTC powerdomain for am4 ARM: OMAP2+: Drop legacy platform data for am4 control module ARM: dts: Configure also interconnect clocks for am4 system timer ARM: dts: am43xx: add remaining PRM instances soc: ti: omap-prm: am4: add genpd support for remaining PRM instances clk: ti: am437x: Keep am4 l3 main clock always on for genpd Link: https://lore.kernel.org/r/pull-1606806458-694517@atomide.com-3 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-12-09Merge tag 'omap-for-v5.11/genpd-am335x-signed' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/omap-genpd Update am335x to boot without platform data With the driver updates done for genpd support, we can now update am335x dts files to boot with genpd and simple-pm-bus, and drop the related platform data. To do that, we need to do the following changes for am335x: - Add the remaining power domain and reset controller instances - Configure interconnect clocks for system timers as those are now managed separately by the drivers/clocksource drivers - Update control module, RTC, gpmc, debugss, emif, ocmcram, instr, and mpuss for device tree data and drop the legacy platform data - Update the interconnect instances to boot with gendp and simple-pm-bus - Drop the remaining platform data for am335x - Add kconfig option for OMAP_HWMOD to build it only for the SoCs that need it * tag 'omap-for-v5.11/genpd-am335x-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2+: Build hwmod related code as needed ARM: OMAP2+: Drop legacy remaining legacy platform data for am3 ARM: dts: Use simple-pm-bus for genpd for am3 l3 ARM: dts: Use simple-pm-bus for genpd for am3 l4_per ARM: dts: Use simple-pm-bus for genpd for am3 l4_fast ARM: dts: Use simple-pm-bus for genpd for am3 l4_wkup ARM: OMAP2+: Drop legacy platform data for am3 mpuss ARM: OMAP2+: Drop legacy platform data for am3 instr ARM: OMAP2+: Drop legacy platform data for am3 ocmcram ARM: OMAP2+: Drop legacy platform data for am3 emif ARM: OMAP2+: Drop legacy platform data for am3 debugss ARM: OMAP2+: Drop legacy platform data for am3 and am4 gpmc ARM: OMAP2+: Drop legacy platform data for am3 wkup_m3 ARM: dts: Configure interconnect target module for am3 wkup_m3 ARM: dts: Configure RTC powerdomain for am3 ARM: OMAP2+: Drop legacy platform data for am3 control module ARM: dts: Configure also interconnect clocks for am4 system timer ARM: dts: am33xx: add remaining PRM instances Link: https://lore.kernel.org/r/pull-1606806458-694517@atomide.com-2 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-12-09Merge tag 'omap-for-v5.11/genpd-drivers-signed' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/omap-genpd Driver changes for omaps for genpd for v5.11 merge window This series of changes allows booting am335x with genpd and device tree data without the legacy platform data. Also at least am437x can be booted with gendp with power domain and dts data. The SoC specific dts changes will be a separate pull request. We need the following driver changes merged before the dts changes can be done: - platform code needs a few improvments to probe l4_wkup first for clocks, and to bail out when there is no platform data - ti-sysc driver needs a non-urgent fix for asserting rstctrl reset only after disabling the clocks, to probe modules with no known control registers, and added quirk handling for gpmc devices - omap-prm driver needs a non-urgent fix for reset status bit, support added for pm_clk, and then we add the rest of am335x power domain data - clock driver for am335x needs to keep l3_main clock enabled with genpd for suspend and resume to work - wkup_m3 remoteproc driver needs support added for reset control if available instead of the legacy pdata callbacks - pm33xx driver needs PM runtime support added for genpd The am335x specific driver changes for the clock, wkup_m3, pm33xx and remoteproc drivers are quite trivial and have not caused merge conflicts in Linux next. I did not get acks for these changes except from Santosh but had already pushed out the branch already at that point. So I've added the related driver maintainers to Cc. * tag 'omap-for-v5.11/genpd-drivers-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: remoteproc/wkup_m3: Use reset control driver if available soc: ti: pm33xx: Enable basic PM runtime support for genpd soc: ti: omap-prm: am3: add genpd support for remaining PRM instances soc: ti: omap-prm: Add pm_clk for genpd clk: ti: am33xx: Keep am3 l3 main clock always on for genpd bus: ti-sysc: Implement GPMC debug quirk to drop platform data bus: ti-sysc: Support modules without control registers ARM: OMAP2+: Probe PRCM first to probe l4_wkup with simple-pm-bus ARM: OMAP2+: Check for inited flag bus: ti-sysc: Assert reset only after disabling clocks soc: ti: omap-prm: Do not check rstst bit on deassert if already deasserted bus: ti-sysc: Fix bogus resetdone warning on enable for cpsw bus: ti-sysc: Fix reset status check for modules with quirks ARM: OMAP2+: Fix missing select PM_GENERIC_DOMAINS_OF ARM: OMAP2+: Fix location for select PM_GENERIC_DOMAINS Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-12-09s390/smp: perform initial CPU reset also for SMT siblingsSven Schnelle
Not resetting the SMT siblings might leave them in unpredictable state. One of the observed problems was that the CPU timer wasn't reset and therefore large system time values where accounted during CPU bringup. Cc: <stable@kernel.org> # 4.0 Fixes: 10ad34bc76dfb ("s390: add SMT support") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-09s390/mm: use invalid asce for user space when switching to init_mmHeiko Carstens
Currently only idle_task_exit() explicitly switches (switch_mm) to init_mm. This causes the kernel asce to be loaded into cr7 and therefore it would be used for potential user space accesses. This is currently no problem since idle_task_exit() is nearly the last thing a CPU executes before it is taken down. However things might change - and therefore make sure that always the invalid asce is used for cr7 when active_mm is init_mm. This makes sure that all potential user space accesses will fail, instead of accessing kernel address space. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-09s390/idle: fix accounting with machine checksSven Schnelle
When a machine check interrupt is triggered during idle, the code is using the async timer/clock for idle time calculation. It should use the machine check enter timer/clock which is passed to the macro. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Cc: <stable@vger.kernel.org> # 5.8 Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-09s390/idle: add missing mt_cycles calculationSven Schnelle
During removal of the critical section cleanup the calculation of mt_cycles during idle was removed. This causes invalid accounting on systems with SMT enabled. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Cc: <stable@vger.kernel.org> # 5.8 Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-09s390/boot: add build-id to decompressorPhilipp Rudo
More and more functionality from the early boot phase gets carried over to the decompressor. With this the complexity of the code and thus the chance to introduce bugs increases. In order to be able to debug these early boot bugs the distributions have to package the decompressors vmlinux together with the other debuginfos. However for that the distributions require the vmlinux to contain a build-id. Per default the section containing the build-id is placed first in the section table. So make sure to move it behind the .text section otherwise the image would be unbootable. Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-09s390/kexec_file: fix diag308 subcode when loading crash kernelPhilipp Rudo
diag308 subcode 0 performes a clear reset which inlcudes the reset of all registers in the system. While this is the preferred behavior when loading a normal kernel via kexec it prevents the crash kernel to store the register values in the dump. To prevent this use subcode 1 when loading a crash kernel instead. Fixes: ee337f5469fd ("s390/kexec_file: Add crash support to image loader") Cc: <stable@vger.kernel.org> # 4.17 Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Reported-by: Xiaoying Yan <yiyan@redhat.com> Tested-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-09s390/cio: fix use-after-free in ccw_device_destroy_consoleQinglang Miao
Use of sch->dev reference after the put_device() call could trigger the use-after-free bugs. Fix this by simply adjusting the position of put_device. Fixes: 37db8985b211 ("s390/cio: add basic protected virtualization support") Reported-by: Hulk Robot <hulkci@huawei.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com> [vneethv@linux.ibm.com: Slight modification in the commit-message] Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-09dt-bindings: mfd: fix stm32 timers exampleFabrice Gasnier
The stm32 timers example name should match the pattern timer@. Also, the example is based on stm32mp1 timer 2, so the identifier should be '1' instead of '0' (e.g. timer 1). Fixes: bfbcbf88f9db ("dt-bindings: timer: Convert stm32 timer bindings to json-schema") Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com> Link: https://lore.kernel.org/r/1606913114-25693-1-git-send-email-fabrice.gasnier@foss.st.com Signed-off-by: Rob Herring <robh@kernel.org>
2020-12-09vrf: handle CONFIG_IPV6 not set for vrf_add_mac_header_if_unset()Andrea Mayer
The vrf_add_mac_header_if_unset() is defined within a conditional compilation block which depends on the CONFIG_IPV6 macro. However, the vrf_add_mac_header_if_unset() needs to be called also by IPv4 related code and when the CONFIG_IPV6 is not set, this function is missing. As a consequence, the build process stops reporting the error: ERROR: implicit declaration of function 'vrf_add_mac_header_if_unset' The problem is solved by *only* moving functions vrf_add_mac_header_if_unset() and vrf_prepare_mac_header() out of the conditional block. Reported-by: kernel test robot <lkp@intel.com> Fixes: 0489390882202 ("vrf: add mac header for tunneled packets when sniffer is attached") Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20201208175210.8906-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09RDMA/cm: Fix an attempt to use non-valid pointer when cleaning timewaitLeon Romanovsky
If cm_create_timewait_info() fails, the timewait_info pointer will contain an error value and will be used in cm_remove_remote() later. general protection fault, probably for non-canonical address 0xdffffc0000000024: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0×0000000000000120-0×0000000000000127] CPU: 2 PID: 12446 Comm: syz-executor.3 Not tainted 5.10.0-rc5-5d4c0742a60e #27 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:cm_remove_remote.isra.0+0x24/0×170 drivers/infiniband/core/cm.c:978 Code: 84 00 00 00 00 00 41 54 55 53 48 89 fb 48 8d ab 2d 01 00 00 e8 7d bf 4b fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 04 02 48 89 ea 83 e2 07 38 d0 7f 08 84 c0 0f 85 fc 00 00 00 RSP: 0018:ffff888013127918 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: ffffc9000a18b000 RDX: 0000000000000024 RSI: ffffffff82edc573 RDI: fffffffffffffff4 RBP: 0000000000000121 R08: 0000000000000001 R09: ffffed1002624f1d R10: 0000000000000003 R11: ffffed1002624f1c R12: ffff888107760c70 R13: ffff888107760c40 R14: fffffffffffffff4 R15: ffff888107760c9c FS: 00007fe1ffcc1700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2ff21000 CR3: 000000010f504001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: cm_destroy_id+0x189/0×15b0 drivers/infiniband/core/cm.c:1155 cma_connect_ib drivers/infiniband/core/cma.c:4029 [inline] rdma_connect_locked+0x1100/0×17c0 drivers/infiniband/core/cma.c:4107 rdma_connect+0x2a/0×40 drivers/infiniband/core/cma.c:4140 ucma_connect+0x277/0×340 drivers/infiniband/core/ucma.c:1069 ucma_write+0x236/0×2f0 drivers/infiniband/core/ucma.c:1724 vfs_write+0x220/0×830 fs/read_write.c:603 ksys_write+0x1df/0×240 fs/read_write.c:658 do_syscall_64+0x33/0×40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Link: https://lore.kernel.org/r/20201204064205.145795-1-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Reported-by: Amit Matityahu <mitm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-09Merge tag 'samsung-drivers-5.11-2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/drivers Samsung SoC drivers changes for v5.11, part two 1. Mark PM functions of newly added clkout module as unused to silence !CONFIG_PM warnings. 2. Initialize ChipID driver later - in arch initcall. * tag 'samsung-drivers-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: clk: samsung: mark PM functions as __maybe_unused soc: samsung: exynos-chipid: initialize later - with arch_initcall soc: samsung: exynos-chipid: order list of SoCs by name Link: https://lore.kernel.org/r/20201207074528.4475-1-krzk@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-12-09Merge tag 'zynqmp-soc-for-v5.11-v2' of https://github.com/Xilinx/linux-xlnx ↵Arnd Bergmann
into arm/drivers arm64: soc: ZynqMP SoC changes for v5.11 v2 - Small alignments in Xilinx Firmware driver - Exposing syscon interface for VCU driver * tag 'zynqmp-soc-for-v5.11-v2' of https://github.com/Xilinx/linux-xlnx: firmware: xilinx: Properly align function parameter firmware: xilinx: Add a blank line after function declaration firmware: xilinx: Remove additional newline firmware: xilinx: Fix kernel-doc warnings firmware: xlnx-zynqmp: fix compilation warning soc: xilinx: vcu: add missing register NUM_CORE soc: xilinx: vcu: use vcu-settings syscon registers dt-bindings: soc: xlnx: extract xlnx, vcu-settings to separate binding soc: xilinx: vcu: drop useless success message Link: https://lore.kernel.org/r/71d38756-4456-29fc-26a3-341e1d09aafe@monstr.eu Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-12-09dyndbg: fix use before null checkJim Cromie
In commit a2d375eda771 ("dyndbg: refine export, rename to dynamic_debug_exec_queries()"), a string is copied before checking it isn't NULL. Fix this, report a usage/interface error, and return the proper error code. Fixes: a2d375eda771 ("dyndbg: refine export, rename to dynamic_debug_exec_queries()") Cc: stable@vger.kernel.org -- -v2 drop comment tweak, improve commit message Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20201209183625.2432329-1-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-09io_uring: fix io_cqring_events()'s noflushPavel Begunkov
Checking !list_empty(&ctx->cq_overflow_list) around noflush in io_cqring_events() is racy, because if it fails but a request overflowed just after that, io_cqring_overflow_flush() still will be called. Remove the second check, it shouldn't be a problem for performance, because there is cq_check_overflow bit check just above. Cc: <stable@vger.kernel.org> # 5.5+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: fix racy IOPOLL flush overflowPavel Begunkov
It's not safe to call io_cqring_overflow_flush() for IOPOLL mode without hodling uring_lock, because it does synchronisation differently. Make sure we have it. As for io_ring_exit_work(), we don't even need it there because io_ring_ctx_wait_and_kill() already set force flag making all overflowed requests to be dropped. Cc: <stable@vger.kernel.org> # 5.5+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: fix racy IOPOLL completionsPavel Begunkov
IOPOLL allows buffer remove/provide requests, but they doesn't synchronise by rules of IOPOLL, namely it have to hold uring_lock. Cc: <stable@vger.kernel.org> # 5.7+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: always let io_iopoll_complete() complete polled ioXiaoguang Wang
Abaci Fuzz reported a double-free or invalid-free BUG in io_commit_cqring(): [ 95.504842] BUG: KASAN: double-free or invalid-free in io_commit_cqring+0x3ec/0x8e0 [ 95.505921] [ 95.506225] CPU: 0 PID: 4037 Comm: io_wqe_worker-0 Tainted: G B W 5.10.0-rc5+ #1 [ 95.507434] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 95.508248] Call Trace: [ 95.508683] dump_stack+0x107/0x163 [ 95.509323] ? io_commit_cqring+0x3ec/0x8e0 [ 95.509982] print_address_description.constprop.0+0x3e/0x60 [ 95.510814] ? vprintk_func+0x98/0x140 [ 95.511399] ? io_commit_cqring+0x3ec/0x8e0 [ 95.512036] ? io_commit_cqring+0x3ec/0x8e0 [ 95.512733] kasan_report_invalid_free+0x51/0x80 [ 95.513431] ? io_commit_cqring+0x3ec/0x8e0 [ 95.514047] __kasan_slab_free+0x141/0x160 [ 95.514699] kfree+0xd1/0x390 [ 95.515182] io_commit_cqring+0x3ec/0x8e0 [ 95.515799] __io_req_complete.part.0+0x64/0x90 [ 95.516483] io_wq_submit_work+0x1fa/0x260 [ 95.517117] io_worker_handle_work+0xeac/0x1c00 [ 95.517828] io_wqe_worker+0xc94/0x11a0 [ 95.518438] ? io_worker_handle_work+0x1c00/0x1c00 [ 95.519151] ? __kthread_parkme+0x11d/0x1d0 [ 95.519806] ? io_worker_handle_work+0x1c00/0x1c00 [ 95.520512] ? io_worker_handle_work+0x1c00/0x1c00 [ 95.521211] kthread+0x396/0x470 [ 95.521727] ? _raw_spin_unlock_irq+0x24/0x30 [ 95.522380] ? kthread_mod_delayed_work+0x180/0x180 [ 95.523108] ret_from_fork+0x22/0x30 [ 95.523684] [ 95.523985] Allocated by task 4035: [ 95.524543] kasan_save_stack+0x1b/0x40 [ 95.525136] __kasan_kmalloc.constprop.0+0xc2/0xd0 [ 95.525882] kmem_cache_alloc_trace+0x17b/0x310 [ 95.533930] io_queue_sqe+0x225/0xcb0 [ 95.534505] io_submit_sqes+0x1768/0x25f0 [ 95.535164] __x64_sys_io_uring_enter+0x89e/0xd10 [ 95.535900] do_syscall_64+0x33/0x40 [ 95.536465] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 95.537199] [ 95.537505] Freed by task 4035: [ 95.538003] kasan_save_stack+0x1b/0x40 [ 95.538599] kasan_set_track+0x1c/0x30 [ 95.539177] kasan_set_free_info+0x1b/0x30 [ 95.539798] __kasan_slab_free+0x112/0x160 [ 95.540427] kfree+0xd1/0x390 [ 95.540910] io_commit_cqring+0x3ec/0x8e0 [ 95.541516] io_iopoll_complete+0x914/0x1390 [ 95.542150] io_do_iopoll+0x580/0x700 [ 95.542724] io_iopoll_try_reap_events.part.0+0x108/0x200 [ 95.543512] io_ring_ctx_wait_and_kill+0x118/0x340 [ 95.544206] io_uring_release+0x43/0x50 [ 95.544791] __fput+0x28d/0x940 [ 95.545291] task_work_run+0xea/0x1b0 [ 95.545873] do_exit+0xb6a/0x2c60 [ 95.546400] do_group_exit+0x12a/0x320 [ 95.546967] __x64_sys_exit_group+0x3f/0x50 [ 95.547605] do_syscall_64+0x33/0x40 [ 95.548155] entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason is that once we got a non EAGAIN error in io_wq_submit_work(), we'll complete req by calling io_req_complete(), which will hold completion_lock to call io_commit_cqring(), but for polled io, io_iopoll_complete() won't hold completion_lock to call io_commit_cqring(), then there maybe concurrent access to ctx->defer_list, double free may happen. To fix this bug, we always let io_iopoll_complete() complete polled io. Cc: <stable@vger.kernel.org> # 5.5+ Reported-by: Abaci Fuzz <abaci@linux.alibaba.com> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: add timeout updatePavel Begunkov
Support timeout updates through IORING_OP_TIMEOUT_REMOVE with passed in IORING_TIMEOUT_UPDATE. Updates doesn't support offset timeout mode. Oirignal timeout.off will be ignored as well. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [axboe: remove now unused 'ret' variable] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: restructure io_timeout_cancel()Pavel Begunkov
Add io_timeout_extract() helper, which searches and disarms timeouts, but doesn't complete them. No functional changes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: fix files cancellationPavel Begunkov
io_uring_cancel_files()'s task check condition mistakenly got flipped. 1. There can't be a request in the inflight list without IO_WQ_WORK_FILES, kill this check to keep the whole condition simpler. 2. Also, don't call the function for files==NULL to not do such a check, all that staff is already handled well by its counter part, __io_uring_cancel_task_requests(). With that just flip the task check. Also, it iowq-cancels all request of current task there, don't forget to set right ->files into struct io_task_cancel. Fixes: c1973b38bf639 ("io_uring: cancel only requests of current task") Reported-by: syzbot+c0d52d0b3c0c3ffb9525@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: use bottom half safe lock for fixed file dataJens Axboe
io_file_data_ref_zero() can be invoked from soft-irq from the RCU core, hence we need to ensure that the file_data lock is bottom half safe. Use the _bh() variants when grabbing this lock. Reported-by: syzbot+1f4ba1e5520762c523c6@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: fix miscounting ios_leftPavel Begunkov
io_req_init() doesn't decrement state->ios_left if a request doesn't need ->file, it just returns before that on if(!needs_file). That's not really a problem but may cause overhead for an additional fput(). Also inline and kill io_req_set_file() as it's of no use anymore. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: change submit file state invariantPavel Begunkov
Keep submit state invariant of whether there are file refs left based on state->nr_refs instead of (state->file==NULL), and always check against the first one. It's easier to track and allows to remove 1 if. It also automatically leaves struct submit_state in a consistent state after io_submit_state_end(), that's not used yet but nice. btw rename has_refs to file_refs for more clarity. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: check kthread stopped flag when sq thread is unparkedXiaoguang Wang
syzbot reports following issue: INFO: task syz-executor.2:12399 can't die for more than 143 seconds. task:syz-executor.2 state:D stack:28744 pid:12399 ppid: 8504 flags:0x00004004 Call Trace: context_switch kernel/sched/core.c:3773 [inline] __schedule+0x893/0x2170 kernel/sched/core.c:4522 schedule+0xcf/0x270 kernel/sched/core.c:4600 schedule_timeout+0x1d8/0x250 kernel/time/timer.c:1847 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x163/0x260 kernel/sched/completion.c:138 kthread_stop+0x17a/0x720 kernel/kthread.c:596 io_put_sq_data fs/io_uring.c:7193 [inline] io_sq_thread_stop+0x452/0x570 fs/io_uring.c:7290 io_finish_async fs/io_uring.c:7297 [inline] io_sq_offload_create fs/io_uring.c:8015 [inline] io_uring_create fs/io_uring.c:9433 [inline] io_uring_setup+0x19b7/0x3730 fs/io_uring.c:9507 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45deb9 Code: Unable to access opcode bytes at RIP 0x45de8f. RSP: 002b:00007f174e51ac78 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9 RAX: ffffffffffffffda RBX: 0000000000008640 RCX: 000000000045deb9 RDX: 0000000000000000 RSI: 0000000020000140 RDI: 00000000000050e5 RBP: 000000000118bf58 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c R13: 00007ffed9ca723f R14: 00007f174e51b9c0 R15: 000000000118bf2c INFO: task syz-executor.2:12399 blocked for more than 143 seconds. Not tainted 5.10.0-rc3-next-20201110-syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Currently we don't have a reproducer yet, but seems that there is a race in current codes: => io_put_sq_data ctx_list is empty now. | ==> kthread_park(sqd->thread); | | T1: sq thread is parked now. ==> kthread_stop(sqd->thread); | KTHREAD_SHOULD_STOP is set now.| ===> kthread_unpark(k); | | T2: sq thread is now unparkd, run again. | | T3: sq thread is now preempted out. | ===> wake_up_process(k); | | | T4: Since sqd ctx_list is empty, needs_sched will be true, | then sq thread sets task state to TASK_INTERRUPTIBLE, | and schedule, now sq thread will never be waken up. ===> wait_for_completion | I have artificially used mdelay() to simulate above race, will get same stack like this syzbot report, but to be honest, I'm not sure this code race triggers syzbot report. To fix this possible code race, when sq thread is unparked, need to check whether sq thread has been stopped. Reported-by: syzbot+03beeb595f074db9cfd1@syzkaller.appspotmail.com Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: share fixed_file_refs b/w multiple rsrcsPavel Begunkov
Double fixed files for splice/tee are done in a nasty way, it takes 2 ref_node refs, and during the second time it blindly overrides req->fixed_file_refs hoping that it haven't changed. That works because all that is done under iouring_lock in a single go but is error-prone. Bind everything explicitly to a single ref_node and take only one ref, with current ref_node ordering it's guaranteed to keep all files valid awhile the request is inflight. That's mainly a cleanup + preparation for generic resource handling, but also saves pcpu_ref get/put for splice/tee with 2 fixed files. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: replace inflight_wait with tctx->waitPavel Begunkov
As tasks now cancel only theirs requests, and inflight_wait is awaited only in io_uring_cancel_files(), which should be called with ->in_idle set, instead of keeping a separate inflight_wait use tctx->wait. That will add some spurious wakeups but actually is safer from point of not hanging the task. e.g. task1 | IRQ | *start* io_complete_rw_common(link) | link: req1 -> req2 -> req3(with files) *cancel_files() | io_wq_cancel(), etc. | | put_req(link), adds to io-wq req2 schedule() | So, task1 will never try to cancel req2 or req3. If req2 is long-standing (e.g. read(empty_pipe)), this may hang. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: don't take fs for recvmsg/sendmsgPavel Begunkov
We don't even allow not plain data msg_control, which is disallowed in __sys_{send,revb}msg_sock(). So no need in fs for IORING_OP_SENDMSG and IORING_OP_RECVMSG. fs->lock is less contanged not as much as before, but there are cases that can be, e.g. IOSQE_ASYNC. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: only wake up sq thread while current task is in io worker contextXiaoguang Wang
If IORING_SETUP_SQPOLL is enabled, sqes are either handled in sq thread task context or in io worker task context. If current task context is sq thread, we don't need to check whether should wake up sq thread. io_iopoll_req_issued() calls wq_has_sleeper(), which has smp_mb() memory barrier, before this patch, perf shows obvious overhead: Samples: 481K of event 'cycles', Event count (approx.): 299807382878 Overhead Comma Shared Object Symbol 3.69% :9630 [kernel.vmlinux] [k] io_issue_sqe With this patch, perf shows: Samples: 482K of event 'cycles', Event count (approx.): 299929547283 Overhead Comma Shared Object Symbol 0.70% :4015 [kernel.vmlinux] [k] io_issue_sqe It shows some obvious improvements. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: don't acquire uring_lock twiceXiaoguang Wang
Both IOPOLL and sqes handling need to acquire uring_lock, combine them together, then we just need to acquire uring_lock once. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: initialize 'timeout' properly in io_sq_thread()Xiaoguang Wang
Some static checker reports below warning: fs/io_uring.c:6939 io_sq_thread() error: uninitialized symbol 'timeout'. This is a false positive, but let's just initialize 'timeout' to make sure we don't trip over this. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: refactor io_sq_thread() handlingXiaoguang Wang
There are some issues about current io_sq_thread() implementation: 1. The prepare_to_wait() usage in __io_sq_thread() is weird. If multiple ctxs share one same poll thread, one ctx will put poll thread in TASK_INTERRUPTIBLE, but if other ctxs have work to do, we don't need to change task's stat at all. I think only if all ctxs don't have work to do, we can do it. 2. We use round-robin strategy to make multiple ctxs share one same poll thread, but there are various condition in __io_sq_thread(), which seems complicated and may affect round-robin strategy. To improve above issues, I take below actions: 1. If multiple ctxs share one same poll thread, only if all all ctxs don't have work to do, we can call prepare_to_wait() and schedule() to make poll thread enter sleep state. 2. To make round-robin strategy more straight, I simplify __io_sq_thread() a bit, it just does io poll and sqes submit work once, does not check various condition. 3. For multiple ctxs share one same poll thread, we choose the biggest sq_thread_idle among these ctxs as timeout condition, and will update it when ctx is in or out. 4. Not need to check EBUSY especially, if io_submit_sqes() returns EBUSY, IORING_SQ_CQ_OVERFLOW should be set, helper in liburing should be aware of cq overflow and enters kernel to flush work. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: always batch cancel in *cancel_files()Pavel Begunkov
Instead of iterating over each request and cancelling it individually in io_uring_cancel_files(), try to cancel all matching requests and use ->inflight_list only to check if there anything left. In many cases it should be faster, and we can reuse a lot of code from task cancellation. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: pass files into kill timeouts/pollPavel Begunkov
Make io_poll_remove_all() and io_kill_timeouts() to match against files as well. A preparation patch, effectively not used by now. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: don't iterate io_uring_cancel_files()Pavel Begunkov
io_uring_cancel_files() guarantees to cancel all matching requests, that's not necessary to do that in a loop. Move it up in the callchain into io_uring_cancel_task_requests(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: cancel only requests of current taskPavel Begunkov
io_uring_cancel_files() cancels all request that match files regardless of task. There is no real need in that, cancel only requests of the specified task. That also handles SQPOLL case as it already changes task to it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: add a {task,files} pair matching helperPavel Begunkov
Add io_match_task() that matches both task and files. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: simplify io_task_match()Pavel Begunkov
If IORING_SETUP_SQPOLL is set all requests belong to the corresponding SQPOLL task, so skip task checking in that case and always match. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: inline io_import_iovec()Pavel Begunkov
Inline io_import_iovec() and leave only its former __io_import_iovec() renamed to the original name. That makes it more obious what is reused in io_read/write(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: remove duplicated io_size from rwPavel Begunkov
io_size and iov_count in io_read() and io_write() hold the same value, kill the last one. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09fs/io_uring Don't use the return value from import_iovec().David Laight
This is the only code that relies on import_iovec() returning iter.count on success. This allows a better interface to import_iovec(). Signed-off-by: David Laight <david.laight@aculab.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: NULL files dereference by SQPOLLPavel Begunkov
SQPOLL task may find sqo_task->files == NULL and __io_sq_thread_acquire_files() would leave it unset, so following fget_many() and others try to dereference NULL and fault. Propagate an error files are missing. [ 118.962785] BUG: kernel NULL pointer dereference, address: 0000000000000020 [ 118.963812] #PF: supervisor read access in kernel mode [ 118.964534] #PF: error_code(0x0000) - not-present page [ 118.969029] RIP: 0010:__fget_files+0xb/0x80 [ 119.005409] Call Trace: [ 119.005651] fget_many+0x2b/0x30 [ 119.005964] io_file_get+0xcf/0x180 [ 119.006315] io_submit_sqes+0x3a4/0x950 [ 119.007481] io_sq_thread+0x1de/0x6a0 [ 119.007828] kthread+0x114/0x150 [ 119.008963] ret_from_fork+0x22/0x30 Reported-by: Josef Grieb <josef.grieb@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: add timeout support for io_uring_enter()Hao Xu
Now users who want to get woken when waiting for events should submit a timeout command first. It is not safe for applications that split SQ and CQ handling between two threads, such as mysql. Users should synchronize the two threads explicitly to protect SQ and that will impact the performance. This patch adds support for timeout to existing io_uring_enter(). To avoid overloading arguments, it introduces a new parameter structure which contains sigmask and timeout. I have tested the workloads with one thread submiting nop requests while the other reaping the cqe with timeout. It shows 1.8~2x faster when the iodepth is 16. Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> [axboe: various cleanups/fixes, and name change to SIG_IS_DATA] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-09io_uring: only plug when appropriateJens Axboe
We unconditionally call blk_start_plug() when starting the IO submission, but we only really should do that if we have more than 1 request to submit AND we're potentially dealing with block based storage underneath. For any other type of request, it's just a waste of time to do so. Add a ->plug bit to io_op_def and set it for read/write requests. We could make this more precise and check the file itself as well, but it doesn't matter that much and would quickly become more expensive. Signed-off-by: Jens Axboe <axboe@kernel.dk>