summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-04-01io_uring: fix EIOCBQUEUED iter revertPavel Begunkov
iov_iter_revert() is done in completion handlers that happensf before read/write returns -EIOCBQUEUED, no need to repeat reverting afterwards. Moreover, even though it may appear being just a no-op, it's actually races with 1) user forging a new iovec of a different size 2) reissue, that is done via io-wq continues completely asynchronously. Fixes: 3e6a0d3c7571c ("io_uring: fix -EAGAIN retry with IOPOLL") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01io_uring/io-wq: protect against sprintf overflowPavel Begunkov
task_pid may be large enough to not fit into the left space of TASK_COMM_LEN-sized buffers and overflow in sprintf. We not so care about uniqueness, so replace it with safer snprintf(). Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1702c6145d7e1c46fbc382f28334c02e1a3d3994.1617267273.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01io_uring: don't mark S_ISBLK async work as unboundedJens Axboe
S_ISBLK is marked as unbounded work for async preparation, because it doesn't match S_ISREG. That is incorrect, as any read/write to a block device is also a bounded operation. Fix it up and ensure that S_ISBLK isn't marked unbounded. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01ARM: mvebu: avoid clang -Wtautological-constant warningArnd Bergmann
Clang warns about the comparison when using a 32-bit phys_addr_t: drivers/bus/mvebu-mbus.c:621:17: error: result of comparison of constant 4294967296 with expression of type 'phys_addr_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] if (reg_start >= 0x100000000ULL) Add a cast to shut up the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20210323131952.2835509-1-arnd@kernel.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-01ARM: pxa: mainstone: avoid -Woverride-init warningArnd Bergmann
The default initializer at the start of the array causes a warning when building with W=1: In file included from arch/arm/mach-pxa/mainstone.c:47: arch/arm/mach-pxa/mainstone.h:124:33: error: initialized field overwritten [-Werror=override-init] 124 | #define MAINSTONE_IRQ(x) (MAINSTONE_NR_IRQS + (x)) | ^ arch/arm/mach-pxa/mainstone.h:133:33: note: in expansion of macro 'MAINSTONE_IRQ' 133 | #define MAINSTONE_S0_CD_IRQ MAINSTONE_IRQ(9) | ^~~~~~~~~~~~~ arch/arm/mach-pxa/mainstone.c:506:15: note: in expansion of macro 'MAINSTONE_S0_CD_IRQ' 506 | [5] = MAINSTONE_S0_CD_IRQ, | ^~~~~~~~~~~~~~~~~~~ Rework the initializer to list each element explicitly and only once. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210323130849.2362001-1-arnd@kernel.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-01ARM: omap1: fix building with clang IASArnd Bergmann
The clang integrated assembler fails to build one file with a complex asm instruction: arch/arm/mach-omap1/ams-delta-fiq-handler.S:249:2: error: invalid instruction, any one of the following would fix this: mov r10, #(1 << (((NR_IRQS_LEGACY + 12) - NR_IRQS_LEGACY) % 32)) @ set deferred_fiq bit ^ arch/arm/mach-omap1/ams-delta-fiq-handler.S:249:2: note: instruction requires: armv6t2 mov r10, #(1 << (((NR_IRQS_LEGACY + 12) - NR_IRQS_LEGACY) % 32)) @ set deferred_fiq bit ^ arch/arm/mach-omap1/ams-delta-fiq-handler.S:249:2: note: instruction requires: thumb2 mov r10, #(1 << (((NR_IRQS_LEGACY + 12) - NR_IRQS_LEGACY) % 32)) @ set deferred_fiq bit ^ The problem is that 'NR_IRQS_LEGACY' is not defined here. Apparently gas does not care because we first add and then subtract this number, leading to the immediate value to be the same regardless of the specific definition of NR_IRQS_LEGACY. Neither the way that 'gas' just silently builds this file, nor the way that clang IAS makes nonsensical suggestions for how to fix it is great. Fortunately there is an easy fix, which is to #include the header that contains the definition. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20210308153430.2530616-1-arnd@kernel.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-01soc/fsl: qbman: fix conflicting alignment attributesArnd Bergmann
When building with W=1, gcc points out that the __packed attribute on struct qm_eqcr_entry conflicts with the 8-byte alignment attribute on struct qm_fd inside it: drivers/soc/fsl/qbman/qman.c:189:1: error: alignment 1 of 'struct qm_eqcr_entry' is less than 8 [-Werror=packed-not-aligned] I assume that the alignment attribute is the correct one, and that qm_eqcr_entry cannot actually be unaligned in memory, so add the same alignment on the outer struct. Fixes: c535e923bb97 ("soc/fsl: Introduce DPAA 1.x QMan device driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210323131530.2619900-1-arnd@kernel.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-01ARM: keystone: fix integer overflow warningArnd Bergmann
clang warns about an impossible condition when building with 32-bit phys_addr_t: arch/arm/mach-keystone/keystone.c:79:16: error: result of comparison of constant 51539607551 with expression of type 'phys_addr_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] mem_end > KEYSTONE_HIGH_PHYS_END) { ~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~ arch/arm/mach-keystone/keystone.c:78:16: error: result of comparison of constant 34359738368 with expression of type 'phys_addr_t' (aka 'unsigned int') is always true [-Werror,-Wtautological-constant-out-of-range-compare] if (mem_start < KEYSTONE_HIGH_PHYS_START || ~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~ Change the temporary variable to a fixed-size u64 to avoid the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Santosh Shilimkar <ssantosh@kernel.org> Link: https://lore.kernel.org/r/20210323131814.2751750-1-arnd@kernel.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-02powerpc/vdso: Make sure vdso_wrapper.o is rebuilt everytime vdso.so is rebuiltChristophe Leroy
Commit bce74491c300 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o") moved vdso32_wrapper.o and vdso64_wrapper.o out of arch/powerpc/kernel/vdso[32/64]/ and removed the dependencies in the Makefile. This leads to the wrappers not being re-build hence the kernel embedding the old vdso library. Add back missing dependencies to ensure vdso32_wrapper.o and vdso64_wrapper.o are rebuilt when vdso32.so.dbg and vdso64.so.dbg are changed. Fixes: bce74491c300 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8bb015bc98c51d8ced581415b7e3d157e18da7c9.1617181918.git.christophe.leroy@csgroup.eu
2021-04-02powerpc/signal32: Fix Oops on sigreturn with unmapped VDSOChristophe Leroy
PPC32 encounters a KUAP fault when trying to handle a signal with VDSO unmapped. Kernel attempted to read user page (7fc07ec0) - exploit attempt? (uid: 0) BUG: Unable to handle kernel data access on read at 0x7fc07ec0 Faulting instruction address: 0xc00111d4 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=16K PREEMPT CMPC885 CPU: 0 PID: 353 Comm: sigreturn_vdso Not tainted 5.12.0-rc4-s3k-dev-01553-gb30c310ea220 #4814 NIP: c00111d4 LR: c0005a28 CTR: 00000000 REGS: cadb3dd0 TRAP: 0300 Not tainted (5.12.0-rc4-s3k-dev-01553-gb30c310ea220) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 48000884 XER: 20000000 DAR: 7fc07ec0 DSISR: 88000000 GPR00: c0007788 cadb3e90 c28d4a40 7fc07ec0 7fc07ed0 000004e0 7fc07ce0 00000000 GPR08: 00000001 00000001 7fc07ec0 00000000 28000282 1001b828 100a0920 00000000 GPR16: 100cac0c 100b0000 105c43a4 105c5685 100d0000 100d0000 100d0000 100b2e9e GPR24: ffffffff 105c43c8 00000000 7fc07ec8 cadb3f40 cadb3ec8 c28d4a40 00000000 NIP [c00111d4] flush_icache_range+0x90/0xb4 LR [c0005a28] handle_signal32+0x1bc/0x1c4 Call Trace: [cadb3e90] [100d0000] 0x100d0000 (unreliable) [cadb3ec0] [c0007788] do_notify_resume+0x260/0x314 [cadb3f20] [c000c764] syscall_exit_prepare+0x120/0x184 [cadb3f30] [c00100b4] ret_from_syscall+0xc/0x28 --- interrupt: c00 at 0xfe807f8 NIP: 0fe807f8 LR: 10001060 CTR: c0139378 REGS: cadb3f40 TRAP: 0c00 Not tainted (5.12.0-rc4-s3k-dev-01553-gb30c310ea220) MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 28000482 XER: 20000000 GPR00: 00000025 7fc081c0 77bb1690 00000000 0000000a 28000482 00000001 0ff03a38 GPR08: 0000d032 00006de5 c28d4a40 00000009 88000482 1001b828 100a0920 00000000 GPR16: 100cac0c 100b0000 105c43a4 105c5685 100d0000 100d0000 100d0000 100b2e9e GPR24: ffffffff 105c43c8 00000000 77ba7628 10002398 10010000 10002124 00024000 NIP [0fe807f8] 0xfe807f8 LR [10001060] 0x10001060 --- interrupt: c00 Instruction dump: 38630010 7c001fac 38630010 4200fff0 7c0004ac 4c00012c 4e800020 7c001fac 2c0a0000 38630010 4082ffcc 4bffffe4 <7c00186c> 2c070000 39430010 4082ff8c ---[ end trace 3973fb72b049cb06 ]--- This is because flush_icache_range() is called on user addresses. The same problem was detected some time ago on PPC64. It was fixed by enabling KUAP in commit 59bee45b9712 ("powerpc/mm: Fix missing KUAP disable in flush_coherent_icache()"). PPC32 doesn't use flush_coherent_icache() and fallbacks on clean_dcache_range() and invalidate_icache_range(). We could fix it similarly by enabling user access in those functions, but this is overkill for just flushing two instructions. The two instructions are 8 bytes aligned, so a single dcbst/icbi is enough to flush them. Do like __patch_instruction() and inline a dcbst followed by an icbi just after the write of the instructions, while user access is still allowed. The isync is not required because rfi will be used to return to user. icbi() is handled as a read so read-write user access is needed. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bde9154e5351a5ac7bca3d59cdb5a5e8edacbb79.1617199569.git.christophe.leroy@csgroup.eu
2021-04-02powerpc/ptrace: Don't return error when getting/setting FP regs without ↵Christophe Leroy
CONFIG_PPC_FPU_REGS An #ifdef CONFIG_PPC_FPU_REGS is missing in arch_ptrace() leading to the following Oops because [REGSET_FPR] entry is not initialised in native_regsets[]. [ 41.917608] BUG: Unable to handle kernel instruction fetch [ 41.922849] Faulting instruction address: 0xff8fd228 [ 41.927760] Oops: Kernel access of bad area, sig: 11 [#1] [ 41.933089] BE PAGE_SIZE=4K PREEMPT CMPC885 [ 41.940753] Modules linked in: [ 41.943768] CPU: 0 PID: 366 Comm: gdb Not tainted 5.12.0-rc5-s3k-dev-01666-g7aac86a0f057-dirty #4835 [ 41.952800] NIP: ff8fd228 LR: c004d9e0 CTR: ff8fd228 [ 41.957790] REGS: caae9df0 TRAP: 0400 Not tainted (5.12.0-rc5-s3k-dev-01666-g7aac86a0f057-dirty) [ 41.966741] MSR: 40009032 <EE,ME,IR,DR,RI> CR: 82004248 XER: 20000000 [ 41.973540] [ 41.973540] GPR00: c004d9b4 caae9eb0 c1b64f60 c1b64520 c0713cd4 caae9eb8 c1bacdfc 00000004 [ 41.973540] GPR08: 00000200 ff8fd228 c1bac700 00001032 28004242 1061aaf4 00000001 106d64a0 [ 41.973540] GPR16: 00000000 00000000 7fa0a774 10610000 7fa0aef9 00000000 10610000 7fa0a538 [ 41.973540] GPR24: 7fa0a580 7fa0a570 c1bacc00 c1b64520 c1bacc00 caae9ee8 00000108 c0713cd4 [ 42.009685] NIP [ff8fd228] 0xff8fd228 [ 42.013300] LR [c004d9e0] __regset_get+0x100/0x124 [ 42.018036] Call Trace: [ 42.020443] [caae9eb0] [c004d9b4] __regset_get+0xd4/0x124 (unreliable) [ 42.026899] [caae9ee0] [c004da94] copy_regset_to_user+0x5c/0xb0 [ 42.032751] [caae9f10] [c002f640] sys_ptrace+0xe4/0x588 [ 42.037915] [caae9f30] [c0011010] ret_from_syscall+0x0/0x28 [ 42.043422] --- interrupt: c00 at 0xfd1f8e4 [ 42.047553] NIP: 0fd1f8e4 LR: 1004a688 CTR: 00000000 [ 42.052544] REGS: caae9f40 TRAP: 0c00 Not tainted (5.12.0-rc5-s3k-dev-01666-g7aac86a0f057-dirty) [ 42.061494] MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 48004442 XER: 00000000 [ 42.068551] [ 42.068551] GPR00: 0000001a 7fa0a040 77dad7e0 0000000e 00000170 00000000 7fa0a078 00000004 [ 42.068551] GPR08: 00000000 108deb88 108dda40 106d6010 44004442 1061aaf4 00000001 106d64a0 [ 42.068551] GPR16: 00000000 00000000 7fa0a774 10610000 7fa0aef9 00000000 10610000 7fa0a538 [ 42.068551] GPR24: 7fa0a580 7fa0a570 1078fe00 1078fd70 1078fd70 00000170 0fdd3244 0000000d [ 42.104696] NIP [0fd1f8e4] 0xfd1f8e4 [ 42.108225] LR [1004a688] 0x1004a688 [ 42.111753] --- interrupt: c00 [ 42.114768] Instruction dump: [ 42.117698] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX [ 42.125443] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX [ 42.133195] ---[ end trace d35616f22ab2100c ]--- Adding the missing #ifdef is not good because gdb doesn't like getting an error when getting registers. Instead, make ptrace return 0s when CONFIG_PPC_FPU_REGS is not set. Fixes: b6254ced4da6 ("powerpc/signal: Don't manage floating point regs when no FPU") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9121a44a2d50ba1af18d8aa5ada06c9a3bea8afd.1617200085.git.christophe.leroy@csgroup.eu
2021-04-01null_blk: fix command timeout completion handlingDamien Le Moal
Memory backed or zoned null block devices may generate actual request timeout errors due to the submission path being blocked on memory allocation or zone locking. Unlike fake timeouts or injected timeouts, the request submission path will call blk_mq_complete_request() or blk_mq_end_request() for these real timeout errors, causing a double completion and use after free situation as the block layer timeout handler executes blk_mq_rq_timed_out() and __blk_mq_free_request() in blk_mq_check_expired(). This problem often triggers a NULL pointer dereference such as: BUG: kernel NULL pointer dereference, address: 0000000000000050 RIP: 0010:blk_mq_sched_mark_restart_hctx+0x5/0x20 ... Call Trace: dd_finish_request+0x56/0x80 blk_mq_free_request+0x37/0x130 null_handle_cmd+0xbf/0x250 [null_blk] ? null_queue_rq+0x67/0xd0 [null_blk] blk_mq_dispatch_rq_list+0x122/0x850 __blk_mq_do_dispatch_sched+0xbb/0x2c0 __blk_mq_sched_dispatch_requests+0x13d/0x190 blk_mq_sched_dispatch_requests+0x30/0x60 __blk_mq_run_hw_queue+0x49/0x90 process_one_work+0x26c/0x580 worker_thread+0x55/0x3c0 ? process_one_work+0x580/0x580 kthread+0x134/0x150 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x1f/0x30 This problem very often triggers when running the full btrfs xfstests on a memory-backed zoned null block device in a VM with limited amount of memory. Avoid this by executing blk_mq_complete_request() in null_timeout_rq() only for commands that are marked for a fake timeout completion using the fake_timeout boolean in struct null_cmd. For timeout errors injected through debugfs, the timeout handler will execute blk_mq_complete_request()i as before. This is safe as the submission path does not execute complete requests in this case. In null_timeout_rq(), also make sure to set the command error field to BLK_STS_TIMEOUT and to propagate this error through to the request completion. Reported-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Reviewed-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210331225244.126426-1-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01idr test suite: Improve reporting from idr_find_test_1Matthew Wilcox (Oracle)
Instead of just reporting an assertion failure, report enough information that we can start diagnosing exactly went wrong. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01idr test suite: Create anchor before launching throbberMatthew Wilcox (Oracle)
The throbber could race with creation of the anchor entry and cause the IDR to have zero entries in it, which would cause the test to fail. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01idr test suite: Take RCU read lock in idr_find_test_1Matthew Wilcox (Oracle)
When run on a single CPU, this test would frequently access already-freed memory. Due to timing, this bug never showed up on multi-CPU tests. Reported-by: Chris von Recklinghausen <crecklin@redhat.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01radix tree test suite: Register the main thread with the RCU libraryMatthew Wilcox (Oracle)
Several test runners register individual worker threads with the RCU library, but neglect to register the main thread, which can lead to objects being freed while the main thread is in what appears to be an RCU critical section. Reported-by: Chris von Recklinghausen <crecklin@redhat.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()Vitaly Kuznetsov
Commit 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") broke CPU0 hotplug on certain systems, e.g. I'm observing the following on AWS Nitro (e.g r5b.xlarge but other instance types are affected as well): # echo 0 > /sys/devices/system/cpu/cpu0/online # echo 1 > /sys/devices/system/cpu/cpu0/online <10 seconds delay> -bash: echo: write error: Input/output error In fact, the above mentioned commit only revealed the problem and did not introduce it. On x86, to wakeup CPU an NMI is being used and hlt_play_dead()/mwait_play_dead() loops are prepared to handle it: /* * If NMI wants to wake up CPU0, start CPU0. */ if (wakeup_cpu0()) start_cpu0(); cpuidle_play_dead() -> acpi_idle_play_dead() (which is now being called on systems where it wasn't called before the above mentioned commit) serves the same purpose but it doesn't have a path for CPU0. What happens now on wakeup is: - NMI is sent to CPU0 - wakeup_cpu0_nmi() works as expected - we get back to while (1) loop in acpi_idle_play_dead() - safe_halt() puts CPU0 to sleep again. The straightforward/minimal fix is add the special handling for CPU0 on x86 and that's what the patch is doing. Fixes: 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: 5.10+ <stable@vger.kernel.org> # 5.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-01ASoC: codecs: lpass-rx-macro: set npl clock rate correctlySrinivas Kandagatla
NPL clock rate is twice the MCLK rate, so set this correctly to avoid soundwire timeouts. Fixes: af3d54b99764 ("ASoC: codecs: lpass-rx-macro: add support for lpass rx macro") Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210331171235.24824-2-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2021-04-01ASoC: codecs: lpass-tx-macro: set npl clock rate correctlySrinivas Kandagatla
NPL clock rate is twice the MCLK rate, so set this correctly to avoid soundwire timeouts. Fixes: c39667ddcfc5 ("ASoC: codecs: lpass-tx-macro: add support for lpass tx macro") Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210331171235.24824-1-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2021-04-01Merge tag 'imx-fixes-5.12-2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 5.12, round 2: - Fix a system failure on imx6qdl-phytec-pfla02 board when booting from SD, by adding missing vmmc supply for SD interfaces. - Fix address typo in i.MX8MM/Q IOMUXC_SD1_DATA0_GPIO2_IO2 definition. * tag 'imx-fixes-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx6: pbab01: Set vmmc supply for both SD interfaces arm64: dts: imx8mm/q: Fix pad control of SD1_DATA0 Link: https://lore.kernel.org/r/20210330090236.GQ22955@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-01Merge tag 'omap-for-v5.12/fixes-rc4-signed' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes More fixes for omaps for v5.12-rc cycle Two fixes for hangs, mmc slot order fix, and a voltage typo fix: - Remove unused duplicate sha2md5_fck clock node that can race with the OMAP4_SHA2MD5_CLKCTRL clock node for disable for unused clocks - Add aliases for omap4/5 mmc to put the slots back into the right order again - Fix typo for bionic voltage controllers that accidentally use mpu for all instances instead of mpu, core and iva - Fix random hangs for droid4 caused by missing fix from TI Android kernel tree to do a dummy smc call on cpuidle wakeup path * tag 'omap-for-v5.12/fixes-rc4-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP4: PM: update ROM return address for OSWR and OFF ARM: OMAP4: Fix PMIC voltage domains for bionic ARM: dts: Fix moving mmc devices with aliases for omap4 & 5 ARM: dts: Drop duplicate sha2md5_fck to fix clk_disable race Link: https://lore.kernel.org/r/pull-1616584662-702939@atomide.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-01Merge tag 'arm-soc/for-5.12/devicetree-part2' of ↵Arnd Bergmann
https://github.com/Broadcom/stblinux into arm/fixes This pull request contains Broadcom ARM-based SoC changes for 5.12, please pull the following: - Florian reverts the adding of the second level interrupt controller for HDMI BSC interrupts since they collide with the main I2C controller (i2c-bcm2835). * tag 'arm-soc/for-5.12/devicetree-part2' of https://github.com/Broadcom/stblinux: Revert "ARM: dts: bcm2711: Add the BSC interrupt controller" Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-01selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0)Vitaly Kuznetsov
Add a test for the issue when KVM_SET_CLOCK(0) call could cause TSC page value to go very big because of a signedness issue around hv_clock->system_time. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210326155551.17446-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01KVM: x86: Prevent 'hv_clock->system_time' from going negative in ↵Vitaly Kuznetsov
kvm_guest_time_update() When guest time is reset with KVM_SET_CLOCK(0), it is possible for 'hv_clock->system_time' to become a small negative number. This happens because in KVM_SET_CLOCK handling we set 'kvm->arch.kvmclock_offset' based on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled, kvm_guest_time_update() does (masterclock in use case): hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset; And 'master_kernel_ns' represents the last time when masterclock got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a problem, the difference is very small, e.g. I'm observing hv_clock.system_time = -70 ns. The issue comes from the fact that 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in compute_tsc_page_parameters() becomes a very big number. Use 'master_kernel_ns' instead of get_kvmclock_ns() when masterclock is in use and get_kvmclock_base_ns() when it's not to prevent 'system_time' from going negative. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210331124130.337992-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01KVM: x86: disable interrupts while pvclock_gtod_sync_lock is takenPaolo Bonzini
pvclock_gtod_sync_lock can be taken with interrupts disabled if the preempt notifier calls get_kvmclock_ns to update the Xen runstate information: spin_lock include/linux/spinlock.h:354 [inline] get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587 kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69 kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100 kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline] kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062 So change the users of the spinlock to spin_lock_irqsave and spin_unlock_irqrestore. Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information") Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01KVM: x86: reduce pvclock_gtod_sync_lock critical sectionsPaolo Bonzini
There is no need to include changes to vcpu->requests into the pvclock_gtod_sync_lock critical section. The changes to the shared data structures (in pvclock_update_vm_gtod_copy) already occur under the lock. Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01Merge branch 'kvm-fix-svm-races' into kvm-masterPaolo Bonzini
2021-04-01KVM: SVM: ensure that EFER.SVME is set when running nested guest or on ↵Paolo Bonzini
nested vmexit Fixing nested_vmcb_check_save to avoid all TOC/TOU races is a bit harder in released kernels, so do the bare minimum by avoiding that EFER.SVME is cleared. This is problematic because svm_set_efer frees the data structures for nested virtualization if EFER.SVME is cleared. Also check that EFER.SVME remains set after a nested vmexit; clearing it could happen if the bit is zero in the save area that is passed to KVM_SET_NESTED_STATE (the save area of the nested state corresponds to the nested hypervisor's state and is restored on the next nested vmexit). Cc: stable@vger.kernel.org Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01KVM: SVM: load control fields from VMCB12 before checking themPaolo Bonzini
Avoid races between check and use of the nested VMCB controls. This for example ensures that the VMRUN intercept is always reflected to the nested hypervisor, instead of being processed by the host. Without this patch, it is possible to end up with svm->nested.hsave pointing to the MSR permission bitmap for nested guests. This bug is CVE-2021-29657. Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@vger.kernel.org Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01Merge tag 'amd-drm-fixes-5.12-2021-03-31' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.12-2021-03-31: amdgpu: - Polaris idle power fix - VM fix - Vangogh S3 fix - Fixes for non-4K page sizes amdkfd: - dqm fence memory corruption fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210401020057.17831-1-alexander.deucher@amd.com
2021-04-01Merge tag 'exynos-drm-fixes-for-v5.12-rc6' of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes Just one cleanup which drops of_gpio.h inclusion. - This header file isn't used anymore so drop it. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Inki Dae <inki.dae@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/1617016858-14081-1-git-send-email-inki.dae@samsung.com
2021-03-31drm/amdgpu: check alignment on CPU page for bo mapXℹ Ruoyao
The page table of AMDGPU requires an alignment to CPU page so we should check ioctl parameters for it. Return -EINVAL if some parameter is unaligned to CPU page, instead of corrupt the page table sliently. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-03-31drm/amdgpu: Set a suitable dev_info.gart_page_sizeHuacai Chen
In Mesa, dev_info.gart_page_size is used for alignment and it was set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU driver requires an alignment on CPU pages. So, for non-4KB page system, gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE). Signed-off-by: Rui Wang <wangr@lemote.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1 [Xi: rebased for drm-next, use max_t for checkpatch, and reworded commit message.] Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Tested-by: Dan Horák <dan@danny.cz> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-03-31drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in suspendAlex Deucher
Do the same thing we do for Renoir. We can check, but since the sbios has started DPM, it will always return true which causes the driver to skip some of the SMU init when it shouldn't. Reviewed-by: Zhan Liu <zhan.liu@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-03-31drm/amdkfd: dqm fence memory corruptionQu Huang
Amdgpu driver uses 4-byte data type as DQM fence memory, and transmits GPU address of fence memory to microcode through query status PM4 message. However, query status PM4 message definition and microcode processing are all processed according to 8 bytes. Fence memory only allocates 4 bytes of memory, but microcode does write 8 bytes of memory, so there is a memory corruption. Changes since v1: * Change dqm->fence_addr as a u64 pointer to fix this issue, also fix up query_status and amdkfd_fence_wait_timeout function uses 64 bit fence value to make them consistent. Signed-off-by: Qu Huang <jinsdb@126.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-03-31block: only update parent bi_status when bio failYufen Yu
For multiple split bios, if one of the bio is fail, the whole should return error to application. But we found there is a race between bio_integrity_verify_fn and bio complete, which return io success to application after one of the bio fail. The race as following: split bio(READ) kworker nvme_complete_rq blk_update_request //split error=0 bio_endio bio_integrity_endio queue_work(kintegrityd_wq, &bip->bip_work); bio_integrity_verify_fn bio_endio //split bio __bio_chain_endio if (!parent->bi_status) <interrupt entry> nvme_irq blk_update_request //parent error=7 req_bio_endio bio->bi_status = 7 //parent bio <interrupt exit> parent->bi_status = 0 parent->bi_end_io() // return bi_status=0 The bio has been split as two: split and parent. When split bio completed, it depends on kworker to do endio, while bio_integrity_verify_fn have been interrupted by parent bio complete irq handler. Then, parent bio->bi_status which have been set in irq handler will overwrite by kworker. In fact, even without the above race, we also need to conside the concurrency beteen mulitple split bio complete and update the same parent bi_status. Normally, multiple split bios will be issued to the same hctx and complete from the same irq vector. But if we have updated queue map between multiple split bios, these bios may complete on different hw queue and different irq vector. Then the concurrency update parent bi_status may cause the final status error. Suggested-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-31xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory modelOng Boon Leong
xdp_return_frame() may be called outside of NAPI context to return xdpf back to page_pool. xdp_return_frame() calls __xdp_return() with napi_direct = false. For page_pool memory model, __xdp_return() calls xdp_return_frame_no_direct() unconditionally and below false negative kernel BUG throw happened under preempt-rt build: [ 430.450355] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3884 [ 430.451678] caller is __xdp_return+0x1ff/0x2e0 [ 430.452111] CPU: 0 PID: 3884 Comm: modprobe Tainted: G U E 5.12.0-rc2+ #45 Changes in v2: - This patch fixes the issue by making xdp_return_frame_no_direct() is only called if napi_direct = true, as recommended for better by Jesper Dangaard Brouer. Thanks! Fixes: 2539650fadbf ("xdp: Helpers for disabling napi_direct of xdp_return_frame") Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31Revert "net: correct sk_acceptq_is_full()"Eric Dumazet
This reverts commit f211ac154577ec9ccf07c15f18a6abf0d9bdb4ab. We had similar attempt in the past, and we reverted it. History: 64a146513f8f12ba204b7bf5cb7e9505594ead42 [NET]: Revert incorrect accept queue backlog changes. 8488df894d05d6fa41c2bd298c335f944bb0e401 [NET]: Fix bugs in "Whether sock accept queue is full" checking I am adding a fat comment so that future attempts will be much harder. Fixes: f211ac154577 ("net: correct sk_acceptq_is_full()") Cc: iuyacan <yacanliu@163.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31Merge tag 'mlx5-fixes-2021-03-31' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-03-31 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-03-31 1) Fix ipv4 pmtu checks for xfrm anf vti interfaces. From Eyal Birger. 2) There are situations where the socket passed to xfrm_output_resume() is not the same as the one attached to the skb. Use the socket passed to xfrm_output_resume() to avoid lookup failures when xfrm is used with VRFs. From Evan Nimmo. 3) Make the xfrm_state_hash_generation sequence counter per network namespace because but its write serialization lock is also per network namespace. Write protection is insufficient otherwise. From Ahmed S. Darwish. 4) Fixup sctp featue flags when used with esp offload. From Xin Long. 5) xfrm BEET mode doesn't support fragments for inner packets. This is a limitation of the protocol, so no fix possible. Warn at least to notify the user about that situation. From Xin Long. 6) Fix NULL pointer dereference on policy lookup when namespaces are uses in combination with esp offload. 7) Fix incorrect transformation on esp offload when packets get segmented at layer 3. 8) Fix some user triggered usages of WARN_ONCE in the xfrm compat layer. From Dmitry Safonov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31net/rds: Fix a use after free in rds_message_map_pagesLv Yunlong
In rds_message_map_pages, the rm is freed by rds_message_put(rm). But rm is still used by rm->data.op_sg in return value. My patch assigns ERR_CAST(rm->data.op_sg) to err before the rm is freed to avoid the uaf. Fixes: 7dba92037baf3 ("net/rds: Use ERR_PTR for rds_message_alloc_sgs()") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31neighbour: Disregard DEAD dst in neigh_updateTong Zhu
After a short network outage, the dst_entry is timed out and put in DST_OBSOLETE_DEAD. We are in this code because arp reply comes from this neighbour after network recovers. There is a potential race condition that dst_entry is still in DST_OBSOLETE_DEAD. With that, another neighbour lookup causes more harm than good. In best case all packets in arp_queue are lost. This is counterproductive to the original goal of finding a better path for those packets. I observed a worst case with 4.x kernel where a dst_entry in DST_OBSOLETE_DEAD state is associated with loopback net_device. It leads to an ethernet header with all zero addresses. A packet with all zero source MAC address is quite deadly with mac80211, ath9k and 802.11 block ack. It fails ieee80211_find_sta_by_ifaddr in ath9k (xmit.c). Ath9k flushes tx queue (ath_tx_complete_aggr). BAW (block ack window) is not updated. BAW logic is damaged and ath9k transmission is disabled. Signed-off-by: Tong Zhu <zhutong@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31net/mlx5e: Guarantee room for XSK wakeup NOP on async ICOSQTariq Toukan
XSK wakeup flow triggers an IRQ by posting a NOP WQE and hitting the doorbell on the async ICOSQ. It maintains its state so that it doesn't issue another NOP WQE if it has an outstanding one already. For this flow to work properly, the NOP post must not fail. Make sure to reserve room for the NOP WQE in all WQE posts to the async ICOSQ. Fixes: 8d94b590f1e4 ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one") Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support") Fixes: 0419d8c9d8f8 ("net/mlx5e: kTLS, Add kTLS RX resync support") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-31net/mlx5e: Consider geneve_opts for encap contextsDima Chumak
Current algorithm for encap keys is legacy from initial vxlan implementation and doesn't take into account all possible fields of a tunnel. For example, for a Geneve tunnel, which may have additional TLV options, they are ignored when comparing encap keys and a rule can be attached to an incorrect encap entry. Fix that by introducing encap_info_equal() operation in struct mlx5e_tc_tunnel. Geneve tunnel type uses custom implementation, which extends generic algorithm and considers options if they are set. Fixes: 7f1a546e3222 ("net/mlx5e: Consider tunnel type for encap contexts") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-31net/mlx5: Don't request more than supported EQsDaniel Jurgens
Calculating the number of compeltion EQs based on the number of available IRQ vectors doesn't work now that all async EQs share one IRQ. Thus the max number of EQs can be exceeded on systems with more than approximately 256 CPUs. Take this into account when calculating the number of available completion EQs. Fixes: 81bfa206032a ("net/mlx5: Use a single IRQ for all async EQs") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-31net/mlx5e: kTLS, Fix RX counters atomicityTariq Toukan
Some TLS RX counters increment per socket/connection, and are not protected against parallel modifications from several cores. Switch them to atomic counters by taking them out of the RQ stats into the global atomic TLS stats. In this patch, we touch 'rx_tls_ctx/del' that count the number of device-offloaded RX TLS connections added/deleted. These counters are updated in the add/del callbacks, out of the fast data-path. This change is not needed for counters that increment only in NAPI context, as they are protected by the NAPI mechanism. Keep them as tls_* counters under 'struct mlx5e_rq_stats'. Fixes: 76c1e1ac2aae ("net/mlx5e: kTLS, Add kTLS RX stats") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-31net/mlx5e: kTLS, Fix TX counters atomicityTariq Toukan
Some TLS TX counters increment per socket/connection, and are not protected against parallel modifications from several cores. Switch them to atomic counters by taking them out of the SQ stats into the global atomic TLS stats. In this patch, we touch a single counter 'tx_tls_ctx' that counts the number of device-offloaded TX TLS connections added. Now that this counter can be increased without the for having the SQ context in hand, move it to the mlx5e_ktls_add_tx() callback where it really belongs, out of the fast data-path. This change is not needed for counters that increment only in NAPI context or under the TX lock, as they are already protected. Keep them as tls_* counters under 'struct mlx5e_sq_stats'. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-31net/mlx5: E-switch, Create vport miss group only if src rewrite is supportedMaor Dickman
Create send to vport miss group was added in order to support traffic recirculation to root table with metadata source rewrite. This group is created also in case source rewrite isn't supported. Fixed by creating send to vport miss group only if source rewrite is supported by FW. Fixes: 8e404fefa58b ("net/mlx5e: Match recirculated packet miss in slow table using reg_c1") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-31net/mlx5e: Fix ethtool indication of connector typeAya Levin
Use connector_type read from PTYS register when it's valid, based on corresponding capability bit. Fixes: 5b4793f81745 ("net/mlx5e: Add support for reading connector type from PTYS") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-31net/mlx5: Delete auxiliary bus driver eth-rep firstMaor Dickman
Delete auxiliary bus drivers flow deletes the eth driver first and then the eth-reps driver but eth-reps devices resources are depend on eth device. Fixed by changing the delete order of auxiliary bus drivers to delete the eth-rep driver first and after it the eth driver. Fixes: 601c10c89cbb ("net/mlx5: Delete custom device management logic") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>