summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-09-22KVM: nVMX: Fix nested bus lock VM exitChenyi Qiang
Nested bus lock VM exits are not supported yet. If L2 triggers bus lock VM exit, it will be directed to L1 VMM, which would cause unexpected behavior. Therefore, handle L2's bus lock VM exits in L0 directly. Fixes: fe6b6bc802b4 ("KVM: VMX: Enable bus lock VM exit") Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20210914095041.29764-1-chenyi.qiang@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: x86: Identify vCPU0 by its vcpu_idx instead of its vCPUs array entrySean Christopherson
Use vcpu_idx to identify vCPU0 when updating HyperV's TSC page, which is shared by all vCPUs and "owned" by vCPU0 (because vCPU0 is the only vCPU that's guaranteed to exist). Using kvm_get_vcpu() to find vCPU works, but it's a rather odd and suboptimal method to check the index of a given vCPU. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210910183220.2397812-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: x86: Query vcpu->vcpu_idx directly and drop its accessorSean Christopherson
Read vcpu->vcpu_idx directly instead of bouncing through the one-line wrapper, kvm_vcpu_get_idx(), and drop the wrapper. The wrapper is a remnant of the original implementation and serves no purpose; remove it before it gains more users. Back when kvm_vcpu_get_idx() was added by commit 497d72d80a78 ("KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus"), the implementation was more than just a simple wrapper as vcpu->vcpu_idx did not exist and retrieving the index meant walking over the vCPU array to find the given vCPU. When vcpu_idx was introduced by commit 8750e72a79dd ("KVM: remember position in kvm->vcpus array"), the helper was left behind, likely to avoid extra thrash (but even then there were only two users, the original arm usage having been removed at some point in the past). No functional change intended. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210910183220.2397812-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22kvm: fix wrong exception emulation in check_rdtscHou Wenlong
According to Intel's SDM Vol2 and AMD's APM Vol3, when CR4.TSD is set, use rdtsc/rdtscp instruction above privilege level 0 should trigger a #GP. Fixes: d7eb82030699e ("KVM: SVM: Add intercept checks for remaining group7 instructions") Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com> Message-Id: <1297c0dd3f1bb47a6d089f850b629c7aa0247040.1629257115.git.houwenlong93@linux.alibaba.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: SEV: Pin guest memory for write for RECEIVE_UPDATE_DATASean Christopherson
Require the target guest page to be writable when pinning memory for RECEIVE_UPDATE_DATA. Per the SEV API, the PSP writes to guest memory: The result is then encrypted with GCTX.VEK and written to the memory pointed to by GUEST_PADDR field. Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command") Cc: stable@vger.kernel.org Cc: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210914210951.2994260-2-seanjc@google.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Peter Gonda <pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: SVM: fix missing sev_decommission in sev_receive_startMingwei Zhang
DECOMMISSION the current SEV context if binding an ASID fails after RECEIVE_START. Per AMD's SEV API, RECEIVE_START generates a new guest context and thus needs to be paired with DECOMMISSION: The RECEIVE_START command is the only command other than the LAUNCH_START command that generates a new guest context and guest handle. The missing DECOMMISSION can result in subsequent SEV launch failures, as the firmware leaks memory and might not able to allocate more SEV guest contexts in the future. Note, LAUNCH_START suffered the same bug, but was previously fixed by commit 934002cd660b ("KVM: SVM: Call SEV Guest Decommission if ASID binding fails"). Cc: Alper Gun <alpergun@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: David Rienjes <rientjes@google.com> Cc: Marc Orr <marcorr@google.com> Cc: John Allen <john.allen@amd.com> Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vipin Sharma <vipinsh@google.com> Cc: stable@vger.kernel.org Reviewed-by: Marc Orr <marcorr@google.com> Acked-by: Brijesh Singh <brijesh.singh@amd.com> Fixes: af43cbbf954b ("KVM: SVM: Add support for KVM_SEV_RECEIVE_START command") Signed-off-by: Mingwei Zhang <mizhang@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210912181815.3899316-1-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: SEV: Acquire vcpu mutex when updating VMSAPeter Gonda
The update-VMSA ioctl touches data stored in struct kvm_vcpu, and therefore should not be performed concurrently with any VCPU ioctl that might cause KVM or the processor to use the same data. Adds vcpu mutex guard to the VMSA updating code. Refactors out __sev_launch_update_vmsa() function to deal with per vCPU parts of sev_launch_update_vmsa(). Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Signed-off-by: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20210915171755.3773766-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: do not shrink halt_poll_ns below grow_startSergey Senozhatsky
grow_halt_poll_ns() ignores values between 0 and halt_poll_ns_grow_start (10000 by default). However, when we shrink halt_poll_ns we may fall way below halt_poll_ns_grow_start and endup with halt_poll_ns values that don't make a lot of sense: like 1 or 9, or 19. VCPU1 trace (halt_poll_ns_shrink equals 2): VCPU1 grow 10000 VCPU1 shrink 5000 VCPU1 shrink 2500 VCPU1 shrink 1250 VCPU1 shrink 625 VCPU1 shrink 312 VCPU1 shrink 156 VCPU1 shrink 78 VCPU1 shrink 39 VCPU1 shrink 19 VCPU1 shrink 9 VCPU1 shrink 4 Mirror what grow_halt_poll_ns() does and set halt_poll_ns to 0 as soon as new shrink-ed halt_poll_ns value falls below halt_poll_ns_grow_start. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210902031100.252080-1-senozhatsky@chromium.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: nVMX: fix comments of handle_vmon()Yu Zhang
"VMXON pointer" is saved in vmx->nested.vmxon_ptr since commit 3573e22cfeca ("KVM: nVMX: additional checks on vmxon region"). Also, handle_vmptrld() & handle_vmclear() now have logic to check the VMCS pointer against the VMXON pointer. So just remove the obsolete comments of handle_vmon(). Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Message-Id: <20210908171731.18885-1-yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: x86: Handle SRCU initialization failure during page track initHaimin Zhang
Check the return of init_srcu_struct(), which can fail due to OOM, when initializing the page track mechanism. Lack of checking leads to a NULL pointer deref found by a modified syzkaller. Reported-by: TCS Robot <tcs_robot@tencent.com> Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Message-Id: <1630636626-12262-1-git-send-email-tcs_kernel@tencent.com> [Move the call towards the beginning of kvm_arch_init_vm. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: VMX: Remove defunct "nr_active_uret_msrs" fieldSean Christopherson
Remove vcpu_vmx.nr_active_uret_msrs and its associated comment, which are both defunct now that KVM keeps the list constant and instead explicitly tracks which entries need to be loaded into hardware. No functional change intended. Fixes: ee9d22e08d13 ("KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210908002401.1947049-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22selftests: KVM: Align SMCCC call with the spec in steal_timeOliver Upton
The SMC64 calling convention passes a function identifier in w0 and its parameters in x1-x17. Given this, there are two deviations in the SMC64 call performed by the steal_time test: the function identifier is assigned to a 64 bit register and the parameter is only 32 bits wide. Align the call with the SMCCC by using a 32 bit register to handle the function identifier and increasing the parameter width to 64 bits. Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <20210921171121.2148982-3-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22selftests: KVM: Fix check for !POLLIN in demand_paging_testOliver Upton
The logical not operator applies only to the left hand side of a bitwise operator. As such, the check for POLLIN not being set in revents wrong. Fix it by adding parentheses around the bitwise expression. Fixes: 4f72180eb4da ("KVM: selftests: Add demand paging content to the demand paging test") Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20210921171121.2148982-2-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: x86: Clear KVM's cached guest CR3 at RESET/INITSean Christopherson
Explicitly zero the guest's CR3 and mark it available+dirty at RESET/INIT. Per Intel's SDM and AMD's APM, CR3 is zeroed at both RESET and INIT. For RESET, this is a nop as vcpu is zero-allocated. For INIT, the bug has likely escaped notice because no firmware/kernel puts its page tables root at PA=0, let alone relies on INIT to get the desired CR3 for such page tables. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210921000303.400537-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: x86: Mark all registers as avail/dirty at vCPU creationSean Christopherson
Mark all registers as available and dirty at vCPU creation, as the vCPU has obviously not been loaded into hardware, let alone been given the chance to be modified in hardware. On SVM, reading from "uninitialized" hardware is a non-issue as VMCBs are zero allocated (thus not truly uninitialized) and hardware does not allow for arbitrary field encoding schemes. On VMX, backing memory for VMCSes is also zero allocated, but true initialization of the VMCS _technically_ requires VMWRITEs, as the VMX architectural specification technically allows CPU implementations to encode fields with arbitrary schemes. E.g. a CPU could theoretically store the inverted value of every field, which would result in VMREAD to a zero-allocated field returns all ones. In practice, only the AR_BYTES fields are known to be manipulated by hardware during VMREAD/VMREAD; no known hardware or VMM (for nested VMX) does fancy encoding of cacheable field values (CR0, CR3, CR4, etc...). In other words, this is technically a bug fix, but practically speakings it's a glorified nop. Failure to mark registers as available has been a lurking bug for quite some time. The original register caching supported only GPRs (+RIP, which is kinda sorta a GPR), with the masks initialized at ->vcpu_reset(). That worked because the two cacheable registers, RIP and RSP, are generally speaking not read as side effects in other flows. Arguably, commit aff48baa34c0 ("KVM: Fetch guest cr3 from hardware on demand") was the first instance of failure to mark regs available. While _just_ marking CR3 available during vCPU creation wouldn't have fixed the VMREAD from an uninitialized VMCS bug because ept_update_paging_mode_cr0() unconditionally read vmcs.GUEST_CR3, marking CR3 _and_ intentionally not reading GUEST_CR3 when it's available would have avoided VMREAD to a technically-uninitialized VMCS. Fixes: aff48baa34c0 ("KVM: Fetch guest cr3 from hardware on demand") Fixes: 6de4f3ada40b ("KVM: Cache pdptrs") Fixes: 6de12732c42c ("KVM: VMX: Optimize vmx_get_rflags()") Fixes: 2fb92db1ec08 ("KVM: VMX: Cache vmcs segment fields") Fixes: bd31fe495d0d ("KVM: VMX: Add proper cache tracking for CR0") Fixes: f98c1e77127d ("KVM: VMX: Add proper cache tracking for CR4") Fixes: 5addc235199f ("KVM: VMX: Cache vmcs.EXIT_QUALIFICATION using arch avail_reg flags") Fixes: 8791585837f6 ("KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210921000303.400537-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: selftests: Remove __NR_userfaultfd syscall fallbackSean Christopherson
Revert the __NR_userfaultfd syscall fallback added for KVM selftests now that x86's unistd_{32,63}.h overrides are under uapi/ and thus not in KVM selftests' search path, i.e. now that KVM gets x86 syscall numbers from the installed kernel headers. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210901203030.1292304-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugsSean Christopherson
Add a test to verify an rseq's CPU ID is updated correctly if the task is migrated while the kernel is handling KVM_RUN. This is a regression test for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM without updating rseq, leading to a stale CPU ID and other badness. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Message-Id: <20210901203030.1292304-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22tools: Move x86 syscall number fallbacks to .../uapi/Sean Christopherson
Move unistd_{32,64}.h from x86/include/asm to x86/include/uapi/asm so that tools/selftests that install kernel headers, e.g. KVM selftests, can include non-uapi tools headers, e.g. to get 'struct list_head', without effectively overriding the installed non-tool uapi headers. Swapping KVM's search order, e.g. to search the kernel headers before tool headers, is not a viable option as doing results in linux/type.h and other core headers getting pulled from the kernel headers, which do not have the kernel-internal typedefs that are used through tools, including many files outside of selftests/kvm's control. Prior to commit cec07f53c398 ("perf tools: Move syscall number fallbacks from perf-sys.h to tools/arch/x86/include/asm/"), the handcoded numbers were actual fallbacks, i.e. overriding unistd_{32,64}.h from the kernel headers was unintentional. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210901203030.1292304-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()Sean Christopherson
Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now that the two function are always called back-to-back by architectures that have rseq. The rseq helper is stubbed out for architectures that don't support rseq, i.e. this is a nop across the board. Note, tracehook_notify_resume() is horribly named and arguably does not belong in tracehook.h as literally every line of code in it has nothing to do with tracing. But, that's been true since commit a42c6ded827d ("move key_repace_session_keyring() into tracehook_notify_resume()") first usurped tracehook_notify_resume() back in 2012. Punt cleaning that mess up to future patches. No functional change intended. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210901203030.1292304-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guestSean Christopherson
Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to transferring to a KVM guest, which is roughly equivalent to an exit to userspace and processes many of the same pending actions. While the task cannot be in an rseq critical section as the KVM path is reachable only by via ioctl(KVM_RUN), the side effects that apply to rseq outside of a critical section still apply, e.g. the current CPU needs to be updated if the task is migrated. Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults and other badness in userspace VMMs that use rseq in combination with KVM, e.g. due to the CPU ID being stale after task migration. Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function") Reported-by: Peter Foley <pefoley@google.com> Bisected-by: Doug Evans <dje@google.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210901203030.1292304-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22irqchip/gic: Work around broken Renesas integrationMarc Zyngier
Geert reported that the GIC driver locks up on a Renesas system since 005c34ae4b44f085 ("irqchip/gic: Atomically update affinity") fixed the driver to use writeb_relaxed() instead of writel_relaxed(). As it turns out, the interconnect used on this system mandates 32bit wide accesses for all MMIO transactions, even if the GIC architecture specifically mandates for some registers to be byte accessible. Gahhh... Work around the issue by crudly detecting the offending system, and falling back to an inefficient RMW+lock implementation. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/CAMuHMdV+Ev47K5NO8XHsanSq5YRMCHn2gWAQyV-q2LpJVy9HiQ@mail.gmail.com
2021-09-22mptcp: ensure tx skbs always have the MPTCP extPaolo Abeni
Due to signed/unsigned comparison, the expression: info->size_goal - skb->len > 0 evaluates to true when the size goal is smaller than the skb size. That results in lack of tx cache refill, so that the skb allocated by the core TCP code lacks the required MPTCP skb extensions. Due to the above, syzbot is able to trigger the following WARN_ON(): WARNING: CPU: 1 PID: 810 at net/mptcp/protocol.c:1366 mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366 Modules linked in: CPU: 1 PID: 810 Comm: syz-executor.4 Not tainted 5.14.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366 Code: ff 4c 8b 74 24 50 48 8b 5c 24 58 e9 0f fb ff ff e8 13 44 8b f8 4c 89 e7 45 31 ed e8 98 57 2e fe e9 81 f4 ff ff e8 fe 43 8b f8 <0f> 0b 41 bd ea ff ff ff e9 6f f4 ff ff 4c 89 e7 e8 b9 8e d2 f8 e9 RSP: 0018:ffffc9000531f6a0 EFLAGS: 00010216 RAX: 000000000000697f RBX: 0000000000000000 RCX: ffffc90012107000 RDX: 0000000000040000 RSI: ffffffff88eac9e2 RDI: 0000000000000003 RBP: ffff888078b15780 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff88eac017 R11: 0000000000000000 R12: ffff88801de0a280 R13: 0000000000006b58 R14: ffff888066278280 R15: ffff88803c2fe9c0 FS: 00007fd9f866e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007faebcb2f718 CR3: 00000000267cb000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __mptcp_push_pending+0x1fb/0x6b0 net/mptcp/protocol.c:1547 mptcp_release_cb+0xfe/0x210 net/mptcp/protocol.c:3003 release_sock+0xb4/0x1b0 net/core/sock.c:3206 sk_stream_wait_memory+0x604/0xed0 net/core/stream.c:145 mptcp_sendmsg+0xc39/0x1bc0 net/mptcp/protocol.c:1749 inet6_sendmsg+0x99/0xe0 net/ipv6/af_inet6.c:643 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 sock_write_iter+0x2a0/0x3e0 net/socket.c:1057 call_write_iter include/linux/fs.h:2163 [inline] new_sync_write+0x40b/0x640 fs/read_write.c:507 vfs_write+0x7cf/0xae0 fs/read_write.c:594 ksys_write+0x1ee/0x250 fs/read_write.c:647 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665f9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fd9f866e188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000056c038 RCX: 00000000004665f9 RDX: 00000000000e7b78 RSI: 0000000020000000 RDI: 0000000000000003 RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056c038 R13: 0000000000a9fb1f R14: 00007fd9f866e300 R15: 0000000000022000 Fix the issue rewriting the relevant expression to avoid sign-related problems - note: size_goal is always >= 0. Additionally, ensure that the skb in the tx cache always carries the relevant extension. Reported-and-tested-by: syzbot+263a248eec3e875baa7b@syzkaller.appspotmail.com Fixes: 1094c6fe7280 ("mptcp: fix possible divide by zero") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-22qed: rdma - don't wait for resources under hw error recovery flowShai Malin
If the HW device is during recovery, the HW resources will never return, hence we shouldn't wait for the CID (HW context ID) bitmaps to clear. This fix speeds up the error recovery flow. Fixes: 64515dc899df ("qed: Add infrastructure for error detection and recovery") Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-22irqchip/renesas-rza1: Use semicolons instead of commasGeert Uytterhoeven
This code works, but it is cleaner to use semicolons at the end of statements instead of commas. Extracted from a big anonymous patch by Julia Lawall <julia.lawall@inria.fr>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/b1710bb6ea5faa7a7fe74404adb0beb951e0bf8c.1631699160.git.geert+renesas@glider.be
2021-09-22irqchip/gic-v3-its: Fix potential VPE leak on errorKaige Fu
In its_vpe_irq_domain_alloc, when its_vpe_init() returns an error, there is an off-by-one in the number of VPEs to be freed. Fix it by simply passing the number of VPEs allocated, which is the index of the loop iterating over the VPEs. Fixes: 7d75bbb4bc1a ("irqchip/gic-v3-its: Add VPE irq domain allocation/teardown") Signed-off-by: Kaige Fu <kaige.fu@linux.alibaba.com> [maz: fixed commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/d9e36dee512e63670287ed9eff884a5d8d6d27f2.1631672311.git.kaige.fu@linux.alibaba.com
2021-09-22irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix buildRandy Dunlap
irq-goldfish-pic uses GENERIC_IRQ_CHIP interfaces so select that symbol to fix build errors. Fixes these build errors: mips-linux-ld: drivers/irqchip/irq-goldfish-pic.o: in function `goldfish_pic_of_init': irq-goldfish-pic.c:(.init.text+0xc0): undefined reference to `irq_alloc_generic_chip' mips-linux-ld: irq-goldfish-pic.c:(.init.text+0xf4): undefined reference to `irq_gc_unmask_enable_reg' mips-linux-ld: irq-goldfish-pic.c:(.init.text+0xf8): undefined reference to `irq_gc_unmask_enable_reg' mips-linux-ld: irq-goldfish-pic.c:(.init.text+0x100): undefined reference to `irq_gc_mask_disable_reg' mips-linux-ld: irq-goldfish-pic.c:(.init.text+0x104): undefined reference to `irq_gc_mask_disable_reg' mips-linux-ld: irq-goldfish-pic.c:(.init.text+0x11c): undefined reference to `irq_setup_generic_chip' mips-linux-ld: irq-goldfish-pic.c:(.init.text+0x168): undefined reference to `irq_remove_generic_chip' Fixes: 4235ff50cf98 ("irqchip/irq-goldfish-pic: Add Goldfish PIC driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Miodrag Dinic <miodrag.dinic@mips.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <maz@kernel.org> Cc: Goran Ferenc <goran.ferenc@mips.com> Cc: Aleksandar Markovic <aleksandar.markovic@mips.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210905162519.21507-1-rdunlap@infradead.org
2021-09-22irqchip/mbigen: Repair non-kernel-doc notationRandy Dunlap
Fix kernel-doc warnings in irq-mbigen.c: irq-mbigen.c:29: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * In mbigen vector register irq-mbigen.c:43: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * offset of clear register in mbigen node irq-mbigen.c:50: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * offset of interrupt type register Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Jun Ma <majun258@huawei.com> Cc: Yun Wu <wuyun.wu@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <maz@kernel.org> Cc: Aditya Srivastava <yashsri421@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210905033644.15988-1-rdunlap@infradead.org
2021-09-22irqdomain: Change the type of 'size' in __irq_domain_add() to be consistentBixuan Cui
The 'size' is used in struct_size(domain, revmap, size) and its input parameter type is 'size_t'(unsigned int). Changing the size to 'unsigned int' to make the type consistent. Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210916025203.44841-1-cuibixuan@huawei.com
2021-09-22irqchip/armada-370-xp: Fix ack/eoi breakageMarc Zyngier
When converting the driver to using handle_percpu_devid_irq, we forgot to repaint the irq_eoi() callback into irq_ack(), as handle_percpu_devid_fasteoi_ipi() was actually using EOI really early in the handling. Yes this was a stupid idea. Fix this by using the HW ack method as irq_ack(). Fixes: e52e73b7e9f7 ("irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq()") Reported-by: Steffen Trumtrar <s.trumtrar@pengutronix.de> Tested-by: Steffen Trumtrar <s.trumtrar@pengutronix.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87tuiexq5f.fsf@pengutronix.de
2021-09-22ext2: fix sleeping in atomic bugs on errorDan Carpenter
The ext2_error() function syncs the filesystem so it sleeps. The caller is holding a spinlock so it's not allowed to sleep. ext2_statfs() <- disables preempt -> ext2_count_free_blocks() -> ext2_get_group_desc() Fix this by using WARN() to print an error message and a stack trace instead of using ext2_error(). Link: https://lore.kernel.org/r/20210921203233.GA16529@kili Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-09-22gpio/rockchip: fix get_direction value handlingHeiko Stuebner
The function uses the newly introduced rockchip_gpio_readl_bit() which directly returns the actual value of the requeste bit. So using the existing bit-wise check for the bit inside the value will always return 0. Fix this by dropping the bit manipulation on the result. Fixes: 3bcbd1a85b68 ("gpio/rockchip: support next version gpio controller") Signed-off-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-09-22gpio/rockchip: extended debounce support is only available on v2Heiko Stuebner
The gpio driver runs into issues on v1 gpio blocks, as the db_clk and the whole extended debounce support is only ever defined on v2. So checking for the IS_ERR on the db_clk is not enough, as it will be NULL on v1. Fix this by adding the needed condition for v2 first before checking the existence of the db_clk. This caused my rk3288-veyron-pinky to enter a reboot loop when it tried to enable the power-key as adc-key device. Fixes: 3bcbd1a85b68 ("gpio/rockchip: support next version gpio controller") Signed-off-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-09-22gpio: gpio-aspeed-sgpio: Fix wrong hwirq in irq handler.Steven Lee
The current hwirq is calculated based on the old GPIO pin order(input GPIO range is from 0 to ngpios - 1). It should be calculated based on the current GPIO input pin order(input GPIOs are 0, 2, 4, ..., (ngpios - 1) * 2). Signed-off-by: Steven Lee <steven_lee@aspeedtech.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-09-22gpio: uniphier: Fix void functions to remove return valueKunihiko Hayashi
The return type of irq_chip.irq_mask() and irq_chip.irq_unmask() should be void. Fixes: dbe776c2ca54 ("gpio: uniphier: add UniPhier GPIO controller driver") Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-09-22gpiolib: acpi: Make set-debounce-timeout failures non fatalHans de Goede
Commit 8dcb7a15a585 ("gpiolib: acpi: Take into account debounce settings") made the gpiolib-acpi code call gpio_set_debounce_timeout() when requesting GPIOs. This in itself is fine, but it also made gpio_set_debounce_timeout() errors fatal, causing the requesting of the GPIO to fail. This is causing regressions. E.g. on a HP ElitePad 1000 G2 various _AEI specified GPIO ACPI event sources specify a debouncy timeout of 20 ms, but the pinctrl-baytrail.c only supports certain fixed values, the closest ones being 12 or 24 ms and pinctrl-baytrail.c responds with -EINVAL when specified a value which is not one of the fixed values. This is causing the acpi_request_own_gpiod() call to fail for 3 ACPI event sources on the HP ElitePad 1000 G2, which in turn is causing e.g. the battery charging vs discharging status to never get updated, even though a charger has been plugged-in or unplugged. Make gpio_set_debounce_timeout() errors non fatal, warning about the failure instead, to fix this regression. Note we should probably also fix various pinctrl drivers to just pick the first bigger discrete value rather then returning -EINVAL but this will need to be done on a per driver basis, where as this fix at least gets us back to where things were before and thus restores functionality on devices where this was lost due to gpio_set_debounce_timeout() errors. Fixes: 8dcb7a15a585 ("gpiolib: acpi: Take into account debounce settings") Depends-on: 2e2b496cebef ("gpiolib: acpi: Extract acpi_request_own_gpiod() helper") Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-09-22HID: u2fzero: ignore incomplete packets without dataAndrej Shadura
Since the actual_length calculation is performed unsigned, packets shorter than 7 bytes (e.g. packets without data or otherwise truncated) or non-received packets ("zero" bytes) can cause buffer overflow. Link: https://bugzilla.kernel.org/show_bug.cgi?id=214437 Fixes: 42337b9d4d958("HID: add driver for U2F Zero built-in LED and RNG") Signed-off-by: Andrej Shadura <andrew.shadura@collabora.co.uk> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2021-09-22MAINTAINERS: usb, update Peter Korsgaard's entriesJiri Slaby
Peter's e-mail in MAINTAINERS is defunct: This is the qmail-send program at a.mx.sunsite.dk. <jacmet@sunsite.dk>: Sorry, no mailbox here by that name. (#5.1.1) Peter says: ** Ahh yes, it should be changed to peter@korsgaard.com. However he also says: ** I haven't had access to c67x00 hw for quite some years though, so maybe ** it should be marked Orphan instead? So change as he wishes. Cc: Peter Korsgaard <peter@korsgaard.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-usb@vger.kernel.org Acked-by: Peter Korsgaard <peter@korsgaard.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210922063008.25758-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22scsi: ses: Retry failed Send/Receive Diagnostic commandsWen Xiong
Setting SCSI logging level with error=3, we saw some errors from enclosues: [108017.360833] ses 0:0:9:0: tag#641 Done: NEEDS_RETRY Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s [108017.360838] ses 0:0:9:0: tag#641 CDB: Receive Diagnostic 1c 01 01 00 20 00 [108017.427778] ses 0:0:9:0: Power-on or device reset occurred [108017.427784] ses 0:0:9:0: tag#641 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s [108017.427788] ses 0:0:9:0: tag#641 CDB: Receive Diagnostic 1c 01 01 00 20 00 [108017.427791] ses 0:0:9:0: tag#641 Sense Key : Unit Attention [current] [108017.427793] ses 0:0:9:0: tag#641 Add. Sense: Bus device reset function occurred [108017.427801] ses 0:0:9:0: Failed to get diagnostic page 0x1 [108017.427804] ses 0:0:9:0: Failed to bind enclosure -19 [108017.427895] ses 0:0:10:0: Attached Enclosure device [108017.427942] ses 0:0:10:0: Attached scsi generic sg18 type 13 Retry if the Send/Receive Diagnostic commands complete with a transient error status (NOT_READY or UNIT_ATTENTION with ASC 0x29). Link: https://lore.kernel.org/r/1631849061-10210-2-git-send-email-wenxiong@linux.ibm.com Reviewed-by: Brian King <brking@linux.ibm.com> Reviewed-by: James Bottomley <jejb@linux.ibm.com> Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-22scsi: target: Fix spelling mistake "CONFLIFT" -> "CONFLICT"Colin Ian King
There is a spelling mistake in a dev_err message. Fix it. Link: https://lore.kernel.org/r/20210920183206.17477-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-22scsi: lpfc: Fix gcc -Wstringop-overread warning, againArnd Bergmann
I fixed a stringop-overread warning earlier this year, now a second copy of the original code was added and the warning came back: drivers/scsi/lpfc/lpfc_attr.c: In function 'lpfc_cmf_info_show': drivers/scsi/lpfc/lpfc_attr.c:289:25: error: 'strnlen' specified bound 4095 exceeds source size 24 [-Werror=stringop-overread] 289 | strnlen(LPFC_INFO_MORE_STR, PAGE_SIZE - 1), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix it the same way as the other copy. Link: https://lore.kernel.org/r/20210920095628.1191676-1-arnd@kernel.org Fixes: ada48ba70f6b ("scsi: lpfc: Fix gcc -Wstringop-overread warning") Fixes: 74a7baa2a3ee ("scsi: lpfc: Add cmf_info sysfs entry") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-22scsi: lpfc: Use correct scnprintf() limitDan Carpenter
The limit should be "PAGE_SIZE - len" instead of "PAGE_SIZE". We're not going to hit the limit so this fix will not affect runtime. Link: https://lore.kernel.org/r/20210916132331.GE25094@kili Fixes: 5b9e70b22cc5 ("scsi: lpfc: raise sg count for nvme to use available sg resources") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-22scsi: lpfc: Fix sprintf() overflow in lpfc_display_fpin_wwpn()Dan Carpenter
This scnprintf() uses the wrong limit. It should be "LPFC_FPIN_WWPN_LINE_SZ - len" instead of LPFC_FPIN_WWPN_LINE_SZ. Link: https://lore.kernel.org/r/20210916132251.GD25094@kili Fixes: 428569e66fa7 ("scsi: lpfc: Expand FPIN and RDF receive logging") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-22scsi: core: Remove 'current_tag'Hannes Reinecke
The 'current_tag' field in struct scsi_device is unused now; remove it. Link: https://lore.kernel.org/r/1631696835-136198-4-git-send-email-john.garry@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-22scsi: acornscsi: Remove tagged queuing vestigesHannes Reinecke
The acornscsi driver has a config option to enable tagged queuing, but this option gets disabled in the driver itself with the comment 'needs to be debugged'. As this is a _really_ old driver I doubt anyone will be wanting to invest time here, so remove the tagged queue vestiges and make our lives easier. [jpg: Use scsi_cmd_to_rq()] Link: https://lore.kernel.org/r/1631696835-136198-3-git-send-email-john.garry@huawei.com Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-22scsi: fas216: Kill scmd->tagHannes Reinecke
The driver is attempting to allocate a tag internally which is a no-go with blk-mq. Switch the driver to use the request tag and kill usage of scmd->tag and scmd->device->current_tag. [jpg: Change to use scsi_cmd_to_rq()] Link: https://lore.kernel.org/r/1631696835-136198-2-git-send-email-john.garry@huawei.com Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-21scsi: qla2xxx: Restore initiator in dual modeDmitry Bogdanov
In dual mode in case of disabling the target, the whole port goes offline and initiator is turned off too. Fix restoring initiator mode after disabling target in dual mode. Link: https://lore.kernel.org/r/20210915153239.8035-1-d.bogdanov@yadro.com Fixes: 0645cb8350cd ("scsi: qla2xxx: Add mode control for each physical port") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-21scsi: ufs: core: Unbreak the reset handlerBart Van Assche
A command tag is passed as the second argument of the __ufshcd_transfer_req_compl() call in ufshcd_eh_device_reset_handler() instead of a bitmask. Fix this by passing a bitmask as argument instead of a command tag. Link: https://lore.kernel.org/r/20210916175408.2260084-1-bvanassche@acm.org Fixes: a45f937110fa ("scsi: ufs: Optimize host lock on transfer requests send/compl paths") Cc: Can Guo <cang@codeaurora.org> Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-21scsi: sd_zbc: Support disks with more than 2**32 logical blocksBart Van Assche
This patch addresses the following Coverity report about the zno * sdkp->zone_blocks expression: CID 1475514 (#1 of 1): Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN) overflow_before_widen: Potentially overflowing expression zno * sdkp->zone_blocks with type unsigned int (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type sector_t (64 bits, unsigned). Link: https://lore.kernel.org/r/20210917212314.2362324-1-bvanassche@acm.org Fixes: 5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands") Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Damien Le Moal <Damien.LeMoal@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-21scsi: ufs: core: Revert "scsi: ufs: Synchronize SCSI and UFS error handling"Adrian Hunter
This reverts commit a113eaaf86373362b053279049907ff82b5df6c8. There are a couple of issues with the commit: 1. It causes deadlocks. 2. It causes the shost->eh_cmd_q list of failed requests not to be processed, ever. So revert it. 1. Deadlocks The SCSI error handler runs with requests blocked beginning when scsi_schedule_eh() sets SHOST_RECOVERY state, continuing through scsi_error_handler() callback ->eh_strategy_handler() until scsi_restart_operations() is called. By setting eh_strategy_handler to ufshcd_err_handler, the patch changed the UFS error handler to run with requests blocked, including PM requests, for the entire run of the error handler. That conflicts with UFS error handler existing synchronization with UFS device PM operations. The UFS error handler synchronizes with runtime PM by doing pm_runtime_get_sync() prior to blocking requests itself. It synchronizes with system PM by use of hba->host_sem, again before blocking requests itself. However, if requests are already blocked, then PM operations will block. So: the UFS error handler blocks waiting on PM + PM blocks waiting on SCSI PM requests to process or fail + PM requests are blocked waiting on error handling to finish = deadlock This happens both for runtime PM and system PM. Prior to the patch, these deadlocks could not happen even if SCSI error handling was running, because the presence of requests in shost->eh_cmd_q would mean the queues could not be suspended, which would mean that, should the UFS error handler run at the same time, it would not need to wait for PM or vice versa. Please note these scenarios are not just theoretical, they were found during testing on a Samsung Galaxy Book S. 2. ->eh_strategy_handler() must process shost->eh_cmd_q list of failed requests, as all other eh_strategy_handler's do except UFS error handler. Refer for example: scsi_unjam_host(), ata_scsi_error() and sas_scsi_recover_host(). Link: https://lore.kernel.org/r/20210917144349.14058-1-adrian.hunter@intel.com Fixes: a113eaaf8637 ("scsi: ufs: Synchronize SCSI and UFS error handling") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-21Merge branch 's390-qeth-fixes-2021-09-21'Jakub Kicinski
Julian Wiedmann says: ==================== s390/qeth: fixes 2021-09-21 This brings two fixes for deadlocks when a device is removed while it has certain types of async work pending. And one additional fix for a missing NULL check in an error case. ==================== Link: https://lore.kernel.org/r/20210921145217.1584654-1-jwi@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>