AgeCommit message (Collapse)Author
2023-08-02KVM: selftests: Convert steal_time test to printf style GUEST_ASSERTSean Christopherson
Convert the steal_time test to use printf-based GUEST_ASSERT. Opportunistically use GUEST_ASSERT_EQ() and GUEST_ASSERT_NE() so that the test spits out debug information on failure. Link: https://lore.kernel.org/r/20230729003643.1053367-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Convert set_memory_region_test to printf-based GUEST_ASSERTSean Christopherson
Convert set_memory_region_test to printf-based GUEST_ASSERT, using a combo of newfangled macros to report (hopefully) useful information. Link: https://lore.kernel.org/r/20230729003643.1053367-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Convert s390's tprot test to printf style GUEST_ASSERTSean Christopherson
Convert s390's tprot test to printf-based GUEST_ASSERT. Link: https://lore.kernel.org/r/20230729003643.1053367-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Convert s390's memop test to printf style GUEST_ASSERTSean Christopherson
Convert s390's memop test to printf-based GUEST_ASSERT, and opportunistically use GUEST_FAIL() to report invalid sizes. Link: https://lore.kernel.org/r/20230729003643.1053367-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Convert the memslot performance test to printf guest assertsSean Christopherson
Use the printf-based GUEST_ASSERT_EQ() in the memslot perf test instead of a half-baked open-coded version. Link: https://lore.kernel.org/r/20230729003643.1053367-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Convert ARM's vGIC IRQ test to printf style GUEST_ASSERTSean Christopherson
Use printf-based guest assert reporting in ARM's vGIC IRQ test. Note, this is not as innocuous as it looks! The printf-based version of GUEST_ASSERT_EQ() ensures the expressions are evaluated only once, whereas the old version did not! Link: https://lore.kernel.org/r/20230729003643.1053367-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
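The single-evaluation point in the commit above can be illustrated with a small, self-contained sketch. The macros below are hypothetical stand-ins, not the selftests' actual GUEST_ASSERT_EQ() definitions, and `read_reg()` is an invented side-effecting read (think of an acknowledge-on-read register). The evaluate-once variant relies on the GNU C `__typeof__` extension.

```c
#include <stdio.h>

static int reads;   /* counts how often the fake register is read */

/* Hypothetical side-effecting read; each call changes state. */
static int read_reg(void)
{
    return ++reads;
}

/* Naive macro: arguments are expanded again when reporting a failure. */
#define NAIVE_ASSERT_EQ(a, b)                                   \
    do {                                                        \
        if ((a) != (b))                                         \
            printf("naive: %d != %d\n", (int)(a), (int)(b));    \
    } while (0)

/* Evaluate-once macro: capture each argument in a local first. */
#define ONCE_ASSERT_EQ(a, b)                                    \
    do {                                                        \
        __typeof__(a) _a = (a);                                 \
        __typeof__(b) _b = (b);                                 \
        if (_a != _b)                                           \
            printf("once: %d != %d\n", (int)_a, (int)_b);       \
    } while (0)

/* Number of read_reg() calls made by one failing naive assert. */
static int naive_reads(void)
{
    reads = 0;
    NAIVE_ASSERT_EQ(read_reg(), 0);
    return reads;
}

/* Number of read_reg() calls made by one failing evaluate-once assert. */
static int once_reads(void)
{
    reads = 0;
    ONCE_ASSERT_EQ(read_reg(), 0);
    return reads;
}
```

With the naive macro the failing path reads the register twice, so the reported value can differ from the value that was compared; the evaluate-once form reads it exactly once.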
2023-08-02KVM: selftests: Convert ARM's page fault test to printf style GUEST_ASSERTSean Christopherson
Use GUEST_FAIL() in ARM's page fault test to report unexpected faults. Link: https://lore.kernel.org/r/20230729003643.1053367-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Convert ARM's hypercalls test to printf style GUEST_ASSERTSean Christopherson
Convert ARM's hypercalls test to use printf-based GUEST_ASSERT(). Opportunistically use GUEST_FAIL() to complain about an unexpected stage. Link: https://lore.kernel.org/r/20230729003643.1053367-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Convert debug-exceptions to printf style GUEST_ASSERTSean Christopherson
Convert ARM's debug exceptions test to use printf-based GUEST_ASSERT(). Opportunistically use GUEST_ASSERT_EQ() in guest_code_ss() so that the expected vs. actual values get printed out. Link: https://lore.kernel.org/r/20230729003643.1053367-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Convert aarch_timer to printf style GUEST_ASSERTSean Christopherson
Convert ARM's aarch_timer test to use printf-based GUEST_ASSERT(). To maintain existing functionality, manually print the host information, e.g. stage and iteration, to stderr prior to reporting the guest assert. Link: https://lore.kernel.org/r/20230729003643.1053367-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Add a selftest for guest prints and formatted assertsAaron Lewis
Add a test to exercise the various features in KVM selftest's local snprintf() and compare them to LIBC's snprintf() to ensure they behave the same. This is not an exhaustive test. KVM's local snprintf() does not implement all the features LIBC does, e.g. KVM's local snprintf() does not support floats or doubles, so testing for those features was excluded. Testing was added for the features that are expected to work to support a minimal version of printf() in the guest. Signed-off-by: Aaron Lewis <aaronlewis@google.com> [sean: use UCALL_EXIT_REASON, enable for all architectures] Link: https://lore.kernel.org/r/20230731203026.1192091-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Add #define of expected KVM exit reason for ucallSean Christopherson
Define the expected architecture specific exit reason for a successful ucall so that common tests can assert that a ucall occurred without the test needing to implement arch specific code. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20230731203026.1192091-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Add arch ucall.h and inline simple arch hooksSean Christopherson
Add an architecture specific ucall.h and inline the simple arch hooks, e.g. the init hook for everything except ARM, and the actual "do ucall" hook for everything except x86 (which should be simple, but temporarily isn't due to carrying a workaround). Having a per-arch ucall header will allow adding a #define for the expected KVM exit reason for a ucall that is colocated (for everything except x86) with the ucall itself. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20230731203026.1192091-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Add formatted guest assert support in ucall frameworkSean Christopherson
Add printf-based GUEST_ASSERT macros and accompanying host-side support to provide assert-specific versions of GUEST_PRINTF(). To make it easier to parse assert messages, for humans and bots alike, preserve/use the same layout as host asserts, e.g. in the example below, the reported expression, file, line number, and message are from the guest assertion, not the host reporting of the assertion. The call stack still captures the host reporting, but capturing the guest stack is a less pressing concern, i.e. can be done in the future, and an optimal solution would capture *both* the host and guest stacks, i.e. capturing the host stack isn't an outright bug. Running soft int test ==== Test Assertion Failure ==== x86_64/svm_nested_soft_inject_test.c:39: regs->rip != (unsigned long)l2_guest_code_int pid=214104 tid=214104 errno=4 - Interrupted system call 1 0x0000000000401b35: run_test at svm_nested_soft_inject_test.c:191 2 0x00000000004017d2: main at svm_nested_soft_inject_test.c:212 3 0x0000000000415b03: __libc_start_call_main at libc-start.o:? 4 0x000000000041714f: __libc_start_main_impl at ??:? 5 0x0000000000401660: _start at ??:? Expected IRQ at RIP 0x401e50, received IRQ at 0x401e50 Don't bother sharing code between ucall_assert() and ucall_fmt(), as forwarding the variable arguments would either require using macros or building a va_list, i.e. would make the code less readable and/or require just as much copy+paste code anyways. Gate the new macros with a flag so that tests can more or less be switched over one-by-one. The slow conversion won't be perfect, e.g. library code won't pick up the flag, but the only asserts in library code are of the vanilla GUEST_ASSERT() variety, i.e. don't print out variables. Add a temporary alias to GUEST_ASSERT_1() to fudge around ARM's arch_timer.h header using GUEST_ASSERT_1(), thus thwarting any attempt to convert tests one-by-one.
Link: https://lore.kernel.org/r/20230729003643.1053367-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Add string formatting options to ucallAaron Lewis
Add more flexibility to guest debugging and testing by adding GUEST_PRINTF() and GUEST_ASSERT_FMT() to the ucall framework. Add a sized buffer to the ucall structure to hold the formatted string, i.e. to allow the guest to easily resolve the string, and thus avoid the ugly pattern of the host side having to make assumptions about the desired format, as well as having to pass around a large number of parameters. The buffer size was chosen to accommodate most use cases, and based on similar usage. E.g. printf() uses the same size buffer in arch/x86/boot/printf.c. And 1KiB ought to be enough for anybody. Signed-off-by: Aaron Lewis <aaronlewis@google.com> [sean: massage changelog, wrap macro param in ()] Link: https://lore.kernel.org/r/20230729003643.1053367-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Add additional pages to the guest to accommodate ucallAaron Lewis
Add additional pages to the guest to account for the number of pages the ucall headers need. The only reason things worked before is the ucall headers are fairly small. If they were ever to increase in size the guest could run out of memory. This is done in preparation for adding string formatting options to the guest through the ucall framework which increases the size of the ucall headers. Fixes: 426729b2cf2e ("KVM: selftests: Add ucall pool based implementation") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20230729003643.1053367-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Add guest_snprintf() to KVM selftestsAaron Lewis
Add a local version of guest_snprintf() for use in the guest. Having a local copy allows the guest access to string formatting options without dependencies on LIBC. LIBC is problematic because it heavily relies on both AVX-512 instructions and a TLS, neither of which are guaranteed to be set up in the guest. The file guest_sprintf.c was lifted from arch/x86/boot/printf.c and adapted to work in the guest, including the addition of buffer length. I.e. s/sprintf/snprintf/ The functions were prefixed with "guest_" to allow guests to explicitly call them. A string formatted by this function is expected to succeed or die. If something goes wrong during the formatting process a GUEST_ASSERT() will be thrown. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/all/mtdi6smhur5rqffvpu7qux7mptonw223y2653x2nwzvgm72nlo@zyc4w3kwl3rg [sean: add a link to the discussion of other options] Link: https://lore.kernel.org/r/20230729003643.1053367-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Add strnlen() to the string overridesAaron Lewis
Add strnlen() to the string overrides to allow it to be called in the guest. The implementation for strnlen() was taken from the kernel's generic version, lib/string.c. This will be needed when printf() is introduced. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20230729003643.1053367-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Add a shameful hack to preserve/clobber GPRs across ucallSean Christopherson
Preserve or clobber all GPRs (except RIP and RSP, as they're saved and restored via the VMCS) when performing a ucall on x86 to fudge around a horrific long-standing bug in selftests' nested VMX support where L2's GPRs are not preserved across a nested VM-Exit. I.e. if a test triggers a nested VM-Exit to L1 in response to a ucall, e.g. GUEST_SYNC(), then L2's GPR state can be corrupted. The issue manifests as an unexpected #GP in clear_bit() when running the hyperv_evmcs test due to RBX being used to track the ucall object, and RBX being clobbered by the nested VM-Exit. The problematic hyperv_evmcs testcase is where L0 (test's host userspace) injects an NMI in response to GUEST_SYNC(8) from L2, but the bug could "randomly" manifest in any test that induces a nested VM-Exit from L0. The bug hasn't caused failures in the past due to sheer dumb luck. The obvious fix is to rework the nVMX helpers to save/restore L2 GPRs across VM-Exit and VM-Enter, but that is a much bigger task and carries its own risks, e.g. nSVM does save/restore GPRs, but not in a thread-safe manner, and there is a _lot_ of cleanup that can be done to unify code for doing VM-Enter on nVMX, nSVM, and eVMCS. Link: https://lore.kernel.org/r/20230729003643.1053367-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Make TEST_ASSERT_EQ() output look like normal TEST_ASSERT()Sean Christopherson
Clean up TEST_ASSERT_EQ() so that the (mostly) raw code is captured in the main assert message, not the helper macro's code. E.g. make this: x86_64/tsc_msrs_test.c:106: __a == __b pid=40470 tid=40470 errno=0 - Success 1 0x000000000040170e: main at tsc_msrs_test.c:106 2 0x0000000000416f23: __libc_start_call_main at libc-start.o:? 3 0x000000000041856f: __libc_start_main_impl at ??:? 4 0x0000000000401ef0: _start at ??:? TEST_ASSERT_EQ(rounded_host_rdmsr(MSR_IA32_TSC), val + 1) failed. rounded_host_rdmsr(MSR_IA32_TSC) is 0 val + 1 is 0x1 look like this: x86_64/tsc_msrs_test.c:106: rounded_host_rdmsr(MSR_IA32_TSC) == val + 1 pid=5737 tid=5737 errno=0 - Success 1 0x0000000000401714: main at tsc_msrs_test.c:106 2 0x0000000000415c23: __libc_start_call_main at libc-start.o:? 3 0x000000000041726f: __libc_start_main_impl at ??:? 4 0x0000000000401e60: _start at ??:? 0 != 0x1 (rounded_host_rdmsr(MSR_IA32_TSC) != val + 1) Opportunistically clean up the formatting of the entire macro. Link: https://lore.kernel.org/r/20230729003643.1053367-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Rename the ASSERT_EQ macroThomas Huth
There is already an ASSERT_EQ macro in the file tools/testing/selftests/kselftest_harness.h, so currently KVM selftests can't include test_util.h from the KVM selftests together with that file. Rename the macro in the KVM selftests to TEST_ASSERT_EQ to avoid the problem - it is also more similar to the other macros in test_util.h that way. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20230712075910.22480-2-thuth@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Remove superfluous variable assignmentMinjie Du
Don't nullify "nodep" to NULL one line before it's set to "tmp". Signed-off-by: Minjie Du <duminjie@vivo.com> Link: https://lore.kernel.org/r/20230704122148.11573-1-duminjie@vivo.com [sean: massage shortlog+changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02Documentation: kdump: Add va_kernel_pa_offset for RISCV64Song Shuai
RISC-V Linux exports "va_kernel_pa_offset" in vmcoreinfo to help Crash-utility translate the kernel virtual address correctly. Add the definition of "va_kernel_pa_offset". Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping") Link: https://lore.kernel.org/linux-riscv/20230724040649.220279-1-suagrfillet@gmail.com/ Signed-off-by: Song Shuai <suagrfillet@gmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230724100917.309061-2-suagrfillet@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-02riscv: Export va_kernel_pa_offset in vmcoreinfoSong Shuai
Since RISC-V Linux v6.4, commit 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping") changes phys_ram_base from the physical start of the kernel to the actual start of the DRAM. The Crash-utility's VTOP() still uses phys_ram_base and kernel_map.virt_addr to translate kernel virtual addresses, which causes Crash to fail with Linux v6.4 [1]. Export kernel_map.va_kernel_pa_offset in vmcoreinfo to help Crash translate the kernel virtual address correctly. Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping") Link: https://lore.kernel.org/linux-riscv/20230724040649.220279-1-suagrfillet@gmail.com/ [1] Signed-off-by: Song Shuai <suagrfillet@gmail.com> Reviewed-by: Xianting Tian <xianting.tian@linux.alibaba.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230724100917.309061-1-suagrfillet@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-02RISC-V: ACPI: Fix acpi_os_ioremap to return iomem addressSunil V L
acpi_os_ioremap() currently is a wrapper to memremap() on RISC-V. But the callers of acpi_os_ioremap() expect it to return __iomem address and hence sparse tool reports a new warning. Fix this issue by type casting to __iomem type. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202307230357.egcTAefj-lkp@intel.com/ Fixes: a91a9ffbd3a5 ("RISC-V: Add support to build the ACPI core") Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230724100346.1302937-1-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-02selftests: riscv: Fix compilation error with vstate_exec_nolibc.cAlexandre Ghiti
The following error happens: In file included from vstate_exec_nolibc.c:2: /usr/include/riscv64-linux-gnu/sys/prctl.h:42:12: error: conflicting types for ‘prctl’; have ‘int(int, ...)’ 42 | extern int prctl (int __option, ...) __THROW; | ^~~~~ In file included from ./../../../../include/nolibc/nolibc.h:99, from <command-line>: ./../../../../include/nolibc/sys.h:892:5: note: previous definition of ‘prctl’ with type ‘int(int, long unsigned int, long unsigned int, long unsigned int, long unsigned int)’ 892 | int prctl(int option, unsigned long arg2, unsigned long arg3, | ^~~~~ Fix this by not including <sys/prctl.h>, which is not needed here since prctl syscall is directly called using its number. Fixes: 7cf6198ce22d ("selftests: Test RISC-V Vector prctl interface") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230713115829.110421-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-02selftests/riscv: fix potential build failure during the "emit_tests" stepJohn Hubbard
The riscv selftests (which were modeled after the arm64 selftests) are improperly declaring the "emit_tests" target to depend upon the "all" target. This approach, when combined with commit 9fc96c7c19df ("selftests: error out if kernel header files are not yet built"), has caused build failures [1] on arm64, and is likely to cause similar failures for riscv. To fix this, simply remove the unnecessary "all" dependency from the emit_tests target. The dependency is still effectively honored, because again, invocation is via "install", which also depends upon "all". An alternative approach would be to harden the emit_tests target so that it can depend upon "all", but that's a lot more complicated and hard to get right, and doesn't seem worth it, especially given that emit_tests should probably not be overridden at all. [1] https://lore.kernel.org/20230710-kselftest-fix-arm64-v1-1-48e872844f25@kernel.org Fixes: 9fc96c7c19df ("selftests: error out if kernel header files are not yet built") Signed-off-by: John Hubbard <jhubbard@nvidia.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230712193514.740033-1-jhubbard@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-02KVM: selftests: use unified time type for comparisonBibo Mao
With test case kvm_page_table_test, start time is acquired with time type CLOCK_MONOTONIC_RAW, however end time in timespec_elapsed() is acquired with time type CLOCK_MONOTONIC. This can cause inaccurate elapsed time calculation due to mixing timebases, e.g. LoongArch in particular will see weirdness. Modify kvm_page_table_test to use unified time type CLOCK_MONOTONIC for start time. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20230731022405.854884-1-maobibo@loongson.cn [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
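The timebase-mixing problem described above can be avoided by making the clock id an explicit parameter, so the start and end samples cannot come from different clocks. This is a minimal sketch under that assumption; `timespec_to_ns()` and `elapsed_ns()` are hypothetical helpers, not the selftests' timespec_elapsed().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t timespec_to_ns(struct timespec ts)
{
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Elapsed nanoseconds since 'start', sampled from the clock 'clk'.
 * The caller must have taken 'start' from the same clock; subtracting a
 * CLOCK_MONOTONIC_RAW sample from a CLOCK_MONOTONIC one is meaningless
 * because the two clocks may run at different rates and offsets.
 */
static int64_t elapsed_ns(clockid_t clk, struct timespec start)
{
    struct timespec now;

    clock_gettime(clk, &now);
    return timespec_to_ns(now) - timespec_to_ns(start);
}
```

A caller would sample `clock_gettime(CLOCK_MONOTONIC, &start)` and later call `elapsed_ns(CLOCK_MONOTONIC, start)`, keeping both readings on one timebase.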
2023-08-02KVM: selftests: Extend x86's sync_regs_test to check for exception racesMichal Luczaj
Attempt to set the to-be-queued exception to be both pending and injected _after_ KVM_CAP_SYNC_REGS's kvm_vcpu_ioctl_x86_set_vcpu_events() squashes the pending exception (if there's also an injected exception). Buggy KVM versions will eventually yell loudly about having impossible state when processing queued exceptions, e.g. WARNING: CPU: 0 PID: 1115 at arch/x86/kvm/x86.c:10095 kvm_check_and_inject_events+0x220/0x500 [kvm] arch/x86/kvm/x86.c:kvm_check_and_inject_events(): WARN_ON_ONCE(vcpu->arch.exception.injected && vcpu->arch.exception.pending); Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230728001606.2275586-3-mhal@rbox.co [sean: split to separate patch, massage changelog and comment] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Extend x86's sync_regs_test to check for event vector racesMichal Luczaj
Attempt to modify the to-be-injected exception vector to an illegal value _after_ the sanity checks performed by KVM_CAP_SYNC_REGS's arch/x86/kvm/x86.c:kvm_vcpu_ioctl_x86_set_vcpu_events(). Buggy KVM versions will eventually yell loudly about attempting to inject a bogus vector, e.g. WARNING: CPU: 0 PID: 1107 at arch/x86/kvm/x86.c:547 kvm_check_and_inject_events+0x4a0/0x500 [kvm] arch/x86/kvm/x86.c:exception_type(): WARN_ON(vector > 31 || vector == NMI_VECTOR) Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230728001606.2275586-3-mhal@rbox.co [sean: split to separate patch] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: selftests: Extend x86's sync_regs_test to check for CR4 racesMichal Luczaj
Attempt to modify vcpu->run->s.regs _after_ the sanity checks performed by KVM_CAP_SYNC_REGS's arch/x86/kvm/x86.c:sync_regs(). This can lead to some nonsensical vCPU states accompanied by kernel splats, e.g. disabling PAE while long mode is enabled makes KVM all kinds of confused: WARNING: CPU: 0 PID: 1142 at arch/x86/kvm/mmu/paging_tmpl.h:358 paging32_walk_addr_generic+0x431/0x8f0 [kvm] arch/x86/kvm/mmu/paging_tmpl.h: KVM_BUG_ON(is_long_mode(vcpu) && !is_pae(vcpu), vcpu->kvm) Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230728001606.2275586-3-mhal@rbox.co [sean: see link] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issuesMichal Luczaj
In a spirit of using a sledgehammer to crack a nut, make sync_regs() feed __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() with kernel's own copy of data. Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they have exclusive rights to structs they operate on. While this is true when coming from an ioctl handler (caller makes a local copy of user's data), sync_regs() breaks this contract; a pointer to a user-modifiable memory (vcpu->run->s.regs) is provided. This can lead to a situation when incoming data is checked and/or sanitized only to be re-set by a user thread running in parallel. Signed-off-by: Michal Luczaj <mhal@rbox.co> Fixes: 01643c51bfcf ("KVM: x86: KVM_CAP_SYNC_REGS") Link: https://lore.kernel.org/r/20230728001606.2275586-2-mhal@rbox.co Signed-off-by: Sean Christopherson <seanjc@google.com>
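The "check, then use" race described above goes away once validation and consumption both operate on a private snapshot. The sketch below illustrates that pattern in plain userspace C; `struct demo_regs`, `apply_regs()`, and `sync_from_shared()` are hypothetical names for illustration and do not mirror KVM's real structures or functions.

```c
#include <stdbool.h>

#define DEMO_MAX_VECTOR 31u

/* Hypothetical register block that another thread may rewrite at any time. */
struct demo_regs {
    unsigned long cr4;
    unsigned int vector;
};

/* Validate and commit; assumes 'r' cannot change underneath it. */
static bool apply_regs(const struct demo_regs *r, struct demo_regs *state)
{
    if (r->vector > DEMO_MAX_VECTOR)
        return false;
    *state = *r;
    return true;
}

/*
 * Copy the shared block once into a local snapshot, then validate and
 * consume only the snapshot. A concurrent writer can still change
 * 'shared', but it can no longer change what was checked.
 */
static bool sync_from_shared(const volatile struct demo_regs *shared,
                             struct demo_regs *state)
{
    struct demo_regs snap;

    snap.cr4 = shared->cr4;
    snap.vector = shared->vector;
    return apply_regs(&snap, state);
}
```

The copy costs a few loads per sync, which is the "sledgehammer" trade-off: every consumer sees values that passed the checks, regardless of what the writer does afterwards.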
2023-08-02Merge tag 'exfat-for-6.5-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fixes from Namjae Jeon: - Fix page allocation failure from allocation bitmap by using kvmalloc_array/kvfree - Add the check to validate if filename entries exceeds max filename length - Fix potential deadlock condition from dir_emit*() * tag 'exfat-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: release s_lock before calling dir_emit() exfat: check if filename entries exceeds max filename length exfat: use kvmalloc_array/kvfree instead of kmalloc_array/kfree
2023-08-02smb: client: fix dfs link mount against w2k8Paulo Alcantara
Customer reported that they couldn't mount their DFS link that was seen by the client as a DFS interlink -- special form of DFS link where its single target may point to a different DFS namespace -- and it turned out that it was just a regular DFS link where its referral header flags missed the StorageServers bit thus making the client think it couldn't tree connect to target directly without requiring further referrals. When the DFS link referral header flags miss the StorageServers bit and its target doesn't respond to any referrals, then tree connect to it. Fixes: a1c0d00572fc ("cifs: share dfs connections and supers") Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three small fixes, all in drivers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: pm80xx: Fix error return code in pm8001_pci_probe() scsi: zfcp: Defer fc_rport blocking until after ADISC response scsi: storvsc: Limit max_sectors for virtual Fibre Channel devices
2023-08-02word-at-a-time: use the same return type for has_zero regardless of endiannessndesaulniers@google.com
Compiling big-endian targets with Clang produces the diagnostic: fs/namei.c:2173:13: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical] } while (!(has_zero(a, &adata, &constants) | has_zero(b, &bdata, &constants))); ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ || fs/namei.c:2173:13: note: cast one or both operands to int to silence this warning It appears that when has_zero was introduced, two definitions were produced with different signatures (in particular different return types). Looking at the usage in hash_name() in fs/namei.c, I suspect that has_zero() is meant to be invoked twice per while loop iteration; using logical-or would not update `bdata` when `a` did not have zeros. So I think it's preferable to always return an unsigned long rather than a bool, instead of updating the while loop in hash_name() to use a logical-or rather than bitwise-or. [ Also changed powerpc version to do the same - Linus ] Link: https://github.com/ClangBuiltLinux/linux/issues/1832 Link: https://lore.kernel.org/lkml/20230801-bitwise-v1-1-799bec468dc4@google.com/ Fixes: 36126f8f2ed8 ("word-at-a-time: make the interfaces truly generic") Debugged-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
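The general difference between bitwise-or and logical-or matters whenever both operands have side effects: `||` short-circuits and skips the right-hand call once the left-hand side is nonzero, while `|` always evaluates both. The sketch below shows that behavior with a hypothetical `has_zero_demo()` built on the classic zero-byte bit trick; it is not the kernel's has_zero() and takes no constants argument.

```c
static int calls;   /* counts has_zero_demo() invocations */

/*
 * Hypothetical stand-in: nonzero if any byte of 'word' is zero.
 * Always stores its intermediate mask in *data, which is the side
 * effect a caller may depend on.
 */
static unsigned long has_zero_demo(unsigned long word, unsigned long *data)
{
    const unsigned long ones = ~0UL / 0xff;    /* 0x01 in every byte */
    const unsigned long highs = ones * 0x80;   /* 0x80 in every byte */

    calls++;
    *data = (word - ones) & ~word & highs;
    return *data;
}

/* Bitwise OR: both calls always run, so both outputs are written. */
static int calls_with_bitwise_or(unsigned long a, unsigned long b)
{
    unsigned long adata, bdata;

    calls = 0;
    (void)(has_zero_demo(a, &adata) | has_zero_demo(b, &bdata));
    return calls;
}

/* Logical OR: the second call is skipped when the first is nonzero. */
static int calls_with_logical_or(unsigned long a, unsigned long b)
{
    unsigned long adata, bdata;

    calls = 0;
    (void)(has_zero_demo(a, &adata) || has_zero_demo(b, &bdata));
    return calls;
}
```

Because both operands here are `unsigned long` rather than `bool`, the bitwise form expresses "evaluate both, then combine" without mixing boolean operands.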
2023-08-02powerpc/powermac: Use early_* IO variants in via_calibrate_decr()Benjamin Gray
On a powermac platform, via the call path: start_kernel() time_init() ppc_md.calibrate_decr() (pmac_calibrate_decr) via_calibrate_decr() ioremap() and iounmap() are called. The unmap can enable interrupts unexpectedly (cond_resched() in vunmap_pmd_range()), which causes a warning later in the boot sequence in start_kernel(). Use the early_* variants of these IO functions to prevent this. The issue is pre-existing, but is surfaced by commit 721255b9826b ("genirq: Use a maple tree for interrupt descriptor management"). Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230706010816.72682-1-bgray@linux.ibm.com
2023-08-02wifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1()Hans de Goede
Using brcmfmac with 6.5-rc3 on a brcmfmac43241b4-sdio triggers a backtrace caused by the following field-spanning warning: memcpy: detected field-spanning write (size 120) of single field "&params_le->channel_list[0]" at drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:1072 (size 2) The driver still works after this warning. The warning was introduced by the new field-spanning write checks which were enabled recently. Fix this by replacing the channel_list[1] declaration at the end of the struct with a flexible array declaration. Most users of struct brcmf_scan_params_le calculate the size to alloc using the size of the non flex-array part of the struct + needed extra space, so they do not care about sizeof(struct brcmf_scan_params_le). brcmf_notify_escan_complete() however uses the struct on the stack, expecting there to be room for at least 1 entry in the channel-list to store the special -1 abort channel-id. To make this work use an anonymous union with a padding member added + the actual channel_list flexible array. Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230729140500.27892-1-hdegoede@redhat.com
2023-08-02vxlan: Fix nexthop hash sizeBenjamin Poirier
The nexthop code expects a 31 bit hash, such as what is returned by fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash returned by skb_get_hash() can lead to problems related to the fact that 'int hash' is a negative number when the MSB is set. In the case of hash threshold nexthop groups, nexthop_select_path_hthr() will disproportionately select the first nexthop group entry. In the case of resilient nexthop groups, nexthop_select_path_res() may do an out of bounds access in nh_buckets[], for example: hash = -912054133 num_nh_buckets = 2 bucket_index = 65535 which leads to the following panic: BUG: unable to handle page fault for address: ffffc900025910c8 PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0 Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x1ee/0x5c0 ? __pfx_is_prefetch.constprop.0+0x10/0x10 ? __pfx_page_fault_oops+0x10/0x10 ? search_bpf_extables+0xfe/0x1c0 ? fixup_exception+0x3b/0x470 ? 
exc_page_fault+0xf6/0x110 ? asm_exc_page_fault+0x26/0x30 ? nexthop_select_path+0x197/0xbf0 ? nexthop_select_path+0x197/0xbf0 ? lock_is_held_type+0xe7/0x140 vxlan_xmit+0x5b2/0x2340 ? __lock_acquire+0x92b/0x3370 ? __pfx_vxlan_xmit+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_register_lock_class+0x10/0x10 ? skb_network_protocol+0xce/0x2d0 ? dev_hard_start_xmit+0xca/0x350 ? __pfx_vxlan_xmit+0x10/0x10 dev_hard_start_xmit+0xca/0x350 __dev_queue_xmit+0x513/0x1e20 ? __pfx___dev_queue_xmit+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? mark_held_locks+0x44/0x90 ? skb_push+0x4c/0x80 ? eth_header+0x81/0xe0 ? __pfx_eth_header+0x10/0x10 ? neigh_resolve_output+0x215/0x310 ? ip6_finish_output2+0x2ba/0xc90 ip6_finish_output2+0x2ba/0xc90 ? lock_release+0x236/0x3e0 ? ip6_mtu+0xbb/0x240 ? __pfx_ip6_finish_output2+0x10/0x10 ? find_held_lock+0x83/0xa0 ? lock_is_held_type+0xe7/0x140 ip6_finish_output+0x1ee/0x780 ip6_output+0x138/0x460 ? __pfx_ip6_output+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_ip6_finish_output+0x10/0x10 NF_HOOK.constprop.0+0xc0/0x420 ? __pfx_NF_HOOK.constprop.0+0x10/0x10 ? ndisc_send_skb+0x2c0/0x960 ? __pfx_lock_release+0x10/0x10 ? __local_bh_enable_ip+0x93/0x110 ? lock_is_held_type+0xe7/0x140 ndisc_send_skb+0x4be/0x960 ? __pfx_ndisc_send_skb+0x10/0x10 ? mark_held_locks+0x65/0x90 ? find_held_lock+0x83/0xa0 ndisc_send_ns+0xb0/0x110 ? __pfx_ndisc_send_ns+0x10/0x10 addrconf_dad_work+0x631/0x8e0 ? lock_acquire+0x180/0x3f0 ? __pfx_addrconf_dad_work+0x10/0x10 ? mark_held_locks+0x24/0x90 process_one_work+0x582/0x9c0 ? __pfx_process_one_work+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? mark_held_locks+0x24/0x90 worker_thread+0x93/0x630 ? __kthread_parkme+0xdc/0x100 ? __pfx_worker_thread+0x10/0x10 kthread+0x1a5/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 RIP: 0000:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. 
RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Modules linked in: CR2: ffffc900025910c8 ---[ end trace 0000000000000000 ]--- RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fix this problem by ensuring the MSB of hash is 0 using a right shift - the same approach used in fib_multipath_hash() and rt6_multipath_hash(). Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
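The arithmetic behind the example values above (hash = -912054133, num_nh_buckets = 2, bucket_index = 65535) can be reproduced in plain C. This is a minimal userspace sketch, not the kernel code: bucket_index_model() and hash31_model() are hypothetical names, and it assumes the bucket index is computed as a modulo stored in a 16-bit unsigned value and that the right shift is by one bit.

```c
#include <stdint.h>

/*
 * Hypothetical model of indexing a bucket table with a signed hash.
 * In C the sign of a % result follows the dividend, so a negative
 * hash yields a negative remainder, which wraps around when it is
 * stored in an unsigned 16-bit index.
 */
static uint16_t bucket_index_model(int hash, uint16_t num_nh_buckets)
{
	return (uint16_t)(hash % num_nh_buckets);
}

/*
 * Hypothetical model of the fix: shift the unsigned 32 bit hash right
 * by one so the MSB is 0 and the value is never negative as an int.
 */
static int hash31_model(uint32_t hash32)
{
	return (int)(hash32 >> 1);
}
```

Passing the raw 32 bit value as an int gives an index far outside a two-entry table, while the shifted value always stays below num_nh_buckets.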
2023-08-02ip6mr: Fix skb_under_panic in ip6mr_cache_report()Yue Haibing
skbuff: skb_under_panic: text:ffffffff88771f69 len:56 put:-4 head:ffff88805f86a800 data:ffff887f5f86a850 tail:0x88 end:0x2c0 dev:pim6reg ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:192! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 2 PID: 22968 Comm: kworker/2:11 Not tainted 6.5.0-rc3-00044-g0a8db05b571a #236 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:skb_panic+0x152/0x1d0 Call Trace: <TASK> skb_push+0xc4/0xe0 ip6mr_cache_report+0xd69/0x19b0 reg_vif_xmit+0x406/0x690 dev_hard_start_xmit+0x17e/0x6e0 __dev_queue_xmit+0x2d6a/0x3d20 vlan_dev_hard_start_xmit+0x3ab/0x5c0 dev_hard_start_xmit+0x17e/0x6e0 __dev_queue_xmit+0x2d6a/0x3d20 neigh_connected_output+0x3ed/0x570 ip6_finish_output2+0x5b5/0x1950 ip6_finish_output+0x693/0x11c0 ip6_output+0x24b/0x880 NF_HOOK.constprop.0+0xfd/0x530 ndisc_send_skb+0x9db/0x1400 ndisc_send_rs+0x12a/0x6c0 addrconf_dad_completed+0x3c9/0xea0 addrconf_dad_work+0x849/0x1420 process_one_work+0xa22/0x16e0 worker_thread+0x679/0x10c0 ret_from_fork+0x28/0x60 ret_from_fork_asm+0x11/0x20 When a vlan device is set up on dev pim6reg, a DAD NS packet may be sent via reg_vif_xmit(). reg_vif_xmit() ip6mr_cache_report() skb_push(skb, -skb_network_offset(pkt));//skb_network_offset(pkt) is 4 And skb_push() is declared as: void *skb_push(struct sk_buff *skb, unsigned int len); skb->data -= len; //0xffff88805f86a84c - 0xfffffffc = 0xffff887f5f86a850 Because len is unsigned, -4 is converted to 0xfffffffc, so skb->data is set to 0xffff887f5f86a850, which is an invalid memory address, and skb_push() fails. Fixes: 14fb64e1f449 ("[IPV6] MROUTE: Support PIM-SM (SSM).") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
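The pointer arithmetic quoted in the message above can be checked with a small userspace sketch. push_model() is a hypothetical stand-in that only models the "data -= len" step with an unsigned int length; it is not the kernel's skb_push(), and it assumes a 32-bit unsigned int and treats the pointer as a 64-bit integer.

```c
#include <stdint.h>

/*
 * Hypothetical model of the pointer update "data -= len" where len is
 * declared as unsigned int. A negative argument is converted to a
 * large unsigned value before the subtraction takes place.
 */
static uint64_t push_model(uint64_t data, unsigned int len)
{
	return data - len;
}
```

Calling push_model(0xffff88805f86a84c, -4) subtracts 0xfffffffc rather than adding 4, which reproduces the bogus data address from the panic line; a positive length of 4 would simply move the pointer back by 4 bytes.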
2023-08-02s390/qeth: Don't call dev_close/dev_open (DOWN/UP)Alexandra Winter
dev_close() and dev_open() are issued to change the interface state to DOWN or UP (dev->flags IFF_UP). When the netdev is set DOWN it loses e.g. its IPv6 addresses and routes. We don't want this in cases of device recovery (triggered by hardware or software) or when the qeth device is set offline. Setting a qeth device offline or online and device recovery actions call netif_device_detach() and/or netif_device_attach(). That will reset or set the LOWER_UP indication, i.e. change the dev->state bit __LINK_STATE_PRESENT. That is enough to e.g. cause bond failovers, and still preserves the interface settings that are handled by the network stack. Don't call dev_open() or dev_close() from the qeth device driver. Let the network stack handle this. Fixes: d4560150cb47 ("s390/qeth: call dev_close() during recovery") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-02Merge branch 'tun-tap-uid'David S. Miller
Laszlo Ersek says: ==================== tun/tap: set sk_uid from current_fsuid() The original patches fixing CVE-2023-1076 are incorrect in my opinion. This small series fixes them up; see the individual commit messages for explanation. I have a very elaborate test procedure demonstrating the problem for both tun and tap; it involves libvirt, qemu, and "crash". I can share that procedure if necessary, but it's indeed quite long (I wrote it originally for our QE team). The patches in this series are supposed to "re-fix" CVE-2023-1076; given that said CVE is classified as Low Impact (CVSSv3=5.5), I'm posting this publicly, and not suggesting any embargo. Red Hat Product Security may assign a new CVE number later. I've tested the patches on top of v6.5-rc4, with "crash" built at commit c74f375e0ef7. Cc: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pietro Borrello <borrello@diag.uniroma1.it> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-02net: tap_open(): set sk_uid from current_fsuid()Laszlo Ersek
Commit 66b2c338adce initializes the "sk_uid" field in the protocol socket (struct sock) from the "/dev/tapX" device node's owner UID. Per original commit 86741ec25462 ("net: core: Add a UID field to struct sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the userspace process that creates the socket. Commit 86741ec25462 mentions socket() and accept(); with "tap", the action that creates the socket is open("/dev/tapX"). Therefore the device node's owner UID is irrelevant. In most cases, "/dev/tapX" will be owned by root, so in practice, commit 66b2c338adce has no observable effect: - before, "sk_uid" would be zero, due to undefined behavior (CVE-2023-1076), - after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root. What matters is the (fs)UID of the process performing the open(), so cache that in "sk_uid". Cc: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pietro Borrello <borrello@diag.uniroma1.it> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 66b2c338adce ("tap: tap_open(): correctly initialize socket uid") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435 Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-02net: tun_chr_open(): set sk_uid from current_fsuid()Laszlo Ersek
Commit a096ccca6e50 initializes the "sk_uid" field in the protocol socket (struct sock) from the "/dev/net/tun" device node's owner UID. Per original commit 86741ec25462 ("net: core: Add a UID field to struct sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the userspace process that creates the socket. Commit 86741ec25462 mentions socket() and accept(); with "tun", the action that creates the socket is open("/dev/net/tun"). Therefore the device node's owner UID is irrelevant. In most cases, "/dev/net/tun" will be owned by root, so in practice, commit a096ccca6e50 has no observable effect: - before, "sk_uid" would be zero, due to undefined behavior (CVE-2023-1076), - after, "sk_uid" would be zero, due to "/dev/net/tun" being owned by root. What matters is the (fs)UID of the process performing the open(), so cache that in "sk_uid". Cc: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pietro Borrello <borrello@diag.uniroma1.it> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Fixes: a096ccca6e50 ("tun: tun_chr_open(): correctly initialize socket uid") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435 Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-02drm/panel: samsung-s6d7aa0: Add MODULE_DEVICE_TABLENikita Travkin
The driver can be built as a module, however the lack of the MODULE_DEVICE_TABLE macro prevents it from being automatically probed from the DT in that case. Add the missing macro to make sure the module can load automatically. Fixes: 6810bb390282 ("drm/panel: Add Samsung S6D7AA0 panel controller driver") Signed-off-by: Nikita Travkin <nikita@trvn.ru> Acked-by: Artur Weber <aweber.kernel@gmail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230802-gt5-panel-dtable-v1-1-c0a765c175e2@trvn.ru
2023-08-02ata,scsi: do not issue START STOP UNIT on resumeDamien Le Moal
During system resume, ata_port_pm_resume() triggers ata EH to 1) Resume the controller 2) Reset and rescan the ports 3) Revalidate devices This EH execution is started asynchronously from ata_port_pm_resume(), which means that when sd_resume() is executed, none or only part of the above processing may have been executed. However, sd_resume() issues a START STOP UNIT to wake up the drive from sleep mode. This command is translated to ATA with ata_scsi_start_stop_xlat() and issued to the device. However, depending on the state of execution of the EH process and revalidation triggered by ata_port_pm_resume(), two things may happen: 1) The START STOP UNIT fails if it is received before the controller has been reenabled at the beginning of the EH execution. This is visible with error messages like: ata10.00: device reported invalid CHS sector 0 sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 sd 9:0:0:0: PM: failed to resume async: error -5 2) The START STOP UNIT command is received while the EH process is on-going, which means that it is stopped and must wait for its completion, at which point the command is rather useless as the drive is already fully spun up. This case also results in a significant delay in sd_resume() which is observable by users as the entire system resume completion is delayed. Given that ATA devices will be woken up by libata activity on resume, sd_resume() has no need to issue a START STOP UNIT command, which solves the above mentioned problems. Do not issue this command by introducing the new scsi_device flag no_start_on_resume and setting this flag to 1 in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP UNIT command only if this flag is not set. 
Reported-by: Paul Ausbeck <paula@soe.ucsc.edu> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880 Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Tanner Watkins <dalzot@gmail.com> Tested-by: Paul Ausbeck <paula@soe.ucsc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
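The flag-based behaviour described in the message above can be sketched as a tiny userspace model. Only the flag name no_start_on_resume comes from the commit message; struct scsi_device_model and both helper functions are hypothetical simplifications and do not reflect the real scsi_device layout or the actual sd_resume() code.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for struct scsi_device. */
struct scsi_device_model {
	unsigned int no_start_on_resume:1;
};

/*
 * Models what the message says ata_scsi_dev_config() does for ATA
 * devices: set the flag to 1.
 */
static void ata_dev_config_model(struct scsi_device_model *sdev)
{
	sdev->no_start_on_resume = 1;
}

/*
 * Models the resume-time decision: START STOP UNIT is issued only
 * if the flag is not set.
 */
static bool resume_issues_start_stop_model(const struct scsi_device_model *sdev)
{
	return !sdev->no_start_on_resume;
}
```

In this model a device that never went through the ATA configuration step still gets START STOP UNIT on resume, while a device configured by libata does not.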
2023-08-02Merge branch 'omap-for-v6.5/ti-sysc' into omap-for-v6.5/fixesTony Lindgren
Merge in a missed change into fixes. Signed-off-by: Tony Lindgren <tony@atomide.com>
2023-08-02Merge tag 'gvt-fixes-2023-08-02' of https://github.com/intel/gvt-linux into drm-intel-fixesTvrtko Ursulin
gvt-fixes-2023-08-02 - Fix bug to get AUX CH register message length (Yan) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> From: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZMnvf46JqgeIuTir@debian-scheme
2023-08-02libceph: fix potential hang in ceph_osdc_notify()Ilya Dryomov
If the cluster becomes unavailable, ceph_osdc_notify() may hang even with osd_request_timeout option set because linger_notify_finish_wait() waits for MWatchNotify NOTIFY_COMPLETE message with no associated OSD request in flight -- it's completely asynchronous. Introduce an additional timeout, derived from the specified notify timeout. While at it, switch both waits to killable which is more correct. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Xiubo Li <xiubli@redhat.com>
2023-08-02rbd: prevent busy loop when requesting exclusive lockIlya Dryomov
Due to rbd_try_acquire_lock() effectively swallowing all but EBLOCKLISTED error from rbd_try_lock() ("request lock anyway") and rbd_request_lock() returning ETIMEDOUT error not only for an actual notify timeout but also when the lock owner doesn't respond, a busy loop inside of rbd_acquire_lock() between rbd_try_acquire_lock() and rbd_request_lock() is possible. Requesting the lock on EBUSY error (returned by get_lock_owner_info() if an incompatible lock or invalid lock owner is detected) makes very little sense. The same goes for ETIMEDOUT error (might pop up pretty much anywhere if osd_request_timeout option is set) and many others. Just fail I/O requests on rbd_dev->acquiring_list immediately on any error from rbd_try_lock(). Cc: stable@vger.kernel.org # 588159009d5b: rbd: retrieve and check lock owner twice before blocklisting Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>