summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-01-13firmware: coreboot: Check size of table entry and use flex-arrayKees Cook
The memcpy() of the data following a coreboot_table_entry couldn't be evaluated by the compiler under CONFIG_FORTIFY_SOURCE. To make it easier to reason about, add an explicit flexible array member to struct coreboot_device so the entire entry can be copied at once. Additionally, validate the sizes before copying. Avoids this run-time false positive warning: memcpy: detected field-spanning write (size 168) of single field "&device->entry" at drivers/firmware/google/coreboot_table.c:103 (size 8) Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Link: https://lore.kernel.org/all/03ae2704-8c30-f9f0-215b-7cdf4ad35a9a@molgen.mpg.de/ Cc: Jack Rosenthal <jrosenth@chromium.org> Cc: Guenter Roeck <groeck@chromium.org> Cc: Julius Werner <jwerner@chromium.org> Cc: Brian Norris <briannorris@chromium.org> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Julius Werner <jwerner@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Link: https://lore.kernel.org/r/20230107031406.gonna.761-kees@kernel.org Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Jack Rosenthal <jrosenth@chromium.org> Link: https://lore.kernel.org/r/20230112230312.give.446-kees@kernel.org
2023-01-13kallsyms: Fix scheduling with interrupts disabled in self-testNicholas Piggin
kallsyms_on_each* may schedule so must not be called with interrupts disabled. The iteration function could disable interrupts, but this also changes lookup_symbol() to match the change to the other timing code. Reported-by: Erhard F. <erhard_f@mailbox.org> Link: https://lore.kernel.org/all/bug-216902-206035@https.bugzilla.kernel.org%2F/ Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.sang@intel.com Fixes: 30f3bb09778d ("kallsyms: Add self-test facility") Tested-by: "Erhard F." <erhard_f@mailbox.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-01-14ata: pata_cs5535: Don't build on UMLPeter Foley
This driver uses MSR functions that aren't implemented under UML. Avoid building it to prevent tripping up allyesconfig. e.g. /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3a3): undefined reference to `__tracepoint_read_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3d2): undefined reference to `__tracepoint_write_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x457): undefined reference to `__tracepoint_write_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x481): undefined reference to `do_trace_write_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4d5): undefined reference to `do_trace_write_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4f5): undefined reference to `do_trace_read_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x51c): undefined reference to `do_trace_write_msr' Signed-off-by: Peter Foley <pefoley2@pefoley.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2023-01-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix the PMCR_EL0 reset value after the PMU rework - Correctly handle S2 fault triggered by a S1 page table walk by not always classifying it as a write, as this breaks on R/O memslots - Document why we cannot exit with KVM_EXIT_MMIO when taking a write fault from a S1 PTW on a R/O memslot - Put the Apple M2 on the naughty list for not being able to correctly implement the vgic SEIS feature, just like the M1 before it - Reviewer updates: Alex is stepping down, replaced by Zenghui x86: - Fix various rare locking issues in Xen emulation and teach lockdep to detect them - Documentation improvements - Do not return host topology information from KVM_GET_SUPPORTED_CPUID" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest() KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking Documentation: kvm: fix SRCU locking order docs KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID KVM: nSVM: clarify recalc_intercepts() wrt CR8 MAINTAINERS: Remove myself as a KVM/arm64 reviewer MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_* KVM: arm64: Document the behaviour of S1PTW faults on RO memslots KVM: arm64: Fix S1PTW handling on RO memslots KVM: arm64: PMU: Fix PMCR_EL0 reset value
2023-01-13lockref: stop doing cpu_relax in the cmpxchg loopMateusz Guzik
On the x86-64 architecture even a failing cmpxchg grants exclusive access to the cacheline, making it preferable to retry the failed op immediately instead of stalling with the pause instruction. To illustrate the impact, below are benchmark results obtained by running various will-it-scale tests on top of the 6.2-rc3 kernel and Cascade Lake (2 sockets * 24 cores * 2 threads) CPU. All results in ops/s. Note there is some variance in re-runs, but the code is consistently faster when contention is present. open3 ("Same file open/close"): proc stock no-pause 1 805603 814942 (+%1) 2 1054980 1054781 (-0%) 8 1544802 1822858 (+18%) 24 1191064 2199665 (+84%) 48 851582 1469860 (+72%) 96 609481 1427170 (+134%) fstat2 ("Same file fstat"): proc stock no-pause 1 3013872 3047636 (+1%) 2 4284687 4400421 (+2%) 8 3257721 5530156 (+69%) 24 2239819 5466127 (+144%) 48 1701072 5256609 (+209%) 96 1269157 6649326 (+423%) Additionally, a kernel with a private patch to help access() scalability: access2 ("Same file access"): proc stock patched patched +nopause 24 2378041 2005501 5370335 (-15% / +125%) That is, fixing the problems in access itself *reduces* scalability after the cacheline ping-pong only happens in lockref with the pause instruction. Note that fstat and access benchmarks are not currently integrated into will-it-scale, but interested parties can find them in pull requests to said project. Code at hand has a rather tortured history. First modification showed up in commit d472d9d98b46 ("lockref: Relax in cmpxchg loop"), written with Itanium in mind. Later it got patched up to use an arch-dependent macro to stop doing it on s390 where it caused a significant regression. Said macro had undergone revisions and was ultimately eliminated later, going back to cpu_relax. While I intended to only remove cpu_relax for x86-64, I got the following comment from Linus: I would actually prefer just removing it entirely and see if somebody else hollers. You have the numbers to prove it hurts on real hardware, and I don't think we have any numbers to the contrary. So I think it's better to trust the numbers and remove it as a failure, than say "let's just remove it on x86-64 and leave everybody else with the potentially broken code" Additionally, Will Deacon (maintainer of the arm64 port, one of the architectures previously benchmarked): So, from the arm64 side of the fence, I'm perfectly happy just removing the cpu_relax() calls from lockref. As such, come back full circle in history and whack it altogether. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/all/CAGudoHHx0Nqg6DE70zAVA75eV-HXfWyhVMWZ-aSeOofkA_=WdA@mail.gmail.com/ Acked-by: Tony Luck <tony.luck@intel.com> # ia64 Acked-by: Nicholas Piggin <npiggin@gmail.com> # powerpc Acked-by: Will Deacon <will@kernel.org> # arm64 Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-13phy: freescale: imx8m-pcie: Add one missing error returnRichard Zhu
There should be one error return when fail to fetch the perst reset. Add the missing error return. Fixes: dce9edff16ee ("phy: freescale: imx8m-pcie: Add i.MX8MP PCIe PHY support") Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> Reviewed-by: Marek Vasut <marex@denx.de> Link: https://lore.kernel.org/r/1671433941-2037-1-git-send-email-hongxing.zhu@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-01-13x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM spaceBjorn Helgaas
Normally we reject ECAM space unless it is reported as reserved in the E820 table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes E820 entries that correspond to EfiMemoryMappedIO regions because some other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the E820 entries prevent Linux from allocating BAR space for hot-added devices. Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is normally converted to an E820 entry by a bootloader or EFI stub. After 07eab0901ede, that E820 entry is removed, so we reject this ECAM space, which makes PCI extended config space (offsets 0x100-0xfff) inaccessible. The lack of extended config space breaks anything that relies on it, including perf, VSEC telemetry, EDAC, QAT, SR-IOV, etc. Allow use of ECAM for extended config space when the region is covered by an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02 _CRS. Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@linux.intel.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216891 Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map") Link: https://lore.kernel.org/r/20230110180243.1590045-3-helgaas@kernel.org Reported-by: Kan Liang <kan.liang@linux.intel.com> Reported-by: Tony Luck <tony.luck@intel.com> Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reported-by: Yunying Sun <yunying.sun@intel.com> Reported-by: Baowen Zheng <baowen.zheng@corigine.com> Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reported-by: Yang Lixiao <lixiao.yang@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
2023-01-13phy: Revert "phy: qualcomm: usb28nm: Add MDM9607 init sequence"Marijn Suijten
This reverts commit 557a28811c7e0286d3816842032db5eb7bb5f156. This commit introduced an init sequence from downstream DT [1] in the driver. As mentioned by the comment above the HSPHY_INIT_CFG macro for this sequence: /* * The macro is used to define an initialization sequence. Each tuple * is meant to program 'value' into phy register at 'offset' with 'delay' * in us followed. */ Instead of corresponding to offsets into the phy register, the sequence read by the downstream driver [2] is passed into ulpi_write [3] which crafts the address-value pair into a new value and writes it into the same register at USB_ULPI_VIEWPORT [4]. In other words, this init sequence is programmed into the hardware in a totally different way than downstream and is unlikely to achieve the desired result, if the hsphy is working at all. An alternative method needs to be found to write these init values at the desired location. Fortunately mdm9607 did not land upstream yet [5] and should have its compatible revised to use the generic one, instead of a compatible that writes wrong data to the wrong registers. [1]: https://android.googlesource.com/kernel/msm/+/android-7.1.0_r0.2/arch/arm/boot/dts/qcom/mdm9607.dtsi#585 [2]: https://android.googlesource.com/kernel/msm/+/android-7.1.0_r0.2/drivers/usb/phy/phy-msm-usb.c#4183 [3]: https://android.googlesource.com/kernel/msm/+/android-7.1.0_r0.2/drivers/usb/phy/phy-msm-usb.c#468 [4]: https://android.googlesource.com/kernel/msm/+/android-7.1.0_r0.2/drivers/usb/phy/phy-msm-usb.c#418 [5]: https://lore.kernel.org/linux-arm-msm/20210805222812.40731-1-konrad.dybcio@somainline.org/ Reported-by: Michael Srba <Michael.Srba@seznam.cz> Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org> Reviewed-by: Stephan Gerhold <stephan@gerhold.net> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Link: https://lore.kernel.org/r/20221214223733.648167-1-marijn.suijten@somainline.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-01-13phy: rockchip-inno-usb2: Fix missing clk_disable_unprepare() in ↵Shang XiaoJing
rockchip_usb2phy_power_on() The clk_disable_unprepare() should be called in the error handling of rockchip_usb2phy_power_on(). Fixes: 0e08d2a727e6 ("phy: rockchip-inno-usb2: add a new driver for Rockchip usb2phy") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Link: https://lore.kernel.org/r/20221205115823.16957-1-shangxiaojing@huawei.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-01-13drm/vc4: bo: Fix unused variable warningMaxime Ripard
Commit 07a2975c65f2 ("drm/vc4: bo: Fix drmm_mutex_init memory hog") removed the only use of the ret variable, but didn't remove the variable itself leading to a unused variable warning. Remove that variable. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 07a2975c65f2 ("drm/vc4: bo: Fix drmm_mutex_init memory hog") Reviewed-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20230113154637.1704116-1-maxime@cerno.tech
2023-01-13Merge tag 'efi-fixes-for-v6.2-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - avoid a potential crash on the efi_subsys_init() error path - use more appropriate error code for runtime services calls issued after a crash in the firmware occurred - avoid READ_ONCE() for accessing firmware tables that may appear misaligned in memory * tag 'efi-fixes-for-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: tpm: Avoid READ_ONCE() for accessing the event log efi: rt-wrapper: Add missing include efi: fix userspace infinite retry read efivars after EFI runtime services page fault efi: fix NULL-deref in init error path
2023-01-13Merge tag 'docs-6.2-fixes' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation fixes from Jonathan Corbet: "Three documentation fixes (or rather two and one warning): - Sphinx 6.0 broke our configuration mechanism, so fix it - I broke our configuration for non-Alabaster themes; Akira fixed it - Deprecate Sphinx < 2.4 with an eye toward future removal" * tag 'docs-6.2-fixes' of git://git.lwn.net/linux: docs/conf.py: Use about.html only in sidebar of alabaster theme docs: Deprecate use of Sphinx < 2.4.x docs: Fix the docs build with Sphinx 6.0
2023-01-13bpf: Fix pointer-leak due to insufficient speculative store bypass mitigationLuis Gerhorst
To mitigate Spectre v4, 2039f26f3aca ("bpf: Fix leakage due to insufficient speculative store bypass mitigation") inserts lfence instructions after 1) initializing a stack slot and 2) spilling a pointer to the stack. However, this does not cover cases where a stack slot is first initialized with a pointer (subject to sanitization) but then overwritten with a scalar (not subject to sanitization because the slot was already initialized). In this case, the second write may be subject to speculative store bypass (SSB) creating a speculative pointer-as-scalar type confusion. This allows the program to subsequently leak the numerical pointer value using, for example, a branch-based cache side channel. To fix this, also sanitize scalars if they write a stack slot that previously contained a pointer. Assuming that pointer-spills are only generated by LLVM on register-pressure, the performance impact on most real-world BPF programs should be small. The following unprivileged BPF bytecode drafts a minimal exploit and the mitigation: [...] // r6 = 0 or 1 (skalar, unknown user input) // r7 = accessible ptr for side channel // r10 = frame pointer (fp), to be leaked // r9 = r10 # fp alias to encourage ssb *(u64 *)(r9 - 8) = r10 // fp[-8] = ptr, to be leaked // lfence added here because of pointer spill to stack. // // Ommitted: Dummy bpf_ringbuf_output() here to train alias predictor // for no r9-r10 dependency. // *(u64 *)(r10 - 8) = r6 // fp[-8] = scalar, overwrites ptr // 2039f26f3aca: no lfence added because stack slot was not STACK_INVALID, // store may be subject to SSB // // fix: also add an lfence when the slot contained a ptr // r8 = *(u64 *)(r9 - 8) // r8 = architecturally a scalar, speculatively a ptr // // leak ptr using branch-based cache side channel: r8 &= 1 // choose bit to leak if r8 == 0 goto SLOW // no mispredict // architecturally dead code if input r6 is 0, // only executes speculatively iff ptr bit is 1 r8 = *(u64 *)(r7 + 0) # encode bit in cache (0: slow, 1: fast) SLOW: [...] After running this, the program can time the access to *(r7 + 0) to determine whether the chosen pointer bit was 0 or 1. Repeat this 64 times to recover the whole address on amd64. In summary, sanitization can only be skipped if one scalar is overwritten with another scalar. Scalar-confusion due to speculative store bypass can not lead to invalid accesses because the pointer bounds deducted during verification are enforced using branchless logic. See 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") for details. Do not make the mitigation depend on !env->allow_{uninit_stack,ptr_leaks} because speculative leaks are likely unexpected if these were enabled. For example, leaking the address to a protected log file may be acceptable while disabling the mitigation might unintentionally leak the address into the cached-state of a map that is accessible to unprivileged processes. Fixes: 2039f26f3aca ("bpf: Fix leakage due to insufficient speculative store bypass mitigation") Signed-off-by: Luis Gerhorst <gerhorst@cs.fau.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Henriette Hofmeier <henriette.hofmeier@rub.de> Link: https://lore.kernel.org/bpf/edc95bad-aada-9cfc-ffe2-fa9bb206583c@cs.fau.de Link: https://lore.kernel.org/bpf/20230109150544.41465-1-gerhorst@cs.fau.de
2023-01-13efi: tpm: Avoid READ_ONCE() for accessing the event logArd Biesheuvel
Nathan reports that recent kernels built with LTO will crash when doing EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a misaligned load from the TPM event log, which is annotated with READ_ONCE(), and under LTO, this gets translated into a LDAR instruction which does not tolerate misaligned accesses. Interestingly, this does not happen when booting the same kernel straight from the UEFI shell, and so the fact that the event log may appear misaligned in memory may be caused by a bug in GRUB or SHIM. However, using READ_ONCE() to access firmware tables is slightly unusual in any case, and here, we only need to ensure that 'event' is not dereferenced again after it gets unmapped, but this is already taken care of by the implicit barrier() semantics of the early_memunmap() call. Cc: <stable@vger.kernel.org> Cc: Peter Jones <pjones@redhat.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1782 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-13KVM: x86: Add helpers to recalc physical vs. logical optimized APIC mapsSean Christopherson
Move the guts of kvm_recalculate_apic_map()'s main loop to two separate helpers to handle recalculating the physical and logical pieces of the optimized map. Having 100+ lines of code in the for-loop makes it hard to understand what is being calculated where. No functional change intended. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-34-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Allow APICv APIC ID inhibit to be clearedGreg Edwards
Legacy kernels prior to commit 4399c03c6780 ("x86/apic: Remove verify_local_APIC()") write the APIC ID of the boot CPU twice to verify a functioning local APIC. This results in APIC acceleration inhibited on these kernels for reason APICV_INHIBIT_REASON_APIC_ID_MODIFIED. Allow the APICV_INHIBIT_REASON_APIC_ID_MODIFIED inhibit reason to be cleared if/when all APICs in xAPIC mode set their APIC ID back to the expected vcpu_id value. Fold the functionality previously in kvm_lapic_xapic_id_updated() into kvm_recalculate_apic_map(), as this allows examining all APICs in one pass. Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") Signed-off-by: Greg Edwards <gedwards@ddn.com> Link: https://lore.kernel.org/r/20221117183247.94314-1-gedwards@ddn.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-33-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Track required APICv inhibits with variable, not callbackSean Christopherson
Track the per-vendor required APICv inhibits with a variable instead of calling into vendor code every time KVM wants to query the set of required inhibits. The required inhibits are a property of the vendor's virtualization architecture, i.e. are 100% static. Using a variable allows the compiler to inline the check, e.g. generate a single-uop TEST+Jcc, and thus eliminates any desire to avoid checking inhibits for performance reasons. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-32-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a ↵Sean Christopherson
running vcpu" Turns out that some warnings exist for good reasons. Restore the warning in avic_vcpu_load() that guards against calling avic_vcpu_load() on a running vCPU now that KVM avoids doing so when switching between x2APIC and xAPIC. The entire point of the WARN is to highlight that KVM should not be reloading an AVIC. Opportunistically convert the WARN_ON() to WARN_ON_ONCE() to avoid spamming the kernel if it does fire. This reverts commit c0caeee65af3944b7b8abbf566e7cc1fae15c775. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-31-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Ignore writes to Remote Read Data on AVIC write trapsSean Christopherson
Drop writes to APIC_RRR, a.k.a. Remote Read Data Register, on AVIC unaccelerated write traps. The register is read-only and isn't emulated by KVM. Sending the register through kvm_apic_write_nodecode() will result in screaming when x2APIC is enabled due to the unexpected failure to retrieve the MSR (KVM expects that only "legal" accesses will trap). Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-30-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Handle multiple logical targets in AVIC kick fastpathSean Christopherson
Iterate over all target logical IDs in the AVIC kick fastpath instead of bailing if there is more than one target. Now that KVM inhibits AVIC if vCPUs aren't mapped 1:1 with logical IDs, each bit in the destination is guaranteed to match to at most one vCPU, i.e. iterating over the bitmap is guaranteed to kick each valid target exactly once. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-29-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Require logical ID to be power-of-2 for AVIC entrySean Christopherson
Do not modify AVIC's logical ID table if the logical ID portion of the LDR is not a power-of-2, i.e. if the LDR has multiple bits set. Taking only the first bit means that KVM will fail to match MDAs that intersect with "higher" bits in the "ID" The "ID" acts as a bitmap, but is referred to as an ID because there's an implicit, unenforced "requirement" that software only set one bit. This edge case is arguably out-of-spec behavior, but KVM cleanly handles it in all other cases, e.g. the optimized logical map (and AVIC!) is also disabled in this scenario. Refactor the code to consolidate the checks, and so that the code looks more like avic_kick_target_vcpus_fast(). Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC") Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-28-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"Sean Christopherson
Update SVM's cache of the LDR even if the new value is "bad". Leaving stale information in the cache can result in KVM missing updates and/or invalidating the wrong entry, e.g. if avic_invalidate_logical_id_entry() is triggered after a different vCPU has "claimed" the old LDR. Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-27-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Always update local APIC on writes to logical dest registerSean Christopherson
Update the vCPU's local (virtual) APIC on LDR writes even if the write "fails". The APIC needs to recalc the optimized logical map even if the LDR is invalid or zero, e.g. if the guest clears its LDR, the optimized map will be left as is and the vCPU will receive interrupts using its old LDR. Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-26-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical modeSean Christopherson
Inhibit SVM's AVIC if multiple vCPUs are aliased to the same logical ID. Architecturally, all CPUs whose logical ID matches the MDA are supposed to receive the interrupt; overwriting existing entries in AVIC's logical=>physical map can result in missed IPIs. Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-25-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabledSean Christopherson
Inhibit APICv/AVIC if the optimized physical map is disabled so that KVM KVM provides consistent APIC behavior if xAPIC IDs are aliased due to vcpu_id being truncated and the x2APIC hotplug hack isn't enabled. If the hotplug hack is disabled, events that are emulated by KVM will follow architectural behavior (all matching vCPUs receive events, even if the "match" is due to truncation), whereas APICv and AVIC will deliver events only to the first matching vCPU, i.e. the vCPU that matches without truncation. Note, the "extra" inhibit is needed because KVM deliberately ignores mismatches due to truncation when applying the APIC_ID_MODIFIED inhibit so that large VMs (>255 vCPUs) can run with APICv/AVIC. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-24-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDsSean Christopherson
Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs for x2APIC. If 32-bit IDs are not enabled, disable the optimized map to honor x86 architectural behavior if multiple vCPUs shared a physical APIC ID. As called out in the changelog that added the hack, all CPUs whose (possibly truncated) APIC ID matches the target are supposed to receive the IPI. KVM intentionally differs from real hardware, because real hardware (Knights Landing) does just "x2apic_id & 0xff" to decide whether to accept the interrupt in xAPIC mode and it can deliver one interrupt to more than one physical destination, e.g. 0x123 to 0x123 and 0x23. Applying the hack even when x2APIC is not fully enabled means KVM doesn't correctly handle scenarios where the guest has aliased xAPIC IDs across multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any interrupts. It's extremely unlikely any real world guest aliases APIC IDs, or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which vCPU is "normal". Furthermore, the hack is _not_ guaranteed to work! The hack works if and only if the optimized APIC map is successfully allocated. If the map allocation fails (unlikely), KVM will fall back to its unoptimized behavior, which _does_ honor the architectural behavior. Pivot on 32-bit x2APIC IDs being enabled as that is required to take advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't break existing setups unless they are way, way off in the weeds. And an entry in KVM's errata to document the hack. Alternatively, KVM could provide an actual x2APIC quirk and document the hack that way, but there's unlikely to ever be a use case for disabling the quirk. Go the errata route to avoid having to validate a quirk no one cares about. Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-23-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Disable APIC logical map if vCPUs are aliased in logical modeSean Christopherson
Disable the optimized APIC logical map if multiple vCPUs are aliased to the same logical ID. Architecturally, all CPUs whose logical ID matches the MDA are supposed to receive the interrupt; overwriting existing map entries can result in missed IPIs. Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-22-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Disable APIC logical map if logical ID covers multiple MDAsSean Christopherson
Disable the optimized APIC logical map if a logical ID covers multiple MDAs, i.e. if a vCPU has multiple bits set in its ID. In logical mode, events match if "ID & MDA != 0", i.e. creating an entry for only the first bit can cause interrupts to be missed. Note, creating an entry for every bit is also wrong as KVM would generate IPIs for every matching bit. It would be possible to teach KVM to play nice with this edge case, but it is very much an edge case and probably not used in any real world OS, i.e. it's not worth optimizing. Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-21-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Skip redundant x2APIC logical mode optimized cluster setupSean Christopherson
Skip the optimized cluster[] setup for x2APIC logical mode, as KVM reuses the optimized map's phys_map[] and doesn't actually need to insert the target apic into the cluster[]. The LDR is derived from the x2APIC ID, and both are read-only in KVM, thus the vCPU's cluster[ldr] is guaranteed to be the same entry as the vCPU's phys_map[x2apic_id] entry. Skipping the unnecessary setup will allow a future fix for aliased xAPIC logical IDs to simply require that cluster[ldr] is non-NULL, i.e. won't have to special case x2APIC. Alternatively, the future check could allow "cluster[ldr] == apic", but that ends up being terribly confusing because cluster[ldr] is only set at the very end, i.e. it's only possible due to x2APIC's shenanigans. Another alternative would be to send x2APIC down a separate path _after_ the calculation and then assert that all of the above, but the resulting code is rather messy, and it's arguably unnecessary since asserting that the actual LDR matches the expected LDR means that simply testing that interrupts are delivered correctly provides the same guarantees. Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Explicitly track all possibilities for APIC map's logical modesSean Christopherson
Track all possibilities for the optimized APIC map's logical modes instead of overloading the pseudo-bitmap and treating any "unknown" value as "invalid". As documented by the now-stale comment above the mode values, the values did have meaning when the optimized map was originally added. That dependent logical was removed by commit e45115b62f9a ("KVM: x86: use physical LAPIC array for logical x2APIC"), but the obfuscated behavior and its comment were left behind. Opportunistically rename "mode" to "logical_mode", partly to make it clear that the "disabled" case applies only to the logical map, but also to prove that there is no lurking code that expects "mode" to be a bitmap. Functionally, this is a glorified nop. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0Sean Christopherson
Explicitly skip the optimized map setup if the vCPU's LDR is '0', i.e. if the vCPU will never respond to logical mode interrupts. KVM already skips setup in this case, but relies on kvm_apic_map_get_logical_dest() to generate mask==0. KVM still needs the mask=0 check as a non-zero LDR can yield mask==0 depending on the mode, but explicitly handling the LDR will make it simpler to clean up the logical mode tracking in the future. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Add helper to perform final AVIC "kick" of single vCPUSean Christopherson
Add a helper to perform the final kick, two instances of the ICR decoding is one too many. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatchSean Christopherson
Document that AVIC is inhibited if any vCPU's APIC ID diverges from its vCPU ID, i.e. that there's no need to check for a destination match in the AVIC kick fast path. Opportunistically tweak comments to remove "guest bug", as that suggests KVM is punting on error handling, which is not the case. Targeting a non-existent vCPU or no vCPUs _may_ be a guest software bug, but whether or not it's a guest bug is irrelevant. Such behavior is architecturally legal and thus needs to faithfully emulated by KVM (and it is). Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible"Sean Christopherson
Due to a likely mismerge of patches, KVM ended up with a superfluous commit to "enable" AVIC's fast path for x2AVIC mode. Even worse, the superfluous commit has several bugs and creates a nasty local shadow variable. Rather than fix the bugs piece-by-piece[*] to achieve the same end result, revert the patch wholesale. Opportunistically add a comment documenting the x2AVIC dependencies. This reverts commit 8c9e639da435874fb845c4d296ce55664071ea7a. [*] https://lore.kernel.org/all/YxEP7ZBRIuFWhnYJ@google.com Fixes: 8c9e639da435 ("KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible") Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Fix x2APIC Logical ID calculation for avic_kick_target_vcpus_fastSuravee Suthikulpanit
For X2APIC ID in cluster mode, the logical ID is bit [15:0]. Fixes: 603ccef42ce9 ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast") Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kickSean Christopherson
Compute the destination from ICRH using the sender's x2APIC status, not each (potential) target's x2APIC status. Fixes: c514d3a348ac ("KVM: SVM: Update avic_kick_target_vcpus to support 32-bit APIC ID") Cc: Li RongQing <lirongqing@baidu.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" booleanSean Christopherson
Replace the "avic_mode" enum with a single bool to track whether or not x2AVIC is enabled. KVM already has "apicv_enabled" that tracks if any flavor of AVIC is enabled, i.e. AVIC_MODE_NONE and AVIC_MODE_X1 are redundant and unnecessary noise. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabledSean Christopherson
Free the APIC access page memslot if any vCPU enables x2APIC and SVM's AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with x2APIC enabled. On AMD, if its "hybrid" mode is enabled (AVIC is enabled when x2APIC is enabled even without x2AVIC support), keeping the APIC access page memslot results in the guest being able to access the virtual APIC page as x2APIC is fully emulated by KVM. I.e. hardware isn't aware that the guest is operating in x2APIC mode. Exempt nested SVM's update of APICv state from the new logic as x2APIC can't be toggled on VM-Exit. In practice, invoking the x2APIC logic should be harmless precisely because it should be a glorified nop, but play it safe to avoid latent bugs, e.g. with dropping the vCPU's SRCU lock. Intel doesn't suffer from the same issue as APICv has fully independent VMCS controls for xAPIC vs. x2APIC virtualization. Technically, KVM should provide bus error semantics and not memory semantics for the APIC page when x2APIC is enabled, but KVM already provides memory semantics in other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware disabled (via APIC_BASE MSR). Note, checking apic_access_memslot_enabled without taking locks relies it being set during vCPU creation (before kvm_vcpu_reset()). vCPUs can race to set the inhibit and delete the memslot, i.e. can get false positives, but can't get false negatives as apic_access_memslot_enabled can't be toggled "on" once any vCPU reaches KVM_RUN. Opportunistically drop the "can" while updating avic_activate_vmcb()'s comment, i.e. to state that KVM _does_ support the hybrid mode. Move the "Note:" down a line to conform to preferred kernel/KVM multi-line comment style. Opportunistically update the apicv_update_lock comment, as it isn't actually used to protect apic_access_memslot_enabled (which is protected by slots_lock). Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Move APIC access page helper to common x86 codeSean Christopherson
Move the APIC access page allocation helper function to common x86 code, the allocation routine is virtually identical between APICv (VMX) and AVIC (SVM). Keep APICv's gfn_to_page() + put_page() sequence, which verifies that a backing page can be allocated, i.e. that the system isn't under heavy memory pressure. Forcing the backing page to be populated isn't strictly necessary, but skipping the effective prefetch only delays the inevitable. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Handle APICv updates for APIC "mode" changes via requestSean Christopherson
Use KVM_REQ_UPDATE_APICV to react to APIC "mode" changes, i.e. to handle the APIC being hardware enabled/disabled and/or x2APIC being toggled. There is no need to immediately update APICv state, the only requirement is that APICv be updating prior to the next VM-Enter. Making a request will allow piggybacking KVM_REQ_UPDATE_APICV to "inhibit" the APICv memslot when x2APIC is enabled. Doing that directly from kvm_lapic_set_base() isn't feasible as KVM's SRCU must not be held when modifying memslots (to avoid deadlock), and may or may not be held when kvm_lapic_set_base() is called, i.e. KVM can't do the right thing without tracking that is rightly buried behind CONFIG_PROVE_RCU=y. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Don't put/load AVIC when setting virtual APIC modeSean Christopherson
Move the VMCB updates from avic_refresh_apicv_exec_ctrl() into avic_set_virtual_apic_mode() and invert the dependency being said functions to avoid calling avic_vcpu_{load,put}() and avic_set_pi_irte_mode() when "only" setting the virtual APIC mode. avic_set_virtual_apic_mode() is invoked from common x86 with preemption enabled, which makes avic_vcpu_{load,put}() unhappy. Luckily, calling those and updating IRTE stuff is unnecessary as the only reason avic_set_virtual_apic_mode() is called is to handle transitions between xAPIC and x2APIC that don't also toggle APICv activation. And if activation doesn't change, there's no need to fiddle with the physical APIC ID table or update IRTE. The "full" refresh is guaranteed to be called if activation changes in this case as the only call to the "set" path is: kvm_vcpu_update_apicv(vcpu); static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu); and kvm_vcpu_update_apicv() invokes the refresh if activation changes: if (apic->apicv_active == activate) goto out; apic->apicv_active = activate; kvm_apic_update_apicv(vcpu); static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu); Rename the helper to reflect that it is also called during "refresh". WARNING: CPU: 183 PID: 49186 at arch/x86/kvm/svm/avic.c:1081 avic_vcpu_put+0xde/0xf0 [kvm_amd] CPU: 183 PID: 49186 Comm: stable Tainted: G O 6.0.0-smp--fcddbca45f0a-sink #34 Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022 RIP: 0010:avic_vcpu_put+0xde/0xf0 [kvm_amd] avic_refresh_apicv_exec_ctrl+0x142/0x1c0 [kvm_amd] avic_set_virtual_apic_mode+0x5a/0x70 [kvm_amd] kvm_lapic_set_base+0x149/0x1a0 [kvm] kvm_set_apic_base+0x8f/0xd0 [kvm] kvm_set_msr_common+0xa3a/0xdc0 [kvm] svm_set_msr+0x364/0x6b0 [kvm_amd] __kvm_set_msr+0xb8/0x1c0 [kvm] kvm_emulate_wrmsr+0x58/0x1d0 [kvm] msr_interception+0x1c/0x30 [kvm_amd] svm_invoke_exit_handler+0x31/0x100 [kvm_amd] svm_handle_exit+0xfc/0x160 [kvm_amd] vcpu_enter_guest+0x21bb/0x23e0 [kvm] vcpu_run+0x92/0x450 [kvm] kvm_arch_vcpu_ioctl_run+0x43e/0x6e0 [kvm] kvm_vcpu_ioctl+0x559/0x620 [kvm] Fixes: 05c4fe8c1bd9 ("KVM: SVM: Refresh AVIC configuration when changing APIC mode") Cc: stable@vger.kernel.org Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit IDSean Christopherson
Truncate the vcpu_id, a.k.a. x2APIC ID, to an 8-bit value when comparing it against the xAPIC ID to avoid false positives (sort of) on systems with >255 CPUs, i.e. with IDs that don't fit into a u8. The intent of APIC_ID_MODIFIED is to inhibit APICv/AVIC when the xAPIC is changed from it's original value, The mismatch isn't technically a false positive, as architecturally the xAPIC IDs do end up being aliased in this scenario, and neither APICv nor AVIC correctly handles IPI virtualization when there is aliasing. However, KVM already deliberately does not honor the aliasing behavior that results when an x2APIC ID gets truncated to an xAPIC ID. I.e. the resulting APICv/AVIC behavior is aligned with KVM's existing behavior when KVM's x2APIC hotplug hack is effectively enabled. If/when KVM provides a way to disable the hotplug hack, APICv/AVIC can piggyback whatever logic disables the optimized APIC map (which is what provides the hotplug hack), i.e. so that KVM's optimized map and APIC virtualization yield the same behavior. For now, fix the immediate problem of APIC virtualization being disabled for large VMs, which is a much more pressing issue than ensuring KVM honors architectural behavior for APIC ID aliasing. Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is disabledSean Christopherson
Don't inhibit APICv/AVIC due to an xAPIC ID mismatch if the APIC is hardware disabled. The ID cannot be consumed while the APIC is disabled, and the ID is guaranteed to be set back to the vcpu_id when the APIC is hardware enabled (architectural behavior correctly emulated by KVM). Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid targetSean Christopherson
Emulate ICR writes on AVIC IPI failures due to invalid targets using the same logic as failures due to invalid types. AVIC acceleration fails if _any_ of the targets are invalid, and crucially VM-Exits before sending IPIs to targets that _are_ valid. In logical mode, the destination is a bitmap, i.e. a single IPI can target multiple logical IDs. Doing nothing causes KVM to drop IPIs if at least one target is valid and at least one target is invalid. Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC") Cc: stable@vger.kernel.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: SVM: Flush the "current" TLB when activating AVICSean Christopherson
Flush the TLB when activating AVIC as the CPU can insert into the TLB while AVIC is "locally" disabled. KVM doesn't treat "APIC hardware disabled" as VM-wide AVIC inhibition, and so when a vCPU has its APIC hardware disabled, AVIC is not guaranteed to be inhibited. As a result, KVM may create a valid NPT mapping for the APIC base, which the CPU can cache as a non-AVIC translation. Note, Intel handles this in vmx_set_virtual_apic_mode(). Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Purge "highest ISR" cache when updating APICv stateSean Christopherson
Purge the "highest ISR" cache when updating APICv state on a vCPU. The cache must not be used when APICv is active as hardware may emulate EOIs (and other operations) without exiting to KVM. This fixes a bug where KVM will effectively block IRQs in perpetuity due to the "highest ISR" never getting reset if APICv is activated on a vCPU while an IRQ is in-service. Hardware emulates the EOI and KVM never gets a chance to update its cache. Fixes: b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function") Cc: stable@vger.kernel.org Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Blindly get current x2APIC reg value on "nodecode write" trapsSean Christopherson
When emulating a x2APIC write in response to an APICv/AVIC trap, get the the written value from the vAPIC page without checking that reads are allowed for the target register. AVIC can generate trap-like VM-Exits on writes to EOI, and so KVM needs to get the written value from the backing page without running afoul of EOI's write-only behavior. Alternatively, EOI could be special cased to always write '0', e.g. so that the sanity check could be preserved, but x2APIC on AMD is actually supposed to disallow non-zero writes (not emulated by KVM), and the sanity check was a byproduct of how the KVM code was written, i.e. wasn't added to guard against anything in particular. Fixes: 70c8327c11c6 ("KVM: x86: Bug the VM if an accelerated x2APIC trap occurs on a "bad" reg") Fixes: 1bd9dfec9fd4 ("KVM: x86: Do not block APIC write for non ICR registers") Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13io_uring: lock overflowing for IOPOLLPavel Begunkov
syzbot reports an issue with overflow filling for IOPOLL: WARNING: CPU: 0 PID: 28 at io_uring/io_uring.c:734 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734 CPU: 0 PID: 28 Comm: kworker/u4:1 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0 Workqueue: events_unbound io_ring_exit_work Call trace:  io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734  io_req_cqe_overflow+0x5c/0x70 io_uring/io_uring.c:773  io_fill_cqe_req io_uring/io_uring.h:168 [inline]  io_do_iopoll+0x474/0x62c io_uring/rw.c:1065  io_iopoll_try_reap_events+0x6c/0x108 io_uring/io_uring.c:1513  io_uring_try_cancel_requests+0x13c/0x258 io_uring/io_uring.c:3056  io_ring_exit_work+0xec/0x390 io_uring/io_uring.c:2869  process_one_work+0x2d8/0x504 kernel/workqueue.c:2289  worker_thread+0x340/0x610 kernel/workqueue.c:2436  kthread+0x12c/0x158 kernel/kthread.c:376  ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:863 There is no real problem for normal IOPOLL as flush is also called with uring_lock taken, but it's getting more complicated for IOPOLL|SQPOLL, for which __io_cqring_overflow_flush() happens from the CQ waiting path. Reported-and-tested-by: syzbot+6805087452d72929404e@syzkaller.appspotmail.com Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-13Merge tag 'sound-6.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This became a slightly big update, but it's more or less expected, as the first batch after holidays. All changes (but for the last two last-minute fixes) have been stewed in linux-next long enough, so it's fairly safe to take: - PCM UAF fix in 32bit compat layer - ASoC board-specific fixes for Intel, AMD, Medathek, Qualcomm - SOF power management fixes - ASoC Intel link failure fixes - A series of fixes for USB-audio regressions - CS35L41 HD-audio codec regression fixes - HD-audio device-specific fixes / quirks Note that one SPI patch has been taken in ASoC subtree mistakenly, and the same fix is found in spi tree, but it should be OK to apply" * tag 'sound-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits) ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF ALSA: usb-audio: Fix possible NULL pointer dereference in snd_usb_pcm_has_fixed_rate() ALSA: hda/realtek: Enable mute/micmute LEDs on HP Spectre x360 13-aw0xxx ASoC: fsl-asoc-card: Fix naming of AC'97 CODEC widgets ASoC: fsl_ssi: Rename AC'97 streams to avoid collisions with AC'97 CODEC ALSA: hda/hdmi: Add a HP device 0x8715 to force connect list ALSA: control-led: use strscpy in set_led_id() ALSA: usb-audio: Always initialize fixed_rate in snd_usb_find_implicit_fb_sync_format() ASoC: dt-bindings: qcom,lpass-tx-macro: correct clocks on SC7280 ASoC: dt-bindings: qcom,lpass-wsa-macro: correct clocks on SM8250 ASoC: qcom: Fix building APQ8016 machine driver without SOUNDWIRE ALSA: hda: cs35l41: Check runtime suspend capability at runtime_idle ALSA: hda: cs35l41: Don't return -EINVAL from system suspend/resume ASoC: fsl_micfil: Correct the number of steps on SX controls ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform Revert "ALSA: usb-audio: Drop superfluous interface setup at parsing" ALSA: usb-audio: More refactoring of hw constraint rules ALSA: usb-audio: Relax hw constraints for implicit fb sync ALSA: usb-audio: Make sure to stop endpoints before closing EPs ALSA: hda - Enable headset mic on another Dell laptop with ALC3254 ...
2023-01-13tomoyo: Update website linkTetsuo Handa
SourceForge.JP was renamed to OSDN in May 2015. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>