summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2023-04-16Merge tag 'kbuild-fixes-v6.3-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Drop debug info from purgatory objects again - Document that kernel.org provides prebuilt LLVM toolchains - Give up handling untracked files for source package builds - Avoid creating corrupted cpio when KBUILD_BUILD_TIMESTAMP is given with a pre-epoch data. - Change panic_show_mem() to a macro to handle variable-length argument - Compress tarballs on-the-fly again * tag 'kbuild-fixes-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: do not create intermediate *.tar for tar packages kbuild: do not create intermediate *.tar for source tarballs kbuild: merge cmd_archive_linux and cmd_archive_perf init/initramfs: Fix argument forwarding to panic() in panic_show_mem() initramfs: Check negative timestamp to prevent broken cpio archive kbuild: give up untracked files for source package builds Documentation/llvm: Add a note about prebuilt kernel.org toolchains purgatory: fix disabling debug info
2023-04-14x86/hyperv: Mark hv_ghcb_terminate() as noreturnGuilherme G. Piccoli
Annotate the function prototype and definition as noreturn to prevent objtool warnings like: vmlinux.o: warning: objtool: hyperv_init+0x55c: unreachable instruction Also, as per Josh's suggestion, add it to the global_noreturns list. As a comparison, an objdump output without the annotation: [...] 1b63: mov $0x1,%esi 1b68: xor %edi,%edi 1b6a: callq ffffffff8102f680 <hv_ghcb_terminate> 1b6f: jmpq ffffffff82f217ec <hyperv_init+0x9c> # unreachable 1b74: cmpq $0xffffffffffffffff,-0x702a24(%rip) [...] Now, after adding the __noreturn to the function prototype: [...] 17df: callq ffffffff8102f6d0 <hv_ghcb_negotiate_protocol> 17e4: test %al,%al 17e6: je ffffffff82f21bb9 <hyperv_init+0x469> [...] <many insns> 1bb9: mov $0x1,%esi 1bbe: xor %edi,%edi 1bc0: callq ffffffff8102f680 <hv_ghcb_terminate> 1bc5: nopw %cs:0x0(%rax,%rax,1) # end of function Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/32453a703dfcf0d007b473c9acbf70718222b74b.1681342859.git.jpoimboe@kernel.org
2023-04-14x86/cpu: Mark {hlt,resume}_play_dead() __noreturnJosh Poimboeuf
Fixes the following warning: vmlinux.o: warning: objtool: resume_play_dead+0x21: unreachable instruction Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/ce1407c4bf88b1334fe40413126343792a77ca50.1681342859.git.jpoimboe@kernel.org
2023-04-14cpu: Mark nmi_panic_self_stop() __noreturnJosh Poimboeuf
In preparation for improving objtool's handling of weak noreturn functions, mark nmi_panic_self_stop() __noreturn. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/316fc6dfab5a8c4e024c7185484a1ee5fb0afb79.1681342859.git.jpoimboe@kernel.org
2023-04-14x86/head: Mark *_start_kernel() __noreturnJosh Poimboeuf
Now that start_kernel() is __noreturn, mark its chain of callers __noreturn. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/c2525f96b88be98ee027ee0291d58003036d4120.1681342859.git.jpoimboe@kernel.org
2023-04-14x86/linkage: Fix padding for typed functionsJosh Poimboeuf
CFI typed functions are failing to get padded properly for CONFIG_CALL_PADDING. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/721f0da48d2a49fe907225711b8b76a2b787f9a8.1681331135.git.jpoimboe@kernel.org
2023-04-14Merge branches 'iommu/fixes', 'arm/allwinner', 'arm/exynos', 'arm/mediatek', ↵Joerg Roedel
'arm/omap', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'ppc/pamu', 'unisoc', 'x86/vt-d', 'x86/amd', 'core' and 'platform-remove_new' into next
2023-04-13x86/mm/dump_pagetables: remove MODULE_LICENSE in non-modulesNick Alcock
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: linux-modules@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-04-13crypto: blake2s: remove module_init and module.h inclusionNick Alcock
Now this can no longer be built as a module, drop all remaining module-related code as well. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: linux-modules@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: linux-crypto@vger.kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-04-13crypto: remove MODULE_LICENSE in non-modulesNick Alcock
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: linux-modules@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: linux-crypto@vger.kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-04-13x86/rtc: Remove __init for runtime functionsMatija Glavinic Pecotic
set_rtc_noop(), get_rtc_noop() are after booting, therefore their __init annotation is wrong. A crash was observed on an x86 platform where CMOS RTC is unused and disabled via device tree. set_rtc_noop() was invoked from ntp: sync_hw_clock(), although CONFIG_RTC_SYSTOHC=n, however sync_cmos_clock() doesn't honour that. Workqueue: events_power_efficient sync_hw_clock RIP: 0010:set_rtc_noop Call Trace: update_persistent_clock64 sync_hw_clock Fix this by dropping the __init annotation from set/get_rtc_noop(). Fixes: c311ed6183f4 ("x86/init: Allow DT configured systems to disable RTC at boot time") Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/59f7ceb1-446b-1d3d-0bc8-1f0ee94b1e18@nokia.com
2023-04-12x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()Saurabh Sengar
arch_dynirq_lower_bound() is invoked by the core interrupt code to retrieve the lowest possible Linux interrupt number for dynamically allocated interrupts like MSI. The x86 implementation uses this to exclude the IO/APIC GSI space. This works correctly as long as there is an IO/APIC registered, but returns 0 if not. This has been observed in VMs where the BIOS does not advertise an IO/APIC. 0 is an invalid interrupt number except for the legacy timer interrupt on x86. The return value is unchecked in the core code, so it ends up to allocate interrupt number 0 which is subsequently considered to be invalid by the caller, e.g. the MSI allocation code. The function has already a check for 0 in the case that an IO/APIC is registered, as ioapic_dynirq_base is 0 in case of device tree setups. Consolidate this and zero check for both ioapic_dynirq_base and gsi_top, which is used in the case that no IO/APIC is registered. Fixes: 3e5bedc2c258 ("x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines") Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1679988604-20308-1-git-send-email-ssengar@linux.microsoft.com
2023-04-11Merge tag 'pci-v6.3-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Provide pci_msix_can_alloc_dyn() stub when CONFIG_PCI_MSI unset to avoid build errors (Reinette Chatre) - Quirk AMD XHCI controller that loses MSI-X state in D3hot to avoid broken USB after hotplug or suspend/resume (Basavaraj Natikar) - Fix use-after-free in pci_bus_release_domain_nr() (Rob Herring) * tag 'pci-v6.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: Fix use-after-free in pci_bus_release_domain_nr() x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()
2023-04-11PCI: Fix up L1SS capability for Intel Apollo Lake Root PortRon Lee
On Google Coral and Reef family Chromebooks with Intel Apollo Lake SoC, firmware clobbers the header of the L1 PM Substates capability and the previous capability when returning from D3cold to D0. Save those headers at enumeration-time and restore them at resume. [bhelgaas: The main benefit is to make the lspci output after resume correct. Apparently there's little or no effect on power consumption.] Link: https://lore.kernel.org/linux-pci/CAFJ_xbq0cxcH-cgpXLU4Mjk30+muWyWm1aUZGK7iG53yaLBaQg@mail.gmail.com/T/#u Link: https://lore.kernel.org/r/20230411160213.4453-1-ron.lee@intel.com Signed-off-by: Ron Lee <ron.lee@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-04-11KVM: x86: Filter out XTILE_CFG if XTILE_DATA isn't permittedSean Christopherson
Filter out XTILE_CFG from the supported XCR0 reported to userspace if the current process doesn't have access to XTILE_DATA. Attempting to set XTILE_CFG in XCR0 will #GP if XTILE_DATA is also not set, and so keeping XTILE_CFG as supported results in explosions if userspace feeds KVM_GET_SUPPORTED_CPUID back into KVM and the guest doesn't sanity check CPUID. Fixes: 445ecdf79be0 ("kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID") Reported-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Aaron Lewis <aaronlewis@google.com> Tested-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20230405004520.421768-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-11KVM: x86: Add a helper to handle filtering of unpermitted XCR0 featuresAaron Lewis
Add a helper, kvm_get_filtered_xcr0(), to dedup code that needs to account for XCR0 features that require explicit opt-in on a per-process basis. In addition to documenting when KVM should/shouldn't consult xstate_get_guest_group_perm(), the helper will also allow sanitizing the filtered XCR0 to avoid enumerating architecturally illegal XCR0 values, e.g. XTILE_CFG without XTILE_DATA. No functional changes intended. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Mingwei Zhang <mizhang@google.com> [sean: rename helper, move to x86.h, massage changelog] Reviewed-by: Aaron Lewis <aaronlewis@google.com> Tested-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20230405004520.421768-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-11KVM: nVMX: Emulate NOPs in L2, and PAUSE if it's not interceptedSean Christopherson
Extend VMX's nested intercept logic for emulated instructions to handle "pause" interception, in quotes because KVM's emulator doesn't filter out NOPs when checking for nested intercepts. Failure to allow emulation of NOPs results in KVM injecting a #UD into L2 on any NOP that collides with the emulator's definition of PAUSE, i.e. on all single-byte NOPs. For PAUSE itself, honor L1's PAUSE-exiting control, but ignore PLE to avoid unnecessarily injecting a #UD into L2. Per the SDM, the first execution of PAUSE after VM-Entry is treated as the beginning of a new loop, i.e. will never trigger a PLE VM-Exit, and so L1 can't expect any given execution of PAUSE to deterministically exit. ... the processor considers this execution to be the first execution of PAUSE in a loop. (It also does so for the first execution of PAUSE at CPL 0 after VM entry.) All that said, the PLE side of things is currently a moot point, as KVM doesn't expose PLE to L1. Note, vmx_check_intercept() is still wildly broken when L1 wants to intercept an instruction, as KVM injects a #UD instead of synthesizing a nested VM-Exit. That issue extends far beyond NOP/PAUSE and needs far more effort to fix, i.e. is a problem for the future. Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Cc: Mathias Krause <minipli@grsecurity.net> Cc: stable@vger.kernel.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20230405002359.418138-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-10KVM: x86/mmu: Refresh CR0.WP prior to checking for emulated permission faultsSean Christopherson
Refresh the MMU's snapshot of the vCPU's CR0.WP prior to checking for permission faults when emulating a guest memory access and CR0.WP may be guest owned. If the guest toggles only CR0.WP and triggers emulation of a supervisor write, e.g. when KVM is emulating UMIP, KVM may consume a stale CR0.WP, i.e. use stale protection bits metadata. Note, KVM passes through CR0.WP if and only if EPT is enabled as CR0.WP is part of the MMU role for legacy shadow paging, and SVM (NPT) doesn't support per-bit interception controls for CR0. Don't bother checking for EPT vs. NPT as the "old == new" check will always be true under NPT, i.e. the only cost is the read of vcpu->arch.cr4 (SVM unconditionally grabs CR0 from the VMCB on VM-Exit). Reported-by: Mathias Krause <minipli@grsecurity.net> Link: https://lkml.kernel.org/r/677169b4-051f-fcae-756b-9a3e1bb9f8fe%40grsecurity.net Fixes: fb509f76acc8 ("KVM: VMX: Make CR0.WP a guest owned bit") Tested-by: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/r/20230405002608.418442-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-10KVM: x86/mmu: Move filling of Hyper-V's TLB range struct into Hyper-V codeSean Christopherson
Refactor Hyper-V's range-based TLB flushing API to take a gfn+nr_pages pair instead of a struct, and bury said struct in Hyper-V specific code. Passing along two params generates much better code for the common case where KVM is _not_ running on Hyper-V, as forwarding the flush on to Hyper-V's hv_flush_remote_tlbs_range() from kvm_flush_remote_tlbs_range() becomes a tail call. Cc: David Matlack <dmatlack@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230405003133.419177-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-10KVM: x86: Rename Hyper-V remote TLB hooks to match established schemeSean Christopherson
Rename the Hyper-V hooks for TLB flushing to match the naming scheme used by all the other TLB flushing hooks, e.g. in kvm_x86_ops, vendor code, arch hooks from common code, etc. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230405003133.419177-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-10Merge tag 'uml-for-linus-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML fix from Richard Weinberger: - Build regression fix for older gcc versions * tag 'uml-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: Only disable SSE on clang to work around old GCC bugs
2023-04-09Merge tag 'x86_urgent_for_v6.3_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add a new Intel Arrow Lake CPU model number - Fix a confusion about how to check the version of the ACPI spec which supports a "online capable" bit in the MADT table which lead to a bunch of boot breakages with Zen1 systems and VMs * tag 'x86_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add model number for Intel Arrow Lake processor x86/acpi/boot: Correct acpi_is_processor_usable() check x86/ACPI/boot: Use FADT version to check support for online capable
2023-04-08x86/kexec: remove unnecessary arch_kexec_kernel_image_load()Bjorn Helgaas
Patch series "kexec: Remove unnecessary arch hook", v2. There are no arch-specific things in arch_kexec_kernel_image_load(), so remove it and just use the generic version. This patch (of 2): The x86 implementation of arch_kexec_kernel_image_load() is functionally identical to the generic arch_kexec_kernel_image_load(): arch_kexec_kernel_image_load # x86 if (!image->fops || !image->fops->load) return ERR_PTR(-ENOEXEC); return image->fops->load(image, image->kernel_buf, ...) arch_kexec_kernel_image_load # generic kexec_image_load_default if (!image->fops || !image->fops->load) return ERR_PTR(-ENOEXEC); return image->fops->load(image, image->kernel_buf, ...) Remove the x86-specific version and use the generic arch_kexec_kernel_image_load(). No functional change intended. Link: https://lkml.kernel.org/r/20230307224416.907040-1-helgaas@kernel.org Link: https://lkml.kernel.org/r/20230307224416.907040-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-08ELF: fix all "Elf" typosAlexey Dobriyan
ELF is acronym and therefore should be spelled in all caps. I left one exception at Documentation/arm/nwfpe/nwfpe.rst which looks like being written in the first person. Link: https://lkml.kernel.org/r/Y/3wGWQviIOkyLJW@p183 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-08purgatory: fix disabling debug infoAlyssa Ross
Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS. Instead, it includes -g, the appropriate -gdwarf-* flag, and also the -Wa versions of both of those if building with Clang and GNU as. As a result, debug info was being generated for the purgatory objects, even though the intention was that it not be. Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files") Signed-off-by: Alyssa Ross <hi@alyssa.is> Cc: stable@vger.kernel.org Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-04-07KVM: x86/pmu: Prevent the PMU from counting disallowed eventsAaron Lewis
When counting "Instructions Retired" (0xc0) in a guest, KVM will occasionally increment the PMU counter regardless of if that event is being filtered. This is because some PMU events are incremented via kvm_pmu_trigger_event(), which doesn't know about the event filter. Add the event filter to kvm_pmu_trigger_event(), so events that are disallowed do not increment their counters. Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230307141400.1486314-2-aaronlewis@google.com [sean: prepend "pmc" to the new function] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-07KVM: x86/pmu: Fix a typo in kvm_pmu_request_counter_reprogam()Like Xu
Fix a "reprogam" => "reprogram" typo in kvm_pmu_request_counter_reprogam(). Fixes: 68fb4757e867 ("KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()") Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230310113349.31799-1-likexu@tencent.com [sean: trim the changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-07x86/apic: Fix atomic update of offset in reserve_eilvt_offset()Uros Bizjak
The detection of atomic update failure in reserve_eilvt_offset() is not correct. The value returned by atomic_cmpxchg() should be compared to the old value from the location to be updated. If these two are the same, then atomic update succeeded and "eilvt_offsets[offset]" location is updated to "new" in an atomic way. Otherwise, the atomic update failed and it should be retried with the value from "eilvt_offsets[offset]" - exactly what atomic_try_cmpxchg() does in a correct and more optimal way. Fixes: a68c439b1966c ("apic, x86: Check if EILVT APIC registers are available (AMD only)") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230227160917.107820-1-ubizjak@gmail.com
2023-04-06KVM: x86/pmu: Rewrite reprogram_counters() to improve performanceLike Xu
A valid pmc is always tested before using pmu->reprogram_pmi. Eliminate this part of the redundancy by setting the counter's bitmask directly, and in addition, trigger KVM_REQ_PMU only once to save more cpu cycles. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230214050757.9623-4-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: VMX: Refactor intel_pmu_{g,}set_msr() to align with other helpersSean Christopherson
Invert the flows in intel_pmu_{g,s}et_msr()'s case statements so that they follow the kernel's preferred style of: if (<not valid>) return <error> <commit change> return <success> which is also the style used by every other {g,s}et_msr() helper (except AMD's PMU variant, which doesn't use a switch statement). Modify the "set" paths with costly side effects, i.e. that reprogram counters, to skip only the side effects, i.e. to perform reserved bits checks even if the value is unchanged. None of the reserved bits checks are expensive, so there's no strong justification for skipping them, and guarding only the side effect makes it slightly more obvious what is being skipped and why. No functional change intended (assuming no reserved bit bugs). Link: https://lkml.kernel.org/r/Y%2B6cfen%2FCpO3%2FdLO%40google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86/pmu: Rename pmc_is_enabled() to pmc_is_globally_enabled()Like Xu
The name of function pmc_is_enabled() is a bit misleading. A PMC can be disabled either by PERF_CLOBAL_CTRL or by its corresponding EVTSEL. Append global semantics to its name. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230214050757.9623-2-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86/pmu: Zero out LBR capabilities during PMU refreshSean Christopherson
Zero out the LBR capabilities during PMU refresh to avoid exposing LBRs to the guest against userspace's wishes. If userspace modifies the guest's CPUID model or invokes KVM_CAP_PMU_CAPABILITY to disable vPMU after an initial KVM_SET_CPUID2, but before the first KVM_RUN, KVM will retain the previous LBR info due to bailing before refreshing the LBR descriptor. Note, this is a very theoretical bug, there is no known use case where a VMM would deliberately enable the vPMU via KVM_SET_CPUID2, and then later disable the vPMU. Link: https://lore.kernel.org/r/20230311004618.920745-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86/pmu: WARN and bug the VM if PMU is refreshed after vCPU has runSean Christopherson
Now that KVM disallows changing feature MSRs, i.e. PERF_CAPABILITIES, after running a vCPU, WARN and bug the VM if the PMU is refreshed after the vCPU has run. Note, KVM has disallowed CPUID updates after running a vCPU since commit feb627e8d6f6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN"), i.e. PERF_CAPABILITIES was the only remaining way to trigger a PMU refresh after KVM_RUN. Cc: Like Xu <like.xu.linux@gmail.com> Link: https://lore.kernel.org/r/20230311004618.920745-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Disallow writes to immutable feature MSRs after KVM_RUNSean Christopherson
Disallow writes to feature MSRs after KVM_RUN to prevent userspace from changing the vCPU model after running the vCPU. Similar to guest CPUID, KVM uses feature MSRs to configure intercepts, determine what operations are/aren't allowed, etc. Changing the capabilities while the vCPU is active will at best yield unpredictable guest behavior, and at worst could be dangerous to KVM. Allow writing the current value, e.g. so that userspace can blindly set all MSRs when emulating RESET, and unconditionally allow writes to MSR_IA32_UCODE_REV so that userspace can emulate patch loads. Special case the VMX MSRs to keep the generic list small, i.e. so that KVM can do a linear walk of the generic list without incurring meaningful overhead. Cc: Like Xu <like.xu.linux@gmail.com> Cc: Yu Zhang <yu.c.zhang@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Generate set of VMX feature MSRs using first/last definitionsSean Christopherson
Add VMX MSRs to the runtime list of feature MSRs by iterating over the range of emulated MSRs instead of manually defining each MSR in the "all" list. Using the range definition reduces the cost of emulating a new VMX MSR, e.g. prevents forgetting to add an MSR to the list. Extracting the VMX MSRs from the "all" list, which is a compile-time constant, also shrinks the list to the point where the compiler can heavily optimize code that iterates over the list. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Add macros to track first...last VMX feature MSRsSean Christopherson
Add macros to track the range of VMX feature MSRs that are emulated by KVM to reduce the maintenance cost of extending the set of emulated MSRs. Note, KVM doesn't necessarily emulate all known/consumed VMX MSRs, e.g. PROCBASED_CTLS3 is consumed by KVM to enable IPI virtualization, but is not emulated as KVM doesn't emulate/virtualize IPI virtualization for nested guests. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Add a helper to query whether or not a vCPU has ever runSean Christopherson
Add a helper to query if a vCPU has run so that KVM doesn't have to open code the check on last_vmentry_cpu being set to a magic value. No functional change intended. Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Like Xu <like.xu.linux@gmail.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Rename kvm_init_msr_list() to clarify it inits multiple listsSean Christopherson
Rename kvm_init_msr_list() to kvm_init_msr_lists() to clarify that it initializes multiple lists: MSRs to save, emulated MSRs, and feature MSRs. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hotBasavaraj Natikar
The AMD [1022:15b8] USB controller loses some internal functional MSI-X context when transitioning from D0 to D3hot. BIOS normally traps D0->D3hot and D3hot->D0 transitions so it can save and restore that internal context, but some firmware in the field can't do this because it fails to clear the AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit. Clear AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit before USB controller initialization during boot. Link: https://lore.kernel.org/linux-usb/Y%2Fz9GdHjPyF2rNG3@glanzmann.de/T/#u Link: https://lore.kernel.org/r/20230329172859.699743-1-Basavaraj.Natikar@amd.com Reported-by: Thomas Glanzmann <thomas@glanzmann.de> Tested-by: Thomas Glanzmann <thomas@glanzmann.de> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Cc: stable@vger.kernel.org
2023-04-06x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outsideKirill A. Shutemov
arch_prctl(ARCH_FORCE_TAGGED_SVA) overrides the default and allows LAM and SVA to co-exist in the process. It is expected by called by the process when it knows what it is doing. arch_prctl() operates on the current process, but the same code is reachable from ptrace where it can be called on arbitrary task. Make it strict and only allow to set MM_CONTEXT_FORCE_TAGGED_SVA for the current process. Fixes: 23e5d9ec2bab ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive") Suggested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/all/20230403111020.3136-3-kirill.shutemov%40linux.intel.com
2023-04-06x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVAKirill A. Shutemov
Normally, LAM and SVA are mutually exclusive. LAM enabling will fail if SVA is already in use. Correct error code for the failure. EINTR is nonsensical there. Fixes: 23e5d9ec2bab ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/all/CACT4Y+YfqSMsZArhh25TESmG-U4jO5Hjphz87wKSnTiaw2Wrfw@mail.gmail.com Link: https://lore.kernel.org/all/20230403111020.3136-2-kirill.shutemov%40linux.intel.com
2023-04-06KVM: SVM: Return the local "r" variable from svm_set_msr()Sean Christopherson
Rename "r" to "ret" and actually return it from svm_set_msr() to reduce the probability of repeating the mistake of commit 723d5fb0ffe4 ("kvm: svm: Add IA32_FLUSH_CMD guest support"), which set "r" thinking that it would be propagated to the caller. Alternatively, the declaration of "r" could be moved into the handling of MSR_TSC_AUX, but that risks variable shadowing in the future. A wrapper for kvm_set_user_return_msr() would allow eliding a local variable, but that feels like delaying the inevitable. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMDSean Christopherson
Virtualize FLUSH_L1D so that the guest can use the performant L1D flush if one of the many mitigations might require a flush in the guest, e.g. Linux provides an option to flush the L1D when switching mms. Passthrough MSR_IA32_FLUSH_CMD for write when it's supported in hardware and exposed to the guest, i.e. always let the guest write it directly if FLUSH_L1D is fully supported. Forward writes to hardware in host context on the off chance that KVM ends up emulating a WRMSR, or in the really unlikely scenario where userspace wants to force a flush. Restrict these forwarded WRMSRs to the known command out of an abundance of caution. Passing through the MSR means the guest can throw any and all values at hardware, but doing so in host context is arguably a bit more dangerous. Link: https://lkml.kernel.org/r/CALMp9eTt3xzAEoQ038bJQ9LN0ZOXrSWsN7xnNUD%2B0SS%3DWwF7Pg%40mail.gmail.com Link: https://lore.kernel.org/all/20230201132905.549148-2-eesposit@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Move MSR_IA32_PRED_CMD WRMSR emulation to common codeSean Christopherson
Dedup the handling of MSR_IA32_PRED_CMD across VMX and SVM by moving the logic to kvm_set_msr_common(). Now that the MSR interception toggling is handled as part of setting guest CPUID, the VMX and SVM paths are identical. Opportunistically massage the code to make it a wee bit denser. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20230322011440.2195485-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: SVM: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUIDSean Christopherson
Passthrough MSR_IA32_PRED_CMD based purely on whether or not the MSR is supported and enabled, i.e. don't wait until the first write. There's no benefit to deferred passthrough, and the extra logic only adds complexity. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: VMX: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUIDSean Christopherson
Passthrough MSR_IA32_PRED_CMD based purely on whether or not the MSR is supported and enabled, i.e. don't wait until the first write. There's no benefit to deferred passthrough, and the extra logic only adds complexity. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20230322011440.2195485-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Revert MSR_IA32_FLUSH_CMD.FLUSH_L1D enablingSean Christopherson
Revert the recently added virtualizing of MSR_IA32_FLUSH_CMD, as both the VMX and SVM are fatally buggy to guests that use MSR_IA32_FLUSH_CMD or MSR_IA32_PRED_CMD, and because the entire foundation of the logic is flawed. The most immediate problem is an inverted check on @cmd that results in rejecting legal values. SVM doubles down on bugs and drops the error, i.e. silently breaks all guest mitigations based on the command MSRs. The next issue is that neither VMX nor SVM was updated to mark MSR_IA32_FLUSH_CMD as being a possible passthrough MSR, which isn't hugely problematic, but does break MSR filtering and triggers a WARN on VMX designed to catch this exact bug. The foundational issues stem from the MSR_IA32_FLUSH_CMD code reusing logic from MSR_IA32_PRED_CMD, which in turn was likely copied from KVM's support for MSR_IA32_SPEC_CTRL. The copy+paste from MSR_IA32_SPEC_CTRL was misguided as MSR_IA32_PRED_CMD (and MSR_IA32_FLUSH_CMD) is a write-only MSR, i.e. doesn't need the same "deferred passthrough" shenanigans as MSR_IA32_SPEC_CTRL. Revert all MSR_IA32_FLUSH_CMD enabling in one fell swoop so that there is no point where KVM advertises, but does not support, L1D_FLUSH. This reverts commits 45cf86f26148e549c5ba4a8ab32a390e4bde216e, 723d5fb0ffe4c02bd4edf47ea02c02e454719f28, and a807b78ad04b2eaa348f52f5cc7702385b6de1ee. Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lkml.kernel.org/r/20230317190432.GA863767%40dev-arch.thelio-3990X Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Message-Id: <20230322011440.2195485-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-05x86/mm: try VMA lock-based page fault handling firstSuren Baghdasaryan
Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails. Link: https://lkml.kernel.org/r/20230227173632.3292573-30-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05KVM: x86/pmu: Disallow legacy LBRs if architectural LBRs are availableSean Christopherson
Disallow enabling LBR support if the CPU supports architectural LBRs. Traditional LBR support is absent on CPU models that have architectural LBRs, and KVM doesn't yet support arch LBRs, i.e. KVM will pass through non-existent MSRs if userspace enables LBRs for the guest. Cc: stable@vger.kernel.org Cc: Yang Weijiang <weijiang.yang@intel.com> Cc: Like Xu <like.xu.linux@gmail.com> Reported-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: be635e34c284 ("KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES") Tested-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230128001427.2548858-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-05KVM: x86: set "mitigate_smt_rsb" storage-class-specifier to staticTom Rix
smatch reports arch/x86/kvm/x86.c:199:20: warning: symbol 'mitigate_smt_rsb' was not declared. Should it be static? This variable is only used in one file so it should be static. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230404010141.1913667-1-trix@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>