summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm
AgeCommit message (Collapse)Author
2022-08-05Merge tag 'asm-generic-6.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "There are three independent sets of changes: - Sai Prakash Ranjan adds tracing support to the asm-generic version of the MMIO accessors, which is intended to help understand problems with device drivers and has been part of Qualcomm's vendor kernels for many years - A patch from Sebastian Siewior to rework the handling of IRQ stacks in softirqs across architectures, which is needed for enabling PREEMPT_RT - The last patch to remove the CONFIG_VIRT_TO_BUS option and some of the code behind that, after the last users of this old interface made it in through the netdev, scsi, media and staging trees" * tag 'asm-generic-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: uapi: asm-generic: fcntl: Fix typo 'the the' in comment arch/*/: remove CONFIG_VIRT_TO_BUS soc: qcom: geni: Disable MMIO tracing for GENI SE serial: qcom_geni_serial: Disable MMIO tracing for geni serial asm-generic/io: Add logging support for MMIO accessors KVM: arm64: Add a flag to disable MMIO trace for nVHE KVM lib: Add register read/write tracing support drm/meson: Fix overflow implicit truncation warnings irqchip/tegra: Fix overflow implicit truncation warnings coresight: etm4x: Use asm-generic IO memory barriers arm64: io: Use asm-generic high level MMIO accessors arch/*: Disable softirq stacks on PREEMPT_RT.
2022-08-04Merge tag 'pci-v5.20-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Consolidate duplicated 'next function' scanning and extend to allow 'isolated functions' on s390, similar to existing hypervisors (Niklas Schnelle) Resource management: - Implement pci_iobar_pfn() for sparc, which allows us to remove the sparc-specific pci_mmap_page_range() and pci_mmap_resource_range(). This removes the ability to map the entire PCI I/O space using /proc/bus/pci, but we believe that's already been broken since v2.6.28 (Arnd Bergmann) - Move common PCI definitions to asm-generic/pci.h and rework others to be be more specific and more encapsulated in arches that need them (Stafford Horne) Power management: - Convert drivers to new *_PM_OPS macros to avoid need for '#ifdef CONFIG_PM_SLEEP' or '__maybe_unused' (Bjorn Helgaas) Virtualization: - Add ACS quirk for Broadcom BCM5750x multifunction NICs that isolate the functions but don't advertise an ACS capability (Pavan Chebbi) Error handling: - Clear PCI Status register during enumeration in case firmware left errors logged (Kai-Heng Feng) - When we have native control of AER, enable error reporting for all devices that support AER. Previously only a few drivers enabled this (Stefan Roese) - Keep AER error reporting enabled for switches. Previously we enabled this during enumeration but immediately disabled it (Stefan Roese) - Iterate over error counters instead of error strings to avoid printing junk in AER sysfs counters (Mohamed Khalfella) ASPM: - Remove pcie_aspm_pm_state_change() so ASPM config changes, e.g., via sysfs, are not lost across power state changes (Kai-Heng Feng) Endpoint framework: - Don't stop an EPC when unbinding an EPF from it (Shunsuke Mie) Endpoint embedded DMA controller driver: - Simplify and clean up support for the DesignWare embedded DMA (eDMA) controller (Frank Li, Serge Semin) Broadcom STB PCIe controller driver: - Avoid config space accesses when link is down because we can't recover from the CPU aborts these cause (Jim Quinlan) - Look for power regulators described under Root Ports in DT and enable them before scanning the secondary bus (Jim Quinlan) - Disable/enable regulators in suspend/resume (Jim Quinlan) Freescale i.MX6 PCIe controller driver: - Simplify and clean up clock and PHY management (Richard Zhu) - Disable/enable regulators in suspend/resume (Richard Zhu) - Set PCIE_DBI_RO_WR_EN before writing DBI registers (Richard Zhu) - Allow speeds faster than Gen2 (Richard Zhu) - Make link being down a non-fatal error so controller probe doesn't fail if there are no Endpoints connected (Richard Zhu) Loongson PCIe controller driver: - Add ACPI and MCFG support for Loongson LS7A (Huacai Chen) - Avoid config reads to non-existent LS2K/LS7A devices because a hardware defect causes machine hangs (Huacai Chen) - Work around LS7A integrated devices that report incorrect Interrupt Pin values (Jianmin Lv) Marvell Aardvark PCIe controller driver: - Add support for AER and Slot capability on emulated bridge (Pali Rohár) MediaTek PCIe controller driver: - Add Airoha EN7532 to DT binding (John Crispin) - Allow building of driver for ARCH_AIROHA (Felix Fietkau) MediaTek PCIe Gen3 controller driver: - Print decoded LTSSM state when the link doesn't come up (Jianjun Wang) NVIDIA Tegra194 PCIe controller driver: - Convert DT binding to json-schema (Vidya Sagar) - Add DT bindings and driver support for Tegra234 Root Port and Endpoint mode (Vidya Sagar) - Fix some Root Port interrupt handling issues (Vidya Sagar) - Set default Max Payload Size to 256 bytes (Vidya Sagar) - Fix Data Link Feature capability programming (Vidya Sagar) - Extend Endpoint mode support to devices beyond Controller-5 (Vidya Sagar) Qualcomm PCIe controller driver: - Rework clock, reset, PHY power-on ordering to avoid hangs and improve consistency (Robert Marko, Christian Marangi) - Move pipe_clk handling to PHY drivers (Dmitry Baryshkov) - Add IPQ60xx support (Selvam Sathappan Periakaruppan) - Allow ASPM L1 and substates for 2.7.0 (Krishna chaitanya chundru) - Add support for more than 32 MSI interrupts (Dmitry Baryshkov) Renesas R-Car PCIe controller driver: - Convert DT binding to json-schema (Herve Codina) - Add Renesas RZ/N1D (R9A06G032) to rcar-gen2 DT binding and driver (Herve Codina) Samsung Exynos PCIe controller driver: - Fix phy-exynos-pcie driver so it follows the 'phy_init() before phy_power_on()' PHY programming model (Marek Szyprowski) Synopsys DesignWare PCIe controller driver: - Simplify and clean up the DWC core extensively (Serge Semin) - Fix an issue with programming the ATU for regions that cross a 4GB boundary (Serge Semin) - Enable the CDM check if 'snps,enable-cdm-check' exists; previously we skipped it if 'num-lanes' was absent (Serge Semin) - Allocate a 32-bit DMA-able page to be MSI target instead of using a driver data structure that may not be addressable with 32-bit address (Will McVicker) - Add DWC core support for more than 32 MSI interrupts (Dmitry Baryshkov) Xilinx Versal CPM PCIe controller driver: - Add DT binding and driver support for Versal CPM5 Gen5 Root Port (Bharat Kumar Gogada)" * tag 'pci-v5.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (150 commits) PCI: imx6: Support more than Gen2 speed link mode PCI: imx6: Set PCIE_DBI_RO_WR_EN before writing DBI registers PCI: imx6: Reformat suspend callback to keep symmetric with resume PCI: imx6: Move the imx6_pcie_ltssm_disable() earlier PCI: imx6: Disable clocks in reverse order of enable PCI: imx6: Do not hide PHY driver callbacks and refine the error handling PCI: imx6: Reduce resume time by only starting link if it was up before suspend PCI: imx6: Mark the link down as non-fatal error PCI: imx6: Move regulator enable out of imx6_pcie_deassert_core_reset() PCI: imx6: Turn off regulator when system is in suspend mode PCI: imx6: Call host init function directly in resume PCI: imx6: Disable i.MX6QDL clock when disabling ref clocks PCI: imx6: Propagate .host_init() errors to caller PCI: imx6: Collect clock enables in imx6_pcie_clk_enable() PCI: imx6: Factor out ref clock disable to match enable PCI: imx6: Move imx6_pcie_clk_disable() earlier PCI: imx6: Move imx6_pcie_enable_ref_clk() earlier PCI: imx6: Move PHY management functions together PCI: imx6: Move imx6_pcie_grp_offset(), imx6_pcie_configure_type() earlier PCI: imx6: Convert to NOIRQ_SYSTEM_SLEEP_PM_OPS() ...
2022-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "Quite a large pull request due to a selftest API overhaul and some patches that had come in too late for 5.19. ARM: - Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack - Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to allign with the rest of the infrastructure - Disagregation of the vcpu flags in separate sets to better track their use model. - A fix for the GICv2-on-v3 selftest - A small set of cosmetic fixes RISC-V: - Track ISA extensions used by Guest using bitmap - Added system instruction emulation framework - Added CSR emulation framework - Added gfp_custom flag in struct kvm_mmu_memory_cache - Added G-stage ioremap() and iounmap() functions - Added support for Svpbmt inside Guest s390: - add an interface to provide a hypervisor dump for secure guests - improve selftests to use TAP interface - enable interpretive execution of zPCI instructions (for PCI passthrough) - First part of deferred teardown - CPU Topology - PV attestation - Minor fixes x86: - Permit guests to ignore single-bit ECC errors - Intel IPI virtualization - Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS - PEBS virtualization - Simplify PMU emulation by just using PERF_TYPE_RAW events - More accurate event reinjection on SVM (avoid retrying instructions) - Allow getting/setting the state of the speaker port data bit - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent - "Notify" VM exit (detect microarchitectural hangs) for Intel - Use try_cmpxchg64 instead of cmpxchg64 - Ignore benign host accesses to PMU MSRs when PMU is disabled - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior - Allow NX huge page mitigation to be disabled on a per-vm basis - Port eager page splitting to shadow MMU as well - Enable CMCI capability by default and handle injected UCNA errors - Expose pid of vcpu threads in debugfs - x2AVIC support for AMD - cleanup PIO emulation - Fixes for LLDT/LTR emulation - Don't require refcounted "struct page" to create huge SPTEs - Miscellaneous cleanups: - MCE MSR emulation - Use separate namespaces for guest PTEs and shadow PTEs bitmasks - PIO emulation - Reorganize rmap API, mostly around rmap destruction - Do not workaround very old KVM bugs for L0 that runs with nesting enabled - new selftests API for CPUID Generic: - Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache - new selftests API using struct kvm_vcpu instead of a (vm, id) tuple" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits) selftests: kvm: set rax before vmcall selftests: KVM: Add exponent check for boolean stats selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test selftests: KVM: Check stat name before other fields KVM: x86/mmu: remove unused variable RISC-V: KVM: Add support for Svpbmt inside Guest/VM RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap() RISC-V: KVM: Add G-stage ioremap() and iounmap() functions KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache RISC-V: KVM: Add extensible CSR emulation framework RISC-V: KVM: Add extensible system instruction emulation framework RISC-V: KVM: Factor-out instruction emulation into separate sources RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function RISC-V: KVM: Fix variable spelling mistake RISC-V: KVM: Improve ISA extension by using a bitmap KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs() KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT KVM: x86: Do not block APIC write for non ICR registers ...
2022-08-03Merge tag 'efi-next-for-v5.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: - Enable mirrored memory for arm64 - Fix up several abuses of the efivar API - Refactor the efivar API in preparation for moving the 'business logic' part of it into efivarfs - Enable ACPI PRM on arm64 * tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits) ACPI: Move PRM config option under the main ACPI config ACPI: Enable Platform Runtime Mechanism(PRM) support on ARM64 ACPI: PRM: Change handler_addr type to void pointer efi: Simplify arch_efi_call_virt() macro drivers: fix typo in firmware/efi/memmap.c efi: vars: Drop __efivar_entry_iter() helper which is no longer used efi: vars: Use locking version to iterate over efivars linked lists efi: pstore: Omit efivars caching EFI varstore access layer efi: vars: Add thin wrapper around EFI get/set variable interface efi: vars: Don't drop lock in the middle of efivar_init() pstore: Add priv field to pstore_record for backend specific use Input: applespi - avoid efivars API and invoke EFI services directly selftests/kexec: remove broken EFI_VARS secure boot fallback check brcmfmac: Switch to appropriate helper to load EFI variable contents iwlwifi: Switch to proper EFI variable store interface media: atomisp_gmin_platform: stop abusing efivar API efi: efibc: avoid efivar API for setting variables efi: avoid efivars layer when loading SSDTs from variables efi: Correct comment on efi_memmap_alloc memblock: Disable mirror feature if kernelcore is not specified ...
2022-08-02Merge tag 'random-6.0-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "Though there's been a decent amount of RNG-related development during this last cycle, not all of it is coming through this tree, as this cycle saw a shift toward tackling early boot time seeding issues, which took place in other trees as well. Here's a summary of the various patches: - The CONFIG_ARCH_RANDOM .config option and the "nordrand" boot option have been removed, as they overlapped with the more widely supported and more sensible options, CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu". This change allowed simplifying a bit of arch code. - x86's RDRAND boot time test has been made a bit more robust, with RDRAND disabled if it's clearly producing bogus results. This would be a tip.git commit, technically, but I took it through random.git to avoid a large merge conflict. - The RNG has long since mixed in a timestamp very early in boot, on the premise that a computer that does the same things, but does so starting at different points in wall time, could be made to still produce a different RNG state. Unfortunately, the clock isn't set early in boot on all systems, so now we mix in that timestamp when the time is actually set. - User Mode Linux now uses the host OS's getrandom() syscall to generate a bootloader RNG seed and later on treats getrandom() as the platform's RDRAND-like faculty. - The arch_get_random_{seed_,}_long() family of functions is now arch_get_random_{seed_,}_longs(), which enables certain platforms, such as s390, to exploit considerable performance advantages from requesting multiple CPU random numbers at once, while at the same time compiling down to the same code as before on platforms like x86. - A small cleanup changing a cmpxchg() into a try_cmpxchg(), from Uros. - A comment spelling fix" More info about other random number changes that come in through various architecture trees in the full commentary in the pull request: https://lore.kernel.org/all/20220731232428.2219258-1-Jason@zx2c4.com/ * tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: random: correct spelling of "overwrites" random: handle archrandom with multiple longs um: seed rng using host OS rng random: use try_cmpxchg in _credit_init_bits timekeeping: contribute wall clock to rng on time change x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu" random: remove CONFIG_ARCH_RANDOM
2022-08-02Merge tag 'integrity-v6.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "Aside from the one EVM cleanup patch, all the other changes are kexec related. On different architectures different keyrings are used to verify the kexec'ed kernel image signature. Here are a number of preparatory cleanup patches and the patches themselves for making the keyrings - builtin_trusted_keyring, .machine, .secondary_trusted_keyring, and .platform - consistent across the different architectures" * tag 'integrity-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: kexec, KEYS, s390: Make use of built-in and secondary keyring for signature verification arm64: kexec_file: use more system keyrings to verify kernel image signature kexec, KEYS: make the code in bzImage64_verify_sig generic kexec: clean up arch_kexec_kernel_verify_sig kexec: drop weak attribute from functions kexec_file: drop weak attribute from functions evm: Use IS_ENABLED to initialize .enabled
2022-08-02Merge branch 'turbostat' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: "Only updating the turbostat tool here, no kernel changes" * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: version 2022.07.28 tools/power turbostat: do not decode ACC for ICX and SPR tools/power turbostat: fix SPR PC6 limits tools/power turbostat: cleanup 'automatic_cstate_conversion_probe()' tools/power turbostat: separate SPR from ICX tools/power turbosstat: fix comment tools/power turbostat: Support RAPTORLAKE P tools/power turbostat: add support for ALDERLAKE_N tools/power turbostat: dump secondary Turbo-Ratio-Limit tools/power turbostat: simplify dump_turbo_ratio_limits() tools/power turbostat: dump CPUID.7.EDX.Hybrid tools/power turbostat: update turbostat.8 tools/power turbostat: Show uncore frequency tools/power turbostat: Fix file pointer leak tools/power turbostat: replace strncmp with single character compare tools/power turbostat: print the kernel boot commandline tools/power turbostat: Introduce support for RaptorLake
2022-08-01Merge tag 'perf-core-2022-08-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: - Fix Intel Alder Lake PEBS memory access latency & data source profiling info bugs. - Use Intel large-PEBS hardware feature in more circumstances, to reduce PMI overhead & reduce sampling data. - Extend the lost-sample profiling output with the PERF_FORMAT_LOST ABI variant, which tells tooling the exact number of samples lost. - Add new IBS register bits definitions. - AMD uncore events: Add PerfMonV2 DF (Data Fabric) enhancements. * tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/ibs: Add new IBS register bits into header perf/x86/intel: Fix PEBS data source encoding for ADL perf/x86/intel: Fix PEBS memory access info encoding for ADL perf/core: Add a new read format to get a number of lost samples perf/x86/amd/uncore: Add PerfMonV2 RDPMC assignments perf/x86/amd/uncore: Add PerfMonV2 DF event format perf/x86/amd/uncore: Detect available DF counters perf/x86/amd/uncore: Use attr_update for format attributes perf/x86/amd/uncore: Use dynamic events array x86/events/intel/ds: Enable large PEBS for PERF_SAMPLE_WEIGHT_TYPE
2022-08-01Merge tag 'x86_core_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Have invalid MSR accesses warnings appear only once after a pr_warn_once() change broke that - Simplify {JMP,CALL}_NOSPEC and let the objtool retpoline patching infra take care of them instead of having unreadable alternative macros there * tag 'x86_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/extable: Fix ex_handler_msr() print condition x86,nospec: Simplify {JMP,CALL}_NOSPEC
2022-08-01Merge tag 'x86_cpu_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Borislav Petkov: - Remove the vendor check when selecting MWAIT as the default idle state - Respect idle=nomwait when supplied on the kernel cmdline - Two small cleanups * tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Use MSR_IA32_MISC_ENABLE constants x86: Fix comment for X86_FEATURE_ZEN x86: Remove vendor checks from prefer_mwait_c1_over_halt x86: Handle idle=nomwait cmdline properly for x86_idle
2022-08-01Merge tag 'x86_fpu_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu update from Borislav Petkov: - Add machinery to initialize AMX register state in order for AMX-capable CPUs to be able to enter deeper low-power state * tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_idle: Add a new flag to initialize the AMX state x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
2022-08-01Merge tag 'x86_mm_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Borislav Petkov: - Rename a PKRU macro to make more sense when reading the code - Update pkeys documentation - Avoid reading contended mm's TLB generation var if not absolutely necessary along with fixing a case where arch_tlbbatch_flush() doesn't adhere to the generation scheme and thus violates the conditions for the above avoidance. * tag 'x86_mm_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/tlb: Ignore f->new_tlb_gen when zero x86/pkeys: Clarify PKRU_AD_KEY macro Documentation/protection-keys: Clean up documentation for User Space pkeys x86/mm/tlb: Avoid reading mm_tlb_gen when possible
2022-08-01Merge remote-tracking branch 'kvm/next' into kvm-next-5.20Paolo Bonzini
KVM/s390, KVM/x86 and common infrastructure changes for 5.20 x86: * Permit guests to ignore single-bit ECC errors * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Cleanups for MCE MSR emulation s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes Generic: * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple x86: * Use try_cmpxchg64 instead of cmpxchg64 * Bugfixes * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * x86/MMU: Allow NX huge pages to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs x86 cleanups: * Use separate namespaces for guest PTEs and shadow PTEs bitmasks * PIO emulation * Reorganize rmap API, mostly around rmap destruction * Do not workaround very old KVM bugs for L0 that runs with nesting enabled * new selftests API for CPUID
2022-07-28tools/power turbostat: dump secondary Turbo-Ratio-LimitLen Brown
Intel Performance Hybrid processors have a 2nd MSR describing the turbo limits enforced on the Ecores. Note, TRL and Secondary-TRL are usually R/O information, but on overclock-capable parts, they can be written. Signed-off-by: Len Brown <len.brown@intel.com>
2022-07-27Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"Borislav Petkov
This reverts commit 007faec014cb5d26983c1f86fd08c6539b41392e. Now that hyperv does its own protocol negotiation: 49d6a3c062a1 ("x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM") revert this exposure of the sev_es_ghcb_hv_call() helper. Cc: Wei Liu <wei.liu@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by:Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com
2022-07-27perf/x86/ibs: Add new IBS register bits into headerRavi Bangoria
IBS support has been enhanced with two new features in upcoming uarch: 1. DataSrc extension and 2. L3 miss filtering. Additional set of bits has been introduced in IBS registers to use these features. Define these new bits into arch/x86/ header. [ bp: Massage commit message. ] Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20220604044519.594-7-ravi.bangoria@amd.com
2022-07-25random: handle archrandom with multiple longsJason A. Donenfeld
The archrandom interface was originally designed for x86, which supplies RDRAND/RDSEED for receiving random words into registers, resulting in one function to generate an int and another to generate a long. However, other architectures don't follow this. On arm64, the SMCCC TRNG interface can return between one and three longs. On s390, the CPACF TRNG interface can return arbitrary amounts, with four longs having the same cost as one. On UML, the os_getrandom() interface can return arbitrary amounts. So change the api signature to take a "max_longs" parameter designating the maximum number of longs requested, and then return the number of longs generated. Since callers need to check this return value and loop anyway, each arch implementation does not bother implementing its own loop to try again to fill the maximum number of longs. Additionally, all existing callers pass in a constant max_longs parameter. Taken together, these two things mean that the codegen doesn't really change much for one-word-at-a-time platforms, while performance is greatly improved on platforms such as s390. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-24Merge tag 'x86_urgent_for_v5.19_rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "A couple more retbleed fallout fixes. It looks like their urgency is decreasing so it seems like we've managed to catch whatever snafus the limited -rc testing has exposed. Maybe we're getting ready... :) - Make retbleed mitigations 64-bit only (32-bit will need a bit more work if even needed, at all). - Prevent return thunks patching of the LKDTM modules as it is not needed there - Avoid writing the SPEC_CTRL MSR on every kernel entry on eIBRS parts - Enhance error output of apply_returns() when it fails to patch a return thunk - A sparse fix to the sev-guest module - Protect EFI fw calls by issuing an IBPB on AMD" * tag 'x86_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Make all RETbleed mitigations 64-bit only lkdtm: Disable return thunks in rodata.c x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts x86/alternative: Report missing return thunk details virt: sev-guest: Pass the appropriate argument type to iounmap() x86/amd: Use IBPB for firmware calls
2022-07-22PCI: Move isa_dma_bridge_buggy out of asm/dma.hStafford Horne
The isa_dma_bridge_buggy symbol is only used for x86_32, and only x86_32 platforms or quirks ever set it. Add a new linux/isa-dma.h header that #defines isa_dma_bridge_buggy to 0 except on x86_32, where we keep it as a variable, and remove all the arch- specific definitions. [bhelgaas: commit log] Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/r/20220722214944.831438-3-shorne@gmail.com Signed-off-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2022-07-22PCI: Remove pci_get_legacy_ide_irq() and asm-generic/pci.hStafford Horne
pci_get_legacy_ide_irq() is only used on platforms that support PNP, so many architectures define it but never use it. Replace uses of it with ATA_PRIMARY_IRQ() and ATA_SECONDARY_IRQ(), which provide the same functionality. Since pci_get_legacy_ide_irq() is no longer used, remove all the architecture-specific definitions of it as well as asm-generic/pci.h, which only provides pci_get_legacy_ide_irq() [bhelgaas: commit log] Co-developed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20220722214944.831438-2-shorne@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Pierre Morel <pmorel@linux.ibm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-21mmu_gather: Remove per arch tlb_{start,end}_vma()Peter Zijlstra
Scattered across the archs are 3 basic forms of tlb_{start,end}_vma(). Provide two new MMU_GATHER_knobs to enumerate them and remove the per arch tlb_{start,end}_vma() implementations. - MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range() but does *NOT* want to call it for each VMA. - MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the invalidate across multiple VMAs if possible. With these it is possible to capture the three forms: 1) empty stubs; select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS 2) start: flush_cache_range(), end: empty; select MMU_GATHER_MERGE_VMAS 3) start: flush_cache_range(), end: flush_tlb_range(); default Obviously, if the architecture does not have flush_cache_range() then it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-21x86,nospec: Simplify {JMP,CALL}_NOSPECPeter Zijlstra
Have {JMP,CALL}_NOSPEC generate the same code GCC does for indirect calls and rely on the objtool retpoline patching infrastructure. There's no reason these should be alternatives while the vast bulk of compiler generated retpolines are not. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-07-19x86/fpu: Add a helper to prepare AMX state for low-power CPU idleChang S. Bae
When a CPU enters an idle state, a non-initialized AMX register state may be the cause of preventing a deeper low-power state. Other extended register states whether initialized or not do not impact the CPU idle state. The new helper can ensure the AMX state is initialized before the CPU is idle, and it will be used by the intel idle driver. Check the AMX_TILE feature bit before using XGETBV1 as a chain of dependencies was established via cpuid_deps[]: AMX->XFD->XGETBV1. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20220608164748.11864-2-chang.seok.bae@intel.com
2022-07-19x86/mm/tlb: Ignore f->new_tlb_gen when zeroNadav Amit
Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") introduced an optimization to skip superfluous TLB flushes based on the generation provided in flush_tlb_info. However, arch_tlbbatch_flush() does not provide any generation in flush_tlb_info and populates the flush_tlb_info generation with 0. This 0 is causes the flush_tlb_info to be interpreted as a superfluous, old flush. As a result, try_to_unmap_one() would not perform any TLB flushes. Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is anyhow is an invalid generation value. To avoid future confusion, introduce TLB_GENERATION_INVALID constant and use it properly. Add warnings to ensure no partial flushes are done with TLB_GENERATION_INVALID or when f->mm is NULL, since this does not make any sense. In addition, add the missing unlikely(). [ dhansen: change VM_BUG_ON() -> VM_WARN_ON(), clarify changelog ] Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Hugh Dickins <hughd@google.com> Link: https://lkml.kernel.org/r/20220710232837.3618-1-namit@vmware.com
2022-07-18x86/amd: Use IBPB for firmware callsPeter Zijlstra
On AMD IBRS does not prevent Retbleed; as such use IBPB before a firmware call to flush the branch history state. And because in order to do an EFI call, the kernel maps a whole lot of the kernel page table into the EFI page table, do an IBPB just in case in order to prevent the scenario of poisoning the BTB and causing an EFI call using the unprotected RET there. [ bp: Massage. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220715194550.793957-1-cascardo@canonical.com
2022-07-18random: remove CONFIG_ARCH_RANDOMJason A. Donenfeld
When RDRAND was introduced, there was much discussion on whether it should be trusted and how the kernel should handle that. Initially, two mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and "nordrand", a boot-time switch. Later the thinking evolved. With a properly designed RNG, using RDRAND values alone won't harm anything, even if the outputs are malicious. Rather, the issue is whether those values are being *trusted* to be good or not. And so a new set of options were introduced as the real ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu". With these options, RDRAND is used, but it's not always credited. So in the worst case, it does nothing, and in the best case, maybe it helps. Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the center and became something certain platforms force-select. The old options don't really help with much, and it's a bit odd to have special handling for these instructions when the kernel can deal fine with the existence or untrusted existence or broken existence or non-existence of that CPU capability. Simplify the situation by removing CONFIG_ARCH_RANDOM and using the ordinary asm-generic fallback pattern instead, keeping the two options that are actually used. For now it leaves "nordrand" for now, as the removal of that will take a different route. Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-15kexec: drop weak attribute from functionsNaveen N. Rao
Drop __weak attribute from functions in kexec_core.c: - machine_kexec_post_load() - arch_kexec_protect_crashkres() - arch_kexec_unprotect_crashkres() - crash_free_reserved_phys_range() Link: https://lkml.kernel.org/r/c0f6219e03cb399d166d518ab505095218a902dd.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15kexec_file: drop weak attribute from functionsNaveen N. Rao
As requested (http://lkml.kernel.org/r/87ee0q7b92.fsf@email.froward.int.ebiederm.org), this series converts weak functions in kexec to use the #ifdef approach. Quoting the 3e35142ef99fe ("kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]") changelog: : Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") : [1], binutils (v2.36+) started dropping section symbols that it thought : were unused. This isn't an issue in general, but with kexec_file.c, gcc : is placing kexec_arch_apply_relocations[_add] into a separate : .text.unlikely section and the section symbol ".text.unlikely" is being : dropped. Due to this, recordmcount is unable to find a non-weak symbol in : .text.unlikely to generate a relocation record against. This patch (of 2); Drop __weak attribute from functions in kexec_file.c: - arch_kexec_kernel_image_probe() - arch_kimage_file_post_load_cleanup() - arch_kexec_kernel_image_load() - arch_kexec_locate_mem_hole() - arch_kexec_kernel_verify_sig() arch_kexec_kernel_image_load() calls into kexec_image_load_default(), so drop the static attribute for the latter. arch_kexec_kernel_verify_sig() is not overridden by any architecture, so drop the __weak attribute. Link: https://lkml.kernel.org/r/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/2cd7ca1fe4d6bb6ca38e3283c717878388ed6788.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-14x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_currentNathan Chancellor
Clang warns: arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection] DEFINE_PER_CPU(u64, x86_spec_ctrl_current); ^ arch/x86/include/asm/nospec-branch.h:283:12: note: previous declaration is here extern u64 x86_spec_ctrl_current; ^ 1 error generated. The declaration should be using DECLARE_PER_CPU instead so all attributes stay in sync. Cc: stable@vger.kernel.org Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-14KVM: x86: Restrict get_mt_mask() to a u8, use KVM_X86_OP_OPTIONAL_RET0Sean Christopherson
Restrict get_mt_mask() to a u8 and reintroduce using a RET0 static_call for the SVM implementation. EPT stores the memtype information in the lower 8 bits (bits 6:3 to be precise), and even returns a shifted u8 without an explicit cast to a larger type; there's no need to return a full u64. Note, RET0 doesn't play nice with a u64 return on 32-bit kernels, see commit bf07be36cd88 ("KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask"). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220714153707.3239119-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-13KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specificSean Christopherson
Add a "UD" clause to KVM_X86_QUIRK_MWAIT_NEVER_FAULTS to make it clear that the quirk only controls the #UD behavior of MONITOR/MWAIT. KVM doesn't currently enforce fault checks when MONITOR/MWAIT are supported, but that could change in the future. SVM also has a virtualization hole in that it checks all faults before intercepts, and so "never faults" is already a lie when running on SVM. Fixes: bfbcc81bb82c ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior") Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220711225753.1073989-4-seanjc@google.com
2022-07-12KVM: x86/mmu: Replace UNMAPPED_GVA with INVALID_GPA for gva_to_gpa()Hou Wenlong
The result of gva_to_gpa() is physical address not virtual address, it is odd that UNMAPPED_GVA macro is used as the result for physical address. Replace UNMAPPED_GVA with INVALID_GPA and drop UNMAPPED_GVA macro. No functional change intended. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/6104978956449467d3c68f1ad7f2c2f6d771d0ee.1656667239.git.houwenlong.hwl@antgroup.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-11Merge tag 'x86_bugs_retbleed' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 retbleed fixes from Borislav Petkov: "Just when you thought that all the speculation bugs were addressed and solved and the nightmare is complete, here's the next one: speculating after RET instructions and leaking privileged information using the now pretty much classical covert channels. It is called RETBleed and the mitigation effort and controlling functionality has been modelled similar to what already existing mitigations provide" * tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) x86/speculation: Disable RRSBA behavior x86/kexec: Disable RET on kexec x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry x86/bugs: Add Cannon lake to RETBleed affected CPU list x86/retbleed: Add fine grained Kconfig knobs x86/cpu/amd: Enumerate BTC_NO x86/common: Stamp out the stepping madness KVM: VMX: Prevent RSB underflow before vmenter x86/speculation: Fill RSB on vmexit for IBRS KVM: VMX: Fix IBRS handling after vmexit KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS KVM: VMX: Convert launched argument to flags KVM: VMX: Flatten __vmx_vcpu_run() objtool: Re-add UNWIND_HINT_{SAVE_RESTORE} x86/speculation: Remove x86_spec_ctrl_mask x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit x86/speculation: Fix SPEC_CTRL write on SMT state change x86/speculation: Fix firmware entry SPEC_CTRL handling x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n ...
2022-07-09x86/speculation: Disable RRSBA behaviorPawan Gupta
Some Intel processors may use alternate predictors for RETs on RSB-underflow. This condition may be vulnerable to Branch History Injection (BHI) and intramode-BTI. Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines, eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against such attacks. However, on RSB-underflow, RET target prediction may fallback to alternate predictors. As a result, RET's predicted target may get influenced by branch history. A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback behavior when in kernel mode. When set, RETs will not take predictions from alternate predictors, hence mitigating RETs as well. Support for this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2). For spectre v2 mitigation, when a user selects a mitigation that protects indirect CALLs and JMPs against BHI and intramode-BTI, set RRSBA_DIS_S also to protect RETs for RSB-underflow case. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-01x86/xen: Use clear_bss() for Xen PV guestsJuergen Gross
Instead of clearing the bss area in assembly code, use the clear_bss() function. This requires to pass the start_info address as parameter to xen_start_kernel() in order to avoid the xen_start_info being zeroed again. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220630071441.28576-2-jgross@suse.com
2022-06-29x86/retbleed: Add fine grained Kconfig knobsPeter Zijlstra
Do fine-grained Kconfig for all the various retbleed parts. NOTE: if your compiler doesn't support return thunks this will silently 'upgrade' your mitigation to IBPB, you might not like this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-28efi: Simplify arch_efi_call_virt() macroSudeep Holla
Currently, the arch_efi_call_virt() assumes all users of it will have defined a type 'efi_##f##_t' to make use of it. Simplify the arch_efi_call_virt() macro by eliminating the explicit need for efi_##f##_t type for every user of this macro. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [ardb: apply Sudeep's ARM fix to i686, Loongarch and RISC-V too] Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-28arch/*/: remove CONFIG_VIRT_TO_BUSArnd Bergmann
All architecture-independent users of virt_to_bus() and bus_to_virt() have been fixed to use the dma mapping interfaces or have been removed now. This means the definitions on most architectures, and the CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed. The only exceptions to this are a few network and scsi drivers for m68k Amiga and VME machines and ppc32 Macintosh. These drivers work correctly with the old interfaces and are probably not worth changing. On alpha and parisc, virt_to_bus() were still used in asm/floppy.h. alpha can use isa_virt_to_bus() like x86 does, and parisc can just open-code the virt_to_phys() here, as this is architecture specific code. I tried updating the bus-virt-phys-mapping.rst documentation, which started as an email from Linus to explain some details of the Linux-2.0 driver interfaces. The bits about virt_to_bus() were declared obsolete backin 2000, and the rest is not all that relevant any more, so in the end I just decided to remove the file completely. Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-06-27x86/cpu/amd: Enumerate BTC_NOAndrew Cooper
BTC_NO indicates that hardware is not susceptible to Branch Type Confusion. Zen3 CPUs don't suffer BTC. Hypervisors are expected to synthesise BTC_NO when it is appropriate given the migration pool, to prevent kernels using heuristics. [ bp: Massage. ] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Fill RSB on vmexit for IBRSJosh Poimboeuf
Prevent RSB underflow/poisoning attacks with RSB. While at it, add a bunch of comments to attempt to document the current state of tribal knowledge about RSB attacks and what exactly is being mitigated. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27KVM: VMX: Prevent guest RSB poisoning attacks with eIBRSJosh Poimboeuf
On eIBRS systems, the returns in the vmexit return path from __vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks. Fix that by moving the post-vmexit spec_ctrl handling to immediately after the vmexit. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}Josh Poimboeuf
Commit c536ed2fffd5 ("objtool: Remove SAVE/RESTORE hints") removed the save/restore unwind hints because they were no longer needed. Now they're going to be needed again so re-add them. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Fix firmware entry SPEC_CTRL handlingJosh Poimboeuf
The firmware entry code may accidentally clear STIBP or SSBD. Fix that. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=nJosh Poimboeuf
If a kernel is built with CONFIG_RETPOLINE=n, but the user still wants to mitigate Spectre v2 using IBRS or eIBRS, the RSB filling will be silently disabled. There's nothing retpoline-specific about RSB buffer filling. Remove the CONFIG_RETPOLINE guards around it. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/cpu/amd: Add Spectral ChickenPeter Zijlstra
Zen2 uarchs have an undocumented, unnamed, MSR that contains a chicken bit for some speculation behaviour. It needs setting. Note: very belatedly AMD released naming; it's now officially called MSR_AMD64_DE_CFG2 and MSR_AMD64_DE_CFG2_SUPPRESS_NOBR_PRED_BIT but shall remain the SPECTRAL CHICKEN. Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27objtool: Add entry UNRET validationPeter Zijlstra
Since entry asm is tricky, add a validation pass that ensures the retbleed mitigation has been done before the first actual RET instruction. Entry points are those that either have UNWIND_HINT_ENTRY, which acts as UNWIND_HINT_EMPTY but marks the instruction as an entry point, or those that have UWIND_HINT_IRET_REGS at +0. This is basically a variant of validate_branch() that is intra-function and it will simply follow all branches from marked entry points and ensures that all paths lead to ANNOTATE_UNRET_END. If a path hits RET or an indirection the path is a fail and will be reported. There are 3 ANNOTATE_UNRET_END instances: - UNTRAIN_RET itself - exception from-kernel; this path doesn't need UNTRAIN_RET - all early exceptions; these also don't need UNTRAIN_RET Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Add retbleed=ibpbPeter Zijlstra
jmp2ret mitigates the easy-to-attack case at relatively low overhead. It mitigates the long speculation windows after a mispredicted RET, but it does not mitigate the short speculation window from arbitrary instruction boundaries. On Zen2, there is a chicken bit which needs setting, which mitigates "arbitrary instruction boundaries" down to just "basic block boundaries". But there is no fix for the short speculation window on basic block boundaries, other than to flush the entire BTB to evict all attacker predictions. On the spectrum of "fast & blurry" -> "safe", there is (on top of STIBP or no-SMT): 1) Nothing System wide open 2) jmp2ret May stop a script kiddy 3) jmp2ret+chickenbit Raises the bar rather further 4) IBPB Only thing which can count as "safe". Tentative numbers put IBPB-on-entry at a 2.5x hit on Zen2, and a 10x hit on Zen1 according to lmbench. [ bp: Fixup feature bit comments, document option, 32-bit build fix. ] Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27objtool: Update Retpoline validationPeter Zijlstra
Update retpoline validation with the new CONFIG_RETPOLINE requirement of not having bare naked RET instructions. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27intel_idle: Disable IBRS during long idlePeter Zijlstra
Having IBRS enabled while the SMT sibling is idle unnecessarily slows down the running sibling. OTOH, disabling IBRS around idle takes two MSR writes, which will increase the idle latency. Therefore, only disable IBRS around deeper idle states. Shallow idle states are bounded by the tick in duration, since NOHZ is not allowed for them by virtue of their short target residency. Only do this for mwait-driven idle, since that keeps interrupts disabled across idle, which makes disabling IBRS vs IRQ-entry a non-issue. Note: C6 is a random threshold, most importantly C1 probably shouldn't disable IBRS, benchmarking needed. Suggested-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Report Intel retbleed vulnerabilityPeter Zijlstra
Skylake suffers from RSB underflow speculation issues; report this vulnerability and it's mitigation (spectre_v2=ibrs). [jpoimboe: cleanups, eibrs] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>