summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm/msr-index.h
AgeCommit message (Collapse)Author
2024-09-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull x86 kvm updates from Paolo Bonzini: "x86: - KVM currently invalidates the entirety of the page tables, not just those for the memslot being touched, when a memslot is moved or deleted. This does not traditionally have particularly noticeable overhead, but Intel's TDX will require the guest to re-accept private pages if they are dropped from the secure EPT, which is a non starter. Actually, the only reason why this is not already being done is a bug which was never fully investigated and caused VM instability with assigned GeForce GPUs, so allow userspace to opt into the new behavior. - Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10 functionality that is on the horizon) - Rework common MSR handling code to suppress errors on userspace accesses to unsupported-but-advertised MSRs This will allow removing (almost?) all of KVM's exemptions for userspace access to MSRs that shouldn't exist based on the vCPU model (the actual cleanup is non-trivial future work) - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the 64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv) stores the entire 64-bit value at the ICR offset - Fix a bug where KVM would fail to exit to userspace if one was triggered by a fastpath exit handler - Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when there's already a pending wake event at the time of the exit - Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the nested guest) - Overhaul the "unprotect and retry" logic to more precisely identify cases where retrying is actually helpful, and to harden all retry paths against putting the guest into an infinite retry loop - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in the shadow MMU - Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for adding multi generation LRU support in KVM - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled, i.e. when the CPU has already flushed the RSB - Trace the per-CPU host save area as a VMCB pointer to improve readability and cleanup the retrieval of the SEV-ES host save area - Remove unnecessary accounting of temporary nested VMCB related allocations - Set FINAL/PAGE in the page fault error code for EPT violations if and only if the GVA is valid. If the GVA is NOT valid, there is no guest-side page table walk and so stuffing paging related metadata is nonsensical - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of emulating posted interrupt delivery to L2 - Add a lockdep assertion to detect unsafe accesses of vmcs12 structures - Harden eVMCS loading against an impossible NULL pointer deref (really truly should be impossible) - Minor SGX fix and a cleanup - Misc cleanups Generic: - Register KVM's cpuhp and syscore callbacks when enabling virtualization in hardware, as the sole purpose of said callbacks is to disable and re-enable virtualization as needed - Enable virtualization when KVM is loaded, not right before the first VM is created Together with the previous change, this simplifies a lot the logic of the callbacks, because their very existence implies virtualization is enabled - Fix a bug that results in KVM prematurely exiting to userspace for coalesced MMIO/PIO in many cases, clean up the related code, and add a testcase - Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_ the gpa+len crosses a page boundary, which thankfully is guaranteed to not happen in the current code base. Add WARNs in more helpers that read/write guest memory to detect similar bugs Selftests: - Fix a goof that caused some Hyper-V tests to be skipped when run on bare metal, i.e. NOT in a VM - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest - Explicitly include one-off assets in .gitignore. Past Sean was completely wrong about not being able to detect missing .gitignore entries - Verify userspace single-stepping works when KVM happens to handle a VM-Exit in its fastpath - Misc cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits) Documentation: KVM: fix warning in "make htmldocs" s390: Enable KVM_S390_UCONTROL config in debug_defconfig selftests: kvm: s390: Add VM run test case KVM: SVM: let alternatives handle the cases when RSB filling is required KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range() KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps() KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range() KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure() KVM: x86: Update retry protection fields when forcing retry on emulation failure KVM: x86: Apply retry protection to "unprotect on failure" path KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn ...
2024-08-22KVM VMX: Move MSR_IA32_VMX_MISC bit defines to asm/vmx.hSean Christopherson
Move the handful of MSR_IA32_VMX_MISC bit defines that are currently in msr-indx.h to vmx.h so that all of the VMX_MISC defines and wrappers can be found in a single location. Opportunistically use BIT_ULL() instead of open coding hex values, add defines for feature bits that are architecturally defined, and move the defines down in the file so that they are colocated with the helpers for getting fields from VMX_MISC. No functional change intended. Cc: Shan Kang <shan.kang@intel.com> Cc: Kai Huang <kai.huang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> [sean: split to separate patch, write changelog] Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-22KVM: VMX: Move MSR_IA32_VMX_BASIC bit defines to asm/vmx.hXin Li
Move the bit defines for MSR_IA32_VMX_BASIC from msr-index.h to vmx.h so that they are colocated with other VMX MSR bit defines, and with the helpers that extract specific information from an MSR_IA32_VMX_BASIC value. Opportunistically use BIT_ULL() instead of open coding hex values. Opportunistically rename VMX_BASIC_64 to VMX_BASIC_32BIT_PHYS_ADDR_ONLY, as "VMX_BASIC_64" is widly misleading. The flag enumerates that addresses are limited to 32 bits, not that 64-bit addresses are allowed. Last but not least, opportunistically #define DUAL_MONITOR_TREATMENT so that all known single-bit feature flags are defined (this will allow replacing open-coded literals in the future). Cc: Shan Kang <shan.kang@intel.com> Cc: Kai Huang <kai.huang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> [sean: split to separate patch, write changelog] Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-22x86/cpu: KVM: Move macro to encode PAT value to common headerSean Christopherson
Move pat/memtype.c's PAT() macro to msr-index.h as PAT_VALUE(), and use it in KVM to define the default (Power-On / RESET) PAT value instead of open coding an inscrutable magic number. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240605231918.2915961-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-22x86/cpu: KVM: Add common defines for architectural memory types (PAT, MTRRs, ↵Sean Christopherson
etc.) Add defines for the architectural memory types that can be shoved into various MSRs and registers, e.g. MTRRs, PAT, VMX capabilities MSRs, EPTPs, etc. While most MSRs/registers support only a subset of all memory types, the values themselves are architectural and identical across all users. Leave the goofy MTRR_TYPE_* definitions as-is since they are in a uapi header, but add compile-time assertions to connect the dots (and sanity check that the msr-index.h values didn't get fat-fingered). Keep the VMX_EPTP_MT_* defines so that it's slightly more obvious that the EPTP holds a single memory type in 3 of its 64 bits; those bits just happen to be 2:0, i.e. don't need to be shifted. Opportunistically use X86_MEMTYPE_WB instead of an open coded '6' in setup_vmcs_config(). No functional change intended. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-12platform/x86/intel/ifs: Add SBAF test image loading supportJithu Joseph
Structural Based Functional Test at Field (SBAF) is a new type of testing that provides comprehensive core test coverage complementing existing IFS tests like Scan at Field (SAF) or ArrayBist. SBAF device will appear as a new device instance (intel_ifs_2) under /sys/devices/virtual/misc. The user interaction necessary to load the test image and test a particular core is the same as the existing scan test (intel_ifs_0). During the loading stage, the driver will look for a file named ff-mm-ss-<batch02x>.sbft in the /lib/firmware/intel/ifs_2 directory. The hardware interaction needed for loading the image is similar to SAF, with the only difference being the MSR addresses used. Reuse the SAF image loading code, passing the SBAF-specific MSR addresses via struct ifs_test_msrs in the driver device data. Unlike SAF, the SBAF test image chunks are further divided into smaller logical entities called bundles. Since the SBAF test is initiated per bundle, cache the maximum number of bundles in the current image, which is used for iterating through bundles during SBAF test execution. Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20240801051814.1935149-3-sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-07-16Merge tag 'perf-core-2024-07-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: - Intel PT support enhancements & fixes - Fix leaked SIGTRAP events - Improve and fix the Intel uncore driver - Add support for Intel HBM and CXL uncore counters - Add Intel Lake and Arrow Lake support - AMD uncore driver fixes - Make SIGTRAP and __perf_pending_irq() work on RT - Micro-optimizations - Misc cleanups and fixes * tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) perf/x86/intel: Add a distinct name for Granite Rapids perf/x86/intel/ds: Fix non 0 retire latency on Raptorlake perf/x86/intel: Hide Topdown metrics events if the feature is not enumerated perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPR perf: Split __perf_pending_irq() out of perf_pending_irq() perf: Don't disable preemption in perf_pending_task(). perf: Move swevent_htable::recursion into task_struct. perf: Shrink the size of the recursion counter. perf: Enqueue SIGTRAP always via task_work. task_work: Add TWA_NMI_CURRENT as an additional notify mode. perf: Move irq_work_queue() where the event is prepared. perf: Fix event leak upon exec and file release perf: Fix event leak upon exit task_work: Introduce task_work_cancel() again task_work: s/task_work_cancel()/task_work_cancel_func()/ perf/x86/amd/uncore: Fix DF and UMC domain identification perf/x86/amd/uncore: Avoid PMU registration if counters are unavailable perf/x86/intel: Support Perfmon MSRs aliasing perf/x86/intel: Support PERFEVTSEL extension perf/x86: Add config_mask to represent EVENTSEL bitmask ...
2024-07-16Merge tag 'pm-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add a new cpufreq driver for Loongson-3, add support for new features in the intel_pstate (Lunar Lake and Arrow Lake platforms, OOB mode for Emerald Rapids, highest performance change interrupt), amd-pstate (fast CPPC) and sun50i (Allwinner H700 speed bin) cpufreq drivers, simplify the cpufreq driver interface, simplify the teo cpuidle governor, adjust the pm-graph utility for a new version of Python, address issues and clean up code. Specifics: - Add Loongson-3 CPUFreq driver support (Huacai Chen) - Add support for the Arrow Lake and Lunar Lake platforms and the out-of-band (OOB) mode on Emerald Rapids to the intel_pstate cpufreq driver, make it support the highest performance change interrupt and clean it up (Srinivas Pandruvada) - Switch cpufreq to new Intel CPU model defines (Tony Luck) - Simplify the cpufreq driver interface by switching the .exit() driver callback to the void return data type (Lizhe, Viresh Kumar) - Make cpufreq_boost_enabled() return bool (Dhruva Gole) - Add fast CPPC support to the amd-pstate cpufreq driver, address multiple assorted issues in it and clean it up (Perry Yuan, Mario Limonciello, Dhananjay Ugwekar, Meng Li, Xiaojian Du) - Add Allwinner H700 speed bin to the sun50i cpufreq driver (Ryan Walklin) - Fix memory leaks and of_node_put() usage in the sun50i and qcom-nvmem cpufreq drivers (Javier Carrasco) - Clean up the sti and dt-platdev cpufreq drivers (Jeff Johnson, Raphael Gallais-Pou) - Fix deferred probe handling in the TI cpufreq driver and wrong return values of ti_opp_supply_probe(), and add OPP tables for the AM62Ax and AM62Px SoCs to it (Bryan Brattlof, Primoz Fiser) - Avoid overflow of target_freq in .fast_switch() in the SCMI cpufreq driver (Jagadeesh Kona) - Use dev_err_probe() in every error path in probe in the Mediatek cpufreq driver (Nícolas Prado) - Fix kernel-doc param for longhaul_setstate in the longhaul cpufreq driver (Yang Li) - Fix system resume handling in the CPPC cpufreq driver (Riwen Lu) - Improve the teo cpuidle governor and clean up leftover comments from the menu cpuidle governor (Christian Loehle) - Clean up a comment typo in the teo cpuidle governor (Atul Kumar Pant) - Add missing MODULE_DESCRIPTION() macro to cpuidle haltpoll (Jeff Johnson) - Switch the intel_idle driver to new Intel CPU model defines (Tony Luck) - Switch the Intel RAPL driver new Intel CPU model defines (Tony Luck) - Simplify if condition in the idle_inject driver (Thorsten Blum) - Fix missing cleanup on error in _opp_attach_genpd() (Viresh Kumar) - Introduce an OF helper function to inform if required-opps is used and drop a redundant in-parameter to _set_opp_level() (Ulf Hansson) - Update pm-graph to v5.12 which includes fixes and major code revamp for python3.12 (Todd Brandt) - Address several assorted issues in the cpupower utility (Roman Storozhenko)" * tag 'pm-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (77 commits) cpufreq: sti: fix build warning cpufreq: mediatek: Use dev_err_probe in every error path in probe cpufreq: Add Loongson-3 CPUFreq driver support cpufreq: Make cpufreq_driver->exit() return void cpufreq/amd-pstate: Fix the scaling_max_freq setting on shared memory CPPC systems cpufreq/amd-pstate-ut: Convert nominal_freq to khz during comparisons cpufreq: pcc: Remove empty exit() callback cpufreq: loongson2: Remove empty exit() callback cpufreq: nforce2: Remove empty exit() callback cpupower: fix lib default installation path cpufreq: docs: Add missing scaling_available_frequencies description cpuidle: teo: Don't count non-existent intercepts cpupower: Disable direct build of the 'bench' subproject cpuidle: teo: Remove recent intercepts metric Revert: "cpuidle: teo: Introduce util-awareness" cpufreq: make cpufreq_boost_enabled() return bool cpufreq: intel_pstate: Support highest performance change interrupt x86/cpufeatures: Add HWP highest perf change feature flag Documentation: cpufreq: amd-pstate: update doc for Per CPU boost control method cpufreq: amd-pstate: Cap the CPPC.max_perf to nominal_perf if CPB is off ...
2024-07-16Merge tag 'x86_sev_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Add support for running the kernel in a SEV-SNP guest, over a Secure VM Service Module (SVSM). When running over a SVSM, different services can run at different protection levels, apart from the guest OS but still within the secure SNP environment. They can provide services to the guest, like a vTPM, for example. This series adds the required facilities to interface with such a SVSM module. - The usual fixlets, refactoring and cleanups [ And as always: "SEV" is AMD's "Secure Encrypted Virtualization". I can't be the only one who gets all the newer x86 TLA's confused, can I? - Linus ] * tag 'x86_sev_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI/configfs-tsm: Fix an unexpected indentation silly x86/sev: Do RMP memory coverage check after max_pfn has been set x86/sev: Move SEV compilation units virt: sev-guest: Mark driver struct with __refdata to prevent section mismatch x86/sev: Allow non-VMPL0 execution when an SVSM is present x86/sev: Extend the config-fs attestation support for an SVSM x86/sev: Take advantage of configfs visibility support in TSM fs/configfs: Add a callback to determine attribute visibility sev-guest: configfs-tsm: Allow the privlevel_floor attribute to be updated virt: sev-guest: Choose the VMPCK key based on executing VMPL x86/sev: Provide guest VMPL level to userspace x86/sev: Provide SVSM discovery support x86/sev: Use the SVSM to create a vCPU when not in VMPL0 x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0 x86/sev: Use kernel provided SVSM Calling Areas x86/sev: Check for the presence of an SVSM in the SNP secrets page x86/irqflags: Provide native versions of the local_irq_save()/restore()
2024-07-04perf/x86/intel: Support Perfmon MSRs aliasingKan Liang
The architectural performance monitoring V6 supports a new range of counters' MSRs in the 19xxH address range. They include all the GP counter MSRs, the GP control MSRs, and the fixed counter MSRs. The step between each sibling counter is 4. Add intel_pmu_addr_offset() to calculate the correct offset. Add fixedctr in struct x86_pmu to store the address of the fixed counter 0. It can be used to calculate the rest of the fixed counters. The MSR address of the fixed counter control is not changed. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-9-kan.liang@linux.intel.com
2024-07-02x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systemsTony Luck
Hardware has two RMID configuration options for SNC systems. The default mode divides RMID counters between SNC nodes. E.g. with 200 RMIDs and two SNC nodes per L3 cache RMIDs 0..99 are used on node 0, and 100..199 on node 1. This isn't compatible with Linux resctrl usage. On this example system a process using RMID 5 would only update monitor counters while running on SNC node 0. The other mode is "RMID Sharing Mode". This is enabled by clearing bit 0 of the RMID_SNC_CONFIG (0xCA0) model specific register. In this mode the number of logical RMIDs is the number of physical RMIDs (from CPUID leaf 0xF) divided by the number of SNC nodes per L3 cache instance. A process can use the same RMID across different SNC nodes. See the "Intel Resource Director Technology Architecture Specification" for additional details. When SNC is enabled, update the MSR when a monitor domain is marked online. Technically this is overkill. It only needs to be done once per L3 cache instance rather than per SNC domain. But there is no harm in doing it more than once, and this is not in a critical path. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20240702173820.90368-3-tony.luck@intel.com
2024-06-26cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.hPerry Yuan
There are some other drivers also need to use the MSR_K7_HWCR_CPB_DIS_BIT for CPB control bit, so it makes sense to move the definition to a common header file to allow other driver to use it. No intentional functional impact. Suggested-by: Gautham Ranjal Shenoy <gautham.shenoy@amd.com> Signed-off-by: Perry Yuan <perry.yuan@amd.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Huang Rui <ray.huang@amd.com> Link: https://lore.kernel.org/r/78b6c75e6cffddce3e950dd543af6ae9f8eeccc3.1718988436.git.perry.yuan@amd.com Link: https://lore.kernel.org/r/20240626042733.3747-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-06-17x86/sev: Provide SVSM discovery supportTom Lendacky
The SVSM specification documents an alternative method of discovery for the SVSM using a reserved CPUID bit and a reserved MSR. This is intended for guest components that do not have access to the secrets page in order to be able to call the SVSM (e.g. UEFI runtime services). For the MSR support, a new reserved MSR 0xc001f000 has been defined. A #VC should be generated when accessing this MSR. The #VC handler is expected to ignore writes to this MSR and return the physical calling area address (CAA) on reads of this MSR. While the CPUID leaf is updated, allowing the creation of a CPU feature, the code will continue to use the VMPL level as an indication of the presence of an SVSM. This is because the SVSM can be called well before the CPU feature is in place and a non-zero VMPL requires that an SVSM be present. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/4f93f10a2ff3e9f368fd64a5920d51bf38d0c19e.1717600736.git.thomas.lendacky@amd.com
2024-05-13Merge tag 'x86-cleanups-2024-05-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: - Fix function prototypes to address clang function type cast warnings in the math-emu code - Reorder definitions in <asm/msr-index.h> - Remove unused code - Fix typos - Simplify #include sections * tag 'x86-cleanups-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pci/ce4100: Remove unused 'struct sim_reg_op' x86/msr: Move ARCH_CAP_XAPIC_DISABLE bit definition to its rightful place x86/math-emu: Fix function cast warnings x86/extable: Remove unused fixup type EX_TYPE_COPY x86/rtc: Remove unused intel-mid.h x86/32: Remove unused IA32_STACK_TOP and two externs x86/head: Simplify relative include path to xen-head.S x86/fred: Fix typo in Kconfig description x86/syscall/compat: Remove ia32_unistd.h x86/syscall/compat: Remove unused macro __SYSCALL_ia32_NR x86/virt/tdx: Remove duplicate include x86/xen: Remove duplicate #include
2024-04-09x86/msr: Move ARCH_CAP_XAPIC_DISABLE bit definition to its rightful placePawan Gupta
The ARCH_CAP_XAPIC_DISABLE bit of MSR_IA32_ARCH_CAP is not in the correct sorted order. Move it where it belongs. No functional change. [ bp: Massage commit message. ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/243317ff6c8db307b7701a45f71e5c21da80194b.1705632532.git.pawan.kumar.gupta@linux.intel.com
2024-04-08x86/bhi: Enumerate Branch History Injection (BHI) bugPawan Gupta
Mitigation for BHI is selected based on the bug enumeration. Add bits needed to enumerate BHI bug. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bhi: Define SPEC_CTRL_BHI_DIS_SDaniel Sneddon
Newer processors supports a hardware control BHI_DIS_S to mitigate Branch History Injection (BHI). Setting BHI_DIS_S protects the kernel from userspace BHI attacks without having to manually overwrite the branch history. Define MSR_SPEC_CTRL bit BHI_DIS_S and its enumeration CPUID.BHI_CTRL. Mitigation is enabled later. Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-03-12Merge tag 'rfds-for-linus-2024-03-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RFDS mitigation from Dave Hansen: "RFDS is a CPU vulnerability that may allow a malicious userspace to infer stale register values from kernel space. Kernel registers can have all kinds of secrets in them so the mitigation is basically to wait until the kernel is about to return to userspace and has user values in the registers. At that point there is little chance of kernel secrets ending up in the registers and the microarchitectural state can be cleared. This leverages some recent robustness fixes for the existing MDS vulnerability. Both MDS and RFDS use the VERW instruction for mitigation" * tag 'rfds-for-linus-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests x86/rfds: Mitigate Register File Data Sampling (RFDS) Documentation/hw-vuln: Add documentation for RFDS x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
2024-03-11Merge tag 'x86_sev_for_v6.9_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Add the x86 part of the SEV-SNP host support. This will allow the kernel to be used as a KVM hypervisor capable of running SNP (Secure Nested Paging) guests. Roughly speaking, SEV-SNP is the ultimate goal of the AMD confidential computing side, providing the most comprehensive confidential computing environment up to date. This is the x86 part and there is a KVM part which did not get ready in time for the merge window so latter will be forthcoming in the next cycle. - Rework the early code's position-dependent SEV variable references in order to allow building the kernel with clang and -fPIE/-fPIC and -mcmodel=kernel - The usual set of fixes, cleanups and improvements all over the place * tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/sev: Disable KMSAN for memory encryption TUs x86/sev: Dump SEV_STATUS crypto: ccp - Have it depend on AMD_IOMMU iommu/amd: Fix failure return from snp_lookup_rmpentry() x86/sev: Fix position dependent variable references in startup code crypto: ccp: Make snp_range_list static x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT Documentation: virt: Fix up pre-formatted text block for SEV ioctls crypto: ccp: Add the SNP_SET_CONFIG command crypto: ccp: Add the SNP_COMMIT command crypto: ccp: Add the SNP_PLATFORM_STATUS command x86/cpufeatures: Enable/unmask SEV-SNP CPU feature KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown crypto: ccp: Handle legacy SEV commands when SNP is enabled crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled crypto: ccp: Handle the legacy TMR allocation when SNP is enabled x86/sev: Introduce an SNP leaked pages list crypto: ccp: Provide an API to issue SEV and SNP commands ...
2024-03-11x86/rfds: Mitigate Register File Data Sampling (RFDS)Pawan Gupta
RFDS is a CPU vulnerability that may allow userspace to infer kernel stale data previously used in floating point registers, vector registers and integer registers. RFDS only affects certain Intel Atom processors. Intel released a microcode update that uses VERW instruction to clear the affected CPU buffers. Unlike MDS, none of the affected cores support SMT. Add RFDS bug infrastructure and enable the VERW based mitigation by default, that clears the affected buffers just before exiting to userspace. Also add sysfs reporting and cmdline parameter "reg_file_data_sampling" to control the mitigation. For details see: Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-02-28x86/sev: Dump SEV_STATUSBorislav Petkov (AMD)
It is, and will be even more useful in the future, to dump the SEV features enabled according to SEV_STATUS. Do so: [ 0.542753] Memory Encryption Features active: AMD SEV SEV-ES SEV-SNP [ 0.544425] SEV: Status: SEV SEV-ES SEV-SNP DebugSwap Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20240219094216.GAZdMieDHKiI8aaP3n@fat_crate.local
2024-01-31x86/cpu: Add MSR numbers for FRED configurationH. Peter Anvin (Intel)
Add MSR numbers for the FRED configuration registers per FRED spec 5.0. Originally-by: Megha Dey <megha.dey@intel.com> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Shan Kang <shan.kang@intel.com> Link: https://lore.kernel.org/r/20231205105030.8698-13-xin3.li@intel.com
2024-01-29x86/sev: Add SEV-SNP host initialization supportBrijesh Singh
The memory integrity guarantees of SEV-SNP are enforced through a new structure called the Reverse Map Table (RMP). The RMP is a single data structure shared across the system that contains one entry for every 4K page of DRAM that may be used by SEV-SNP VMs. The APM Volume 2 section on Secure Nested Paging (SEV-SNP) details a number of steps needed to detect/enable SEV-SNP and RMP table support on the host: - Detect SEV-SNP support based on CPUID bit - Initialize the RMP table memory reported by the RMP base/end MSR registers and configure IOMMU to be compatible with RMP access restrictions - Set the MtrrFixDramModEn bit in SYSCFG MSR - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR - Configure IOMMU RMP table entry format is non-architectural and it can vary by processor. It is defined by the PPR document for each respective CPU family. Restrict SNP support to CPU models/families which are compatible with the current RMP table entry format to guard against any undefined behavior when running on other system types. Future models/support will handle this through an architectural mechanism to allow for broader compatibility. SNP host code depends on CONFIG_KVM_AMD_SEV config flag which may be enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV instead of CONFIG_AMD_MEM_ENCRYPT. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Co-developed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Link: https://lore.kernel.org/r/20240126041126.1927228-5-michael.roth@amd.com
2024-01-18Merge tag 'x86_tdx_for_6.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 TDX updates from Dave Hansen: "This contains the initial support for host-side TDX support so that KVM can run TDX-protected guests. This does not include the actual KVM-side support which will come from the KVM folks. The TDX host interactions with kexec also needs to be ironed out before this is ready for prime time, so this code is currently Kconfig'd off when kexec is on. The majority of the code here is the kernel telling the TDX module which memory to protect and handing some additional memory over to it to use to store TDX module metadata. That sounds pretty simple, but the TDX architecture is rather flexible and it takes quite a bit of back-and-forth to say, "just protect all memory, please." There is also some code tacked on near the end of the series to handle a hardware erratum. The erratum can make software bugs such as a kernel write to TDX-protected memory cause a machine check and masquerade as a real hardware failure. The erratum handling watches out for these and tries to provide nicer user errors" * tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/virt/tdx: Make TDX host depend on X86_MCE x86/virt/tdx: Disable TDX host support when kexec is enabled Documentation/x86: Add documentation for TDX host support x86/mce: Differentiate real hardware #MCs from TDX erratum ones x86/cpu: Detect TDX partial write machine check erratum x86/virt/tdx: Handle TDX interaction with sleep and hibernation x86/virt/tdx: Initialize all TDMRs x86/virt/tdx: Configure global KeyID on all packages x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID x86/virt/tdx: Designate reserved areas for all TDMRs x86/virt/tdx: Allocate and set up PAMTs for TDMRs x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions x86/virt/tdx: Get module global metadata for module initialization x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory x86/virt/tdx: Add skeleton to enable TDX on demand x86/virt/tdx: Add SEAMCALL error printing for module initialization x86/virt/tdx: Handle SEAMCALL no entropy error in common code x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC x86/virt/tdx: Define TDX supported page sizes as macros ...
2023-12-08x86/virt/tdx: Detect TDX during kernel bootKai Huang
Intel Trust Domain Extensions (TDX) protects guest VMs from malicious host and certain physical attacks. A CPU-attested software module called 'the TDX module' runs inside a new isolated memory range as a trusted hypervisor to manage and run protected VMs. Pre-TDX Intel hardware has support for a memory encryption architecture called MKTME. The memory encryption hardware underpinning MKTME is also used for Intel TDX. TDX ends up "stealing" some of the physical address space from the MKTME architecture for crypto-protection to VMs. The BIOS is responsible for partitioning the "KeyID" space between legacy MKTME and TDX. The KeyIDs reserved for TDX are called 'TDX private KeyIDs' or 'TDX KeyIDs' for short. During machine boot, TDX microcode verifies that the BIOS programmed TDX private KeyIDs consistently and correctly programmed across all CPU packages. The MSRs are locked in this state after verification. This is why MSR_IA32_MKTME_KEYID_PARTITIONING gets used for TDX enumeration: it indicates not just that the hardware supports TDX, but that all the boot-time security checks passed. The TDX module is expected to be loaded by the BIOS when it enables TDX, but the kernel needs to properly initialize it before it can be used to create and run any TDX guests. The TDX module will be initialized by the KVM subsystem when KVM wants to use TDX. Detect platform TDX support by detecting TDX private KeyIDs. The TDX module itself requires one TDX KeyID as the 'TDX global KeyID' to protect its metadata. Each TDX guest also needs a TDX KeyID for its own protection. Just use the first TDX KeyID as the global KeyID and leave the rest for TDX guests. If no TDX KeyID is left for TDX guests, disable TDX as initializing the TDX module alone is useless. [ dhansen: add X86_FEATURE, replace helper function ] Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/all/20231208170740.53979-1-dave.hansen%40intel.com
2023-11-15Merge branch 'tip/perf/urgent'Peter Zijlstra
Avoid conflicts, base on fixes. Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2023-11-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes LoongArch: - New architecture for kvm. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: - Support for the Smstateen and Zicond extensions - Support for virtualizing senvcfg - Support for virtualized SBI debug console (DBCN) S390: - Nested page table management can be monitored through tracepoints and statistics x86: - Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ - Avoid WARN on systems with Intel IPI virtualization - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. - Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. - Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. - Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. - Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. - Optimize injection of PMU interrupts that are simultaneous with NMIs. - Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: - Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. - Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. - Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: - Various updates and fixes, mostly for x86 - MTRR and PAT fixes and optimizations" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits) KVM: selftests: Avoid using forced target for generating arm64 headers tools headers arm64: Fix references to top srcdir in Makefile KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 KVM: arm64: selftest: Perform ISB before reading PAR_EL1 KVM: arm64: selftest: Add the missing .guest_prepare() KVM: arm64: Always invalidate TLB for stage-2 permission faults KVM: x86: Service NMI requests after PMI requests in VM-Enter path KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs KVM: arm64: Refine _EL2 system register list that require trap reinjection arm64: Add missing _EL2 encodings arm64: Add missing _EL12 encodings KVM: selftests: aarch64: vPMU test for validating user accesses KVM: selftests: aarch64: vPMU register test for unimplemented counters KVM: selftests: aarch64: vPMU register test for implemented counters KVM: selftests: aarch64: Introduce vpmu_counter_access test tools: Import arm_pmuv3.h KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} ...
2023-10-31Merge tag 'platform-drivers-x86-v6.7-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Ilpo Järvinen: - asus-wmi: Support for screenpad and solve brightness key press duplication - int3472: Eliminate the last use of deprecated GPIO functions - mlxbf-pmc: New HW support - msi-ec: Support new EC configurations - thinkpad_acpi: Support reading aux MAC address during passthrough - wmi: Fixes & improvements - x86-android-tablets: Detection fix and avoid use of GPIO private APIs - Debug & metrics interface improvements - Miscellaneous cleanups / fixes / improvements * tag 'platform-drivers-x86-v6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (80 commits) platform/x86: inspur-platform-profile: Add platform profile support platform/x86: thinkpad_acpi: Add battery quirk for Thinkpad X120e platform/x86: wmi: Decouple WMI device removal from wmi_block_list platform/x86: wmi: Fix opening of char device platform/x86: wmi: Fix probe failure when failing to register WMI devices platform/x86: wmi: Fix refcounting of WMI devices in legacy functions platform/x86: wmi: Decouple probe deferring from wmi_block_list platform/x86/amd/hsmp: Fix iomem handling platform/x86: asus-wmi: Do not report brightness up/down keys when also reported by acpi_video platform/x86: thinkpad_acpi: replace deprecated strncpy with memcpy tools/power/x86/intel-speed-select: v1.18 release tools/power/x86/intel-speed-select: Use cgroup isolate for CPU 0 tools/power/x86/intel-speed-select: Increase max CPUs in one request tools/power/x86/intel-speed-select: Display error for core-power support tools/power/x86/intel-speed-select: No TRL for non compute domains tools/power/x86/intel-speed-select: turbo-mode enable disable swapped tools/power/x86/intel-speed-select: Update help for TRL tools/power/x86/intel-speed-select: Sanitize integer arguments platform/x86: acer-wmi: Remove void function return platform/x86/amd/pmc: Add dump_custom_stb module parameter ...
2023-10-31Merge tag 'kvm-x86-misc-6.7' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 misc changes for 6.7: - Add CONFIG_KVM_MAX_NR_VCPUS to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. - Add IBPB and SBPB virtualization support. - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. - Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. - Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. - Use octal notation for file permissions through KVM x86. - Fix a handful of typo fixes and warts.
2023-10-30Merge tag 'perf-core-2023-10-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance event updates from Ingo Molnar: - Add AMD Unified Memory Controller (UMC) events introduced with Zen 4 - Simplify & clean up the uncore management code - Fall back from RDPMC to RDMSR on certain uncore PMUs - Improve per-package and cstate event reading - Extend the Intel ref-cycles event to GP counters - Fix Intel MTL event constraints - Improve the Intel hybrid CPU handling code - Micro-optimize the RAPL code - Optimize perf_cgroup_switch() - Improve large AUX area error handling - Misc fixes and cleanups * tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) perf/x86/amd/uncore: Pass through error code for initialization failures, instead of -ENODEV perf/x86/amd/uncore: Fix uninitialized return value in amd_uncore_init() x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations perf: Optimize perf_cgroup_switch() perf/x86/amd/uncore: Add memory controller support perf/x86/amd/uncore: Add group exclusivity perf/x86/amd/uncore: Use rdmsr if rdpmc is unavailable perf/x86/amd/uncore: Move discovery and registration perf/x86/amd/uncore: Refactor uncore management perf/core: Allow reading package events from perf_event_read_local perf/x86/cstate: Allow reading the package statistics from local CPU perf/x86/intel/pt: Fix kernel-doc comments perf/x86/rapl: Annotate 'struct rapl_pmus' with __counted_by perf/core: Rename perf_proc_update_handler() -> perf_event_max_sample_rate_handler(), for readability perf/x86/rapl: Fix "Using plain integer as NULL pointer" Sparse warning perf/x86/rapl: Use local64_try_cmpxchg in rapl_event_update() perf/x86/rapl: Stop doing cpu_relax() in the local64_cmpxchg() loop in rapl_event_update() perf/core: Bail out early if the request AUX area is out of bound perf/x86/intel: Extend the ref-cycles event to GP counters perf/x86/intel: Fix broken fixed event constraints extension ...
2023-10-30Merge tag 'x86_cpu_for_6.7_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: - Make sure the "svm" feature flag is cleared from /proc/cpuinfo when virtualization support is disabled in the BIOS on AMD and Hygon platforms - A minor cleanup * tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Remove redundant 'break' statement x86/cpu: Clear SVM feature if disabled by BIOS
2023-10-27perf/x86/intel: Support branch counters loggingKan Liang
The branch counters logging (A.K.A LBR event logging) introduces a per-counter indication of precise event occurrences in LBRs. It can provide a means to attribute exposed retirement latency to combinations of events across a block of instructions. It also provides a means of attributing Timed LBR latencies to events. The feature is first introduced on SRF/GRR. It is an enhancement of the ARCH LBR. It adds new fields in the LBR_INFO MSRs to log the occurrences of events on the GP counters. The information is displayed by the order of counters. The design proposed in this patch requires that the events which are logged must be in a group with the event that has LBR. If there are more than one LBR group, the counters logging information only from the current group (overflowed) are stored for the perf tool, otherwise the perf tool cannot know which and when other groups are scheduled especially when multiplexing is triggered. The user can ensure it uses the maximum number of counters that support LBR info (4 by now) by making the group large enough. The HW only logs events by the order of counters. The order may be different from the order of enabling which the perf tool can understand. When parsing the information of each branch entry, convert the counter order to the enabled order, and store the enabled order in the extension space. Unconditionally reset LBRs for an LBR event group when it's deleted. The logged counter information is only valid for the current LBR group. If another LBR group is scheduled later, the information from the stale LBRs would be otherwise wrongly interpreted. Add a sanity check in intel_pmu_hw_config(). Disable the feature if other counter filters (inv, cmask, edge, in_tx) are set or LBR call stack mode is enabled. (For the LBR call stack mode, we cannot simply flush the LBR, since it will break the call stack. Also, there is no obvious usage with the call stack mode for now.) Only applying the PERF_SAMPLE_BRANCH_COUNTERS doesn't require any branch stack setup. Expose the maximum number of supported counters and the width of the counters into the sysfs. The perf tool can use the information to parse the logged counters in each branch. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20231025201626.3000228-5-kan.liang@linux.intel.com
2023-10-19KVM: x86: Ignore MSR_AMD64_TW_CFG accessMaciej S. Szmigiero
Hyper-V enabled Windows Server 2022 KVM VM cannot be started on Zen1 Ryzen since it crashes at boot with SYSTEM_THREAD_EXCEPTION_NOT_HANDLED + STATUS_PRIVILEGED_INSTRUCTION (in other words, because of an unexpected #GP in the guest kernel). This is because Windows tries to set bit 8 in MSR_AMD64_TW_CFG and can't handle receiving a #GP when doing so. Give this MSR the same treatment that commit 2e32b7190641 ("x86, kvm: Add MSR_AMD64_BU_CFG2 to the list of ignored MSRs") gave MSR_AMD64_BU_CFG2 under justification that this MSR is baremetal-relevant only. Although apparently it was then needed for Linux guests, not Windows as in this case. With this change, the aforementioned guest setup is able to finish booting successfully. This issue can be reproduced either on a Summit Ridge Ryzen (with just "-cpu host") or on a Naples EPYC (with "-cpu host,stepping=1" since EPYC is ordinarily stepping 2). Alternatively, userspace could solve the problem by using MSR filters, but forcing every userspace to define a filter isn't very friendly and doesn't add much, if any, value. The only potential hiccup is if one of these "baremetal-only" MSRs ever requires actual emulation and/or has F/M/S specific behavior. But if that happens, then KVM can still punt *that* handling to userspace since userspace MSR filters "win" over KVM's default handling. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1ce85d9c7c9e9632393816cf19c902e0a3f411f1.1697731406.git.maciej.szmigiero@oracle.com [sean: call out MSR filtering alternative] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-12x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerationsBorislav Petkov
The comments introduced in <asm/msr-index.h> in the merge conflict fixup in: 8f4156d58713 ("Merge branch 'x86/urgent' into perf/core, to resolve conflict") ... aren't right: AMD naming schemes are more complex than implied, family 0x17 is Zen1 and 2, family 0x19 is spread around Zen 3 and 4. So there's indeed four separate MSR namespaces for: MSR_F17H_ MSR_F19H_ MSR_ZEN2_ MSR_ZEN4_ ... and the namespaces cannot be merged. Fix it up. No change in functionality. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/D99589F4-BC5D-430B-87B2-72C20370CF57@exactcode.com
2023-10-11Merge branch 'x86/urgent' into perf/core, to resolve conflictIngo Molnar
Resolve an MSR enumeration conflict. Conflicts: arch/x86/include/asm/msr-index.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-10-11x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUsBorislav Petkov (AMD)
Fix erratum #1485 on Zen4 parts where running with STIBP disabled can cause an #UD exception. The performance impact of the fix is negligible. Reported-by: René Rebe <rene@exactcode.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: René Rebe <rene@exactcode.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/D99589F4-BC5D-430B-87B2-72C20370CF57@exactcode.com
2023-10-09perf/x86/amd/uncore: Add memory controller supportSandipan Das
Unified Memory Controller (UMC) events were introduced with Zen 4 as a part of the Performance Monitoring Version 2 (PerfMonV2) enhancements. An event is specified using the EventSelect bits and the RdWrMask bits can be used for additional filtering of read and write requests. As of now, a maximum of 12 channels of DDR5 are available on each socket and each channel is controlled by a dedicated UMC. Each UMC, in turn, has its own set of performance monitoring counters. Since the MSR address space for the UMC PERF_CTL and PERF_CTR registers are reused across sockets, uncore groups are created on the basis of socket IDs. Hence, group exclusivity is mandatory while opening events so that events for an UMC can only be opened on CPUs which are on the same socket as the corresponding memory channel. For each socket, the total number of available UMC counters and active memory channels are determined from CPUID leaf 0x80000022 EBX and ECX respectively. Usually, on Zen 4, each UMC has four counters. MSR assignments are determined on the basis of active UMCs. E.g. if UMCs 1, 4 and 9 are active for a given socket, then * UMC 1 gets MSRs 0xc0010800 to 0xc0010807 as PERF_CTLs and PERF_CTRs * UMC 4 gets MSRs 0xc0010808 to 0xc001080f as PERF_CTLs and PERF_CTRs * UMC 9 gets MSRs 0xc0010810 to 0xc0010817 as PERF_CTLs and PERF_CTRs If there are sockets without any online CPUs when the amd_uncore driver is loaded, UMCs for such sockets will not be discoverable since the mechanism relies on executing the CPUID instruction on an online CPU from the socket. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/b25f391205c22733493abec1ed850b71784edc5f.1696425185.git.sandipan.das@amd.com
2023-10-06platform/x86/intel/ifs: Store IFS generation numberJithu Joseph
IFS generation number is reported via MSR_INTEGRITY_CAPS. As IFS support gets added to newer CPUs, some differences are expected during IFS image loading and test flows. Define MSR bitmasks to extract and store the generation in driver data, so that driver can modify its MSR interaction appropriately. Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Link: https://lore.kernel.org/r/20231005195137.3117166-2-jithu.joseph@intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-09-22x86/cpu: Clear SVM feature if disabled by BIOSPaolo Bonzini
When SVM is disabled by BIOS, one cannot use KVM but the SVM feature is still shown in the output of /proc/cpuinfo. On Intel machines, VMX is cleared by init_ia32_feat_ctl(), so do the same on AMD and Hygon processors. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230921114940.957141-1-pbonzini@redhat.com
2023-08-07Merge tag 'gds-for-linus-2023-08-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/gds fixes from Dave Hansen: "Mitigate Gather Data Sampling issue: - Add Base GDS mitigation - Support GDS_NO under KVM - Fix a documentation typo" * tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/x86: Fix backwards on/off logic about YMM support KVM: Add GDS_NO support to KVM x86/speculation: Add Kconfig option for GDS x86/speculation: Add force option to GDS mitigation x86/speculation: Add Gather Data Sampling mitigation
2023-08-07Merge tag 'x86_bugs_srso' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/srso fixes from Borislav Petkov: "Add a mitigation for the speculative RAS (Return Address Stack) overflow vulnerability on AMD processors. In short, this is yet another issue where userspace poisons a microarchitectural structure which can then be used to leak privileged information through a side channel" * tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/srso: Tie SBPB bit setting to microcode patch detection x86/srso: Add a forgotten NOENDBR annotation x86/srso: Fix return thunks in generated code x86/srso: Add IBPB on VMEXIT x86/srso: Add IBPB x86/srso: Add SRSO_NO support x86/srso: Add IBPB_BRTYPE support x86/srso: Add a Speculative RAS Overflow mitigation x86/bugs: Increase the x86 bugs vector size to two u32s
2023-07-27x86/srso: Add SRSO_NO supportBorislav Petkov (AMD)
Add support for the CPUID flag which denotes that the CPU is not affected by SRSO. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-19x86/speculation: Add Gather Data Sampling mitigationDaniel Sneddon
Gather Data Sampling (GDS) is a hardware vulnerability which allows unprivileged speculative access to data which was previously stored in vector registers. Intel processors that support AVX2 and AVX512 have gather instructions that fetch non-contiguous data elements from memory. On vulnerable hardware, when a gather instruction is transiently executed and encounters a fault, stale data from architectural or internal vector registers may get transiently stored to the destination vector register allowing an attacker to infer the stale data using typical side channel techniques like cache timing attacks. This mitigation is different from many earlier ones for two reasons. First, it is enabled by default and a bit must be set to *DISABLE* it. This is the opposite of normal mitigation polarity. This means GDS can be mitigated simply by updating microcode and leaving the new control bit alone. Second, GDS has a "lock" bit. This lock bit is there because the mitigation affects the hardware security features KeyLocker and SGX. It needs to be enabled and *STAY* enabled for these features to be mitigated against GDS. The mitigation is enabled in the microcode by default. Disable it by setting gather_data_sampling=off or by disabling all mitigations with mitigations=off. The mitigation status can be checked by reading: /sys/devices/system/cpu/vulnerabilities/gather_data_sampling Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-07-17x86/cpu/amd: Add a Zenbleed fixBorislav Petkov (AMD)
Add a fix for the Zen2 VZEROUPPER data corruption bug where under certain circumstances executing VZEROUPPER can cause register corruption or leak data. The optimal fix is through microcode but in the case the proper microcode revision has not been applied, enable a fallback fix using a chicken bit. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-03-27x86/include/asm/msr-index.h: Add IFS Array test bitsJithu Joseph
Define MSR bitfields for enumerating support for Array BIST test. Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230322003359.213046-5-jithu.joseph@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-02-21Merge tag 'x86_cpu_for_v6.3_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: - Cache the AMD debug registers in per-CPU variables to avoid MSR writes where possible, when supporting a debug registers swap feature for SEV-ES guests - Add support for AMD's version of eIBRS called Automatic IBRS which is a set-and-forget control of indirect branch restriction speculation resources on privilege change - Add support for a new x86 instruction - LKGS - Load kernel GS which is part of the FRED infrastructure - Reset SPEC_CTRL upon init to accomodate use cases like kexec which rediscover - Other smaller fixes and cleanups * tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/amd: Cache debug register values in percpu variables KVM: x86: Propagate the AMD Automatic IBRS feature to the guest x86/cpu: Support AMD Automatic IBRS x86/cpu, kvm: Add the SMM_CTL MSR not present feature x86/cpu, kvm: Add the Null Selector Clears Base feature x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation code x86/cpu, kvm: Add support for CPUID_80000021_EAX x86/gsseg: Add the new <asm/gsseg.h> header to <asm/asm-prototypes.h> x86/gsseg: Use the LKGS instruction if available for load_gs_index() x86/gsseg: Move load_gs_index() to its own new header file x86/gsseg: Make asm_load_gs_index() take an u16 x86/opcode: Add the LKGS instruction to x86-opcode-map x86/cpufeature: Add the CPU feature bit for LKGS x86/bugs: Reset speculation control settings on init x86/cpu: Remove redundant extern x86_read_arch_cap_msr()
2023-02-21Merge tag 'x86_cache_for_v6.3_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Add support for a new AMD feature called slow memory bandwidth allocation. Its goal is to control resource allocation in external slow memory which is connected to the machine like for example through CXL devices, accelerators etc * tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Fix a silly -Wunused-but-set-variable warning Documentation/x86: Update resctrl.rst for new features x86/resctrl: Add interface to write mbm_local_bytes_config x86/resctrl: Add interface to write mbm_total_bytes_config x86/resctrl: Add interface to read mbm_local_bytes_config x86/resctrl: Add interface to read mbm_total_bytes_config x86/resctrl: Support monitor configuration x86/resctrl: Add __init attribute to rdt_get_mon_l3_config() x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation x86/resctrl: Include new features in command line options x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
2023-02-20Merge tag 'perf-core-2023-02-20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: - Optimize perf_sample_data layout - Prepare sample data handling for BPF integration - Update the x86 PMU driver for Intel Meteor Lake - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids) discovery breakage - Fix the x86 Zhaoxin PMU driver - Cleanups * tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) perf/x86/intel/uncore: Add Meteor Lake support x86/perf/zhaoxin: Add stepping check for ZXC perf/x86/intel/ds: Fix the conversion from TSC to perf time perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table perf/x86/uncore: Add a quirk for UPI on SPR perf/x86/uncore: Ignore broken units in discovery table perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name perf/x86/uncore: Factor out uncore_device_to_die() perf/core: Call perf_prepare_sample() before running BPF perf/core: Introduce perf_prepare_header() perf/core: Do not pass header for sample ID init perf/core: Set data->sample_flags in perf_prepare_sample() perf/core: Add perf_sample_save_brstack() helper perf/core: Add perf_sample_save_raw_data() helper perf/core: Add perf_sample_save_callchain() helper perf/core: Save the dynamic parts of sample data size x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulation perf/core: Change the layout of perf_sample_data perf/x86/msr: Add Meteor Lake support perf/x86/cstate: Add Meteor Lake support ...
2023-01-25x86/cpu: Support AMD Automatic IBRSKim Phillips
The AMD Zen4 core supports a new feature called Automatic IBRS. It is a "set-and-forget" feature that means that, like Intel's Enhanced IBRS, h/w manages its IBRS mitigation resources automatically across CPL transitions. The feature is advertised by CPUID_Fn80000021_EAX bit 8 and is enabled by setting MSR C000_0080 (EFER) bit 21. Enable Automatic IBRS by default if the CPU feature is present. It typically provides greater performance over the incumbent generic retpolines mitigation. Reuse the SPECTRE_V2_EIBRS spectre_v2_mitigation enum. AMD Automatic IBRS and Intel Enhanced IBRS have similar enablement. Add NO_EIBRS_PBRSB to cpu_vuln_whitelist, since AMD Automatic IBRS isn't affected by PBRSB-eIBRS. The kernel command line option spectre_v2=eibrs is used to select AMD Automatic IBRS, if available. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20230124163319.2277355-8-kim.phillips@amd.com
2023-01-23x86/resctrl: Add interface to read mbm_total_bytes_configBabu Moger
The event configuration can be viewed by the user by reading the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config. The event configuration settings are domain specific and will affect all the CPUs in the domain. Following are the types of events supported: ==== =========================================================== Bits Description ==== =========================================================== 6 Dirty Victims from the QOS domain to all types of memory 5 Reads to slow memory in the non-local NUMA domain 4 Reads to slow memory in the local NUMA domain 3 Non-temporal writes to non-local NUMA domain 2 Non-temporal writes to local NUMA domain 1 Reads to memory in the non-local NUMA domain 0 Reads to memory in the local NUMA domain ==== =========================================================== By default, the mbm_total_bytes_config is set to 0x7f to count all the event types. For example: $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 0=0x7f;1=0x7f;2=0x7f;3=0x7f In this case, the event mbm_total_bytes is configured with 0x7f on domains 0 to 3. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20230113152039.770054-10-babu.moger@amd.com