summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu
AgeCommit message (Collapse)Author
2025-02-11x86/cpu/kvm: SRSO: Fix possible missing IBPB on VM-ExitPatrick Bellasi
In [1] the meaning of the synthetic IBPB flags has been redefined for a better separation of concerns: - ENTRY_IBPB -- issue IBPB on entry only - IBPB_ON_VMEXIT -- issue IBPB on VM-Exit only and the Retbleed mitigations have been updated to match this new semantics. Commit [2] was merged shortly before [1], and their interaction was not handled properly. This resulted in IBPB not being triggered on VM-Exit in all SRSO mitigation configs requesting an IBPB there. Specifically, an IBPB on VM-Exit is triggered only when X86_FEATURE_IBPB_ON_VMEXIT is set. However: - X86_FEATURE_IBPB_ON_VMEXIT is not set for "spec_rstack_overflow=ibpb", because before [1] having X86_FEATURE_ENTRY_IBPB was enough. Hence, an IBPB is triggered on entry but the expected IBPB on VM-exit is not. - X86_FEATURE_IBPB_ON_VMEXIT is not set also when "spec_rstack_overflow=ibpb-vmexit" if X86_FEATURE_ENTRY_IBPB is already set. That's because before [1] this was effectively redundant. Hence, e.g. a "retbleed=ibpb spec_rstack_overflow=bpb-vmexit" config mistakenly reports the machine still vulnerable to SRSO, despite an IBPB being triggered both on entry and VM-Exit, because of the Retbleed selected mitigation config. - UNTRAIN_RET_VM won't still actually do anything unless CONFIG_MITIGATION_IBPB_ENTRY is set. For "spec_rstack_overflow=ibpb", enable IBPB on both entry and VM-Exit and clear X86_FEATURE_RSB_VMEXIT which is made superfluous by X86_FEATURE_IBPB_ON_VMEXIT. This effectively makes this mitigation option similar to the one for 'retbleed=ibpb', thus re-order the code for the RETBLEED_MITIGATION_IBPB option to be less confusing by having all features enabling before the disabling of the not needed ones. For "spec_rstack_overflow=ibpb-vmexit", guard this mitigation setting with CONFIG_MITIGATION_IBPB_ENTRY to ensure UNTRAIN_RET_VM sequence is effectively compiled in. Drop instead the CONFIG_MITIGATION_SRSO guard, since none of the SRSO compile cruft is required in this configuration. Also, check only that the required microcode is present to effectively enabled the IBPB on VM-Exit. Finally, update the KConfig description for CONFIG_MITIGATION_IBPB_ENTRY to list also all SRSO config settings enabled by this guard. Fixes: 864bcaa38ee4 ("x86/cpu/kvm: Provide UNTRAIN_RET_VM") [1] Fixes: d893832d0e1e ("x86/srso: Add IBPB on VMEXIT") [2] Reported-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Patrick Bellasi <derkling@google.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-01-28treewide: const qualify ctl_tables where applicableJoel Granados
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-01-25Merge tag 'hyperv-next-signed-20250123' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Introduce a new set of Hyper-V headers in include/hyperv and replace the old hyperv-tlfs.h with the new headers (Nuno Das Neves) - Fixes for the Hyper-V VTL mode (Roman Kisel) - Fixes for cpu mask usage in Hyper-V code (Michael Kelley) - Document the guest VM hibernation behaviour (Michael Kelley) - Miscellaneous fixes and cleanups (Jacob Pan, John Starks, Naman Jain) * tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Documentation: hyperv: Add overview of guest VM hibernation hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id() hyperv: Do not overlap the hvcall IO areas in get_vtl() hyperv: Enable the hypercall output page for the VTL mode hv_balloon: Fallback to generic_online_page() for non-HV hot added mem Drivers: hv: vmbus: Log on missing offers if any Drivers: hv: vmbus: Wait for boot-time offers during boot and resume uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup iommu/hyper-v: Don't assume cpu_possible_mask is dense Drivers: hv: Don't assume cpu_possible_mask is dense x86/hyperv: Don't assume cpu_possible_mask is dense hyperv: Remove the now unused hyperv-tlfs.h files hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h hyperv: Add new Hyper-V headers in include/hyperv hyperv: Clean up unnecessary #includes hyperv: Move hv_connection_id to hyperv-tlfs.h
2025-01-21Merge tag 'x86-cleanups-2025-01-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Miscellaneous x86 cleanups and typo fixes, and also the removal of the 'disablelapic' boot parameter" * tag 'x86-cleanups-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Remove a stray tab in the IO-APIC type string x86/cpufeatures: Remove "AMD" from the comments to the AMD-specific leaf Documentation/kernel-parameters: Fix a typo in kvm.enable_virt_at_load text x86/cpu: Fix typo in x86_match_cpu()'s doc x86/apic: Remove "disablelapic" cmdline option Documentation: Merge x86-specific boot options doc into kernel-parameters.txt x86/ioremap: Remove unused size parameter in remapping functions x86/ioremap: Simplify setup_data mapping variants x86/boot/compressed: Remove unused header includes from kaslr.c
2025-01-21Merge tag 'perf-core-2025-01-20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: "Seqlock optimizations that arose in a perf context and were merged into the perf tree: - seqlock: Add raw_seqcount_try_begin (Suren Baghdasaryan) - mm: Convert mm_lock_seq to a proper seqcount (Suren Baghdasaryan) - mm: Introduce mmap_lock_speculate_{try_begin|retry} (Suren Baghdasaryan) - mm/gup: Use raw_seqcount_try_begin() (Peter Zijlstra) Core perf enhancements: - Reduce 'struct page' footprint of perf by mapping pages in advance (Lorenzo Stoakes) - Save raw sample data conditionally based on sample type (Yabin Cui) - Reduce sampling overhead by checking sample_type in perf_sample_save_callchain() and perf_sample_save_brstack() (Yabin Cui) - Export perf_exclude_event() (Namhyung Kim) Uprobes scalability enhancements: (Andrii Nakryiko) - Simplify find_active_uprobe_rcu() VMA checks - Add speculative lockless VMA-to-inode-to-uprobe resolution - Simplify session consumer tracking - Decouple return_instance list traversal and freeing - Ensure return_instance is detached from the list before freeing - Reuse return_instances between multiple uretprobes within task - Guard against kmemdup() failing in dup_return_instance() AMD core PMU driver enhancements: - Relax privilege filter restriction on AMD IBS (Namhyung Kim) AMD RAPL energy counters support: (Dhananjay Ugwekar) - Introduce topology_logical_core_id() (K Prateek Nayak) - Remove the unused get_rapl_pmu_cpumask() function - Remove the cpu_to_rapl_pmu() function - Rename rapl_pmu variables - Make rapl_model struct global - Add arguments to the init and cleanup functions - Modify the generic variable names to *_pkg* - Remove the global variable rapl_msrs - Move the cntr_mask to rapl_pmus struct - Add core energy counter support for AMD CPUs Intel core PMU driver enhancements: - Support RDPMC 'metrics clear mode' feature (Kan Liang) - Clarify adaptive PEBS processing (Kan Liang) - Factor out functions for PEBS records processing (Kan Liang) - Simplify the PEBS records processing for adaptive PEBS (Kan Liang) Intel uncore driver enhancements: (Kan Liang) - Convert buggy pmu->func_id use to pmu->registered - Support more units on Granite Rapids" * tag 'perf-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) perf: map pages in advance perf/x86/intel/uncore: Support more units on Granite Rapids perf/x86/intel/uncore: Clean up func_id perf/x86/intel: Support RDPMC metrics clear mode uprobes: Guard against kmemdup() failing in dup_return_instance() perf/x86: Relax privilege filter restriction on AMD IBS perf/core: Export perf_exclude_event() uprobes: Reuse return_instances between multiple uretprobes within task uprobes: Ensure return_instance is detached from the list before freeing uprobes: Decouple return_instance list traversal and freeing uprobes: Simplify session consumer tracking uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution uprobes: simplify find_active_uprobe_rcu() VMA checks mm: introduce mmap_lock_speculate_{try_begin|retry} mm: convert mm_lock_seq to a proper seqcount mm/gup: Use raw_seqcount_try_begin() seqlock: add raw_seqcount_try_begin perf/x86/rapl: Add core energy counter support for AMD CPUs perf/x86/rapl: Move the cntr_mask to rapl_pmus struct perf/x86/rapl: Remove the global variable rapl_msrs ...
2025-01-21Merge tag 'x86_misc_for_v6.14_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Borislav Petkov: - The first part of a restructuring of AMD's representation of a northbridge which is legacy now, and the creation of the new AMD node concept which represents the Zen architecture of having a collection of I/O devices within an SoC. Those nodes comprise the so-called data fabric on Zen. This has at least one practical advantage of not having to add a PCI ID each time a new data fabric PCI device releases. Eventually, the lot more uniform provider of data fabric functionality amd_node.c will be used by all the drivers which need it - Smaller cleanups * tag 'x86_misc_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/amd_node: Use defines for SMN register offsets x86/amd_node: Remove dependency on AMD_NB x86/amd_node: Update __amd_smn_rw() error paths x86/amd_nb: Move SMN access code to a new amd_node driver x86/amd_nb, hwmon: (k10temp): Simplify amd_pci_dev_to_node_id() x86/amd_nb: Simplify function 3 search x86/amd_nb: Use topology info to get AMD node count x86/amd_nb: Simplify root device search x86/amd_nb: Simplify function 4 search x86: Start moving AMD node functionality out of AMD_NB x86/amd_nb: Clean up early_is_amd_nb() x86/amd_nb: Restrict init function to AMD-based systems x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
2025-01-21Merge tag 'x86_cpu_for_v6.14_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: - Remove the less generic CPU matching infra around struct x86_cpu_desc and use the generic struct x86_cpu_id thing - Remove magic naked numbers for CPUID functions and use proper defines of the prefix CPUID_LEAF_*. Consolidate some of the crazy use around the tree - Smaller cleanups and improvements * tag 'x86_cpu_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Make all all CPUID leaf names consistent x86/fpu: Remove unnecessary CPUID level check x86/fpu: Move CPUID leaf definitions to common code x86/tsc: Remove CPUID "frequency" leaf magic numbers. x86/tsc: Move away from TSC leaf magic numbers x86/cpu: Move TSC CPUID leaf definition x86/cpu: Refresh DCA leaf reading code x86/cpu: Remove unnecessary MwAIT leaf checks x86/cpu: Use MWAIT leaf definition x86/cpu: Move MWAIT leaf definition to common header x86/cpu: Remove 'x86_cpu_desc' infrastructure x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id' x86/cpu: Replace PEBS use of 'x86_cpu_desc' use with 'x86_cpu_id' x86/cpu: Expose only stepping min/max interface x86/cpu: Introduce new microcode matching helper x86/cpufeature: Document cpu_feature_enabled() as the default to use x86/paravirt: Remove the WBINVD callback x86/cpufeatures: Free up unused feature bits
2025-01-21Merge tag 'x86_sev_for_v6.14_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - A segmented Reverse Map table (RMP) is a across-nodes distributed table of sorts which contains per-node descriptors of each node-local 4K page, denoting its ownership (hypervisor, guest, etc) in the realm of confidential computing. Add support for such a table in order to improve referential locality when accessing or modifying RMP table entries - Add support for reading the TSC in SNP guests by removing any interference or influence the hypervisor might have, with the goal of making a confidential guest even more independent from the hypervisor * tag 'x86_sev_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Add the Secure TSC feature for SNP guests x86/tsc: Init the TSC for Secure TSC guests x86/sev: Mark the TSC in a secure TSC guest as reliable x86/sev: Prevent RDTSC/RDTSCP interception for Secure TSC enabled guests x86/sev: Prevent GUEST_TSC_FREQ MSR interception for Secure TSC enabled guests x86/sev: Change TSC MSR behavior for Secure TSC enabled guests x86/sev: Add Secure TSC support for SNP guests x86/sev: Relocate SNP guest messaging routines to common code x86/sev: Carve out and export SNP guest messaging init routines virt: sev-guest: Replace GFP_KERNEL_ACCOUNT with GFP_KERNEL virt: sev-guest: Remove is_vmpck_empty() helper x86/sev/docs: Document the SNP Reverse Map Table (RMP) x86/sev: Add full support for a segmented RMP table x86/sev: Treat the contiguous RMP table as a single RMP segment x86/sev: Map only the RMP table entries instead of the full RMP range x86/sev: Move the SNP probe routine out of the way x86/sev: Require the RMPREAD instruction after Zen4 x86/sev: Add support for the RMPREAD instruction x86/sev: Prepare for using the RMPREAD instruction to access the RMP
2025-01-21Merge tag 'x86_microcode_for_v6.14_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loader updates from Borislav Petkov: - A bunch of minor cleanups * tag 'x86_microcode_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Remove ret local var in early_apply_microcode() x86/microcode/AMD: Have __apply_microcode_amd() return bool x86/microcode/AMD: Make __verify_patch_size() return bool x86/microcode/AMD: Remove bogus comment from parse_container() x86/microcode/AMD: Return bool from find_blobs_in_containers()
2025-01-21Merge tag 'x86_cache_for_v6.14_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Extend resctrl with the capability of total memory bandwidth monitoring, thus accomodating systems which support only total but not local memory bandwidth monitoring. Add the respective new mount options - The usual cleanups * tag 'x86_cache_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Document the new "mba_MBps_event" file x86/resctrl: Add write option to "mba_MBps_event" file x86/resctrl: Add "mba_MBps_event" file to CTRL_MON directories x86/resctrl: Make mba_sc use total bandwidth if local is not supported x86/resctrl: Compute memory bandwidth for all supported events x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group event x86/resctrl: Prepare for per-CTRL_MON group mba_MBps control x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags x86/resctrl: Use kthread_run_on_cpu()
2025-01-21Merge tag 'x86_bugs_for_v6.14_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU speculation update from Borislav Petkov: - Add support for AMD hardware which is not affected by SRSO on the user/kernel attack vector and advertise it to guest userspace * tag 'x86_bugs_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: KVM: x86: Advertise SRSO_USER_KERNEL_NO to userspace x86/bugs: Add SRSO_USER_KERNEL_NO support
2025-01-10hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.hNuno Das Neves
Switch to using hvhdk.h everywhere in the kernel. This header includes all the new Hyper-V headers in include/hyperv, which form a superset of the definitions found in hyperv-tlfs.h. This makes it easier to add new Hyper-V interfaces without being restricted to those in the TLFS doc (reflected in hyperv-tlfs.h). To be more consistent with the original Hyper-V code, the names of some definitions are changed slightly. Update those where needed. Update comments in mshyperv.h files to point to include/hyperv for adding new definitions. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://lore.kernel.org/r/1732577084-2122-5-git-send-email-nunodasneves@linux.microsoft.com Link: https://lore.kernel.org/r/20250108222138.1623703-3-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-01-03x86/mce/amd: Remove shared threshold bank plumbingYazen Ghannam
Legacy AMD systems include an integrated Northbridge that is represented by MCA bank 4. This is the only non-core MCA bank in legacy systems. The Northbridge is physically shared by all the CPUs within an AMD "Node". However, in practice the "shared" MCA bank can only by managed by a single CPU within that AMD Node. This is known as the "Node Base Core" (NBC). For example, only the NBC will be able to read the MCA bank 4 registers; they will be Read-as-Zero for other CPUs. Also, the MCA Thresholding interrupt will only signal the NBC; the other CPUs will not receive it. This is enforced by hardware, and it should not be managed by software. The current AMD Thresholding code attempts to deal with the "shared" MCA bank by micromanaging the bank's sysfs kobjects. However, this does not follow the intended kobject use cases. It is also fragile, and it has caused bugs in the past. Modern AMD systems do not need this shared MCA bank support, and it should not be needed on legacy systems either. Remove the shared threshold bank code. Also, move the threshold struct definitions to mce/amd.c, since they are no longer needed in amd_nb.c. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241206161210.163701-2-yazen.ghannam@amd.com
2024-12-31x86/microcode/AMD: Remove ret local var in early_apply_microcode()Borislav Petkov (AMD)
No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-12-31x86/microcode/AMD: Have __apply_microcode_amd() return boolBorislav Petkov (AMD)
This is the natural thing to do anyway. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-12-31x86/microcode/AMD: Make __verify_patch_size() return boolNikolay Borisov
The result of that function is in essence boolean, so simplify to return the result of the relevant expression. It also makes it follow the convention used by __verify_patch_section(). No functional changes. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241018155151.702350-3-nik.borisov@suse.com
2024-12-31x86/microcode/AMD: Remove bogus comment from parse_container()Nikolay Borisov
The function doesn't return an equivalence ID, remove the false comment. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241018155151.702350-4-nik.borisov@suse.com
2024-12-31x86/microcode/AMD: Return bool from find_blobs_in_containers()Nikolay Borisov
Instead of open-coding the check for size/data move it inside the function and make it return a boolean indicating whether data was found or not. No functional changes. [ bp: Write @ret in find_blobs_in_containers() only on success. ] Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241018155151.702350-2-nik.borisov@suse.com
2024-12-31x86/mce: Remove the redundant mce_hygon_feature_init()Qiuxu Zhuo
Get HYGON to directly call mce_amd_feature_init() and remove the redundant mce_hygon_feature_init(). Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20241212140103.66964-7-qiuxu.zhuo@intel.com
2024-12-31x86/mce: Convert family/model mixed checks to VFM-based checksQiuxu Zhuo
Convert family/model mixed checks to VFM-based checks to make the code more compact. Simplify. [ bp: Drop the "what" from the commit message - it should be visible from the diff alone. ] Suggested-by: Sohil Mehta <sohil.mehta@intel.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20241212140103.66964-6-qiuxu.zhuo@intel.com
2024-12-31x86/mce: Break up __mcheck_cpu_apply_quirks()Tony Luck
Split each vendor specific part into its own helper function. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20241212140103.66964-5-qiuxu.zhuo@intel.com
2024-12-30x86/mce: Make four functions return boolQiuxu Zhuo
Make those functions whose callers only care about success or failure return a boolean value for better readability. Also, update the call sites accordingly as the polarities of all the return values have been flipped. No functional changes. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20241212140103.66964-4-qiuxu.zhuo@intel.com
2024-12-30x86/mce/threshold: Remove the redundant this_cpu_dec_return()Qiuxu Zhuo
The 'storm' variable points to this_cpu_ptr(&storm_desc). Access the 'stormy_bank_count' field through the 'storm' to avoid calling this_cpu_*() on the same per-CPU variable twice. This minor optimization reduces the text size by 16 bytes. $ size threshold.o.* text data bss dec hex filename 1395 1664 0 3059 bf3 threshold.o.old 1379 1664 0 3043 be3 threshold.o.new No functional changes intended. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20241212140103.66964-3-qiuxu.zhuo@intel.com
2024-12-30x86/mce: Make several functions return boolQiuxu Zhuo
Make several functions that return 0 or 1 return a boolean value for better readability. No functional changes are intended. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20241212140103.66964-2-qiuxu.zhuo@intel.com
2024-12-30x86/bugs: Add SRSO_USER_KERNEL_NO supportBorislav Petkov (AMD)
If the machine has: CPUID Fn8000_0021_EAX[30] (SRSO_USER_KERNEL_NO) -- If this bit is 1, it indicates the CPU is not subject to the SRSO vulnerability across user/kernel boundaries. have it fall back to IBPB on VMEXIT only, in the case it is going to run VMs: Speculative Return Stack Overflow: Mitigation: IBPB on VMEXIT only Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20241202120416.6054-2-bp@kernel.org
2024-12-18Merge tag 'hyperv-fixes-signed-20241217' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Various fixes to Hyper-V tools in the kernel tree (Dexuan Cui, Olaf Hering, Vitaly Kuznetsov) - Fix a bug in the Hyper-V TSC page based sched_clock() (Naman Jain) - Two bug fixes in the Hyper-V utility functions (Michael Kelley) - Convert open-coded timeouts to secs_to_jiffies() in Hyper-V drivers (Easwar Hariharan) * tag 'hyperv-fixes-signed-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: tools/hv: reduce resource usage in hv_kvp_daemon tools/hv: add a .gitignore file tools/hv: reduce resouce usage in hv_get_dns_info helper hv/hv_kvp_daemon: Pass NIC name to hv_get_dns_info as well Drivers: hv: util: Avoid accessing a ringbuffer not initialized yet Drivers: hv: util: Don't force error code to ENODEV in util_probe() tools/hv: terminate fcopy daemon if read from uio fails drivers: hv: Convert open-coded timeouts to secs_to_jiffies() tools: hv: change permissions of NetworkManager configuration file x86/hyperv: Fix hv tsc page based sched_clock for hibernation tools: hv: Fix a complier warning in the fcopy uio daemon
2024-12-18x86/cpu: Make all all CPUID leaf names consistentDave Hansen
The leaf names are not consistent. Give them all a CPUID_LEAF_ prefix for consistency and vertical alignment. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Dave Jiang <dave.jiang@intel.com> # for ioatdma bits Link: https://lore.kernel.org/all/20241213205040.7B0C3241%40davehans-spike.ostc.intel.com
2024-12-18x86/fpu: Move CPUID leaf definitions to common codeDave Hansen
Move the XSAVE-related CPUID leaf definitions to common code. Then, use the new definition to remove the last magic number from the CPUID level dependency table. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/all/20241213205037.43C57CDE%40davehans-spike.ostc.intel.com
2024-12-18x86/cpu: Refresh DCA leaf reading codeDave Hansen
The DCA leaf number is also hard-coded in the CPUID level dependency table. Move its definition to common code and use it. While at it, fix up the naming and types in the probe code. All CPUID data is provided in 32-bit registers, not 'unsigned long'. Also stop referring to "level_9". Move away from test_bit() because the type is no longer an 'unsigned long'. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/all/20241213205032.476A30FE%40davehans-spike.ostc.intel.com
2024-12-18x86/cpu: Use MWAIT leaf definitionDave Hansen
The leaf-to-feature dependency array uses hard-coded leaf numbers. Use the new common header definition for the MWAIT leaf. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/all/20241213205029.5B055D6E%40davehans-spike.ostc.intel.com
2024-12-18x86/cpu: Remove 'x86_cpu_desc' infrastructureDave Hansen
All the users of 'x86_cpu_desc' are gone. Zap it from the tree. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20241213185133.AF0BF2BC%40davehans-spike.ostc.intel.com
2024-12-18x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'Dave Hansen
The AMD erratum 1386 detection code uses and old style 'x86_cpu_desc' table. Replace it with 'x86_cpu_id' so the old style can be removed. I did not create a new helper macro here. The new table is certainly more noisy than the old and it can be improved on. But I was hesitant to create a new macro just for a single site that is only two ugly lines in the end. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20241213185132.07555E1D%40davehans-spike.ostc.intel.com
2024-12-17x86/cpu: Expose only stepping min/max interfaceDave Hansen
The x86_match_cpu() infrastructure can match CPU steppings. Since there are only 16 possible steppings, the matching infrastructure goes all out and stores the stepping match as a bitmap. That means it can match any possible steppings in a single list entry. Fun. But it exposes this bitmap to each of the X86_MATCH_*() helpers when none of them really need a bitmap. It makes up for this by exporting a helper (X86_STEPPINGS()) which converts a contiguous stepping range into the bitmap which every single user leverages. Instead of a bitmap, have the main helper for this sort of thing (X86_MATCH_VFM_STEPS()) just take a stepping range. This ends up actually being even more compact than before. Leave the helper in place (renamed to __X86_STEPPINGS()) to make it more clear what is going on instead of just having a random GENMASK() in the middle of an already complicated macro. One oddity that I hit was this macro: X86_MATCH_VFM_STEPS(vfm, X86_STEPPING_MIN, max_stepping, issues) It *could* have been converted over to take a min/max stepping value for each entry. But that would have been a bit too verbose and would prevent the one oddball in the list (INTEL_COMETLAKE_L stepping 0) from sticking out. Instead, just have it take a *maximum* stepping and imply that the match is from 0=>max_stepping. This is functional for all the cases now and also retains the nice property of having INTEL_COMETLAKE_L stepping 0 stick out like a sore thumb. skx_cpuids[] is goofy. It uses the stepping match but encodes all possible steppings. Just use a normal, non-stepping match helper. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20241213185129.65527B2A%40davehans-spike.ostc.intel.com
2024-12-17x86/cpu: Introduce new microcode matching helperDave Hansen
The 'x86_cpu_id' and 'x86_cpu_desc' structures are very similar and need to be consolidated. There is a microcode version matching function for 'x86_cpu_desc' but not 'x86_cpu_id'. Create one for 'x86_cpu_id'. This essentially just leverages the x86_cpu_id->driver_data field to replace the less generic x86_cpu_desc->x86_microcode_rev field. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20241213185128.8F24EEFC%40davehans-spike.ostc.intel.com
2024-12-14x86/sev: Require the RMPREAD instruction after Zen4Tom Lendacky
Limit usage of the non-architectural RMP format to Zen3/Zen4 processors. The RMPREAD instruction, with architectural defined output, is available and should be used for RMP access beyond Zen4. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Reviewed-by: Ashish Kalra <ashish.kalra@amd.com> Link: https://lore.kernel.org/r/5be0093e091778a151266ea853352f62f838eb99.1733172653.git.thomas.lendacky@amd.com
2024-12-13x86: make get_cpu_vendor() accessible from Xen codeJuergen Gross
In order to be able to differentiate between AMD and Intel based systems for very early hypercalls without having to rely on the Xen hypercall page, make get_cpu_vendor() non-static. Refactor early_cpu_init() for the same reason by splitting out the loop initializing cpu_devs() into an externally callable function. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-12-12x86/resctrl: Add write option to "mba_MBps_event" fileTony Luck
The "mba_MBps" mount option provides an alternate method to control memory bandwidth. Instead of specifying allowable bandwidth as a percentage of maximum possible, the user provides a MiB/s limit value. There is a file in each CTRL_MON group directory that shows the event currently in use. Allow writing that file to choose a different event. A user can choose any of the memory bandwidth monitoring events listed in /sys/fs/resctrl/info/L3_mon/mon_features independently for each CTRL_MON group by writing to each of the "mba_MBps_event" files. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20241206163148.83828-8-tony.luck@intel.com
2024-12-12x86/resctrl: Add "mba_MBps_event" file to CTRL_MON directoriesTony Luck
The "mba_MBps" mount option provides an alternate method to control memory bandwidth. Instead of specifying allowable bandwidth as a percentage of maximum possible, the user provides a MiB/s limit value. In preparation to allow the user to pick the memory bandwidth monitoring event used as input to the feedback loop, provide a file in each CTRL_MON group directory that shows the event currently in use. Note that this file is only visible when the "mba_MBps" mount option is in use. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20241206163148.83828-7-tony.luck@intel.com
2024-12-10x86/cpu: Fix typo in x86_match_cpu()'s docRaag Jadav
Fix typo in x86_match_cpu()'s description. [ bp: Massage commit message. ] Signed-off-by: Raag Jadav <raag.jadav@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241030065804.407793-1-raag.jadav@intel.com
2024-12-10Merge branch 'linus' into x86/cleanups, to resolve conflictIngo Molnar
These two commits interact: upstream: 73da582a476e ("x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC") x86/cleanups: 13148e22c151 ("x86/apic: Remove "disablelapic" cmdline option") Resolve it. Conflicts: arch/x86/kernel/cpu/topology.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-10x86/apic: Remove "disablelapic" cmdline optionBorislav Petkov (AMD)
The convention is "no<something>" and there already is "nolapic". Drop the disable one. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20241202190011.11979-2-bp@kernel.org
2024-12-10x86/resctrl: Make mba_sc use total bandwidth if local is not supportedTony Luck
The default input measurement to the mba_sc feedback loop for memory bandwidth control when the user mounts with the "mba_MBps" option is the local bandwidth event. But some systems may not support a local bandwidth event. When local bandwidth event is not supported, check for support of total bandwidth and use that instead. Relax the mount option check to allow use of the "mba_MBps" option for systems when only total bandwidth monitoring is supported. Also update the error message. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20241206163148.83828-6-tony.luck@intel.com
2024-12-10x86/resctrl: Compute memory bandwidth for all supported eventsTony Luck
Switching between local and total memory bandwidth events as the input to the mba_sc feedback loop would be cumbersome and take effect slowly in the current implementation as the bandwidth is only known after two consecutive readings of the same event. Compute the bandwidth for all supported events. This doesn't add significant overhead and will make changing which event is used simple. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20241206163148.83828-5-tony.luck@intel.com
2024-12-10x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group eventTony Luck
update_mba_bw() hard codes use of the memory bandwidth local event which prevents more flexible options from being deployed. Change this function to use the event specified in the rdtgroup that is being processed. Mount time checks for the "mba_MBps" option ensure that local memory bandwidth is enabled. So drop the redundant is_mbm_local_enabled() check. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20241206163148.83828-4-tony.luck@intel.com
2024-12-10x86/resctrl: Prepare for per-CTRL_MON group mba_MBps controlTony Luck
Resctrl uses local memory bandwidth event as input to the feedback loop when the mba_MBps mount option is used. This means that this mount option cannot be used on systems that only support monitoring of total bandwidth. Prepare to allow users to choose the input event independently for each CTRL_MON group by adding a global variable "mba_mbps_default_event" used to set the default event for each CTRL_MON group, and a new field "mba_mbps_event" in struct rdtgroup to track which event is used for each CTRL_MON group. Notes: 1) Both of these are only used when the user mounts the filesystem with the "mba_MBps" option. 2) Only check for support of local bandwidth event when initializing mba_mbps_default_event. Support for total bandwidth event can be added after other routines in resctrl have been updated to handle total bandwidth event. [ bp: Move mba_mbps_default_event extern into the arch header. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20241206163148.83828-3-tony.luck@intel.com
2024-12-09x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflagsBabu Moger
thread_throttle_mode_init() and mbm_config_rftype_init() both initialize fflags for resctrl files. Adding new files will involve adding another function to initialize the fflags. This can be simplified by adding a new function resctrl_file_fflags_init() and passing the file name and flags to be initialized. Consolidate fflags initialization into resctrl_file_fflags_init() and remove thread_throttle_mode_init() and mbm_config_rftype_init(). [ Tony: Drop __init attribute so resctrl_file_fflags_init() can be used at run time. ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20241206163148.83828-2-tony.luck@intel.com
2024-12-09x86/resctrl: Use kthread_run_on_cpu()Frederic Weisbecker
Use the proper API instead of open coding it. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20240807160228.26206-3-frederic@kernel.org
2024-12-09x86/hyperv: Fix hv tsc page based sched_clock for hibernationNaman Jain
read_hv_sched_clock_tsc() assumes that the Hyper-V clock counter is bigger than the variable hv_sched_clock_offset, which is cached during early boot, but depending on the timing this assumption may be false when a hibernated VM starts again (the clock counter starts from 0 again) and is resuming back (Note: hv_init_tsc_clocksource() is not called during hibernation/resume); consequently, read_hv_sched_clock_tsc() may return a negative integer (which is interpreted as a huge positive integer since the return type is u64) and new kernel messages are prefixed with huge timestamps before read_hv_sched_clock_tsc() grows big enough (which typically takes several seconds). Fix the issue by saving the Hyper-V clock counter just before the suspend, and using it to correct the hv_sched_clock_offset in resume. This makes hv tsc page based sched_clock continuous and ensures that post resume, it starts from where it left off during suspend. Override x86_platform.save_sched_clock_state and x86_platform.restore_sched_clock_state routines to correct this as soon as possible. Note: if Invariant TSC is available, the issue doesn't happen because 1) we don't register read_hv_sched_clock_tsc() for sched clock: See commit e5313f1c5404 ("clocksource/drivers/hyper-v: Rework clocksource and sched clock setup"); 2) the common x86 code adjusts TSC similarly: see __restore_processor_state() -> tsc_verify_tsc_adjust(true) and x86_platform.restore_sched_clock_state(). Cc: stable@vger.kernel.org Fixes: 1349401ff1aa ("clocksource/drivers/hyper-v: Suspend/resume Hyper-V clocksource for hibernation") Co-developed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20240917053917.76787-1-namjain@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240917053917.76787-1-namjain@linux.microsoft.com>
2024-12-06x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR failsSean Christopherson
When ensuring EFER.AUTOIBRS is set, WARN only on a negative return code from msr_set_bit(), as '1' is used to indicate the WRMSR was successful ('0' indicates the MSR bit was already set). Fixes: 8cc68c9c9e92 ("x86/CPU/AMD: Make sure EFER[AIBRSE] is set") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/Z1MkNofJjt7Oq0G6@google.com Closes: https://lore.kernel.org/all/20241205220604.GA2054199@thelio-3990X
2024-12-06x86/cacheinfo: Delete global num_cache_leavesRicardo Neri
Linux remembers cpu_cachinfo::num_leaves per CPU, but x86 initializes all CPUs from the same global "num_cache_leaves". This is erroneous on systems such as Meteor Lake, where each CPU has a distinct num_leaves value. Delete the global "num_cache_leaves" and initialize num_leaves on each CPU. init_cache_level() no longer needs to set num_leaves. Also, it never had to set num_levels as it is unnecessary in x86. Keep checking for zero cache leaves. Such condition indicates a bug. [ bp: Cleanup. ] Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org # 6.3+ Link: https://lore.kernel.org/r/20241128002247.26726-3-ricardo.neri-calderon@linux.intel.com