summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2023-10-19KVM: x86: remove the unused assigned_dev_head from kvm_archLiang Chen
Legacy device assignment was dropped years ago. This field is not used anymore. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Link: https://lore.kernel.org/r/20231019043336.8998-1-liangchen.linux@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-19x86/microcode/intel: Cleanup code furtherThomas Gleixner
Sanitize the microcode scan loop, fixup printks and move the loading function for builtin microcode next to the place where it is used and mark it __init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.389400871@linutronix.de
2023-10-19x86/microcode/intel: Simplify and rename generic_load_microcode()Thomas Gleixner
so it becomes less obfuscated and rename it because there is nothing generic about it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.330295409@linutronix.de
2023-10-19x86/microcode/intel: Simplify scan_microcode()Thomas Gleixner
Make it readable and comprehensible. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.271940980@linutronix.de
2023-10-19x86/microcode/intel: Rip out mixed stepping support for Intel CPUsAshok Raj
Mixed steppings aren't supported on Intel CPUs. Only one microcode patch is required for the entire system. The caching of microcode blobs which match the family and model is therefore pointless and in fact is dysfunctional as CPU hotplug updates use only a single microcode blob, i.e. the one where *intel_ucode_patch points to. Remove the microcode cache and make it an AMD local feature. [ tglx: - save only at the end. Otherwise random microcode ends up in the pointer for early loading - free the ucode patch pointer in save_microcode_patch() only after kmemdup() has succeeded, as reported by Andrew Cooper ] Originally-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.404362809@linutronix.de
2023-10-18KVM: x86/mmu: Remove unnecessary ‘NULL’ values from sptepLi zeming
Don't initialize "spte" and "sptep" in fast_page_fault() as they are both guaranteed (for all intents and purposes) to be written at the start of every loop iteration. Add a sanity check that "sptep" is non-NULL after walking the shadow page tables, as encountering a NULL root would result in "spte" not being written, i.e. would lead to uninitialized data or the previous value being consumed. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20230905182006.2964-1-zeming@nfschina.com [sean: rewrite changelog with --verbose] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-18bitops: add xor_unlock_is_negative_byte()Matthew Wilcox (Oracle)
Replace clear_bit_and_unlock_is_negative_byte() with xor_unlock_is_negative_byte(). We have a few places that like to lock a folio, set a flag and unlock it again. Allow for the possibility of combining the latter two operations for efficiency. We are guaranteed that the caller holds the lock, so it is safe to unlock it with the xor. The caller must guarantee that nobody else will set the flag without holding the lock; it is not safe to do this with the PG_dirty flag, for example. Link: https://lkml.kernel.org/r/20231004165317.1061855-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18x86: KVM: Add feature flag for CPUID.80000021H:EAX[bit 1]Jim Mattson
Define an X86_FEATURE_* flag for CPUID.80000021H:EAX.[bit 1], and advertise the feature to userspace via KVM_GET_SUPPORTED_CPUID. Per AMD's "Processor Programming Reference (PPR) for AMD Family 19h Model 61h, Revision B1 Processors (56713-B1-PUB)," this CPUID bit indicates that a WRMSR to MSR_FS_BASE, MSR_GS_BASE, or MSR_KERNEL_GS_BASE is non-serializing. This is a change in previously architected behavior. Effectively, this CPUID bit is a "defeature" bit, or a reverse polarity feature bit. When this CPUID bit is clear, the feature (serialization on WRMSR to any of these three MSRs) is available. When this CPUID bit is set, the feature is not available. KVM_GET_SUPPORTED_CPUID must pass this bit through from the underlying hardware, if it is set. Leaving the bit clear claims that WRMSR to these three MSRs will be serializing in a guest running under KVM. That isn't true. Though KVM could emulate the feature by intercepting writes to the specified MSRs, it does not do so today. The guest is allowed direct read/write access to these MSRs without interception, so the innate hardware behavior is preserved under KVM. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231005031237.1652871-1-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-18KVM: x86: remove always-false condition in kvmclock_sync_fnDongli Zhang
The 'kvmclock_periodic_sync' is a readonly param that cannot change after bootup. The kvm_arch_vcpu_postcreate() is not going to schedule the kvmclock_sync_work if kvmclock_periodic_sync == false. As a result, the "if (!kvmclock_periodic_sync)" can never be true if the kvmclock_sync_work = kvmclock_sync_fn() is scheduled. Link: https://lore.kernel.org/kvm/a461bf3f-c17e-9c3f-56aa-726225e8391d@oracle.com Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Link: https://lore.kernel.org/r/20231001213637.76686-1-dongli.zhang@oracle.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-18x86/microcode/32: Move early loading after paging enableThomas Gleixner
32-bit loads microcode before paging is enabled. The commit which introduced that has zero justification in the changelog. The cover letter has slightly more content, but it does not give any technical justification either: "The problem in current microcode loading method is that we load a microcode way, way too late; ideally we should load it before turning paging on. This may only be practical on 32 bits since we can't get to 64-bit mode without paging on, but we should still do it as early as at all possible." Handwaving word salad with zero technical content. Someone claimed in an offlist conversation that this is required for curing the ATOM erratum AAE44/AAF40/AAG38/AAH41. That erratum requires an microcode update in order to make the usage of PSE safe. But during early boot, PSE is completely irrelevant and it is evaluated way later. Neither is it relevant for the AP on single core HT enabled CPUs as the microcode loading on the AP is not doing anything. On dual core CPUs there is a theoretical problem if a split of an executable large page between enabling paging including PSE and loading the microcode happens. But that's only theoretical, it's practically irrelevant because the affected dual core CPUs are 64bit enabled and therefore have paging and PSE enabled before loading the microcode on the second core. So why would it work on 64-bit but not on 32-bit? The erratum: "AAG38 Code Fetch May Occur to Incorrect Address After a Large Page is Split Into 4-Kbyte Pages Problem: If software clears the PS (page size) bit in a present PDE (page directory entry), that will cause linear addresses mapped through this PDE to use 4-KByte pages instead of using a large page after old TLB entries are invalidated. Due to this erratum, if a code fetch uses this PDE before the TLB entry for the large page is invalidated then it may fetch from a different physical address than specified by either the old large page translation or the new 4-KByte page translation. This erratum may also cause speculative code fetches from incorrect addresses." The practical relevance for this is exactly zero because there is no splitting of large text pages during early boot-time, i.e. between paging enable and microcode loading, and neither during CPU hotplug. IOW, this load microcode before paging enable is yet another voodoo programming solution in search of a problem. What's worse is that it causes at least two serious problems: 1) When stackprotector is enabled, the microcode loader code has the stackprotector mechanics enabled. The read from the per CPU variable __stack_chk_guard is always accessing the virtual address either directly on UP or via %fs on SMP. In physical address mode this results in an access to memory above 3GB. So this works by chance as the hardware returns the same value when there is no RAM at this physical address. When there is RAM populated above 3G then the read is by chance the same as nothing changes that memory during the very early boot stage. That's not necessarily true during runtime CPU hotplug. 2) When function tracing is enabled, the relevant microcode loader functions and the functions invoked from there will call into the tracing code and evaluate global and per CPU variables in physical address mode. What could potentially go wrong? Cure this and move the microcode loading after the early paging enable, use the new temporary initrd mapping and remove the gunk in the microcode loader which is required to handle physical address mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.348298216@linutronix.de
2023-10-18x86/boot/32: Temporarily map initrd for microcode loadingThomas Gleixner
Early microcode loading on 32-bit runs in physical address mode because the initrd is not covered by the initial page tables. That results in a horrible mess all over the microcode loader code. Provide a temporary mapping for the initrd in the initial page tables by appending it to the actual initial mapping starting with a new PGD or PMD depending on the configured page table levels ([non-]PAE). The page table entries are located after _brk_end so they are not permanently using memory space. The mapping is invalidated right away in i386_start_kernel() after the early microcode loader has run. This prepares for removing the physical address mode oddities from all over the microcode loader code, which in turn allows further cleanups. Provide the map and unmap code and document the place where the microcode loader needs to be invoked with a comment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.292291436@linutronix.de
2023-10-18x86/microcode: Provide CONFIG_MICROCODE_INITRD32Thomas Gleixner
Create an aggregate config switch which covers X86_32, MICROCODE and BLK_DEV_INITRD to avoid lengthy #ifdeffery in upcoming code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.236208250@linutronix.de
2023-10-18x86/boot/32: Restructure mk_early_pgtbl_32()Thomas Gleixner
Prepare it for adding a temporary initrd mapping by splitting out the actual map loop. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.175910753@linutronix.de
2023-10-18x86/boot/32: De-uglify the 2/3 level paging difference in mk_early_pgtbl_32()Thomas Gleixner
Move the ifdeffery out of the function and use proper typedefs to make it work for both 2 and 3 level paging. No functional change. [ bp: Move mk_early_pgtbl_32() declaration into a header. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.111059491@linutronix.de
2023-10-18x86/boot: Rename conflicting 'boot_params' pointer to 'boot_params_ptr'Ard Biesheuvel
The x86 decompressor is built and linked as a separate executable, but it shares components with the kernel proper, which are either #include'd as C files, or linked into the decompresor as a static library (e.g, the EFI stub) Both the kernel itself and the decompressor define a global symbol 'boot_params' to refer to the boot_params struct, but in the former case, it refers to the struct directly, whereas in the decompressor, it refers to a global pointer variable referring to the struct boot_params passed by the bootloader or constructed from scratch. This ambiguity is unfortunate, and makes it impossible to assign this decompressor variable from the x86 EFI stub, given that declaring it as extern results in a clash. So rename the decompressor version (whose scope is limited) to boot_params_ptr. [ mingo: Renamed 'boot_params_p' to 'boot_params_ptr' for clarity ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org
2023-10-18x86/boot: Use __pa_nodebug() in mk_early_pgtbl_32()Thomas Gleixner
Use the existing macro instead of undefining and redefining __pa(). No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.051625827@linutronix.de
2023-10-18x86/boot/32: Disable stackprotector and tracing for mk_early_pgtbl_32()Thomas Gleixner
Stackprotector cannot work before paging is enabled. The read from the per CPU variable __stack_chk_guard is always accessing the virtual address either directly on UP or via FS on SMP. In physical address mode this results in an access to memory above 3GB. So this works by chance as the hardware returns the same value when there is no RAM at this physical address. When there is RAM populated above 3G then the read is by chance the same as nothing changes that memory during the very early boot stage. Stop relying on pure luck and disable the stack protector for the only C function which is called during early boot before paging is enabled. Remove function tracing from the whole source file as there is no way to trace this at all, but in case of CONFIG_DYNAMIC_FTRACE=n mk_early_pgtbl_32() would access global function tracer variables in physical address mode which again might work by chance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.156063939@linutronix.de
2023-10-18UML: remove unused cmd_vdso_installMasahiro Yamada
You cannot run this code because arch/um/Makefile does not define the vdso_install target. It appears that this code was blindly copied from another architecture. Remove the dead code. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Richard Weinberger <richard@nod.at>
2023-10-17KVM: x86: hyper-v: Don't auto-enable stimer on write from user-spaceNicolas Saenz Julienne
Don't apply the stimer's counter side effects when modifying its value from user-space, as this may trigger spurious interrupts. For example: - The stimer is configured in auto-enable mode. - The stimer's count is set and the timer enabled. - The stimer expires, an interrupt is injected. - The VM is live migrated. - The stimer config and count are deserialized, auto-enable is ON, the stimer is re-enabled. - The stimer expires right away, and injects an unwarranted interrupt. Cc: stable@vger.kernel.org Fixes: 1f4b34f825e8 ("kvm/x86: Hyper-V SynIC timers") Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20231017155101.40677-1-nsaenz@amazon.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-17KVM: x86: Update the variable naming in kvm_x86_ops.sched_in()Mingwei Zhang
Update the variable with name 'kvm' in kvm_x86_ops.sched_in() to 'vcpu' to avoid confusions. Variable naming in KVM has a clear convention that 'kvm' refers to pointer of type 'struct kvm *', while 'vcpu' refers to pointer of type 'struct kvm_vcpu *'. Fix this 9-year old naming issue for fun. Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20231017232610.4008690-1-mizhang@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-17x86/microcode/amd: Fix snprintf() format string warning in W=1 buildPaolo Bonzini
Building with GCC 11.x results in the following warning: arch/x86/kernel/cpu/microcode/amd.c: In function ‘find_blobs_in_containers’: arch/x86/kernel/cpu/microcode/amd.c:504:58: error: ‘h.bin’ directive output may be truncated writing 5 bytes into a region of size between 1 and 7 [-Werror=format-truncation=] arch/x86/kernel/cpu/microcode/amd.c:503:17: note: ‘snprintf’ output between 35 and 41 bytes into a destination of size 36 The issue is that GCC does not know that the family can only be a byte (it ultimately comes from CPUID). Suggest the right size to the compiler by marking the argument as char-size ("hh"). While at it, instead of using the slightly more obscure precision specifier use the width with zero padding (over 23000 occurrences in kernel sources, vs 500 for the idiom using the precision). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Closes: https://lore.kernel.org/oe-kbuild-all/202308252255.2HPJ6x5Q-lkp@intel.com/ Link: https://lore.kernel.org/r/20231016224858.2829248-1-pbonzini@redhat.com
2023-10-17KVM: x86/mmu: Stop kicking vCPUs to sync the dirty log when PML is disabledDavid Matlack
Stop kicking vCPUs in kvm_arch_sync_dirty_log() when PML is disabled. Kicking vCPUs when PML is disabled serves no purpose and could negatively impact guest performance. This restores KVM's behavior to prior to 5.12 commit a018eba53870 ("KVM: x86: Move MMU's PML logic to common code"), which replaced a static_call_cond(kvm_x86_flush_log_dirty) with unconditional calls to kvm_vcpu_kick(). Fixes: a018eba53870 ("KVM: x86: Move MMU's PML logic to common code") Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20231016221228.1348318-1-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-17KVM: x86: Use octal for file permissionPeng Hao
Convert all module params to octal permissions to improve code readability and to make checkpatch happy: WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20231013113020.77523-1-flyingpeng@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-17x86/head/64: Move the __head definition to <asm/init.h>Hou Wenlong
Move the __head section definition to a header to widen its use. An upcoming patch will mark the code as __head in mem_encrypt_identity.c too. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/0583f57977be184689c373fe540cbd7d85ca2047.1697525407.git.houwenlong.hwl@antgroup.com
2023-10-17x86/resctrl: Display RMID of resource groupBabu Moger
In x86, hardware uses RMID to identify a monitoring group. When a user creates a monitor group these details are not visible. These details can help resctrl debugging. Add RMID(mon_hw_id) to the monitor groups display in the resctrl interface. Users can see these details when resctrl is mounted with "-o debug" option. Add RFTYPE_MON_BASE that complements existing RFTYPE_CTRL_BASE and represents files belonging to monitoring groups. Other architectures do not use "RMID". Use the name mon_hw_id to refer to "RMID" in an effort to keep the naming generic. For example: $cat /sys/fs/resctrl/mon_groups/mon_grp1/mon_hw_id 3 Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-10-babu.moger@amd.com
2023-10-17x86/resctrl: Add support for the files of MON groups onlyBabu Moger
Files unique to monitoring groups have the RFTYPE_MON flag. When a new monitoring group is created the resctrl files with flags RFTYPE_BASE (files common to all resource groups) and RFTYPE_MON (files unique to monitoring groups) are created to support interacting with the new monitoring group. A resource group can support both monitoring and control, also termed a CTRL_MON resource group. CTRL_MON groups should get both monitoring and control resctrl files but that is not the case. Only the RFTYPE_BASE and RFTYPE_CTRL files are created for CTRL_MON groups. Ensure that files with the RFTYPE_MON flag are created for CTRL_MON groups. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-9-babu.moger@amd.com
2023-10-17x86/resctrl: Display CLOSID for resource groupBabu Moger
In x86, hardware uses CLOSID to identify a control group. When a user creates a control group this information is not visible to the user. It can help resctrl debugging. Add CLOSID(ctrl_hw_id) to the control groups display in the resctrl interface. Users can see this detail when resctrl is mounted with the "-o debug" option. Other architectures do not use "CLOSID". Use the names ctrl_hw_id to refer to "CLOSID" in an effort to keep the naming generic. For example: $cat /sys/fs/resctrl/ctrl_grp1/ctrl_hw_id 1 Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-8-babu.moger@amd.com
2023-10-17x86/resctrl: Introduce "-o debug" mount optionBabu Moger
Add "-o debug" option to mount resctrl filesystem in debug mode. When in debug mode resctrl displays files that have the new RFTYPE_DEBUG flag to help resctrl debugging. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-7-babu.moger@amd.com
2023-10-17x86/resctrl: Move default group file creation to mountBabu Moger
The default resource group and its files are created during kernel init time. Upcoming changes will make some resctrl files optional based on a mount parameter. If optional files are to be added to the default group based on the mount option, then each new file needs to be created separately and call kernfs_activate() again. Create all files of the default resource group during resctrl mount, destroyed during unmount, to avoid scattering resctrl file addition across two separate code flows. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-6-babu.moger@amd.com
2023-10-17x86/resctrl: Unwind properly from rdt_enable_ctx()Babu Moger
rdt_enable_ctx() enables the features provided during resctrl mount. Additions to rdt_enable_ctx() are required to also modify error paths of rdt_enable_ctx() callers to ensure correct unwinding if errors are encountered after calling rdt_enable_ctx(). This is error prone. Introduce rdt_disable_ctx() to refactor the error unwinding of rdt_enable_ctx() to simplify future additions. This also simplifies cleanup in rdt_kill_sb(). Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-5-babu.moger@amd.com
2023-10-17x86/resctrl: Rename rftype flags for consistencyBabu Moger
resctrl associates rftype flags with its files so that files can be chosen based on the resource, whether it is info or base, and if it is control or monitor type file. These flags use the RF_ as well as RFTYPE_ prefixes. Change the prefix to RFTYPE_ for all these flags to be consistent. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-4-babu.moger@amd.com
2023-10-17x86/resctrl: Simplify rftype flag definitionsBabu Moger
The rftype flags are bitmaps used for adding files under the resctrl filesystem. Some of these bitmap defines have one extra level of indirection which is not necessary. Drop the RF_* defines and simplify the macros. [ bp: Massage commit message. ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-3-babu.moger@amd.com
2023-10-17x86/resctrl: Add multiple tasks to the resctrl group at onceBabu Moger
The resctrl task assignment for monitor or control group needs to be done one at a time. For example: $mount -t resctrl resctrl /sys/fs/resctrl/ $mkdir /sys/fs/resctrl/ctrl_grp1 $echo 123 > /sys/fs/resctrl/ctrl_grp1/tasks $echo 456 > /sys/fs/resctrl/ctrl_grp1/tasks $echo 789 > /sys/fs/resctrl/ctrl_grp1/tasks This is not user-friendly when dealing with hundreds of tasks. Support multiple task assignment in one command with tasks ids separated by commas. For example: $echo 123,456,789 > /sys/fs/resctrl/ctrl_grp1/tasks Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-2-babu.moger@amd.com
2023-10-17x86/sev: Check for user-space IOIO pointing to kernel spaceJoerg Roedel
Check the memory operand of INS/OUTS before emulating the instruction. The #VC exception can get raised from user-space, but the memory operand can be manipulated to access kernel memory before the emulation actually begins and after the exception handler has run. [ bp: Massage commit message. ] Fixes: 597cfe48212a ("x86/boot/compressed/64: Setup a GHCB-based VC Exception handler") Reported-by: Tom Dohrmann <erbse.13@gmx.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org>
2023-10-17vgacon: remove screen_info dependencyArnd Bergmann
The vga console driver is fairly self-contained, and only used by architectures that explicitly initialize the screen_info settings. Chance every instance that picks the vga console by setting conswitchp to call a function instead, and pass a reference to the screen_info there. Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Khalid Azzi <khalid@gonehiking.org> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20231009211845.3136536-6-arnd@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix the handling of the phycal timer offset when FEAT_ECV and CNTPOFF_EL2 are implemented - Restore the functionnality of Permission Indirection that was broken by the Fine Grained Trapping rework - Cleanup some PMU event sharing code MIPS: - Fix W=1 build s390: - One small fix for gisa to avoid stalls x86: - Truncate writes to PMU counters to the counter's width to avoid spurious overflows when emulating counter events in software - Set the LVTPC entry mask bit when handling a PMI (to match Intel-defined architectural behavior) - Treat KVM_REQ_PMI as a wake event instead of queueing host IRQ work to kick the guest out of emulated halt - Fix for loading XSAVE state from an old kernel into a new one - Fixes for AMD AVIC selftests: - Play nice with %llx when formatting guest printf and assert statements - Clean up stale test metadata - Zero-initialize structures in memslot perf test to workaround a suspected 'may be used uninitialized' false positives from GCC" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: arm64: timers: Correctly handle TGE flip with CNTPOFF_EL2 KVM: arm64: POR{E0}_EL1 do not need trap handlers KVM: arm64: Add nPIR{E0}_EL1 to HFG traps KVM: MIPS: fix -Wunused-but-set-variable warning KVM: arm64: pmu: Drop redundant check for non-NULL kvm_pmu_events KVM: SVM: Fix build error when using -Werror=unused-but-set-variable x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested() x86: KVM: SVM: add support for Invalid IPI Vector interception x86: KVM: SVM: always update the x2avic msr interception KVM: selftests: Force load all supported XSAVE state in state test KVM: selftests: Load XSAVE state into untouched vCPU during state test KVM: selftests: Touch relevant XSAVE state in guest for state test KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2} x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer KVM: selftests: Zero-initialize entire test_result in memslot perf test KVM: selftests: Remove obsolete and incorrect test case metadata KVM: selftests: Treat %llx like %lx when formatting guest printf KVM: x86/pmu: Synthesize at most one PMI per VM-exit KVM: x86: Mask LVTPC when handling a PMI KVM: x86/pmu: Truncate counter value to allowed width on write ...
2023-10-16x86/mce: Cleanup mce_usable_address()Yazen Ghannam
Move Intel-specific checks into a helper function. Explicitly use "bool" for return type. No functional change intended. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230613141142.36801-4-yazen.ghannam@amd.com
2023-10-16x86/mce: Define amd_mce_usable_address()Yazen Ghannam
Currently, all valid MCA_ADDR values are assumed to be usable on AMD systems. However, this is not correct in most cases. Notifiers expecting usable addresses may then operate on inappropriate values. Define a helper function to do AMD-specific checks for a usable memory address. List out all known cases. [ bp: Tone down the capitalized words. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230613141142.36801-3-yazen.ghannam@amd.com
2023-10-16x86/MCE/AMD: Split amd_mce_is_memory_error()Yazen Ghannam
Define helper functions for legacy and SMCA systems in order to reuse individual checks in later changes. Describe what each function is checking for, and correct the XEC bitmask for SMCA. No functional change intended. [ bp: Use "else in amd_mce_is_memory_error() to make the conditional balanced, for readability. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Link: https://lore.kernel.org/r/20230613141142.36801-2-yazen.ghannam@amd.com
2023-10-16x86/head/64: Add missing __head annotation to startup_64_load_idt()Hou Wenlong
This function is currently only used in the head code and is only called from startup_64_setup_env(). Although it would be inlined by the compiler, it would be better to mark it as __head too in case it doesn't. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/efcc5b5e18af880e415d884e072bf651c1fa7c34.1689130310.git.houwenlong.hwl@antgroup.com
2023-10-16x86/head/64: Mark 'startup_gdt[]' and 'startup_gdt_descr' as __initdataHou Wenlong
As 'startup_gdt[]' and 'startup_gdt_descr' are only used in booting, mark them as __initdata to allow them to be freed after boot. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/c85903a7cfad37d14a7e5a4df9fc7119a3669fb3.1689130310.git.houwenlong.hwl@antgroup.com
2023-10-16perf/x86/amd/uncore: Pass through error code for initialization failures, ↵Sandipan Das
instead of -ENODEV Pass through the appropriate error code when the registration of hotplug callbacks fail during initialization, instead of returning a blanket -ENODEV. [ mingo: Updated the changelog. ] Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231016060743.332051-1-sandipan.das@amd.com
2023-10-15Revert "x86/smp: Put CPUs into INIT on shutdown if possible"Linus Torvalds
This reverts commit 45e34c8af58f23db4474e2bfe79183efec09a18b, and the two subsequent fixes to it: 3f874c9b2aae ("x86/smp: Don't send INIT to non-present and non-booted CPUs") b1472a60a584 ("x86/smp: Don't send INIT to boot CPU") because it seems to result in hung machines at shutdown. Particularly some Dell machines, but Thomas says "The rest seems to be Lenovo and Sony with Alderlake/Raptorlake CPUs - at least that's what I could figure out from the various bug reports. I don't know which CPUs the DELL machines have, so I can't say it's a pattern. I agree with the revert for now" Ashok Raj chimes in: "There was a report (probably this same one), and it turns out it was a bug in the BIOS SMI handler. The client BIOS's were waiting for the lowest APICID to be the SMI rendevous master. If this is MeteorLake, the BSP wasn't the one with the lowest APIC and it triped here. The BIOS change is also being pushed to others for assimilation :) Server BIOS's had this correctly for a while now" and it does look likely to be some bad interaction between SMI and the non-BSP cores having put into INIT (and thus unresponsive until reset). Link: https://bbs.archlinux.org/viewtopic.php?pid=2124429 Link: https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/ Link: https://forum.artixlinux.org/index.php/topic,5997.0.html Link: https://bugzilla.redhat.com/show_bug.cgi?id=2241279 Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-10-15Merge tag 'smp-urgent-2023-10-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug fix from Ingo Molnar: "Fix a Longsoon build warning by harmonizing the arch_[un]register_cpu() prototypes between architectures" * tag 'smp-urgent-2023-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu-hotplug: Provide prototypes for arch CPU registration
2023-10-15Merge tag 'kvm-x86-pmu-6.6-fixes' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86/pmu fixes for 6.6: - Truncate writes to PMU counters to the counter's width to avoid spurious overflows when emulating counter events in software. - Set the LVTPC entry mask bit when handling a PMI (to match Intel-defined architectural behavior). - Treat KVM_REQ_PMI as a wake event instead of queueing host IRQ work to kick the guest out of emulated halt.
2023-10-14Merge tag 'x86-urgent-2023-10-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix a false-positive KASAN warning, fix an AMD erratum on Zen4 CPUs, and fix kernel-doc build warnings" * tag 'x86-urgent-2023-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives: Disable KASAN in apply_alternatives() x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs x86/resctrl: Fix kernel-doc warnings
2023-10-14Merge tag 'perf-urgent-2023-10-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event fix from Ingo Molnar: "Fix an LBR sampling bug" * tag 'perf-urgent-2023-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/lbr: Filter vsyscall addresses
2023-10-13x86/entry/32: Clean up syscall fast exit testsBrian Gerst
Merge compat and native code and clarify comments. No change in functionality expected. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20231011224351.130935-4-brgerst@gmail.com
2023-10-13x86/entry/64: Use TASK_SIZE_MAX for canonical RIP testBrian Gerst
Using shifts to determine if an address is canonical is difficult for the compiler to optimize when the virtual address width is variable (LA57 feature) without using inline assembly. Instead, compare RIP against TASK_SIZE_MAX. The only user executable address outside of that range is the deprecated vsyscall page, which can fall back to using IRET. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20231011224351.130935-3-brgerst@gmail.com
2023-10-13x86/entry/64: Convert SYSRET validation tests to CBrian Gerst
No change in functionality expected. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20231011224351.130935-2-brgerst@gmail.com