summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2023-10-28Merge tag 'x86-urgent-2023-10-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix a possible CPU hotplug deadlock bug caused by the new TSC synchronization code - Fix a legacy PIC discovery bug that results in device troubles on affected systems, such as non-working keybards, etc - Add a new Intel CPU model number to <asm/intel-family.h> * tag 'x86-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Defer marking TSC unstable to a worker x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility x86/cpu: Add model number for Intel Arrow Lake mobile processor
2023-10-27x86/tsc: Defer marking TSC unstable to a workerThomas Gleixner
Tetsuo reported the following lockdep splat when the TSC synchronization fails during CPU hotplug: tsc: Marking TSC unstable due to check_tsc_sync_source failed WARNING: inconsistent lock state inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. ffffffff8cfa1c78 (watchdog_lock){?.-.}-{2:2}, at: clocksource_watchdog+0x23/0x5a0 {IN-HARDIRQ-W} state was registered at: _raw_spin_lock_irqsave+0x3f/0x60 clocksource_mark_unstable+0x1b/0x90 mark_tsc_unstable+0x41/0x50 check_tsc_sync_source+0x14f/0x180 sysvec_call_function_single+0x69/0x90 Possible unsafe locking scenario: lock(watchdog_lock); <Interrupt> lock(watchdog_lock); stack backtrace: _raw_spin_lock+0x30/0x40 clocksource_watchdog+0x23/0x5a0 run_timer_softirq+0x2a/0x50 sysvec_apic_timer_interrupt+0x6e/0x90 The reason is the recent conversion of the TSC synchronization function during CPU hotplug on the control CPU to a SMP function call. In case that the synchronization with the upcoming CPU fails, the TSC has to be marked unstable via clocksource_mark_unstable(). clocksource_mark_unstable() acquires 'watchdog_lock', but that lock is taken with interrupts enabled in the watchdog timer callback to minimize interrupt disabled time. That's obviously a possible deadlock scenario, Before that change the synchronization function was invoked in thread context so this could not happen. As it is not crucical whether the unstable marking happens slightly delayed, defer the call to a worker thread which avoids the lock context problem. Fixes: 9d349d47f0e3 ("x86/smpboot: Make TSC synchronization function call based") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87zg064ceg.ffs@tglx
2023-10-27x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibilityThomas Gleixner
David and a few others reported that on certain newer systems some legacy interrupts fail to work correctly. Debugging revealed that the BIOS of these systems leaves the legacy PIC in uninitialized state which makes the PIC detection fail and the kernel switches to a dummy implementation. Unfortunately this fallback causes quite some code to fail as it depends on checks for the number of legacy PIC interrupts or the availability of the real PIC. In theory there is no reason to use the PIC on any modern system when IO/APIC is available, but the dependencies on the related checks cannot be resolved trivially and on short notice. This needs lots of analysis and rework. The PIC detection has been added to avoid quirky checks and force selection of the dummy implementation all over the place, especially in VM guest scenarios. So it's not an option to revert the relevant commit as that would break a lot of other scenarios. One solution would be to try to initialize the PIC on detection fail and retry the detection, but that puts the burden on everything which does not have a PIC. Fortunately the ACPI/MADT table header has a flag field, which advertises in bit 0 that the system is PCAT compatible, which means it has a legacy 8259 PIC. Evaluate that bit and if set avoid the detection routine and keep the real PIC installed, which then gets initialized (for nothing) and makes the rest of the code with all the dependencies work again. Fixes: e179f6914152 ("x86, irq, pic: Probe for legacy PIC and set legacy_pic appropriately") Reported-by: David Lazar <dlazar@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: David Lazar <dlazar@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218003 Link: https://lore.kernel.org/r/875y2u5s8g.ffs@tglx
2023-10-19Merge tag 'sev_fixes_for_v6.6' of ↵Linus Torvalds
//git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "Take care of a race between when the #VC exception is raised and when the guest kernel gets to emulate certain instructions in SEV-{ES,SNP} guests by: - disabling emulation of MMIO instructions when coming from user mode - checking the IO permission bitmap before emulating IO instructions and verifying the memory operands of INS/OUTS insns" * tag 'sev_fixes_for_v6.6' of //git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Check for user-space IOIO pointing to kernel space x86/sev: Check IOBM for IOIO exceptions from user-space x86/sev: Disable MMIO emulation from user mode
2023-10-17x86/sev: Check for user-space IOIO pointing to kernel spaceJoerg Roedel
Check the memory operand of INS/OUTS before emulating the instruction. The #VC exception can get raised from user-space, but the memory operand can be manipulated to access kernel memory before the emulation actually begins and after the exception handler has run. [ bp: Massage commit message. ] Fixes: 597cfe48212a ("x86/boot/compressed/64: Setup a GHCB-based VC Exception handler") Reported-by: Tom Dohrmann <erbse.13@gmx.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org>
2023-10-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix the handling of the phycal timer offset when FEAT_ECV and CNTPOFF_EL2 are implemented - Restore the functionnality of Permission Indirection that was broken by the Fine Grained Trapping rework - Cleanup some PMU event sharing code MIPS: - Fix W=1 build s390: - One small fix for gisa to avoid stalls x86: - Truncate writes to PMU counters to the counter's width to avoid spurious overflows when emulating counter events in software - Set the LVTPC entry mask bit when handling a PMI (to match Intel-defined architectural behavior) - Treat KVM_REQ_PMI as a wake event instead of queueing host IRQ work to kick the guest out of emulated halt - Fix for loading XSAVE state from an old kernel into a new one - Fixes for AMD AVIC selftests: - Play nice with %llx when formatting guest printf and assert statements - Clean up stale test metadata - Zero-initialize structures in memslot perf test to workaround a suspected 'may be used uninitialized' false positives from GCC" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: arm64: timers: Correctly handle TGE flip with CNTPOFF_EL2 KVM: arm64: POR{E0}_EL1 do not need trap handlers KVM: arm64: Add nPIR{E0}_EL1 to HFG traps KVM: MIPS: fix -Wunused-but-set-variable warning KVM: arm64: pmu: Drop redundant check for non-NULL kvm_pmu_events KVM: SVM: Fix build error when using -Werror=unused-but-set-variable x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested() x86: KVM: SVM: add support for Invalid IPI Vector interception x86: KVM: SVM: always update the x2avic msr interception KVM: selftests: Force load all supported XSAVE state in state test KVM: selftests: Load XSAVE state into untouched vCPU during state test KVM: selftests: Touch relevant XSAVE state in guest for state test KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2} x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer KVM: selftests: Zero-initialize entire test_result in memslot perf test KVM: selftests: Remove obsolete and incorrect test case metadata KVM: selftests: Treat %llx like %lx when formatting guest printf KVM: x86/pmu: Synthesize at most one PMI per VM-exit KVM: x86: Mask LVTPC when handling a PMI KVM: x86/pmu: Truncate counter value to allowed width on write ...
2023-10-15Revert "x86/smp: Put CPUs into INIT on shutdown if possible"Linus Torvalds
This reverts commit 45e34c8af58f23db4474e2bfe79183efec09a18b, and the two subsequent fixes to it: 3f874c9b2aae ("x86/smp: Don't send INIT to non-present and non-booted CPUs") b1472a60a584 ("x86/smp: Don't send INIT to boot CPU") because it seems to result in hung machines at shutdown. Particularly some Dell machines, but Thomas says "The rest seems to be Lenovo and Sony with Alderlake/Raptorlake CPUs - at least that's what I could figure out from the various bug reports. I don't know which CPUs the DELL machines have, so I can't say it's a pattern. I agree with the revert for now" Ashok Raj chimes in: "There was a report (probably this same one), and it turns out it was a bug in the BIOS SMI handler. The client BIOS's were waiting for the lowest APICID to be the SMI rendevous master. If this is MeteorLake, the BSP wasn't the one with the lowest APIC and it triped here. The BIOS change is also being pushed to others for assimilation :) Server BIOS's had this correctly for a while now" and it does look likely to be some bad interaction between SMI and the non-BSP cores having put into INIT (and thus unresponsive until reset). Link: https://bbs.archlinux.org/viewtopic.php?pid=2124429 Link: https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/ Link: https://forum.artixlinux.org/index.php/topic,5997.0.html Link: https://bugzilla.redhat.com/show_bug.cgi?id=2241279 Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-10-15Merge tag 'smp-urgent-2023-10-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug fix from Ingo Molnar: "Fix a Longsoon build warning by harmonizing the arch_[un]register_cpu() prototypes between architectures" * tag 'smp-urgent-2023-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu-hotplug: Provide prototypes for arch CPU registration
2023-10-12x86/alternatives: Disable KASAN in apply_alternatives()Kirill A. Shutemov
Fei has reported that KASAN triggers during apply_alternatives() on a 5-level paging machine: BUG: KASAN: out-of-bounds in rcu_is_watching() Read of size 4 at addr ff110003ee6419a0 by task swapper/0/0 ... __asan_load4() rcu_is_watching() trace_hardirqs_on() text_poke_early() apply_alternatives() ... On machines with 5-level paging, cpu_feature_enabled(X86_FEATURE_LA57) gets patched. It includes KASAN code, where KASAN_SHADOW_START depends on __VIRTUAL_MASK_SHIFT, which is defined with cpu_feature_enabled(). KASAN gets confused when apply_alternatives() patches the KASAN_SHADOW_START users. A test patch that makes KASAN_SHADOW_START static, by replacing __VIRTUAL_MASK_SHIFT with 56, works around the issue. Fix it for real by disabling KASAN while the kernel is patching alternatives. [ mingo: updated the changelog ] Fixes: 6657fca06e3f ("x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y") Reported-by: Fei Yang <fei.yang@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231012100424.1456-1-kirill.shutemov@linux.intel.com
2023-10-12KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}Sean Christopherson
Mask off xfeatures that aren't exposed to the guest only when saving guest state via KVM_GET_XSAVE{2} instead of modifying user_xfeatures directly. Preserving the maximal set of xfeatures in user_xfeatures restores KVM's ABI for KVM_SET_XSAVE, which prior to commit ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0") allowed userspace to load xfeatures that are supported by the host, irrespective of what xfeatures are exposed to the guest. There is no known use case where userspace *intentionally* loads xfeatures that aren't exposed to the guest, but the bug fixed by commit ad856280ddea was specifically that KVM_GET_SAVE{2} would save xfeatures that weren't exposed to the guest, e.g. would lead to userspace unintentionally loading guest-unsupported xfeatures when live migrating a VM. Restricting KVM_SET_XSAVE to guest-supported xfeatures is especially problematic for QEMU-based setups, as QEMU has a bug where instead of terminating the VM if KVM_SET_XSAVE fails, QEMU instead simply stops loading guest state, i.e. resumes the guest after live migration with incomplete guest state, and ultimately results in guest data corruption. Note, letting userspace restore all host-supported xfeatures does not fix setups where a VM is migrated from a host *without* commit ad856280ddea, to a target with a subset of host-supported xfeatures. However there is no way to safely address that scenario, e.g. KVM could silently drop the unsupported features, but that would be a clear violation of KVM's ABI and so would require userspace to opt-in, at which point userspace could simply be updated to sanitize the to-be-loaded XSAVE state. Reported-by: Tyler Stachecki <stachecki.tyler@gmail.com> Closes: https://lore.kernel.org/all/20230914010003.358162-1-tstachecki@bloomberg.net Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0") Cc: stable@vger.kernel.org Cc: Leonardo Bras <leobras@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Message-Id: <20230928001956.924301-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12x86/fpu: Allow caller to constrain xfeatures when copying to uabi bufferSean Christopherson
Plumb an xfeatures mask into __copy_xstate_to_uabi_buf() so that KVM can constrain which xfeatures are saved into the userspace buffer without having to modify the user_xfeatures field in KVM's guest_fpu state. KVM's ABI for KVM_GET_XSAVE{2} is that features that are not exposed to guest must not show up in the effective xstate_bv field of the buffer. Saving only the guest-supported xfeatures allows userspace to load the saved state on a different host with a fewer xfeatures, so long as the target host supports the xfeatures that are exposed to the guest. KVM currently sets user_xfeatures directly to restrict KVM_GET_XSAVE{2} to the set of guest-supported xfeatures, but doing so broke KVM's historical ABI for KVM_SET_XSAVE, which allows userspace to load any xfeatures that are supported by the *host*. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230928001956.924301-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-11cpu-hotplug: Provide prototypes for arch CPU registrationRussell King (Oracle)
Provide common prototypes for arch_register_cpu() and arch_unregister_cpu(). These are called by acpi_processor.c, with weak versions, so the prototype for this is already set. It is generally not necessary for function prototypes to be conditional on preprocessor macros. Some architectures (e.g. Loongarch) are missing the prototype for this, and rather than add it to Loongarch's asm/cpu.h, do the job once for everyone. Since this covers everyone, remove the now unnecessary prototypes in asm/cpu.h, and therefore remove the 'static' from one of ia64's arch_register_cpu() definitions. [ tglx: Bring back the ia64 part and remove the ACPI prototypes ] Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/E1qkoRr-0088Q8-Da@rmk-PC.armlinux.org.uk
2023-10-11x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUsBorislav Petkov (AMD)
Fix erratum #1485 on Zen4 parts where running with STIBP disabled can cause an #UD exception. The performance impact of the fix is negligible. Reported-by: René Rebe <rene@exactcode.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: René Rebe <rene@exactcode.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/D99589F4-BC5D-430B-87B2-72C20370CF57@exactcode.com
2023-10-09x86/sev: Check IOBM for IOIO exceptions from user-spaceJoerg Roedel
Check the IO permission bitmap (if present) before emulating IOIO #VC exceptions for user-space. These permissions are checked by hardware already before the #VC is raised, but due to the VC-handler decoding race it needs to be checked again in software. Fixes: 25189d08e516 ("x86/sev-es: Add support for handling IOIO exceptions") Reported-by: Tom Dohrmann <erbse.13@gmx.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Dohrmann <erbse.13@gmx.de> Cc: <stable@kernel.org>
2023-10-09x86/sev: Disable MMIO emulation from user modeBorislav Petkov (AMD)
A virt scenario can be constructed where MMIO memory can be user memory. When that happens, a race condition opens between when the hardware raises the #VC and when the #VC handler gets to emulate the instruction. If the MOVS is replaced with a MOVS accessing kernel memory in that small race window, then write to kernel memory happens as the access checks are not done at emulation time. Disable MMIO emulation in user mode temporarily until a sensible use case appears and justifies properly handling the race window. Fixes: 0118b604c2c9 ("x86/sev-es: Handle MMIO String Instructions") Reported-by: Tom Dohrmann <erbse.13@gmx.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Dohrmann <erbse.13@gmx.de> Cc: <stable@kernel.org>
2023-10-08x86/resctrl: Fix kernel-doc warningsRandy Dunlap
The kernel test robot reported kernel-doc warnings here: monitor.c:34: warning: Cannot understand * @rmid_free_lru A least recently used list of free RMIDs on line 34 - I thought it was a doc line monitor.c:41: warning: Cannot understand * @rmid_limbo_count count of currently unused but (potentially) on line 41 - I thought it was a doc line monitor.c:50: warning: Cannot understand * @rmid_entry - The entry in the limbo and free lists. on line 50 - I thought it was a doc line We don't have a syntax for documenting individual data items via kernel-doc, so remove the "/**" kernel-doc markers and add a hyphen for consistency. Fixes: 6a445edce657 ("x86/intel_rdt/cqm: Add RDT monitoring initialization") Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231006235132.16227-1-rdunlap@infradead.org
2023-10-02x86/sev: Change npages to unsigned long in snp_accept_memory()Tom Lendacky
In snp_accept_memory(), the npages variables value is calculated from phys_addr_t variables but is an unsigned int. A very large range passed into snp_accept_memory() could lead to truncating npages to zero. This doesn't happen at the moment but let's be prepared. Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/6d511c25576494f682063c9fb6c705b526a3757e.1687441505.git.thomas.lendacky@amd.com
2023-10-02x86/sev: Use the GHCB protocol when available for SNP CPUID requestsTom Lendacky
SNP retrieves the majority of CPUID information from the SNP CPUID page. But there are times when that information needs to be supplemented by the hypervisor, for example, obtaining the initial APIC ID of the vCPU from leaf 1. The current implementation uses the MSR protocol to retrieve the data from the hypervisor, even when a GHCB exists. The problem arises when an NMI arrives on return from the VMGEXIT. The NMI will be immediately serviced and may generate a #VC requiring communication with the hypervisor. Since a GHCB exists in this case, it will be used. As part of using the GHCB, the #VC handler will write the GHCB physical address into the GHCB MSR and the #VC will be handled. When the NMI completes, processing resumes at the site of the VMGEXIT which is expecting to read the GHCB MSR and find a CPUID MSR protocol response. Since the NMI handling overwrote the GHCB MSR response, the guest will see an invalid reply from the hypervisor and self-terminate. Fix this problem by using the GHCB when it is available. Any NMI received is properly handled because the GHCB contents are copied into a backup page and restored on NMI exit, thus preserving the active GHCB request or result. [ bp: Touchups. ] Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/a5856fa1ebe3879de91a8f6298b6bbd901c61881.1690578565.git.thomas.lendacky@amd.com
2023-09-28x86/sgx: Resolves SECS reclaim vs. page fault for EAUG raceHaitao Huang
The SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC page for an enclave and set secs.epc_page to NULL. The SECS page is used for EAUG and ELDU in the SGX page fault handler. However, the NULL check for secs.epc_page is only done for ELDU, not EAUG before being used. Fix this by doing the same NULL check and reloading of the SECS page as needed for both EAUG and ELDU. The SECS page holds global enclave metadata. It can only be reclaimed when there are no other enclave pages remaining. At that point, virtually nothing can be done with the enclave until the SECS page is paged back in. An enclave can not run nor generate page faults without a resident SECS page. But it is still possible for a #PF for a non-SECS page to race with paging out the SECS page: when the last resident non-SECS page A triggers a #PF in a non-resident page B, and then page A and the SECS both are paged out before the #PF on B is handled. Hitting this bug requires that race triggered with a #PF for EAUG. Following is a trace when it happens. BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:sgx_encl_eaug_page+0xc7/0x210 Call Trace: ? __kmem_cache_alloc_node+0x16a/0x440 ? xa_load+0x6e/0xa0 sgx_vma_fault+0x119/0x230 __do_fault+0x36/0x140 do_fault+0x12f/0x400 __handle_mm_fault+0x728/0x1110 handle_mm_fault+0x105/0x310 do_user_addr_fault+0x1ee/0x750 ? __this_cpu_preempt_check+0x13/0x20 exc_page_fault+0x76/0x180 asm_exc_page_fault+0x27/0x30 Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave") Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20230728051024.33063-1-haitao.huang%40linux.intel.com
2023-09-28x86/srso: Add SRSO mitigation for Hygon processorsPu Wen
Add mitigation for the speculative return stack overflow vulnerability which exists on Hygon processors too. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/tencent_4A14812842F104E93AA722EC939483CEFF05@qq.com
2023-09-24x86/kgdb: Fix a kerneldoc warning when build with W=1Christophe JAILLET
When compiled with W=1, the following warning is generated: arch/x86/kernel/kgdb.c:698: warning: Cannot understand * on line 698 - I thought it was a doc line Remove the corresponding empty comment line to fix the warning. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/aad659537c1d4ebd86912a6f0be458676c8e69af.1695401178.git.christophe.jaillet@wanadoo.fr
2023-09-22Merge tag 'x86_urgent_for_v6.6-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 rethunk fixes from Borislav Petkov: "Fix the patching ordering between static calls and return thunks" * tag 'x86_urgent_for_v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86,static_call: Fix static-call vs return-thunk x86/alternatives: Remove faulty optimization
2023-09-22Merge tag 'x86-urgent-2023-09-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix a kexec bug - Fix an UML build bug - Fix a handful of SRSO related bugs - Fix a shadow stacks handling bug & robustify related code * tag 'x86-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/shstk: Add warning for shadow stack double unmap x86/shstk: Remove useless clone error handling x86/shstk: Handle vfork clone failure correctly x86/srso: Fix SBPB enablement for spec_rstack_overflow=off x86/srso: Don't probe microcode in a guest x86/srso: Set CPUID feature bits independently of bug or mitigation status x86/srso: Fix srso_show_state() side effect x86/asm: Fix build of UML with KASAN x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer()
2023-09-22x86,static_call: Fix static-call vs return-thunkPeter Zijlstra
Commit 7825451fa4dc ("static_call: Add call depth tracking support") failed to realize the problem fixed there is not specific to call depth tracking but applies to all return-thunk uses. Move the fix to the appropriate place and condition. Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding") Reported-by: David Kaplan <David.Kaplan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org>
2023-09-22x86/alternatives: Remove faulty optimizationJosh Poimboeuf
The following commit 095b8303f383 ("x86/alternative: Make custom return thunk unconditional") made '__x86_return_thunk' a placeholder value. All code setting X86_FEATURE_RETHUNK also changes the value of 'x86_return_thunk'. So the optimization at the beginning of apply_returns() is dead code. Also, before the above-mentioned commit, the optimization actually had a bug It bypassed __static_call_fixup(), causing some raw returns to remain unpatched in static call trampolines. Thus the 'Fixes' tag. Fixes: d2408e043e72 ("x86/alternative: Optimize returns patching") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/16d19d2249d4485d8380fb215ffaae81e6b8119e.1693889988.git.jpoimboe@kernel.org
2023-09-19x86/shstk: Add warning for shadow stack double unmapRick Edgecombe
There are several ways a thread's shadow stacks can get unmapped. This can happen on exit or exec, as well as error handling in exec or clone. The task struct already keeps track of the thread's shadow stack. Use the size variable to keep track of if the shadow stack has already been freed. When an attempt to double unmap the thread shadow stack is caught, warn about it and abort the operation. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: H.J. Lu <hjl.tools@gmail.com> Link: https://lore.kernel.org/all/20230908203655.543765-4-rick.p.edgecombe%40intel.com
2023-09-19x86/shstk: Remove useless clone error handlingRick Edgecombe
When clone fails after the shadow stack is allocated, any allocated shadow stack is cleaned up in exit_thread() in copy_process(). So the logic in copy_thread() is unneeded, and also will not handle failures that happen outside of copy_thread(). In addition, since there is a second attempt to unmap the same shadow stack, there is a race where an newly mapped region could get unmapped. So remove the logic in copy_thread() and rely on exit_thread() to handle clone failure. Fixes: b2926a36b97a ("x86/shstk: Handle thread shadow stack") Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: H.J. Lu <hjl.tools@gmail.com> Link: https://lore.kernel.org/all/20230908203655.543765-3-rick.p.edgecombe%40intel.com
2023-09-19x86/shstk: Handle vfork clone failure correctlyRick Edgecombe
Shadow stacks are allocated automatically and freed on exit, depending on the clone flags. The two cases where new shadow stacks are not allocated are !CLONE_VM (fork()) and CLONE_VFORK (vfork()). For !CLONE_VM, although a new stack is not allocated, it can be freed normally because it will happen in the child's copy of the VM. However, for CLONE_VFORK the parent and the child are actually using the same shadow stack. So the kernel doesn't need to allocate *or* free a shadow stack for a CLONE_VFORK child. CLONE_VFORK children already need special tracking to avoid returning to userspace until the child exits or execs. Shadow stack uses this same tracking to avoid freeing CLONE_VFORK shadow stacks. However, the tracking is not setup until the clone has succeeded (internally). Which means, if a CLONE_VFORK fails, the existing logic will not know it is a CLONE_VFORK and proceed to unmap the parents shadow stack. This error handling cleanup logic runs via exit_thread() in the bad_fork_cleanup_thread label in copy_process(). The issue was seen in the glibc test "posix/tst-spawn3-pidfd" while running with shadow stack using currently out-of-tree glibc patches. Fix it by not unmapping the vfork shadow stack in the error case as well. Since clone is implemented in core code, it is not ideal to pass the clone flags along the error path in order to have shadow stack code have symmetric logic in the freeing half of the thread shadow stack handling. Instead use the existing state for thread shadow stacks to track whether the thread is managing its own shadow stack. For CLONE_VFORK, simply set shstk->base and shstk->size to 0, and have it mean the thread is not managing a shadow stack and so should skip cleanup work. Implement this by breaking up the CLONE_VFORK and !CLONE_VM cases in shstk_alloc_thread_stack() to separate conditionals since, the logic is now different between them. In the case of CLONE_VFORK && !CLONE_VM, the existing behavior is to not clean up the shadow stack in the child (which should go away quickly with either be exit or exec), so maintain that behavior by handling the CLONE_VFORK case first in the allocation path. This new logioc cleanly handles the case of normal, successful CLONE_VFORK's skipping cleaning up their shadow stack's on exit as well. So remove the existing, vfork shadow stack freeing logic. This is in deactivate_mm() where vfork_done is used to tell if it is a vfork child that can skip cleaning up the thread shadow stack. Fixes: b2926a36b97a ("x86/shstk: Handle thread shadow stack") Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: H.J. Lu <hjl.tools@gmail.com> Link: https://lore.kernel.org/all/20230908203655.543765-2-rick.p.edgecombe%40intel.com
2023-09-19x86/srso: Fix SBPB enablement for spec_rstack_overflow=offJosh Poimboeuf
If the user has requested no SRSO mitigation, other mitigations can use the lighter-weight SBPB instead of IBPB. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/b20820c3cfd1003171135ec8d762a0b957348497.1693889988.git.jpoimboe@kernel.org
2023-09-19x86/srso: Don't probe microcode in a guestJosh Poimboeuf
To support live migration, the hypervisor sets the "lowest common denominator" of features. Probing the microcode isn't allowed because any detected features might go away after a migration. As Andy Cooper states: "Linux must not probe microcode when virtualised.  What it may see instantaneously on boot (owing to MSR_PRED_CMD being fully passed through) is not accurate for the lifetime of the VM." Rely on the hypervisor to set the needed IBPB_BRTYPE and SBPB bits. Fixes: 1b5277c0ea0b ("x86/srso: Add SRSO_NO support") Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/3938a7209606c045a3f50305d201d840e8c834c7.1693889988.git.jpoimboe@kernel.org
2023-09-19x86/srso: Set CPUID feature bits independently of bug or mitigation statusJosh Poimboeuf
Booting with mitigations=off incorrectly prevents the X86_FEATURE_{IBPB_BRTYPE,SBPB} CPUID bits from getting set. Also, future CPUs without X86_BUG_SRSO might still have IBPB with branch type prediction flushing, in which case SBPB should be used instead of IBPB. The current code doesn't allow for that. Also, cpu_has_ibpb_brtype_microcode() has some surprising side effects and the setting of these feature bits really doesn't belong in the mitigation code anyway. Move it to earlier. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/869a1709abfe13b673bdd10c2f4332ca253a40bc.1693889988.git.jpoimboe@kernel.org
2023-09-19x86/srso: Fix srso_show_state() side effectJosh Poimboeuf
Reading the 'spec_rstack_overflow' sysfs file can trigger an unnecessary MSR write, and possibly even a (handled) exception if the microcode hasn't been updated. Avoid all that by just checking X86_FEATURE_IBPB_BRTYPE instead, which gets set by srso_select_mitigation() if the updated microcode exists. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/27d128899cb8aee9eb2b57ddc996742b0c1d776b.1693889988.git.jpoimboe@kernel.org
2023-09-19x86/xen: move paravirt lazy codeJuergen Gross
Only Xen is using the paravirt lazy mode code, so it can be moved to Xen specific sources. This allows to make some of the functions static or to merge them into their only call sites. While at it do a rename from "paravirt" to "xen" for all moved specifiers. No functional change. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20230913113828.18421-3-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-09-18x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer()Rik van Riel
The code calling ima_free_kexec_buffer() runs long after the memblock allocator has already been torn down, potentially resulting in a use after free in memblock_isolate_range(). With KASAN or KFENCE, this use after free will result in a BUG from the idle task, and a subsequent kernel panic. Switch ima_free_kexec_buffer() over to memblock_free_late() to avoid that bug. Fixes: fee3ff99bc67 ("powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c") Suggested-by: Mike Rappoport <rppt@kernel.org> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230817135558.67274c83@imladris.surriel.com
2023-09-17Merge tag 'x86-urgent-2023-09-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - Fix an UV boot crash - Skip spurious ENDBR generation on _THIS_IP_ - Fix ENDBR use in putuser() asm methods - Fix corner case boot crashes on 5-level paging - and fix a false positive WARNING on LTO kernels" * tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/purgatory: Remove LTO flags x86/boot/compressed: Reserve more memory for page tables x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*() x86/ibt: Suppress spurious ENDBR x86/platform/uv: Use alternate source for socket to node data
2023-09-17Merge tag 'sched-urgent-2023-09-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Fix a performance regression on large SMT systems, an Intel SMT4 balancing bug, and a topology setup bug on (Intel) hybrid processors" * tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain sched/fair: Fix SMT4 group_smt_balance handling sched/fair: Optimize should_we_balance() for large SMT systems
2023-09-13x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domainRicardo Neri
Commit 8f2d6c41e5a6 ("x86/sched: Rewrite topology setup") dropped the SD_ASYM_PACKING flag in the DIE domain added in commit 044f0e27dec6 ("x86/sched: Add the SD_ASYM_PACKING flag to the die domain of hybrid processors"). Restore it on hybrid processors. The die-level domain does not depend on any build configuration and now x86_sched_itmt_flags() is always needed. Remove the build dependency on CONFIG_SCHED_[SMT|CLUSTER|MC]. Fixes: 8f2d6c41e5a6 ("x86/sched: Rewrite topology setup") Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Caleb Callaway <caleb.callaway@intel.com> Link: https://lkml.kernel.org/r/20230815035747.11529-1-ricardo.neri-calderon@linux.intel.com
2023-09-11x86/platform/uv: Use alternate source for socket to node dataSteve Wahl
The UV code attempts to build a set of tables to allow it to do bidirectional socket<=>node lookups. But when nr_cpus is set to a smaller number than actually present, the cpu_to_node() mapping information for unused CPUs is not available to build_socket_tables(). This results in skipping some nodes or sockets when creating the tables and leaving some -1's for later code to trip. over, causing oopses. The problem is that the socket<=>node lookups are created by doing a loop over all CPUs, then looking up the CPU's APICID and socket. But if a CPU is not present, there is no way to start this lookup. Instead of looping over all CPUs, take CPUs out of the equation entirely. Loop over all APICIDs which are mapped to a valid NUMA node. Then just extract the socket-id from the APICID. This avoid tripping over disabled CPUs. Fixes: 8a50c5851927 ("x86/platform/uv: UV support for sub-NUMA clustering") Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20230807141730.1117278-1-steve.wahl%40hpe.com
2023-09-10Merge tag 'x86-urgent-2023-09-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix preemption delays in the SGX code, remove unnecessarily UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and make the x86 SMP init code a bit more conservative to fix kexec() lockups" * tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Break up long non-preemptible delays in sgx_vepc_release() x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld x86/smp: Don't send INIT to non-present and non-booted CPUs
2023-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Clean up vCPU targets, always returning generic v8 as the preferred target - Trap forwarding infrastructure for nested virtualization (used for traps that are taken from an L2 guest and are needed by the L1 hypervisor) - FEAT_TLBIRANGE support to only invalidate specific ranges of addresses when collapsing a table PTE to a block PTE. This avoids that the guest refills the TLBs again for addresses that aren't covered by the table PTE. - Fix vPMU issues related to handling of PMUver. - Don't unnecessary align non-stack allocations in the EL2 VA space - Drop HCR_VIRT_EXCP_MASK, which was never used... - Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu parameter instead - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort() - Remove prototypes without implementations RISC-V: - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest - Added ONE_REG interface for SATP mode - Added ONE_REG interface to enable/disable multiple ISA extensions - Improved error codes returned by ONE_REG interfaces - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V - Added get-reg-list selftest for KVM RISC-V s390: - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch) Allows a PV guest to use crypto cards. Card access is governed by the firmware and once a crypto queue is "bound" to a PV VM every other entity (PV or not) looses access until it is not bound anymore. Enablement is done via flags when creating the PV VM. - Guest debug fixes (Ilya) x86: - Clean up KVM's handling of Intel architectural events - Intel bugfixes - Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug registers and generate/handle #DBs - Clean up LBR virtualization code - Fix a bug where KVM fails to set the target pCPU during an IRTE update - Fix fatal bugs in SEV-ES intrahost migration - Fix a bug where the recent (architecturally correct) change to reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it) - Retry APIC map recalculation if a vCPU is added/enabled - Overhaul emergency reboot code to bring SVM up to par with VMX, tie the "emergency disabling" behavior to KVM actually being loaded, and move all of the logic within KVM - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC ratio MSR cannot diverge from the default when TSC scaling is disabled up related code - Add a framework to allow "caching" feature flags so that KVM can check if the guest can use a feature without needing to search guest CPUID - Rip out the ancient MMU_DEBUG crud and replace the useful bits with CONFIG_KVM_PROVE_MMU - Fix KVM's handling of !visible guest roots to avoid premature triple fault injection - Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface that is needed by external users (currently only KVMGT), and fix a variety of issues in the process Generic: - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass action specific data without needing to constantly update the main handlers. - Drop unused function declarations Selftests: - Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs - Add support for printf() in guest code and covert all guest asserts to use printf-based reporting - Clean up the PMU event filter test and add new testcases - Include x86 selftests in the KVM x86 MAINTAINERS entry" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits) KVM: x86/mmu: Include mmu.h in spte.h KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots KVM: x86/mmu: Disallow guest from using !visible slots for page tables KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page KVM: x86/mmu: Harden new PGD against roots without shadow pages KVM: x86/mmu: Add helper to convert root hpa to shadow page drm/i915/gvt: Drop final dependencies on KVM internal details KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers KVM: x86/mmu: Drop @slot param from exported/external page-track APIs KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled KVM: x86/mmu: Assert that correct locks are held for page write-tracking KVM: x86/mmu: Rename page-track APIs to reflect the new reality KVM: x86/mmu: Drop infrastructure for multiple page-track modes KVM: x86/mmu: Use page-track notifiers iff there are external users KVM: x86/mmu: Move KVM-only page-track declarations to internal header KVM: x86: Remove the unused page-track hook track_flush_slot() drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() KVM: x86: Add a new page-track hook to handle memslot deletion drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot KVM: x86: Reject memslot MOVE operations if KVMGT is attached ...
2023-09-06x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()Jack Wang
On large enclaves we hit the softlockup warning with following call trace: xa_erase() sgx_vepc_release() __fput() task_work_run() do_exit() The latency issue is similar to the one fixed in: 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves") The test system has 64GB of enclave memory, and all is assigned to a single VM. Release of 'vepc' takes a longer time and causes long latencies, which triggers the softlockup warning. Add cond_resched() to give other tasks a chance to run and reduce latencies, which also avoids the softlockup detector. [ mingo: Rewrote the changelog. ] Fixes: 540745ddbc70 ("x86/sgx: Introduce virtual EPC for use by KVM guests") Reported-by: Yu Zhang <yu.zhang@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Yu Zhang <yu.zhang@ionos.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Haitao Huang <haitao.huang@linux.intel.com> Cc: stable@vger.kernel.org
2023-09-06x86/build: Fix linker fill bytes quirk/incompatibility for ld.lldSong Liu
With ":text =0xcccc", ld.lld fills unused text area with 0xcccc0000. Example objdump -D output: ffffffff82b04203: 00 00 add %al,(%rax) ffffffff82b04205: cc int3 ffffffff82b04206: cc int3 ffffffff82b04207: 00 00 add %al,(%rax) ffffffff82b04209: cc int3 ffffffff82b0420a: cc int3 Replace it with ":text =0xcccccccc", so we get the following instead: ffffffff82b04203: cc int3 ffffffff82b04204: cc int3 ffffffff82b04205: cc int3 ffffffff82b04206: cc int3 ffffffff82b04207: cc int3 ffffffff82b04208: cc int3 gcc/ld doesn't seem to have the same issue. The generated code stays the same for gcc/ld. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Fixes: 7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes") Link: https://lore.kernel.org/r/20230906175215.2236033-1-song@kernel.org
2023-09-04Merge tag 'hyperv-next-signed-20230902' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Support for SEV-SNP guests on Hyper-V (Tianyu Lan) - Support for TDX guests on Hyper-V (Dexuan Cui) - Use SBRM API in Hyper-V balloon driver (Mitchell Levy) - Avoid dereferencing ACPI root object handle in VMBus driver (Maciej Szmigiero) - A few misecllaneous fixes (Jiapeng Chong, Nathan Chancellor, Saurabh Sengar) * tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits) x86/hyperv: Remove duplicate include x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef's x86/hyperv: Remove hv_isolation_type_en_snp x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor x86/hyperv: Introduce a global variable hyperv_paravisor_present Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests Drivers: hv: vmbus: Support fully enlightened TDX guests x86/hyperv: Support hypercalls for fully enlightened TDX guests x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests x86/hyperv: Fix undefined reference to isolation_type_en_snp without CONFIG_HYPERV x86/hyperv: Add missing 'inline' to hv_snp_boot_ap() stub hv: hyperv.h: Replace one-element array with flexible-array member Drivers: hv: vmbus: Don't dereference ACPI root object handle x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES x86/hyperv: Add smp support for SEV-SNP guest clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP enlightened guest ...
2023-09-04x86/smp: Don't send INIT to non-present and non-booted CPUsThomas Gleixner
Vasant reported that kexec() can hang or reset the machine when it tries to park CPUs via INIT. This happens when the kernel is using extended APIC, but the present mask has APIC IDs >= 0x100 enumerated. As extended APIC can only handle 8 bit of APIC ID sending INIT to APIC ID 0x100 sends INIT to APIC ID 0x0. That's the boot CPU which is special on x86 and INIT causes the system to hang or resets the machine. Prevent this by sending INIT only to those CPUs which have been booted once. Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible") Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/87cyzwjbff.ffs@tglx
2023-09-01Merge tag 'x86-urgent-2023-09-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Dave Hansen: "The most important fix here adds a missing CPU model to the recent Gather Data Sampling (GDS) mitigation list to ensure that mitigations are available on that CPU. There are also a pair of warning fixes, and closure of a covert channel that pops up when protection keys are disabled. Summary: - Mark all Skylake CPUs as vulnerable to GDS - Fix PKRU covert channel - Fix -Wmissing-variable-declarations warning for ia32_xyz_class - Fix kernel-doc annotation warning" * tag 'x86-urgent-2023-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu/xstate: Fix PKRU covert channel x86/irq/i8259: Fix kernel-doc annotation warning x86/speculation: Mark all Skylake CPUs as vulnerable to GDS x86/audit: Fix -Wmissing-variable-declarations warning for ia32_xyz_class
2023-09-01Merge tag 'char-misc-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big set of char/misc and other small driver subsystem changes for 6.6-rc1. Stuff all over the place here, lots of driver updates and changes and new additions. Short summary is: - new IIO drivers and updates - Interconnect driver updates - fpga driver updates and additions - fsi driver updates - mei driver updates - coresight driver updates - nvmem driver updates - counter driver updates - lots of smaller misc and char driver updates and additions All of these have been in linux-next for a long time with no reported problems" * tag 'char-misc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (267 commits) nvmem: core: Notify when a new layout is registered nvmem: core: Do not open-code existing functions nvmem: core: Return NULL when no nvmem layout is found nvmem: core: Create all cells before adding the nvmem device nvmem: u-boot-env:: Replace zero-length array with DECLARE_FLEX_ARRAY() helper nvmem: sec-qfprom: Add Qualcomm secure QFPROM support dt-bindings: nvmem: sec-qfprom: Add bindings for secure qfprom dt-bindings: nvmem: Add compatible for QCM2290 nvmem: Kconfig: Fix typo "drive" -> "driver" nvmem: Explicitly include correct DT includes nvmem: add new NXP QorIQ eFuse driver dt-bindings: nvmem: Add t1023-sfp efuse support dt-bindings: nvmem: qfprom: Add compatible for MSM8226 nvmem: uniphier: Use devm_platform_get_and_ioremap_resource() nvmem: qfprom: do some cleanup nvmem: stm32-romem: Use devm_platform_get_and_ioremap_resource() nvmem: rockchip-efuse: Use devm_platform_get_and_ioremap_resource() nvmem: meson-mx-efuse: Convert to devm_platform_ioremap_resource() nvmem: lpc18xx_otp: Convert to devm_platform_ioremap_resource() nvmem: brcm_nvram: Use devm_platform_get_and_ioremap_resource() ...
2023-09-01Merge tag 'driver-core-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is a small set of driver core updates and additions for 6.6-rc1. Included in here are: - stable kernel documentation updates - class structure const work from Ivan on various subsystems - kernfs tweaks - driver core tests! - kobject sanity cleanups - kobject structure reordering to save space - driver core error code handling fixups - other minor driver core cleanups All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits) driver core: Call in reversed order in device_platform_notify_remove() driver core: Return proper error code when dev_set_name() fails kobject: Remove redundant checks for whether ktype is NULL kobject: Add sanity check for kset->kobj.ktype in kset_register() drivers: base: test: Add missing MODULE_* macros to root device tests drivers: base: test: Add missing MODULE_* macros for platform devices tests drivers: base: Free devm resources when unregistering a device drivers: base: Add basic devm tests for platform devices drivers: base: Add basic devm tests for root devices kernfs: fix missing kernfs_iattr_rwsem locking docs: stable-kernel-rules: mention that regressions must be prevented docs: stable-kernel-rules: fine-tune various details docs: stable-kernel-rules: make the examples for option 1 a proper list docs: stable-kernel-rules: move text around to improve flow docs: stable-kernel-rules: improve structure by changing headlines base/node: Remove duplicated include kernfs: attach uuid for every kernfs and report it in fsid kernfs: add stub helper for kernfs_generic_poll() x86/resctrl: make pseudo_lock_class a static const structure x86/MSR: make msr_class a static const structure ...
2023-08-31x86/fpu/xstate: Fix PKRU covert channelJim Mattson
When XCR0[9] is set, PKRU can be read and written from userspace with XSAVE and XRSTOR, even when CR4.PKE is clear. Clear XCR0[9] when protection keys are disabled. Reported-by: Tavis Ormandy <taviso@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20230831043228.1194256-1-jmattson@google.com
2023-08-31Merge tag 'x86_shstk_for_6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 shadow stack support from Dave Hansen: "This is the long awaited x86 shadow stack support, part of Intel's Control-flow Enforcement Technology (CET). CET consists of two related security features: shadow stacks and indirect branch tracking. This series implements just the shadow stack part of this feature, and just for userspace. The main use case for shadow stack is providing protection against return oriented programming attacks. It works by maintaining a secondary (shadow) stack using a special memory type that has protections against modification. When executing a CALL instruction, the processor pushes the return address to both the normal stack and to the special permission shadow stack. Upon RET, the processor pops the shadow stack copy and compares it to the normal stack copy. For more information, refer to the links below for the earlier versions of this patch set" Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/ Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/ * tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) x86/shstk: Change order of __user in type x86/ibt: Convert IBT selftest to asm x86/shstk: Don't retry vm_munmap() on -EINTR x86/kbuild: Fix Documentation/ reference x86/shstk: Move arch detail comment out of core mm x86/shstk: Add ARCH_SHSTK_STATUS x86/shstk: Add ARCH_SHSTK_UNLOCK x86: Add PTRACE interface for shadow stack selftests/x86: Add shadow stack test x86/cpufeatures: Enable CET CR4 bit for shadow stack x86/shstk: Wire in shadow stack interface x86: Expose thread features in /proc/$PID/status x86/shstk: Support WRSS for userspace x86/shstk: Introduce map_shadow_stack syscall x86/shstk: Check that signal frame is shadow stack mem x86/shstk: Check that SSP is aligned on sigreturn x86/shstk: Handle signals for shadow stack x86/shstk: Introduce routines modifying shstk x86/shstk: Handle thread shadow stack x86/shstk: Add user-mode shadow stack support ...
2023-08-31x86/irq/i8259: Fix kernel-doc annotation warningVincenzo Palazzo
Fix this warning: arch/x86/kernel/i8259.c:235: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * ELCR registers (0x4d0, 0x4d1) control edge/level of IRQ CC arch/x86/kernel/irqinit.o Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230830131211.88226-1-vincenzopalazzodev@gmail.com