summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2023-10-24x86/microcode/intel: Simplify early loadingThomas Gleixner
The early loading code is overly complicated: - It scans the builtin/initrd for microcode not only on the BSP, but also on all APs during early boot and then later in the boot process it scans again to duplicate and save the microcode before initrd goes away. That's a pointless exercise because this can be simply done before bringing up the APs when the memory allocator is up and running. - Saving the microcode from within the scan loop is completely non-obvious and a left over of the microcode cache. This can be done at the call site now which makes it obvious. Rework the code so that only the BSP scans the builtin/initrd microcode once during early boot and save it away in an early initcall for later use. [ bp: Test and fold in a fix from tglx ontop which handles the need to distinguish what save_microcode() does depending on when it is called: - when on the BSP during early load, it needs to find a newer revision than the one currently loaded on the BSP - later, before SMP init, it still runs on the BSP and gets the BSP revision just loaded and uses that revision to know which patch to save for the APs. For that it needs to find the exact one as on the BSP. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.629085215@linutronix.de
2023-10-23x86: Enable IBT in Rust if enabled in CMatthew Maurer
These flags are not made conditional on compiler support because at the moment exactly one version of rustc supported, and that one supports these flags. Building without these additional flags will manifest as objtool printing a large number of errors about missing ENDBR and if CFI is enabled (not currently possible) will result in incorrectly structured function prefixes. Signed-off-by: Matthew Maurer <mmaurer@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: https://lore.kernel.org/r/20231009224347.2076221-1-mmaurer@google.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-10-23Merge tag 'v6.6-rc7' into sched/core, to pick up fixesIngo Molnar
Pick up recent sched/urgent fixes merged upstream. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-10-23BackMerge tag 'v6.6-rc7' into drm-nextDave Airlie
This is needed to add the msm pr which is based on a higher base. Signed-off-by: Dave Airlie <airlied@redhat.com>
2023-10-20x86/retpoline: Document some thunk handling aspectsBorislav Petkov (AMD)
After a lot of experimenting (see thread Link points to) document for now the issues and requirements for future improvements to the thunk handling and potential issuing of a diagnostic when the default thunk hasn't been patched out. This documentation is only temporary and that close before the merge window it is only a placeholder for those future improvements. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231010171020.462211-1-david.kaplan@amd.com
2023-10-20x86/callthunks: Delete unused "struct thunk_desc"Alexey Dobriyan
It looks like it was never used. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/843bf596-db67-4b33-a865-2bae4a4418e5@p183
2023-10-20x86/vdso: Run objtool on vdso32-setup.oDavid Kaplan
vdso32-setup.c is part of the main kernel image and should not be excluded from objtool. Objtool is necessary in part for ensuring that returns in this file are correctly patched to the appropriate return thunk at runtime. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20231010171020.462211-3-david.kaplan@amd.com
2023-10-20x86/srso: Remove unnecessary semicolonYang Li
scripts/coccinelle/misc/semicolon.cocci reports: arch/x86/kernel/cpu/bugs.c:713:2-3: Unneeded semicolon Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230810010550.25733-1-yang.lee@linux.alibaba.com
2023-10-20x86/pti: Fix kernel warnings for pti= and nopti cmdline optionsJo Van Bulck
Parse the pti= and nopti cmdline options using early_param to fix 'Unknown kernel command line parameters "nopti", will be passed to user space' warnings in the kernel log when nopti or pti= are passed to the kernel cmdline on x86 platforms. Additionally allow the kernel to warn for malformed pti= options. Signed-off-by: Jo Van Bulck <jo.vanbulck@cs.kuleuven.be> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20230819080921.5324-2-jo.vanbulck@cs.kuleuven.be
2023-10-20x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk()Josh Poimboeuf
For consistency with the other return thunks, rename __x86_return_skl() to call_depth_return_thunk(). Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/ae44e9f9976934e3b5b47a458d523ccb15867561.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/nospec: Refactor UNTRAIN_RET[_*]Josh Poimboeuf
Factor out the UNTRAIN_RET[_*] common bits into a helper macro. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/f06d45489778bd49623297af2a983eea09067a74.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/rethunk: Use SYM_CODE_START[_LOCAL]_NOALIGN macrosJosh Poimboeuf
Macros already exist for unaligned code block symbols. Use them. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/26d461bd509cc840af24c94586561c06d39812b2.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Disentangle rethunk-dependent optionsJosh Poimboeuf
CONFIG_RETHUNK, CONFIG_CPU_UNRET_ENTRY and CONFIG_CPU_SRSO are all tangled up. De-spaghettify the code a bit. Some of the rethunk-related code has been shuffled around within the '.text..__x86.return_thunk' section, but otherwise there are no functional changes. srso_alias_untrain_ret() and srso_alias_safe_ret() ((which are very address-sensitive) haven't moved. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/2845084ed303d8384905db3b87b77693945302b4.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Move retbleed IBPB check into existing 'has_microcode' code blockJosh Poimboeuf
Simplify the code flow a bit by moving the retbleed IBPB check into the existing 'has_microcode' block. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/0a22b86b1f6b07f9046a9ab763fc0e0d1b7a91d4.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/bugs: Remove default case for fully switched enumsJosh Poimboeuf
For enum switch statements which handle all possible cases, remove the default case so a compiler warning gets printed if one of the enums gets accidentally omitted from the switch statement. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/fcf6feefab991b72e411c2aed688b18e65e06aed.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Remove 'pred_cmd' labelJosh Poimboeuf
SBPB is only enabled in two distinct cases: 1) when SRSO has been disabled with srso=off 2) when SRSO has been fixed (in future HW) Simplify the control flow by getting rid of the 'pred_cmd' label and moving the SBPB enablement check to the two corresponding code sites. This makes it more clear when exactly SBPB gets enabled. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/bb20e8569cfa144def5e6f25e610804bc4974de2.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Unexport untraining functionsJosh Poimboeuf
These functions aren't called outside of retpoline.S. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/1ae080f95ce7266c82cba6d2adde82349b832654.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Improve i-cache locality for alias mitigationJosh Poimboeuf
Move srso_alias_return_thunk() to the same section as srso_alias_safe_ret() so they can share a cache line. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/eadaf5530b46a7ae8b936522da45ae555d2b3393.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Fix unret validation dependenciesJosh Poimboeuf
CONFIG_CPU_SRSO isn't dependent on CONFIG_CPU_UNRET_ENTRY (AMD Retbleed), so the two features are independently configurable. Fix several issues for the (presumably rare) case where CONFIG_CPU_SRSO is enabled but CONFIG_CPU_UNRET_ENTRY isn't. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/299fb7740174d0f2335e91c58af0e9c242b4bac1.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Fix vulnerability reporting for missing microcodeJosh Poimboeuf
The SRSO default safe-ret mitigation is reported as "mitigated" even if microcode hasn't been updated. That's wrong because userspace may still be vulnerable to SRSO attacks due to IBPB not flushing branch type predictions. Report the safe-ret + !microcode case as vulnerable. Also report the microcode-only case as vulnerable as it leaves the kernel open to attacks. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/a8a14f97d1b0e03ec255c81637afdf4cf0ae9c99.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Print mitigation for retbleed IBPB caseJosh Poimboeuf
When overriding the requested mitigation with IBPB due to retbleed=ibpb, print the mitigation in the usual format instead of a custom error message. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/ec3af919e267773d896c240faf30bfc6a1fd6304.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Print actual mitigation if requested mitigation isn't possibleJosh Poimboeuf
If the kernel wasn't compiled to support the requested option, print the actual option that ends up getting used. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/7e7a12ea9d85a9f76ca16a3efb71f262dee46ab1.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/srso: Fix SBPB enablement for (possible) future fixed HWJosh Poimboeuf
Make the SBPB check more robust against the (possible) case where future HW has SRSO fixed but doesn't have the SRSO_NO bit set. Fixes: 1b5277c0ea0b ("x86/srso: Add SRSO_NO support") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/cee5050db750b391c9f35f5334f8ff40e66c01b9.1693889988.git.jpoimboe@kernel.org
2023-10-20x86/mm: Drop the 4 MB restriction on minimal NUMA node memory sizeMike Rapoport (IBM)
Qi Zheng reported crashes in a production environment and provided a simplified example as a reproducer: | For example, if we use Qemu to start a two NUMA node kernel, | one of the nodes has 2M memory (less than NODE_MIN_SIZE), | and the other node has 2G, then we will encounter the | following panic: | | BUG: kernel NULL pointer dereference, address: 0000000000000000 | <...> | RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40 | <...> | Call Trace: | <TASK> | deactivate_slab() | bootstrap() | kmem_cache_init() | start_kernel() | secondary_startup_64_no_verify() The crashes happen because of inconsistency between the nodemask that has nodes with less than 4MB as memoryless, and the actual memory fed into the core mm. The commit: 9391a3f9c7f1 ("[PATCH] x86_64: Clear more state when ignoring empty node in SRAT parsing") ... that introduced minimal size of a NUMA node does not explain why a node size cannot be less than 4MB and what boot failures this restriction might fix. Fixes have been submitted to the core MM code to tighten up the memory topologies it accepts and to not crash on weird input: mm: page_alloc: skip memoryless nodes entirely mm: memory_hotplug: drop memoryless node from fallback lists Andrew has accepted them into the -mm tree, but there are no stable SHA1's yet. This patch drops the limitation for minimal node size on x86: - which works around the crash without the fixes to the core MM. - makes x86 topologies less weird, - removes an arbitrary and undocumented limitation on NUMA topologies. [ mingo: Improved changelog clarity. ] Reported-by: Qi Zheng <zhengqi.arch@bytedance.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/ZS+2qqjEO5/867br@gmail.com
2023-10-20crypto: x86/nhpoly1305 - implement ->digestEric Biggers
Implement the ->digest method to improve performance on single-page messages by reducing the number of indirect calls. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20crypto: x86/sha256 - implement ->digest for sha256Eric Biggers
Implement a ->digest function for sha256-ssse3, sha256-avx, sha256-avx2, and sha256-ni. This improves the performance of crypto_shash_digest() with these algorithms by reducing the number of indirect calls that are made. For now, don't bother with this for sha224, since sha224 is rarely used. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-19Merge tag 'sev_fixes_for_v6.6' of ↵Linus Torvalds
//git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "Take care of a race between when the #VC exception is raised and when the guest kernel gets to emulate certain instructions in SEV-{ES,SNP} guests by: - disabling emulation of MMIO instructions when coming from user mode - checking the IO permission bitmap before emulating IO instructions and verifying the memory operands of INS/OUTS insns" * tag 'sev_fixes_for_v6.6' of //git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Check for user-space IOIO pointing to kernel space x86/sev: Check IOBM for IOIO exceptions from user-space x86/sev: Disable MMIO emulation from user mode
2023-10-19virt: tdx-guest: Add Quote generation support using TSM_REPORTSKuppuswamy Sathyanarayanan
In TDX guest, the attestation process is used to verify the TDX guest trustworthiness to other entities before provisioning secrets to the guest. The first step in the attestation process is TDREPORT generation, which involves getting the guest measurement data in the format of TDREPORT, which is further used to validate the authenticity of the TDX guest. TDREPORT by design is integrity-protected and can only be verified on the local machine. To support remote verification of the TDREPORT in a SGX-based attestation, the TDREPORT needs to be sent to the SGX Quoting Enclave (QE) to convert it to a remotely verifiable Quote. SGX QE by design can only run outside of the TDX guest (i.e. in a host process or in a normal VM) and guest can use communication channels like vsock or TCP/IP to send the TDREPORT to the QE. But for security concerns, the TDX guest may not support these communication channels. To handle such cases, TDX defines a GetQuote hypercall which can be used by the guest to request the host VMM to communicate with the SGX QE. More details about GetQuote hypercall can be found in TDX Guest-Host Communication Interface (GHCI) for Intel TDX 1.0, section titled "TDG.VP.VMCALL<GetQuote>". Trusted Security Module (TSM) [1] exposes a common ABI for Confidential Computing Guest platforms to get the measurement data via ConfigFS. Extend the TSM framework and add support to allow an attestation agent to get the TDX Quote data (included usage example below). report=/sys/kernel/config/tsm/report/report0 mkdir $report dd if=/dev/urandom bs=64 count=1 > $report/inblob hexdump -C $report/outblob rmdir $report GetQuote TDVMCALL requires TD guest pass a 4K aligned shared buffer with TDREPORT data as input, which is further used by the VMM to copy the TD Quote result after successful Quote generation. To create the shared buffer, allocate a large enough memory and mark it shared using set_memory_decrypted() in tdx_guest_init(). This buffer will be re-used for GetQuote requests in the TDX TSM handler. Although this method reserves a fixed chunk of memory for GetQuote requests, such one time allocation can help avoid memory fragmentation related allocation failures later in the uptime of the guest. Since the Quote generation process is not time-critical or frequently used, the current version uses a polling model for Quote requests and it also does not support parallel GetQuote requests. Link: https://lore.kernel.org/lkml/169342399185.3934343.3035845348326944519.stgit@dwillia2-xfh.jf.intel.com/ [1] Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Erdem Aktas <erdemaktas@google.com> Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Tested-by: Peter Gonda <pgonda@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. net/mac80211/key.c 02e0e426a2fb ("wifi: mac80211: fix error path key leak") 2a8b665e6bcc ("wifi: mac80211: remove key_mtx") 7d6904bf26b9 ("Merge wireless into wireless-next") https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/ Adjacent changes: drivers/net/ethernet/ti/Kconfig a602ee3176a8 ("net: ethernet: ti: Fix mixed module-builtin object") 98bdeae9502b ("net: cpmac: remove driver to prepare for platform removal") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19KVM: x86: Ignore MSR_AMD64_TW_CFG accessMaciej S. Szmigiero
Hyper-V enabled Windows Server 2022 KVM VM cannot be started on Zen1 Ryzen since it crashes at boot with SYSTEM_THREAD_EXCEPTION_NOT_HANDLED + STATUS_PRIVILEGED_INSTRUCTION (in other words, because of an unexpected #GP in the guest kernel). This is because Windows tries to set bit 8 in MSR_AMD64_TW_CFG and can't handle receiving a #GP when doing so. Give this MSR the same treatment that commit 2e32b7190641 ("x86, kvm: Add MSR_AMD64_BU_CFG2 to the list of ignored MSRs") gave MSR_AMD64_BU_CFG2 under justification that this MSR is baremetal-relevant only. Although apparently it was then needed for Linux guests, not Windows as in this case. With this change, the aforementioned guest setup is able to finish booting successfully. This issue can be reproduced either on a Summit Ridge Ryzen (with just "-cpu host") or on a Naples EPYC (with "-cpu host,stepping=1" since EPYC is ordinarily stepping 2). Alternatively, userspace could solve the problem by using MSR filters, but forcing every userspace to define a filter isn't very friendly and doesn't add much, if any, value. The only potential hiccup is if one of these "baremetal-only" MSRs ever requires actual emulation and/or has F/M/S specific behavior. But if that happens, then KVM can still punt *that* handling to userspace since userspace MSR filters "win" over KVM's default handling. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1ce85d9c7c9e9632393816cf19c902e0a3f411f1.1697731406.git.maciej.szmigiero@oracle.com [sean: call out MSR filtering alternative] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-19KVM: x86: remove the unused assigned_dev_head from kvm_archLiang Chen
Legacy device assignment was dropped years ago. This field is not used anymore. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Link: https://lore.kernel.org/r/20231019043336.8998-1-liangchen.linux@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-19x86/microcode/intel: Cleanup code furtherThomas Gleixner
Sanitize the microcode scan loop, fixup printks and move the loading function for builtin microcode next to the place where it is used and mark it __init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.389400871@linutronix.de
2023-10-19x86/microcode/intel: Simplify and rename generic_load_microcode()Thomas Gleixner
so it becomes less obfuscated and rename it because there is nothing generic about it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.330295409@linutronix.de
2023-10-19x86/microcode/intel: Simplify scan_microcode()Thomas Gleixner
Make it readable and comprehensible. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.271940980@linutronix.de
2023-10-19x86/microcode/intel: Rip out mixed stepping support for Intel CPUsAshok Raj
Mixed steppings aren't supported on Intel CPUs. Only one microcode patch is required for the entire system. The caching of microcode blobs which match the family and model is therefore pointless and in fact is dysfunctional as CPU hotplug updates use only a single microcode blob, i.e. the one where *intel_ucode_patch points to. Remove the microcode cache and make it an AMD local feature. [ tglx: - save only at the end. Otherwise random microcode ends up in the pointer for early loading - free the ucode patch pointer in save_microcode_patch() only after kmemdup() has succeeded, as reported by Andrew Cooper ] Originally-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.404362809@linutronix.de
2023-10-18KVM: x86/mmu: Remove unnecessary ‘NULL’ values from sptepLi zeming
Don't initialize "spte" and "sptep" in fast_page_fault() as they are both guaranteed (for all intents and purposes) to be written at the start of every loop iteration. Add a sanity check that "sptep" is non-NULL after walking the shadow page tables, as encountering a NULL root would result in "spte" not being written, i.e. would lead to uninitialized data or the previous value being consumed. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20230905182006.2964-1-zeming@nfschina.com [sean: rewrite changelog with --verbose] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-18bitops: add xor_unlock_is_negative_byte()Matthew Wilcox (Oracle)
Replace clear_bit_and_unlock_is_negative_byte() with xor_unlock_is_negative_byte(). We have a few places that like to lock a folio, set a flag and unlock it again. Allow for the possibility of combining the latter two operations for efficiency. We are guaranteed that the caller holds the lock, so it is safe to unlock it with the xor. The caller must guarantee that nobody else will set the flag without holding the lock; it is not safe to do this with the PG_dirty flag, for example. Link: https://lkml.kernel.org/r/20231004165317.1061855-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18x86: KVM: Add feature flag for CPUID.80000021H:EAX[bit 1]Jim Mattson
Define an X86_FEATURE_* flag for CPUID.80000021H:EAX.[bit 1], and advertise the feature to userspace via KVM_GET_SUPPORTED_CPUID. Per AMD's "Processor Programming Reference (PPR) for AMD Family 19h Model 61h, Revision B1 Processors (56713-B1-PUB)," this CPUID bit indicates that a WRMSR to MSR_FS_BASE, MSR_GS_BASE, or MSR_KERNEL_GS_BASE is non-serializing. This is a change in previously architected behavior. Effectively, this CPUID bit is a "defeature" bit, or a reverse polarity feature bit. When this CPUID bit is clear, the feature (serialization on WRMSR to any of these three MSRs) is available. When this CPUID bit is set, the feature is not available. KVM_GET_SUPPORTED_CPUID must pass this bit through from the underlying hardware, if it is set. Leaving the bit clear claims that WRMSR to these three MSRs will be serializing in a guest running under KVM. That isn't true. Though KVM could emulate the feature by intercepting writes to the specified MSRs, it does not do so today. The guest is allowed direct read/write access to these MSRs without interception, so the innate hardware behavior is preserved under KVM. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231005031237.1652871-1-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-18KVM: x86: remove always-false condition in kvmclock_sync_fnDongli Zhang
The 'kvmclock_periodic_sync' is a readonly param that cannot change after bootup. The kvm_arch_vcpu_postcreate() is not going to schedule the kvmclock_sync_work if kvmclock_periodic_sync == false. As a result, the "if (!kvmclock_periodic_sync)" can never be true if the kvmclock_sync_work = kvmclock_sync_fn() is scheduled. Link: https://lore.kernel.org/kvm/a461bf3f-c17e-9c3f-56aa-726225e8391d@oracle.com Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Link: https://lore.kernel.org/r/20231001213637.76686-1-dongli.zhang@oracle.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-18x86/microcode/32: Move early loading after paging enableThomas Gleixner
32-bit loads microcode before paging is enabled. The commit which introduced that has zero justification in the changelog. The cover letter has slightly more content, but it does not give any technical justification either: "The problem in current microcode loading method is that we load a microcode way, way too late; ideally we should load it before turning paging on. This may only be practical on 32 bits since we can't get to 64-bit mode without paging on, but we should still do it as early as at all possible." Handwaving word salad with zero technical content. Someone claimed in an offlist conversation that this is required for curing the ATOM erratum AAE44/AAF40/AAG38/AAH41. That erratum requires an microcode update in order to make the usage of PSE safe. But during early boot, PSE is completely irrelevant and it is evaluated way later. Neither is it relevant for the AP on single core HT enabled CPUs as the microcode loading on the AP is not doing anything. On dual core CPUs there is a theoretical problem if a split of an executable large page between enabling paging including PSE and loading the microcode happens. But that's only theoretical, it's practically irrelevant because the affected dual core CPUs are 64bit enabled and therefore have paging and PSE enabled before loading the microcode on the second core. So why would it work on 64-bit but not on 32-bit? The erratum: "AAG38 Code Fetch May Occur to Incorrect Address After a Large Page is Split Into 4-Kbyte Pages Problem: If software clears the PS (page size) bit in a present PDE (page directory entry), that will cause linear addresses mapped through this PDE to use 4-KByte pages instead of using a large page after old TLB entries are invalidated. Due to this erratum, if a code fetch uses this PDE before the TLB entry for the large page is invalidated then it may fetch from a different physical address than specified by either the old large page translation or the new 4-KByte page translation. This erratum may also cause speculative code fetches from incorrect addresses." The practical relevance for this is exactly zero because there is no splitting of large text pages during early boot-time, i.e. between paging enable and microcode loading, and neither during CPU hotplug. IOW, this load microcode before paging enable is yet another voodoo programming solution in search of a problem. What's worse is that it causes at least two serious problems: 1) When stackprotector is enabled, the microcode loader code has the stackprotector mechanics enabled. The read from the per CPU variable __stack_chk_guard is always accessing the virtual address either directly on UP or via %fs on SMP. In physical address mode this results in an access to memory above 3GB. So this works by chance as the hardware returns the same value when there is no RAM at this physical address. When there is RAM populated above 3G then the read is by chance the same as nothing changes that memory during the very early boot stage. That's not necessarily true during runtime CPU hotplug. 2) When function tracing is enabled, the relevant microcode loader functions and the functions invoked from there will call into the tracing code and evaluate global and per CPU variables in physical address mode. What could potentially go wrong? Cure this and move the microcode loading after the early paging enable, use the new temporary initrd mapping and remove the gunk in the microcode loader which is required to handle physical address mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.348298216@linutronix.de
2023-10-18x86/boot/32: Temporarily map initrd for microcode loadingThomas Gleixner
Early microcode loading on 32-bit runs in physical address mode because the initrd is not covered by the initial page tables. That results in a horrible mess all over the microcode loader code. Provide a temporary mapping for the initrd in the initial page tables by appending it to the actual initial mapping starting with a new PGD or PMD depending on the configured page table levels ([non-]PAE). The page table entries are located after _brk_end so they are not permanently using memory space. The mapping is invalidated right away in i386_start_kernel() after the early microcode loader has run. This prepares for removing the physical address mode oddities from all over the microcode loader code, which in turn allows further cleanups. Provide the map and unmap code and document the place where the microcode loader needs to be invoked with a comment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.292291436@linutronix.de
2023-10-18x86/microcode: Provide CONFIG_MICROCODE_INITRD32Thomas Gleixner
Create an aggregate config switch which covers X86_32, MICROCODE and BLK_DEV_INITRD to avoid lengthy #ifdeffery in upcoming code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.236208250@linutronix.de
2023-10-18x86/boot/32: Restructure mk_early_pgtbl_32()Thomas Gleixner
Prepare it for adding a temporary initrd mapping by splitting out the actual map loop. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.175910753@linutronix.de
2023-10-18x86/boot/32: De-uglify the 2/3 level paging difference in mk_early_pgtbl_32()Thomas Gleixner
Move the ifdeffery out of the function and use proper typedefs to make it work for both 2 and 3 level paging. No functional change. [ bp: Move mk_early_pgtbl_32() declaration into a header. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.111059491@linutronix.de
2023-10-18x86/boot: Rename conflicting 'boot_params' pointer to 'boot_params_ptr'Ard Biesheuvel
The x86 decompressor is built and linked as a separate executable, but it shares components with the kernel proper, which are either #include'd as C files, or linked into the decompresor as a static library (e.g, the EFI stub) Both the kernel itself and the decompressor define a global symbol 'boot_params' to refer to the boot_params struct, but in the former case, it refers to the struct directly, whereas in the decompressor, it refers to a global pointer variable referring to the struct boot_params passed by the bootloader or constructed from scratch. This ambiguity is unfortunate, and makes it impossible to assign this decompressor variable from the x86 EFI stub, given that declaring it as extern results in a clash. So rename the decompressor version (whose scope is limited) to boot_params_ptr. [ mingo: Renamed 'boot_params_p' to 'boot_params_ptr' for clarity ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org
2023-10-18x86/boot: Use __pa_nodebug() in mk_early_pgtbl_32()Thomas Gleixner
Use the existing macro instead of undefining and redefining __pa(). No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.051625827@linutronix.de
2023-10-18x86/boot/32: Disable stackprotector and tracing for mk_early_pgtbl_32()Thomas Gleixner
Stackprotector cannot work before paging is enabled. The read from the per CPU variable __stack_chk_guard is always accessing the virtual address either directly on UP or via FS on SMP. In physical address mode this results in an access to memory above 3GB. So this works by chance as the hardware returns the same value when there is no RAM at this physical address. When there is RAM populated above 3G then the read is by chance the same as nothing changes that memory during the very early boot stage. Stop relying on pure luck and disable the stack protector for the only C function which is called during early boot before paging is enabled. Remove function tracing from the whole source file as there is no way to trace this at all, but in case of CONFIG_DYNAMIC_FTRACE=n mk_early_pgtbl_32() would access global function tracer variables in physical address mode which again might work by chance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.156063939@linutronix.de
2023-10-18UML: remove unused cmd_vdso_installMasahiro Yamada
You cannot run this code because arch/um/Makefile does not define the vdso_install target. It appears that this code was blindly copied from another architecture. Remove the dead code. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Richard Weinberger <richard@nod.at>
2023-10-17KVM: x86: hyper-v: Don't auto-enable stimer on write from user-spaceNicolas Saenz Julienne
Don't apply the stimer's counter side effects when modifying its value from user-space, as this may trigger spurious interrupts. For example: - The stimer is configured in auto-enable mode. - The stimer's count is set and the timer enabled. - The stimer expires, an interrupt is injected. - The VM is live migrated. - The stimer config and count are deserialized, auto-enable is ON, the stimer is re-enabled. - The stimer expires right away, and injects an unwarranted interrupt. Cc: stable@vger.kernel.org Fixes: 1f4b34f825e8 ("kvm/x86: Hyper-V SynIC timers") Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20231017155101.40677-1-nsaenz@amazon.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-17KVM: x86: Update the variable naming in kvm_x86_ops.sched_in()Mingwei Zhang
Update the variable with name 'kvm' in kvm_x86_ops.sched_in() to 'vcpu' to avoid confusions. Variable naming in KVM has a clear convention that 'kvm' refers to pointer of type 'struct kvm *', while 'vcpu' refers to pointer of type 'struct kvm_vcpu *'. Fix this 9-year old naming issue for fun. Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20231017232610.4008690-1-mizhang@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>