summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2023-01-31sched/clock/x86: Mark sched_clock() noinstrPeter Zijlstra
In order to use sched_clock() from noinstr code, mark it and all it's implenentations noinstr. The whole pvclock thing (used by KVM/Xen) is a bit of a pain, since it calls out to watchdogs, create a pvclock_clocksource_read_nowd() variant doesn't do that and can be noinstr. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230126151323.702003578@infradead.org
2023-01-31x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()Uros Bizjak
Improve atomic update of last_value in pvclock_clocksource_read: - Atomic update can be skipped if the "last_value" is already equal to "ret". - The detection of atomic update failure is not correct. The value, returned by atomic64_cmpxchg should be compared to the old value from the location to be updated. If these two are the same, then atomic update succeeded and "last_value" location is updated to "ret" in an atomic way. Otherwise, the atomic update failed and it should be retried with the value from "last_value" - exactly what atomic64_try_cmpxchg does in a correct and more optimal way. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20230118202330.3740-1-ubizjak@gmail.com Link: https://lore.kernel.org/r/20230126151323.643408110@infradead.org
2023-01-31x86/atomics: Always inline arch_atomic64*()Peter Zijlstra
As already done for regular arch_atomic*(), always inline arch_atomic64*(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230126151323.585115019@infradead.org
2023-01-31Merge tag 'v6.2-rc6' into sched/core, to pick up fixesIngo Molnar
Pick up fixes before merging another batch of cpuidle updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-31x86/debug: Fix stack recursion caused by wrongly ordered DR7 accessesJoerg Roedel
In kernels compiled with CONFIG_PARAVIRT=n, the compiler re-orders the DR7 read in exc_nmi() to happen before the call to sev_es_ist_enter(). This is problematic when running as an SEV-ES guest because in this environment the DR7 read might cause a #VC exception, and taking #VC exceptions is not safe in exc_nmi() before sev_es_ist_enter() has run. The result is stack recursion if the NMI was caused on the #VC IST stack, because a subsequent #VC exception in the NMI handler will overwrite the stack frame of the interrupted #VC handler. As there are no compiler barriers affecting the ordering of DR7 reads/writes, make the accesses to this register volatile, forbidding the compiler to re-order them. [ bp: Massage text, make them volatile too, to make sure some aggressive compiler optimization pass doesn't discard them. ] Fixes: 315562c9af3d ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler") Reported-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230127035616.508966-1-aik@amd.com
2023-01-30efi/libstub: Add memory attribute protocol definitionsEvgeniy Baskov
EFI_MEMORY_ATTRIBUTE_PROTOCOL servers as a better alternative to DXE services for setting memory attributes in EFI Boot Services environment. This protocol is better since it is a part of UEFI specification itself and not UEFI PI specification like DXE services. Add EFI_MEMORY_ATTRIBUTE_PROTOCOL definitions. Support mixed mode properly for its calls. Tested-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-29Merge tag 'x86_urgent_for_v6.2_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Start checking for -mindirect-branch-cs-prefix clang support too now that LLVM 16 will support it - Fix a NULL ptr deref when suspending with Xen PV - Have a SEV-SNP guest check explicitly for features enabled by the hypervisor and fail gracefully if some are unsupported by the guest instead of failing in a non-obvious and hard-to-debug way - Fix a MSI descriptor leakage under Xen - Mark Xen's MSI domain as supporting MSI-X - Prevent legacy PIC interrupts from being resent in software by marking them level triggered, as they should be, which lead to a NULL ptr deref * tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block acpi: Fix suspend with Xen PV x86/sev: Add SEV-SNP guest feature negotiation support x86/pci/xen: Fixup fallout from the PCI/MSI overhaul x86/pci/xen: Set MSI_FLAG_PCI_MSIX support in Xen MSI domain x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL
2023-01-28Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2023-01-28 We've added 124 non-merge commits during the last 22 day(s) which contain a total of 124 files changed, 6386 insertions(+), 1827 deletions(-). The main changes are: 1) Implement XDP hints via kfuncs with initial support for RX hash and timestamp metadata kfuncs, from Stanislav Fomichev and Toke Høiland-Jørgensen. Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk 2) Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case, from Andrii Nakryiko. 3) Significantly reduce the search time for module symbols by livepatch and BPF, from Jiri Olsa and Zhen Lei. 4) Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals, from David Vernet. 5) Fix several issues in the dynptr processing such as stack slot liveness propagation, missing checks for PTR_TO_STACK variable offset, etc, from Kumar Kartikeya Dwivedi. 6) Various performance improvements, fixes, and introduction of more than just one XDP program to XSK selftests, from Magnus Karlsson. 7) Big batch to BPF samples to reduce deprecated functionality, from Daniel T. Lee. 8) Enable struct_ops programs to be sleepable in verifier, from David Vernet. 9) Reduce pr_warn() noise on BTF mismatches when they are expected under the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien. 10) Describe modulo and division by zero behavior of the BPF runtime in BPF's instruction specification document, from Dave Thaler. 11) Several improvements to libbpf API documentation in libbpf.h, from Grant Seltzer. 12) Improve resolve_btfids header dependencies related to subcmd and add proper support for HOSTCC, from Ian Rogers. 13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room() helper along with BPF selftests, from Ziyang Xuan. 14) Simplify the parsing logic of structure parameters for BPF trampoline in the x86-64 JIT compiler, from Pu Lehui. 15) Get BTF working for kernels with CONFIG_RUST enabled by excluding Rust compilation units with pahole, from Martin Rodriguez Reboredo. 16) Get bpf_setsockopt() working for kTLS on top of TCP sockets, from Kui-Feng Lee. 17) Disable stack protection for BPF objects in bpftool given BPF backends don't support it, from Holger Hoffstätte. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits) selftest/bpf: Make crashes more debuggable in test_progs libbpf: Add documentation to map pinning API functions libbpf: Fix malformed documentation formatting selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket. bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt(). bpf/selftests: Verify struct_ops prog sleepable behavior bpf: Pass const struct bpf_prog * to .check_member libbpf: Support sleepable struct_ops.s section bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable selftests/bpf: Fix vmtest static compilation error tools/resolve_btfids: Alter how HOSTCC is forced tools/resolve_btfids: Install subcmd headers bpf/docs: Document the nocast aliasing behavior of ___init bpf/docs: Document how nested trusted fields may be defined bpf/docs: Document cpumask kfuncs in a new file selftests/bpf: Add selftest suite for cpumask kfuncs selftests/bpf: Add nested trust selftests suite bpf: Enable cpumasks to be queried and used as kptrs bpf: Disallow NULLable pointers for trusted kfuncs ... ==================== Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/ethernet/intel/ice/ice_main.c 418e53401e47 ("ice: move devlink port creation/deletion") 643ef23bd9dd ("ice: Introduce local var for readability") https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/ https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/ drivers/net/ethernet/engleder/tsnep_main.c 3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues") 25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock") https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/ net/netfilter/nf_conntrack_proto_sctp.c 13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"") a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths") f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets") https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/ https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27x86/tdx: Disable NOTIFY_ENABLESKirill A. Shutemov
== Background == There is a class of side-channel attacks against SGX enclaves called "SGX Step"[1]. These attacks create lots of exceptions inside of enclaves. Basically, run an in-enclave instruction, cause an exception. Over and over. There is a concern that a VMM could attack a TDX guest in the same way by causing lots of #VE's. The TDX architecture includes new countermeasures for these attacks. It basically counts the number of exceptions and can send another *special* exception once the number of VMM-induced #VE's hits a critical threshold[2]. == Problem == But, these special exceptions are independent of any action that the guest takes. They can occur anywhere that the guest executes. This includes sensitive areas like the entry code. The (non-paranoid) #VE handler is incapable of handling exceptions in these areas. == Solution == Fortunately, the special exceptions can be disabled by the guest via write to NOTIFY_ENABLES TDCS field. NOTIFY_ENABLES is disabled by default, but might be enabled by a bootloader, firmware or an earlier kernel before the current kernel runs. Disable NOTIFY_ENABLES feature explicitly and unconditionally. Any NOTIFY_ENABLES-based #VE's that occur before this point will end up in the early #VE exception handler and die due to unexpected exit reason. [1] https://github.com/jovanbulck/sgx-step [2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#safety-against-ve-in-kernel-code Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Link: https://lore.kernel.org/all/20230126221159.8635-8-kirill.shutemov%40linux.intel.com
2023-01-27x86/tdx: Relax SEPT_VE_DISABLE check for debug TDKirill A. Shutemov
A "SEPT #VE" occurs when a TDX guest touches memory that is not properly mapped into the "secure EPT". This can be the result of hypervisor attacks or bugs, *OR* guest bugs. Most notably, buggy guests might touch unaccepted memory for lots of different memory safety bugs like buffer overflows. TDX guests do not want to continue in the face of hypervisor attacks or hypervisor bugs. They want to terminate as fast and safely as possible. SEPT_VE_DISABLE ensures that TDX guests *can't* continue in the face of these kinds of issues. But, that causes a problem. TDX guests that can't continue can't spit out oopses or other debugging info. In essence SEPT_VE_DISABLE=1 guests are not debuggable. Relax the SEPT_VE_DISABLE check to warning on debug TD and panic() in the #VE handler on EPT-violation on private memory. It will produce useful backtrace. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230126221159.8635-7-kirill.shutemov%40linux.intel.com
2023-01-27x86/tdx: Use ReportFatalError to report missing SEPT_VE_DISABLEKirill A. Shutemov
Linux TDX guests require that the SEPT_VE_DISABLE "attribute" be set. If it is not set, the kernel is theoretically required to handle exceptions anywhere that kernel memory is accessed, including places like NMI handlers and in the syscall entry gap. Rather than even try to handle these exceptions, the kernel refuses to run if SEPT_VE_DISABLE is unset. However, the SEPT_VE_DISABLE detection and refusal code happens very early in boot, even before earlyprintk runs. Calling panic() will effectively just hang the system. Instead, call a TDX-specific panic() function. This makes a very simple TDVMCALL which gets a short error string out to the hypervisor without any console infrastructure. Use TDG.VP.VMCALL<ReportFatalError> to report the error. The hypercall can encode message up to 64 bytes in eight registers. [ dhansen: tweak comment and remove while loop brackets. ] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230126221159.8635-6-kirill.shutemov%40linux.intel.com
2023-01-27x86/tdx: Expand __tdx_hypercall() to handle more argumentsKirill A. Shutemov
So far __tdx_hypercall() only handles six arguments for VMCALL. Expanding it to six more register would allow to cover more use-cases like ReportFatalError() and Hyper-V hypercalls. With all preparations in place, the expansion is pretty straight forward. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230126221159.8635-5-kirill.shutemov%40linux.intel.com
2023-01-27x86/tdx: Refactor __tdx_hypercall() to allow pass down more argumentsKirill A. Shutemov
RDI is the first argument to __tdx_hypercall() that used to pass pointer to struct tdx_hypercall_args. RSI is the second argument that contains flags, such as TDX_HCALL_HAS_OUTPUT and TDX_HCALL_ISSUE_STI. RDI and RSI can also be used as arguments to TDVMCALL leafs. Move RDI to RAX and RSI to RBP to free up them for the hypercall arguments. RAX saved on stack during TDCALL as it returns status code in the register. RBP value has to be restored before returning from __tdx_hypercall() as it is callee-saved register. This is preparatory patch. No functional change. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230126221159.8635-4-kirill.shutemov%40linux.intel.com
2023-01-27x86/tdx: Add more registers to struct tdx_hypercall_argsKirill A. Shutemov
struct tdx_hypercall_args is used to pass down hypercall arguments to __tdx_hypercall() assembly routine. Currently __tdx_hypercall() handles up to 6 arguments. In preparation to changes in __tdx_hypercall(), expand the structure to 6 more registers and generate asm offsets for them. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230126221159.8635-3-kirill.shutemov%40linux.intel.com
2023-01-27x86/tdx: Fix typo in comment in __tdx_hypercall()Kirill A. Shutemov
Comment in __tdx_hypercall() points that RAX==0 indicates TDVMCALL failure which is opposite of the truth: RAX==0 is success. Fix the comment. No functional changes. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230126221159.8635-2-kirill.shutemov%40linux.intel.com
2023-01-27x86: Suppress KMSAN reports in arch_within_stack_frames()Alexander Potapenko
arch_within_stack_frames() performs stack walking and may confuse KMSAN by stepping on stale shadow values. To prevent false positive reports, disable KMSAN checks in this function. This fixes KMSAN's interoperability with CONFIG_HARDENED_USERCOPY. Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Eric Biggers <ebiggers@google.com> Link: https://github.com/google/kmsan/issues/89 Link: https://lore.kernel.org/lkml/Y3b9AAEKp2Vr3e6O@sol.localdomain/ Link: https://lore.kernel.org/all/20221118172305.3321253-1-glider%40google.com
2023-01-27net: checksum: drop the linux/uaccess.h includeJakub Kicinski
net/checksum.h pulls in linux/uaccess.h which is large. In the x86 header the include seems to not be needed at all. ARM on the other hand does not include uaccess.h, even tho it calls access_ok(). In the generic implementation guard the include of linux/uaccess.h with the same condition as the code that needs it. With this change pre-processed net/checksum.h shrinks on x86 from 30616 lines to just 1193. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-26KVM: x86/pmu: Provide "error" semantics for unsupported-but-known PMU MSRsSean Christopherson
Provide "error" semantics (read zeros, drop writes) for userspace accesses to MSRs that are ultimately unsupported for whatever reason, but for which KVM told userspace to save and restore the MSR, i.e. for MSRs that KVM included in KVM_GET_MSR_INDEX_LIST. Previously, KVM special cased a few PMU MSRs that were problematic at one point or another. Extend the treatment to all PMU MSRs, e.g. to avoid spurious unsupported accesses. Note, the logic can also be used for non-PMU MSRs, but as of today only PMU MSRs can end up being unsupported after KVM told userspace to save and restore them. Link: https://lore.kernel.org/r/20230124234905.3774678-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-26KVM: x86/pmu: Don't tell userspace to save MSRs for non-existent fixed PMCsLike Xu
Limit the set of MSRs for fixed PMU counters based on the number of fixed counters actually supported by the host so that userspace doesn't waste time saving and restoring dummy values. Signed-off-by: Like Xu <likexu@tencent.com> [sean: split for !enable_pmu logic, drop min(), write changelog] Link: https://lore.kernel.org/r/20230124234905.3774678-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-26KVM: x86/pmu: Don't tell userspace to save PMU MSRs if PMU is disabledSean Christopherson
Omit all PMU MSRs from the "MSRs to save" list if the PMU is disabled so that userspace doesn't waste time saving and restoring dummy values. KVM provides "error" semantics (read zeros, drop writes) for such known-but- unsupported MSRs, i.e. has fudged around this issue for quite some time. Keep the "error" semantics as-is for now, the logic will be cleaned up in a separate patch. Cc: Aaron Lewis <aaronlewis@google.com> Cc: Weijiang Yang <weijiang.yang@intel.com> Link: https://lore.kernel.org/r/20230124234905.3774678-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-26KVM: x86/pmu: Use separate array for defining "PMU MSRs to save"Sean Christopherson
Move all potential to-be-saved PMU MSRs into a separate array so that a future patch can easily omit all PMU MSRs from the list when the PMU is disabled. No functional change intended. Link: https://lore.kernel.org/r/20230124234905.3774678-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-26KVM: x86/pmu: Gate all "unimplemented MSR" prints on report_ignored_msrsSean Christopherson
Add helpers to print unimplemented MSR accesses and condition all such prints on report_ignored_msrs, i.e. honor userspace's request to not print unimplemented MSRs. Even though vcpu_unimpl() is ratelimited, printing can still be problematic, e.g. if a print gets stalled when host userspace is writing MSRs during live migration, an effective stall can result in very noticeable disruption in the guest. E.g. the profile below was taken while calling KVM_SET_MSRS on the PMU counters while the PMU was disabled in KVM. - 99.75% 0.00% [.] __ioctl - __ioctl - 99.74% entry_SYSCALL_64_after_hwframe do_syscall_64 sys_ioctl - do_vfs_ioctl - 92.48% kvm_vcpu_ioctl - kvm_arch_vcpu_ioctl - 85.12% kvm_set_msr_ignored_check svm_set_msr kvm_set_msr_common printk vprintk_func vprintk_default vprintk_emit console_unlock call_console_drivers univ8250_console_write serial8250_console_write uart_console_write Reported-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230124234905.3774678-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-26KVM: x86/pmu: Cap kvm_pmu_cap.num_counters_gp at KVM's internal maxSean Christopherson
Limit kvm_pmu_cap.num_counters_gp during kvm_init_pmu_capability() based on the vendor PMU capabilities so that consuming num_counters_gp naturally does the right thing. This fixes a mostly theoretical bug where KVM could over-report its PMU support in KVM_GET_SUPPORTED_CPUID for leaf 0xA, e.g. if the number of counters reported by perf is greater than KVM's hardcoded internal limit. Incorporating input from the AMD PMU also avoids over-reporting MSRs to save when running on AMD. Link: https://lore.kernel.org/r/20230124234905.3774678-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-26KVM: x86/pmu: Drop event_type and rename "struct kvm_event_hw_type_mapping"Like Xu
After commit ("02791a5c362b KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp,fixed}counter()"), vPMU starts to directly use the hardware event eventsel and unit_mask to reprogram perf_event, and the event_type field in the "struct kvm_event_hw_type_mapping" is simply no longer being used. Convert the struct into an anonymous struct as the current name is obsolete as the structure no longer has any mapping semantics, and placing the struct definition directly above its sole user makes its easier to understand what the array is filling in. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20221205122048.16023-1-likexu@tencent.com [sean: drop new comment, use anonymous struct] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-26x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock()Uros Bizjak
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in __acpi_{acquire,release}_global_lock(). x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after CMPXCHG (and related MOV instruction in front of CMPXCHG). Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when CMPXCHG fails. There is no need to re-read the value in the loop. Note that the value from *ptr should be read using READ_ONCE() to prevent the compiler from merging, refetching or reordering the read. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20230116162522.4072-1-ubizjak@gmail.com
2023-01-26x86/PAT: Use try_cmpxchg() in set_page_memtype()Uros Bizjak
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in set_page_memtype. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. Note that the value from *ptr should be read using READ_ONCE to prevent the compiler from merging, refetching or reordering the read. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230116163446.4734-1-ubizjak@gmail.com
2023-01-26x86/resctrl: Fix a silly -Wunused-but-set-variable warningBorislav Petkov (AMD)
clang correctly complains arch/x86/kernel/cpu/resctrl/rdtgroup.c:1456:6: warning: variable \ 'h' set but not used [-Wunused-but-set-variable] u32 h; ^ but it can't know whether this use is innocuous or really a problem. There's a reason why those warning switches are behind a W=1 and not enabled by default - yes, one needs to do: make W=1 CC=clang HOSTCC=clang arch/x86/kernel/cpu/resctrl/ with clang 14 in order to trigger it. I would normally not take a silly fix like that but this one is simple and doesn't make the code uglier so... Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/202301242015.kbzkVteJ-lkp@intel.com
2023-01-26x86/boot/compressed: prefer cc-option for CFLAGS additionsNick Desaulniers
as-option tests new options using KBUILD_CFLAGS, which causes problems when using as-option to update KBUILD_AFLAGS because many compiler options are not valid assembler options. This will be fixed in a follow up patch. Before doing so, move the assembler test for -Wa,-mrelax-relocations=no from using as-option to cc-option. Link: https://lore.kernel.org/llvm/CAK7LNATcHt7GcXZ=jMszyH=+M_LC9Qr6yeAGRCBbE6xriLxtUQ@mail.gmail.com/ Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-01-25KVM: x86: Propagate the AMD Automatic IBRS feature to the guestKim Phillips
Add the AMD Automatic IBRS feature bit to those being propagated to the guest, and enable the guest EFER bit. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230124163319.2277355-9-kim.phillips@amd.com
2023-01-25x86/cpu: Support AMD Automatic IBRSKim Phillips
The AMD Zen4 core supports a new feature called Automatic IBRS. It is a "set-and-forget" feature that means that, like Intel's Enhanced IBRS, h/w manages its IBRS mitigation resources automatically across CPL transitions. The feature is advertised by CPUID_Fn80000021_EAX bit 8 and is enabled by setting MSR C000_0080 (EFER) bit 21. Enable Automatic IBRS by default if the CPU feature is present. It typically provides greater performance over the incumbent generic retpolines mitigation. Reuse the SPECTRE_V2_EIBRS spectre_v2_mitigation enum. AMD Automatic IBRS and Intel Enhanced IBRS have similar enablement. Add NO_EIBRS_PBRSB to cpu_vuln_whitelist, since AMD Automatic IBRS isn't affected by PBRSB-eIBRS. The kernel command line option spectre_v2=eibrs is used to select AMD Automatic IBRS, if available. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20230124163319.2277355-8-kim.phillips@amd.com
2023-01-25x86/cpu, kvm: Add the SMM_CTL MSR not present featureKim Phillips
The SMM_CTL MSR not present feature was being open-coded for KVM. Add it to its newly added CPUID leaf 0x80000021 EAX proper. Also drop the bit description comments now the code is more self-describing. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230124163319.2277355-7-kim.phillips@amd.com
2023-01-25x86/cpu, kvm: Add the Null Selector Clears Base featureKim Phillips
The Null Selector Clears Base feature was being open-coded for KVM. Add it to its newly added native CPUID leaf 0x80000021 EAX proper. Also drop the bit description comments now it's more self-describing. [ bp: Convert test in check_null_seg_clears_base() too. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230124163319.2277355-6-kim.phillips@amd.com
2023-01-25x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leafKim Phillips
The LFENCE always serializing feature bit was defined as scattered LFENCE_RDTSC and its native leaf bit position open-coded for KVM. Add it to its newly added CPUID leaf 0x80000021 EAX proper. With LFENCE_RDTSC in its proper place, the kernel's set_cpu_cap() will effectively synthesize the feature for KVM going forward. Also, DE_CFG[1] doesn't need to be set on such CPUs anymore. [ bp: Massage and merge diff from Sean. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230124163319.2277355-5-kim.phillips@amd.com
2023-01-25x86/cpu, kvm: Add the NO_NESTED_DATA_BP featureKim Phillips
The "Processor ignores nested data breakpoints" feature was being open-coded for KVM. Add the feature to its newly introduced CPUID leaf 0x80000021 EAX proper. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230124163319.2277355-4-kim.phillips@amd.com
2023-01-25x86/fpu: Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threadsJens Axboe
We don't set it on PF_KTHREAD threads as they never return to userspace, and PF_IO_WORKER threads are identical in that regard. As they keep running in the kernel until they die, skip setting the FPU flag on them. More of a cosmetic thing that was found while debugging and issue and pondering why the FPU flag is set on these threads. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/560c844c-f128-555b-40c6-31baff27537f@kernel.dk
2023-01-25x86/vdso: Move VDSO image init to vdso2c generated codeBrian Gerst
Generate an init function for each VDSO image, replacing init_vdso() and sysenter_setup(). Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230124184019.26850-1-brgerst@gmail.com
2023-01-25KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation codeKim Phillips
Move code from __do_cpuid_func() to kvm_set_cpu_caps() in preparation for adding the features in their native leaf. Also drop the bit description comments as it will be more self-describing once the individual features are added. Whilst there, switch to using the more efficient cpu_feature_enabled() instead of static_cpu_has(). Note, LFENCE_RDTSC and "NULL selector clears base" are currently synthetic, Linux-defined feature flags as Linux tracking of the features predates AMD's definition. Keep the manual propagation of the flags from their synthetic counterparts until the kernel fully converts to AMD's definition, otherwise KVM would stop synthesizing the flags as intended. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230124163319.2277355-3-kim.phillips@amd.com
2023-01-25x86/cpu, kvm: Add support for CPUID_80000021_EAXKim Phillips
Add support for CPUID leaf 80000021, EAX. The majority of the features will be used in the kernel and thus a separate leaf is appropriate. Include KVM's reverse_cpuid entry because features are used by VM guests, too. [ bp: Massage commit message. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230124163319.2277355-2-kim.phillips@amd.com
2023-01-25x86/Kconfig: Fix spellos & punctuationRandy Dunlap
Fix spelling (reported by codespell) & punctuation in arch/x86/ Kconfig. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230124181753.19309-1-rdunlap@infradead.org
2023-01-25x86/cpu: Use cpu_feature_enabled() when checking global pages supportBorislav Petkov (AMD)
X86_FEATURE_PGE determines whether the CPU has enabled global page translations support. Use the faster cpu_feature_enabled() check to shave off some more cycles when flushing all TLB entries, including the global ones. What this practically saves is: mov 0x82eb308(%rip),%rax # 0xffffffff8935bec8 <boot_cpu_data+40> test $0x20,%ah ... which test the bit. Not a lot, but TLB flushing is a timing-sensitive path, so anything to make it even faster. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230125075013.9292-1-bp@alien8.de
2023-01-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM64: - Pass the correct address to mte_clear_page_tags() on initialising a tagged page - Plug a race against a GICv4.1 doorbell interrupt while saving the vgic-v3 pending state. x86: - A command line parsing fix and a clang compilation fix for selftests - A fix for a longstanding VMX issue, that surprisingly was only found now to affect real world guests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Make reclaim_period_ms input always be positive KVM: x86/vmx: Do not skip segment attributes if unusable bit is set selftests: kvm: move declaration at the beginning of main() KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation KVM: arm64: Pass the actual page address to mte_clear_page_tags()
2023-01-24KVM: x86: Replace IS_ERR() with IS_ERR_VALUE()ye xingchen
Avoid type casts that are needed for IS_ERR() and use IS_ERR_VALUE() instead. Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Link: https://lore.kernel.org/r/202211161718436948912@zte.com.cn Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: VMX: Handle NMI VM-Exits in noinstr regionSean Christopherson
Move VMX's handling of NMI VM-Exits into vmx_vcpu_enter_exit() so that the NMI is handled prior to leaving the safety of noinstr. Handling the NMI after leaving noinstr exposes the kernel to potential ordering problems as an instrumentation-induced fault, e.g. #DB, #BP, #PF, etc. will unblock NMIs when IRETing back to the faulting instruction. Reported-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221213060912.654668-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ handlersSean Christopherson
Split the asm subroutines for handling NMIs versus IRQs that occur in the guest so that the NMI handler can be called from a noinstr section. As a bonus, the NMI path doesn't need an indirect branch. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221213060912.654668-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24x86/entry: KVM: Use dedicated VMX NMI entry for 32-bit kernels tooSean Christopherson
Use a dedicated entry for invoking the NMI handler from KVM VMX's VM-Exit path for 32-bit even though using a dedicated entry for 32-bit isn't strictly necessary. Exposing a single symbol will allow KVM to reference the entry point in assembly code without having to resort to more #ifdefs (or #defines). identry.h is intended to be included from asm files only once, and so simply including idtentry.h in KVM assembly isn't an option. Bypassing the ESP fixup and CR3 switching in the standard NMI entry code is safe as KVM always handles NMIs that occur in the guest on a kernel stack, with a kernel CR3. Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221213060912.654668-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: VMX: Always inline to_vmx() and to_kvm_vmx()Sean Christopherson
Tag to_vmx() and to_kvm_vmx() __always_inline as they both just reflect the passed in pointer (the embedded struct is the first field in the container), and drop the @vmx param from vmx_vcpu_enter_exit(), which likely existed purely to make noinstr validation happy. Amusingly, when the compiler decides to not inline the helpers, e.g. for KASAN builds, to_vmx() and to_kvm_vmx() may end up pointing at the same symbol, which generates very confusing objtool warnings. E.g. the use of to_vmx() in a future patch led to objtool complaining about to_kvm_vmx(), and only once all use of to_kvm_vmx() was commented out did to_vmx() pop up in the obj tool report. vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x160: call to to_kvm_vmx() leaves .noinstr.text section Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221213060912.654668-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: VMX: Always inline eVMCS read/write helpersSean Christopherson
Tag all evmcs_{read,write}() helpers __always_inline so that they can be freely used in noinstr sections, e.g. to get the VM-Exit reason in vcpu_vmx_enter_exit() (in a future patch). For consistency and to avoid more spot fixes in the future, e.g. see commit 010050a86393 ("x86/kvm: Always inline evmcs_write64()"), tag all accessors even though evmcs_read32() is the only anticipated use case in the near future. In practice, non-KASAN builds are all but guaranteed to inline the helpers anyways. vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x107: call to evmcs_read32() leaves .noinstr.text section Reported-by: kernel test robot <lkp@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221213060912.654668-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: VMX: Allow VM-Fail path of VMREAD helper to be instrumentedSean Christopherson
Allow instrumentation in the VM-Fail path of __vmcs_readl() so that the helper can be used in noinstr functions, e.g. to get the exit reason in vmx_vcpu_enter_exit() in order to handle NMI VM-Exits in the noinstr section. While allowing instrumentation isn't technically safe, KVM has much bigger problems if VMREAD fails in a noinstr section. Note, all other VMX instructions also allow instrumentation in their VM-Fail paths for similar reasons, VMREAD was simply omitted by commit 3ebccdf373c2 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text") because VMREAD wasn't used in a noinstr section at the time. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221213060912.654668-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86: Make vmx_get_exit_qual() and vmx_get_intr_info() noinstr-friendlySean Christopherson
Add an extra special noinstr-friendly helper to test+mark a "register" available and use it when caching vmcs.EXIT_QUALIFICATION and vmcs.VM_EXIT_INTR_INFO. Make the caching helpers __always_inline too so that they can be used in noinstr functions. A future fix will move VMX's handling of NMI exits into the noinstr vmx_vcpu_enter_exit() so that the NMI is processed before any kind of instrumentation can trigger a fault and thus IRET, i.e. so that KVM doesn't invoke the NMI handler with NMIs enabled. Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221213060912.654668-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>