summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2020-11-10x86/msr-index: sort AMD RAPL MSRs by addressVictor Ding
MSRs in the rest of this file are sorted by their addresses; fixing the two outliers. No functional changes. Signed-off-by: Victor Ding <victording@google.com> Acked-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10efi/x86: Free efi_pgd with free_pages()Arvind Sankar
Commit d9e9a6418065 ("x86/mm/pti: Allocate a separate user PGD") changed the PGD allocation to allocate PGD_ALLOCATION_ORDER pages, so in the error path it should be freed using free_pages() rather than free_page(). Commit 06ace26f4e6f ("x86/efi: Free efi_pgd with free_pages()") fixed one instance of this, but missed another. Move the freeing out-of-line to avoid code duplication and fix this bug. Fixes: d9e9a6418065 ("x86/mm/pti: Allocate a separate user PGD") Link: https://lore.kernel.org/r/20201110163919.1134431-1-nivedita@alum.mit.edu Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-11-10x86/ioapic: Correct the PCI/ISA trigger type selectionThomas Gleixner
PCI's default trigger type is level and ISA's is edge. The recent refactoring made it the other way round, which went unnoticed as it seems only to cause havoc on some AMD systems. Make the comment and code do the right thing again. Fixes: a27dca645d2c ("x86/io_apic: Cleanup trigger/polarity helpers") Reported-by: Tom Lendacky <thomas.lendacky@amd.com> Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Qian Cai <cai@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/87d00lgu13.fsf@nanos.tec.linutronix.de
2020-11-10perf/x86/intel/uncore: Fix Add BW copypastaArnd Bergmann
gcc -Wextra points out a duplicate initialization of one array member: arch/x86/events/intel/uncore_snb.c:478:37: warning: initialized field overwritten [-Woverride-init] 478 | [SNB_PCI_UNCORE_IMC_DATA_READS] = { SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE, The only sensible explanation is that a duplicate 'READS' was used instead of the correct 'WRITES', so change it back. Fixes: 24633d901ea4 ("perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201026215203.3893972-1-arnd@kernel.org
2020-11-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - fix compilation error when PMD and PUD are folded - fix regression in reads-as-zero behaviour of ID_AA64ZFR0_EL1 - add aarch64 get-reg-list test x86: - fix semantic conflict between two series merged for 5.10 - fix (and test) enforcement of paravirtual cpuid features selftests: - various cleanups to memory management selftests - new selftests testcase for performance of dirty logging" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits) KVM: selftests: allow two iterations of dirty_log_perf_test KVM: selftests: Introduce the dirty log perf test KVM: selftests: Make the number of vcpus global KVM: selftests: Make the per vcpu memory size global KVM: selftests: Drop pointless vm_create wrapper KVM: selftests: Add wrfract to common guest code KVM: selftests: Simplify demand_paging_test with timespec_diff_now KVM: selftests: Remove address rounding in guest code KVM: selftests: Factor code out of demand_paging_test KVM: selftests: Use a single binary for dirty/clear log test KVM: selftests: Always clear dirty bitmap after iteration KVM: selftests: Add blessed SVE registers to get-reg-list KVM: selftests: Add aarch64 get-reg-list test selftests: kvm: test enforcement of paravirtual cpuid features selftests: kvm: Add exception handling to selftests selftests: kvm: Clear uc so UCALL_NONE is being properly reported selftests: kvm: Fix the segment descriptor layout to match the actual layout KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrs kvm: x86: request masterclock update any time guest uses different msr kvm: x86: ensure pv_cpuid.features is initialized when enabling cap ...
2020-11-09perf/x86/intel: Make anythread filter support conditionalStephane Eranian
Starting with Arch Perfmon v5, the anythread filter on generic counters may be deprecated. The current kernel was exporting the any filter without checking. On Icelake, it means you could do cpu/event=0x3c,any/ even though the filter does not exist. This patch corrects the problem by relying on the CPUID 0xa leaf function to determine if anythread is supported or not as described in the Intel SDM Vol3b 18.2.5.1 AnyThread Deprecation section. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201028194247.3160610-1-eranian@google.com
2020-11-09perf/x86: Make dummy_iregs staticPeter Zijlstra
Having pt_regs on-stack is unfortunate, it's 168 bytes. Since it isn't actually used, make it a static variable. This both gets if off the stack and ensures it gets 0 initialized, just in case someone does look at it. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151955.324273677@infradead.org
2020-11-09perf/arch: Remove perf_sample_data::regs_user_copyPeter Zijlstra
struct perf_sample_data lives on-stack, we should be careful about it's size. Furthermore, the pt_regs copy in there is only because x86_64 is a trainwreck, solve it differently. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org
2020-11-09perf/x86: Reduce stack usage for x86_pmu::drain_pebs()Peter Zijlstra
intel_pmu_drain_pebs_*() is typically called from handle_pmi_common(), both have an on-stack struct perf_sample_data, which is *big*. Rewire things so that drain_pebs() can use the one handle_pmi_common() has. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151955.054099690@infradead.org
2020-11-09perf: Reduce stack usage of perf_output_begin()Peter Zijlstra
__perf_output_begin() has an on-stack struct perf_sample_data in the unlikely case it needs to generate a LOST record. However, every call to perf_output_begin() must already have a perf_sample_data on-stack. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org
2020-11-09x86/xen: don't unbind uninitialized lock_kicker_irqBrian Masney
When booting a hyperthreaded system with the kernel parameter 'mitigations=auto,nosmt', the following warning occurs: WARNING: CPU: 0 PID: 1 at drivers/xen/events/events_base.c:1112 unbind_from_irqhandler+0x4e/0x60 ... Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006 ... Call Trace: xen_uninit_lock_cpu+0x28/0x62 xen_hvm_cpu_die+0x21/0x30 takedown_cpu+0x9c/0xe0 ? trace_suspend_resume+0x60/0x60 cpuhp_invoke_callback+0x9a/0x530 _cpu_up+0x11a/0x130 cpu_up+0x7e/0xc0 bringup_nonboot_cpus+0x48/0x50 smp_init+0x26/0x79 kernel_init_freeable+0xea/0x229 ? rest_init+0xaa/0xaa kernel_init+0xa/0x106 ret_from_fork+0x35/0x40 The secondary CPUs are not activated with the nosmt mitigations and only the primary thread on each CPU core is used. In this situation, xen_hvm_smp_prepare_cpus(), and more importantly xen_init_lock_cpu(), is not called, so the lock_kicker_irq is not initialized for the secondary CPUs. Let's fix this by exiting early in xen_uninit_lock_cpu() if the irq is not set to avoid the warning from above for each secondary CPU. Signed-off-by: Brian Masney <bmasney@redhat.com> Link: https://lore.kernel.org/r/20201107011119.631442-1-bmasney@redhat.com Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2020-11-08Merge tag 'x86-urgent-2020-11-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 fixes: - Use SYM_FUNC_START_WEAK in the mem* ASM functions instead of a combination of .weak and SYM_FUNC_START_LOCAL which makes LLVMs integrated assembler upset - Correct the mitigation selection logic which prevented the related prctl to work correctly - Make the UV5 hubless system work correctly by fixing up the malformed table entries and adding the missing ones" * tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/uv: Recognize UV5 hubless system identifier x86/platform/uv: Remove spaces from OEM IDs x86/platform/uv: Fix missing OEM_TABLE_ID x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S
2020-11-08KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrsPankaj Gupta
Windows2016 guest tries to enable LBR by setting the corresponding bits in MSR_IA32_DEBUGCTLMSR. KVM does not emulate MSR_IA32_DEBUGCTLMSR and spams the host kernel logs with error messages like: kvm [...]: vcpu1, guest rIP: 0xfffff800a8b687d3 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop" This patch fixes this by enabling error logging only with 'report_ignored_msrs=1'. Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Message-Id: <20201105153932.24316-1-pankaj.gupta.linux@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08kvm: x86: request masterclock update any time guest uses different msrOliver Upton
Commit 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21) subtly changed the behavior of guest writes to MSR_KVM_SYSTEM_TIME(_NEW). Restore the previous behavior; update the masterclock any time the guest uses a different msr than before. Fixes: 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21) Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20201027231044.655110-6-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08kvm: x86: ensure pv_cpuid.features is initialized when enabling capOliver Upton
Make the paravirtual cpuid enforcement mechanism idempotent to ioctl() ordering by updating pv_cpuid.features whenever userspace requests the capability. Extract this update out of kvm_update_cpuid_runtime() into a new helper function and move its other call site into kvm_vcpu_after_set_cpuid() where it more likely belongs. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20201027231044.655110-5-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08kvm: x86: reads of restricted pv msrs should also result in #GPOliver Upton
commit 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") only protects against disallowed guest writes to KVM paravirtual msrs, leaving msr reads unchecked. Fix this by enforcing KVM_CPUID_FEATURES for msr reads as well. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20201027231044.655110-4-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08KVM: x86: use positive error values for msr emulation that causes #GPMaxim Levitsky
Recent introduction of the userspace msr filtering added code that uses negative error codes for cases that result in either #GP delivery to the guest, or handled by the userspace msr filtering. This breaks an assumption that a negative error code returned from the msr emulation code is a semi-fatal error which should be returned to userspace via KVM_RUN ioctl and usually kill the guest. Fix this by reusing the already existing KVM_MSR_RET_INVALID error code, and by adding a new KVM_MSR_RET_FILTERED error code for the userspace filtered msrs. Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace") Reported-by: Qian Cai <cai@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201101115523.115780-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08KVM: x86/mmu: fix counting of rmap entries in pte_list_addLi RongQing
Fix an off-by-one style bug in pte_list_add() where it failed to account the last full set of SPTEs, i.e. when desc->sptes is full and desc->more is NULL. Merge the two "PTE_LIST_EXT-1" checks as part of the fix to avoid an extra comparison. Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <1601196297-24104-1-git-send-email-lirongqing@baidu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-07Merge branch 'linus' into perf/kprobesIngo Molnar
Conflicts: include/asm-generic/atomic-instrumented.h kernel/kprobes.c Use the upstream atomic-instrumented.h checksum, and pick the kprobes version of kernel/kprobes.c, which effectively reverts this upstream workaround: 645f224e7ba2: ("kprobes: Tell lockdep about kprobe nesting") Since the new code *should* be fine without nesting. Knock on wood ... Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-11-07Merge branch 'linus' into perf/kprobesIngo Molnar
Merge recent kprobes updates into perf/kprobes that came from -mm. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-11-07x86/platform/uv: Recognize UV5 hubless system identifierMike Travis
Testing shows a problem in that UV5 hubless systems were not being recognized. Add them to the list of OEM IDs checked. Fixes: 6c7794423a998 ("Add UV5 direct references") Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201105222741.157029-4-mike.travis@hpe.com
2020-11-07x86/platform/uv: Remove spaces from OEM IDsMike Travis
Testing shows that trailing spaces caused problems with the OEM_ID and the OEM_TABLE_ID. One being that the OEM_ID would not string compare correctly. Another the OEM_ID and OEM_TABLE_ID would be concatenated in the printout. Remove any trailing spaces. Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab") Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201105222741.157029-3-mike.travis@hpe.com
2020-11-07x86/platform/uv: Fix missing OEM_TABLE_IDMike Travis
Testing shows a problem in that the OEM_TABLE_ID was missing for hubless systems. This is used to determine the APIC type (legacy or extended). Add the OEM_TABLE_ID to the early hubless processing. Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab") Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201105222741.157029-2-mike.travis@hpe.com
2020-11-06x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUsPaul E. McKenney
Currently, accessing /proc/cpuinfo sends IPIs to idle CPUs in order to learn their clock frequency. Which is a bit strange, given that waking them from idle likely significantly changes their clock frequency. This commit therefore avoids sending /proc/cpuinfo-induced IPIs to idle CPUs. [ paulmck: Also check for idle in arch_freq_prepare_all(). ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org>
2020-11-06x86/cpu: Avoid cpuinfo-induced IPI pileupsPaul E. McKenney
The aperfmperf_snapshot_cpu() function is invoked upon access to /proc/cpuinfo, and it does do an early exit if the specified CPU has recently done a snapshot. Unfortunately, the indication that a snapshot has been completed is set in an IPI handler, and the execution of this handler can be delayed by any number of unfortunate events. This means that a system that starts a number of applications, each of which parses /proc/cpuinfo, can suffer from an smp_call_function_single() storm, especially given that each access to /proc/cpuinfo invokes smp_call_function_single() for all CPUs. Please note that this is not theoretical speculation. Note also that one CPU's pending IPI serves all requests, so there is no point in ever having more than one IPI pending to a given CPU. This commit therefore suppresses duplicate IPIs to a given CPU via a new ->scfpending field in the aperfmperf_sample structure. This field is set to the value one if an IPI is pending to the corresponding CPU and to zero otherwise. The aperfmperf_snapshot_cpu() function uses atomic_xchg() to set this field to the value one and sample the old value. If this function's "wait" parameter is zero, smp_call_function_single() is called only if the old value of the ->scfpending field was zero. The IPI handler uses atomic_set_release() to set this new field to zero just before returning, so that the prior stores into the aperfmperf_sample structure are seen by future requests that get to the atomic_xchg(). Future requests that pass the elapsed-time check are ordered by the fact that on x86 loads act as acquire loads, just as was the case prior to this change. The return value is based off of the age of the prior snapshot, just as before. Reported-by: Dave Jones <davej@codemonkey.org.uk> [ paulmck: Allow /proc/cpuinfo to take advantage of arch_freq_get_on_cpu(). ] [ paulmck: Add comment on memory barrier. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org>
2020-11-06io-mapping: Cleanup atomic iomapThomas Gleixner
Switch the atomic iomap implementation over to kmap_local and stick the preempt/pagefault mechanics into the generic code similar to the kmap_atomic variants. Rename the x86 map function in preparation for a non-atomic variant. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20201103095858.625310005@linutronix.de
2020-11-06x86/mm/highmem: Use generic kmap atomic implementationThomas Gleixner
Convert X86 to the generic kmap atomic implementation and make the iomap_atomic() naming convention consistent while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201103095857.375127260@linutronix.de
2020-11-06x86/mce: Correct the detection of invalid notifier prioritiesZhen Lei
Commit c9c6d216ed28 ("x86/mce: Rename "first" function as "early"") changed the enumeration of MCE notifier priorities. Correct the check for notifier priorities to cover the new range. [ bp: Rewrite commit message, remove superfluous brackets in conditional. ] Fixes: c9c6d216ed28 ("x86/mce: Rename "first" function as "early"") Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201106141216.2062-2-thunder.leizhen@huawei.com
2020-11-06ftrace: Add recording of functions that caused recursionSteven Rostedt (VMware)
This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file "recursed_functions" all the functions that caused recursion while a callback to the function tracer was running. Link: https://lkml.kernel.org/r/20201106023548.102375687@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Guo Ren <guoren@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-csky@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: live-patching@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06kprobes/ftrace: Add recursion protection to the ftrace callbackSteven Rostedt (VMware)
If a ftrace callback does not supply its own recursion protection and does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will make a helper trampoline to do so before calling the callback instead of just calling the callback directly. The default for ftrace_ops is going to change. It will expect that handlers provide their own recursion protection, unless its ftrace_ops states otherwise. Link: https://lkml.kernel.org/r/20201028115613.140212174@goodmis.org Link: https://lkml.kernel.org/r/20201106023546.944907560@goodmis.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Ren <guoren@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-csky@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06x86/mce: Assign boolean values to a bool variableKaixu Xia
Fix the following coccinelle warnings: ./arch/x86/kernel/cpu/mce/core.c:1765:3-20: WARNING: Assignment of 0/1 to bool variable ./arch/x86/kernel/cpu/mce/core.c:1584:2-9: WARNING: Assignment of 0/1 to bool variable [ bp: Massage commit message. ] Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1604654363-1463-1-git-send-email-kaixuxia@tencent.com
2020-11-06ima: generalize x86/EFI arch glue for other EFI architecturesChester Lin
Move the x86 IMA arch code into security/integrity/ima/ima_efi.c, so that we will be able to wire it up for arm64 in a future patch. Co-developed-by: Chester Lin <clin@suse.com> Signed-off-by: Chester Lin <clin@suse.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-11-05x86/speculation: Allow IBPB to be conditionally enabled on CPUs with ↵Anand K Mistry
always-on STIBP On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON, STIBP is set to on and spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED At the same time, IBPB can be set to conditional. However, this leads to the case where it's impossible to turn on IBPB for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED condition leads to a return before the task flag is set. Similarly, ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to conditional. More generally, the following cases are possible: 1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb 2. STIBP = on && IBPB = conditional for AMD CPUs with X86_FEATURE_AMD_STIBP_ALWAYS_ON The first case functions correctly today, but only because spectre_v2_user_ibpb isn't updated to reflect the IBPB mode. At a high level, this change does one thing. If either STIBP or IBPB is set to conditional, allow the prctl to change the task flag. Also, reflect that capability when querying the state. This isn't perfect since it doesn't take into account if only STIBP or IBPB is unconditionally on. But it allows the conditional feature to work as expected, without affecting the unconditional one. [ bp: Massage commit message and comment; space out statements for better readability. ] Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Signed-off-by: Anand K Mistry <amistry@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid
2020-11-05Merge tag 'hyperv-fixes-signed' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - clarify a comment (Michael Kelley) - change a pr_warn() to pr_info() (Olaf Hering) * tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Clarify comment on x2apic mode hv_balloon: disable warning when floor reached
2020-11-04efi: generalize efi_get_securebootChester Lin
Generalize the efi_get_secureboot() function so not only efistub but also other subsystems can use it. Note that the MokSbState handling is not factored out: the variable is boot time only, and so it cannot be parameterized as easily. Also, the IMA code will switch to this version in a future patch, and it does not incorporate the MokSbState exception in the first place. Note that the new efi_get_secureboot_mode() helper treats any failures to read SetupMode as setup mode being disabled. Co-developed-by: Chester Lin <clin@suse.com> Signed-off-by: Chester Lin <clin@suse.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-11-04x86/entry: Move nmi entry/exit into common codeThomas Gleixner
Lockdep state handling on NMI enter and exit is nothing specific to X86. It's not any different on other architectures. Also the extra state type is not necessary, irqentry_state_t can carry the necessary information as well. Move it to common code and extend irqentry_state_t to carry lockdep state. [ Ira: Make exit_rcu and lockdep a union as they are mutually exclusive between the IRQ and NMI exceptions, and add kernel documentation for struct irqentry_state_t ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201102205320.1458656-7-ira.weiny@intel.com
2020-11-04Merge branch 'core/urgent' into core/entryThomas Gleixner
Pick up the entry fix before further modifications.
2020-11-04x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.SFangrui Song
Commit 393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions") added .weak directives to arch/x86/lib/mem*_64.S instead of changing the existing ENTRY macros to WEAK. This can lead to the assembly snippet .weak memcpy ... .globl memcpy which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy with LLVM's integrated assembler before LLVM 12. LLVM 12 (since https://reviews.llvm.org/D90108) will error on such an overridden symbol binding. Commit ef1e03152cb0 ("x86/asm: Make some functions local") changed ENTRY in arch/x86/lib/memcpy_64.S to SYM_FUNC_START_LOCAL, which was ineffective due to the preceding .weak directive. Use the appropriate SYM_FUNC_START_WEAK instead. Fixes: 393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions") Fixes: ef1e03152cb0 ("x86/asm: Make some functions local") Reported-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201103012358.168682-1-maskray@google.com
2020-11-04x86/ioapic: Use I/O-APIC ID for finding irqdomain, not indexDavid Woodhouse
In commit b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain") the I/O-APIC code was changed to find its parent irqdomain using irq_find_matching_fwspec(), but the key used for the lookup was wrong. It shouldn't use 'ioapic' which is the index into its own ioapics[] array. It should use the actual arbitration ID of the I/O-APIC in question, which is mpc_ioapic_id(ioapic). Fixes: b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain") Reported-by: lkp <oliver.sang@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/57adf2c305cd0c5e9d860b2f3007a7e676fd0f9f.camel@infradead.org
2020-11-04x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports itDexuan Cui
When a Linux VM runs on Hyper-V, if the VM has CPUs with >255 APIC IDs, the CPUs can't be the destination of IOAPIC interrupts, because the IOAPIC RTE's Dest Field has only 8 bits. Currently the hackery driver drivers/iommu/hyperv-iommu.c is used to ensure IOAPIC interrupts are only routed to CPUs that don't have >255 APIC IDs. However, there is an issue with kdump, because the kdump kernel can run on any CPU, and hence IOAPIC interrupts can't work if the kdump kernel run on a CPU with a >255 APIC ID. The kdump issue can be fixed by the Extended Dest ID, which is introduced recently by David Woodhouse (for IOAPIC, see the field virt_destid_8_14 in struct IO_APIC_route_entry). Of course, the Extended Dest ID needs the support of the underlying hypervisor. The latest Hyper-V has added the support recently: with this commit, on such a Hyper-V host, Linux VM does not use hyperv-iommu.c because hyperv_prepare_irq_remapping() returns -ENODEV; instead, Linux kernel's generic support of Extended Dest ID from David is used, meaning that Linux VM is able to support up to 32K CPUs, and IOAPIC interrupts can be routed to all the CPUs. On an old Hyper-V host that doesn't support the Extended Dest ID, nothing changes with this commit: Linux VM is still able to bring up the CPUs with > 255 APIC IDs with the help of hyperv-iommu.c, but IOAPIC interrupts still can not go to such CPUs, and the kdump kernel still can not work properly on such CPUs. [ tglx: Updated comment as suggested by David ] Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20201103011136.59108-1-decui@microsoft.com
2020-11-03Merge tag 'x86_seves_for_v5.10_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV-ES fixes from Borislav Petkov: "A couple of changes to the SEV-ES code to perform more stringent hypervisor checks before enabling encryption (Joerg Roedel)" * tag 'x86_seves_for_v5.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev-es: Do not support MMIO to/from encrypted memory x86/head/64: Check SEV encryption before switching to kernel page-table x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handler x86/boot/compressed/64: Introduce sev_status
2020-11-02x86/mtrr: Fix a kernel-doc markupMauro Carvalho Chehab
Kernel-doc markup should use this format: identifier - description Fix it. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/2217cd4ae9e561da2825485eb97de77c65741489.1603469755.git.mchehab+huawei@kernel.org
2020-11-02x86/mce: Enable additional error logging on certain Intel CPUsTony Luck
The Xeon versions of Sandy Bridge, Ivy Bridge and Haswell support an optional additional error logging mode which is enabled by an MSR. Previously, this mode was enabled from the mcelog(8) tool via /dev/cpu, but userspace should not be poking at MSRs. So move the enabling into the kernel. [ bp: Correct the explanation why this is done. ] Suggested-by: Boris Petkov <bp@alien8.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201030190807.GA13884@agluck-desk2.amr.corp.intel.com
2020-11-01Merge tag 'x86-urgent-2020-11-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Three fixes all related to #DB: - Handle the BTF bit correctly so it doesn't get lost due to a kernel #DB - Only clear and set the virtual DR6 value used by ptrace on user space triggered #DB. A kernel #DB must leave it alone to ensure data consistency for ptrace. - Make the bitmasking of the virtual DR6 storage correct so it does not lose DR_STEP" * tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/debug: Fix DR_STEP vs ptrace_get_debugreg(6) x86/debug: Only clear/set ->virtual_dr6 for userspace #DB x86/debug: Fix BTF handling
2020-11-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - selftest fix - force PTE mapping on device pages provided via VFIO - fix detection of cacheable mapping at S2 - fallback to PMD/PTE mappings for composite huge pages - fix accounting of Stage-2 PGD allocation - fix AArch32 handling of some of the debug registers - simplify host HYP entry - fix stray pointer conversion on nVHE TLB invalidation - fix initialization of the nVHE code - simplify handling of capabilities exposed to HYP - nuke VCPUs caught using a forbidden AArch32 EL0 x86: - new nested virtualization selftest - miscellaneous fixes - make W=1 fixes - reserve new CPUID bit in the KVM leaves" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: vmx: remove unused variable KVM: selftests: Don't require THP to run tests KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again KVM: selftests: test behavior of unmapped L2 APIC-access address KVM: x86: Fix NULL dereference at kvm_msr_ignored_check() KVM: x86: replace static const variables with macros KVM: arm64: Handle Asymmetric AArch32 systems arm64: cpufeature: upgrade hyp caps to final arm64: cpufeature: reorder cpus_have_{const, final}_cap() KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code() KVM: arm64: Force PTE mapping on fault resulting in a device mapping KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes KVM: arm64: Fix masks in stage2_pte_cacheable() KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID
2020-10-31KVM: vmx: remove unused variablePaolo Bonzini
Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-31KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work againVitaly Kuznetsov
It was noticed that evmcs_sanitize_exec_ctrls() is not being executed nowadays despite the code checking 'enable_evmcs' static key looking correct. Turns out, static key magic doesn't work in '__init' section (and it is unclear when things changed) but setup_vmcs_config() is called only once per CPU so we don't really need it to. Switch to checking 'enlightened_vmcs' instead, it is supposed to be in sync with 'enable_evmcs'. Opportunistically make evmcs_sanitize_exec_ctrls '__init' and drop unneeded extra newline from it. Reported-by: Yang Weijiang <weijiang.yang@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201014143346.2430936-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-30timekeeping: default GENERIC_CLOCKEVENTS to enabledArnd Bergmann
Almost all machines use GENERIC_CLOCKEVENTS, so it feels wrong to require each one to select that symbol manually. Instead, enable it whenever CONFIG_LEGACY_TIMER_TICK is disabled as a simplification. It should be possible to select both GENERIC_CLOCKEVENTS and LEGACY_TIMER_TICK from an architecture now and decide at runtime between the two. For the clockevents arch-support.txt file, this means that additional architectures are marked as TODO when they have at least one machine that still uses LEGACY_TIMER_TICK, rather than being marked 'ok' when at least one machine has been converted. This means that both m68k and arm (for riscpc) revert to TODO. At this point, we could just always enable CONFIG_GENERIC_CLOCKEVENTS rather than leaving it off when not needed. I built an m68k defconfig kernel (using gcc-10.1.0) and found that this would add around 5.5KB in kernel image size: text data bss dec hex filename 3861936 1092236 196656 5150828 4e986c obj-m68k/vmlinux-no-clockevent 3866201 1093832 196184 5156217 4ead79 obj-m68k/vmlinux-clockevent On Arm (MACH_RPC), that difference appears to be twice as large, around 11KB on top of an 6MB vmlinux. Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-10-30KVM: x86: Fix NULL dereference at kvm_msr_ignored_check()Takashi Iwai
The newly introduced kvm_msr_ignored_check() tries to print error or debug messages via vcpu_*() macros, but those may cause Oops when NULL vcpu is passed for KVM_GET_MSRS ioctl. Fix it by replacing the print calls with kvm_*() macros. (Note that this will leave vcpu argument completely unused in the function, but I didn't touch it to make the fix as small as possible. A clean up may be applied later.) Fixes: 12bc2132b15e ("KVM: X86: Do the same ignore_msrs check for feature msrs") BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1178280 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Message-Id: <20201030151414.20165-1-tiwai@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-30KVM: x86: replace static const variables with macrosPaolo Bonzini
Even though the compiler is able to replace static const variables with their value, it will warn about them being unused when Linux is built with W=1. Use good old macros instead, this is not C++. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>