summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2018-03-31x86/cpu/tme: Fix spelling: "configuation" -> "configuration"Colin Ian King
Trivial fix to spelling mistake in the pr_err_once() error message text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20180313154709.1015-1-colin.king@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-31x86/build: Don't pass in -D__KERNEL__ multiple timesCao jin
Some .<target>.cmd files under arch/x86 are showing two instances of -D__KERNEL__, like arch/x86/boot/ and arch/x86/realmode/rm/. __KERNEL__ is already defined in KBUILD_CPPFLAGS in the top Makefile, so it can be dropped safely. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kbuild@vger.kernel.org Link: http://lkml.kernel.org/r/20180316084944.3997-1-caoj.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-31Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "PPC: - Fix a bug causing occasional machine check exceptions on POWER8 hosts (introduced in 4.16-rc1) x86: - Fix a guest crashing regression with nested VMX and restricted guest (introduced in 4.16-rc1) - Fix dependency check for pv tlb flush (the wrong dependency that effectively disabled the feature was added in 4.16-rc4, the original feature in 4.16-rc1, so it got decent testing)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Fix pv tlb flush dependencies KVM: nVMX: sync vmcs02 segment regs prior to vmx_set_cr0 KVM: PPC: Book3S HV: Fix duplication of host SLB entries
2018-03-29um: Compile with modern headersJason A. Donenfeld
Recent libcs have gotten a bit more strict, so we actually need to include the right headers and use the right types. This enables UML to compile again. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at>
2018-03-29perf/x86/pt, coresight: Clean up address filter structureAlexander Shishkin
This is a cosmetic patch that deals with the address filter structure's ambiguous fields 'filter' and 'range'. The former stands to mean that the filter's *action* should be to filter the traces to its address range if it's set or stop tracing if it's unset. This is confusing and hard on the eyes, so this patch replaces it with 'action' enum. The 'range' field is completely redundant (meaning that the filter is an address range as opposed to a single address trigger), as we can use zero size to mean the same thing. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20180329120648.11902-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-29Merge branch 'perf/urgent' into perf/coreIngo Molnar
Conflicts: kernel/events/hw_breakpoint.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-28KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with ↵Liran Alon
nested_run_pending When vCPU runs L2 and there is a pending event that requires to exit from L2 to L1 and nested_run_pending=1, vcpu_enter_guest() will request an immediate-exit from L2 (See req_immediate_exit). Since now handling of req_immediate_exit also makes sure to set KVM_REQ_EVENT, there is no need to also set it on vmx_vcpu_run() when nested_run_pending=1. This optimizes cases where VMRESUME was executed by L1 to enter L2 and there is no pending events that require exit from L2 to L1. Previously, this would have set KVM_REQ_EVENT unnecessarly. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event ↵Liran Alon
pending In case L2 VMExit to L0 during event-delivery, VMCS02 is filled with IDT-vectoring-info which vmx_complete_interrupts() makes sure to reinject before next resume of L2. While handling the VMExit in L0, an IPI could be sent by another L1 vCPU to the L1 vCPU which currently runs L2 and exited to L0. When L0 will reach vcpu_enter_guest() and call inject_pending_event(), it will note that a previous event was re-injected to L2 (by IDT-vectoring-info) and therefore won't check if there are pending L1 events which require exit from L2 to L1. Thus, L0 enters L2 without immediate VMExit even though there are pending L1 events! This commit fixes the issue by making sure to check for L1 pending events even if a previous event was reinjected to L2 and bailing out from inject_pending_event() before evaluating a new pending event in case an event was already reinjected. The bug was observed by the following setup: * L0 is a 64CPU machine which runs KVM. * L1 is a 16CPU machine which runs KVM. * L0 & L1 runs with APICv disabled. (Also reproduced with APICv enabled but easier to analyze below info with APICv disabled) * L1 runs a 16CPU L2 Windows Server 2012 R2 guest. During L2 boot, L1 hangs completely and analyzing the hang reveals that one L1 vCPU is holding KVM's mmu_lock and is waiting forever on an IPI that he has sent for another L1 vCPU. And all other L1 vCPUs are currently attempting to grab mmu_lock. Therefore, all L1 vCPUs are stuck forever (as L1 runs with kernel-preemption disabled). Observing /sys/kernel/debug/tracing/trace_pipe reveals the following series of events: (1) qemu-system-x86-19066 [030] kvm_nested_vmexit: rip: 0xfffff802c5dca82f reason: EPT_VIOLATION ext_inf1: 0x0000000000000182 ext_inf2: 0x00000000800000d2 ext_int: 0x00000000 ext_int_err: 0x00000000 (2) qemu-system-x86-19054 [028] kvm_apic_accept_irq: apicid f vec 252 (Fixed|edge) (3) qemu-system-x86-19066 [030] kvm_inj_virq: irq 210 (4) qemu-system-x86-19066 [030] kvm_entry: vcpu 15 (5) qemu-system-x86-19066 [030] kvm_exit: reason EPT_VIOLATION rip 0xffffe00069202690 info 83 0 (6) qemu-system-x86-19066 [030] kvm_nested_vmexit: rip: 0xffffe00069202690 reason: EPT_VIOLATION ext_inf1: 0x0000000000000083 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000 (7) qemu-system-x86-19066 [030] kvm_nested_vmexit_inject: reason: EPT_VIOLATION ext_inf1: 0x0000000000000083 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000 (8) qemu-system-x86-19066 [030] kvm_entry: vcpu 15 Which can be analyzed as follows: (1) L2 VMExit to L0 on EPT_VIOLATION during delivery of vector 0xd2. Therefore, vmx_complete_interrupts() will set KVM_REQ_EVENT and reinject a pending-interrupt of 0xd2. (2) L1 sends an IPI of vector 0xfc (CALL_FUNCTION_VECTOR) to destination vCPU 15. This will set relevant bit in LAPIC's IRR and set KVM_REQ_EVENT. (3) L0 reach vcpu_enter_guest() which calls inject_pending_event() which notes that interrupt 0xd2 was reinjected and therefore calls vmx_inject_irq() and returns. Without checking for pending L1 events! Note that at this point, KVM_REQ_EVENT was cleared by vcpu_enter_guest() before calling inject_pending_event(). (4) L0 resumes L2 without immediate-exit even though there is a pending L1 event (The IPI pending in LAPIC's IRR). We have already reached the buggy scenario but events could be furthered analyzed: (5+6) L2 VMExit to L0 on EPT_VIOLATION. This time not during event-delivery. (7) L0 decides to forward the VMExit to L1 for further handling. (8) L0 resumes into L1. Note that because KVM_REQ_EVENT is cleared, the LAPIC's IRR is not examined and therefore the IPI is still not delivered into L1! Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: x86: Fix misleading comments on handling pending exceptionsLiran Alon
The reason that exception.pending should block re-injection of NMI/interrupt is not described correctly in comment in code. Instead, it describes why a pending exception should be injected before a pending NMI/interrupt. Therefore, move currently present comment to code-block evaluating a new pending event which explains why exception.pending is evaluated first. In addition, create a new comment describing that exception.pending blocks re-injection of NMI/interrupt because the exception was queued by handling vmexit which was due to NMI/interrupt delivery. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@orcle.com> [Used a comment from Sean J <sean.j.christopherson@intel.com>. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: x86: Rename interrupt.pending to interrupt.injectedLiran Alon
For exceptions & NMIs events, KVM code use the following coding convention: *) "pending" represents an event that should be injected to guest at some point but it's side-effects have not yet occurred. *) "injected" represents an event that it's side-effects have already occurred. However, interrupts don't conform to this coding convention. All current code flows mark interrupt.pending when it's side-effects have already taken place (For example, bit moved from LAPIC IRR to ISR). Therefore, it makes sense to just rename interrupt.pending to interrupt.injected. This change follows logic of previous commit 664f8e26b00c ("KVM: X86: Fix loss of exception which has not yet been injected") which changed exception to follow this coding convention as well. It is important to note that in case !lapic_in_kernel(vcpu), interrupt.pending usage was and still incorrect. In this case, interrrupt.pending can only be set using one of the following ioctls: KVM_INTERRUPT, KVM_SET_VCPU_EVENTS and KVM_SET_SREGS. Looking at how QEMU uses these ioctls, one can see that QEMU uses them either to re-set an "interrupt.pending" state it has received from KVM (via KVM_GET_VCPU_EVENTS interrupt.pending or via KVM_GET_SREGS interrupt_bitmap) or by dispatching a new interrupt from QEMU's emulated LAPIC which reset bit in IRR and set bit in ISR before sending ioctl to KVM. So it seems that indeed "interrupt.pending" in this case is also suppose to represent "interrupt.injected". However, kvm_cpu_has_interrupt() & kvm_cpu_has_injectable_intr() is misusing (now named) interrupt.injected in order to return if there is a pending interrupt. This leads to nVMX/nSVM not be able to distinguish if it should exit from L2 to L1 on EXTERNAL_INTERRUPT on pending interrupt or should re-inject an injected interrupt. Therefore, add a FIXME at these functions for handling this issue. This patch introduce no semantics change. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interruptLiran Alon
kvm_inject_realmode_interrupt() is called from one of the injection functions which writes event-injection to VMCS: vmx_queue_exception(), vmx_inject_irq() and vmx_inject_nmi(). All these functions are called just to cause an event-injection to guest. They are not responsible of manipulating the event-pending flag. The only purpose of kvm_inject_realmode_interrupt() should be to emulate real-mode interrupt-injection. This was also incorrect when called from vmx_queue_exception(). Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28x86/kvm: use Enlightened VMCS when running on Hyper-VVitaly Kuznetsov
Enlightened VMCS is just a structure in memory, the main benefit besides avoiding somewhat slower VMREAD/VMWRITE is using clean field mask: we tell the underlying hypervisor which fields were modified since VMEXIT so there's no need to inspect them all. Tight CPUID loop test shows significant speedup: Before: 18890 cycles After: 8304 cycles Static key is being used to avoid performance penalty for non-Hyper-V deployments. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28x86/hyper-v: detect nested featuresVitaly Kuznetsov
TLFS 5.0 says: "Support for an enlightened VMCS interface is reported with CPUID leaf 0x40000004. If an enlightened VMCS interface is supported, additional nested enlightenments may be discovered by reading the CPUID leaf 0x4000000A (see 2.4.11)." Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28x86/hyper-v: define struct hv_enlightened_vmcs and clean field bitsVitaly Kuznetsov
The definitions are according to the Hyper-V TLFS v5.0. KVM on Hyper-V will use these. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28x86/hyper-v: allocate and use Virtual Processor Assist PagesVitaly Kuznetsov
Virtual Processor Assist Pages usage allows us to do optimized EOI processing for APIC, enable Enlightened VMCS support in KVM and more. struct hv_vp_assist_page is defined according to the Hyper-V TLFS v5.0b. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGELadi Prosek
The assist page has been used only for the paravirtual EOI so far, hence the "APIC" in the MSR name. Renaming to match the Hyper-V TLFS where it's called "Virtual VP Assist MSR". Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28x86/hyper-v: move definitions from TLFS to hyperv-tlfs.hVitaly Kuznetsov
mshyperv.h now only contains fucntions/variables we define in kernel, all definitions from TLFS should go to hyperv-tlfs.h. 'enum hv_cpuid_function' is removed as we already have this info in hyperv-tlfs.h, code in mshyperv.c is adjusted accordingly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28x86/hyper-v: move hyperv.h out of uapiVitaly Kuznetsov
hyperv.h is not part of uapi, there are no (known) users outside of kernel. We are making changes to this file to match current Hyper-V Hypervisor Top-Level Functional Specification (TLFS, see: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs) and we don't want to maintain backwards compatibility. Move the file renaming to hyperv-tlfs.h to avoid confusing it with mshyperv.h. In future, all definitions from TLFS should go to it and all kernel objects should go to mshyperv.h or include/linux/hyperv.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: X86: Fix setup the virt_spin_lock_key before static key get initializedWanpeng Li
static_key_disable_cpuslocked(): static key 'virt_spin_lock_key+0x0/0x20' used before call to jump_label_init() WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:161 static_key_disable_cpuslocked+0x61/0x80 RIP: 0010:static_key_disable_cpuslocked+0x61/0x80 Call Trace: static_key_disable+0x16/0x20 start_kernel+0x192/0x4b3 secondary_startup_64+0xa5/0xb0 Qspinlock will be choosed when dedicated pCPUs are available, however, the static virt_spin_lock_key is set in kvm_spinlock_init() before jump_label_init() has been called, which will result in a WARN(). This patch fixes it by delaying the virt_spin_lock_key setup to .smp_prepare_cpus(). Reported-by: Davidlohr Bueso <dbueso@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Fixes: b2798ba0b876 ("KVM: X86: Choose qspinlock when dedicated physical CPUs are available") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: SVM: Implement pause loop exit logic in SVMBabu Moger
Bring the PLE(pause loop exit) logic to AMD svm driver. While testing, we found this helping in situations where numerous pauses are generated. Without these patches we could see continuos VMEXITS due to pause interceptions. Tested it on AMD EPYC server with boot parameter idle=poll on a VM with 32 vcpus to simulate extensive pause behaviour. Here are VMEXITS in 10 seconds interval. Pauses 810199 504 Total 882184 325415 Signed-off-by: Babu Moger <babu.moger@amd.com> [Prevented the window from dropping below the initial value. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: SVM: Add pause filter thresholdBabu Moger
This patch adds the support for pause filtering threshold. This feature support is indicated by CPUID Fn8000_000A_EDX. See AMD APM Vol 2 Section 15.14.4 Pause Intercept Filtering for more details. In this mode, a 16-bit pause filter threshold field is added in VMCB. The threshold value is a cycle count that is used to reset the pause counter. As with simple pause filtering, VMRUN loads the pause count value from VMCB into an internal counter. Then, on each pause instruction the hardware checks the elapsed number of cycles since the most recent pause instruction against the pause Filter Threshold. If the elapsed cycle count is greater than the pause filter threshold, then the internal pause count is reloaded from VMCB and execution continues. If the elapsed cycle count is less than the pause filter threshold, then the internal pause count is decremented. If the count value is less than zero and pause intercept is enabled, a #VMEXIT is triggered. If advanced pause filtering is supported and pause filter threshold field is set to zero, the filter will operate in the simpler, count only mode. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: VMX: Bring the common code to header fileBabu Moger
This patch brings some of the code from vmx to x86.h header file. Now, we can share this code between vmx and svm. Modified couple functions to make it common. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: VMX: Remove ple_window_actual_maxBabu Moger
Get rid of ple_window_actual_max, because its benefits are really minuscule and the logic is complicated. The overflows(and underflow) are controlled in __ple_window_grow and _ple_window_shrink respectively. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Babu Moger <babu.moger@amd.com> [Fixed potential wraparound and change the max to UINT_MAX. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: VMX: Fix the module parameters for vmxBabu Moger
The vmx module parameters are supposed to be unsigned variants. Also fixed the checkpatch errors like the one below. WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. +module_param(ple_gap, uint, S_IRUGO); Signed-off-by: Babu Moger <babu.moger@amd.com> [Expanded uint to unsigned int in code. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28Revert "x86/mce/AMD: Collect error info even if valid bits are not set"Yazen Ghannam
This reverts commit 4b1e84276a6172980c5bf39aa091ba13e90d6dad. Software uses the valid bits to decide if the values can be used for further processing or other actions. So setting the valid bits will have software act on values that it shouldn't be acting on. The recommendation to save all the register values does not mean that the values are always valid. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: tony.luck@intel.com Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: bp@suse.de Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20180326191526.64314-1-Yazen.Ghannam@amd.com
2018-03-28x86/platform/UV: Fix critical UV MMR address errormike.travis@hpe.com
A critical error was found testing the fixed UV4 HUB in that an MMR address was found to be incorrect. This causes the virtual address space for accessing the MMIOH1 region to be allocated with the incorrect size. Fixes: 673aa20c55a1 ("x86/platform/UV: Update uv_mmrs.h to prepare for UV4A fixes") Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Link: https://lkml.kernel.org/r/20180328174011.041801248@stormcage.americas.sgi.com
2018-03-28KVM: x86: Fix perf timer mode IP reportingAndi Kleen
KVM and perf have a special backdoor mechanism to report the IP for interrupts re-executed after vm exit. This works for the NMIs that perf normally uses. However when perf is in timer mode it doesn't work because the timer interrupt doesn't get this special treatment. This is common when KVM is running nested in another hypervisor which may not implement the PMU, so only timer mode is available. Call the functions to set up the backdoor IP also for non NMI interrupts. I renamed the functions to set up the backdoor IP reporting to be more appropiate for their new use. The SVM change is only compile tested. v2: Moved the functions inline. For the normal interrupt case the before/after functions are now called from x86.c, not arch specific code. For the NMI case we still need to call it in the architecture specific code, because it's already needed in the low level *_run functions. Signed-off-by: Andi Kleen <ak@linux.intel.com> [Removed unnecessary calls from arch handle_external_intr. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: x86: Fix pv tlb flush dependenciesWanpeng Li
PV TLB FLUSH can only be turned on when steal time is enabled. The condition got reversed during conflict resolution. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Fixes: 4f2f61fc5071 ("KVM: X86: Avoid traversing all the cpus for pv tlb flush when steal time is disabled") [Rebased on top of kvm/master and reworded the commit message. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28Merge 4.16-rc7 into char-misc-nextGreg Kroah-Hartman
We want the hyperv fix in here for merging and testing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28x86/boot: Fix SEV boot failure from change to __PHYSICAL_MASK_SHIFTTom Lendacky
In arch/x86/boot/compressed/kaslr_64.c, CONFIG_AMD_MEM_ENCRYPT support was initially #undef'd to support SME with minimal effort. When support for SEV was added, the #undef remained and some minimal support for setting the encryption bit was added for building identity mapped pagetable entries. Commit b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52") changed __PHYSICAL_MASK_SHIFT from 46 to 52 in support of 5-level paging. This change resulted in SEV guests failing to boot because the encryption bit was no longer being automatically masked out. The compressed boot path now requires sme_me_mask to be defined in order for the pagetable functions, such as pud_present(), to properly mask out the encryption bit (currently bit 47) when evaluating pagetable entries. Add an sme_me_mask variable in arch/x86/boot/compressed/mem_encrypt.S, which is set when SEV is active, delete the #undef CONFIG_AMD_MEM_ENCRYPT from arch/x86/boot/compressed/kaslr_64.c and use sme_me_mask when building the identify mapped pagetable entries. Fixes: b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180327220711.8702.55842.stgit@tlendack-t1.amdoffice.net
2018-03-28x86/platform/uv/BAU: Add APIC idt entryAndrew Banman
BAU uses the old alloc_initr_gate90 method to setup its interrupt. This fails silently as the BAU vector is in the range of APIC vectors that are registered to the spurious interrupt handler. As a consequence BAU broadcasts are not handled, and the broadcast source CPU hangs. Update BAU to use new idt structure. Fixes: dc20b2d52653 ("x86/idt: Move interrupt gate initialization to IDT code") Signed-off-by: Andrew Banman <abanman@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <mike.travis@hpe.com> Cc: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Cc: stable@vger.kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/1522188546-196177-1-git-send-email-abanman@hpe.com
2018-03-28x86/msr: Make rdmsrl_safe_on_cpu() scheduling safe as wellEric Dumazet
When changing rdmsr_safe_on_cpu() to schedule, it was missed that __rdmsr_safe_on_cpu() was also used by rdmsrl_safe_on_cpu() Make rdmsrl_safe_on_cpu() a wrapper instead of copy/pasting the code which was added for the completion handling. Fixes: 07cde313b2d2 ("x86/msr: Allow rdmsr_safe_on_cpu() to schedule") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180328032233.153055-1-edumazet@google.com
2018-03-28Backmerge tag 'v4.16-rc7' into drm-nextDave Airlie
Linux 4.16-rc7 This was requested by Daniel, and things were getting a bit hard to reconcile, most of the conflicts were trivial though.
2018-03-28Merge remote-tracking branch 'asoc/topic/intel' into asoc-nextMark Brown
2018-03-27x86/cpuid: Allow cpuid_read() to scheduleEric Dumazet
High latencies can be observed caused by a daemon periodically reading CPUID on all cpus. On KASAN enabled kernels ~10ms latencies can be observed. Even without KASAN, sending an IPI to a CPU, which is in a deep sleep state or in a long hard IRQ disabled section, waiting for the answer can consume hundreds of microseconds. cpuid_read() is invoked in preemptible context, so it can be converted to sleep instead of busy wait. Switching to smp_call_function_single_async() and a completion allows to reschedule and reduces CPU usage and latencies. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Link: https://lkml.kernel.org/r/20180323215818.127774-2-edumazet@google.com
2018-03-27x86/msr: Allow rdmsr_safe_on_cpu() to scheduleEric Dumazet
High latencies can be observed caused by a daemon periodically reading various MSR on all cpus. On KASAN enabled kernels ~10ms latencies can be observed simply reading one MSR. Even without KASAN, sending an IPI to a CPU, which is in a deep sleep state or in a long hard IRQ disabled section, waiting for the answer can consume hundreds of microseconds. All usage sites are in preemptible context, convert rdmsr_safe_on_cpu() to use a completion instead of busy polling. Overall daemon cpu usage was reduced by 35 %, and latencies caused by msr_read() disappeared. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Link: https://lkml.kernel.org/r/20180323215818.127774-1-edumazet@google.com
2018-03-27x86/mm: Update comment in detect_tme() regarding x86_phys_bitsKirill A. Shutemov
As Kai pointed out, the primary reason for adjusting x86_phys_bits is to reflect that the the address space is reduced and not the ability to communicate the available physical address space to virtual machines. Suggested-by: Kai Huang <kai.huang@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: linux-mm@kvack.org Link: https://lkml.kernel.org/r/20180315134907.9311-2-kirill.shutemov@linux.intel.com
2018-03-27x86/PCI: Fix a potential regression when using dmi_get_bios_year()Andy Shevchenko
dmi_get_bios_year() may return 0 when it cannot parse the BIOS date string. Previously this has been checked in pci_acpi_crs_quirks(). Update the code to restore old behaviour. Reported-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Wunner <lukas@wunner.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: linux-acpi@vger.kernel.org Cc: linux-pci@vger.kernel.org Fixes: 69c42d493db4 ("x86/pci: Simplify code by using the new dmi_get_bios_year() helper") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-27Documentation: Fix early-microcode.txt references after file renameJaak Ristioja
The file Documentation/x86/early-microcode.txt was renamed to Documentation/x86/microcode.txt in 0e3258753f81, but it was still referenced by its old name in a three places: * Documentation/x86/00-INDEX * arch/x86/Kconfig * arch/x86/kernel/cpu/microcode/amd.c This commit updates these references accordingly. Fixes: 0e3258753f81 ("x86/microcode: Document the three loading methods") Signed-off-by: Jaak Ristioja <jaak@ristioja.ee> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-03-27x86/alternatives: Fixup alternative_call_2Alexey Dobriyan
The following pattern fails to compile while the same pattern with alternative_call() does: if (...) alternative_call_2(...); else alternative_call_2(...); as it expands into if (...) { }; <=== else { }; Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20180114120504.GA11368@avx2
2018-03-27x86/mm/32: Remove unused node_memmap_size_bytes() & ↵David Rientjes
CONFIG_NEED_NODE_MEMMAP_SIZE logic node_memmap_size_bytes() has been unused since the v3.9 kernel, so remove it. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: f03574f2d5b2 ("x86-32, mm: Rip out x86_32 NUMA remapping code") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803262325540.256524@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-27Merge tag 'v4.16-rc7' into x86/mm, to fix up conflictIngo Molnar
Conflicts: arch/x86/mm/init_64.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-27perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUsStephane Eranian
this patch fix a bug in how the pebs->real_ip is handled in the PEBS handler. real_ip only exists in Haswell and later processor. It is actually the eventing IP, i.e., where the event occurred. As opposed to the pebs->ip which is the PEBS interrupt IP which is always off by one. The problem is that the real_ip just like the IP needs to be fixed up because PEBS does not record all the machine state registers, and in particular the code segement (cs). This is why we have the set_linear_ip() function. The problem was that set_linear_ip() was only used on the pebs->ip and not the pebs->real_ip. We have profiles which ran into invalid callstacks because of this. Here is an example: ..... 0: ffffffffffffff80 recent entry, marker kernel v ..... 1: 000000000040044d <= user address in kernel space! ..... 2: fffffffffffffe00 marker enter user v ..... 3: 000000000040044d ..... 4: 00000000004004b6 oldest entry Debugging output in get_perf_callchain(): [ 857.769909] CALLCHAIN: CPU8 ip=40044d regs->cs=10 user_mode(regs)=0 The problem is that the kernel entry in 1: points to a user level address. How can that be? The reason is that with PEBS sampling the instruction that caused the event to occur and the instruction where the CPU was when the interrupt was posted may be far apart. And sometime during that time window, the privilege level may change. This happens, for instance, when the PEBS sample is taken close to a kernel entry point. Here PEBS, eventing IP (real_ip) captured a user level instruction. But by the time the PMU interrupt fired, the processor had already entered kernel space. This is why the debug output shows a user address with user_mode() false. The problem comes from PEBS not recording the code segment (cs) register. The register is used in x86_64 to determine if executing in kernel vs user space. This is okay because the kernel has a software workaround called set_linear_ip(). But the issue in setup_pebs_sample_data() is that set_linear_ip() is never called on the real_ip value when it is available (Haswell and later) and precise_ip > 1. This patch fixes this problem and eliminates the callchain discrepancy. The patch restructures the code around set_linear_ip() to minimize the number of times the IP has to be set. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1521788507-10231-1-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-27perf/x86: Update rdpmc_always_available static key to the modern APIDavidlohr Bueso
No changes in refcount semantics -- use DEFINE_STATIC_KEY_FALSE() for initialization and replace: static_key_slow_inc|dec() => static_branch_inc|dec() static_key_false() => static_branch_unlikely() Added a '_key' suffix to rdpmc_always_available, for better self-documentation. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Link: http://lkml.kernel.org/r/20180326210929.5244-5-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-26x86/devicetree: Use CPU description from Device TreeIvan Gorinov
Current x86 Device Tree implementation does not support multiprocessing. Use new DT bindings to describe the processors. Signed-off-by: Ivan Gorinov <ivan.gorinov@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Link: https://lkml.kernel.org/r/c291fb2cef51b730b59916d7745be0eaa4378c6c.1521753738.git.ivan.gorinov@intel.com
2018-03-26treewide: Align function definition open/close bracesJoe Perches
Some functions definitions have either the initial open brace and/or the closing brace outside of column 1. Move those braces to column 1. This allows various function analyzers like gnu complexity to work properly for these modified functions. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Nicolin Chen <nicoleotsuka@gmail.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-03-26x86/apic: Finish removing unused callbacksDavid Rientjes
The ->cpu_mask_to_apicid() and ->vector_allocation_domain() callbacks are now unused, so remove them. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: baab1e84b112 ("x86/apic: Remove unused callbacks") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803251403540.80485@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-25Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 and PTI fixes from Ingo Molnar: "Misc fixes: - fix EFI pagetables freeing - fix vsyscall pagetable setting on Xen PV guests - remove ancient CONFIG_X86_PPRO_FENCE=y - x86 is TSO again - fix two binutils (ld) development version related incompatibilities - clean up breakpoint handling - fix an x86 self-test" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry/64: Don't use IST entry for #BP stack x86/efi: Free efi_pgd with free_pages() x86/vsyscall/64: Use proper accessor to update P4D entry x86/cpu: Remove the CONFIG_X86_PPRO_FENCE=y quirk x86/boot/64: Verify alignment of the LOAD segment x86/build/64: Force the linker to use 2MB page size selftests/x86/ptrace_syscall: Fix for yet more glibc interference
2018-03-25Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc kernel side fixes. Generic: - cgroup events counting fix x86: - Intel PMU truncated-parameter fix - RDPMC fix - API naming fix/rename - uncore driver big-hardware PCI enumeration fix - uncore driver filter constraint fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/cgroup: Fix child event counting bug perf/x86/intel/uncore: Fix multi-domain PCI CHA enumeration bug on Skylake servers perf/x86/intel: Rename confusing 'freerunning PEBS' API and implementation to 'large PEBS' perf/x86/intel/uncore: Add missing filter constraint for SKX CHA event perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period() perf/x86/intel: Disable userspace RDPMC usage for large PEBS