2019-11-15  kvm: vmx: Rename NR_AUTOLOAD_MSRS to NR_LOADSTORE_MSRS  (Aaron Lewis)
Rename NR_AUTOLOAD_MSRS to NR_LOADSTORE_MSRS. This is needed because a future patch will add the MSR-autostore area, after which the name AUTOLOAD would no longer make sense. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  kvm: nested: Introduce read_and_check_msr_entry()  (Aaron Lewis)
Add the function read_and_check_msr_entry() which just pulls some code out of nested_vmx_store_msr(). This will be useful as reusable code in upcoming patches. Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
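A sketch of the extracted helper, reconstructed from the description above (the debug messages and the store-check call are assumptions following the existing nested_vmx_store_msr() code):
    static bool read_and_check_msr_entry(struct kvm_vcpu *vcpu, u64 gpa, int i,
                                         struct vmx_msr_entry *e)
    {
            /* read one vmx_msr_entry from the guest's MSR area */
            if (kvm_vcpu_read_guest(vcpu, gpa + i * sizeof(*e),
                                    e, 2 * sizeof(u32))) {
                    pr_debug_ratelimited("%s cannot read MSR entry (%u, 0x%08llx)\n",
                                         __func__, i, gpa + i * sizeof(*e));
                    return false;
            }
            /* reject entries with reserved bits set or invalid indices */
            if (nested_vmx_store_msr_check(vcpu, e)) {
                    pr_debug_ratelimited("%s check failed (%u, 0x%x, 0x%x)\n",
                                         __func__, i, e->index, e->reserved);
                    return false;
            }
            return true;
    }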
2019-11-15  KVM: nVMX: mark functions in the header as "static inline"  (Paolo Bonzini)
Correct a small inaccuracy in the shattering of vmx.c, which becomes visible now that pmu_intel.c includes nested.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control  (Oliver Upton)
The "load IA32_PERF_GLOBAL_CTRL" bit for VM-entry and VM-exit should only be exposed to the guest if IA32_PERF_GLOBAL_CTRL is a valid MSR. Create a new helper to allow pmu_refresh() to update the VM-Entry and VM-Exit controls to ensure PMU values are initialized when performing the is_valid_msr() check. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: nVMX: Load GUEST_IA32_PERF_GLOBAL_CTRL MSR on VM-Entry  (Oliver Upton)
Add condition to prepare_vmcs02 which loads IA32_PERF_GLOBAL_CTRL on VM-entry if the "load IA32_PERF_GLOBAL_CTRL" bit on the VM-entry control is set. Use SET_MSR_OR_WARN() rather than directly writing to the field to avoid overwrite by atomic_switch_perf_msrs(). Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: nVMX: Use kvm_set_msr to load IA32_PERF_GLOBAL_CTRL on VM-Exit  (Oliver Upton)
The existing implementation for loading the IA32_PERF_GLOBAL_CTRL MSR on VM-exit was incorrect, as the next call to atomic_switch_perf_msrs() could cause this value to be overwritten. Instead, call kvm_set_msr() which will allow atomic_switch_perf_msrs() to correctly set the values. Define a macro, SET_MSR_OR_WARN(), to set the MSR with kvm_set_msr() and WARN on failure. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
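The macro described above might look roughly like this (a sketch; the exact warning text is an assumption):
    #define SET_MSR_OR_WARN(vcpu, idx, data)                                 \
    ({                                                                       \
            bool failed = kvm_set_msr(vcpu, idx, data);                      \
                                                                             \
            if (failed)                                                      \
                    pr_warn_ratelimited("%s cannot write MSR (0x%x, 0x%llx)\n", \
                                        __func__, idx, data);                \
            failed;                                                          \
    })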
2019-11-15  KVM: nVMX: Check HOST_IA32_PERF_GLOBAL_CTRL on VM-Entry  (Oliver Upton)
Add a consistency check on nested VM-entry for the host's IA32_PERF_GLOBAL_CTRL from vmcs12. Per Intel's SDM Vol 3 26.2.2: "If the 'load IA32_PERF_GLOBAL_CTRL' VM-exit control is 1, bits reserved in the IA32_PERF_GLOBAL_CTRL MSR must be 0 in the field for that register." Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: nVMX: Check GUEST_IA32_PERF_GLOBAL_CTRL on VM-Entry  (Oliver Upton)
Add condition to nested_vmx_check_guest_state() to check the validity of GUEST_IA32_PERF_GLOBAL_CTRL. Per Intel's SDM Vol 3 26.3.1.1: If the "load IA32_PERF_GLOBAL_CTRL" VM-entry control is 1, bits reserved in the IA32_PERF_GLOBAL_CTRL MSR must be 0 in the field for that register. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
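The added condition plausibly looks like this sketch inside nested_vmx_check_guest_state(); kvm_valid_perf_global_ctrl() is the helper introduced by the next entry in this series:
    /* reserved bits of the proposed guest IA32_PERF_GLOBAL_CTRL must be 0 */
    if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
        !kvm_valid_perf_global_ctrl(vcpu_to_pmu(vcpu),
                                    vmcs12->guest_ia32_perf_global_ctrl))
            return -EINVAL;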
2019-11-15  KVM: VMX: Add helper to check reserved bits in IA32_PERF_GLOBAL_CTRL  (Oliver Upton)
Create a helper function to check the validity of a proposed value for IA32_PERF_GLOBAL_CTRL from the existing check in intel_pmu_set_msr(). Per Intel's SDM, the reserved bits in IA32_PERF_GLOBAL_CTRL must be cleared for the corresponding host/guest state fields. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
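Since the PMU already tracks a mask of reserved bits, the helper can be as simple as this sketch (the global_ctrl_mask field is assumed from the existing Intel vPMU code):
    /* returns true if the proposed IA32_PERF_GLOBAL_CTRL value is valid */
    static inline bool kvm_valid_perf_global_ctrl(struct kvm_pmu *pmu, u64 data)
    {
            return !(pmu->global_ctrl_mask & data);
    }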
2019-11-15  selftests: kvm: Simplify loop in kvm_create_max_vcpus test  (Wainer dos Santos Moschetta)
In the kvm_create_max_vcpus test, remove the unneeded local variable in the loop that adds vCPUs to the VM. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: x86: Optimization: Request TLB flush in fast_cr3_switch() instead of doing it directly  (Liran Alon)
When KVM emulates a nested VMEntry (L1->L2 VMEntry), it switches the mmu root page. If nEPT is used, this happens from kvm_init_shadow_ept_mmu()->__kvm_mmu_new_cr3(); otherwise it happens from nested_vmx_load_cr3()->kvm_mmu_new_cr3(). In either case, __kvm_mmu_new_cr3() uses fast_cr3_switch() in an attempt to switch to a previously cached root page. In case fast_cr3_switch() finds a matching cached root page, it sets it in mmu->root_hpa and requests KVM_REQ_LOAD_CR3 such that on the next entry to the guest, KVM will set the root HPA in the appropriate hardware fields (e.g. vmcs->eptp). In addition, fast_cr3_switch() calls kvm_x86_ops->tlb_flush() in order to flush the TLB, as the MMU root page was replaced. This works because mmu->root_hpa, which vmx_flush_tlb() uses, was already replaced in cached_root_available(). However, it may result in an unnecessary INVEPT execution, because a KVM_REQ_TLB_FLUSH may already have been requested, for example by prepare_vmcs02() in case L1 doesn't use VPID. Therefore, change fast_cr3_switch() to just request a TLB flush on the next entry to the guest. Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
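Conceptually the change replaces the direct flush with a request bit, roughly:
    /* in fast_cr3_switch(), after a matching cached root was found: */
    kvm_make_request(KVM_REQ_LOAD_CR3, vcpu);
    /* before: kvm_x86_ops->tlb_flush(vcpu, true);
     * after: coalesce with any flush that is already pending
     */
    kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);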
2019-11-15  KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC  (Like Xu)
Currently, a host perf_event is created to emulate vPMC functionality. Whether a disabled perf_event will ever be reused is unpredictable, and if disabled perf_events are not reused for a considerable period of time, these obsolete perf_events increase host context-switch overhead that could have been avoided. If the guest doesn't WRMSR any of the vPMC's MSRs during an entire vcpu sched time slice, and the vPMC's independent enable bit isn't set, we can predict that the guest has finished using this vPMC; then request KVM_REQ_PMU in kvm_arch_sched_in and release those perf_events in the first call of kvm_pmu_handle_event() after the vcpu is scheduled in. This lazy mechanism delays the event release time to the beginning of the next scheduled time slice if the vPMC's MSRs aren't changed during this time slice. If the guest comes back to use this vPMC in the next time slice, a new perf_event is re-created via perf_event_create_kernel_counter() as usual. Suggested-by: Wei Wang <wei.w.wang@intel.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter  (Like Xu)
The perf_event_create_kernel_counter() in pmc_reprogram_counter() is a heavyweight and high-frequency operation, especially when the host disables the watchdog (maximum 21000000 ns), which leads to an unacceptable latency in the guest NMI handler and limits the use of vPMUs in the guest. When a vPMC is fully enabled, the legacy reprogram_*_counter() would stop and release its existing perf_event (if any) every time, even though in most cases an almost identical perf_event will be created and configured again. For each vPMC, if the requested config ('u64 eventsel' for gp and 'u8 ctrl' for fixed) is the same as its current config AND a new sample period based on pmc->counter is accepted by the host perf interface, the current event can safely be reused and behaves just like a newly created one. Otherwise, release the undesirable perf_event and reprogram a new one as usual. Calling pmc_pause_counter (disable, read, and reset the event) and pmc_resume_counter (recalibrate the period and re-enable the event) as the guest expects is lightweight compared to releasing and re-creating on every condition. Rather than relying on the filterable event->attr or hw.config, a new 'u64 current_config' field is added to save the last originally programmed config for each vPMC. Based on this implementation, the number of calls to pmc_reprogram_counter is reduced by ~82.5% for a gp sampling event and ~99.9% for a fixed event. When multiplexed perf sampling mode is used, the average latency of the guest NMI handler is reduced from 104923 ns to 48393 ns (~2.16x speedup). If the host disables the watchdog, the minimum latency of the guest NMI handler improves by ~3413x (from 20407603 ns to 5979 ns) and by ~786x on average. Suggested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
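A sketch of the pause/resume pair described above, using the new perf interfaces from the two perf/core entries below (the current_config bookkeeping is elided; pmc_bitmask() is assumed to be the existing counter-width mask helper):
    static void pmc_pause_counter(struct kvm_pmc *pmc)
    {
            u64 counter = pmc->counter;

            if (!pmc->perf_event)
                    return;
            /* disable the event, then accumulate and reset its count */
            counter += perf_event_pause(pmc->perf_event, true);
            pmc->counter = counter & pmc_bitmask(pmc);
    }

    static bool pmc_resume_counter(struct kvm_pmc *pmc)
    {
            if (!pmc->perf_event)
                    return false;

            /* recalibrate the sample period from the guest counter value */
            if (perf_event_period(pmc->perf_event,
                                  (-pmc->counter) & pmc_bitmask(pmc)))
                    return false;

            /* reuse the event instead of a release-and-create cycle */
            perf_event_enable(pmc->perf_event);
            return true;
    }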
2019-11-15  KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback  (Like Xu)
Introduce a new callback msr_idx_to_pmc that returns a struct kvm_pmc*, and change kvm_pmu_is_valid_msr to return ".msr_idx_to_pmc(vcpu, msr) || .is_valid_msr(vcpu, msr)" and AMD just returns false from .is_valid_msr. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
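The resulting wrapper is essentially a one-liner (sketch following the commit text):
    bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
    {
            return kvm_x86_ops->pmu_ops->msr_idx_to_pmc(vcpu, msr) ||
                    kvm_x86_ops->pmu_ops->is_valid_msr(vcpu, msr);
    }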
2019-11-15  KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_ecx  (Like Xu)
The legacy pmu_ops->msr_idx_to_pmc is only called in kvm_pmu_rdpmc, so this function actually receives the contents of ECX before RDPMC and translates it to a kvm_pmc. Let's clarify its semantics by renaming the existing msr_idx_to_pmc to rdpmc_ecx_to_pmc, and is_valid_msr_idx to is_valid_rdpmc_ecx; likewise for the wrapper kvm_pmu_is_valid_msr_idx. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  perf/core: Provide a kernel-internal interface to pause perf_event  (Like Xu)
Export perf_event_pause() as an external accessor for kernel users (such as KVM) that want to both disable a perf_event and read its count while taking perf_event_ctx_lock only once. The counter value can optionally be reset as well. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
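A sketch of the accessor as described (disable, read, and optionally reset under a single perf_event_ctx_lock()):
    u64 perf_event_pause(struct perf_event *event, bool reset)
    {
            struct perf_event_context *ctx;
            u64 count;

            ctx = perf_event_ctx_lock(event);
            _perf_event_disable(event);
            count = local64_read(&event->count);
            if (reset)
                    local64_set(&event->count, 0);
            perf_event_ctx_unlock(event, ctx);

            return count;
    }
    EXPORT_SYMBOL_GPL(perf_event_pause);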
2019-11-15  perf/core: Provide a kernel-internal interface to recalibrate event period  (Like Xu)
Currently, perf_event_period() is used by user tools via ioctl. Following the existing naming convention, export perf_event_period() for kernel users (such as KVM) that may recalibrate the event period for their assigned counter according to their requirements. perf_event_period() is an external accessor, just like perf_event_{en,dis}able(), and should thus take perf_event_ctx_lock(). Suggested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
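Following the perf_event_{en,dis}able() pattern, the exported wrapper takes the ctx lock around an internal helper factored out of the ioctl path (sketch):
    int perf_event_period(struct perf_event *event, u64 value)
    {
            struct perf_event_context *ctx;
            int ret;

            ctx = perf_event_ctx_lock(event);
            ret = _perf_event_period(event, value);
            perf_event_ctx_unlock(event, ctx);

            return ret;
    }
    EXPORT_SYMBOL_GPL(perf_event_period);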
2019-11-15  KVM: nVMX: Update vmcs01 TPR_THRESHOLD if L2 changed L1 TPR  (Liran Alon)
When L1 doesn't use TPR-Shadow to run L2, L0 configures vmcs02 without TPR-Shadow and installs intercepts on CR8 access (load and store). If L1 does not intercept L2 CR8 accesses, L0's intercepts on those accesses will emulate load/store on L1's LAPIC TPR. If in this case L2 lowers TPR such that there is now an injectable interrupt to L1, apic_update_ppr() will request a KVM_REQ_EVENT, which will trigger a call to update_cr8_intercept() to update TPR-Threshold to the highest pending IRR priority. However, this update to TPR-Threshold is done while the active vmcs is vmcs02 instead of vmcs01. Thus, when at some later point L0 emulates an exit from L2 to L1, L1 will still run with a high TPR-Threshold. This results in every VMEntry to L1 immediately exiting on TPR_BELOW_THRESHOLD, and it continues to do so indefinitely until some condition causes KVM_REQ_EVENT to be set. (Note that the TPR_BELOW_THRESHOLD exit handler does not set KVM_REQ_EVENT until apic_update_ppr() notices a new injectable interrupt for PPR.) To fix this issue, change update_cr8_intercept() such that if L2 lowers L1's TPR in a way that requires lowering L1's TPR-Threshold, the update to TPR-Threshold is saved and applied to vmcs01 when L0 emulates an exit from L2 to L1. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: VMX: Refactor update_cr8_intercept()  (Liran Alon)
No functional changes. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: SVM: Remove check if APICv enabled in SVM update_cr8_intercept() handler  (Liran Alon)
This check is unnecessary, as the x86 update_cr8_intercept(), which calls this VMX/SVM-specific callback, already performs it. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: APIC: add helper func to remove duplicate code in kvm_pv_send_ipi  (Miaohe Lin)
There is some duplicate code in kvm_pv_send_ipi when dealing with the ipi bitmap. Add a helper function to remove it, eliminate the odd out label, and get rid of the unnecessary kvm_lapic_irq field initialization. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: X86: avoid unused setup_syscalls_segments call when SYSCALL check failed  (Miaohe Lin)
When the SYSCALL/SYSENTER ability check fails, cs and ss are initialized but never used. Delay initializing cs and ss until the SYSCALL/SYSENTER ability check has passed. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: MMIO: get rid of odd out_err label in kvm_coalesced_mmio_init  (Miaohe Lin)
The out_err label and the ret variable are unnecessary; clean them up. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: VMX: Consume pending LAPIC INIT event when exit on INIT_SIGNAL  (Liran Alon)
Intel SDM section 25.2 OTHER CAUSES OF VM EXITS specifies the following on INIT signals: "Such exits do not modify register state or clear pending events as they would outside of VMX operation." When commit 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") was applied, I interpreted the above Intel SDM statement to mean that an INIT_SIGNAL exit doesn't consume the pending LAPIC INIT event. However, when Nadav Amit ran the matching kvm-unit-test on a bare-metal machine, it turned out my interpretation was wrong, i.e. an INIT_SIGNAL exit does consume the pending LAPIC INIT event. (See: https://www.spinics.net/lists/kvm/msg196757.html) Therefore, fix the KVM code to behave as observed on bare-metal. Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") Reported-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: x86: Prevent set vCPU into INIT/SIPI_RECEIVED state when INIT are latched  (Liran Alon)
Commit 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") fixed KVM to also latch a pending LAPIC INIT event when the vCPU is in VMX operation. However, the current KVM_SET_MP_STATE API allows userspace to put the vCPU into KVM_MP_STATE_SIPI_RECEIVED or KVM_MP_STATE_INIT_RECEIVED even when the vCPU is in VMX operation. Fix this by introducing a utility method that checks whether the vCPU state latches INIT signals, and use it in the KVM_SET_MP_STATE handler. Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") Reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
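A sketch of the utility and its use in the KVM_SET_MP_STATE handler (the SMM/VMX conditions follow the commit's description; the callback name is an assumption):
    static inline bool kvm_vcpu_latch_init(struct kvm_vcpu *vcpu)
    {
            /* INIT is latched while in SMM or while blocked by VMX operation */
            return is_smm(vcpu) || kvm_x86_ops->apic_init_signal_blocked(vcpu);
    }

    /* in the KVM_SET_MP_STATE handler: */
    if ((mp_state->mp_state == KVM_MP_STATE_INIT_RECEIVED ||
         mp_state->mp_state == KVM_MP_STATE_SIPI_RECEIVED) &&
        kvm_vcpu_latch_init(vcpu))
            return -EINVAL;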
2019-11-15  KVM: x86: Evaluate latched_init in KVM_SET_VCPU_EVENTS when vCPU not in SMM  (Liran Alon)
Commit 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") fixed KVM to also latch a pending LAPIC INIT event when the vCPU is in VMX operation. However, the current KVM_SET_VCPU_EVENTS API defines this field as part of SMM state and only sets a pending LAPIC INIT event if the vCPU is specified to be in SMM mode (events->smi.smm is set). Change the KVM_SET_VCPU_EVENTS handler to set a pending LAPIC INIT event from the latched_init field regardless of whether the vCPU is in SMM mode. Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  x86: retpolines: eliminate retpoline from msr event handlers  (Andrea Arcangeli)
It's enough to check the value and issue the direct call. After this commit is applied, here are the most common retpolines executed under a high resolution timer workload in the guest on a VMX host: [..] @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 267 @[]: 2256 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 __kvm_wait_lapic_expire+284 vmx_vcpu_run.part.97+1091 vcpu_enter_guest+377 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 2390 @[]: 33410 @total: 315707 Note the highest hit above is __delay, so it is probably not worth optimizing even if it were more frequent than 2k hits per sec. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: retpolines: x86: eliminate retpoline from svm.c exit handlers  (Andrea Arcangeli)
It's enough to check the exit value and issue a direct call to avoid the retpoline for all the common vmexit reasons. After this commit is applied, here the most common retpolines executed under a high resolution timer workload in the guest on a SVM host: [..] @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get_update_offsets_now+70 hrtimer_interrupt+131 smp_apic_timer_interrupt+106 apic_timer_interrupt+15 start_sw_timer+359 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 1940 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_r12+33 force_qs_rnp+217 rcu_gp_kthread+1270 kthread+268 ret_from_fork+34 ]: 4644 @[]: 25095 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 lapic_next_event+28 clockevents_program_event+148 hrtimer_start_range_ns+528 start_sw_timer+356 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 41474 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 clockevents_program_event+148 hrtimer_start_range_ns+528 start_sw_timer+356 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 41474 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get+58 clockevents_program_event+84 hrtimer_start_range_ns+528 start_sw_timer+356 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 41887 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 lapic_next_event+28 clockevents_program_event+148 hrtimer_try_to_cancel+168 hrtimer_cancel+21 kvm_set_lapic_tscdeadline_msr+43 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 42723 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 clockevents_program_event+148 hrtimer_try_to_cancel+168 hrtimer_cancel+21 kvm_set_lapic_tscdeadline_msr+43 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 42766 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get+58 clockevents_program_event+84 hrtimer_try_to_cancel+168 hrtimer_cancel+21 kvm_set_lapic_tscdeadline_msr+43 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 42848 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get+58 start_sw_timer+279 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 
ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 499845 @total: 1780243 SVM has no TSC based programmable preemption timer so it is invoking ktime_get() frequently. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers  (Andrea Arcangeli)
It's enough to check the exit value and issue a direct call to avoid the retpoline for all the common vmexit reasons. Of course CONFIG_RETPOLINE already forbids gcc from using indirect jumps while compiling all switch() statements; however, switch() would still allow the compiler to bisect the case value. It's more efficient to prioritize the most frequent vmexits instead. Halt may be a slow path from the guest's point of view, but not necessarily from the host's, if the host runs at full CPU capacity and no host CPU is ever left idle. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
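The pattern is a short chain of equality checks in front of the indirect table lookup, roughly like this sketch (which handlers are hoisted, and their names, are illustrative):
    #ifdef CONFIG_RETPOLINE
            /* direct calls for the hottest exit reasons */
            if (exit_reason == EXIT_REASON_MSR_WRITE)
                    return kvm_emulate_wrmsr(vcpu);
            else if (exit_reason == EXIT_REASON_PREEMPTION_TIMER)
                    return handle_preemption_timer(vcpu);
            else if (exit_reason == EXIT_REASON_EPT_MISCONFIG)
                    return handle_ept_misconfig(vcpu);
    #endif
            /* everything else takes the retpoline through the table */
            return kvm_vmx_exit_handlers[exit_reason](vcpu);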
2019-11-15  KVM: x86: optimize more exit handlers in vmx.c  (Andrea Arcangeli)
Eliminate the wasteful call/ret in the non-RETPOLINE case and the unnecessary fentry dynamic tracing hooking points. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  Merge tag 'usb-serial-5.5-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next  (Greg Kroah-Hartman)
Johan writes: USB-serial updates for 5.5-rc1. Here are the USB-serial updates for 5.5-rc1, including:
- support for a new class of pl2303 devices
- improved divisor calculations for ch341
- fixes for a remote-wakeup bug in the moschip drivers
- improved device-type handling in mos7840
- various cleanups of mos7840
Included are also some new device ids. All have been in linux-next with no reported issues. Signed-off-by: Johan Hovold <johan@kernel.org> * tag 'usb-serial-5.5-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P
USB: serial: option: add support for Foxconn T77W968 LTE modules
USB: serial: mos7840: drop port open flag
USB: serial: mos7840: drop read-urb check
USB: serial: mos7840: drop port driver data accessors
USB: serial: mos7840: drop serial struct accessor
USB: serial: mos7840: drop paranoid serial checks
USB: serial: mos7840: drop paranoid port checks
USB: serial: mos7840: drop redundant urb context check
USB: serial: mos7840: rip out broken interrupt handling
USB: serial: mos7840: fix probe error handling
USB: serial: mos7840: document MCS7810 detection hack
USB: serial: mos7840: clean up device-type handling
USB: serial: mos7840: fix remote wakeup
USB: serial: mos7720: fix remote wakeup
USB: serial: option: add support for DW5821e with eSIM support
USB: serial: mos7840: add USB ID to support Moxa UPort 2210
USB: serial: ch341: reimplement line-speed handling
USB: serial: pl2303: add support for PL2303HXN
2019-11-15  sched/uclamp: Fix incorrect condition  (Qais Yousef)
uclamp_update_active() should perform the update when p->uclamp[clamp_id].active is true. But when the logic was inverted in [1], the if condition wasn't inverted correctly too. [1] https://lore.kernel.org/lkml/20190902073836.GO2369@hirez.programming.kicks-ass.net/ Reported-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Patrick Bellasi <patrick.bellasi@matbug.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: babbe170e053 ("sched/uclamp: Update CPU's refcount on TG's clamp changes") Link: https://lkml.kernel.org/r/20191114211052.15116-1-qais.yousef@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
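The fix is a condition flip in uclamp_update_active(), roughly like this sketch (helper names follow the existing uclamp code):
    /* before (wrong): the update ran only when the clamp was inactive */
    if (!p->uclamp[clamp_id].active) {
            uclamp_rq_dec_id(rq, p, clamp_id);
            uclamp_rq_inc_id(rq, p, clamp_id);
    }

    /* after: perform the update when the clamp is active */
    if (p->uclamp[clamp_id].active) {
            uclamp_rq_dec_id(rq, p, clamp_id);
            uclamp_rq_inc_id(rq, p, clamp_id);
    }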
2019-11-15  genirq: Fix function documentation of __irq_alloc_descs()  (luanshi)
The function got renamed at some point, but the kernel-doc was not updated. Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1573656093-8643-1-git-send-email-zhangliguang@linux.alibaba.com
2019-11-15  irq_work: Fix IRQ_WORK_BUSY bit clearing  (Frederic Weisbecker)
While attempting to clear the busy bit at the end of a work execution, atomic_cmpxchg() expects the value of the flags with the pending bit cleared as the old value. However by mistake the value of the flags is passed without clearing the pending bit first. As a result, clearing the busy bit fails and irq_work_sync() may stall: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [blktrace:4948] CPU: 0 PID: 4948 Comm: blktrace Not tainted 5.4.0-rc7-00003-gfeb4a51323bab #1 RIP: 0010:irq_work_sync+0x4/0x10 Call Trace: relay_close_buf+0x19/0x50 relay_close+0x64/0x100 blk_trace_free+0x1f/0x50 __blk_trace_remove+0x1e/0x30 blk_trace_ioctl+0x11b/0x140 blkdev_ioctl+0x6c1/0xa40 block_ioctl+0x39/0x40 do_vfs_ioctl+0xa5/0x700 ksys_ioctl+0x70/0x80 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5b/0x1d0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 So clear the appropriate bit before passing the old flags to cmpxchg(). Fixes: feb4a51323ba ("irq_work: Slightly simplify IRQ_WORK_PENDING clearing") Reported-by: kernel test robot <rong.a.chen@intel.com> Reported-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Leonard Crestez <leonard.crestez@nxp.com> Link: https://lkml.kernel.org/r/20191113171201.14032-1-frederic@kernel.org
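The fix amounts to masking out the pending bit before using the flags as cmpxchg()'s expected old value, roughly:
    /* end of irq_work execution: drop the BUSY bit */
    flags &= ~IRQ_WORK_PENDING;      /* the line the fix adds */
    (void)atomic_cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);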
2019-11-15  x86/pci: Remove #ifdef __KERNEL__ guard from <asm/pci.h>  (Christoph Hellwig)
pci.h is not a UAPI header, so the __KERNEL__ ifdef is rather pointless. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191113071836.21041-4-hch@lst.de
2019-11-15  x86/pci: Remove pci_64.h  (Christoph Hellwig)
This file only contains external declarations for two non-existing function pointers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191113071836.21041-3-hch@lst.de
2019-11-15  x86: Remove the calgary IOMMU driver  (Christoph Hellwig)
The calgary IOMMU was only used on high-end IBM systems in the early x86_64 age and has no known users left. Remove it to avoid having to touch it for pending changes to the DMA API. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191113071836.21041-2-hch@lst.de
2019-11-15  x86/hyperv: Initialize clockevents earlier in CPU onlining  (Michael Kelley)
Hyper-V has historically initialized stimer-based clockevents late in the process of onlining a CPU because clockevents depend on stimer interrupts. In the original Hyper-V design, stimer interrupts generate a VMbus message, so the VMbus machinery must be running first, and VMbus can't be initialized until relatively late. On x86/64, LAPIC timer based clockevents are used during early initialization before VMbus and stimer-based clockevents are ready, and again during CPU offlining after the stimer clockevents have been shut down. Unfortunately, this design creates problems when offlining CPUs for hibernation or other purposes. stimer-based clockevents are shut down relatively early in the offlining process, so clockevents_unbind_device() must be used to fallback to the LAPIC-based clockevents for the remainder of the offlining process. Furthermore, the late initialization and early shutdown of stimer-based clockevents doesn't work well on ARM64 since there is no other timer like the LAPIC to fallback to. So CPU onlining and offlining doesn't work properly. Fix this by recognizing that stimer Direct Mode is the normal path for newer versions of Hyper-V on x86/64, and the only path on other architectures. With stimer Direct Mode, stimer interrupts don't require any VMbus machinery. stimer clockevents can be initialized and shut down consistent with how it is done for other clockevent devices. While the old VMbus-based stimer interrupts must still be supported for backward compatibility on x86, that mode of operation can be treated as legacy. So add a new Hyper-V stimer entry in the CPU hotplug state list, and use that new state when in Direct Mode. Update the Hyper-V clocksource driver to allocate and initialize stimer clockevents earlier during boot. Update Hyper-V initialization and the VMbus driver to use this new design. As a result, the LAPIC timer is no longer used during boot or CPU onlining/offlining and clockevents_unbind_device() is not called. But retain the old design as a legacy implementation for older versions of Hyper-V that don't support Direct Mode. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lkml.kernel.org/r/1573607467-9456-1-git-send-email-mikelley@microsoft.com
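The new hotplug entry plausibly slots in like this sketch (the state constant, callback names, and state string are assumptions following existing hv_stimer naming):
    /* register stimer clockevent setup/teardown in the CPU hotplug chain,
     * used when stimer Direct Mode makes VMbus-independent init possible
     */
    ret = cpuhp_setup_state(CPUHP_AP_HYPERV_TIMER_STARTING,
                            "clockevents/hyperv/stimer:starting",
                            hv_stimer_init, hv_stimer_cleanup);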
2019-11-15  Merge branch 'linus' into x86/hyperv  (Thomas Gleixner)
Pick up upstream fixes to avoid conflicts.
2019-11-15  KVM: Add a comment describing the /dev/kvm no_compat handling  (Marc Zyngier)
Add a comment explaining the rationale behind having both no_compat open and ioctl callbacks to fend off compat tasks. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15  net: wireless: ti: remove local VENDOR_ID and DEVICE_ID definitions  (H. Nikolaus Schaller)
They are already included from mmc/sdio_ids.h and do not need a local definition. Fixes: 884f38607897 ("mmc: core: move some sdio IDs out of quirks file") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Acked-by: Kalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-11-15  net: wireless: ti: wl1251 use new SDIO_VENDOR_ID_TI_WL1251 definition  (H. Nikolaus Schaller)
SDIO_VENDOR_ID_TI_WL1251 is now defined in mmc/sdio_ids.h separately from SDIO_VENDOR_ID_TI for wl1271. Fixes: 884f38607897 ("mmc: core: move some sdio IDs out of quirks file") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Acked-by: Kalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-11-15  mmc: core: fix wl1251 sdio quirks  (H. Nikolaus Schaller)
wl1251 and wl1271 have different vendor and device IDs, so we need to handle both with sdio quirks. Fixes: 884f38607897 ("mmc: core: move some sdio IDs out of quirks file") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-11-15  mmc: sdio: fix wl1251 vendor id  (H. Nikolaus Schaller)
v4.11-rc1 introduced a patch series that rearranged the sdio quirks into a header file. Unfortunately, it forgot to handle SDIO_VENDOR_ID_TI differently between wl1251 and wl1271, with the result that although the wl1251 was found on the sdio bus, the firmware no longer loaded and there was no interface registration. This patch defines separate constants to be used by sdio quirks and drivers. Fixes: 884f38607897 ("mmc: core: move some sdio IDs out of quirks file") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
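The separated constants presumably end up in include/linux/mmc/sdio_ids.h along these lines (a sketch; the numeric values are as reported for the two chips and should be treated as illustrative):
    #define SDIO_VENDOR_ID_TI               0x0097
    #define SDIO_DEVICE_ID_TI_WL1271        0x4076
    #define SDIO_VENDOR_ID_TI_WL1251        0x104c
    #define SDIO_DEVICE_ID_TI_WL1251        0x9066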
2019-11-15  mmc: host: omap-hsmmc: remove init_card pdata callback from pdata  (H. Nikolaus Schaller)
Now that we have removed the last user (pandora_wl1251_init_card) of this callback, we can remove it from the hsmmc code. Suggested-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-11-15  omap: remove omap2_hsmmc_info in old hsmmc.[ch] and update Makefile  (H. Nikolaus Schaller)
There is a new driver in drivers/mmc/host/omap_hsmmc.c configured by CONFIG_MMC_OMAP_HS, and the last user of omap2_hsmmc_info was the pandora pdata-quirks. Suggested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-11-15  omap: pdata-quirks: remove openpandora quirks for mmc3 and wl1251  (H. Nikolaus Schaller)
With a wl1251 child node of mmc3 in the device tree, decoded in omap_hsmmc.c to handle the special wl1251 initialization, we no longer need to instantiate mmc3 through pdata quirks. We can also remove the wlan regulator and reset/interrupt definitions and handle them through the device tree. Fixes: 81eef6ca9201 ("mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.7+ Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-11-15  omap: pdata-quirks: revert pandora specific gpiod additions  (H. Nikolaus Schaller)
This partly reverts commit efdfeb079cc3 ("regulator: fixed: Convert to use GPIO descriptor only"). We must remove this from mainline first, so that the following patch to remove the openpandora quirks for mmc3 and wl1251 cleanly applies to stable v4.9, v4.14, and v4.19, where the above-mentioned patch is not yet present. Since the affected code is removed (no pandora gpios in pdata-quirks any more), there will be no matching revert-of-the-revert. Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-11-15  mmc: host: omap_hsmmc: add code for special init of wl1251 to get rid of pandora_wl1251_init_card  (H. Nikolaus Schaller)
pandora_wl1251_init_card was used to do special pdata-based setup of the sdio mmc interface. This no longer works with v4.7 and later. A fix requires a device-tree-based mmc3 setup. Therefore we move the special setup to omap_hsmmc.c instead of calling some pdata-supplied init_card function. The new code checks for a DT child node compatible with wl1251, so it will not affect other MMC3 use cases. Generally, this code was and still is a hack and should be moved to the mmc core to e.g. read such properties from optional DT child nodes. Fixes: 81eef6ca9201 ("mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.7+ [Ulf: Fixed up some checkpatch complaints] Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
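A sketch of the DT-based detection that replaces the pdata callback (the function and field names are assumptions based on the old pandora_wl1251_init_card quirk; the link-speed setting from the old quirk is elided):
    static void omap_hsmmc_init_card(struct omap_hsmmc_host *host,
                                     struct mmc_card *card)
    {
            struct device_node *np =
                    of_get_compatible_child(host->dev->of_node, "ti,wl1251");

            if (np) {
                    /* the non-standard SDIO setup the old pdata quirk applied */
                    card->quirks |= MMC_QUIRK_NONSTD_SDIO;
                    card->cccr.wide_bus = 1;
                    card->cis.vendor = SDIO_VENDOR_ID_TI_WL1251;
                    card->cis.device = SDIO_DEVICE_ID_TI_WL1251;
                    card->cis.blksize = 512;
                    of_node_put(np);
            }
    }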
2019-11-15  ARM: dts: pandora-common: define wl1251 as child node of mmc3  (H. Nikolaus Schaller)
Since v4.7 the DMA initialization requires device tree properties for the "rx" and "tx" channels, which are not provided by the pdata-quirks initialization. Converting the mmc3 setup to device tree finally allows removing the OpenPandora wlan-specific omap3 pdata-quirks. Fixes: 81eef6ca9201 ("mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.7+ Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>