Age | Commit message (Collapse) | Author |
|
The "load IA32_PERF_GLOBAL_CTRL" bit for VM-entry and VM-exit should
only be exposed to the guest if IA32_PERF_GLOBAL_CTRL is a valid MSR.
Create a new helper to allow pmu_refresh() to update the VM-Entry and
VM-Exit controls to ensure PMU values are initialized when performing
the is_valid_msr() check.
Suggested-by: Jim Mattson <jmattson@google.com>
Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add condition to prepare_vmcs02 which loads IA32_PERF_GLOBAL_CTRL on
VM-entry if the "load IA32_PERF_GLOBAL_CTRL" bit on the VM-entry control
is set. Use SET_MSR_OR_WARN() rather than directly writing to the field
to avoid overwrite by atomic_switch_perf_msrs().
Suggested-by: Jim Mattson <jmattson@google.com>
Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The existing implementation for loading the IA32_PERF_GLOBAL_CTRL MSR
on VM-exit was incorrect, as the next call to atomic_switch_perf_msrs()
could cause this value to be overwritten. Instead, call kvm_set_msr()
which will allow atomic_switch_perf_msrs() to correctly set the values.
Define a macro, SET_MSR_OR_WARN(), to set the MSR with kvm_set_msr()
and WARN on failure.
Suggested-by: Jim Mattson <jmattson@google.com>
Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a consistency check on nested vm-entry for host's
IA32_PERF_GLOBAL_CTRL from vmcs12. Per Intel's SDM Vol 3 26.2.2:
If the "load IA32_PERF_GLOBAL_CTRL"
VM-exit control is 1, bits reserved in the IA32_PERF_GLOBAL_CTRL
MSR must be 0 in the field for that register"
Suggested-by: Jim Mattson <jmattson@google.com>
Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add condition to nested_vmx_check_guest_state() to check the validity of
GUEST_IA32_PERF_GLOBAL_CTRL. Per Intel's SDM Vol 3 26.3.1.1:
If the "load IA32_PERF_GLOBAL_CTRL" VM-entry control is 1, bits
reserved in the IA32_PERF_GLOBAL_CTRL MSR must be 0 in the field for that
register.
Suggested-by: Jim Mattson <jmattson@google.com>
Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Create a helper function to check the validity of a proposed value for
IA32_PERF_GLOBAL_CTRL from the existing check in intel_pmu_set_msr().
Per Intel's SDM, the reserved bits in IA32_PERF_GLOBAL_CTRL must be
cleared for the corresponding host/guest state fields.
Suggested-by: Jim Mattson <jmattson@google.com>
Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
On kvm_create_max_vcpus test remove unneeded local
variable in the loop that add vcpus to the VM.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
it directly
When KVM emulates a nested VMEntry (L1->L2 VMEntry), it switches mmu root
page. If nEPT is used, this will happen from
kvm_init_shadow_ept_mmu()->__kvm_mmu_new_cr3() and otherwise it will
happpen from nested_vmx_load_cr3()->kvm_mmu_new_cr3(). Either case,
__kvm_mmu_new_cr3() will use fast_cr3_switch() in attempt to switch to a
previously cached root page.
In case fast_cr3_switch() finds a matching cached root page, it will
set it in mmu->root_hpa and request KVM_REQ_LOAD_CR3 such that on
next entry to guest, KVM will set root HPA in appropriate hardware
fields (e.g. vmcs->eptp). In addition, fast_cr3_switch() calls
kvm_x86_ops->tlb_flush() in order to flush TLB as MMU root page
was replaced.
This works as mmu->root_hpa, which vmx_flush_tlb() use, was
already replaced in cached_root_available(). However, this may
result in unnecessary INVEPT execution because a KVM_REQ_TLB_FLUSH
may have already been requested. For example, by prepare_vmcs02()
in case L1 don't use VPID.
Therefore, change fast_cr3_switch() to just request TLB flush on
next entry to guest.
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently, a host perf_event is created for a vPMC functionality emulation.
It’s unpredictable to determine if a disabled perf_event will be reused.
If they are disabled and are not reused for a considerable period of time,
those obsolete perf_events would increase host context switch overhead that
could have been avoided.
If the guest doesn't WRMSR any of the vPMC's MSRs during an entire vcpu
sched time slice, and its independent enable bit of the vPMC isn't set,
we can predict that the guest has finished the use of this vPMC, and then
do request KVM_REQ_PMU in kvm_arch_sched_in and release those perf_events
in the first call of kvm_pmu_handle_event() after the vcpu is scheduled in.
This lazy mechanism delays the event release time to the beginning of the
next scheduled time slice if vPMC's MSRs aren't changed during this time
slice. If guest comes back to use this vPMC in next time slice, a new perf
event would be re-created via perf_event_create_kernel_counter() as usual.
Suggested-by: Wei Wang <wei.w.wang@intel.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The perf_event_create_kernel_counter() in the pmc_reprogram_counter() is
a heavyweight and high-frequency operation, especially when host disables
the watchdog (maximum 21000000 ns) which leads to an unacceptable latency
of the guest NMI handler. It limits the use of vPMUs in the guest.
When a vPMC is fully enabled, the legacy reprogram_*_counter() would stop
and release its existing perf_event (if any) every time EVEN in most cases
almost the same requested perf_event will be created and configured again.
For each vPMC, if the reuqested config ('u64 eventsel' for gp and 'u8 ctrl'
for fixed) is the same as its current config AND a new sample period based
on pmc->counter is accepted by host perf interface, the current event could
be reused safely as a new created one does. Otherwise, do release the
undesirable perf_event and reprogram a new one as usual.
It's light-weight to call pmc_pause_counter (disable, read and reset event)
and pmc_resume_counter (recalibrate period and re-enable event) as guest
expects instead of release-and-create again on any condition. Compared to
use the filterable event->attr or hw.config, a new 'u64 current_config'
field is added to save the last original programed config for each vPMC.
Based on this implementation, the number of calls to pmc_reprogram_counter
is reduced by ~82.5% for a gp sampling event and ~99.9% for a fixed event.
In the usage of multiplexing perf sampling mode, the average latency of the
guest NMI handler is reduced from 104923 ns to 48393 ns (~2.16x speed up).
If host disables watchdog, the minimum latecy of guest NMI handler could be
speed up at ~3413x (from 20407603 to 5979 ns) and at ~786x in the average.
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Introduce a new callback msr_idx_to_pmc that returns a struct kvm_pmc*,
and change kvm_pmu_is_valid_msr to return ".msr_idx_to_pmc(vcpu, msr) ||
.is_valid_msr(vcpu, msr)" and AMD just returns false from .is_valid_msr.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The leagcy pmu_ops->msr_idx_to_pmc is only called in kvm_pmu_rdpmc, so
this function actually receives the contents of ECX before RDPMC, and
translates it to a kvm_pmc. Let's clarify its semantic by renaming the
existing msr_idx_to_pmc to rdpmc_ecx_to_pmc, and is_valid_msr_idx to
is_valid_rdpmc_ecx; likewise for the wrapper kvm_pmu_is_valid_msr_idx.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Exporting perf_event_pause() as an external accessor for kernel users (such
as KVM) who may do both disable perf_event and read count with just one
time to hold perf_event_ctx_lock. Also the value could be reset optionally.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently, perf_event_period() is used by user tools via ioctl. Based on
naming convention, exporting perf_event_period() for kernel users (such
as KVM) who may recalibrate the event period for their assigned counter
according to their requirements.
The perf_event_period() is an external accessor, just like the
perf_event_{en,dis}able() and should thus use perf_event_ctx_lock().
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When L1 don't use TPR-Shadow to run L2, L0 configures vmcs02 without
TPR-Shadow and install intercepts on CR8 access (load and store).
If L1 do not intercept L2 CR8 access, L0 intercepts on those accesses
will emulate load/store on L1's LAPIC TPR. If in this case L2 lowers
TPR such that there is now an injectable interrupt to L1,
apic_update_ppr() will request a KVM_REQ_EVENT which will trigger a call
to update_cr8_intercept() to update TPR-Threshold to highest pending IRR
priority.
However, this update to TPR-Threshold is done while active vmcs is
vmcs02 instead of vmcs01. Thus, when later at some point L0 will
emulate an exit from L2 to L1, L1 will still run with high
TPR-Threshold. This will result in every VMEntry to L1 to immediately
exit on TPR_BELOW_THRESHOLD and continue to do so infinitely until
some condition will cause KVM_REQ_EVENT to be set.
(Note that TPR_BELOW_THRESHOLD exit handler do not set KVM_REQ_EVENT
until apic_update_ppr() will notice a new injectable interrupt for PPR)
To fix this issue, change update_cr8_intercept() such that if L2 lowers
L1's TPR in a way that requires to lower L1's TPR-Threshold, save update
to TPR-Threshold and apply it to vmcs01 when L0 emulates an exit from
L2 to L1.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
No functional changes.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This check is unnecessary as x86 update_cr8_intercept() which calls
this VMX/SVM specific callback already performs this check.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
There are some duplicate code in kvm_pv_send_ipi when deal with ipi
bitmap. Add helper func to remove it, and eliminate odd out label,
get rid of unnecessary kvm_lapic_irq field init and so on.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When SYSCALL/SYSENTER ability check failed, cs and ss is inited but
remain not used. Delay initializing cs and ss until SYSCALL/SYSENTER
ability check passed.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The out_err label and var ret is unnecessary, clean them up.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Intel SDM section 25.2 OTHER CAUSES OF VM EXITS specifies the following
on INIT signals: "Such exits do not modify register state or clear pending
events as they would outside of VMX operation."
When commit 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states")
was applied, I interepted above Intel SDM statement such that
INIT_SIGNAL exit don’t consume the LAPIC INIT pending event.
However, when Nadav Amit run matching kvm-unit-test on a bare-metal
machine, it turned out my interpetation was wrong. i.e. INIT_SIGNAL
exit does consume the LAPIC INIT pending event.
(See: https://www.spinics.net/lists/kvm/msg196757.html)
Therefore, fix KVM code to behave as observed on bare-metal.
Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states")
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states")
fixed KVM to also latch pending LAPIC INIT event when vCPU is in VMX
operation.
However, current API of KVM_SET_MP_STATE allows userspace to put vCPU
into KVM_MP_STATE_SIPI_RECEIVED or KVM_MP_STATE_INIT_RECEIVED even when
vCPU is in VMX operation.
Fix this by introducing a util method to check if vCPU state latch INIT
signals and use it in KVM_SET_MP_STATE handler.
Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states")
Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states")
fixed KVM to also latch pending LAPIC INIT event when vCPU is in VMX
operation.
However, current API of KVM_SET_VCPU_EVENTS defines this field as
part of SMM state and only set pending LAPIC INIT event if vCPU is
specified to be in SMM mode (events->smi.smm is set).
Change KVM_SET_VCPU_EVENTS handler to set pending LAPIC INIT event
by latched_init field regardless of if vCPU is in SMM mode or not.
Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states")
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
It's enough to check the value and issue the direct call.
After this commit is applied, here the most common retpolines executed
under a high resolution timer workload in the guest on a VMX host:
[..]
@[
trace_retpoline+1
__trace_retpoline+30
__x86_indirect_thunk_rax+33
do_syscall_64+89
entry_SYSCALL_64_after_hwframe+68
]: 267
@[]: 2256
@[
trace_retpoline+1
__trace_retpoline+30
__x86_indirect_thunk_rax+33
__kvm_wait_lapic_expire+284
vmx_vcpu_run.part.97+1091
vcpu_enter_guest+377
kvm_arch_vcpu_ioctl_run+261
kvm_vcpu_ioctl+559
do_vfs_ioctl+164
ksys_ioctl+96
__x64_sys_ioctl+22
do_syscall_64+89
entry_SYSCALL_64_after_hwframe+68
]: 2390
@[]: 33410
@total: 315707
Note the highest hit above is __delay so probably not worth optimizing
even if it would be more frequent than 2k hits per sec.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
It's enough to check the exit value and issue a direct call to avoid
the retpoline for all the common vmexit reasons.
After this commit is applied, here the most common retpolines executed
under a high resolution timer workload in the guest on a SVM host:
[..]
@[
trace_retpoline+1
__trace_retpoline+30
__x86_indirect_thunk_rax+33
ktime_get_update_offsets_now+70
hrtimer_interrupt+131
smp_apic_timer_interrupt+106
apic_timer_interrupt+15
start_sw_timer+359
restart_apic_timer+85
kvm_set_msr_common+1497
msr_interception+142
vcpu_enter_guest+684
kvm_arch_vcpu_ioctl_run+261
kvm_vcpu_ioctl+559
do_vfs_ioctl+164
ksys_ioctl+96
__x64_sys_ioctl+22
do_syscall_64+89
entry_SYSCALL_64_after_hwframe+68
]: 1940
@[
trace_retpoline+1
__trace_retpoline+30
__x86_indirect_thunk_r12+33
force_qs_rnp+217
rcu_gp_kthread+1270
kthread+268
ret_from_fork+34
]: 4644
@[]: 25095
@[
trace_retpoline+1
__trace_retpoline+30
__x86_indirect_thunk_rax+33
lapic_next_event+28
clockevents_program_event+148
hrtimer_start_range_ns+528
start_sw_timer+356
restart_apic_timer+85
kvm_set_msr_common+1497
msr_interception+142
vcpu_enter_guest+684
kvm_arch_vcpu_ioctl_run+261
kvm_vcpu_ioctl+559
do_vfs_ioctl+164
ksys_ioctl+96
__x64_sys_ioctl+22
do_syscall_64+89
entry_SYSCALL_64_after_hwframe+68
]: 41474
@[
trace_retpoline+1
__trace_retpoline+30
__x86_indirect_thunk_rax+33
clockevents_program_event+148
hrtimer_start_range_ns+528
start_sw_timer+356
restart_apic_timer+85
kvm_set_msr_common+1497
msr_interception+142
vcpu_enter_guest+684
kvm_arch_vcpu_ioctl_run+261
kvm_vcpu_ioctl+559
do_vfs_ioctl+164
ksys_ioctl+96
__x64_sys_ioctl+22
do_syscall_64+89
entry_SYSCALL_64_after_hwframe+68
]: 41474
@[
trace_retpoline+1
__trace_retpoline+30
__x86_indirect_thunk_rax+33
ktime_get+58
clockevents_program_event+84
hrtimer_start_range_ns+528
start_sw_timer+356
restart_apic_timer+85
kvm_set_msr_common+1497
msr_interception+142
vcpu_enter_guest+684
kvm_arch_vcpu_ioctl_run+261
kvm_vcpu_ioctl+559
do_vfs_ioctl+164
ksys_ioctl+96
__x64_sys_ioctl+22
do_syscall_64+89
entry_SYSCALL_64_after_hwframe+68
]: 41887
@[
trace_retpoline+1
__trace_retpoline+30
__x86_indirect_thunk_rax+33
lapic_next_event+28
clockevents_program_event+148
hrtimer_try_to_cancel+168
hrtimer_cancel+21
kvm_set_lapic_tscdeadline_msr+43
kvm_set_msr_common+1497
msr_interception+142
vcpu_enter_guest+684
kvm_arch_vcpu_ioctl_run+261
kvm_vcpu_ioctl+559
do_vfs_ioctl+164
ksys_ioctl+96
__x64_sys_ioctl+22
do_syscall_64+89
entry_SYSCALL_64_after_hwframe+68
]: 42723
@[
trace_retpoline+1
__trace_retpoline+30
__x86_indirect_thunk_rax+33
clockevents_program_event+148
hrtimer_try_to_cancel+168
hrtimer_cancel+21
kvm_set_lapic_tscdeadline_msr+43
kvm_set_msr_common+1497
msr_interception+142
vcpu_enter_guest+684
kvm_arch_vcpu_ioctl_run+261
kvm_vcpu_ioctl+559
do_vfs_ioctl+164
ksys_ioctl+96
__x64_sys_ioctl+22
do_syscall_64+89
entry_SYSCALL_64_after_hwframe+68
]: 42766
@[
trace_retpoline+1
__trace_retpoline+30
__x86_indirect_thunk_rax+33
ktime_get+58
clockevents_program_event+84
hrtimer_try_to_cancel+168
hrtimer_cancel+21
kvm_set_lapic_tscdeadline_msr+43
kvm_set_msr_common+1497
msr_interception+142
vcpu_enter_guest+684
kvm_arch_vcpu_ioctl_run+261
kvm_vcpu_ioctl+559
do_vfs_ioctl+164
ksys_ioctl+96
__x64_sys_ioctl+22
do_syscall_64+89
entry_SYSCALL_64_after_hwframe+68
]: 42848
@[
trace_retpoline+1
__trace_retpoline+30
__x86_indirect_thunk_rax+33
ktime_get+58
start_sw_timer+279
restart_apic_timer+85
kvm_set_msr_common+1497
msr_interception+142
vcpu_enter_guest+684
kvm_arch_vcpu_ioctl_run+261
kvm_vcpu_ioctl+559
do_vfs_ioctl+164
ksys_ioctl+96
__x64_sys_ioctl+22
do_syscall_64+89
entry_SYSCALL_64_after_hwframe+68
]: 499845
@total: 1780243
SVM has no TSC based programmable preemption timer so it is invoking
ktime_get() frequently.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
It's enough to check the exit value and issue a direct call to avoid
the retpoline for all the common vmexit reasons.
Of course CONFIG_RETPOLINE already forbids gcc to use indirect jumps
while compiling all switch() statements, however switch() would still
allow the compiler to bisect the case value. It's more efficient to
prioritize the most frequent vmexits instead.
The halt may be slow paths from the point of the guest, but not
necessarily so from the point of the host if the host runs at full CPU
capacity and no host CPU is ever left idle.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Eliminate wasteful call/ret non RETPOLINE case and unnecessary fentry
dynamic tracing hooking points.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
uclamp_update_active() should perform the update when
p->uclamp[clamp_id].active is true. But when the logic was inverted in
[1], the if condition wasn't inverted correctly too.
[1] https://lore.kernel.org/lkml/20190902073836.GO2369@hirez.programming.kicks-ass.net/
Reported-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: babbe170e053 ("sched/uclamp: Update CPU's refcount on TG's clamp changes")
Link: https://lkml.kernel.org/r/20191114211052.15116-1-qais.yousef@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add a comment explaining the rational behind having both
no_compat open and ioctl callbacks to fend off compat tasks.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.5/drivers
Pull MD changes from Song.
* 'md-next' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md:
drivers/md/raid5-ppl.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
drivers/md/raid5.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
|
|
Huazhong Tan says:
====================
net: hns3: fixes for -net
This series includes misc fixes for the HNS3 ethernet driver.
[patch 1/3] adds a compatible handling for configuration of
MAC VLAN swithch parameter.
[patch 2/3] re-allocates SSU buffer when pfc_en changed.
[patch 3/3] fixes a bug for ETS bandwidth validation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some device only support 4 TCs, but the driver check the total
bandwidth of 8 TCs, so may cause wrong configurations write to
the hw.
This patch uses hdev->tc_max to instead HNAE3_MAX_TC to fix it.
Fixes: e432abfb99e5 ("net: hns3: add common validation in hclge_dcb")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a TC's PFC is disabled or enabled, the RX private buffer for
this TC need to be changed too, otherwise this may cause packet
dropped problem.
This patch fixes it by calling hclge_buffer_alloc to reallocate
buffer when pfc_en changes.
Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously, hns3 driver just directly send specific setting bit
and mask bits of MAC VLAN switch parameter to the firmware, it
can not be compatible with the old firmware, because the old one
ignores mask bits and covers all bits with new setting bits.
So when running with old firmware, the communication between PF
and VF will fail after resetting or configuring spoof check, since
they will do the MAC VLAN switch parameter configuration.
This patch fixes this problem by reading switch parameter firstly,
then just modifies the corresponding bit and sends it to firmware.
Fixes: dd2956eab104 ("net: hns3: not allow SSU loopback while execute ethtool -t dev")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pre-allocates buffers sufficient for the maximum supported MTU (2026) in
order to eliminate the possibility of resource exhaustion when changing the
MTU while the device is up.
Signed-off-by: Ulrich Hecht <uli+renesas@fpond.eu>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The tipc prefix for log messages generated by tipc was
removed in commit 07f6c4bc048a ("tipc: convert tipc reference
table to use generic rhashtable").
This is still a useful prefix so add it back.
Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://people.freedesktop.org/~agd5f/linux into drm-fixes
drm-fixes-5.4-2019-11-14:
amdgpu:
- Fix a potential crash in firmware parsing
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191114221354.3914-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
- One fix to the dotclock dividers range for sun4i
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191113142645.GA967172@gilmour.lan
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- MOCS table fixes for EHL and TGL
- Update Display's rawclock on resume
- GVT's dmabuf reference drop fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191114055302.GA3564@intel.com
|
|
As it is consistent with prefixes of other write life time hints.
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
As it is consistent with prefixes of other write life time hints.
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
v2: declare as (struct common_firmware_header *) type because
struct xxx_firmware_header inherits from it
When CE's ucode_id(8) is used to get sdma_hdr, we will be accessing an
unallocated amdgpu_firmware_info instance.
This issue appears on rhel7.7 with gcc 4.8.5. Newer compilers might have
optimized out such 'defined but not referenced' variable.
[ 1120.798564] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
[ 1120.806703] IP: [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
[ 1120.813693] PGD 80000002603ff067 PUD 271b8d067 PMD 0
[ 1120.818931] Oops: 0000 [#1] SMP
[ 1120.822245] Modules linked in: amdgpu(OE+) amdkcl(OE) amd_iommu_v2 amdttm(OE) amd_sched(OE) xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun bridge stp llc devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dm_mirror dm_region_hash dm_log dm_mod intel_pmc_core intel_powerclamp coretemp intel_rapl joydev kvm_intel eeepc_wmi asus_wmi kvm sparse_keymap iTCO_wdt irqbypass rfkill crc32_pclmul snd_hda_codec_realtek mxm_wmi ghash_clmulni_intel intel_wmi_thunderbolt iTCO_vendor_support snd_hda_codec_generic snd_hda_codec_hdmi aesni_intel lrw gf128mul glue_helper ablk_helper sg cryptd pcspkr snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd pinctrl_sunrisepoint pinctrl_intel soundcore acpi_pad mei_me wmi mei i2c_i801 pcc_cpufreq ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic i915 i2c_algo_bit iosf_mbi drm_kms_helper e1000e syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm ptp libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw pps_core drm_panel_orientation_quirks video i2c_hid
[ 1120.954136] CPU: 4 PID: 2426 Comm: modprobe Tainted: G OE ------------ 3.10.0-1062.el7.x86_64 #1
[ 1120.964390] Hardware name: System manufacturer System Product Name/Z170-A, BIOS 1302 11/09/2015
[ 1120.973321] task: ffff991ef1e3c1c0 ti: ffff991ee625c000 task.ti: ffff991ee625c000
[ 1120.981020] RIP: 0010:[<ffffffffc0e3c9b3>] [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
[ 1120.990483] RSP: 0018:ffff991ee625f950 EFLAGS: 00010202
[ 1120.995935] RAX: 0000000000000002 RBX: ffff991edf6b2d38 RCX: ffff991edf6a0000
[ 1121.003391] RDX: 0000000000000000 RSI: ffff991f01d13898 RDI: ffffffffc110afb3
[ 1121.010706] RBP: ffff991ee625f9b0 R08: 0000000000000000 R09: 0000000000000000
[ 1121.018029] R10: 00000000000004c4 R11: ffff991ee625f64e R12: ffff991edf6b3220
[ 1121.025353] R13: ffff991edf6a0000 R14: 0000000000000008 R15: ffff991edf6b2d30
[ 1121.032666] FS: 00007f97b0c0b740(0000) GS:ffff991f01d00000(0000) knlGS:0000000000000000
[ 1121.041000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1121.046880] CR2: 000000000000000a CR3: 000000025e604000 CR4: 00000000003607e0
[ 1121.054239] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1121.061631] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1121.068938] Call Trace:
[ 1121.071494] [<ffffffffc0e3dba8>] psp_hw_init+0x218/0x270 [amdgpu]
[ 1121.077886] [<ffffffffc0da3188>] amdgpu_device_fw_loading+0xe8/0x160 [amdgpu]
[ 1121.085296] [<ffffffffc0e3b34c>] ? vega10_ih_irq_init+0x4bc/0x730 [amdgpu]
[ 1121.092534] [<ffffffffc0da5c75>] amdgpu_device_init+0x1495/0x1c90 [amdgpu]
[ 1121.099675] [<ffffffffc0da9cab>] amdgpu_driver_load_kms+0x8b/0x2f0 [amdgpu]
[ 1121.106888] [<ffffffffc01b25cf>] drm_dev_register+0x12f/0x1d0 [drm]
[ 1121.113419] [<ffffffffa4dcdfd8>] ? pci_enable_device_flags+0xe8/0x140
[ 1121.120183] [<ffffffffc0da260a>] amdgpu_pci_probe+0xca/0x170 [amdgpu]
[ 1121.126919] [<ffffffffa4dcf97a>] local_pci_probe+0x4a/0xb0
[ 1121.132622] [<ffffffffa4dd10c9>] pci_device_probe+0x109/0x160
[ 1121.138607] [<ffffffffa4eb4205>] driver_probe_device+0xc5/0x3e0
[ 1121.144766] [<ffffffffa4eb4603>] __driver_attach+0x93/0xa0
[ 1121.150507] [<ffffffffa4eb4570>] ? __device_attach+0x50/0x50
[ 1121.156422] [<ffffffffa4eb1da5>] bus_for_each_dev+0x75/0xc0
[ 1121.162213] [<ffffffffa4eb3b7e>] driver_attach+0x1e/0x20
[ 1121.167771] [<ffffffffa4eb3620>] bus_add_driver+0x200/0x2d0
[ 1121.173590] [<ffffffffa4eb4c94>] driver_register+0x64/0xf0
[ 1121.179345] [<ffffffffa4dd0905>] __pci_register_driver+0xa5/0xc0
[ 1121.185593] [<ffffffffc099f000>] ? 0xffffffffc099efff
[ 1121.190914] [<ffffffffc099f0a4>] amdgpu_init+0xa4/0xb0 [amdgpu]
[ 1121.197101] [<ffffffffa4a0210a>] do_one_initcall+0xba/0x240
[ 1121.202901] [<ffffffffa4b1c90a>] load_module+0x271a/0x2bb0
[ 1121.208598] [<ffffffffa4dad740>] ? ddebug_proc_write+0x100/0x100
[ 1121.214894] [<ffffffffa4b1ce8f>] SyS_init_module+0xef/0x140
[ 1121.220698] [<ffffffffa518bede>] system_call_fastpath+0x25/0x2a
[ 1121.226870] Code: b4 01 60 a2 00 00 31 c0 e8 83 60 33 e4 41 8b 47 08 48 8b 4d d0 48 c7 c7 b3 af 10 c1 48 69 c0 68 07 00 00 48 8b 84 01 60 a2 00 00 <48> 8b 70 08 31 c0 48 89 75 c8 e8 56 60 33 e4 48 8b 4d d0 48 c7
[ 1121.247422] RIP [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
[ 1121.254432] RSP <ffff991ee625f950>
[ 1121.258017] CR2: 000000000000000a
[ 1121.261427] ---[ end trace e98b35387ede75bd ]---
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Fixes: c5fb912653dae3f878 ("drm/amdgpu: add firmware header printing for psp fw loading (v2)")
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The driver misses calling destroy_workqueue in remove like what is done
when probe fails.
Add the missed calls to fix it.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is a bug that checking the same active_list over and over again
in iocg_activate(). The intention of the code was checking whether all
the ancestors and self have already been activated. So fix it.
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
One of the obvious use cases for these commands is networking, where
it's not uncommon to have tons of sockets open and polled for. The
current implementation uses a list for insertion and lookup, which works
fine for file based use cases where the count is usually low, it breaks
down somewhat for higher number of files / sockets. A test case with
30k sockets being polled for and cancelled takes:
real 0m6.968s
user 0m0.002s
sys 0m6.936s
with the patch it takes:
real 0m0.233s
user 0m0.010s
sys 0m0.176s
If you go to 50k sockets, it gets even more abysmal with the current
code:
real 0m40.602s
user 0m0.010s
sys 0m40.555s
with the patch it takes:
real 0m0.398s
user 0m0.000s
sys 0m0.341s
Change is pretty straight forward, just replace the cancel_list with
a red/black tree instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some versions of gcc (so far 6.3 and 7.4) throw a warning:
drivers/block/rbd.c: In function 'rbd_object_map_callback':
drivers/block/rbd.c:2124:21: warning: 'current_state' may be used uninitialized in this function [-Wmaybe-uninitialized]
(current_state == OBJECT_EXISTS && state == OBJECT_EXISTS_CLEAN))
drivers/block/rbd.c:2092:23: note: 'current_state' was declared here
u8 state, new_state, current_state;
^~~~~~~~~~~~~
It's bogus because all current_state accesses are guarded by
has_current_state.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Ceph can in some cases issue an async DIO request, in which case we can
end up calling ceph_end_io_direct before the I/O is actually complete.
That may allow buffered operations to proceed while DIO requests are
still in flight.
Fix this by incrementing the i_dio_count when issuing an async DIO
request, and decrement it when tearing down the aio_req.
Fixes: 321fe13c9398 ("ceph: add buffered/direct exclusionary locking for reads and writes")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Most of the time, we (or the vfs layer) takes the inode_lock and then
acquires caps, but ceph_read_iter does the opposite, and that can lead
to a deadlock.
When there are multiple clients treading over the same data, we can end
up in a situation where a reader takes caps and then tries to acquire
the inode_lock. Another task holds the inode_lock and issues a request
to the MDS which needs to revoke the caps, but that can't happen until
the inode_lock is unwedged.
Fix this by having ceph_read_iter take the inode_lock earlier, before
attempting to acquire caps.
Fixes: 321fe13c9398 ("ceph: add buffered/direct exclusionary locking for reads and writes")
Link: https://tracker.ceph.com/issues/36348
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The recently introduced unit descriptor validation had some bug for
processing and extension units, it counts a bControlSize byte twice so
it expected a bigger size than it should have been. This seems
resulting in a probe error on a few devices.
Fix the calculation for proper checks of PU and EU.
Fixes: 57f8770620e9 ("ALSA: usb-audio: More validations of descriptor units")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191114165613.7422-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix build error when compiling SPARC VDSO with CONFIG_COMPAT=y
- pass correct --arch option to Sparse
* tag 'kbuild-fixes-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: tell sparse about the $ARCH
sparc: vdso: fix build error of vdso32
|