summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-01-19KVM: SVM: Use kvm_vcpu_is_blocking() in AVIC load to handle preemptionSean Christopherson
Use kvm_vcpu_is_blocking() to determine whether or not the vCPU should be marked running during avic_vcpu_load(). Drop avic_is_running, which really should have been named "vcpu_is_not_blocking", as it tracked if the vCPU was blocking, not if it was actually running, e.g. it was set during svm_create_vcpu() when the vCPU was obviously not running. This is technically a teeny tiny functional change, as the vCPU will be marked IsRunning=1 on being reloaded if the vCPU is preempted between svm_vcpu_blocking() and prepare_to_rcuwait(). But that's a benign change as the vCPU will be marked IsRunning=0 when KVM voluntarily schedules out the vCPU. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211208015236.1616697-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: SVM: Remove unnecessary APICv/AVIC update in vCPU unblocking pathSean Christopherson
Remove handling of KVM_REQ_APICV_UPDATE from svm_vcpu_unblocking(), it's no longer needed as it was made obsolete by commit df7e4827c549 ("KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC"). Prior to that commit, the manual check was necessary to ensure the AVIC stuff was updated by avic_set_running() when a request to enable APICv became pending while the vCPU was blocking, as the request handling itself would not do the update. But, as evidenced by the commit, that logic was flawed and subject to various races. Now that svm_refresh_apicv_exec_ctrl() does avic_vcpu_load/put() in response to an APICv status change, drop the manual check in the unblocking path. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211208015236.1616697-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: SVM: Don't bother checking for "running" AVIC when kicking for IPIsSean Christopherson
Drop the avic_vcpu_is_running() check when waking vCPUs in response to a VM-Exit due to incomplete IPI delivery. The check isn't wrong per se, but it's not 100% accurate in the sense that it doesn't guarantee that the vCPU was one of the vCPUs that didn't receive the IPI. The check isn't required for correctness as blocking == !running in this context. From a performance perspective, waking a live task is not expensive as the only moderately costly operation is a locked operation to temporarily disable preemption. And if that is indeed a performance issue, kvm_vcpu_is_blocking() would be a better check than poking into the AVIC. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: SVM: Signal AVIC doorbell iff vCPU is in guest modeSean Christopherson
Signal the AVIC doorbell iff the vCPU is running in the guest. If the vCPU is not IN_GUEST_MODE, it's guaranteed to pick up any pending IRQs on the next VMRUN, which unconditionally processes the vIRR. Add comments to document the logic. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211208015236.1616697-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: x86: Remove defunct pre_block/post_block kvm_x86_ops hooksSean Christopherson
Drop kvm_x86_ops' pre/post_block() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: x86: Unexport LAPIC's switch_to_{hv,sw}_timer() helpersSean Christopherson
Unexport switch_to_{hv,sw}_timer() now that common x86 handles the transitions. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: VMX: Move preemption timer <=> hrtimer dance to common x86Sean Christopherson
Handle the switch to/from the hypervisor/software timer when a vCPU is blocking in common x86 instead of in VMX. Even though VMX is the only user of a hypervisor timer, the logic and all functions involved are generic x86 (unless future CPUs do something completely different and implement a hypervisor timer that runs regardless of mode). Handling the switch in common x86 will allow for the elimination of the pre/post_blocks hooks, and also lets KVM switch back to the hypervisor timer if and only if it was in use (without additional params). Add a comment explaining why the switch cannot be deferred to kvm_sched_out() or kvm_vcpu_block(). Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: Move x86 VMX's posted interrupt list_head to vcpu_vmxSean Christopherson
Move the seemingly generic block_vcpu_list from kvm_vcpu to vcpu_vmx, and rename the list and all associated variables to clarify that it tracks the set of vCPU that need to be poked on a posted interrupt to the wakeup vector. The list is not used to track _all_ vCPUs that are blocking, and the term "blocked" can be misleading as it may refer to a blocking condition in the host or the guest, where as the PI wakeup case is specifically for the vCPUs that are actively blocking from within the guest. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: Drop unused kvm_vcpu.pre_pcpu fieldSean Christopherson
Remove kvm_vcpu.pre_pcpu as it no longer has any users. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: VMX: Handle PI descriptor updates during vcpu_put/loadSean Christopherson
Move the posted interrupt pre/post_block logic into vcpu_put/load respectively, using the kvm_vcpu_is_blocking() to determining whether or not the wakeup handler needs to be set (and unset). This avoids updating the PI descriptor if halt-polling is successful, reduces the number of touchpoints for updating the descriptor, and eliminates the confusing behavior of intentionally leaving a "stale" PI.NDST when a blocking vCPU is scheduled back in after preemption. The downside is that KVM will do the PID update twice if the vCPU is preempted after prepare_to_rcuwait() but before schedule(), but that's a rare case (and non-existent on !PREEMPT kernels). The notable wart is the need to send a self-IPI on the wakeup vector if an outstanding notification is pending after configuring the wakeup vector. Ideally, KVM would just do a kvm_vcpu_wake_up() in this case, but the scheduler doesn't support waking a task from its preemption notifier callback, i.e. while the task is right in the middle of being scheduled out. Note, setting the wakeup vector before halt-polling is not necessary: once the pending IRQ will be recorded in the PIR, kvm_vcpu_has_events() will detect this (via kvm_cpu_get_interrupt(), kvm_apic_get_interrupt(), apic_has_interrupt_for_ppr() and finally vmx_sync_pir_to_irr()) and terminate the polling. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19Merge branch 'kvm-pi-raw-spinlock' into HEADPaolo Bonzini
Bring in fix for VT-d posted interrupts before further changing the code in 5.17. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: avoid warning on s390 in mark_page_dirtyChristian Borntraeger
Avoid warnings on s390 like [ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G E 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1 [ 1801.980938] Workqueue: events irqfd_inject [kvm] [...] [ 1801.981057] Call Trace: [ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm] [ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm] [ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm] [ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm] [ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm] [ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470 [ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498 [ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110 [ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58 [ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40 when writing to a guest from an irqfd worker as long as we do not have the dirty ring. Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reluctantly-acked-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20220113122924.740496-1-borntraeger@linux.ibm.com> Fixes: 2efd61a608b0 ("KVM: Warn if mark_page_dirty() is called without an active vCPU") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: selftests: Add a test to force emulation with a pending exceptionSean Christopherson
Add a VMX specific test to verify that KVM doesn't explode if userspace attempts KVM_RUN when emulation is required with a pending exception. KVM VMX's emulation support for !unrestricted_guest punts exceptions to userspace instead of attempting to synthesize the exception with all the correct state (and stack switching, etc...). Punting is acceptable as there's never been a request to support injecting exceptions when emulating due to invalid state, but KVM has historically assumed that userspace will do the right thing and either clear the exception or kill the guest. Deliberately do the opposite and attempt to re-enter the guest with a pending exception and emulation required to verify KVM continues to punt the combination to userspace, e.g. doesn't explode, WARN, etc... Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211228232437.1875318-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: VMX: Reject KVM_RUN if emulation is required with pending exceptionSean Christopherson
Reject KVM_RUN if emulation is required (because VMX is running without unrestricted guest) and an exception is pending, as KVM doesn't support emulating exceptions except when emulating real mode via vm86. The vCPU is hosed either way, but letting KVM_RUN proceed triggers a WARN due to the impossible condition. Alternatively, the WARN could be removed, but then userspace and/or KVM bugs would result in the vCPU silently running in a bad state, which isn't very friendly to users. Originally, the bug was hit by syzkaller with a nested guest as that doesn't require kvm_intel.unrestricted_guest=0. That particular flavor is likely fixed by commit cd0e615c49e5 ("KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required"), but it's trivial to trigger the WARN with a non-nested guest, and userspace can likely force bad state via ioctls() for a nested guest as well. Checking for the impossible condition needs to be deferred until KVM_RUN because KVM can't force specific ordering between ioctls. E.g. clearing exception.pending in KVM_SET_SREGS doesn't prevent userspace from setting it in KVM_SET_VCPU_EVENTS, and disallowing KVM_SET_VCPU_EVENTS with emulation_required would prevent userspace from queuing an exception and then stuffing sregs. Note, if KVM were to try and detect/prevent the condition prior to KVM_RUN, handle_invalid_guest_state() and/or handle_emulation_failure() would need to be modified to clear the pending exception prior to exiting to userspace. ------------[ cut here ]------------ WARNING: CPU: 6 PID: 137812 at arch/x86/kvm/vmx/vmx.c:1623 vmx_queue_exception+0x14f/0x160 [kvm_intel] CPU: 6 PID: 137812 Comm: vmx_invalid_nes Not tainted 5.15.2-7cc36c3e14ae-pop #279 Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014 RIP: 0010:vmx_queue_exception+0x14f/0x160 [kvm_intel] Code: <0f> 0b e9 fd fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 RSP: 0018:ffffa45c83577d38 EFLAGS: 00010202 RAX: 0000000000000003 RBX: 0000000080000006 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000010002 RDI: ffff9916af734000 RBP: ffff9916af734000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000006 R13: 0000000000000000 R14: ffff9916af734038 R15: 0000000000000000 FS: 00007f1e1a47c740(0000) GS:ffff99188fb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1e1a6a8008 CR3: 000000026f83b005 CR4: 00000000001726e0 Call Trace: kvm_arch_vcpu_ioctl_run+0x13a2/0x1f20 [kvm] kvm_vcpu_ioctl+0x279/0x690 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Reported-by: syzbot+82112403ace4cbd780d8@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211228232437.1875318-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19selftests: kvm/x86: Add test for KVM_SET_PMU_EVENT_FILTERJim Mattson
Verify that the PMU event filter works as expected. Note that the virtual PMU doesn't work as expected on AMD Zen CPUs (an intercepted rdmsr is counted as a retired branch instruction), but the PMU event filter does work. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-7-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19selftests: kvm/x86: Introduce x86_model()Jim Mattson
Extract the x86 model number from CPUID.01H:EAX. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-6-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19selftests: kvm/x86: Export x86_family() for use outside of processor.cJim Mattson
Move this static inline function to processor.h, so that it can be used in individual tests, as needed. Opportunistically replace the bare 'unsigned' with 'unsigned int.' Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-5-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19selftests: kvm/x86: Introduce is_amd_cpu()Jim Mattson
Replace the one ad hoc "AuthenticAMD" CPUID vendor string comparison with a new function, is_amd_cpu(). Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-4-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19selftests: kvm/x86: Parameterize the CPUID vendor string checkJim Mattson
Refactor is_intel_cpu() to make it easier to reuse the bulk of the code for other vendors in the future. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-3-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: x86/pmu: Use binary search to check filtered eventsJim Mattson
The PMU event filter may contain up to 300 events. Replace the linear search in reprogram_gp_counter() with a binary search. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-2-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19cifs: cifs_ses_mark_for_reconnect should also update reconnect bitsShyam Prasad N
Recent restructuring of cifs_reconnect introduced a helper func named cifs_ses_mark_for_reconnect, which updates the state of tcp session for all the channels of a session for reconnect. However, this does not update the session state and chans_need_reconnect bitmask. This change fixes that. Also, cifs_mark_tcp_sess_for_reconnect should mark set the bitmask for all channels when the whole session is marked for reconnect. Fixed that here too. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19cifs: update tcpStatus during negotiate and sess setupShyam Prasad N
Till the end of SMB session setup, update tcpStatus and avoid updating session status field. There was a typo in cifs_setup_session, which caused ses->status to be updated instead. This was causing issues during reconnect. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19cifs: make status checks in version independent callersShyam Prasad N
The status of tcp session, smb session and tcon have the same flow, irrespective of the SMB version used. Hence these status checks and updates should happen in the version independent callers of these commands. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19cifs: remove repeated state change in dfs tree connectShyam Prasad N
cifs_tree_connect checks and sets the tidStatus for the tcon. cifs_tree_connect also calls a dfs specific tree connect function, which also does similar checks. This should not happen. Removing it with this change. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19cifs: fix the cifs_reconnect path for DFSShyam Prasad N
Recently, the cifs_reconnect code was refactored into two branches for regular vs dfs codepath. Some of my recent changes were missing in the dfs path, namely the code to enable periodic DNS query, and a missing lock. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19cifs: remove unused variable ses_selectedMuhammad Usama Anjum
ses_selected is being declared and set at several places. It is not being used. Remove it. Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19cifs: protect all accesses to chan_* with chan_lockShyam Prasad N
A spin lock called chan_lock was introduced recently. But not all accesses were protected. Doing that with this change. To make sure that a channel is not freed when in use, we need to introduce a ref count. But today, we don't ever free channels. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19cifs: fix the connection state transitions with multichannelShyam Prasad N
Recent changes to multichannel required some adjustments in the way connection states transitioned during/after reconnect. Also some minor fixes: 1. A pending switch of GlobalMid_Lock to cifs_tcp_ses_lock 2. Relocations of the code that logs reconnect 3. Changed some code in allocate_mid to suit the new scheme Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19cifs: check reconnects for channels of active tcons tooShyam Prasad N
With the new multichannel logic, when a channel needs reconnection, the tree connect and other channels can still be active. This fix will handle cases of checking for channel reconnect, when the tcon does not need reconnect. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19kvm: selftests: conditionally build vm_xsave_req_perm()Wei Wang
vm_xsave_req_perm() is currently defined and used by x86_64 only. Make it compiled into vm_create_with_vcpus() only when on x86_64 machines. Otherwise, it would cause linkage errors, e.g. on s390x. Fixes: 415a3c33e8 ("kvm: selftests: Add support for KVM_CAP_XSAVE2") Reported-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Tested-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Message-Id: <20220118014817.30910-1-wei.w.wang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: x86/cpuid: Clear XFD for component i if the base feature is missingLike Xu
According to Intel extended feature disable (XFD) spec, the sub-function i (i > 1) of CPUID function 0DH enumerates "details for state component i. ECX[2] enumerates support for XFD support for this state component." If KVM does not report F(XFD) feature (e.g. due to CONFIG_X86_64), then the corresponding XFD support for any state component i should also be removed. Translate this dependency into KVM terms. Fixes: 690a757d610e ("kvm: x86: Add CPUID support for Intel AMX") Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220117074531.76925-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: x86/mmu: Improve TLB flush comment in kvm_mmu_slot_remove_write_access()David Matlack
Rewrite the comment in kvm_mmu_slot_remove_write_access() that explains why it is safe to flush TLBs outside of the MMU lock after write-protecting SPTEs for dirty logging. The current comment is a long run-on sentence that was difficult to understand. In addition it was specific to the shadow MMU (mentioning mmu_spte_update()) when the TDP MMU has to handle this as well. The new comment explains: - Why the TLB flush is necessary at all. - Why it is desirable to do the TLB flush outside of the MMU lock. - Why it is safe to do the TLB flush outside of the MMU lock. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220113233020.3986005-5-dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: x86/mmu: Document and enforce MMU-writable and Host-writable invariantsDavid Matlack
SPTEs are tagged with software-only bits to indicate if it is "MMU-writable" and "Host-writable". These bits are used to determine why KVM has marked an SPTE as read-only. Document these bits and their invariants, and enforce the invariants with new WARNs in spte_can_locklessly_be_made_writable() to ensure they are not accidentally violated in the future. Opportunistically move DEFAULT_SPTE_{MMU,HOST}_WRITABLE next to EPT_SPTE_{MMU,HOST}_WRITABLE since the new documentation applies to both. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220113233020.3986005-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: x86/mmu: Clear MMU-writable during changed_pte notifierDavid Matlack
When handling the changed_pte notifier and the new PTE is read-only, clear both the Host-writable and MMU-writable bits in the SPTE. This preserves the invariant that MMU-writable is set if-and-only-if Host-writable is set. No functional change intended. Nothing currently relies on the aforementioned invariant and technically the changed_pte notifier is dead code. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220113233020.3986005-3-dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMUDavid Matlack
When the TDP MMU is write-protection GFNs for page table protection (as opposed to for dirty logging, or due to the HVA not being writable), it checks if the SPTE is already write-protected and if so skips modifying the SPTE and the TLB flush. This behavior is incorrect because it fails to check if the SPTE is write-protected for page table protection, i.e. fails to check that MMU-writable is '0'. If the SPTE was write-protected for dirty logging but not page table protection, the SPTE could locklessly be made writable, and vCPUs could still be running with writable mappings cached in their TLB. Fix this by only skipping setting the SPTE if the SPTE is already write-protected *and* MMU-writable is already clear. Technically, checking only MMU-writable would suffice; a SPTE cannot be writable without MMU-writable being set. But check both to be paranoid and because it arguably yields more readable code. Fixes: 46044f72c382 ("kvm: x86/mmu: Support write protection for nesting in tdp MMU") Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220113233020.3986005-2-dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19perf machine: Use path__join() to compose a path instead of snprintf(dir, ↵Arnaldo Carvalho de Melo
'/', filename) Its more intention revealing, and if we're interested in the odd cases where this may end up truncating we can do debug checks at one centralized place. Motivation, of all the container builds, fedora rawhide started complaining of: util/machine.c: In function ‘machine__create_modules’: util/machine.c:1419:50: error: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=] 1419 | snprintf(path, sizeof(path), "%s/%s", dir_name, dent->d_name); | ^~ In file included from /usr/include/stdio.h:894, from util/branch.h:9, from util/callchain.h:8, from util/machine.c:7: In function ‘snprintf’, inlined from ‘maps__set_modules_path_dir’ at util/machine.c:1419:3, inlined from ‘machine__set_modules_path’ at util/machine.c:1473:9, inlined from ‘machine__create_modules’ at util/machine.c:1519:7: /usr/include/bits/stdio2.h:71:10: note: ‘__builtin___snprintf_chk’ output between 2 and 4352 bytes into a destination of size 4096 There are other places where we should use path__join(), but lets get rid of this one first. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: Link: https://lore.kernel.org/r/YebZKjwgfdOz0lAs@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-19ALSA: core: Simplify snd_power_ref_and_wait() with the standard macroTakashi Iwai
Use wait_event_cmd() macro and simplify snd_power_ref_wait() implementation. This may also cover possible races in the current open code, too. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20220119091050.30125-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-01-19Merge branch 'ipv4-avoid-pathological-hash-tables'Jakub Kicinski
Eric Dumazet says: ==================== ipv4: avoid pathological hash tables This series speeds up netns dismantles on hosts having many active netns, by making sure two hash tables used for IPV4 fib contains uniformly spread items. v2: changed second patch to add fib_info_laddrhash_bucket() for consistency (David Ahern suggestion). ==================== Link: https://lore.kernel.org/r/20220119100413.4077866-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keysEric Dumazet
net/ipv4/fib_semantics.c uses a hash table (fib_info_laddrhash) in which fib_sync_down_addr() can locate fib_info based on IPv4 local address. This hash table is resized based on total number of hashed fib_info, but the hash function is only using the local address. For hosts having many active network namespaces, all fib_info for loopback devices (IPv4 address 127.0.0.1) are hashed into a single bucket, making netns dismantles very slow. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19ipv4: avoid quadratic behavior in netns dismantleEric Dumazet
net/ipv4/fib_semantics.c uses an hash table of 256 slots, keyed by device ifindexes: fib_info_devhash[DEVINDEX_HASHSIZE] Problem is that with network namespaces, devices tend to use the same ifindex. lo device for instance has a fixed ifindex of one, for all network namespaces. This means that hosts with thousands of netns spend a lot of time looking at some hash buckets with thousands of elements, notably at netns dismantle. Simply add a per netns perturbation (net_hash_mix()) to spread elements more uniformely. Also change fib_devindex_hashfn() to use more entropy. Fixes: aa79e66eee5d ("net: Make ifindex generation per-net namespace") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19Merge branch 'net-fsl-xgmac_mdio-add-workaround-for-erratum-a-009885'Jakub Kicinski
Tobias Waldekranz says: ==================== net/fsl: xgmac_mdio: Add workaround for erratum A-009885 The individual messages mostly speak for themselves. It is very possible that there are more chips out there that are impacted by this, but I only have access to the errata document for the T1024 family, so I've limited the DT changes to the exact FMan version used in that device. Hopefully someone from NXP can supply a follow-up if need be. The final commit is an unrelated fix that was brought to my attention by sparse. ==================== Link: https://lore.kernel.org/r/20220118215054.2629314-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19net/fsl: xgmac_mdio: Fix incorrect iounmap when removing moduleTobias Waldekranz
As reported by sparse: In the remove path, the driver would attempt to unmap its own priv pointer - instead of the io memory that it mapped in probe. Fixes: 9f35a7342cff ("net/fsl: introduce Freescale 10G MDIO driver") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO busesTobias Waldekranz
This block is used in (at least) T1024 and T1040, including their variants like T1023 etc. Fixes: d55ad2967d89 ("powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19dt-bindings: net: Document fsl,erratum-a009885Tobias Waldekranz
Update FMan binding documentation with the newly added workaround for erratum A-009885. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19net/fsl: xgmac_mdio: Add workaround for erratum A-009885Tobias Waldekranz
Once an MDIO read transaction is initiated, we must read back the data register within 16 MDC cycles after the transaction completes. Outside of this window, reads may return corrupt data. Therefore, disable local interrupts in the critical section, to maximize the probability that we can satisfy this requirement. Fixes: d55ad2967d89 ("powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19dt-bindings: rtc: st,stm32-rtc: Make each example a separate entryRob Herring
Each independent example should be a separate entry. This allows for 'interrupts' to have different cell sizes. Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220106182518.1435497-7-robh@kernel.org
2022-01-19dt-bindings: mmc: arm,pl18x: Make each example a separate entryRob Herring
Each independent example should be a separate entry. This and dropping 'interrupt-parent' allows for 'interrupts' to have different cell sizes. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20220106182518.1435497-6-robh@kernel.org
2022-01-19dt-bindings: display: Add SPI peripheral schema to SPI based displaysRob Herring
With 'unevaluatedProperties' support enabled, several SPI based display binding examples have warnings: Documentation/devicetree/bindings/display/panel/samsung,ld9040.example.dt.yaml: lcd@0: Unevaluated properties are not allowed ('#address-cells', '#size-cells', 'spi-max-frequency', 'spi-cpol', 'spi-cpha' were unexpected) Documentation/devicetree/bindings/display/panel/kingdisplay,kd035g6-54nt.example.dt.yaml: panel@0: Unevaluated properties are not allowed ('spi-max-frequency', 'spi-3wire' were unexpected) Documentation/devicetree/bindings/display/panel/ilitek,ili9322.example.dt.yaml: display@0: Unevaluated properties are not allowed ('reg' was unexpected) Documentation/devicetree/bindings/display/panel/samsung,s6e63m0.example.dt.yaml: display@0: Unevaluated properties are not allowed ('spi-max-frequency' was unexpected) Documentation/devicetree/bindings/display/panel/abt,y030xx067a.example.dt.yaml: panel@0: Unevaluated properties are not allowed ('spi-max-frequency' was unexpected) Documentation/devicetree/bindings/display/panel/sony,acx565akm.example.dt.yaml: panel@2: Unevaluated properties are not allowed ('spi-max-frequency', 'reg' were unexpected) Documentation/devicetree/bindings/display/panel/tpo,td.example.dt.yaml: panel@0: Unevaluated properties are not allowed ('spi-max-frequency', 'spi-cpol', 'spi-cpha' were unexpected) Documentation/devicetree/bindings/display/panel/lgphilips,lb035q02.example.dt.yaml: panel@0: Unevaluated properties are not allowed ('reg', 'spi-max-frequency', 'spi-cpol', 'spi-cpha' were unexpected) Documentation/devicetree/bindings/display/panel/innolux,ej030na.example.dt.yaml: panel@0: Unevaluated properties are not allowed ('spi-max-frequency' was unexpected) Documentation/devicetree/bindings/display/panel/sitronix,st7789v.example.dt.yaml: panel@0: Unevaluated properties are not allowed ('spi-max-frequency', 'spi-cpol', 'spi-cpha' were unexpected) Fix all of these by adding a reference to spi-peripheral-props.yaml. With this, the description that the binding must follow spi-controller.yaml is both a bit out of date and redundant, so remove it. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Paul Cercueil <paul@crapouillou.net> Acked-by: Sam Ravnborg <sam@ravnborg.org> Link: https://lore.kernel.org/r/20211221125209.1195932-1-robh@kernel.org
2022-01-19HID: uhid: Use READ_ONCE()/WRITE_ONCE() for ->runningJann Horn
The flag uhid->running can be set to false by uhid_device_add_worker() without holding the uhid->devlock. Mark all reads/writes of the flag that might race with READ_ONCE()/WRITE_ONCE() for clarity and correctness. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-01-19HID: uhid: Fix worker destroying device without any protectionJann Horn
uhid has to run hid_add_device() from workqueue context while allowing parallel use of the userspace API (which is protected with ->devlock). But hid_add_device() can fail. Currently, that is handled by immediately destroying the associated HID device, without using ->devlock - but if there are concurrent requests from userspace, that's wrong and leads to NULL dereferences and/or memory corruption (via use-after-free). Fix it by leaving the HID device as-is in the worker. We can clean it up later, either in the UHID_DESTROY command handler or in the ->release() handler. Cc: stable@vger.kernel.org Fixes: 67f8ecc550b5 ("HID: uhid: fix timeout when probe races with IO") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>