summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-04-17KVM: introduce KVM_CAP_SET_GUEST_DEBUG2Paolo Bonzini
This capability will allow the user to know which KVM_GUESTDBG_* bits are supported. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401135451.1004564-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86: pending exceptions must not be blocked by an injected eventMaxim Levitsky
Injected interrupts/nmi should not block a pending exception, but rather be either lost if nested hypervisor doesn't intercept the pending exception (as in stock x86), or be delivered in exitintinfo/IDT_VECTORING_INFO field, as a part of a VMexit that corresponds to the pending exception. The only reason for an exception to be blocked is when nested run is pending (and that can't really happen currently but still worth checking for). Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: selftests: remove redundant semi-colonYang Yingliang
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Message-Id: <20210401142514.1688199-1-yangyingliang@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: nSVM: call nested_svm_load_cr3 on nested state loadMaxim Levitsky
While KVM's MMU should be fully reset by loading of nested CR0/CR3/CR4 by KVM_SET_SREGS, we are not in nested mode yet when we do it and therefore only root_mmu is reset. On regular nested entries we call nested_svm_load_cr3 which both updates the guest's CR3 in the MMU when it is needed, and it also initializes the mmu again which makes it initialize the walk_mmu as well when nested paging is enabled in both host and guest. Since we don't call nested_svm_load_cr3 on nested state load, the walk_mmu can be left uninitialized, which can lead to a NULL pointer dereference while accessing it if we happen to get a nested page fault right after entering the nested guest first time after the migration and we decide to emulate it, which leads to the emulator trying to access walk_mmu->gva_to_gpa which is NULL. Therefore we should call this function on nested state load as well. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401141814.1029036-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86: dump_vmcs should include the autoload/autostore MSR listsDavid Edmondson
When dumping the current VMCS state, include the MSRs that are being automatically loaded/stored during VM entry/exit. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-6-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86: dump_vmcs should show the effective EFERDavid Edmondson
If EFER is not being loaded from the VMCS, show the effective value by reference to the MSR autoload list or calculation. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-5-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86: dump_vmcs should consider only the load controls of EFER/PATDavid Edmondson
When deciding whether to dump the GUEST_IA32_EFER and GUEST_IA32_PAT fields of the VMCS, examine only the VM entry load controls, as saving on VM exit has no effect on whether VM entry succeeds or fails. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-4-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86: dump_vmcs should not conflate EFER and PAT presence in VMCSDavid Edmondson
Show EFER and PAT based on their individual entry/exit controls. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-3-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86: dump_vmcs should not assume GUEST_IA32_EFER is validDavid Edmondson
If the VM entry/exit controls for loading/saving MSR_EFER are either not available (an older processor or explicitly disabled) or not used (host and guest values are the same), reading GUEST_IA32_EFER from the VMCS returns an inaccurate value. Because of this, in dump_vmcs() don't use GUEST_IA32_EFER to decide whether to print the PDPTRs - always do so if the fields exist. Fixes: 4eb64dce8d0a ("KVM: x86: dump VMCS on invalid entry") Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-2-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: nSVM: improve SYSENTER emulation on AMDMaxim Levitsky
Currently to support Intel->AMD migration, if CPU vendor is GenuineIntel, we emulate the full 64 value for MSR_IA32_SYSENTER_{EIP|ESP} msrs, and we also emulate the sysenter/sysexit instruction in long mode. (Emulator does still refuse to emulate sysenter in 64 bit mode, on the ground that the code for that wasn't tested and likely has no users) However when virtual vmload/vmsave is enabled, the vmload instruction will update these 32 bit msrs without triggering their msr intercept, which will lead to having stale values in kvm's shadow copy of these msrs, which relies on the intercept to be up to date. Fix/optimize this by doing the following: 1. Enable the MSR intercepts for SYSENTER MSRs iff vendor=GenuineIntel (This is both a tiny optimization and also ensures that in case the guest cpu vendor is AMD, the msrs will be 32 bit wide as AMD defined). 2. Store only high 32 bit part of these msrs on interception and combine it with hardware msr value on intercepted read/writes iff vendor=GenuineIntel. 3. Disable vmload/vmsave virtualization if vendor=GenuineIntel. (It is somewhat insane to set vendor=GenuineIntel and still enable SVM for the guest but well whatever). Then zero the high 32 bit parts when kvm intercepts and emulates vmload. Thanks a lot to Paulo Bonzini for helping me with fixing this in the most correct way. This patch fixes nested migration of 32 bit nested guests, that was broken because incorrect cached values of SYSENTER msrs were stored in the migration stream if L1 changed these msrs with vmload prior to L2 entry. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401111928.996871-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86: add guest_cpuid_is_intelMaxim Levitsky
This is similar to existing 'guest_cpuid_is_amd_or_hygon' Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401111928.996871-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86: Account a variety of miscellaneous allocationsSean Christopherson
Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are clearly associated with a single task/VM. Note, there are a several SEV allocations that aren't accounted, but those can (hopefully) be fixed by using the local stack for memory. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331023025.2485960-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: SVM: Do not allow SEV/SEV-ES initialization after vCPUs are createdSean Christopherson
Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one or more vCPUs have been created. KVM assumes a VM is tagged SEV/SEV-ES prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV enabled, and svm_create_vcpu() needs to allocate the VMSA. At best, creating vCPUs before SEV/SEV-ES init will lead to unexpected errors and/or behavior, and at worst it will crash the host, e.g. sev_launch_update_vmsa() will dereference a null svm->vmsa pointer. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331031936.2495277-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: SVM: Do not set sev->es_active until KVM_SEV_ES_INIT completesSean Christopherson
Set sev->es_active only after the guts of KVM_SEV_ES_INIT succeeds. If the command fails, e.g. because SEV is already active or there are no available ASIDs, then es_active will be left set even though the VM is not fully SEV-ES capable. Refactor the code so that "es_active" is passed on the stack instead of being prematurely shoved into sev_info, both to avoid having to unwind sev_info and so that it's more obvious what actually consumes es_active in sev_guest_init() and its helpers. Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331031936.2495277-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: SVM: Use online_vcpus, not created_vcpus, to iterate over vCPUsSean Christopherson
Use the kvm_for_each_vcpu() helper to iterate over vCPUs when encrypting VMSAs for SEV, which effectively switches to use online_vcpus instead of created_vcpus. This fixes a possible null-pointer dereference as created_vcpus does not guarantee a vCPU exists, since it is updated at the very beginning of KVM_CREATE_VCPU. created_vcpus exists to allow the bulk of vCPU creation to run in parallel, while still correctly restricting the max number of max vCPUs. Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331031936.2495277-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86/mmu: Simplify code for aging SPTEs in TDP MMUSean Christopherson
Use a basic NOT+AND sequence to clear the Accessed bit in TDP MMU SPTEs, as opposed to the fancy ffs()+clear_bit() logic that was copied from the legacy MMU. The legacy MMU uses clear_bit() because it is operating on the SPTE itself, i.e. clearing needs to be atomic. The TDP MMU operates on a local variable that it later writes to the SPTE, and so doesn't need to be atomic or even resident in memory. Opportunistically drop unnecessary initialization of new_spte, it's guaranteed to be written before being accessed. Using NOT+AND instead of ffs()+clear_bit() reduces the sequence from: 0x0000000000058be6 <+134>: test %rax,%rax 0x0000000000058be9 <+137>: je 0x58bf4 <age_gfn_range+148> 0x0000000000058beb <+139>: test %rax,%rdi 0x0000000000058bee <+142>: je 0x58cdc <age_gfn_range+380> 0x0000000000058bf4 <+148>: mov %rdi,0x8(%rsp) 0x0000000000058bf9 <+153>: mov $0xffffffff,%edx 0x0000000000058bfe <+158>: bsf %eax,%edx 0x0000000000058c01 <+161>: movslq %edx,%rdx 0x0000000000058c04 <+164>: lock btr %rdx,0x8(%rsp) 0x0000000000058c0b <+171>: mov 0x8(%rsp),%r15 to: 0x0000000000058bdd <+125>: test %rax,%rax 0x0000000000058be0 <+128>: je 0x58beb <age_gfn_range+139> 0x0000000000058be2 <+130>: test %rax,%r8 0x0000000000058be5 <+133>: je 0x58cc0 <age_gfn_range+352> 0x0000000000058beb <+139>: not %rax 0x0000000000058bee <+142>: and %r8,%rax 0x0000000000058bf1 <+145>: mov %rax,%r15 thus eliminating several memory accesses, including a locked access. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331004942.2444916-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86/mmu: Remove spurious clearing of dirty bit from TDP MMU SPTESean Christopherson
Don't clear the dirty bit when aging a TDP MMU SPTE (in response to a MMU notifier event). Prematurely clearing the dirty bit could cause spurious PML updates if aging a page happened to coincide with dirty logging. Note, tdp_mmu_set_spte_no_acc_track() flows into __handle_changed_spte(), so the host PFN will be marked dirty, i.e. there is no potential for data corruption. Fixes: a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331004942.2444916-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86/mmu: Drop trace_kvm_age_page() tracepointSean Christopherson
Remove x86's trace_kvm_age_page() tracepoint. It's mostly redundant with the common trace_kvm_age_hva() tracepoint, and if there is a need for the extra details, e.g. gfn, referenced, etc... those details should be added to the common tracepoint so that all architectures and MMUs benefit from the info. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: Move arm64's MMU notifier trace events to generic codeSean Christopherson
Move arm64's MMU notifier trace events into common code in preparation for doing the hva->gfn lookup in common code. The alternative would be to trace the gfn instead of hva, but that's not obviously better and could also be done in common code. Tracing the notifiers is also quite handy for debug regardless of architecture. Remove a completely redundant tracepoint from PPC e500. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: Move prototypes for MMU notifier callbacks to generic codeSean Christopherson
Move the prototypes for the MMU notifier callbacks out of arch code and into common code. There is no benefit to having each arch replicate the prototypes since any deviation from the invocation in common code will explode. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86/mmu: Use leaf-only loop for walking TDP SPTEs when changing SPTESean Christopherson
Use the leaf-only TDP iterator when changing the SPTE in reaction to a MMU notifier. Practically speaking, this is a nop since the guts of the loop explicitly looks for 4k SPTEs, which are always leaf SPTEs. Switch the iterator to match age_gfn_range() and test_age_gfn() so that a future patch can consolidate the core iterating logic. No real functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86/mmu: Pass address space ID to TDP MMU root walkersSean Christopherson
Move the address space ID check that is performed when iterating over roots into the macro helpers to consolidate code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86/mmu: Pass address space ID to __kvm_tdp_mmu_zap_gfn_range()Sean Christopherson
Pass the address space ID to TDP MMU's primary "zap gfn range" helper to allow the MMU notifier paths to iterate over memslots exactly once. Currently, both the legacy MMU and TDP MMU iterate over memslots when looking for an overlapping hva range, which can be quite costly if there are a large number of memslots. Add a "flush" parameter so that iterating over multiple address spaces in the caller will continue to do the right thing when yielding while a flush is pending from a previous address space. Note, this also has a functional change in the form of coalescing TLB flushes across multiple address spaces in kvm_zap_gfn_range(), and also optimizes the TDP MMU to utilize range-based flushing when running as L1 with Hyper-V enlightenments. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-6-seanjc@google.com> [Keep separate for loops to prepare for other incoming patches. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86/mmu: Coalesce TLB flushes across address spaces for gfn range zapSean Christopherson
Gather pending TLB flushes across both address spaces when zapping a given gfn range. This requires feeding "flush" back into subsequent calls, but on the plus side sets the stage for further batching between the legacy MMU and TDP MMU. It also allows refactoring the address space iteration to cover the legacy and TDP MMUs without introducing truly ugly code. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86/mmu: Coalesce TLB flushes when zapping collapsible SPTEsSean Christopherson
Gather pending TLB flushes across both the legacy and TDP MMUs when zapping collapsible SPTEs to avoid multiple flushes if both the legacy MMU (for nested guests) and TDP MMU have mappings for the memslot. Note, this also optimizes the TDP MMU to flush only the relevant range when running as L1 with Hyper-V enlightenments. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86/mmu: Move flushing for "slot" handlers to caller for legacy MMUSean Christopherson
Place the onus on the caller of slot_handle_*() to flush the TLB, rather than handling the flush in the helper, and rename parameters accordingly. This will allow future patches to coalesce flushes between address spaces and between the legacy and TDP MMUs. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86/mmu: Coalesce TDP MMU TLB flushes when zapping collapsible SPTEsSean Christopherson
When zapping collapsible SPTEs across multiple roots, gather pending flushes and perform a single remote TLB flush at the end, as opposed to flushing after processing every root. Note, flush may be cleared by the result of zap_collapsible_spte_range(). This is intended and correct, e.g. yielding may have serviced a prior pending flush. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: x86/vPMU: Forbid reading from MSR_F15H_PERF MSRs when guest doesn't ↵Vitaly Kuznetsov
have X86_FEATURE_PERFCTR_CORE MSR_F15H_PERF_CTL0-5, MSR_F15H_PERF_CTR0-5 MSRs have a CPUID bit assigned to them (X86_FEATURE_PERFCTR_CORE) and when it wasn't exposed to the guest the correct behavior is to inject #GP an not just return zero. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210329124804.170173-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in ↵Krish Sadhukhan
nested_svm_vmexit() According to APM, the #DB intercept for a single-stepped VMRUN must happen after the completion of that instruction, when the guest does #VMEXIT to the host. However, in the current implementation of KVM, the #DB intercept for a single-stepped VMRUN happens after the completion of the instruction that follows the VMRUN instruction. When the #DB intercept handler is invoked, it shows the RIP of the instruction that follows VMRUN, instead of of VMRUN itself. This is an incorrect RIP as far as single-stepping VMRUN is concerned. This patch fixes the problem by checking, in nested_svm_vmexit(), for the condition that the VMRUN instruction is being single-stepped and if so, queues the pending #DB intercept so that the #DB is accounted for before we execute L1's next instruction. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oraacle.com> Message-Id: <20210323175006.73249-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17KVM: MMU: load PDPTRs outside mmu_lockPaolo Bonzini
On SVM, reading PDPTRs might access guest memory, which might fault and thus might sleep. On the other hand, it is not possible to release the lock after make_mmu_pages_available has been called. Therefore, push the call to make_mmu_pages_available and the mmu_lock critical section within mmu_alloc_direct_roots and mmu_alloc_shadow_roots. Reported-by: Wanpeng Li <wanpengli@tencent.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17Merge remote-tracking branch 'tip/x86/sgx' into kvm-nextPaolo Bonzini
Pull generic x86 SGX changes needed to support SGX in virtual machines.
2021-04-17Merge tag 'kvm-s390-next-5.13-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix potential crash in preemptible kernels There is a potential race for preemptible kernels, where the host kernel would get a fault when it is preempted as the wrong point in time.
2021-04-17powerpc/traps: Enhance readability for trap typesXiongwei Song
Define macros to list ppc interrupt types in interttupt.h, replace the reference of the trap hex values with these macros. Referred the hex numbers in arch/powerpc/kernel/exceptions-64e.S, arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S, arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> [mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com
2021-04-17locking/qrwlock: Fix ordering in queued_write_lock_slowpath()Ali Saidi
While this code is executed with the wait_lock held, a reader can acquire the lock without holding wait_lock. The writer side loops checking the value with the atomic_cond_read_acquire(), but only truly acquires the lock when the compare-and-exchange is completed successfully which isn’t ordered. This exposes the window between the acquire and the cmpxchg to an A-B-A problem which allows reads following the lock acquisition to observe values speculatively before the write lock is truly acquired. We've seen a problem in epoll where the reader does a xchg while holding the read lock, but the writer can see a value change out from under it. Writer | Reader -------------------------------------------------------------------------------- ep_scan_ready_list() | |- write_lock_irq() | |- queued_write_lock_slowpath() | |- atomic_cond_read_acquire() | | read_lock_irqsave(&ep->lock, flags); --> (observes value before unlock) | chain_epi_lockless() | | epi->next = xchg(&ep->ovflist, epi); | | read_unlock_irqrestore(&ep->lock, flags); | | | atomic_cmpxchg_relaxed() | |-- READ_ONCE(ep->ovflist); | A core can order the read of the ovflist ahead of the atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire semantics addresses this issue at which point the atomic_cond_read can be switched to use relaxed semantics. Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock") Signed-off-by: Ali Saidi <alisaidi@amazon.com> [peterz: use try_cmpxchg()] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steve Capper <steve.capper@arm.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Tested-by: Steve Capper <steve.capper@arm.com>
2021-04-17sched/debug: Rename the sched_debug parameter to sched_verbosePeter Zijlstra
CONFIG_SCHED_DEBUG is the build-time Kconfig knob, the boot param sched_debug and the /debug/sched/debug_enabled knobs control the sched_debug_enabled variable, but what they really do is make SCHED_DEBUG more verbose, so rename the lot. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2021-04-17Merge tag 'iwlwifi-next-for-kalle-2021-04-12-v2' of ↵Kalle Valo
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next iwlwifi patches for v5.13 * Add support for new FTM FW APIs; * Some CSA fixes; * Support for new HW family and other HW detection fixes; * Robustness improvement in the HW detection code; * One fix in PMF; * Some new regulatory features; * Support for passive scan in 6GHz; * Some improvements in the sync queue implementation; * Support for new devices; * Support for a new FW API command version; * Some locking fixes; * Bump the FW API version support for AX devices; * Some other small fixes, clean-ups and improvements. # gpg: Signature made Wed 14 Apr 2021 12:33:29 PM EEST using RSA key ID 1A3CC5FA # gpg: Good signature from "Luciano Roth Coelho (Luca) <luca@coelho.fi>" # gpg: aka "Luciano Roth Coelho (Intel) <luciano.coelho@intel.com>"
2021-04-17Merge tag 'mt76-for-kvalo-2021-04-12' of https://github.com/nbd168/wirelessKalle Valo
mt76 patches for 5.13 * code cleanup * mt7915/mt7615 decap offload support * driver fixes * mt7613 eeprom support * MCU code unification * threaded NAPI support * new device IDs * mt7921 device reset support * rx timestamp support # gpg: Signature made Tue 13 Apr 2021 12:11:25 AM EEST using DSA key ID 02A76EF5 # gpg: Good signature from "Felix Fietkau <nbd@nbd.name>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 75D1 1A7D 91A7 710F 4900 42EF D77D 141D 02A7 6EF5
2021-04-17ALSA: usb-audio: Add support for many Roland devices' implicit feedback quirksLucas Endres
It makes USB audio capture and playback possible and pristine on my Roland INTEGRA-7, Boutique D-05, and R-26, along with many more I've encountered people having had issues with over the last decade or so. Signed-off-by: Lucas Endres <jaffa225man@gmail.com> Link: https://lore.kernel.org/r/CAOsVg8rA61B=005_VyUwpw3piVwA7Bo5fs1GYEB054efyzGjLw@mail.gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-04-16cxl/mem: Fix memory device capacity probingDan Williams
The CXL Identify Memory Device output payload emits capacity in 256MB units. The driver is treating the capacity field as bytes. This was missed because QEMU reports bytes when it should report bytes / 256MB. Fixes: 8adaf747c9f0 ("cxl/mem: Find device capabilities") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Cc: Ben Widawsky <ben.widawsky@intel.com> Link: https://lore.kernel.org/r/161862021044.3259705.7008520073059739760.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-04-17powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.hTony Ambardar
A few archs like powerpc have different errno.h values for macros EDEADLOCK and EDEADLK. In code including both libc and linux versions of errno.h, this can result in multiple definitions of EDEADLOCK in the include chain. Definitions to the same value (e.g. seen with mips) do not raise warnings, but on powerpc there are redefinitions changing the value, which raise warnings and errors (if using "-Werror"). Guard against these redefinitions to avoid build errors like the following, first seen cross-compiling libbpf v5.8.9 for powerpc using GCC 8.4.0 with musl 1.1.24: In file included from ../../arch/powerpc/include/uapi/asm/errno.h:5, from ../../include/linux/err.h:8, from libbpf.c:29: ../../include/uapi/asm-generic/errno.h:40: error: "EDEADLOCK" redefined [-Werror] #define EDEADLOCK EDEADLK In file included from toolchain-powerpc_8540_gcc-8.4.0_musl/include/errno.h:10, from libbpf.c:26: toolchain-powerpc_8540_gcc-8.4.0_musl/include/bits/errno.h:58: note: this is the location of the previous definition #define EDEADLOCK 58 cc1: all warnings being treated as errors Cc: Stable <stable@vger.kernel.org> Reported-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200917135437.1238787-1-Tony.Ambardar@gmail.com
2021-04-17powerpc/smp: Cache CPU to chip lookupSrikar Dronamraju
On systems with large CPUs per node, even with the filtered matching of related CPUs, there can be large number of calls to cpu_to_chip_id for the same CPU. For example with 4096 vCPU, 1 node QEMU configuration, with 4 threads per core, system could be see upto 1024 calls to cpu_to_chip_id() for the same CPU. On a given system, cpu_to_chip_id() for a given CPU would always return the same. Hence cache the result in a lookup table for use in subsequent calls. Since all CPUs sharing the same core will belong to the same chip, the lookup_table has an entry for one CPU per core. chip_id_lookup_table is not being freed and would be used on subsequent CPU online post CPU offline. Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210415120934.232271-4-srikar@linux.vnet.ibm.com
2021-04-17Revert "powerpc/topology: Update topology_core_cpumask"Srikar Dronamraju
Now that cpu_core_mask has been reintroduced, lets revert commit 4bce545903fa ("powerpc/topology: Update topology_core_cpumask") Post this commit, lscpu should reflect topologies as requested by a user when a QEMU instance is launched with NUMA spanning multiple sockets. Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210415120934.232271-3-srikar@linux.vnet.ibm.com
2021-04-17powerpc/smp: Reintroduce cpu_core_maskSrikar Dronamraju
Daniel reported that with Commit 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask") QEMU was unable to set single NUMA node SMP topologies such as: -smp 8,maxcpus=8,cores=2,threads=2,sockets=2 i.e he expected 2 sockets in one NUMA node. The above commit helped to reduce boot time on Large Systems for example 4096 vCPU single socket QEMU instance. PAPR is silent on having more than one socket within a NUMA node. cpu_core_mask and cpu_cpu_mask for any CPU would be same unless the number of sockets is different from the number of NUMA nodes. One option is to reintroduce cpu_core_mask but use a slightly different method to arrive at the cpu_core_mask. Previously each CPU's chip-id would be compared with all other CPU's chip-id to verify if both the CPUs were related at the chip level. Now if a CPU 'A' is found related / (unrelated) to another CPU 'B', all the thread siblings of 'A' and thread siblings of 'B' are automatically marked as related / (unrelated). Also if a platform doesn't support ibm,chip-id property, i.e its cpu_to_chip_id returns -1, cpu_core_map holds a copy of cpu_cpu_mask(). Fixes: 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask") Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210415120934.232271-2-srikar@linux.vnet.ibm.com
2021-04-17powerpc/xive: Use the "ibm, chip-id" property only under PowerNVCédric Le Goater
The 'chip_id' field of the XIVE CPU structure is used to choose a target for a source located on the same chip. For that, the XIVE driver queries the chip identifier from the "ibm,chip-id" property and compares it to a 'src_chip' field identifying the chip of a source. This information is only available on the PowerNV platform, 'src_chip' being assigned to XIVE_INVALID_CHIP_ID under pSeries. The "ibm,chip-id" property is also not available on all platforms. It was first introduced on PowerNV and later, under QEMU for pSeries/KVM. However, the property is not part of PAPR and does not exist under pSeries/PowerVM. Assign 'chip_id' to XIVE_INVALID_CHIP_ID by default and let the PowerNV platform override the value with the "ibm,chip-id" property. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210413130352.1183267-1-clg@kaod.org
2021-04-16Merge branch 'mptcp-fixes-and-tracepoints'David S. Miller
Mat Martineau says: ==================== mptcp: Fixes and tracepoints from the mptcp tree Here's one more batch of changes that we've tested out in the MPTCP tree. Patch 1 makes the MPTCP KUnit config symbol more consistent with other subsystems. Patch 2 fixes a couple of format specifiers in pr_debug()s Patches 3-7 add four helpful tracepoints for MPTCP. Patch 8 is a one-line refactor to use an available helper macro. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16mptcp: use mptcp_for_each_subflow in mptcp_closeGeliang Tang
This patch used the macro helper mptcp_for_each_subflow() instead of list_for_each_entry() in mptcp_close. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16mptcp: add tracepoint in subflow_check_data_availGeliang Tang
This patch added a tracepoint in subflow_check_data_avail() to show the mapping status. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16mptcp: add tracepoint in ack_update_mskGeliang Tang
This patch added a tracepoint in ack_update_msk() to track the incoming data_ack and window/snd_una updates. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16mptcp: add tracepoint in get_mapping_statusGeliang Tang
This patch added a tracepoint in the mapping status function get_mapping_status() to dump every mpext field. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16mptcp: add tracepoint in mptcp_subflow_get_sendGeliang Tang
This patch added a tracepoint in the packet scheduler function mptcp_subflow_get_send(). Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>