summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-11-04KVM: SVM: Propagate error from snp_guest_req_init() to userspaceSean Christopherson
If snp_guest_req_init() fails, return the provided error code up the stack to userspace, e.g. so that userspace can log that KVM_SEV_INIT2 failed, as opposed to some random operation later in VM setup failing because SNP wasn't actually enabled for the VM. Note, KVM itself doesn't consult the return value from __sev_guest_init(), i.e. the fallout is purely that userspace may be confused. Fixes: 88caf544c930 ("KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202410192220.MeTyHPxI-lkp@intel.com Link: https://lore.kernel.org/r/20241031203214.1585751-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabledSean Christopherson
When getting the current VPID, e.g. to emulate a guest TLB flush, return vpid01 if L2 is running but with VPID disabled, i.e. if VPID is disabled in vmcs12. Architecturally, if VPID is disabled, then the guest and host effectively share VPID=0. KVM emulates this behavior by using vpid01 when running an L2 with VPID disabled (see prepare_vmcs02_early_rare()), and so KVM must also treat vpid01 as the current VPID while L2 is active. Unconditionally treating vpid02 as the current VPID when L2 is active causes KVM to flush TLB entries for vpid02 instead of vpid01, which results in TLB entries from L1 being incorrectly preserved across nested VM-Enter to L2 (L2=>L1 isn't problematic, because the TLB flush after nested VM-Exit flushes vpid01). The bug manifests as failures in the vmx_apicv_test KVM-Unit-Test, as KVM incorrectly retains TLB entries for the APIC-access page across a nested VM-Enter. Opportunisticaly add comments at various touchpoints to explain the architectural requirements, and also why KVM uses vpid01 instead of vpid02. All credit goes to Chao, who root caused the issue and identified the fix. Link: https://lore.kernel.org/all/ZwzczkIlYGX+QXJz@intel.com Fixes: 2b4a5a5d5688 ("KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST") Cc: stable@vger.kernel.org Cc: Like Xu <like.xu.linux@gmail.com> Debugged-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20241031202011.1580522-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: selftests: Don't force -march=x86-64-v2 if it's unsupportedSean Christopherson
Force -march=x86-64-v2 to avoid SSE/AVX instructions if and only if the uarch definition is supported by the compiler, e.g. gcc 7.5 only supports x86-64. Fixes: 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX instructions") Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-and-tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241031045333.1209195-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: selftests: Disable strict aliasingSean Christopherson
Disable strict aliasing, as has been done in the kernel proper for decades (literally since before git history) to fix issues where gcc will optimize away loads in code that looks 100% correct, but is _technically_ undefined behavior, and thus can be thrown away by the compiler. E.g. arm64's vPMU counter access test casts a uint64_t (unsigned long) pointer to a u64 (unsigned long long) pointer when setting PMCR.N via u64p_replace_bits(), which gcc-13 detects and optimizes away, i.e. ignores the result and uses the original PMCR. The issue is most easily observed by making set_pmcr_n() noinline and wrapping the call with printf(), e.g. sans comments, for this code: printf("orig = %lx, next = %lx, want = %lu\n", pmcr_orig, pmcr, pmcr_n); set_pmcr_n(&pmcr, pmcr_n); printf("orig = %lx, next = %lx, want = %lu\n", pmcr_orig, pmcr, pmcr_n); gcc-13 generates: 0000000000401c90 <set_pmcr_n>: 401c90: f9400002 ldr x2, [x0] 401c94: b3751022 bfi x2, x1, #11, #5 401c98: f9000002 str x2, [x0] 401c9c: d65f03c0 ret 0000000000402660 <test_create_vpmu_vm_with_pmcr_n>: 402724: aa1403e3 mov x3, x20 402728: aa1503e2 mov x2, x21 40272c: aa1603e0 mov x0, x22 402730: aa1503e1 mov x1, x21 402734: 940060ff bl 41ab30 <_IO_printf> 402738: aa1403e1 mov x1, x20 40273c: 910183e0 add x0, sp, #0x60 402740: 97fffd54 bl 401c90 <set_pmcr_n> 402744: aa1403e3 mov x3, x20 402748: aa1503e2 mov x2, x21 40274c: aa1503e1 mov x1, x21 402750: aa1603e0 mov x0, x22 402754: 940060f7 bl 41ab30 <_IO_printf> with the value stored in [sp + 0x60] ignored by both printf() above and in the test proper, resulting in a false failure due to vcpu_set_reg() simply storing the original value, not the intended value. $ ./vpmu_counter_access Random seed: 0x6b8b4567 orig = 3040, next = 3040, want = 0 orig = 3040, next = 3040, want = 0 ==== Test Assertion Failure ==== aarch64/vpmu_counter_access.c:505: pmcr_n == get_pmcr_n(pmcr) pid=71578 tid=71578 errno=9 - Bad file descriptor 1 0x400673: run_access_test at vpmu_counter_access.c:522 2 (inlined by) main at vpmu_counter_access.c:643 3 0x4132d7: __libc_start_call_main at libc-start.o:0 4 0x413653: __libc_start_main at ??:0 5 0x40106f: _start at ??:0 Failed to update PMCR.N to 0 (received: 6) Somewhat bizarrely, gcc-11 also exhibits the same behavior, but only if set_pmcr_n() is marked noinline, whereas gcc-13 fails even if set_pmcr_n() is inlined in its sole caller. Cc: stable@vger.kernel.org Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116912 Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: selftests: fix unintentional noop test in guest_memfd_test.cPatrick Roy
The loop in test_create_guest_memfd_invalid() that is supposed to test that nothing is accepted as a valid flag to KVM_CREATE_GUEST_MEMFD was initializing `flag` as 0 instead of BIT(0). This caused the loop to immediately exit instead of iterating over BIT(0), BIT(1), ... . Fixes: 8a89efd43423 ("KVM: selftests: Add basic selftest for guest_memfd()") Signed-off-by: Patrick Roy <roypat@amazon.co.uk> Reviewed-by: James Gowans <jgowans@amazon.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20241024095956.3668818-1-roypat@amazon.co.uk Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: selftests: memslot_perf_test: increase guest sync timeoutMaxim Levitsky
When memslot_perf_test is run nested, first iteration of test_memslot_rw_loop testcase, sometimes takes more than 2 seconds due to build of shadow page tables. Following iterations are fast. To be on the safe side, bump the timeout to 10 seconds. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Link: https://lore.kernel.org/r/20241004220153.287459-1-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Short-circuit all of kvm_apic_set_base() if MSR value is unchangedSean Christopherson
Do nothing in all of kvm_apic_set_base(), not just __kvm_apic_set_base(), if the incoming MSR value is the same as the current value. Validating the mode transitions is obviously unnecessary, and rejecting the write is pointless if the vCPU already has an invalid value, e.g. if userspace is doing weird things and modified guest CPUID after setting MSR_IA32_APICBASE. Bailing early avoids kvm_recalculate_apic_map()'s slow path in the rare scenario where the map is DIRTY due to some other vCPU dirtying the map, in which case it's the other vCPU/task's responsibility to recalculate the map. Note, kvm_lapic_reset() calls __kvm_apic_set_base() only when emulating RESET, in which case the old value is guaranteed to be zero, and the new value is guaranteed to be non-zero. I.e. all callers of __kvm_apic_set_base() effectively pre-check for the MSR value actually changing. Don't bother keeping the check in __kvm_apic_set_base(), as no additional callers are expected, and implying that the MSR might already be non-zero at the time of kvm_lapic_reset() could confuse readers. Link: https://lore.kernel.org/r/20241101183555.1794700-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Unpack msr_data structure prior to calling kvm_apic_set_base()Sean Christopherson
Pass in the new value and "host initiated" as separate parameters to kvm_apic_set_base(), as forcing the KVM_SET_SREGS path to declare and fill an msr_data structure is awkward and kludgy, e.g. __set_sregs_common() doesn't even bother to set the proper MSR index. No functional change intended. Suggested-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20241101183555.1794700-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Make kvm_recalculate_apic_map() local to lapic.cSean Christopherson
Make kvm_recalculate_apic_map() local to lapic.c now that all external callers are gone. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-8-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Rename APIC base setters to better capture their relationshipSean Christopherson
Rename kvm_set_apic_base() and kvm_lapic_set_base() to kvm_apic_set_base() and __kvm_apic_set_base() respectively to capture that the underscores version is a "special" variant (it exists purely to avoid recalculating the optimized map multiple times when stuffing the RESET value). Opportunistically add a comment explaining why kvm_lapic_reset() uses the inner helper. Note, KVM deliberately invokes kvm_arch_vcpu_create() while kvm->lock is NOT held so that vCPU setup isn't serialized if userspace is creating multiple/all vCPUs in parallel. I.e. triggering an extra recalculation is not limited to theoretical/rare edge cases, and so is worth avoiding. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-7-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Move kvm_set_apic_base() implementation to lapic.c (from x86.c)Sean Christopherson
Move kvm_set_apic_base() to lapic.c so that the bulk of KVM's local APIC code resides in lapic.c, regardless of whether or not KVM is emulating the local APIC in-kernel. This will also allow making various helpers visible only to lapic.c. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-6-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Inline kvm_get_apic_mode() in lapic.hSean Christopherson
Inline kvm_get_apic_mode() in lapic.h to avoid a CALL+RET as well as an export. The underlying kvm_apic_mode() helper is public information, i.e. there is no state/information that needs to be hidden from vendor modules. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-5-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Get vcpu->arch.apic_base directly and drop kvm_get_apic_base()Sean Christopherson
Access KVM's emulated APIC base MSR value directly instead of bouncing through a helper, as there is no reason to add a layer of indirection, and there are other MSRs with a "set" but no "get", e.g. EFER. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-4-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Drop superfluous kvm_lapic_set_base() call when setting APIC stateSean Christopherson
Now that kvm_lapic_set_base() does nothing if the "new" APIC base MSR is the same as the current value, drop the kvm_lapic_set_base() call in the KVM_SET_LAPIC flow that passes in the current value, as it too does nothing. Note, the purpose of invoking kvm_lapic_set_base() was purely to set apic->base_address (see commit 5dbc8f3fed0b ("KVM: use kvm_lapic_set_base() to change apic_base")). And there is no evidence that explicitly setting apic->base_address in KVM_SET_LAPIC ever had any functional impact; even in the original commit 96ad2cc61324 ("KVM: in-kernel LAPIC save and restore support"), all flows that set apic_base also set apic->base_address to the same address. E.g. svm_create_vcpu() did open code a write to apic_base, svm->vcpu.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE; but it also called kvm_create_lapic() when irqchip_in_kernel() is true. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-3-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Short-circuit all kvm_lapic_set_base() if MSR value isn't changingSean Christopherson
Do nothing in kvm_lapic_set_base() if the APIC base MSR value is the same as the current value. All flows except the handling of the base address explicitly take effect if and only if relevant bits are changing. For the base address, invoking kvm_lapic_set_base() before KVM initializes the base to APIC_DEFAULT_PHYS_BASE during vCPU RESET would be a KVM bug, i.e. KVM _must_ initialize apic->base_address before exposing the vCPU (to userspace or KVM at-large). Note, the inhibit is intended to be set if the base address is _changed_ from the default, i.e. is also covered by the RESET behavior. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-2-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Drop per-VM zapped_obsolete_pages listVipin Sharma
Drop the per-VM zapped_obsolete_pages list now that the usage from the defunct mmu_shrinker is gone, and instead use a local list to track pages in kvm_zap_obsolete_pages(), the sole remaining user of zapped_obsolete_pages. Opportunistically add an assertion to verify and document that slots_lock must be held, i.e. that there can only be one active instance of kvm_zap_obsolete_pages() at any given time, and by doing so also prove that using a local list instead of a per-VM list doesn't change any functionality (beyond trivialities like list initialization). Signed-off-by: Vipin Sharma <vipinsh@google.com> Link: https://lore.kernel.org/r/20241101201437.1604321-2-vipinsh@google.com [sean: split to separate patch, write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Remove KVM's MMU shrinkerVipin Sharma
Remove KVM's MMU shrinker and (almost) all of its related code, as the current implementation is very disruptive to VMs (if it ever runs), without providing any meaningful benefit[1]. Alternatively, KVM could repurpose its shrinker, e.g. to reclaim pages from the per-vCPU caches[2], but given that no one has complained about lack of TDP MMU support for the shrinker in the 3+ years since the TDP MMU was enabled by default, it's safe to say that there is likely no real use case for initiating reclaim of KVM's page tables from the shrinker. And while clever/cute, reclaiming the per-vCPU caches doesn't scale the same way that reclaiming in-use page table pages does. E.g. the amount of memory being used by a VM doesn't always directly correlate with the number vCPUs, and even when it does, reclaiming a few pages from per-vCPU caches likely won't make much of a dent in the VM's total memory usage, especially for VMs with huge amounts of memory. Lastly, if it turns out that there is a strong use case for dropping the per-vCPU caches, re-introducing the shrinker registration is trivial compared to the complexity of actually reclaiming pages from the caches. [1] https://lore.kernel.org/lkml/Y45dldZnI6OIf+a5@google.com [2] https://lore.kernel.org/kvm/20241004195540.210396-3-vipinsh@google.com Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: David Matlack <dmatlack@google.com> Signed-off-by: Vipin Sharma <vipinsh@google.com> Link: https://lore.kernel.org/r/20241101201437.1604321-2-vipinsh@google.com [sean: keep zapped_obsolete_pages for now, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: WARN if huge page recovery triggered during dirty loggingDavid Matlack
WARN and bail out of recover_huge_pages_range() if dirty logging is enabled. KVM shouldn't be recovering huge pages during dirty logging anyway, since KVM needs to track writes at 4KiB. However it's not out of the possibility that that changes in the future. If KVM wants to recover huge pages during dirty logging, make_huge_spte() must be updated to write-protect the new huge page mapping. Otherwise, writes through the newly recovered huge page mapping will not be tracked. Note that this potential risk did not exist back when KVM zapped to recover huge page mappings, since subsequent accesses would just be faulted in at PG_LEVEL_4K if dirty logging was enabled. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240823235648.3236880-7-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Rename make_huge_page_split_spte() to make_small_spte()David Matlack
Rename make_huge_page_split_spte() to make_small_spte(). This ensures that the usage of "small_spte" and "huge_spte" are consistent between make_huge_spte() and make_small_spte(). This should also reduce some confusion as make_huge_page_split_spte() almost reads like it will create a huge SPTE, when in fact it is creating a small SPTE to split the huge SPTE. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240823235648.3236880-6-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Recover TDP MMU huge page mappings in-place instead of zappingDavid Matlack
Recover TDP MMU huge page mappings in-place instead of zapping them when dirty logging is disabled, and rename functions that recover huge page mappings when dirty logging is disabled to move away from the "zap collapsible spte" terminology. Before KVM flushes TLBs, guest accesses may be translated through either the (stale) small SPTE or the (new) huge SPTE. This is already possible when KVM is doing eager page splitting (where TLB flushes are also batched), and when vCPUs are faulting in huge mappings (where TLBs are flushed after the new huge SPTE is installed). Recovering huge pages reduces the number of page faults when dirty logging is disabled: $ perf stat -e kvm:kvm_page_fault -- ./dirty_log_perf_test -s anonymous_hugetlb_2mb -v 64 -e -b 4g Before: 393,599 kvm:kvm_page_fault After: 262,575 kvm:kvm_page_fault vCPU throughput and the latency of disabling dirty-logging are about equal compared to zapping, but avoiding faults can be beneficial to remove vCPU jitter in extreme scenarios. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240823235648.3236880-5-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Refactor TDP MMU iter need resched checkSean Christopherson
Refactor the TDP MMU iterator "need resched" checks into a helper function so they can be called from a different code path in a subsequent commit. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240823235648.3236880-4-dmatlack@google.com [sean: rebase on a swapped order of checks] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Demote the WARN on yielded in xxx_cond_resched() to ↵Sean Christopherson
KVM_MMU_WARN_ON Convert the WARN in tdp_mmu_iter_cond_resched() that the iterator hasn't already yielded to a KVM_MMU_WARN_ON() so the code is compiled out for production kernels (assuming production kernels disable KVM_PROVE_MMU). Checking for a needed reschedule is a hot path, and KVM sanity checks iter->yielded in several other less-hot paths, i.e. the odds of KVM not flagging that something went sideways are quite low. Furthermore, the odds of KVM not noticing *and* the WARN detecting something worth investigating are even lower. Link: https://lore.kernel.org/r/20241031170633.1502783-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Check yielded_gfn for forward progress iff resched is neededSean Christopherson
Swap the order of the checks in tdp_mmu_iter_cond_resched() so that KVM checks to see if a resched is needed _before_ checking to see if yielding must be disallowed to guarantee forward progress. Iterating over TDP MMU SPTEs is a hot path, e.g. tearing down a root can touch millions of SPTEs, and not needing to reschedule is by far the common case. On the other hand, disallowing yielding because forward progress has not been made is a very rare case. Returning early for the common case (no resched), effectively reduces the number of checks from 2 to 1 for the common case, and should make the code slightly more predictable for the CPU. To resolve a weird conundrum where the forward progress check currently returns false, but the need resched check subtly returns iter->yielded, which _should_ be false (enforced by a WARN), return false unconditionally (which might also help make the sequence more predictable). If KVM has a bug where iter->yielded is left danging, continuing to yield is neither right nor wrong, it was simply an artifact of how the original code was written. Unconditionally returning false when yielding is unnecessary or unwanted will also allow extracting the "should resched" logic to a separate helper in a future patch. Cc: David Matlack <dmatlack@google.com> Reviewed-by: James Houghton <jthoughton@google.com> Link: https://lore.kernel.org/r/20241031170633.1502783-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04Revert "Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'"Jakub Kicinski
This reverts commit d80a3091308491455b6501b1c4b68698c4a7cd24, reversing changes made to 637f41476384c76d3cd7dcf5947caf2c8b8d7a9b: 2cf246143519 ("net: hns3: fix kernel crash when 1588 is sent on HIP08 devices") 3e22b7de34cb ("net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue") d1c2e2961ab4 ("net: hns3: initialize reset_timer before hclgevf_misc_irq_init()") 5f62009ff108 ("net: hns3: don't auto enable misc vector") 2758f18a83ef ("net: hns3: Resolved the issue that the debugfs query result is inconsistent.") 662ecfc46690 ("net: hns3: fix missing features due to dev->features configuration too early") 3e0f7cc887b7 ("net: hns3: fixed reset failure issues caused by the incorrect reset type") f2c14899caba ("net: hns3: add sync command to sync io-pgtable") e6ab19443b36 ("net: hns3: default enable tx bounce buffer when smmu enabled") The series is making the driver poke into IOMMU internals instead of implementing appropriate IOMMU workarounds. Link: https://lore.kernel.org/069c9838-b781-4012-934a-d2626fa78212@arm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04scsi: ufs: core: Start the RTC update work laterBart Van Assche
The RTC update work involves runtime resuming the UFS controller. Hence, only start the RTC update work after runtime power management in the UFS driver has been fully initialized. This patch fixes the following kernel crash: Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Workqueue: events ufshcd_rtc_work Call trace: _raw_spin_lock_irqsave+0x34/0x8c (P) pm_runtime_get_if_active+0x24/0x9c (L) pm_runtime_get_if_active+0x24/0x9c ufshcd_rtc_work+0x138/0x1b4 process_one_work+0x148/0x288 worker_thread+0x2cc/0x3d4 kthread+0x110/0x114 ret_from_fork+0x10/0x20 Reported-by: Neil Armstrong <neil.armstrong@linaro.org> Closes: https://lore.kernel.org/linux-scsi/0c0bc528-fdc2-4106-bc99-f23ae377f6f5@linaro.org/ Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support") Cc: Bean Huo <beanhuo@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241031212632.2799127-1-bvanassche@acm.org Reviewed-by: Peter Wang <peter.wang@mediatek.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-HDK Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-11-04Merge tag 'linux-can-fixes-for-6.12-20241104' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2024-11-04 Alexander Hölzl contributes a patch to fix an error in the CAN j1939 documentation. Thomas Mühlbacher's patch allows building of the {cc770,sja1000}_isa drivers on x86_64 again. A patch by me targets the m_can driver and limits the call to free_irq() to devices with IRQs. Dario Binacchi's patch fixes the RX and TX error counters in the c_can driver. The next 2 patches target the rockchip_canfd driver. Geert Uytterhoeven's patch lets the driver depend on ARCH_ROCKCHIP. Jean Delvare's patch drops the obsolete dependency on COMPILE_TEST. The last 2 patches are by me and fix 2 regressions in the mcp251xfd driver: fix broken coalescing configuration when switching CAN modes and fix the length calculation of the Transmit Event FIFO (TEF) on full TEF. * tag 'linux-can-fixes-for-6.12-20241104' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculation can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing configuration when switching CAN modes can: rockchip_canfd: Drop obsolete dependency on COMPILE_TEST can: rockchip_canfd: CAN_ROCKCHIP_CANFD should depend on ARCH_ROCKCHIP can: c_can: fix {rx,tx}_errors statistics can: m_can: m_can_close(): don't call free_irq() for IRQ-less devices can: {cc770,sja1000}_isa: allow building on x86_64 can: j1939: fix error in J1939 documentation. ==================== Link: https://patch.msgid.link/20241104200120.393312-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04Merge tag 'arm-fixes-6.12-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "Where the last set of fixes was mostly drivers, this time the devicetree changes all come at once, targeting mostly the Rockchips, Qualcomm and NXP platforms. The Qualcomm bugfixes target the Snapdragon X Elite laptops, specifically problems with PCIe and NVMe support to improve reliability, and a boot regresion on msm8939. Also for Snapdragon platforms, there are a number of correctness changes in the several platform specific device drivers, but none of these are as impactful. On the NXP i.MX platform, the fixes are all for 64-bit i.MX8 variants, correcting individual entries in the devicetree that were incorrect and causing the media, video, mmc and spi drivers to misbehave in minor ways. The Arm SCMI firmware driver gets fixes for a use-after-free bug and for correctly parsing firmware information. On the RISC-V side, there are three minor devicetree fixes for starfive and sophgo, again addressing only minor mistakes. One device driver patch fixes a problem with spurious interrupt handling" * tag 'arm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (63 commits) firmware: arm_scmi: Use vendor string in max-rx-timeout-ms dt-bindings: firmware: arm,scmi: Add missing vendor string riscv: dts: Replace deprecated snps,nr-gpios property for snps,dw-apb-gpio-port devices arm64: dts: rockchip: Correct GPIO polarity on brcm BT nodes arm64: dts: rockchip: Drop invalid clock-names from es8388 codec nodes ARM: dts: rockchip: Fix the realtek audio codec on rk3036-kylin ARM: dts: rockchip: Fix the spi controller on rk3036 ARM: dts: rockchip: drop grf reference from rk3036 hdmi ARM: dts: rockchip: fix rk3036 acodec node arm64: dts: rockchip: remove orphaned pinctrl-names from pinephone pro soc: qcom: pmic_glink: Handle GLINK intent allocation rejections rpmsg: glink: Handle rejected intent request better arm64: dts: qcom: x1e80100: fix PCIe5 interconnect arm64: dts: qcom: x1e80100: fix PCIe4 interconnect arm64: dts: qcom: x1e80100: Fix up BAR spaces MAINTAINERS: invert Misc RISC-V SoC Support's pattern soc: qcom: socinfo: fix revision check in qcom_socinfo_probe() arm64: dts: qcom: x1e80100-qcp: fix nvme regulator boot glitch arm64: dts: qcom: x1e80100-microsoft-romulus: fix nvme regulator boot glitch arm64: dts: qcom: x1e80100-yoga-slim7x: fix nvme regulator boot glitch ...
2024-11-05ksmbd: check outstanding simultaneous SMB operationsNamjae Jeon
If Client send simultaneous SMB operations to ksmbd, It exhausts too much memory through the "ksmbd_work_cache”. It will cause OOM issue. ksmbd has a credit mechanism but it can't handle this problem. This patch add the check if it exceeds max credits to prevent this problem by assuming that one smb request consumes at least one credit. Cc: stable@vger.kernel.org # v5.15+ Reported-by: Norbert Szetei <norbert@doyensec.com> Tested-by: Norbert Szetei <norbert@doyensec.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-05ksmbd: fix slab-use-after-free in smb3_preauth_hash_rspNamjae Jeon
ksmbd_user_session_put should be called under smb3_preauth_hash_rsp(). It will avoid freeing session before calling smb3_preauth_hash_rsp(). Cc: stable@vger.kernel.org # v5.15+ Reported-by: Norbert Szetei <norbert@doyensec.com> Tested-by: Norbert Szetei <norbert@doyensec.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-05ksmbd: fix slab-use-after-free in ksmbd_smb2_session_createNamjae Jeon
There is a race condition between ksmbd_smb2_session_create and ksmbd_expire_session. This patch add missing sessions_table_lock while adding/deleting session from global session table. Cc: stable@vger.kernel.org # v5.15+ Reported-by: Norbert Szetei <norbert@doyensec.com> Tested-by: Norbert Szetei <norbert@doyensec.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-04cpufreq: intel_pstate: Update asym capacity for CPUs that were offline initiallyRafael J. Wysocki
Commit 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems") overlooked a corner case in which some CPUs may be offline to start with and brought back online later, after the intel_pstate driver has been registered, so their asymmetric capacity will not be set. Address this by calling hybrid_update_capacity() in the CPU initialization path that is executed instead of the online path for those CPUs. Note that this asymmetric capacity update will be skipped during driver initialization and mode switches because hybrid_max_perf_cpu is NULL in those cases. Fixes: 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/1913414.tdWV9SEqCh@rjwysocki.net
2024-11-04cpufreq: intel_pstate: Clear hybrid_max_perf_cpu before driver registrationRafael J. Wysocki
Modify intel_pstate_register_driver() to clear hybrid_max_perf_cpu before calling cpufreq_register_driver(), so that asymmetric CPU capacity scaling is not updated until hybrid_init_cpu_capacity_scaling() runs down the road. This is done in preparation for a subsequent change adding asymmetric CPU capacity computation to the CPU init path to handle CPUs that are initially offline. The information on whether or not hybrid_max_perf_cpu was NULL before it has been cleared is passed to hybrid_init_cpu_capacity_scaling(), so full initialization of CPU capacity scaling can be skipped if it has been carried out already. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/4616631.LvFx2qVVIh@rjwysocki.net
2024-11-04nvme/host: Fix RCU list traversal to use SRCU primitiveBreno Leitao
The code currently uses list_for_each_entry_rcu() while holding an SRCU lock, triggering false positive warnings with CONFIG_PROVE_RCU=y enabled: drivers/nvme/host/core.c:3770 RCU-list traversed in non-reader section!! While the list is properly protected by SRCU lock, the code uses the wrong list traversal primitive. Replace list_for_each_entry_rcu() with list_for_each_entry_srcu() to correctly indicate SRCU-based protection and eliminate the false warning. Fixes: be647e2c76b2 ("nvme: use srcu for iterating namespace list") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-04e1000e: Remove Meteor Lake SMBUS workaroundsVitaly Lifshits
This is a partial revert to commit 76a0a3f9cc2f ("e1000e: fix force smbus during suspend flow"). That commit fixed a sporadic PHY access issue but introduced a regression in runtime suspend flows. The original issue on Meteor Lake systems was rare in terms of the reproduction rate and the number of the systems affected. After the integration of commit 0a6ad4d9e169 ("e1000e: avoid failing the system during pm_suspend"), PHY access loss can no longer cause a system-level suspend failure. As it only occurs when the LAN cable is disconnected, and is recovered during system resume flow. Therefore, its functional impact is low, and the priority is given to stabilizing runtime suspend. Fixes: 76a0a3f9cc2f ("e1000e: fix force smbus during suspend flow") Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04i40e: fix race condition by adding filter's intermediate sync stateAleksandr Loktionov
Fix a race condition in the i40e driver that leads to MAC/VLAN filters becoming corrupted and leaking. Address the issue that occurs under heavy load when multiple threads are concurrently modifying MAC/VLAN filters by setting mac and port VLAN. 1. Thread T0 allocates a filter in i40e_add_filter() within i40e_ndo_set_vf_port_vlan(). 2. Thread T1 concurrently frees the filter in __i40e_del_filter() within i40e_ndo_set_vf_mac(). 3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which refers to the already freed filter memory, causing corruption. Reproduction steps: 1. Spawn multiple VFs. 2. Apply a concurrent heavy load by running parallel operations to change MAC addresses on the VFs and change port VLANs on the host. 3. Observe errors in dmesg: "Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX, please set promiscuous on manually for VF XX". Exact code for stable reproduction Intel can't open-source now. The fix involves implementing a new intermediate filter state, I40E_FILTER_NEW_SYNC, for the time when a filter is on a tmp_add_list. These filters cannot be deleted from the hash list directly but must be removed using the full process. Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04idpf: fix idpf_vc_core_init error pathPavan Kumar Linga
In an event where the platform running the device control plane is rebooted, reset is detected on the driver. It releases all the resources and waits for the reset to complete. Once the reset is done, it tries to build the resources back. At this time if the device control plane is not yet started, then the driver timeouts on the virtchnl message and retries to establish the mailbox again. In the retry flow, mailbox is deinitialized but the mailbox workqueue is still alive and polling for the mailbox message. This results in accessing the released control queue leading to null-ptr-deref. Fix it by unrolling the work queue cancellation and mailbox deinitialization in the reverse order which they got initialized. Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request") Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager") Cc: stable@vger.kernel.org # 6.9+ Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04idpf: avoid vport access in idpf_get_link_ksettingsPavan Kumar Linga
When the device control plane is removed or the platform running device control plane is rebooted, a reset is detected on the driver. On driver reset, it releases the resources and waits for the reset to complete. If the reset fails, it takes the error path and releases the vport lock. At this time if the monitoring tools tries to access link settings, it call traces for accessing released vport pointer. To avoid it, move link_speed_mbps to netdev_priv structure which removes the dependency on vport pointer and the vport lock in idpf_get_link_ksettings. Also use netif_carrier_ok() to check the link status and adjust the offsetof to use link_up instead of link_speed_mbps. Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks") Cc: stable@vger.kernel.org # 6.7+ Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04ice: change q_index variable type to s16 to store -1 valueMateusz Polchlopek
Fix Flow Director not allowing to re-map traffic to 0th queue when action is configured to drop (and vice versa). The current implementation of ethtool callback in the ice driver forbids change Flow Director action from 0 to -1 and from -1 to 0 with an error, e.g: # ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action 0 # ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action -1 rmgr: Cannot insert RX class rule: Invalid argument We set the value of `u16 q_index = 0` at the beginning of the function ice_set_fdir_input_set(). In case of "drop traffic" action (which is equal to -1 in ethtool) we store the 0 value. Later, when want to change traffic rule to redirect to queue with index 0 it returns an error caused by duplicate found. Fix this behaviour by change of the type of field `q_index` from u16 to s16 in `struct ice_fdir_fltr`. This allows to store -1 in the field in case of "drop traffic" action. What is more, change the variable type in the function ice_set_fdir_input_set() and assign at the beginning the new `#define ICE_FDIR_NO_QUEUE_IDX` which is -1. Later, if the action is set to another value (point specific queue index) the variable value is overwritten in the function. Fixes: cac2a27cd9ab ("ice: Support IPv4 Flow Director filters") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04ice: Fix use after free during unload with ports in bridgeMarcin Szycik
Unloading the ice driver while switchdev port representors are added to a bridge can lead to kernel panic. Reproducer: modprobe ice devlink dev eswitch set $PF1_PCI mode switchdev ip link add $BR type bridge ip link set $BR up echo 2 > /sys/class/net/$PF1/device/sriov_numvfs sleep 2 ip link set $PF1 master $BR ip link set $VF1_PR master $BR ip link set $VF2_PR master $BR ip link set $PF1 up ip link set $VF1_PR up ip link set $VF2_PR up ip link set $VF1 up rmmod irdma ice When unloading the driver, ice_eswitch_detach() is eventually called as part of VF freeing. First, it removes a port representor from xarray, then unregister_netdev() is called (via repr->ops.rem()), finally representor is deallocated. The problem comes from the bridge doing its own deinit at the same time. unregister_netdev() triggers a notifier chain, resulting in ice_eswitch_br_port_deinit() being called. It should set repr->br_port = NULL, but this does not happen since repr has already been removed from xarray and is not found. Regardless, it finishes up deallocating br_port. At this point, repr is still not freed and an fdb event can happen, in which ice_eswitch_br_fdb_event_work() takes repr->br_port and tries to use it, which causes a panic (use after free). Note that this only happens with 2 or more port representors added to the bridge, since with only one representor port, the bridge deinit is slightly different (ice_eswitch_br_port_deinit() is called via ice_eswitch_br_ports_flush(), not ice_eswitch_br_port_unlink()). Trace: Oops: general protection fault, probably for non-canonical address 0xf129010fd1a93284: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: maybe wild-memory-access in range [0x8948287e8d499420-0x8948287e8d499427] (...) Workqueue: ice_bridge_wq ice_eswitch_br_fdb_event_work [ice] RIP: 0010:__rht_bucket_nested+0xb4/0x180 (...) Call Trace: (...) ice_eswitch_br_fdb_find+0x3fa/0x550 [ice] ? __pfx_ice_eswitch_br_fdb_find+0x10/0x10 [ice] ice_eswitch_br_fdb_event_work+0x2de/0x1e60 [ice] ? __schedule+0xf60/0x5210 ? mutex_lock+0x91/0xe0 ? __pfx_ice_eswitch_br_fdb_event_work+0x10/0x10 [ice] ? ice_eswitch_br_update_work+0x1f4/0x310 [ice] (...) A workaround is available: brctl setageing $BR 0, which stops the bridge from adding fdb entries altogether. Change the order of operations in ice_eswitch_detach(): move the call to unregister_netdev() before removing repr from xarray. This way repr->br_port will be correctly set to NULL in ice_eswitch_br_port_deinit(), preventing a panic. Fixes: fff292b47ac1 ("ice: add VF representors one by one") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04KEYS: trusted: dcp: fix NULL dereference in AEAD crypto operationDavid Gstir
When sealing or unsealing a key blob we currently do not wait for the AEAD cipher operation to finish and simply return after submitting the request. If there is some load on the system we can exit before the cipher operation is done and the buffer we read from/write to is already removed from the stack. This will e.g. result in NULL pointer dereference errors in the DCP driver during blob creation. Fix this by waiting for the AEAD cipher operation to finish before resuming the seal and unseal calls. Cc: stable@vger.kernel.org # v6.10+ Fixes: 0e28bf61a5f9 ("KEYS: trusted: dcp: fix leak of blob encryption key") Reported-by: Parthiban N <parthiban@linumiz.com> Closes: https://lore.kernel.org/keyrings/254d3bb1-6dbc-48b4-9c08-77df04baee2f@linumiz.com/ Signed-off-by: David Gstir <david@sigma-star.at> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-11-04security/keys: fix slab-out-of-bounds in key_task_permissionChen Ridong
KASAN reports an out of bounds read: BUG: KASAN: slab-out-of-bounds in __kuid_val include/linux/uidgid.h:36 BUG: KASAN: slab-out-of-bounds in uid_eq include/linux/uidgid.h:63 [inline] BUG: KASAN: slab-out-of-bounds in key_task_permission+0x394/0x410 security/keys/permission.c:54 Read of size 4 at addr ffff88813c3ab618 by task stress-ng/4362 CPU: 2 PID: 4362 Comm: stress-ng Not tainted 5.10.0-14930-gafbffd6c3ede #15 Call Trace: __dump_stack lib/dump_stack.c:82 [inline] dump_stack+0x107/0x167 lib/dump_stack.c:123 print_address_description.constprop.0+0x19/0x170 mm/kasan/report.c:400 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560 kasan_report+0x3a/0x50 mm/kasan/report.c:585 __kuid_val include/linux/uidgid.h:36 [inline] uid_eq include/linux/uidgid.h:63 [inline] key_task_permission+0x394/0x410 security/keys/permission.c:54 search_nested_keyrings+0x90e/0xe90 security/keys/keyring.c:793 This issue was also reported by syzbot. It can be reproduced by following these steps(more details [1]): 1. Obtain more than 32 inputs that have similar hashes, which ends with the pattern '0xxxxxxxe6'. 2. Reboot and add the keys obtained in step 1. The reproducer demonstrates how this issue happened: 1. In the search_nested_keyrings function, when it iterates through the slots in a node(below tag ascend_to_node), if the slot pointer is meta and node->back_pointer != NULL(it means a root), it will proceed to descend_to_node. However, there is an exception. If node is the root, and one of the slots points to a shortcut, it will be treated as a keyring. 2. Whether the ptr is keyring decided by keyring_ptr_is_keyring function. However, KEYRING_PTR_SUBTYPE is 0x2UL, the same as ASSOC_ARRAY_PTR_SUBTYPE_MASK. 3. When 32 keys with the similar hashes are added to the tree, the ROOT has keys with hashes that are not similar (e.g. slot 0) and it splits NODE A without using a shortcut. When NODE A is filled with keys that all hashes are xxe6, the keys are similar, NODE A will split with a shortcut. Finally, it forms the tree as shown below, where slot 6 points to a shortcut. NODE A +------>+---+ ROOT | | 0 | xxe6 +---+ | +---+ xxxx | 0 | shortcut : : xxe6 +---+ | +---+ xxe6 : : | | | xxe6 +---+ | +---+ | 6 |---+ : : xxe6 +---+ +---+ xxe6 : : | f | xxe6 +---+ +---+ xxe6 | f | +---+ 4. As mentioned above, If a slot(slot 6) of the root points to a shortcut, it may be mistakenly transferred to a key*, leading to a read out-of-bounds read. To fix this issue, one should jump to descend_to_node if the ptr is a shortcut, regardless of whether the node is root or not. [1] https://lore.kernel.org/linux-kernel/1cfa878e-8c7b-4570-8606-21daf5e13ce7@huaweicloud.com/ [jarkko: tweaked the commit message a bit to have an appropriate closes tag.] Fixes: b2a4df200d57 ("KEYS: Expand the capacity of a keyring") Reported-by: syzbot+5b415c07907a2990d1a3@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000cbb7860611f61147@google.com/T/ Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-11-04Merge tag 'mmc-v6.12-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull mmc fixes from Ulf Hansson: - sdhci-pci-gli: A couple of fixes for low power mode on GL9767 * tag 'mmc-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-pci-gli: GL9767: Fix low power mode in the SD Express process mmc: sdhci-pci-gli: GL9767: Fix low power mode on the set clock function
2024-11-04Merge tag 'tpmdd-next-6.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fix from Jarkko Sakkinen: "Fix a race condition between tpm_pm_suspend() and tpm_hwrng_read() (I think for good now)" * tag 'tpmdd-next-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: Lock TPM chip in tpm_pm_suspend() first
2024-11-04drm/amd/pm: correct the workload settingKenneth Feng
Correct the workload setting in order not to mix the setting with the end user. Update the workload mask accordingly. v2: changes as below: 1. the end user can not erase the workload from driver except default workload. 2. always shows the real highest priority workoad to the end user. 3. the real workload mask is combined with driver workload mask and end user workload mask. v3: apply this to the other ASICs as well. v4: simplify the code v5: refine the code based on the review comments. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8cc438be5d49b8326b2fcade0bdb7e6a97df9e0b) Cc: stable@vger.kernel.org # 6.11.x
2024-11-04drm/amd/pm: always pick the pptable from IFWIKenneth Feng
always pick the pptable from IFWI on smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 136ce12bd5907388cb4e9aa63ee5c9c8c441640b) Cc: stable@vger.kernel.org # 6.11.x
2024-11-04drm/amdgpu: prevent NULL pointer dereference if ATIF is not supportedAntonio Quartulli
acpi_evaluate_object() may return AE_NOT_FOUND (failure), which would result in dereferencing buffer.pointer (obj) while being NULL. Although this case may be unrealistic for the current code, it is still better to protect against possible bugs. Bail out also when status is AE_NOT_FOUND. This fixes 1 FORWARD_NULL issue reported by Coverity Report: CID 1600951: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Fixes: c9b7c809b89f ("drm/amd: Guard against bad data for ATIF ACPI method") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91c9e221fe2553edf2db71627d8453f083de87a1) Cc: stable@vger.kernel.org
2024-11-04drm/amd/display: parse umc_info or vram_info based on ASICAurabindo Pillai
An upstream bug report suggests that there are production dGPUs that are older than DCN401 but still have a umc_info in VBIOS tables with the same version as expected for a DCN401 product. Hence, reading this tables should be guarded with a version check. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3678 Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2551b4a321a68134360b860113dd460133e856e5) Fixes: 00c391102abc ("drm/amd/display: Add misc DC changes for DCN401") Cc: stable@vger.kernel.org # 6.11.x
2024-11-04drm/amd/display: Fix brightness level not retained over rebootTom Chung
[Why] During boot up and resume the DC layer will reset the panel brightness to fix a flicker issue. It will cause the dm->actual_brightness is not the current panel brightness level. (the dm->brightness is the correct panel level) [How] Set the backlight level after do the set mode. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: d9e865826c20 ("drm/amd/display: Simplify brightness initialization") Reported-by: Mark Herbert <mark.herbert42@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3655 Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7875afafba84817b791be6d2282b836695146060) Cc: stable@vger.kernel.org
2024-11-04can: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculationMarc Kleine-Budde
Commit b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum") introduced mcp251xfd_get_tef_len() to get the number of unhandled transmit events from the Transmit Event FIFO (TEF). As the TEF has no head pointer, the driver uses the TX FIFO's tail pointer instead, assuming that send frames are completed. However the check for the TEF being full was not correct. This leads to the driver stop working if the TEF is full. Fix the TEF full check by assuming that if, from the driver's point of view, there are no free TX buffers in the chip and the TX FIFO is empty, all messages must have been sent and the TEF must therefore be full. Reported-by: Sven Schuchmann <schuchmann@schleissheimer.de> Closes: https://patch.msgid.link/FR3P281MB155216711EFF900AD9791B7ED9692@FR3P281MB1552.DEUP281.PROD.OUTLOOK.COM Fixes: b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum") Tested-by: Sven Schuchmann <schuchmann@schleissheimer.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20241104-mcp251xfd-fix-length-calculation-v3-1-608b6e7e2197@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-11-04can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing configuration when ↵Marc Kleine-Budde
switching CAN modes Since commit 50ea5449c563 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode"), the current ring and coalescing configuration is passed to can_ram_get_layout(). That fixed the issue when switching between CAN-CC and CAN-FD mode with configured ring (rx, tx) and/or coalescing parameters (rx-frames-irq, tx-frames-irq). However 50ea5449c563 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode"), introduced a regression when switching CAN modes with disabled coalescing configuration: Even if the previous CAN mode has no coalescing configured, the new mode is configured with active coalescing. This leads to delayed receiving of CAN-FD frames. This comes from the fact, that ethtool uses usecs = 0 and max_frames = 1 to disable coalescing, however the driver uses internally priv->{rx,tx}_obj_num_coalesce_irq = 0 to indicate disabled coalescing. Fix the regression by assigning struct ethtool_coalesce ec->{rx,tx}_max_coalesced_frames_irq = 1 if coalescing is disabled in the driver as can_ram_get_layout() expects this. Reported-by: https://github.com/vdh-robothania Closes: https://github.com/raspberrypi/linux/issues/6407 Fixes: 50ea5449c563 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode") Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241025-mcp251xfd-fix-coalesing-v1-1-9d11416de1df@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>