summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-24KVM: arm64: Advertise FEAT_ECV when possibleMarc Zyngier
We can advertise support for FEAT_ECV if supported on the HW as long as we limit it to the basic trap bits, and not advertise CNTPOFF_EL2 support, even if the host has it (the short story being that CNTPOFF_EL2 is not virtualisable). Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-13-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writableMarc Zyngier
We want to make sure that it is possible for userspace to configure whether recursive NV is possible. Make NV_frac writable for that purpose. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-12-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Allow userspace to limit NV support to nVHEMarc Zyngier
NV is hard. No kidding. In order to make things simpler, we have established that NV would support two mutually exclusive configurations: - VHE-only, and supporting recursive virtualisation - nVHE-only, and not supporting recursive virtualisation For that purpose, introduce a new vcpu feature flag that denotes the second configuration. We use this flag to limit the idregs further. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-11-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Move NV-specific capping to idreg sanitisationMarc Zyngier
Instead of applying the NV idreg limits at run time, switch to doing it at the same time as the reset of the VM initialisation. This will make things much simpler once we introduce vcpu-driven variants of NV. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Enforce NV limits on a per-idregs basisMarc Zyngier
As we are about to change the way the idreg reset values are computed, move all the NV limits into a function that initialises one register at a time. This will be most useful in the upcoming patches. We take this opportunity to remove the NV_FTR() macro and rely on the generated names instead. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely availableMarc Zyngier
ID_REG_LIMIT_FIELD_ENUM() is a useful macro to limit the idreg features exposed to guest and userspace, and the NV code can make use of it. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-8-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Consolidate idreg callbacksMarc Zyngier
Most of the ID_DESC() users use the same callbacks, with only a few overrides. Consolidate the common callbacks in a macro, and consistently use it everywhere. Whilst we're at it, give ID_UNALLOCATED() a .name string, so that we can easily decode traces. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Advertise NV2 in the boot messagesMarc Zyngier
Make it a bit easier to understand what people are running by adding a +NV2 string to the successful KVM initialisation. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0Marc Zyngier
Enforce HCR_EL2.{NV*,AT} being RES0 when NV2 is disabled, so that we can actually rely on these bits never being flipped behind our back. This of course relies on our earlier ID reg sanitising. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zeroMarc Zyngier
Enforce HCR_EL2.E2H being RES0 when VHE is disabled, so that we can actually rely on that bit never being flipped behind our back. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspaceMarc Zyngier
Since our take on FEAT_NV is to only support FEAT_NV2, we should never expose ID_AA64MMFR2_EL1.NV to a guest nor userspace. Make sure we mask this field for good. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-3-maz@kernel.org [oliver: squash diff for NV field] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24arm64: cpufeature: Handle NV_frac as a synonym of NV2Marc Zyngier
With ARMv9.5, an implementation supporting Nested Virtualization is allowed to only support NV2, and to avoid supporting the old (and useless) ARMv8.3 variant. This is indicated by ID_AA64MMFR2_EL1.NV being 0 (as if NV wasn't implemented) and ID_AA64MMFR4_EL1.NV_frac being 1 (indicating that NV2 is actually supported). Given that KVM only deals with NV2 and refuses to use the old NV, detecting NV2 or NV_frac is what we need to enable it. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24coredump: Only sort VMAs when core_sort_vma sysctl is setKees Cook
The sorting of VMAs by size in commit 7d442a33bfe8 ("binfmt_elf: Dump smaller VMAs first in ELF cores") breaks elfutils[1]. Instead, sort based on the setting of the new sysctl, core_sort_vma, which defaults to 0, no sorting. Reported-by: Michael Stapelberg <michael@stapelberg.ch> Closes: https://lore.kernel.org/all/20250218085407.61126-1-michael@stapelberg.de/ [1] Fixes: 7d442a33bfe8 ("binfmt_elf: Dump smaller VMAs first in ELF cores") Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-24ASoC: dapm-graph: set fill colour of turned on nodesNicolas Frattaroli
Some tools like KGraphViewer interpret the "ON" nodes not having an explicitly set fill colour as them being entirely black, which obscures the text on them and looks funny. In fact, I thought they were off for the longest time. Comparing to the output of the `dot` tool, I assume they are supposed to be white. Instead of speclawyering over who's in the wrong and must immediately atone for their wickedness at the altar of RFC2119, just be explicit about it, set the fillcolor to white, and nobody gets confused. Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Link: https://patch.msgid.link/20250221-dapm-graph-node-colour-v1-1-514ed0aa7069@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-24ASoC: fsl: Rename stream name of SAI DAI driverChancel Liu
If stream names of DAI driver are duplicated there'll be warnings when machine driver tries to add widgets on a route: [ 8.831335] fsl-asoc-card sound-wm8960: ASoC: sink widget CPU-Playback overwritten [ 8.839917] fsl-asoc-card sound-wm8960: ASoC: source widget CPU-Capture overwritten Use different stream names to avoid such warnings. DAI names in AUDMIX are also updated accordingly. Fixes: 15c958390460 ("ASoC: fsl_sai: Add separate DAI for transmitter and receiver") Signed-off-by: Chancel Liu <chancel.liu@nxp.com> Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com> Link: https://patch.msgid.link/20250217010437.258621-1-chancel.liu@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-24perf/core: Order the PMU list to fix warning about unordered pmu_ctx_listLuo Gengkun
Syskaller triggers a warning due to prev_epc->pmu != next_epc->pmu in perf_event_swap_task_ctx_data(). vmcore shows that two lists have the same perf_event_pmu_context, but not in the same order. The problem is that the order of pmu_ctx_list for the parent is impacted by the time when an event/PMU is added. While the order for a child is impacted by the event order in the pinned_groups and flexible_groups. So the order of pmu_ctx_list in the parent and child may be different. To fix this problem, insert the perf_event_pmu_context to its proper place after iteration of the pmu_ctx_list. The follow testcase can trigger above warning: # perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out & # perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out test.c void main() { int count = 0; pid_t pid; printf("%d running\n", getpid()); sleep(30); printf("running\n"); pid = fork(); if (pid == -1) { printf("fork error\n"); return; } if (pid == 0) { while (1) { count++; } } else { while (1) { count++; } } } The testcase first opens an LBR event, so it will allocate task_ctx_data, and then open tracepoint and software events, so the parent context will have 3 different perf_event_pmu_contexts. On inheritance, child ctx will insert the perf_event_pmu_context in another order and the warning will trigger. [ mingo: Tidied up the changelog. ] Fixes: bd2756811766 ("perf: Rewrite core context handling") Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250122073356.1824736-1-luogengkun@huaweicloud.com
2025-02-24Merge tag 'kvm-riscv-fixes-6.14-1' of https://github.com/kvm-riscv/linux ↵Paolo Bonzini
into HEAD KVM/riscv fixes for 6.14, take #1 - Fix hart status check in SBI HSM extension - Fix hart suspend_type usage in SBI HSM extension - Fix error returned by SBI IPI and TIME extensions for unsupported function IDs - Fix suspend_type usage in SBI SUSP extension - Remove unnecessary vcpu kick after injecting interrupt via IMSIC guest file
2025-02-24Merge tag 'kvmarm-fixes-6.14-3' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.14, take #3 - Fix TCR_EL2 configuration to not use the ASID in TTBR1_EL2 and not mess-up T1SZ/PS by using the HCR_EL2.E2H==0 layout. - Bring back the VMID allocation to the vcpu_load phase, ensuring that we only setup VTTBR_EL2 once on VHE. This cures an ugly race that would lead to running with an unallocated VMID.
2025-02-24perf/core: Add RCU read lock protection to perf_iterate_ctx()Breno Leitao
The perf_iterate_ctx() function performs RCU list traversal but currently lacks RCU read lock protection. This causes lockdep warnings when running perf probe with unshare(1) under CONFIG_PROVE_RCU_LIST=y: WARNING: suspicious RCU usage kernel/events/core.c:8168 RCU-list traversed in non-reader section!! Call Trace: lockdep_rcu_suspicious ? perf_event_addr_filters_apply perf_iterate_ctx perf_event_exec begin_new_exec ? load_elf_phdrs load_elf_binary ? lock_acquire ? find_held_lock ? bprm_execve bprm_execve do_execveat_common.isra.0 __x64_sys_execve do_syscall_64 entry_SYSCALL_64_after_hwframe This protection was previously present but was removed in commit bd2756811766 ("perf: Rewrite core context handling"). Add back the necessary rcu_read_lock()/rcu_read_unlock() pair around perf_iterate_ctx() call in perf_event_exec(). [ mingo: Use scoped_guard() as suggested by Peter ] Fixes: bd2756811766 ("perf: Rewrite core context handling") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250117-fix_perf_rcu-v1-1-13cb9210fc6a@debian.org
2025-02-24KVM: selftests: Add a nested (forced) emulation intercept test for x86Sean Christopherson
Add a rudimentary test for validating KVM's handling of L1 hypervisor intercepts during instruction emulation on behalf of L2. To minimize complexity and avoid overlap with other tests, only validate KVM's handling of instructions that L1 wants to intercept, i.e. that generate a nested VM-Exit. Full testing of emulation on behalf of L2 is better achieved by running existing (forced) emulation tests in a VM, (although on VMX, getting L0 to emulate on #UD requires modifying either L1 KVM to not intercept #UD, or modifying L0 KVM to prioritize L0's exception intercepts over L1's intercepts, as is done by KVM for SVM). Since emulation should never be successful, i.e. L2 always exits to L1, dynamically generate the L2 code stream instead of adding a helper for each instruction. Doing so requires hand coding instruction opcodes, but makes it significantly easier for the test to compute the expected "next RIP" and instruction length. Link: https://lore.kernel.org/r/20250201015518.689704-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: nVMX: Synthesize nested VM-Exit for supported emulation interceptsSean Christopherson
When emulating an instruction on behalf of L2 that L1 wants to intercept, generate a nested VM-Exit instead of injecting a #UD into L2. Now that (most of) the necessary information is available, synthesizing a VM-Exit isn't terribly difficult. Punt on decoding the ModR/M for descriptor table exits for now. There is no evidence that any hypervisor intercepts descriptor table accesses *and* uses the EXIT_QUALIFICATION to expedite emulation, i.e. it's not worth delaying basic support for. To avoid doing more harm than good, e.g. by putting L2 into an infinite or effectively corrupting its code stream, inject #UD if the instruction length is nonsensical. Link: https://lore.kernel.org/r/20250201015518.689704-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: nVMX: Allow the caller to provide instruction length on nested VM-ExitSean Christopherson
Rework the nested VM-Exit helper to take the instruction length as a parameter, and convert nested_vmx_vmexit() into a "default" wrapper that grabs the length from vmcs02 as appropriate. This will allow KVM to set the correct instruction length when synthesizing a nested VM-Exit when emulating an instruction that L1 wants to intercept. No functional change intended, as the path to prepare_vmcs12()'s reading of vmcs02.VM_EXIT_INSTRUCTION_LEN is gated on the same set of conditions as the VMREAD in the new nested_vmx_vmexit(). Link: https://lore.kernel.org/r/20250201015518.689704-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: x86: Add a #define for the architectural max instruction lengthSean Christopherson
Add a #define to capture x86's architecturally defined max instruction length instead of open coding the literal in a variety of places. No functional change intended. Link: https://lore.kernel.org/r/20250201015518.689704-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: x86: Plumb the emulator's starting RIP into nested intercept checksSean Christopherson
When checking for intercept when emulating an instruction on behalf of L2, pass the emulator's view of the RIP of the instruction being emulated to vendor code. Unlike SVM, which communicates the next RIP on VM-Exit, VMX communicates the length of the instruction that generated the VM-Exit, i.e. requires the current and next RIPs. Note, unless userspace modifies RIP during a userspace exit that requires completion, kvm_rip_read() will contain the same information. Pass the emulator's view largely out of a paranoia, and because there is no meaningful cost in doing so. Link: https://lore.kernel.org/r/20250201015518.689704-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: x86: Plumb the src/dst operand types through to .check_intercept()Sean Christopherson
When checking for intercept when emulating an instruction on behalf of L2, forward the source and destination operand types to vendor code so that VMX can synthesize the correct EXIT_QUALIFICATION for port I/O VM-Exits. Link: https://lore.kernel.org/r/20250201015518.689704-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: nVMX: Consolidate missing X86EMUL_INTERCEPTED logic in L2 emulationSean Christopherson
Refactor the handling of port I/O interception checks when emulating on behalf of L2 in anticipation of synthesizing a nested VM-Exit to L1 instead of injecting a #UD into L2. No functional change intended. Link: https://lore.kernel.org/r/20250201015518.689704-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: nVMX: Emulate HLT in L2 if it's not interceptedSean Christopherson
Extend VMX's nested intercept logic for emulated instructions to handle HLT interception, primarily for testing purposes. Failure to allow emulation of HLT isn't all that interesting, as emulating HLT while L2 is active either requires forced emulation (and no #UD intercept in L1), TLB games in the guest to coerce KVM into emulating the wrong instruction, or a bug elsewhere in KVM. E.g. without commit 47ef3ef843c0 ("KVM: VMX: Handle event vectoring error in check_emulate_instruction()"), KVM can end up trying to emulate HLT if RIP happens to point at a HLT when a vectored event arrives with L2's IDT pointing at emulated MMIO. Note, vmx_check_intercept() is still broken when L1 wants to intercept an instruction, as KVM injects a #UD instead of synthesizing a nested VM-Exit. That issue extends far beyond HLT, punt on it for now. Link: https://lore.kernel.org/r/20250201015518.689704-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: nVMX: Allow emulating RDPID on behalf of L2Sean Christopherson
Return X86EMUL_CONTINUE instead X86EMUL_UNHANDLEABLE when emulating RDPID on behalf of L2 and L1 _does_ expose RDPID/RDTSCP to L2. When RDPID emulation was added by commit fb6d4d340e05 ("KVM: x86: emulate RDPID"), KVM incorrectly allowed emulation by default. Commit 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") fixed that flaw, but missed that RDPID emulation was relying on the common return path to allow emulation on behalf of L2. Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Link: https://lore.kernel.org/r/20250201015518.689704-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: nSVM: Pass next RIP, not current RIP, for nested VM-Exit on emulationSean Christopherson
Set "next_rip" in the emulation interception info passed to vendor code using the emulator context's "_eip", not "eip". "eip" holds RIP from the start of emulation, i.e. the RIP of the instruction that's being emulated, whereas _eip tracks the context's current position in decoding the code stream, which at the time of the intercept checks is effectively the RIP of the next instruction. Passing the current RIP as next_rip causes SVM to stuff the wrong value value into vmcb12->control.next_rip if a nested VM-Exit is generated, i.e. if L1 wants to intercept the instruction, and could result in L1 putting L2 into an infinite loop due to restarting L2 with the same RIP over and over. Fixes: 8a76d7f25f8f ("KVM: x86: Add x86 callback for intercept check") Link: https://lore.kernel.org/r/20250201015518.689704-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: nVMX: Check PAUSE_EXITING, not BUS_LOCK_DETECTION, on PAUSE emulationSean Christopherson
When emulating PAUSE on behalf of L2, check for interception in vmcs12 by looking at primary execution controls, not secondary execution controls. Checking for PAUSE_EXITING in secondary execution controls effectively results in KVM looking for BUS_LOCK_DETECTION, which KVM doesn't expose to L1, i.e. is always off in vmcs12, and ultimately results in KVM failing to "intercept" PAUSE. Because KVM doesn't handle interception during emulation correctly on VMX, i.e. the "fixed" code is still quite broken, and not intercepting PAUSE is relatively benign, for all intents and purposes the bug means that L2 gets to live when it would otherwise get an unexpected #UD. Fixes: 4984563823f0 ("KVM: nVMX: Emulate NOPs in L2, and PAUSE if it's not intercepted") Link: https://lore.kernel.org/r/20250201015518.689704-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: x86/xen: Move kvm_xen_hvm_config field into kvm_xenSean Christopherson
Now that all KVM usage of the Xen HVM config information is buried behind CONFIG_KVM_XEN=y, move the per-VM kvm_xen_hvm_config field out of kvm_arch and into kvm_xen. No functional change intended. Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250215011437.1203084-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: x86/xen: Bury xen_hvm_config behind CONFIG_KVM_XEN=ySean Christopherson
Now that all references to kvm_vcpu_arch.xen_hvm_config are wrapped with CONFIG_KVM_XEN #ifdefs, bury the field itself behind CONFIG_KVM_XEN=y. No functional change intended. Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250215011437.1203084-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: x86/xen: Consult kvm_xen_enabled when checking for Xen MSR writesSean Christopherson
Query kvm_xen_enabled when detecting writes to the Xen hypercall page MSR so that the check is optimized away in the likely scenario that Xen isn't enabled for the VM. Deliberately open code the check instead of using kvm_xen_msr_enabled() in order to avoid a double load of xen_hvm_config.msr (which is admittedly rather pointless given the widespread lack of READ_ONCE() usage on the plethora of vCPU-scoped accesses to kvm->arch.xen state). No functional change intended. Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250215011437.1203084-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: x86/xen: Add an #ifdef'd helper to detect writes to Xen MSRSean Christopherson
Add a helper to detect writes to the Xen hypercall page MSR, and provide a stub for CONFIG_KVM_XEN=n to optimize out the check for kernels built without Xen support. Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20250215011437.1203084-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24KVM: x86/xen: Restrict hypercall MSR to unofficial synthetic rangeSean Christopherson
Reject userspace attempts to set the Xen hypercall page MSR to an index outside of the "standard" virtualization range [0x40000000, 0x4fffffff], as KVM is not equipped to handle collisions with real MSRs, e.g. KVM doesn't update MSR interception, conflicts with VMCS/VMCB fields, special case writes in KVM, etc. While the MSR index isn't strictly ABI, i.e. can theoretically float to any value, in practice no known VMM sets the MSR index to anything other than 0x40000000 or 0x40000200. Cc: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250215011437.1203084-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24ASoC: es8328: fix route from DAC to outputNicolas Frattaroli
The ES8328 codec driver, which is also used for the ES8388 chip that appears to have an identical register map, claims that the output can either take the route from DAC->Mixer->Output or through DAC->Output directly. To the best of what I could find, this is not true, and creates problems. Without DACCONTROL17 bit index 7 set for the left channel, as well as DACCONTROL20 bit index 7 set for the right channel, I cannot get any analog audio out on Left Out 2 and Right Out 2 respectively, despite the DAPM routes claiming that this should be possible. Furthermore, the same is the case for Left Out 1 and Right Out 1, showing that those two don't have a direct route from DAC to output bypassing the mixer either. Those control bits toggle whether the DACs are fed (stale bread?) into their respective mixers. If one "unmutes" the mixer controls in alsamixer, then sure, the audio output works, but if it doesn't work without the mixer being fed the DAC input then evidently it's not a direct output from the DAC. ES8328/ES8388 are seemingly not alone in this. ES8323, which uses a separate driver for what appears to be a very similar register map, simply flips those two bits on in its probe function, and then pretends there is no power management whatsoever for the individual controls. Fair enough. My theory as to why nobody has noticed this up to this point is that everyone just assumes it's their fault when they had to unmute an additional control in ALSA. Fix this in the es8328 driver by removing the erroneous direct route, then get rid of the playback switch controls and have those bits tied to the mixer's widget instead, which until now had no register to play with. Fixes: 567e4f98922c ("ASoC: add es8328 codec driver") Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://patch.msgid.link/20250222-es8328-route-bludgeoning-v1-1-99bfb7fb22d9@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-24dm vdo: add missing spin_lock_initKen Raeburn
Signed-off-by: Ken Raeburn <raeburn@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org
2025-02-24nsfs: remove d_op->d_deleteChristian Brauner
Nsfs only deals with unhashed dentries and there's currently no way for them to become hashed. So remove d_op->d_delete. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-24pidfs: remove d_op->d_deleteChristian Brauner
Pidfs only deals with unhashed dentries and there's currently no way for them to become hashed. So remove d_op->d_delete. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-24net: dsa: rtl8366rb: Fix compilation problemLinus Walleij
When the kernel is compiled without LED framework support the rtl8366rb fails to build like this: rtl8366rb.o: in function `rtl8366rb_setup_led': rtl8366rb.c:953:(.text.unlikely.rtl8366rb_setup_led+0xe8): undefined reference to `led_init_default_state_get' rtl8366rb.c:980:(.text.unlikely.rtl8366rb_setup_led+0x240): undefined reference to `devm_led_classdev_register_ext' As this is constantly coming up in different randconfig builds, bite the bullet and create a separate file for the offending code, split out a header with all stuff needed both in the core driver and the leds code. Add a new bool Kconfig option for the LED compile target, such that it depends on LEDS_CLASS=y || LEDS_CLASS=RTL8366RB which make LED support always available when LEDS_CLASS is compiled into the kernel and enforce that if the LEDS_CLASS is a module, then the RTL8366RB driver needs to be a module as well so that modprobe can resolve the dependencies. Fixes: 32d617005475 ("net: dsa: realtek: add LED drivers for rtl8366rb") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502070525.xMUImayb-lkp@intel.com/ Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-23bcachefs: fix bch2_extent_ptr_eq()Kent Overstreet
Reviewed-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-23Linux 6.14-rc4Linus Torvalds
2025-02-23Merge tag 'i2c-for-6.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "Revert one cleanup which turned out to eat too much stack space" * tag 'i2c-for-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: core: Allocate temporary client dynamically
2025-02-23Merge tag 'edac_urgent_for_v6.14_rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fix from Borislav Petkov: - Have qcom_edac use the correct interrupt enable register to configure the RAS interrupt lines * tag 'edac_urgent_for_v6.14_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/qcom: Correct interrupt enable register configuration
2025-02-23efivarfs: Defer PM notifier registration until .fill_superArd Biesheuvel
syzbot reports an issue that turns out to be caused by the fact that the efivarfs PM notifier may be invoked before the efivarfs_fs_info::sb field is populated, resulting in a NULL deference. So defer the registration until efivarfs_fill_super() is invoked. Reported-by: syzbot+00d13e505ef530a45100@syzkaller.appspotmail.com Tested-by: syzbot+00d13e505ef530a45100@syzkaller.appspotmail.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-23efi/cper: Fix cper_arm_ctx_info alignmentPatrick Rudolph
According to the UEFI Common Platform Error Record appendix, the processor context information structure is a variable length structure, but "is padded with zeros if the size is not a multiple of 16 bytes". Currently this isn't honoured, causing all but the first structure to be garbage when printed. Thus align the size to be a multiple of 16. Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-23efi/cper: Fix cper_ia_proc_ctx alignmentPatrick Rudolph
According to the UEFI Common Platform Error Record appendix, the IA32/X64 Processor Context Information Structure is a variable length structure, but "is padded with zeros if the size is not a multiple of 16 bytes". Currently this isn't honoured, causing all but the first structure to be garbage when printed. Thus align the size to be a multiple of 16. Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-23RDMA/bnxt_re: Fix the page details for the srq created by kernel consumersKashyap Desai
While using nvme target with use_srq on, below kernel panic is noticed. [ 549.698111] bnxt_en 0000:41:00.0 enp65s0np0: FEC autoneg off encoding: Clause 91 RS(544,514) [ 566.393619] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI .. [ 566.393799] <TASK> [ 566.393807] ? __die_body+0x1a/0x60 [ 566.393823] ? die+0x38/0x60 [ 566.393835] ? do_trap+0xe4/0x110 [ 566.393847] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393867] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393881] ? do_error_trap+0x7c/0x120 [ 566.393890] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393911] ? exc_divide_error+0x34/0x50 [ 566.393923] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393939] ? asm_exc_divide_error+0x16/0x20 [ 566.393966] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393997] bnxt_qplib_create_srq+0xc9/0x340 [bnxt_re] [ 566.394040] bnxt_re_create_srq+0x335/0x3b0 [bnxt_re] [ 566.394057] ? srso_return_thunk+0x5/0x5f [ 566.394068] ? __init_swait_queue_head+0x4a/0x60 [ 566.394090] ib_create_srq_user+0xa7/0x150 [ib_core] [ 566.394147] nvmet_rdma_queue_connect+0x7d0/0xbe0 [nvmet_rdma] [ 566.394174] ? lock_release+0x22c/0x3f0 [ 566.394187] ? srso_return_thunk+0x5/0x5f Page size and shift info is set only for the user space SRQs. Set page size and page shift for kernel space SRQs also. Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1740237621-29291-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-23RDMA/mlx5: Fix bind QP error cleanup flowPatrisious Haddad
When there is a failure during bind QP, the cleanup flow destroys the counter regardless if it is the one that created it or not, which is problematic since if it isn't the one that created it, that counter could still be in use. Fix that by destroying the counter only if it was created during this call. Fixes: 45842fc627c7 ("IB/mlx5: Support statistic q counter configuration") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://patch.msgid.link/25dfefddb0ebefa668c32e06a94d84e3216257cf.1740033937.git.leon@kernel.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-22Merge tag 'v6.14-rc3-smb3-client-fix-part2' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull smb client fix from Steve French: - Fix potential null pointer dereference * tag 'v6.14-rc3-smb3-client-fix-part2' of git://git.samba.org/sfrench/cifs-2.6: smb: client: Add check for next_buffer in receive_encrypted_standard()