summaryrefslogtreecommitdiff
path: root/arch/riscv/kvm
AgeCommit message (Collapse)Author
2025-05-24RISC-V: KVM: lock the correct mp_state during resetRadim Krčmář
Currently, the kvm_riscv_vcpu_sbi_system_reset() function locks vcpu->arch.mp_state_lock when updating tmp->arch.mp_state.mp_state which is incorrect hence fix it. Fixes: 2121cadec45a ("RISCV: KVM: Introduce mp_state_lock to avoid lock inversion") Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250523104725.2894546-4-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESETRadim Krčmář
Add a toggleable VM capability to reset the VCPU from userspace by setting MP_STATE_INIT_RECEIVED through IOCTL. Reset through a mp_state to avoid adding a new IOCTL. Do not reset on a transition from STOPPED to RUNNABLE, because it's better to avoid side effects that would complicate userspace adoption. The MP_STATE_INIT_RECEIVED is not a permanent mp_state -- IOCTL resets the VCPU while preserving the original mp_state -- because we wouldn't gain much from having a new state it in the rest of KVM, but it's a very non-standard use of the IOCTL. Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250515143723.2450630-5-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21RISC-V: KVM: Remove scounteren initializationAtish Patra
Scounteren CSR controls the direct access the hpmcounters and cycle/ instret/time from the userspace. It's the supervisor's responsibility to set it up correctly for it's user space. They hypervisor doesn't need to decide the policy on behalf of the supervisor. Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250515-fix_scounteren_vs-v3-1-729dc088943e@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21KVM: RISC-V: remove unnecessary SBI reset stateRadim Krčmář
The SBI reset state has only two variables -- pc and a1. The rest is known, so keep only the necessary information. The reset structures make sense if we want userspace to control the reset state (which we do), but I'd still remove them now and reintroduce with the userspace interface later -- we could probably have just a single reset state per VM, instead of a reset state for each VCPU. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Link: https://lore.kernel.org/r/20250403112522.1566629-6-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21KVM: RISC-V: refactor sbi reset requestRadim Krčmář
The same code is used twice and SBI reset sets only two variables. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Link: https://lore.kernel.org/r/20250403112522.1566629-5-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21KVM: RISC-V: refactor vector state resetRadim Krčmář
Do not depend on the reset structures. vector.datap is a kernel memory pointer that needs to be preserved as it is not a part of the guest vector data. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Link: https://lore.kernel.org/r/20250403112522.1566629-4-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21RISC-V: KVM: Remove experimental tag for RISC-VAtish Patra
RISC-V KVM port is no longer experimental. Let's remove it to avoid confusion. Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250505-kvm_tag_change-v1-1-6dbf6af240af@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-01KVM: RISC-V: reset smstateen CSRsRadim Krčmář
Not resetting smstateen is a potential security hole, because VU might be able to access state that VS does not properly context-switch. Fixes: 81f0f314fec9 ("RISCV: KVM: Add sstateen0 context save/restore") Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Link: https://lore.kernel.org/r/20250403112522.1566629-8-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-04-04Merge tag 'riscv-for-linus-6.15-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - The sub-architecture selection Kconfig system has been cleaned up, the documentation has been improved, and various detections have been fixed - The vector-related extensions dependencies are now validated when parsing from device tree and in the DT bindings - Misaligned access probing can be overridden via a kernel command-line parameter, along with various fixes to misalign access handling - Support for relocatable !MMU kernels builds - Support for hpge pfnmaps, which should improve TLB utilization - Support for runtime constants, which improves the d_hash() performance - Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm - Various fixes, including: - We were missing a secondary mmu notifier call when flushing the tlb which is required for IOMMU - Fix ftrace panics by saving the registers as expected by ftrace - Fix a couple of stimecmp usage related to cpu hotplug - purgatory_start is now aligned as per the STVEC requirements - A fix for hugetlb when calculating the size of non-present PTEs * tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (65 commits) riscv: Add norvc after .option arch in runtime const riscv: Make sure toolchain supports zba before using zba instructions riscv/purgatory: 4B align purgatory_start riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator selftests: riscv: fix v_exec_initval_nolibc.c riscv: Fix hugetlb retrieval of number of ptes in case of !present pte riscv: print hartid on bringup riscv: Add norvc after .option arch in runtime const riscv: Remove CONFIG_PAGE_OFFSET riscv: Support CONFIG_RELOCATABLE on riscv32 asm-generic: Always define Elf_Rel and Elf_Rela riscv: Support CONFIG_RELOCATABLE on NOMMU riscv: Allow NOMMU kernels to access all of RAM riscv: Remove duplicate CONFIG_PAGE_OFFSET definition RISC-V: errata: Use medany for relocatable builds dt-bindings: riscv: document vector crypto requirements dt-bindings: riscv: add vector sub-extension dependencies dt-bindings: riscv: d requires f RISC-V: add f & d extension validation checks RISC-V: add vector crypto extension validation checks ...
2025-03-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Nested virtualization support for VGICv3, giving the nested hypervisor control of the VGIC hardware when running an L2 VM - Removal of 'late' nested virtualization feature register masking, making the supported feature set directly visible to userspace - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers - Paravirtual interface for discovering the set of CPU implementations where a VM may run, addressing a longstanding issue of guest CPU errata awareness in big-little systems and cross-implementation VM migration - Userspace control of the registers responsible for identifying a particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1), allowing VMs to be migrated cross-implementation - pKVM updates, including support for tracking stage-2 page table allocations in the protected hypervisor in the 'SecPageTable' stat - Fixes to vPMU, ensuring that userspace updates to the vPMU after KVM_RUN are reflected into the backing perf events LoongArch: - Remove unnecessary header include path - Assume constant PGD during VM context switch - Add perf events support for guest VM RISC-V: - Disable the kernel perf counter during configure - KVM selftests improvements for PMU - Fix warning at the time of KVM module removal x86: - Add support for aging of SPTEs without holding mmu_lock. Not taking mmu_lock allows multiple aging actions to run in parallel, and more importantly avoids stalling vCPUs. This includes an implementation of per-rmap-entry locking; aging the gfn is done with only a per-rmap single-bin spinlock taken, whereas locking an rmap for write requires taking both the per-rmap spinlock and the mmu_lock. Note that this decreases slightly the accuracy of accessed-page information, because changes to the SPTE outside aging might not use atomic operations even if they could race against a clear of the Accessed bit. This is deliberate because KVM and mm/ tolerate false positives/negatives for accessed information, and testing has shown that reducing the latency of aging is far more beneficial to overall system performance than providing "perfect" young/old information. - Defer runtime CPUID updates until KVM emulates a CPUID instruction, to coalesce updates when multiple pieces of vCPU state are changing, e.g. as part of a nested transition - Fix a variety of nested emulation bugs, and add VMX support for synthesizing nested VM-Exit on interception (instead of injecting #UD into L2) - Drop "support" for async page faults for protected guests that do not set SEND_ALWAYS (i.e. that only want async page faults at CPL3) - Bring a bit of sanity to x86's VM teardown code, which has accumulated a lot of cruft over the years. Particularly, destroy vCPUs before the MMU, despite the latter being a VM-wide operation - Add common secure TSC infrastructure for use within SNP and in the future TDX - Block KVM_CAP_SYNC_REGS if guest state is protected. It does not make sense to use the capability if the relevant registers are not available for reading or writing - Don't take kvm->lock when iterating over vCPUs in the suspend notifier to fix a largely theoretical deadlock - Use the vCPU's actual Xen PV clock information when starting the Xen timer, as the cached state in arch.hv_clock can be stale/bogus - Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend notifier only accounts for kvmclock, and there's no evidence that the flag is actually supported by Xen guests - Clean up the per-vCPU "cache" of its reference pvclock, and instead only track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately expensive to compute, and rarely changes for modern setups) - Don't write to the Xen hypercall page on MSR writes that are initiated by the host (userspace or KVM) to fix a class of bugs where KVM can write to guest memory at unexpected times, e.g. during vCPU creation if userspace has set the Xen hypercall MSR index to collide with an MSR that KVM emulates - Restrict the Xen hypercall MSR index to the unofficial synthetic range to reduce the set of possible collisions with MSRs that are emulated by KVM (collisions can still happen as KVM emulates Hyper-V MSRs, which also reside in the synthetic range) - Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config - Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID entries when updating PV clocks; there is no guarantee PV clocks will be updated between TSC frequency changes and CPUID emulation, and guest reads of the TSC leaves should be rare, i.e. are not a hot path x86 (Intel): - Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1 - Pass XFD_ERR as the payload when injecting #NM, as a preparatory step for upcoming FRED virtualization support - Decouple the EPT entry RWX protection bit macros from the EPT Violation bits, both as a general cleanup and in anticipation of adding support for emulating Mode-Based Execution Control (MBEC) - Reject KVM_RUN if userspace manages to gain control and stuff invalid guest state while KVM is in the middle of emulating nested VM-Enter - Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs in anticipation of adding sanity checks for secondary exit controls (the primary field is out of bits) x86 (AMD): - Ensure the PSP driver is initialized when both the PSP and KVM modules are built-in (the initcall framework doesn't handle dependencies) - Use long-term pins when registering encrypted memory regions, so that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to excessive fragmentation - Add macros and helpers for setting GHCB return/error codes - Add support for Idle HLT interception, which elides interception if the vCPU has a pending, unmasked virtual IRQ when HLT is executed - Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical address - Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g. because the vCPU was "destroyed" via SNP's AP Creation hypercall - Reject SNP AP Creation if the requested SEV features for the vCPU don't match the VM's configured set of features Selftests: - Fix again the Intel PMU counters test; add a data load and do CLFLUSH{OPT} on the data instead of executing code. The theory is that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters - Fix a flaw in the Intel PMU counters test where it asserts that an event is counting correctly without actually knowing what the event counts on the underlying hardware - Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and improve its coverage by collecting all dirty entries on each iteration - Fix a few minor bugs related to handling of stats FDs - Add infrastructure to make vCPU and VM stats FDs available to tests by default (open the FDs during VM/vCPU creation) - Relax an assertion on the number of HLT exits in the xAPIC IPI test when running on a CPU that supports AMD's Idle HLT (which elides interception of HLT if a virtual IRQ is pending and unmasked)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits) RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed RISC-V: KVM: Teardown riscv specific bits after kvm_exit LoongArch: KVM: Register perf callbacks for guest LoongArch: KVM: Implement arch-specific functions for guest perf LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel() LoongArch: KVM: Remove PGD saving during VM context switch LoongArch: KVM: Remove unnecessary header include path KVM: arm64: Tear down vGIC on failed vCPU creation KVM: arm64: PMU: Reload when resetting KVM: arm64: PMU: Reload when user modifies registers KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs KVM: arm64: PMU: Assume PMU presence in pmu-emul.c KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu KVM: arm64: Factor out pKVM hyp vcpu creation to separate function KVM: arm64: Initialize HCRX_EL2 traps in pKVM KVM: arm64: Factor out setting HCRX_EL2 traps into separate function KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected KVM: x86: Add infrastructure for secure TSC KVM: x86: Push down setting vcpu.arch.user_set_tsc ...
2025-03-25Merge tag 'timers-cleanups-2025-03-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide hrtimer timer cleanup hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member" * tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) wifi: rt2x00: Switch to use hrtimer_update_function() io_uring: Use helper function hrtimer_update_function() serial: xilinx_uartps: Use helper function hrtimer_update_function() ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup() RDMA: Switch to use hrtimer_setup() virtio: mem: Switch to use hrtimer_setup() drm/vmwgfx: Switch to use hrtimer_setup() drm/xe/oa: Switch to use hrtimer_setup() drm/vkms: Switch to use hrtimer_setup() drm/msm: Switch to use hrtimer_setup() drm/i915/request: Switch to use hrtimer_setup() drm/i915/uncore: Switch to use hrtimer_setup() drm/i915/pmu: Switch to use hrtimer_setup() drm/i915/perf: Switch to use hrtimer_setup() drm/i915/gvt: Switch to use hrtimer_setup() drm/i915/huc: Switch to use hrtimer_setup() drm/amdgpu: Switch to use hrtimer_setup() stm class: heartbeat: Switch to use hrtimer_setup() i2c: Switch to use hrtimer_setup() iio: Switch to use hrtimer_setup() ...
2025-03-20RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowedChao Du
The comments for EXT_SVADE are a bit confusing. Clarify it. Signed-off-by: Chao Du <duchao@eswincomputing.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250221025929.31678-1-duchao@eswincomputing.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-03-19RISC-V: KVM: Teardown riscv specific bits after kvm_exitAtish Patra
During a module removal, kvm_exit invokes arch specific disable call which disables AIA. However, we invoke aia_exit before kvm_exit resulting in the following warning. KVM kernel module can't be inserted afterwards due to inconsistent state of IRQ. [25469.031389] percpu IRQ 31 still enabled on CPU0! [25469.031732] WARNING: CPU: 3 PID: 943 at kernel/irq/manage.c:2476 __free_percpu_irq+0xa2/0x150 [25469.031804] Modules linked in: kvm(-) [25469.031848] CPU: 3 UID: 0 PID: 943 Comm: rmmod Not tainted 6.14.0-rc5-06947-g91c763118f47-dirty #2 [25469.031905] Hardware name: riscv-virtio,qemu (DT) [25469.031928] epc : __free_percpu_irq+0xa2/0x150 [25469.031976] ra : __free_percpu_irq+0xa2/0x150 [25469.032197] epc : ffffffff8007db1e ra : ffffffff8007db1e sp : ff2000000088bd50 [25469.032241] gp : ffffffff8131cef8 tp : ff60000080b96400 t0 : ff2000000088baf8 [25469.032285] t1 : fffffffffffffffc t2 : 5249207570637265 s0 : ff2000000088bd90 [25469.032329] s1 : ff60000098b21080 a0 : 037d527a15eb4f00 a1 : 037d527a15eb4f00 [25469.032372] a2 : 0000000000000023 a3 : 0000000000000001 a4 : ffffffff8122dbf8 [25469.032410] a5 : 0000000000000fff a6 : 0000000000000000 a7 : ffffffff8122dc10 [25469.032448] s2 : ff60000080c22eb0 s3 : 0000000200000022 s4 : 000000000000001f [25469.032488] s5 : ff60000080c22e00 s6 : ffffffff80c351c0 s7 : 0000000000000000 [25469.032582] s8 : 0000000000000003 s9 : 000055556b7fb490 s10: 00007ffff0e12fa0 [25469.032621] s11: 00007ffff0e13e9a t3 : ffffffff81354ac7 t4 : ffffffff81354ac7 [25469.032664] t5 : ffffffff81354ac8 t6 : ffffffff81354ac7 [25469.032698] status: 0000000200000100 badaddr: ffffffff8007db1e cause: 0000000000000003 [25469.032738] [<ffffffff8007db1e>] __free_percpu_irq+0xa2/0x150 [25469.032797] [<ffffffff8007dbfc>] free_percpu_irq+0x30/0x5e [25469.032856] [<ffffffff013a57dc>] kvm_riscv_aia_exit+0x40/0x42 [kvm] [25469.033947] [<ffffffff013b4e82>] cleanup_module+0x10/0x32 [kvm] [25469.035300] [<ffffffff8009b150>] __riscv_sys_delete_module+0x18e/0x1fc [25469.035374] [<ffffffff8000c1ca>] syscall_handler+0x3a/0x46 [25469.035456] [<ffffffff809ec9a4>] do_trap_ecall_u+0x72/0x134 [25469.035536] [<ffffffff809f5e18>] handle_exception+0x148/0x156 Invoke aia_exit and other arch specific cleanup functions after kvm_exit so that disable gets a chance to be called first before exit. Fixes: 54e43320c2ba ("RISC-V: KVM: Initial skeletal support for AIA") Fixes: eded6754f398 ("riscv: KVM: add basic support for host vs guest profiling") Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250317-kvm_exit_fix-v1-1-aa5240c5dbd2@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-03-19RISC-V: KVM: Allow Zaamo/Zalrsc extensions for Guest/VMClément Léger
Extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Zaamo/Zalrsc extensions for Guest/VM. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240619153913.867263-5-cleger@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-06RISC-V: KVM: Disable the kernel perf counter during configureAtish Patra
The perf event should be marked disabled during the creation as it is not ready to be scheduled until there is SBI PMU start call or config matching is called with auto start. Otherwise, event add/start gets called during perf_event_create_kernel_counter function. It will be enabled and scheduled to run via perf_event_enable during either the above mentioned scenario. Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling") Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-1-41d177e45929@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-02-21riscv: KVM: Remove unnecessary vcpu kickBillXiang
Remove the unnecessary kick to the vCPU after writing to the vs_file of IMSIC in kvm_riscv_vcpu_aia_imsic_inject. For vCPUs that are running, writing to the vs_file directly forwards the interrupt as an MSI to them and does not need an extra kick. For vCPUs that are descheduled after emulating WFI, KVM will enable the guest external interrupt for that vCPU in kvm_riscv_aia_wakeon_hgei. This means that writing to the vs_file will cause a guest external interrupt, which will cause KVM to wake up the vCPU in hgei_interrupt to handle the interrupt properly. Signed-off-by: BillXiang <xiangwencheng@lanxincomputing.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Radim Krčmář <rkrcmar@ventanamicro.com> Link: https://lore.kernel.org/r/20250221104538.2147-1-xiangwencheng@lanxincomputing.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-02-18riscv: kvm: Switch to use hrtimer_setup()Nam Cao
hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely. Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism. Patch was created by using Coccinelle. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/d5ededf778f59f2fc38ff4276fb7f4c893e4142c.1738746821.git.namcao@linutronix.de
2025-02-17riscv: KVM: Fix SBI sleep_type useAndrew Jones
The spec says sleep_type is 32 bits wide and "In case the data is defined as 32bit wide, higher privilege software must ensure that it only uses 32 bit data." Mask off upper bits of sleep_type before using it. Fixes: 023c15151fbb ("RISC-V: KVM: Add SBI system suspend support") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-12-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-02-17riscv: KVM: Fix SBI TIME error generationAndrew Jones
When an invalid function ID of an SBI extension is used we should return not-supported, not invalid-param. Fixes: 5f862df5585c ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-11-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-02-17riscv: KVM: Fix SBI IPI error generationAndrew Jones
When an invalid function ID of an SBI extension is used we should return not-supported, not invalid-param. Also, when we see that at least one hartid constructed from the base and mask parameters is invalid, then we should return invalid-param. Finally, rather than relying on overflowing a left shift to result in zero and then using that zero in a condition which [correctly] skips sending an IPI (but loops unnecessarily), explicitly check for overflow and exit the loop immediately. Fixes: 5f862df5585c ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-10-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-02-17riscv: KVM: Fix hart suspend_type useAndrew Jones
The spec says suspend_type is 32 bits wide and "In case the data is defined as 32bit wide, higher privilege software must ensure that it only uses 32 bit data." Mask off upper bits of suspend_type before using it. Fixes: 763c8bed8c05 ("RISC-V: KVM: Implement SBI HSM suspend call") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-9-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-02-17riscv: KVM: Fix hart suspend status checkAndrew Jones
"Not stopped" means started or suspended so we need to check for a single state in order to have a chance to check for each state. Also, we need to use target_vcpu when checking for the suspend state. Fixes: 763c8bed8c05 ("RISC-V: KVM: Implement SBI HSM suspend call") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-8-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-12-30RISC-V: KVM: Add new exit statstics for redirected trapsAtish Patra
Currently, kvm doesn't delegate the few traps such as misaligned load/store, illegal instruction and load/store access faults because it is not expected to occur in the guest very frequently. Thus, kvm gets a chance to act upon it or collect statistics about it before redirecting the traps to the guest. Collect both guest and host visible statistics during the traps. Enable them so that both guest and host can collect the stats about them if required. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241224-kvm_guest_stat-v2-3-08a77ac36b02@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-12-30RISC-V: KVM: Update firmware counters for various eventsAtish Patra
SBI PMU specification defines few firmware counters which can be used by the guests to collect the statstics about various traps occurred in the host. Update these counters whenever a corresponding trap is taken Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241224-kvm_guest_stat-v2-2-08a77ac36b02@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-12-30RISC-V: KVM: Redirect instruction access fault trap to guestQuan Zhou
The M-mode redirects an unhandled instruction access fault trap back to S-mode when not delegating it to VS-mode(hedeleg). However, KVM running in HS-mode terminates the VS-mode software when back from M-mode. The KVM should redirect the trap back to VS-mode, and let VS-mode trap handler decide the next step. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241224-kvm_guest_stat-v2-1-08a77ac36b02@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-12-30RISC-V: KVM: Allow Ziccrse extension for Guest/VMQuan Zhou
Extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Ziccrse extension for Guest/VM. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/d10e746d165074174f830aa3d89bf3c92017acee.1732854096.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2024-12-30RISC-V: KVM: Allow Zabha extension for Guest/VMQuan Zhou
Extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Zabha extension for Guest/VM. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/4074feb27819e23bab05b0fd6441a38bf0b6a5e2.1732854096.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2024-12-30RISC-V: KVM: Allow Svvptc extension for Guest/VMQuan Zhou
Extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Svvptc extension for Guest/VM. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/133509ffe5783b62cf95e8f675cc3e327bee402e.1732854096.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2024-12-30RISC-V: KVM: Add SBI system suspend supportAndrew Jones
Implement a KVM SBI SUSP extension handler. The handler only validates the system suspend entry criteria and prepares for resuming in the appropriate state at the resume_addr (as specified by the SBI spec), but then it forwards the call to the VMM where any system suspend behavior may be implemented. Since VMM support is needed, KVM disables the extension by default. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20241017074538.18867-5-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-12-06RISC-V: KVM: Fix csr_write -> csr_set for HVIEN PMU overflow bitMichael Neuling
This doesn't cause a problem currently as HVIEN isn't used elsewhere yet. Found by inspection. Signed-off-by: Michael Neuling <michaelneuling@tenstorrent.com> Fixes: 16b0bde9a37c ("RISC-V: KVM: Add perf sampling support for guests") Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20241127041840.419940-1-michaelneuling@tenstorrent.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-11-27Merge tag 'kvm-riscv-6.13-2' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 6.13 part #2 - Svade and Svadu extension support for Host and Guest/VM
2024-11-27Merge tag 'riscv-for-linus-6.13-mw1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux into HEAD RISC-V Paches for the 6.13 Merge Window, Part 1 * Support for pointer masking in userspace, * Support for probing vector misaligned access performance. * Support for qspinlock on systems with Zacas and Zabha.
2024-11-21RISC-V: KVM: Add Svade and Svadu Extensions Support for Guest/VMYong-Xuan Wang
We extend the KVM ISA extension ONE_REG interface to allow VMM tools to detect and enable Svade and Svadu extensions for Guest/VM. Since the henvcfg.ADUE is read-only zero if the menvcfg.ADUE is zero, the Svadu extension is available for Guest/VM and the Svade extension is allowed to disabledonly when arch_has_hw_pte_young() is true. Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20240726084931.28924-4-yongxuan.wang@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-11-11Merge patch series "Zacas/Zabha support and qspinlocks"Palmer Dabbelt
Alexandre Ghiti <alexghiti@rivosinc.com> says: This implements [cmp]xchgXX() macros using Zacas and Zabha extensions and finally uses those newly introduced macros to add support for qspinlocks: note that this implementation of qspinlocks satisfies the forward progress guarantee. It also uses Ziccrse to provide the qspinlock implementation. Thanks to Guo and Leonardo for their work! * b4-shazam-merge: (1314 commits) riscv: Add qspinlock support dt-bindings: riscv: Add Ziccrse ISA extension description riscv: Add ISA extension parsing for Ziccrse asm-generic: ticket-lock: Add separate ticket-lock.h asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock riscv: Implement xchg8/16() using Zabha riscv: Implement arch_cmpxchg128() using Zacas riscv: Improve zacas fully-ordered cmpxchg() riscv: Implement cmpxchg8/16() using Zabha dt-bindings: riscv: Add Zabha ISA extension description riscv: Implement cmpxchg32/64() using Zacas riscv: Do not fail to build on byte/halfword operations with Zawrs riscv: Move cpufeature.h macros into their own header Link: https://lore.kernel.org/r/20241103145153.105097-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-11-08Merge tag 'kvm-riscv-6.13-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 6.13 - Accelerate KVM RISC-V when running as a guest - Perf support to collect KVM guest statistics from host side
2024-11-05riscv: kvm: Fix out-of-bounds array accessBjörn Töpel
In kvm_riscv_vcpu_sbi_init() the entry->ext_idx can contain an out-of-bound index. This is used as a special marker for the base extensions, that cannot be disabled. However, when traversing the extensions, that special marker is not checked prior indexing the array. Add an out-of-bounds check to the function. Fixes: 56d8a385b605 ("RISC-V: KVM: Allow some SBI extensions to be disabled by default") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20241104191503.74725-1-bjorn@kernel.org Signed-off-by: Anup Patel <anup@brainfault.org>
2024-11-05RISC-V: KVM: Fix APLIC in_clrip and clripnum write emulationYong-Xuan Wang
In the section "4.7 Precise effects on interrupt-pending bits" of the RISC-V AIA specification defines that: "If the source mode is Level1 or Level0 and the interrupt domain is configured in MSI delivery mode (domaincfg.DM = 1): The pending bit is cleared whenever the rectified input value is low, when the interrupt is forwarded by MSI, or by a relevant write to an in_clrip register or to clripnum." Update the aplic_write_pending() to match the spec. Fixes: d8dd9f113e16 ("RISC-V: KVM: Fix APLIC setipnum_le/be write emulation") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20241029085542.30541-1-yongxuan.wang@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Use NACL HFENCEs for KVM request based HFENCEsAnup Patel
When running under some other hypervisor, use SBI NACL based HFENCEs for TLB shoot-down via KVM requests. This makes HFENCEs faster whenever SBI nested acceleration is available. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-14-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Save trap CSRs in kvm_riscv_vcpu_enter_exit()Anup Patel
Save trap CSRs in the kvm_riscv_vcpu_enter_exit() function instead of the kvm_arch_vcpu_ioctl_run() function so that HTVAL and HTINST CSRs are accessed in more optimized manner while running under some other hypervisor. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-13-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Use SBI sync SRET call when availableAnup Patel
Implement an optimized KVM world-switch using SBI sync SRET call when SBI nested acceleration extension is available. This improves KVM world-switch when KVM RISC-V is running as a Guest under some other hypervisor. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-12-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Use nacl_csr_xyz() for accessing AIA CSRsAnup Patel
When running under some other hypervisor, prefer nacl_csr_xyz() for accessing AIA CSRs in the run-loop. This makes CSR access faster whenever SBI nested acceleration is available. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-11-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Use nacl_csr_xyz() for accessing H-extension CSRsAnup Patel
When running under some other hypervisor, prefer nacl_csr_xyz() for accessing H-extension CSRs in the run-loop. This makes CSR access faster whenever SBI nested acceleration is available. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-10-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Add common nested acceleration supportAnup Patel
Add a common nested acceleration support which will be shared by all parts of KVM RISC-V. This nested acceleration support detects and enables SBI NACL extension usage based on static keys which ensures minimum impact on the non-nested scenario. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-9-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Don't setup SGEI for zero guest external interruptsAnup Patel
No need to setup SGEI local interrupt when there are zero guest external interrupts (i.e. zero HW IMSIC guest files). Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-7-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Replace aia_set_hvictl() with aia_hvictl_value()Anup Patel
The aia_set_hvictl() internally writes the HVICTL CSR which makes it difficult to optimize the CSR write using SBI NACL extension for kvm_riscv_vcpu_aia_update_hvip() function so replace aia_set_hvictl() with new aia_hvictl_value() which only computes the HVICTL value. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-6-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Break down the __kvm_riscv_switch_to() into macrosAnup Patel
Break down the __kvm_riscv_switch_to() function into macros so that these macros can be later re-used by SBI NACL extension based low-level switch function. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-5-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Save/restore SCOUNTEREN in C sourceAnup Patel
The SCOUNTEREN CSR need not be saved/restored in the low-level __kvm_riscv_switch_to() function hence move the SCOUNTEREN CSR save/restore to the kvm_riscv_vcpu_swap_in_guest_state() and kvm_riscv_vcpu_swap_in_host_state() functions in C sources. Also, re-arrange the CSR save/restore and related GPR usage in the low-level __kvm_riscv_switch_to() low-level function. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-4-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Save/restore HSTATUS in C sourceAnup Patel
We will be optimizing HSTATUS CSR access via shared memory setup using the SBI nested acceleration extension. To facilitate this, we first move HSTATUS save/restore in kvm_riscv_vcpu_enter_exit(). Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-3-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Order the object files alphabeticallyAnup Patel
Order the object files alphabetically in the Makefile so that it is very predictable inserting new object files in the future. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-2-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28riscv: KVM: add basic support for host vs guest profilingQuan Zhou
For the information collected on the host side, we need to identify which data originates from the guest and record these events separately, this can be achieved by having KVM register perf callbacks. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/00342d535311eb0629b9ba4f1e457a48e2abee33.1728957131.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>