summaryrefslogtreecommitdiff
path: root/arch/riscv/include/asm/kvm_host.h
AgeCommit message (Collapse)Author
2025-07-11RISC-V: KVM: Move HGEI[E|P] CSR access to IMSIC virtualizationAnup Patel
Currently, the common AIA functions kvm_riscv_vcpu_aia_has_interrupts() and kvm_riscv_aia_wakeon_hgei() lookup HGEI line using an array of VCPU pointers before accessing HGEI[E|P] CSR which is slow and prone to race conditions because there is a separate per-hart lock for the VCPU pointer array and a separate per-VCPU rwlock for IMSIC VS-file (including HGEI line) used by the VCPU. Due to these race conditions, it is observed on QEMU RISC-V host that Guest VCPUs sleep in WFI and never wakeup even with interrupt pending in the IMSIC VS-file because VCPUs were waiting for HGEI wakeup on the wrong host CPU. The IMSIC virtualization already keeps track of the HGEI line and the associated IMSIC VS-file used by each VCPU so move the HGEI[E|P] CSR access to IMSIC virtualization so that costly HGEI line lookup can be avoided and likelihood of race-conditions when updating HGEI[E|P] CSR is also reduced. Reviewed-by: Atish Patra <atishp@rivosinc.com> Tested-by: Atish Patra <atishp@rivosinc.com> Tested-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Fixes: 3385339296d1 ("RISC-V: KVM: Use IMSIC guest files when available") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Link: https://lore.kernel.org/r/20250707035345.17494-3-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESETRadim Krčmář
Add a toggleable VM capability to reset the VCPU from userspace by setting MP_STATE_INIT_RECEIVED through IOCTL. Reset through a mp_state to avoid adding a new IOCTL. Do not reset on a transition from STOPPED to RUNNABLE, because it's better to avoid side effects that would complicate userspace adoption. The MP_STATE_INIT_RECEIVED is not a permanent mp_state -- IOCTL resets the VCPU while preserving the original mp_state -- because we wouldn't gain much from having a new state it in the rest of KVM, but it's a very non-standard use of the IOCTL. Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250515143723.2450630-5-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21KVM: RISC-V: remove unnecessary SBI reset stateRadim Krčmář
The SBI reset state has only two variables -- pc and a1. The rest is known, so keep only the necessary information. The reset structures make sense if we want userspace to control the reset state (which we do), but I'd still remove them now and reintroduce with the userspace interface later -- we could probably have just a single reset state per VM, instead of a reset state for each VCPU. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Link: https://lore.kernel.org/r/20250403112522.1566629-6-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-02-26KVM: Drop kvm_arch_sync_events() now that all implementations are nopsSean Christopherson
Remove kvm_arch_sync_events() now that x86 no longer uses it (no other arch has ever used it). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Bibo Mao <maobibo@loongson.cn> Message-ID: <20250224235542.2562848-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-30RISC-V: KVM: Add new exit statstics for redirected trapsAtish Patra
Currently, kvm doesn't delegate the few traps such as misaligned load/store, illegal instruction and load/store access faults because it is not expected to occur in the guest very frequently. Thus, kvm gets a chance to act upon it or collect statistics about it before redirecting the traps to the guest. Collect both guest and host visible statistics during the traps. Enable them so that both guest and host can collect the stats about them if required. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241224-kvm_guest_stat-v2-3-08a77ac36b02@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28riscv: KVM: add basic support for host vs guest profilingQuan Zhou
For the information collected on the host side, we need to identify which data originates from the guest and record these events separately, this can be achieved by having KVM register perf callbacks. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/00342d535311eb0629b9ba4f1e457a48e2abee33.1728957131.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2024-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...
2024-07-12Merge patch series "riscv: Apply Zawrs when available"Palmer Dabbelt
Andrew Jones <ajones@ventanamicro.com> says: Zawrs provides two instructions (wrs.nto and wrs.sto), where both are meant to allow the hart to enter a low-power state while waiting on a store to a memory location. The instructions also both wait an implementation-defined "short" duration (unless the implementation terminates the stall for another reason). The difference is that while wrs.sto will terminate when the duration elapses, wrs.nto, depending on configuration, will either just keep waiting or an ILL exception will be raised. Linux will use wrs.nto, so if platforms have an implementation which falls in the "just keep waiting" category (which is not expected), then it should _not_ advertise Zawrs in the hardware description. Like wfi (and with the same {m,h}status bits to configure it), when wrs.nto is configured to raise exceptions it's expected that the higher privilege level will see the instruction was a wait instruction, do something, and then resume execution following the instruction. For example, KVM does configure exceptions for wfi (hstatus.VTW=1) and therefore also for wrs.nto. KVM does this for wfi since it's better to allow other tasks to be scheduled while a VCPU waits for an interrupt. For waits such as those where wrs.nto/sto would be used, which are typically locks, it is also a good idea for KVM to be involved, as it can attempt to schedule the lock holding VCPU. This series starts with Christoph's addition of the riscv smp_cond_load_relaxed function which applies wrs.sto when available. That patch has been reworked to use wrs.nto and to use the same approach as Arm for the wait loop, since we can't have arbitrary C code between the load-reserved and the wrs. Then, hwprobe support is added (since the instructions are also usable from usermode), and finally KVM is taught about wrs.nto, allowing guests to see and use the Zawrs extension. We still don't have test results from hardware, and it's not possible to prove that using Zawrs is a win when testing on QEMU, not even when oversubscribing VCPUs to guests. However, it is possible to use KVM selftests to force a scenario where we can prove Zawrs does its job and does it well. [4] is a test which does this and, on my machine, without Zawrs it takes 16 seconds to complete and with Zawrs it takes 0.25 seconds. This series is also available here [1]. In order to use QEMU for testing a build with [2] is needed. In order to enable guests to use Zawrs with KVM using kvmtool, the branch at [3] may be used. [1] https://github.com/jones-drew/linux/commits/riscv/zawrs-v3/ [2] https://lore.kernel.org/all/20240312152901.512001-2-ajones@ventanamicro.com/ [3] https://github.com/jones-drew/kvmtool/commits/riscv/zawrs/ [4] https://github.com/jones-drew/linux/commit/cb2beccebcece10881db842ed69bdd5715cfab5d Link: https://lore.kernel.org/r/20240426100820.14762-8-ajones@ventanamicro.com * b4-shazam-merge: KVM: riscv: selftests: Add Zawrs extension to get-reg-list test KVM: riscv: Support guest wrs.nto riscv: hwprobe: export Zawrs ISA extension riscv: Add Zawrs support for spinlocks dt-bindings: riscv: Add Zawrs ISA extension description riscv: Provide a definition for 'pause' Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-12KVM: riscv: Support guest wrs.ntoAndrew Jones
When a guest traps on wrs.nto, call kvm_vcpu_on_spin() to attempt to yield to the lock holding VCPU. Also extend the KVM ISA extension ONE_REG interface to allow KVM userspace to detect and enable the Zawrs extension for the Guest/VM. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240426100820.14762-13-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-11KVM: Delete the now unused kvm_arch_sched_in()Sean Christopherson
Delete kvm_arch_sched_in() now that all implementations are nops. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240522014013.1672962-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-22RISCV: KVM: Introduce vcpu->reset_cntx_lockYong-Xuan Wang
Originally, the use of kvm->lock in SBI_EXT_HSM_HART_START also avoids the simultaneous updates to the reset context of target VCPU. Since this lock has been replace with vcpu->mp_state_lock, and this new lock also protects the vcpu->mp_state. We have to add a separate lock for vcpu->reset_cntx. Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240417074528.16506-3-yongxuan.wang@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22RISCV: KVM: Introduce mp_state_lock to avoid lock inversionYong-Xuan Wang
Documentation/virt/kvm/locking.rst advises that kvm->lock should be acquired outside vcpu->mutex and kvm->srcu. However, when KVM/RISC-V handling SBI_EXT_HSM_HART_START, the lock ordering is vcpu->mutex, kvm->srcu then kvm->lock. Although the lockdep checking no longer complains about this after commit f0f44752f5f6 ("rcu: Annotate SRCU's update-side lockdep dependencies"), it's necessary to replace kvm->lock with a new dedicated lock to ensure only one hart can execute the SBI_EXT_HSM_HART_START call for the target hart simultaneously. Additionally, this patch also rename "power_off" to "mp_state" with two possible values. The vcpu->mp_state_lock also protects the access of vcpu->mp_state. Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240417074528.16506-2-yongxuan.wang@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-08RISC-V: KVM: Implement kvm_arch_vcpu_ioctl_set_guest_debug()Chao Du
kvm_vm_ioctl_check_extension(): Return 1 if KVM_CAP_SET_GUEST_DEBUG is been checked. kvm_arch_vcpu_ioctl_set_guest_debug(): Update the guest_debug flags from userspace accordingly. Route the breakpoint exceptions to HS mode if the VCPU is being debugged by userspace, by clearing the corresponding bit in hedeleg. Initialize the hedeleg configuration in kvm_riscv_vcpu_setup_config(). Write the actual CSR in kvm_arch_vcpu_load(). Signed-off-by: Chao Du <duchao@eswincomputing.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240402062628.5425-2-duchao@eswincomputing.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-01-02Merge tag 'kvm-riscv-6.8-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 6.8 part #1 - KVM_GET_REG_LIST improvement for vector registers - Generate ISA extension reg_list using macros in get-reg-list selftest - Steal time account support along with selftest
2023-12-30RISC-V: KVM: Add SBI STA info to vcpu_archAndrew Jones
KVM's implementation of SBI STA needs to track the address of each VCPU's steal-time shared memory region as well as the amount of stolen time. Add a structure to vcpu_arch to contain this state and make sure that the address is always set to INVALID_GPA on vcpu reset. And, of course, ensure KVM won't try to update steal- time when the shared memory address is invalid. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: KVM: Add steal-update vcpu requestAndrew Jones
Add a new vcpu request to inform a vcpu that it should record its steal-time information. The request is made each time it has been detected that the vcpu task was not assigned a cpu for some time, which is easy to do by making the request from vcpu-load. The record function is just a stub for now and will be filled in with the rest of the steal-time support functions in following patches. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-11-13KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIERSean Christopherson
Convert KVM_ARCH_WANT_MMU_NOTIFIER into a Kconfig and select it where appropriate to effectively maintain existing behavior. Using a proper Kconfig will simplify building more functionality on top of KVM's mmu_notifier infrastructure. Add a forward declaration of kvm_gfn_range to kvm_types.h so that including arch/powerpc/include/asm/kvm_ppc.h's with CONFIG_KVM=n doesn't generate warnings due to kvm_gfn_range being undeclared. PPC defines hooks for PR vs. HV without guarding them via #ifdeffery, e.g. bool (*unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*test_age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); Alternatively, PPC could forward declare kvm_gfn_range, but there's no good reason not to define it in common KVM. Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12RISCV: KVM: Add sstateen0 context save/restoreMayuresh Chitale
Define sstateen0 and add sstateen0 save/restore for guest VCPUs. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISCV: KVM: Add senvcfg context save/restoreMayuresh Chitale
Add senvcfg context save/restore for guest VCPUs and also add it to the ONE_REG interface to allow its access from user space. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISC-V: KVM: Enable Smstateen accessesMayuresh Chitale
Configure hstateen0 register so that the AIA state and envcfg are accessible to the vcpus. This includes registers such as siselect, sireg, siph, sieh and all the IMISC registers. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISC-V: KVM: Add kvm_vcpu_configMayuresh Chitale
Add a placeholder for all registers such as henvcfg, hstateen etc which have 'static' configurations depending on extensions supported by the guest. The values are derived once and are then subsequently written to the corresponding CSRs while switching to the vcpu. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-09KVM: riscv: Add KVM_GET_REG_LIST API supportHaibo Xu
KVM_GET_REG_LIST API will return all registers that are available to KVM_GET/SET_ONE_REG APIs. It's very useful to identify some platform regression issue during VM migration. Since this API was already supported on arm64, it is straightforward to enable it on riscv with similar code structure. Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: Factor-out ONE_REG related code to its own source fileAnup Patel
The VCPU ONE_REG interface has grown over time and it will continue to grow with new ISA extensions and other features. Let us move all ONE_REG related code to its own source file so that vcpu.c only focuses only on high-level VCPU functions. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-07-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM64: - Eager page splitting optimization for dirty logging, optionally allowing for a VM to avoid the cost of hugepage splitting in the stage-2 fault path. - Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with services that live in the Secure world. pKVM intervenes on FF-A calls to guarantee the host doesn't misuse memory donated to the hyp or a pKVM guest. - Support for running the split hypervisor with VHE enabled, known as 'hVHE' mode. This is extremely useful for testing the split hypervisor on VHE-only systems, and paves the way for new use cases that depend on having two TTBRs available at EL2. - Generalized framework for configurable ID registers from userspace. KVM/arm64 currently prevents arbitrary CPU feature set configuration from userspace, but the intent is to relax this limitation and allow userspace to select a feature set consistent with the CPU. - Enable the use of Branch Target Identification (FEAT_BTI) in the hypervisor. - Use a separate set of pointer authentication keys for the hypervisor when running in protected mode, as the host is untrusted at runtime. - Ensure timer IRQs are consistently released in the init failure paths. - Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps (FEAT_EVT), as it is a register commonly read from userspace. - Erratum workaround for the upcoming AmpereOne part, which has broken hardware A/D state management. RISC-V: - Redirect AMO load/store misaligned traps to KVM guest - Trap-n-emulate AIA in-kernel irqchip for KVM guest - Svnapot support for KVM Guest s390: - New uvdevice secret API - CMM selftest and fixes - fix racy access to target CPU for diag 9c x86: - Fix missing/incorrect #GP checks on ENCLS - Use standard mmu_notifier hooks for handling APIC access page - Drop now unnecessary TR/TSS load after VM-Exit on AMD - Print more descriptive information about the status of SEV and SEV-ES during module load - Add a test for splitting and reconstituting hugepages during and after dirty logging - Add support for CPU pinning in demand paging test - Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes included along the way - Add a "nx_huge_pages=never" option to effectively avoid creating NX hugepage recovery threads (because nx_huge_pages=off can be toggled at runtime) - Move handling of PAT out of MTRR code and dedup SVM+VMX code - Fix output of PIC poll command emulation when there's an interrupt - Add a maintainer's handbook to document KVM x86 processes, preferred coding style, testing expectations, etc. - Misc cleanups, fixes and comments Generic: - Miscellaneous bugfixes and cleanups Selftests: - Generate dependency files so that partial rebuilds work as expected" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits) Documentation/process: Add a maintainer handbook for KVM x86 Documentation/process: Add a label for the tip tree handbook's coding style KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index RISC-V: KVM: Remove unneeded semicolon RISC-V: KVM: Allow Svnapot extension for Guest/VM riscv: kvm: define vcpu_sbi_ext_pmu in header RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip RISC-V: KVM: Add in-kernel emulation of AIA APLIC RISC-V: KVM: Implement device interface for AIA irqchip RISC-V: KVM: Skeletal in-kernel AIA irqchip support RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero RISC-V: KVM: Add APLIC related defines RISC-V: KVM: Add IMSIC related defines RISC-V: KVM: Implement guest external interrupt line management KVM: x86: Remove PRIx* definitions as they are solely for user space s390/uv: Update query for secret-UVCs s390/uv: replace scnprintf with sysfs_emit s390/uvdevice: Add 'Lock Secret Store' UVC ...
2023-06-18RISC-V: KVM: Skeletal in-kernel AIA irqchip supportAnup Patel
To incrementally implement in-kernel AIA irqchip support, we first add minimal skeletal support which only compiles but does not provide any functionality. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-06-08riscv: KVM: Add vector lazy save/restore supportVincent Chen
This patch adds vector context save/restore for guest VCPUs. To reduce the impact on KVM performance, the implementation imitates the FP context switch mechanism to lazily store and restore the vector context only when the kernel enters/exits the in-kernel run loop and not during the KVM world switch. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Greentime Hu <greentime.hu@sifive.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20230605110724.21391-20-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-21RISC-V: KVM: Use bitmap for irqs_pending and irqs_pending_maskAnup Patel
To support 64 VCPU local interrupts on RV32 host, we should use bitmap for irqs_pending and irqs_pending_mask in struct kvm_vcpu_arch. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-04-21RISC-V: KVM: Initial skeletal support for AIAAnup Patel
To incrementally implement AIA support, we first add minimal skeletal support which only compiles and detects AIA hardware support at the boot-time but does not provide any functionality. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-02-15Merge tag 'kvm-riscv-6.3-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 6.3 - Fix wrong usage of PGDIR_SIZE to check page sizes - Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect() - Redirect illegal instruction traps to guest - SBI PMU support for guest
2023-02-07RISC-V: KVM: Add skeleton support for perfAtish Patra
This patch only adds barebone structure of perf implementation. Most of the function returns zero at this point and will be implemented fully in the future. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-12-29KVM: RISC-V: Tag init functions and data with __init, __ro_after_initSean Christopherson
Now that KVM setup is handled directly in riscv_kvm_init(), tag functions and data that are used/set only during init with __init/__ro_after_init. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-26-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29KVM: Drop arch hardware (un)setup hooksSean Christopherson
Drop kvm_arch_hardware_setup() and kvm_arch_hardware_unsetup() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390 Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-07RISC-V: KVM: Save mvendorid, marchid, and mimpid when creating VCPUAnup Patel
We should save VCPU mvendorid, marchid, and mimpid at the time of creating VCPU so that we don't have to do host SBI call every time Guest/VM ask for these details. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-12-07RISC-V: KVM: Move sbi related struct and functions to kvm_vcpu_sbi.hAnup Patel
Just like asm/kvm_vcpu_timer.h, we should have all sbi related struct and functions in asm/kvm_vcpu_sbi.h. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-12-07RISC-V: KVM: Remove redundant includes of asm/csr.hAnup Patel
We should include asm/csr.h only where required so let us remove redundant includes of this header. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-10-02RISC-V: KVM: Record number of signal exits as a vCPU statJisheng Zhang
Record a statistic indicating the number of times a vCPU has exited due to a pending signal. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org
2022-07-29RISC-V: KVM: Add G-stage ioremap() and iounmap() functionsAnup Patel
The in-kernel AIA IMSIC support requires on-demand mapping / unmapping of Guest IMSIC address to Host IMSIC guest files. To help achieve this, we add kvm_riscv_stage2_ioremap() and kvm_riscv_stage2_iounmap() functions. These new functions for updating G-stage page table mappings will be called in atomic context so we have special "in_atomic" parameter for this purpose. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-07-29RISC-V: KVM: Add extensible CSR emulation frameworkAnup Patel
We add an extensible CSR emulation framework which is based upon the existing system instruction emulation. This will be useful to upcoming AIA, PMU, Nested and other virtualization features. The CSR emulation framework also has provision to emulate CSR in user space but this will be used only in very specific cases such as AIA IMSIC CSR emulation in user space or vendor specific CSR emulation in user space. By default, all CSRs not handled by KVM RISC-V will be redirected back to Guest VCPU as illegal instruction trap. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-07-29RISC-V: KVM: Factor-out instruction emulation into separate sourcesAnup Patel
The instruction and CSR emulation for VCPU is going to grow over time due to upcoming AIA, PMU, Nested and other virtualization features. Let us factor-out VCPU instruction emulation from vcpu_exit.c to a separate source dedicated for this purpose. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-07-29RISC-V: KVM: Improve ISA extension by using a bitmapAtish Patra
Currently, the every vcpu only stores the ISA extensions in a unsigned long which is not scalable as number of extensions will continue to grow. Using a bitmap allows the ISA extension to support any number of extensions. The CONFIG one reg interface implementation is modified to support the bitmap as well. But it is meant only for base extensions. Thus, the first element of the bitmap array is sufficient for that interface. In the future, all the new multi-letter extensions must use the ISA_EXT one reg interface that allows enabling/disabling any extension now. Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20RISC-V: KVM: Cleanup stale TLB entries when host CPU changesAnup Patel
On RISC-V platforms with hardware VMID support, we share same VMID for all VCPUs of a particular Guest/VM. This means we might have stale G-stage TLB entries on the current Host CPU due to some other VCPU of the same Guest which ran previously on the current Host CPU. To cleanup stale TLB entries, we simply flush all G-stage TLB entries by VMID whenever underlying Host CPU changes for a VCPU. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20RISC-V: KVM: Add remote HFENCE functions based on VCPU requestsAnup Patel
The generic KVM has support for VCPU requests which can be used to do arch-specific work in the run-loop. We introduce remote HFENCE functions which will internally use VCPU requests instead of host SBI calls. Advantages of doing remote HFENCEs as VCPU requests are: 1) Multiple VCPUs of a Guest may be running on different Host CPUs so it is not always possible to determine the Host CPU mask for doing Host SBI call. For example, when VCPU X wants to do HFENCE on VCPU Y, it is possible that VCPU Y is blocked or in user-space (i.e. vcpu->cpu < 0). 2) To support nested virtualization, we will be having a separate shadow G-stage for each VCPU and a common host G-stage for the entire Guest/VM. The VCPU requests based remote HFENCEs helps us easily synchronize the common host G-stage and shadow G-stage of each VCPU without any additional IPI calls. This is also a preparatory patch for upcoming nested virtualization support where we will be having a shadow G-stage page table for each Guest VCPU. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20RISC-V: KVM: Reduce KVM_MAX_VCPUS valueAnup Patel
Currently, the KVM_MAX_VCPUS value is 16384 for RV64 and 128 for RV32. The KVM_MAX_VCPUS value is too high for RV64 and too low for RV32 compared to other architectures (e.g. x86 sets it to 1024 and ARM64 sets it to 512). The too high value of KVM_MAX_VCPUS on RV64 also leads to VCPU mask on stack consuming 2KB. We set KVM_MAX_VCPUS to 1024 for both RV64 and RV32 to be aligned other architectures. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20RISC-V: KVM: Introduce range based local HFENCE functionsAnup Patel
Various __kvm_riscv_hfence_xyz() functions implemented in the kvm/tlb.S are equivalent to corresponding HFENCE.GVMA instructions and we don't have range based local HFENCE functions. This patch provides complete set of local HFENCE functions which supports range based TLB invalidation and supports HFENCE.VVMA based functions. This is also a preparatory patch for upcoming Svinval support in KVM RISC-V. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20RISC-V: KVM: Use G-stage name for hypervisor page tableAnup Patel
The two-stage address translation defined by the RISC-V privileged specification defines: VS-stage (guest virtual address to guest physical address) programmed by the Guest OS and G-stage (guest physical addree to host physical address) programmed by the hypervisor. To align with above terminology, we replace "stage2" with "gstage" and "Stage2" with "G-stage" name everywhere in KVM RISC-V sources. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-04-21KVM: RISC-V: Use kvm_vcpu.srcu_idx, drop RISC-V's unnecessary copySean Christopherson
Use the generic kvm_vcpu's srcu_idx instead of using an indentical field in RISC-V's version of kvm_vcpu_arch. Generic KVM very intentionally does not touch vcpu->srcu_idx, i.e. there's zero chance of running afoul of common code. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220415004343.2203171-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-11RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() functionAnup Patel
The wait for interrupt (WFI) instruction emulation can share the VCPU halt logic with SBI HSM suspend emulation so this patch adds a common kvm_riscv_vcpu_wfi() function for this purpose. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-01-06RISC-V: KVM: Add VM capability to allow userspace get GPA bitsAnup Patel
The number of GPA bits supported for a RISC-V Guest/VM is based on the MMU mode used by the G-stage translation. The KVM RISC-V will detect and use the best possible MMU mode for the G-stage in kvm_arch_init(). We add a generic VM capability KVM_CAP_VM_GPA_BITS which can be used by the KVM userspace to get the number of GPA (guest physical address) bits supported for a Guest/VM. Signed-off-by: Anup Patel <anup.patel@wdc.com> Reviewed-and-tested-by: Atish Patra <atishp@rivosinc.com>
2022-01-06KVM: RISC-V: Use common KVM implementation of MMU memory cachesSean Christopherson
Use common KVM's implementation of the MMU memory caches, which for all intents and purposes is semantically identical to RISC-V's version, the only difference being that the common implementation will fall back to an atomic allocation if there's a KVM bug that triggers a cache underflow. RISC-V appears to have based its MMU code on arm64 before the conversion to the common caches in commit c1a33aebe91d ("KVM: arm64: Use common KVM implementation of MMU memory caches"), despite having also copy-pasted the definition of KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE in kvm_types.h. Opportunistically drop the superfluous wrapper kvm_riscv_stage2_flush_cache(), whose name is very, very confusing as "cache flush" in the context of MMU code almost always refers to flushing hardware caches, not freeing unused software objects. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Anup Patel <anup.patel@wdc.com>
2021-12-08KVM: Drop obsolete kvm_arch_vcpu_block_finish()Sean Christopherson
Drop kvm_arch_vcpu_block_finish() now that all arch implementations are nops. No functional change intended. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>