summaryrefslogtreecommitdiff
path: root/arch/riscv/kvm
AgeCommit message (Collapse)Author
2023-12-30RISC-V: KVM: Implement SBI STA extensionAndrew Jones
Add a select SCHED_INFO to the KVM config in order to get run_delay info. Then implement SBI STA's set-steal-time-shmem function and kvm_riscv_vcpu_record_steal_time() to provide the steal-time info to guests. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: KVM: Add support for SBI STA registersAndrew Jones
KVM userspace needs to be able to save and restore the steal-time shared memory address. Provide the address through the get/set-one-reg interface with two ulong-sized SBI STA extension registers (lo and hi). 64-bit KVM userspace must not set the hi register to anything other than zero and is allowed to completely neglect saving/restoring it. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: KVM: Add support for SBI extension registersAndrew Jones
Some SBI extensions have state that needs to be saved / restored when migrating the VM. Provide a get/set-one-reg register type for SBI extension registers. Each SBI extension that uses this type will have its own subtype. There are currently no subtypes defined. The next patch introduces the first one. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: KVM: Add SBI STA info to vcpu_archAndrew Jones
KVM's implementation of SBI STA needs to track the address of each VCPU's steal-time shared memory region as well as the amount of stolen time. Add a structure to vcpu_arch to contain this state and make sure that the address is always set to INVALID_GPA on vcpu reset. And, of course, ensure KVM won't try to update steal- time when the shared memory address is invalid. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: KVM: Add steal-update vcpu requestAndrew Jones
Add a new vcpu request to inform a vcpu that it should record its steal-time information. The request is made each time it has been detected that the vcpu task was not assigned a cpu for some time, which is easy to do by making the request from vcpu-load. The record function is just a stub for now and will be filled in with the rest of the steal-time support functions in following patches. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: KVM: Add SBI STA extension skeletonAndrew Jones
Add the files and functions needed to support the SBI STA (steal-time accounting) extension. In the next patches we'll complete the functions to fully enable SBI STA support. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()Anup Patel
The indentation of "break" in kvm_riscv_vcpu_set_reg_csr() is inconsistent hence let us fix it. Fixes: c04913f2b54e ("RISCV: KVM: Add sstateen0 to ONE_REG") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312190719.kBuYl6oJ-lkp@intel.com/ Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: add vector registers and CSRs in KVM_GET_REG_LISTDaniel Henrique Barboza
Add all vector registers and CSRs (vstart, vl, vtype, vcsr, vlenb) in get-reg-list. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: add 'vlenb' Vector CSRDaniel Henrique Barboza
Userspace requires 'vlenb' to be able to encode it in reg ID. Otherwise it is not possible to retrieve any vector reg since we're returning EINVAL if reg_size isn't vlenb (see kvm_riscv_vcpu_vreg_addr()). Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: set 'vlenb' in kvm_riscv_vcpu_alloc_vector_context()Daniel Henrique Barboza
'vlenb', added to riscv_v_ext_state by commit c35f3aa34509 ("RISC-V: vector: export VLENB csr in __sc_riscv_v_state"), isn't being initialized in guest_context. If we export 'vlenb' as a KVM CSR, something we want to do in the next patch, it'll always return 0. Set 'vlenb' to riscv_v_size/32. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: Make SBI uapi consistent with ISA uapiAndrew Jones
When an SBI extension cannot be enabled, that's a distinct state vs. enabled and disabled. Modify enum kvm_riscv_sbi_ext_status to accommodate it, which allows KVM userspace to tell the difference in state too, as the SBI extension register will disappear when it cannot be enabled, i.e. accesses to it return ENOENT. get-reg-list is updated as well to only add SBI extension registers to the list which may be enabled. Returning ENOENT for SBI extension registers which cannot be enabled makes them consistent with ISA extension registers. Any SBI extensions which were enabled by default are still enabled by default, if they can be enabled at all. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: Don't add SBI multi regs in get-reg-listAndrew Jones
The multi regs are derived from the single registers. Only list the single registers in get-reg-list. This also makes the SBI extension register listing consistent with the ISA extension register listing. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: remove a redundant condition in kvm_arch_vcpu_ioctl_run()Chao Du
The latest ret value is updated by kvm_riscv_vcpu_aia_update(), the loop will continue if the ret is less than or equal to zero. So the later condition will never hit. Thus remove it. Signed-off-by: Chao Du <duchao@eswincomputing.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29riscv: kvm: use ".L" local labels in assembly when applicableClément Léger
For the sake of coherency, use local labels in assembly when applicable. This also avoid kprobes being confused when applying a kprobe since the size of function is computed by checking where the next visible symbol is located. This might end up in computing some function size to be way shorter than expected and thus failing to apply kprobes to the specified offset. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29riscv: kvm: Use SYM_*() assembly macros instead of deprecated onesClément Léger
ENTRY()/END()/WEAK() macros are deprecated and we should make use of the new SYM_*() macros [1] for better annotation of symbols. Replace the deprecated ones with the new ones and fix wrong usage of END()/ENDPROC() to correctly describe the symbols. [1] https://docs.kernel.org/core-api/asm-annotations.html Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-20riscv: Use accessors to page table entries instead of direct dereferenceAlexandre Ghiti
As very well explained in commit 20a004e7b017 ("arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables"), an architecture whose page table walker can modify the PTE in parallel must use READ_ONCE()/WRITE_ONCE() macro to avoid any compiler transformation. So apply that to riscv which is such architecture. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20231213203001.179237-5-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-13RISCV: KVM: update external interrupt atomically for IMSIC swfileYong-Xuan Wang
The emulated IMSIC update the external interrupt pending depending on the value of eidelivery and topei. It might lose an interrupt when it is interrupted before setting the new value to the pending status. For example, when VCPU0 sends an IPI to VCPU1 via IMSIC: VCPU0 VCPU1 CSRSWAP topei = 0 The VCPU1 has claimed all the external interrupt in its interrupt handler. topei of VCPU1's IMSIC = 0 set pending in VCPU1's IMSIC topei of VCPU1' IMSIC = 1 set the external interrupt pending of VCPU1 clear the external interrupt pending of VCPU1 When the VCPU1 switches back to VS mode, it exits the interrupt handler because the result of CSRSWAP topei is 0. If there are no other external interrupts injected into the VCPU1's IMSIC, VCPU1 will never know this pending interrupt unless it initiative read the topei. If the interruption occurs between updating interrupt pending in IMSIC and updating external interrupt pending of VCPU, it will not cause a problem. Suppose that the VCPU1 clears the IPI pending in IMSIC right after VCPU0 sets the pending, the external interrupt pending of VCPU1 will not be set because the topei is 0. But when the VCPU1 goes back to VS mode, the pending IPI will be reported by the CSRSWAP topei, it will not lose this interrupt. So we only need to make the external interrupt updating procedure as a critical section to avoid the problem. Fixes: db8b7e97d613 ("RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC") Tested-by: Roy Lin <roy.lin@sifive.com> Tested-by: Wayling Chen <wayling.chen@sifive.com> Co-developed-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-08KVM: remove CONFIG_HAVE_KVM_IRQFDPaolo Bonzini
All platforms with a kernel irqchip have support for irqfd. Unify the two configuration items so that userspace can expect to use irqfd to inject interrupts into the irqchip. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-08KVM: remove CONFIG_HAVE_KVM_EVENTFDPaolo Bonzini
virt/kvm/eventfd.c is compiled unconditionally, meaning that the ioeventfds member of struct kvm is accessed unconditionally. CONFIG_HAVE_KVM_EVENTFD therefore must be defined for KVM common code to compile successfully, remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-30KVM: move KVM_CAP_DEVICE_CTRL to the generic checkWei Wang
KVM_CAP_DEVICE_CTRL allows userspace to check if the kvm_device framework (e.g. KVM_CREATE_DEVICE) is supported by KVM. Move KVM_CAP_DEVICE_CTRL to the generic check for the two reasons: 1) it already supports arch agnostic usages (i.e. KVM_DEV_TYPE_VFIO). For example, userspace VFIO implementation may needs to create KVM_DEV_TYPE_VFIO on x86, riscv, or arm etc. It is simpler to have it checked at the generic code than at each arch's code. 2) KVM_CREATE_DEVICE has been added to the generic code. Link: https://lore.kernel.org/all/20221215115207.14784-1-wei.w.wang@intel.com Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Acked-by: Anup Patel <anup@brainfault.org> (riscv) Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lore.kernel.org/r/20230315101606.10636-1-wei.w.wang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-14Merge branch 'kvm-guestmemfd' into HEADPaolo Bonzini
Introduce several new KVM uAPIs to ultimately create a guest-first memory subsystem within KVM, a.k.a. guest_memfd. Guest-first memory allows KVM to provide features, enhancements, and optimizations that are kludgly or outright impossible to implement in a generic memory subsystem. The core KVM ioctl() for guest_memfd is KVM_CREATE_GUEST_MEMFD, which similar to the generic memfd_create(), creates an anonymous file and returns a file descriptor that refers to it. Again like "regular" memfd files, guest_memfd files live in RAM, have volatile storage, and are automatically released when the last reference is dropped. The key differences between memfd files (and every other memory subystem) is that guest_memfd files are bound to their owning virtual machine, cannot be mapped, read, or written by userspace, and cannot be resized. guest_memfd files do however support PUNCH_HOLE, which can be used to convert a guest memory area between the shared and guest-private states. A second KVM ioctl(), KVM_SET_MEMORY_ATTRIBUTES, allows userspace to specify attributes for a given page of guest memory. In the long term, it will likely be extended to allow userspace to specify per-gfn RWX protections, including allowing memory to be writable in the guest without it also being writable in host userspace. The immediate and driving use case for guest_memfd are Confidential (CoCo) VMs, specifically AMD's SEV-SNP, Intel's TDX, and KVM's own pKVM. For such use cases, being able to map memory into KVM guests without requiring said memory to be mapped into the host is a hard requirement. While SEV+ and TDX prevent untrusted software from reading guest private data by encrypting guest memory, pKVM provides confidentiality and integrity *without* relying on memory encryption. In addition, with SEV-SNP and especially TDX, accessing guest private memory can be fatal to the host, i.e. KVM must be prevent host userspace from accessing guest memory irrespective of hardware behavior. Long term, guest_memfd may be useful for use cases beyond CoCo VMs, for example hardening userspace against unintentional accesses to guest memory. As mentioned earlier, KVM's ABI uses userspace VMA protections to define the allow guest protection (with an exception granted to mapping guest memory executable), and similarly KVM currently requires the guest mapping size to be a strict subset of the host userspace mapping size. Decoupling the mappings sizes would allow userspace to precisely map only what is needed and with the required permissions, without impacting guest performance. A guest-first memory subsystem also provides clearer line of sight to things like a dedicated memory pool (for slice-of-hardware VMs) and elimination of "struct page" (for offload setups where userspace _never_ needs to DMA from or into guest memory). guest_memfd is the result of 3+ years of development and exploration; taking on memory management responsibilities in KVM was not the first, second, or even third choice for supporting CoCo VMs. But after many failed attempts to avoid KVM-specific backing memory, and looking at where things ended up, it is quite clear that of all approaches tried, guest_memfd is the simplest, most robust, and most extensible, and the right thing to do for KVM and the kernel at-large. The "development cycle" for this version is going to be very short; ideally, next week I will merge it as is in kvm/next, taking this through the KVM tree for 6.8 immediately after the end of the merge window. The series is still based on 6.6 (plus KVM changes for 6.7) so it will require a small fixup for changes to get_file_rcu() introduced in 6.7 by commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU"). The fixup will be done as part of the merge commit, and most of the text above will become the commit message for the merge. Pending post-merge work includes: - hugepage support - looking into using the restrictedmem framework for guest memory - introducing a testing mechanism to poison memory, possibly using the same memory attributes introduced here - SNP and TDX support There are two non-KVM patches buried in the middle of this series: fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure() mm: Add AS_UNMOVABLE to mark mapping as completely unmovable The first is small and mostly suggested-by Christian Brauner; the second a bit less so but it was written by an mm person (Vlastimil Babka).
2023-11-13KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIERSean Christopherson
Convert KVM_ARCH_WANT_MMU_NOTIFIER into a Kconfig and select it where appropriate to effectively maintain existing behavior. Using a proper Kconfig will simplify building more functionality on top of KVM's mmu_notifier infrastructure. Add a forward declaration of kvm_gfn_range to kvm_types.h so that including arch/powerpc/include/asm/kvm_ppc.h's with CONFIG_KVM=n doesn't generate warnings due to kvm_gfn_range being undeclared. PPC defines hooks for PR vs. HV without guarding them via #ifdeffery, e.g. bool (*unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*test_age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); Alternatively, PPC could forward declare kvm_gfn_range, but there's no good reason not to define it in common KVM. Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-10Merge tag 'riscv-for-linus-6.7-mw2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for handling misaligned accesses in S-mode - Probing for misaligned access support is now properly cached and handled in parallel - PTDUMP now reflects the SW reserved bits, as well as the PBMT and NAPOT extensions - Performance improvements for TLB flushing - Support for many new relocations in the module loader - Various bug fixes and cleanups * tag 'riscv-for-linus-6.7-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits) riscv: Optimize bitops with Zbb extension riscv: Rearrange hwcap.h and cpufeature.h drivers: perf: Do not broadcast to other cpus when starting a counter drivers: perf: Check find_first_bit() return value of: property: Add fw_devlink support for msi-parent RISC-V: Don't fail in riscv_of_parent_hartid() for disabled HARTs riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings riscv: Don't use PGD entries for the linear mapping RISC-V: Probe misaligned access speed in parallel RISC-V: Remove __init on unaligned_emulation_finish() RISC-V: Show accurate per-hart isa in /proc/cpuinfo RISC-V: Don't rely on positional structure initialization riscv: Add tests for riscv module loading riscv: Add remaining module relocations riscv: Avoid unaligned access when relocating modules riscv: split cache ops out of dma-noncoherent.c riscv: Improve flush_tlb_kernel_range() riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb riscv: Improve flush_tlb_range() for hugetlb pages riscv: Improve tlb_flush() ...
2023-11-09riscv: Rearrange hwcap.h and cpufeature.hXiao Wang
Now hwcap.h and cpufeature.h are mutually including each other, and most of the variable/API declarations in hwcap.h are implemented in cpufeature.c, so, it's better to move them into cpufeature.h and leave only macros for ISA extension logical IDs in hwcap.h. BTW, the riscv_isa_extension_mask macro is not used now, so this patch removes it. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231031064553.2319688-2-xiao.w.wang@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-20RISC-V: KVM: Forward SBI DBCN extension to user-spaceAnup Patel
The frozen SBI v2.0 specification defines the SBI debug console (DBCN) extension which replaces the legacy SBI v0.1 console functions namely sbi_console_getchar() and sbi_console_putchar(). The SBI DBCN extension needs to be emulated in the KVM user-space (i.e. QEMU-KVM or KVMTOOL) so we forward SBI DBCN calls from KVM guest to the KVM user-space which can then redirect the console input/output to wherever it wants (e.g. telnet, file, stdio, etc). The SBI debug console is simply a early console available to KVM guest for early prints and it does not intend to replace the proper console devices such as 8250, VirtIO console, etc. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-20RISC-V: KVM: Allow some SBI extensions to be disabled by defaultAnup Patel
Currently, all SBI extensions are enabled by default which is problematic for SBI extensions (such as DBCN) which are forwarded to the KVM user-space because we might have an older KVM user-space which is not aware/ready to handle newer SBI extensions. Ideally, the SBI extensions forwarded to the KVM user-space must be disabled by default. To address above, we allow certain SBI extensions to be disabled by default so that KVM user-space must explicitly enable such SBI extensions to receive forwarded calls from Guest VCPU. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISC-V: KVM: Allow Zicond extension for Guest/VMAnup Patel
We extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Zicond extension for Guest/VM. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISCV: KVM: Add sstateen0 to ONE_REGMayuresh Chitale
Add support for sstateen0 CSR to the ONE_REG interface to allow its access from user space. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISCV: KVM: Add sstateen0 context save/restoreMayuresh Chitale
Define sstateen0 and add sstateen0 save/restore for guest VCPUs. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISCV: KVM: Add senvcfg context save/restoreMayuresh Chitale
Add senvcfg context save/restore for guest VCPUs and also add it to the ONE_REG interface to allow its access from user space. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISC-V: KVM: Enable Smstateen accessesMayuresh Chitale
Configure hstateen0 register so that the AIA state and envcfg are accessible to the vcpus. This includes registers such as siselect, sireg, siph, sieh and all the IMISC registers. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISC-V: KVM: Add kvm_vcpu_configMayuresh Chitale
Add a placeholder for all registers such as henvcfg, hstateen etc which have 'static' configurations depending on extensions supported by the guest. The values are derived once and are then subsequently written to the corresponding CSRs while switching to the vcpu. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-09-21RISC-V: KVM: Fix riscv_vcpu_get_isa_ext_single() for missing extensionsAnup Patel
The riscv_vcpu_get_isa_ext_single() should fail with -ENOENT error when corresponding ISA extension is not available on the host. Fixes: e98b1085be79 ("RISC-V: KVM: Factor-out ONE_REG related code to its own source file") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-09-21RISC-V: KVM: Fix KVM_GET_REG_LIST API for ISA_EXT registersAnup Patel
The ISA_EXT registers to enabled/disable ISA extensions for VCPU are always available when underlying host has the corresponding ISA extension. The copy_isa_ext_reg_indices() called by the KVM_GET_REG_LIST API does not align with this expectation so let's fix it. Fixes: 031f9efafc08 ("KVM: riscv: Add KVM_GET_REG_LIST API support") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-31Merge tag 'kvm-riscv-6.6-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 6.6 - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for Guest/VM - Added ONE_REG interface for SATP mode - Added ONE_REG interface to enable/disable multiple ISA extensions - Improved error codes returned by ONE_REG interfaces - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V - Added get-reg-list selftest for KVM RISC-V
2023-08-31Merge tag 'kvm-x86-generic-6.6' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
Common KVM changes for 6.6: - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass action specific data without needing to constantly update the main handlers. - Drop unused function declarations
2023-08-17KVM: Wrap kvm_{gfn,hva}_range.pte in a per-action unionSean Christopherson
Wrap kvm_{gfn,hva}_range.pte in a union so that future notifier events can pass event specific information up and down the stack without needing to constantly expand and churn the APIs. Lockless aging of SPTEs will pass around a bitmap, and support for memory attributes will pass around the new attributes for the range. Add a "KVM_NO_ARG" placeholder to simplify handling events without an argument (creating a dummy union variable is midly annoying). Opportunstically drop explicit zero-initialization of the "pte" field, as omitting the field (now a union) has the same effect. Cc: Yu Zhao <yuzhao@google.com> Link: https://lore.kernel.org/all/CAOUHufagkd2Jk3_HrVoFFptRXM=hX2CV8f+M-dka-hJU4bP8kw@mail.gmail.com Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Yu Zhao <yuzhao@google.com> Link: https://lore.kernel.org/r/20230729004144.1054885-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common codeDavid Matlack
Move kvm_arch_flush_remote_tlbs_memslot() to common code and drop "arch_" from the name. kvm_arch_flush_remote_tlbs_memslot() is just a range-based TLB invalidation where the range is defined by the memslot. Now that kvm_flush_remote_tlbs_range() can be called from common code we can just use that and drop a bunch of duplicate code from the arch directories. Note this adds a lockdep assertion for slots_lock being held when calling kvm_flush_remote_tlbs_memslot(), which was previously only asserted on x86. MIPS has calls to kvm_flush_remote_tlbs_memslot(), but they all hold the slots_lock, so the lockdep assertion continues to hold true. Also drop the CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT ifdef gating kvm_flush_remote_tlbs_memslot(), since it is no longer necessary. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Acked-by: Anup Patel <anup@brainfault.org> Acked-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811045127.3308641-7-rananta@google.com
2023-08-09KVM: riscv: Add KVM_GET_REG_LIST API supportHaibo Xu
KVM_GET_REG_LIST API will return all registers that are available to KVM_GET/SET_ONE_REG APIs. It's very useful to identify some platform regression issue during VM migration. Since this API was already supported on arm64, it is straightforward to enable it on riscv with similar code structure. Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: Improve vector save/restore functionsAndrew Jones
Make two nonfunctional changes to the vector get/set vector reg functions and their supporting function for simplification and readability. The first is to not pass KVM_REG_RISCV_VECTOR, but rather integrate it directly into the masking. The second is to rename reg_val to reg_addr where and address is used instead of a value. Also opportunistically touch up some of the code formatting for a third nonfunctional change. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: Improve vector save/restore errorsAndrew Jones
kvm_riscv_vcpu_(get/set)_reg_vector() now returns ENOENT if V is not available, EINVAL if reg type is not of VECTOR type, and any error that might be thrown by kvm_riscv_vcpu_vreg_addr(). Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: avoid EBUSY when writing the same isa_ext valDaniel Henrique Barboza
riscv_vcpu_set_isa_ext_single() will prevent any write of isa_ext regs if the vcpu already started spinning. But if there's no extension state (enabled/disabled) made by the userspace, there's no need to -EBUSY out - we can treat the operation as a no-op. zicbom/zicboz_block_size, ISA config reg and mvendorid/march/mimpid already works in a more permissive manner w.r.t userspace writes being a no-op, so let's do the same with isa_ext writes. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: avoid EBUSY when writing the same machine ID valDaniel Henrique Barboza
Right now we do not allow any write in mvendorid/marchid/mimpid if the vcpu already started, preventing these regs to be changed. However, if userspace doesn't change them, an alternative is to consider the reg write a no-op and avoid erroring out altogether. Userpace can then be oblivious about KVM internals if no changes were intended in the first place. Allow the same form of 'lazy writing' that registers such as zicbom/zicboz_block_size supports: avoid erroring out if userspace makes no changes in mvendorid/marchid/mimpid during reg write. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: avoid EBUSY when writing same ISA valDaniel Henrique Barboza
kvm_riscv_vcpu_set_reg_config() will return -EBUSY if the ISA config reg is being written after the VCPU ran at least once. The same restriction isn't placed in kvm_riscv_vcpu_get_reg_config(), so there's a chance that we'll -EBUSY out on an ISA config reg write even if the userspace intended no changes to it. We'll allow the same form of 'lazy writing' that registers such as zicbom/zicboz_block_size supports: avoid erroring out if userspace made no changes to the ISA config reg. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: use EBUSY when !vcpu->arch.ran_atleast_onceDaniel Henrique Barboza
vcpu_set_reg_config() and vcpu_set_reg_isa_ext() is throwing an EOPNOTSUPP error when !vcpu->arch.ran_atleast_once. In similar cases we're throwing an EBUSY error, like in mvendorid/marchid/mimpid set_reg(). EOPNOTSUPP has a conotation of finality. EBUSY is more adequate in this case since its a condition/error related to the vcpu lifecycle. Change these EOPNOTSUPP instances to EBUSY. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: do not EOPNOTSUPP in set KVM_REG_RISCV_TIMER_REGDaniel Henrique Barboza
The KVM_REG_RISCV_TIMER_REG can be read via get_one_reg(). But trying to write anything in this reg via set_one_reg() results in an EOPNOTSUPP. Change the API to behave like cbom_block_size: instead of always erroring out with EOPNOTSUPP, allow userspace to write the same value (riscv_timebase) back, throwing an EINVAL if a different value is attempted. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: do not EOPNOTSUPP in set_one_reg() zicbo(m|z)Daniel Henrique Barboza
zicbom_block_size and zicboz_block_size have a peculiar API: they can be read via get_one_reg() but any write will return a EOPNOTSUPP. It makes sense to return a 'not supported' error since both values can't be changed, but as far as userspace goes they're regs that are throwing the same EOPNOTSUPP error even if they were read beforehand via get_one_reg(), even if the same read value is being written back. EOPNOTSUPP is also returned even if ZICBOM/ZICBOZ aren't enabled in the host. Change both to work more like their counterparts in get_one_reg() and return -ENOENT if their respective extensions aren't available. After that, check if the userspace is written a valid value (i.e. the host value). Throw an -EINVAL if that's not case, let it slide otherwise. This allows both regs to be read/written by userspace in a 'lazy' manner, as long as the userspace doesn't change the reg vals. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: use ENOENT in *_one_reg() when extension is unavailableDaniel Henrique Barboza
Following a similar logic as the previous patch let's minimize the EINVAL usage in *_one_reg() APIs by using ENOENT when an extension that implements the reg is not available. For consistency we're also replacing an EOPNOTSUPP instance that should be an ENOENT since it's an "extension is not available" error. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: return ENOENT in *_one_reg() when reg is unknownDaniel Henrique Barboza
get_one_reg() and set_one_reg() are returning EINVAL errors for almost everything: if a reg doesn't exist, if a reg ID is malformatted, if the associated CPU extension that implements the reg isn't present in the host, and for set_one_reg() if the value being written is invalid. This isn't wrong according to the existing KVM API docs (EINVAL can be used when there's no such register) but adding more ENOENT instances will make easier for userspace to understand what went wrong. Existing userspaces can be affected by this error code change. We checked a few. As of current upstream code, crosvm doesn't check for any particular errno code when using kvm_(get|set)_one_reg(). Neither does QEMU. rust-vmm doesn't have kvm-riscv support yet. Thus we have a good chance of changing these error codes now while the KVM RISC-V ecosystem is still new, minimizing user impact. Change all get_one_reg() and set_one_reg() implementations to return -ENOENT at all "no such register" cases. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-08-08RISC-V: KVM: provide UAPI for host SATP modeDaniel Henrique Barboza
KVM userspaces need to be aware of the host SATP to allow them to advertise it back to the guest OS. Since this information is used to build the guest FDT we can't wait for the SATP reg to be readable. We just need to read the SATP mode, thus we can use the existing 'satp_mode' global that represents the SATP reg with MODE set and both ASID and PPN cleared. E.g. for a 32 bit host running with sv32 satp_mode is 0x80000000, for a 64 bit host running sv57 satp_mode is 0xa000000000000000, and so on. Add a new userspace virtual config register 'satp_mode' to allow userspace to read the current SATP mode the host is using with GET_ONE_REG API before spinning the vcpu. 'satp_mode' can't be changed via KVM, so SET_ONE_REG is allowed as long as userspace writes the existing 'satp_mode'. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>