summaryrefslogtreecommitdiff
path: root/arch/arm64/include/asm/kvm_asm.h
AgeCommit message (Collapse)Author
2025-01-12Merge branch kvm-arm64/pkvm-np-guest into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm-np-guest: : . : pKVM support for non-protected guests using the standard MM : infrastructure, courtesy of Quentin Perret. From the cover letter: : : "This series moves the stage-2 page-table management of non-protected : guests to EL2 when pKVM is enabled. This is only intended as an : incremental step towards a 'feature-complete' pKVM, there is however a : lot more that needs to come on top. : : With that series applied, pKVM provides near-parity with standard KVM : from a functional perspective all while Linux no longer touches the : stage-2 page-tables itself at EL1. The majority of mm-related KVM : features work out of the box, including MMU notifiers, dirty logging, : RO memslots and things of that nature. There are however two gotchas: : : - We don't support mapping devices into guests: this requires : additional hypervisor support for tracking the 'state' of devices, : which will come in a later series. No device assignment until then. : : - Stage-2 mappings are forced to page-granularity even when backed by a : huge page for the sake of simplicity of this series. I'm only aiming : at functional parity-ish (from userspace's PoV) for now, support for : HP can be added on top later as a perf improvement." : . KVM: arm64: Plumb the pKVM MMU in KVM KVM: arm64: Introduce the EL1 pKVM MMU KVM: arm64: Introduce __pkvm_tlb_flush_vmid() KVM: arm64: Introduce __pkvm_host_mkyoung_guest() KVM: arm64: Introduce __pkvm_host_test_clear_young_guest() KVM: arm64: Introduce __pkvm_host_wrprotect_guest() KVM: arm64: Introduce __pkvm_host_relax_guest_perms() KVM: arm64: Introduce __pkvm_host_unshare_guest() KVM: arm64: Introduce __pkvm_host_share_guest() KVM: arm64: Introduce __pkvm_vcpu_{load,put}() KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung KVM: arm64: Move host page ownership tracking to the hyp vmemmap KVM: arm64: Make hyp_page::order a u8 KVM: arm64: Move enum pkvm_page_state to memory.h KVM: arm64: Change the layout of enum pkvm_page_state Signed-off-by: Marc Zyngier <maz@kernel.org> # Conflicts: # arch/arm64/kvm/arm.c
2024-12-20KVM: arm64: Introduce __pkvm_tlb_flush_vmid()Quentin Perret
Introduce a new hypercall to flush the TLBs of non-protected guests. The host kernel will be responsible for issuing this hypercall after changing stage-2 permissions using the __pkvm_host_relax_guest_perms() or __pkvm_host_wrprotect_guest() paths. This is left under the host's responsibility for performance reasons. Note however that the TLB maintenance for all *unmap* operations still remains entirely under the hypervisor's responsibility for security reasons -- an unmapped page may be donated to another entity, so a stale TLB entry could be used to leak private data. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20241218194059.3670226-17-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20KVM: arm64: Introduce __pkvm_host_mkyoung_guest()Quentin Perret
Plumb the kvm_pgtable_stage2_mkyoung() callback into pKVM for non-protected guests. It will be called later from the fault handling path. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20241218194059.3670226-16-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()Quentin Perret
Plumb the kvm_stage2_test_clear_young() callback into pKVM for non-protected guest. It will be later be called from MMU notifiers. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20241218194059.3670226-15-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20KVM: arm64: Introduce __pkvm_host_wrprotect_guest()Quentin Perret
Introduce a new hypercall to remove the write permission from a non-protected guest stage-2 mapping. This will be used for e.g. enabling dirty logging. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20241218194059.3670226-14-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20KVM: arm64: Introduce __pkvm_host_relax_guest_perms()Quentin Perret
Introduce a new hypercall allowing the host to relax the stage-2 permissions of mappings in a non-protected guest page-table. It will be used later once we start allowing RO memslots and dirty logging. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20241218194059.3670226-13-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20KVM: arm64: Introduce __pkvm_host_unshare_guest()Quentin Perret
In preparation for letting the host unmap pages from non-protected guests, introduce a new hypercall implementing the host-unshare-guest transition. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20241218194059.3670226-12-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20KVM: arm64: Introduce __pkvm_host_share_guest()Quentin Perret
In preparation for handling guest stage-2 mappings at EL2, introduce a new pKVM hypercall allowing to share pages with non-protected guests. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20241218194059.3670226-11-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20KVM: arm64: Introduce __pkvm_vcpu_{load,put}()Marc Zyngier
Rather than look-up the hyp vCPU on every run hypercall at EL2, introduce a per-CPU 'loaded_hyp_vcpu' tracking variable which is updated by a pair of load/put hypercalls called directly from kvm_arch_vcpu_{load,put}() when pKVM is enabled. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20241218194059.3670226-10-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20KVM: arm64: Get rid of __kvm_get_mdcr_el2() and related wartsOliver Upton
KVM caches MDCR_EL2 on a per-CPU basis in order to preserve the configuration of MDCR_EL2.HPMN while running a guest. This is a bit gross, since we're relying on some baked configuration rather than the hardware definition of implemented counters. Discover the number of implemented counters by reading PMCR_EL0.N instead. This works because: - In VHE the kernel runs at EL2, and N always returns the number of counters implemented in hardware - In {n,h}VHE, the EL2 setup code programs MDCR_EL2.HPMN with the EL2 view of PMCR_EL0.N for the host Lastly, avoid traps under nested virtualization by saving PMCR_EL0.N in host data. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241219224116.3941496-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-11-11Merge branch kvm-arm64/misc into kvmarm/nextOliver Upton
* kvm-arm64/misc: : Miscellaneous updates : : - Drop useless check against vgic state in ICC_CLTR_EL1.SEIS read : emulation : : - Fix trap configuration for pKVM : : - Close the door on initialization bugs surrounding userspace irqchip : static key by removing it. KVM: selftests: Don't bother deleting memslots in KVM when freeing VMs KVM: arm64: Get rid of userspace_irqchip_in_use KVM: arm64: Initialize trap register values in hyp in pKVM KVM: arm64: Initialize the hypervisor's VM state at EL2 KVM: arm64: Refactor kvm_vcpu_enable_ptrauth() for hyp use KVM: arm64: Move pkvm_vcpu_init_traps() to init_pkvm_hyp_vcpu() KVM: arm64: Don't map 'kvm_vgic_global_state' at EL2 with pKVM KVM: arm64: Just advertise SEIS as 0 when emulating ICC_CTLR_EL1 Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Move pkvm_vcpu_init_traps() to init_pkvm_hyp_vcpu()Fuad Tabba
Move pkvm_vcpu_init_traps() to the initialization of the hypervisor's vcpu state in init_pkvm_hyp_vcpu(), and remove the associated hypercall. In protected mode, traps need to be initialized whenever a VCPU is initialized anyway, and not only for protected VMs. This also saves an unnecessary hypercall. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241018074833.2563674-2-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-17KVM: arm64: Shave a few bytes from the EL2 idmap codeMarc Zyngier
Our idmap is becoming too big, to the point where it doesn't fit in a 4kB page anymore. There are some low-hanging fruits though, such as the el2_init_state horror that is expanded 3 times in the kernel. Let's at least limit ourselves to two copies, which makes the kernel link again. At some point, we'll have to have a better way of doing this. Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241009204903.GA3353168@thelio-3990X
2024-08-30KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W}Marc Zyngier
On the face of it, AT S12E{0,1}{R,W} is pretty simple. It is the combination of AT S1E{0,1}{R,W}, followed by an extra S2 walk. However, there is a great deal of complexity coming from combining the S1 and S2 attributes to report something consistent in PAR_EL1. This is an absolute mine field, and I have a splitting headache. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: nv: Add basic emulation of AT S1E2{R,W}Marc Zyngier
Similar to our AT S1E{0,1} emulation, we implement the AT S1E2 handling. This emulation of course suffers from the same problems, but is somehow simpler due to the lack of PAN2 and the fact that we are guaranteed to execute it from the correct context. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: nv: Add basic emulation of AT S1E{0,1}{R,W}Marc Zyngier
Emulating AT instructions is one the tasks devolved to the host hypervisor when NV is on. Here, we take the basic approach of emulating AT S1E{0,1}{R,W} using the AT instructions themselves. While this mostly work, it doesn't *always* work: - S1 page tables can be swapped out - shadow S2 can be incomplete and not contain mappings for the S1 page tables We are not trying to handle these case here, and defer it to a later patch. Suitable comments indicate where we are in dire need of better handling. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: Make kvm_at() take an OP_AT_*Joey Gouly
To allow using newer instructions that current assemblers don't know about, replace the `at` instruction with the underlying SYS instruction. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-19KVM: arm64: nv: Add Stage-1 EL2 invalidation primitivesMarc Zyngier
Provide the primitives required to handle TLB invalidation for Stage-1 EL2 TLBs, which by definition do not require messing with the Stage-2 page tables. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-05-01KVM: arm64: Simplify vgic-v3 hypercallsMarc Zyngier
Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore hypercalls so that all of the EL2 GICv3 state is covered by a single pair of hypercalls. Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-17-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-08-31Merge tag 'kvmarm-6.6' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 6.6 - Add support for TLB range invalidation of Stage-2 page tables, avoiding unnecessary invalidations. Systems that do not implement range invalidation still rely on a full invalidation when dealing with large ranges. - Add infrastructure for forwarding traps taken from a L2 guest to the L1 guest, with L0 acting as the dispatcher, another baby step towards the full nested support. - Simplify the way we deal with the (long deprecated) 'CPU target', resulting in a much needed cleanup. - Fix another set of PMU bugs, both on the guest and host sides, as we seem to never have any shortage of those... - Relax the alignment requirements of EL2 VA allocations for non-stack allocations, as we were otherwise wasting a lot of that precious VA space. - The usual set of non-functional cleanups, although I note the lack of spelling fixes...
2023-08-17KVM: arm64: Implement __kvm_tlb_flush_vmid_range()Raghavendra Rao Ananta
Define __kvm_tlb_flush_vmid_range() (for VHE and nVHE) to flush a range of stage-2 page-tables using IPA in one go. If the system supports FEAT_TLBIRANGE, the following patches would conveniently replace global TLBI such as vmalls12e1is in the map, unmap, and dirty-logging paths with ripas2e1is instead. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811045127.3308641-10-rananta@google.com
2023-07-26KVM: arm64: fix __kvm_host_psci_cpu_entry() prototypeArnd Bergmann
The kvm_host_psci_cpu_entry() function was renamed in order to add a wrapper around it, but the prototype did not change, so now the missing-prototype warning came back in W=1 builds: arch/arm64/kvm/hyp/nvhe/psci-relay.c:203:28: error: no previous prototype for function '__kvm_host_psci_cpu_entry' [-Werror,-Wmissing-prototypes] asmlinkage void __noreturn __kvm_host_psci_cpu_entry(bool is_cpu_on) Fixes: dcf89d1111995 ("KVM: arm64: Add missing BTI instructions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230724121850.1386668-1-arnd@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM64: - Eager page splitting optimization for dirty logging, optionally allowing for a VM to avoid the cost of hugepage splitting in the stage-2 fault path. - Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with services that live in the Secure world. pKVM intervenes on FF-A calls to guarantee the host doesn't misuse memory donated to the hyp or a pKVM guest. - Support for running the split hypervisor with VHE enabled, known as 'hVHE' mode. This is extremely useful for testing the split hypervisor on VHE-only systems, and paves the way for new use cases that depend on having two TTBRs available at EL2. - Generalized framework for configurable ID registers from userspace. KVM/arm64 currently prevents arbitrary CPU feature set configuration from userspace, but the intent is to relax this limitation and allow userspace to select a feature set consistent with the CPU. - Enable the use of Branch Target Identification (FEAT_BTI) in the hypervisor. - Use a separate set of pointer authentication keys for the hypervisor when running in protected mode, as the host is untrusted at runtime. - Ensure timer IRQs are consistently released in the init failure paths. - Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps (FEAT_EVT), as it is a register commonly read from userspace. - Erratum workaround for the upcoming AmpereOne part, which has broken hardware A/D state management. RISC-V: - Redirect AMO load/store misaligned traps to KVM guest - Trap-n-emulate AIA in-kernel irqchip for KVM guest - Svnapot support for KVM Guest s390: - New uvdevice secret API - CMM selftest and fixes - fix racy access to target CPU for diag 9c x86: - Fix missing/incorrect #GP checks on ENCLS - Use standard mmu_notifier hooks for handling APIC access page - Drop now unnecessary TR/TSS load after VM-Exit on AMD - Print more descriptive information about the status of SEV and SEV-ES during module load - Add a test for splitting and reconstituting hugepages during and after dirty logging - Add support for CPU pinning in demand paging test - Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes included along the way - Add a "nx_huge_pages=never" option to effectively avoid creating NX hugepage recovery threads (because nx_huge_pages=off can be toggled at runtime) - Move handling of PAT out of MTRR code and dedup SVM+VMX code - Fix output of PIC poll command emulation when there's an interrupt - Add a maintainer's handbook to document KVM x86 processes, preferred coding style, testing expectations, etc. - Misc cleanups, fixes and comments Generic: - Miscellaneous bugfixes and cleanups Selftests: - Generate dependency files so that partial rebuilds work as expected" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits) Documentation/process: Add a maintainer handbook for KVM x86 Documentation/process: Add a label for the tip tree handbook's coding style KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index RISC-V: KVM: Remove unneeded semicolon RISC-V: KVM: Allow Svnapot extension for Guest/VM riscv: kvm: define vcpu_sbi_ext_pmu in header RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip RISC-V: KVM: Add in-kernel emulation of AIA APLIC RISC-V: KVM: Implement device interface for AIA irqchip RISC-V: KVM: Skeletal in-kernel AIA irqchip support RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero RISC-V: KVM: Add APLIC related defines RISC-V: KVM: Add IMSIC related defines RISC-V: KVM: Implement guest external interrupt line management KVM: x86: Remove PRIx* definitions as they are solely for user space s390/uv: Update query for secret-UVCs s390/uv: replace scnprintf with sysfs_emit s390/uvdevice: Add 'Lock Secret Store' UVC ...
2023-05-25arm64: kvm: add prototypes for functions called in asmArnd Bergmann
A lot of kvm specific functions are called only from assembler and have no extern prototype, but that causes a W=1 warnings: arch/arm64/kvm/handle_exit.c:365:24: error: no previous prototype for 'nvhe_hyp_panic_handler' [-Werror=missing-prototypes] arch/arm64/kvm/va_layout.c:188:6: error: no previous prototype for 'kvm_patch_vector_branch' [-Werror=mi ssing-prototypes] arch/arm64/kvm/va_layout.c:287:6: error: no previous prototype for 'kvm_get_kimage_voffset' [-Werror=mis sing-prototypes] arch/arm64/kvm/va_layout.c:293:6: error: no previous prototype for 'kvm_compute_final_ctr_el0' [-Werror= missing-prototypes] arch/arm64/kvm/hyp/vhe/switch.c:259:17: error: no previous prototype for 'hyp_panic' [-Werror=missing-pr arch/arm64/kvm/hyp/nvhe/switch.c:389:17: error: no previous prototype for 'kvm_unexpected_el2_exception' arch/arm64/kvm/hyp/nvhe/switch.c:384:28: error: no previous prototype for 'hyp_panic_bad_stack' [-Werror arch/arm64/kvm/hyp/nvhe/hyp-main.c:383:6: error: no previous prototype for 'handle_trap' [-Werror=missin arch/arm64/kvm/hyp/nvhe/psci-relay.c:203:28: error: no previous prototype for 'kvm_host_psci_cpu_entry' [-Werror=missing-prototypes] Declare those in asm/kvm_asm.h, which already has related declarations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20230516160642.523862-7-arnd@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-05-16KVM: arm64: Use local TLBI on permission relaxationMarc Zyngier
Broadcast TLB invalidations (TLBIs) targeting the Inner Shareable Domain are usually less performant than their non-shareable variant. In particular, we observed some implementations that take millliseconds to complete parallel broadcasted TLBIs. It's safe to use non-shareable TLBIs when relaxing permissions on a PTE in the KVM case. According to the ARM ARM (0487I.a) section D8.13.1 "Using break-before-make when updating translation table entries", permission relaxation does not need break-before-make. Specifically, R_WHZWS states that these are the only changes that require a break-before-make sequence: changes of memory type (Shareability or Cacheability), address changes, or changing the block size. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20230426172330.1439644-13-ricarkol@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2022-11-11KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the hostQuentin Perret
When pKVM is enabled, the hypervisor at EL2 does not trust the host at EL1 and must therefore prevent it from having unrestricted access to internal hypervisor state. The 'kvm_arm_hyp_percpu_base' array holds the offsets for hypervisor per-cpu allocations, so move this this into the nVHE code where it cannot be modified by the untrusted host at EL1. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-22-will@kernel.org
2022-11-11KVM: arm64: Add infrastructure to create and track pKVM instances at EL2Fuad Tabba
Introduce a global table (and lock) to track pKVM instances at EL2, and provide hypercalls that can be used by the untrusted host to create and destroy pKVM VMs and their vCPUs. pKVM VM/vCPU state is directly accessible only by the trusted hypervisor (EL2). Each pKVM VM is directly associated with an untrusted host KVM instance, and is referenced by the host using an opaque handle. Future patches will provide hypercalls to allow the host to initialize/set/get pKVM VM/vCPU state using the opaque handle. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Co-developed-by: Will Deacon <will@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> [maz: silence warning on unmap_donated_memory_noclear()] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-13-will@kernel.org
2022-07-26KVM: arm64: Prepare non-protected nVHE hypervisor stacktraceKalesh Singh
In non-protected nVHE mode (non-pKVM) the host can directly access hypervisor memory; and unwinding of the hypervisor stacktrace is done from EL1 to save on memory for shared buffers. To unwind the hypervisor stack from EL1 the host needs to know the starting point for the unwind and information that will allow it to translate hypervisor stack addresses to the corresponding kernel addresses. This patch sets up this book keeping. It is made use of later in the series. Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220726073750.3219117-10-kaleshsingh@google.com
2022-04-28KVM: arm64: Add guard pages for KVM nVHE hypervisor stackKalesh Singh
Map the stack pages in the flexible private VA range and allocate guard pages below the stack as unbacked VA space. The stack is aligned so that any valid stack address has PAGE_SHIFT bit as 1 - this is used for overflow detection (implemented in a subsequent patch in the series). Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220420214317.3303360-4-kaleshsingh@google.com
2021-12-16KVM: arm64: Expose unshare hypercall to the hostWill Deacon
Introduce an unshare hypercall which can be used to unmap memory from the hypervisor stage-1 in nVHE protected mode. This will be useful to update the EL2 ownership state of pages during guest teardown, and avoids keeping dangling mappings to unreferenced portions of memory. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-14-qperret@google.com
2021-11-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us RISC-V: - New KVM port. x86: - New API to control TSC offset from userspace - TSC scaling for nested hypervisors on SVM - Switch masterclock protection from raw_spin_lock to seqcount - Clean up function prototypes in the page fault code and avoid repeated memslot lookups - Convey the exit reason to userspace on emulation failure - Configure time between NX page recovery iterations - Expose Predictive Store Forwarding Disable CPUID leaf - Allocate page tracking data structures lazily (if the i915 KVM-GT functionality is not compiled in) - Cleanups, fixes and optimizations for the shadow MMU code s390: - SIGP Fixes - initial preparations for lazy destroy of secure VMs - storage key improvements/fixes - Log the guest CPNC Starting from this release, KVM-PPC patches will come from Michael Ellerman's PPC tree" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) RISC-V: KVM: fix boolreturn.cocci warnings RISC-V: KVM: remove unneeded semicolon RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions RISC-V: KVM: Factor-out FP virtualization into separate sources KVM: s390: add debug statement for diag 318 CPNC data KVM: s390: pv: properly handle page flags for protected guests KVM: s390: Fix handle_sske page fault handling KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol KVM: x86: On emulation failure, convey the exit reason, etc. to userspace KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info KVM: x86: Clarify the kvm_run.emulation_failure structure layout KVM: s390: Add a routine for setting userspace CPU state KVM: s390: Simplify SIGP Set Arch handling KVM: s390: pv: avoid stalls when making pages secure KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm KVM: s390: pv: avoid double free of sida page KVM: s390: pv: add macros for UVC CC values s390/mm: optimize reset_guest_reference_bit() s390/mm: optimize set_guest_storage_key() s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present ...
2021-10-21arm64: kvm: use kvm_exception_table_entryMark Rutland
In subsequent patches we'll alter `struct exception_table_entry`, adding fields that are not needed for KVM exception fixups. In preparation for this, migrate KVM to its own `struct kvm_exception_table_entry`, which is identical to the current format of `struct exception_table_entry`. Comments are updated accordingly. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211019160219.5202-5-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-10-18Merge branch kvm-arm64/pkvm/fixed-features into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm/fixed-features: (22 commits) : . : Add the pKVM fixed feature that allows a bunch of exceptions : to either be forbidden or be easily handled at EL2. : . KVM: arm64: pkvm: Give priority to standard traps over pvm handling KVM: arm64: pkvm: Pass vpcu instead of kvm to kvm_get_exit_handler_array() KVM: arm64: pkvm: Move kvm_handle_pvm_restricted around KVM: arm64: pkvm: Consolidate include files KVM: arm64: pkvm: Preserve pending SError on exit from AArch32 KVM: arm64: pkvm: Handle GICv3 traps as required KVM: arm64: pkvm: Drop sysregs that should never be routed to the host KVM: arm64: pkvm: Drop AArch32-specific registers KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WI KVM: arm64: pkvm: Use a single function to expose all id-regs KVM: arm64: Fix early exit ptrauth handling KVM: arm64: Handle protected guests at 32 bits KVM: arm64: Trap access to pVM restricted features KVM: arm64: Move sanitized copies of CPU features KVM: arm64: Initialize trap registers for protected VMs KVM: arm64: Add handlers for protected VM System Registers KVM: arm64: Simplify masking out MTE in feature id reg KVM: arm64: Add missing field descriptor for MDCR_EL2 KVM: arm64: Pass struct kvm to per-EC handlers KVM: arm64: Move early handlers to per-EC handlers ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-11KVM: arm64: Initialize trap registers for protected VMsFuad Tabba
Protected VMs have more restricted features that need to be trapped. Moreover, the host should not be trusted to set the appropriate trapping registers and their values. Initialize the trapping registers, i.e., hcr_el2, mdcr_el2, and cptr_el2 at EL2 for protected guests, based on the values of the guest's feature id registers. No functional change intended as trap handlers introduced in the previous patch are still not hooked in to the guest exit handlers. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-9-tabba@google.com
2021-10-11KVM: arm64: Disable privileged hypercalls after pKVM finalisationWill Deacon
After pKVM has been 'finalised' using the __pkvm_prot_finalize hypercall, the calling CPU will have a Stage-2 translation enabled to prevent access to memory pages owned by EL2. Although this forms a significant part of the process to deprivilege the host kernel, we also need to ensure that the hypercall interface is reduced so that the EL2 code cannot, for example, be re-initialised using a new set of vectors. Re-order the hypercalls so that only a suffix remains available after finalisation of pKVM. Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211008135839.1193-7-will@kernel.org
2021-10-11KVM: arm64: Turn __KVM_HOST_SMCCC_FUNC_* into an enum (mostly)Marc Zyngier
__KVM_HOST_SMCCC_FUNC_* is a royal pain, as there is a fair amount of churn around these #defines, and we avoid making it an enum only for the sake of the early init, low level code that requires __KVM_HOST_SMCCC_FUNC___kvm_hyp_init to be usable from assembly. Let's be brave and turn everything but this symbol into an enum, using a bit of arithmetic to avoid any overlap. Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/877depq9gw.wl-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211008135839.1193-2-will@kernel.org
2021-08-20Merge branch kvm-arm64/pkvm-fixed-features-prologue into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm-fixed-features-prologue: : Rework a bunch of common infrastructure as a prologue : to Fuad Tabba's protected VM fixed feature series. KVM: arm64: Upgrade trace_kvm_arm_set_dreg32() to 64bit KVM: arm64: Add config register bit definitions KVM: arm64: Add feature register flag definitions KVM: arm64: Track value of cptr_el2 in struct kvm_vcpu_arch KVM: arm64: Keep mdcr_el2's value as set by __init_el2_debug KVM: arm64: Restore mdcr_el2 from vcpu KVM: arm64: Refactor sys_regs.h,c for nVHE reuse KVM: arm64: Fix names of config register fields KVM: arm64: MDCR_EL2 is a 64-bit register KVM: arm64: Remove trailing whitespace in comment KVM: arm64: placeholder to check if VM is protected Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20KVM: arm64: MDCR_EL2 is a 64-bit registerFuad Tabba
Fix the places in KVM that treat MDCR_EL2 as a 32-bit register. More recent features (e.g., FEAT_SPEv1p2) use bits above 31. No functional change intended. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210817081134.2918285-4-tabba@google.com
2021-08-11KVM: arm64: Restrict EL2 stage-1 changes in protected modeQuentin Perret
The host kernel is currently able to change EL2 stage-1 mappings without restrictions thanks to the __pkvm_create_mappings() hypercall. But in a world where the host is no longer part of the TCB, this clearly poses a problem. To fix this, introduce a new hypercall to allow the host to share a physical memory page with the hypervisor, and remove the __pkvm_create_mappings() variant. The new hypercall implements ownership and permission checks before allowing the sharing operation, and it annotates the shared page in the hypervisor stage-1 and host stage-2 page-tables. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-21-qperret@google.com
2021-08-11KVM: arm64: Remove __pkvm_mark_hypQuentin Perret
Now that we mark memory owned by the hypervisor in the host stage-2 during __pkvm_init(), we no longer need to rely on the host to explicitly mark the hyp sections later on. Remove the __pkvm_mark_hyp() hypercall altogether. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-19-qperret@google.com
2021-06-28Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "There's a reasonable amount here and the juicy details are all below. It's worth noting that the MTE/KASAN changes strayed outside of our usual directories due to core mm changes and some associated changes to some other architectures; Andrew asked for us to carry these [1] rather that take them via the -mm tree. Summary: - Optimise SVE switching for CPUs with 128-bit implementations. - Fix output format from SVE selftest. - Add support for versions v1.2 and 1.3 of the SMC calling convention. - Allow Pointer Authentication to be configured independently for kernel and userspace. - PMU driver cleanups for managing IRQ affinity and exposing event attributes via sysfs. - KASAN optimisations for both hardware tagging (MTE) and out-of-line software tagging implementations. - Relax frame record alignment requirements to facilitate 8-byte alignment with KASAN and Clang. - Cleanup of page-table definitions and removal of unused memory types. - Reduction of ARCH_DMA_MINALIGN back to 64 bytes. - Refactoring of our instruction decoding routines and addition of some missing encodings. - Move entry code moved into C and hardened against harmful compiler instrumentation. - Update booting requirements for the FEAT_HCX feature, added to v8.7 of the architecture. - Fix resume from idle when pNMI is being used. - Additional CPU sanity checks for MTE and preparatory changes for systems where not all of the CPUs support 32-bit EL0. - Update our kernel string routines to the latest Cortex Strings implementation. - Big cleanup of our cache maintenance routines, which were confusingly named and inconsistent in their implementations. - Tweak linker flags so that GDB can understand vmlinux when using RELR relocations. - Boot path cleanups to enable early initialisation of per-cpu operations needed by KCSAN. - Non-critical fixes and miscellaneous cleanup" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (150 commits) arm64: tlb: fix the TTL value of tlb_get_level arm64: Restrict undef hook for cpufeature registers arm64/mm: Rename ARM64_SWAPPER_USES_SECTION_MAPS arm64: insn: avoid circular include dependency arm64: smp: Bump debugging information print down to KERN_DEBUG drivers/perf: fix the missed ida_simple_remove() in ddr_perf_probe() perf/arm-cmn: Fix invalid pointer when access dtc object sharing the same IRQ number arm64: suspend: Use cpuidle context helpers in cpu_suspend() PSCI: Use cpuidle context helpers in psci_cpu_suspend_enter() arm64: Convert cpu_do_idle() to using cpuidle context helpers arm64: Add cpuidle context save/restore helpers arm64: head: fix code comments in set_cpu_boot_mode_flag arm64: mm: drop unused __pa(__idmap_text_start) arm64: mm: fix the count comments in compute_indices arm64/mm: Fix ttbr0 values stored in struct thread_info for software-pan arm64: mm: Pass original fault address to handle_mm_fault() arm64/mm: Drop SECTION_[SHIFT|SIZE|MASK] arm64/mm: Use CONT_PMD_SHIFT for ARM64_MEMSTART_SHIFT arm64/mm: Drop SWAPPER_INIT_MAP_SIZE arm64: Conditionally configure PTR_AUTH key of the kernel. ...
2021-06-11arm64: insn: move AARCH64_INSN_SIZE into <asm/insn.h>Mark Rutland
For histroical reasons, we define AARCH64_INSN_SIZE in <asm/alternative-macros.h>, but it would make more sense to do so in <asm/insn.h>. Let's move it into <asm/insn.h>, and add the necessary include directives for this. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210609102301.17332-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-15KVM: arm64: Commit pending PC adjustemnts before returning to userspaceMarc Zyngier
KVM currently updates PC (and the corresponding exception state) using a two phase approach: first by setting a set of flags, then by converting these flags into a state update when the vcpu is about to enter the guest. However, this creates a disconnect with userspace if the vcpu thread returns there with any exception/PC flag set. In this case, the exposed context is wrong, as userspace doesn't have access to these flags (they aren't architectural). It also means that these flags are preserved across a reset, which isn't expected. To solve this problem, force an explicit synchronisation of the exception state on vcpu exit to userspace. As an optimisation for nVHE systems, only perform this when there is something pending. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org # 5.11
2021-05-15KVM: arm64: Move __adjust_pc out of lineMarc Zyngier
In order to make it easy to call __adjust_pc() from the EL1 code (in the case of nVHE), rename it to __kvm_adjust_pc() and move it out of line. No expected functional change. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org # 5.11
2021-03-19KVM: arm64: Protect the .hyp sections from the hostQuentin Perret
When KVM runs in nVHE protected mode, use the host stage 2 to unmap the hypervisor sections by marking them as owned by the hypervisor itself. The long-term goal is to ensure the EL2 code can remain robust regardless of the host's state, so this starts by making sure the host cannot e.g. write to the .hyp sections directly. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-39-qperret@google.com
2021-03-19KVM: arm64: Wrap the host with a stage 2Quentin Perret
When KVM runs in protected nVHE mode, make use of a stage 2 page-table to give the hypervisor some control over the host memory accesses. The host stage 2 is created lazily using large block mappings if possible, and will default to page mappings in absence of a better solution. >From this point on, memory accesses from the host to protected memory regions (e.g. not 'owned' by the host) are fatal and lead to hyp_panic(). Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-36-qperret@google.com
2021-03-19KVM: arm64: Set host stage 2 using kvm_nvhe_init_paramsQuentin Perret
Move the registers relevant to host stage 2 enablement to kvm_nvhe_init_params to prepare the ground for enabling it in later patches. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-22-qperret@google.com
2021-03-19KVM: arm64: Prepare the creation of s1 mappings at EL2Quentin Perret
When memory protection is enabled, the EL2 code needs the ability to create and manage its own page-table. To do so, introduce a new set of hypercalls to bootstrap a memory management system at EL2. This leads to the following boot flow in nVHE Protected mode: 1. the host allocates memory for the hypervisor very early on, using the memblock API; 2. the host creates a set of stage 1 page-table for EL2, installs the EL2 vectors, and issues the __pkvm_init hypercall; 3. during __pkvm_init, the hypervisor re-creates its stage 1 page-table and stores it in the memory pool provided by the host; 4. the hypervisor then extends its stage 1 mappings to include a vmemmap in the EL2 VA space, hence allowing to use the buddy allocator introduced in a previous patch; 5. the hypervisor jumps back in the idmap page, switches from the host-provided page-table to the new one, and wraps up its initialization by enabling the new allocator, before returning to the host. 6. the host can free the now unused page-table created for EL2, and will now need to issue hypercalls to make changes to the EL2 stage 1 mappings instead of modifying them directly. Note that for the sake of simplifying the review, this patch focuses on the hypervisor side of things. In other words, this only implements the new hypercalls, but does not make use of them from the host yet. The host-side changes will follow in a subsequent patch. Credits to Will for __pkvm_init_switch_pgd. Acked-by: Will Deacon <will@kernel.org> Co-authored-by: Will Deacon <will@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-18-qperret@google.com
2021-03-09KVM: arm64: Ensure I-cache isolation between vcpus of a same VMMarc Zyngier
It recently became apparent that the ARMv8 architecture has interesting rules regarding attributes being used when fetching instructions if the MMU is off at Stage-1. In this situation, the CPU is allowed to fetch from the PoC and allocate into the I-cache (unless the memory is mapped with the XN attribute at Stage-2). If we transpose this to vcpus sharing a single physical CPU, it is possible for a vcpu running with its MMU off to influence another vcpu running with its MMU on, as the latter is expected to fetch from the PoU (and self-patching code doesn't flush below that level). In order to solve this, reuse the vcpu-private TLB invalidation code to apply the same policy to the I-cache, nuking it every time the vcpu runs on a physical CPU that ran another vcpu of the same VM in the past. This involve renaming __kvm_tlb_flush_local_vmid() to __kvm_flush_cpu_context(), and inserting a local i-cache invalidation there. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210303164505.68492-1-maz@kernel.org
2021-03-06KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()Marc Zyngier
As we are about to report a bit more information to the rest of the kernel, rename __vgic_v3_get_ich_vtr_el2() to the more explicit __vgic_v3_get_gic_config(). No functional change. Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20210305185254.3730990-7-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>