path: root/arch/arm64/kvm/hyp/nvhe
Age  Commit message  Author
2025-06-26  KVM: arm64: Adjust range correctly during host stage-2 faults  Quentin Perret
host_stage2_adjust_range() tries to find the largest block mapping that fits within a memory or mmio region (represented by a kvm_mem_range in this function) during host stage-2 faults under pKVM. To do so, it walks the host stage-2 page-table, finds the faulting PTE and its level, and then progressively increments the level until it finds a granule of the appropriate size. However, the condition in the loop implementing the above is broken as it checks kvm_level_supports_block_mapping() for the next level instead of the current, so pKVM may attempt to map a region larger than can be covered with a single block. This is not a security problem and is quite rare in practice (the kvm_mem_range check usually forces host_stage2_adjust_range() to choose a smaller granule), but this is clearly not the expected behaviour. Refactor the loop to fix the bug and improve readability. Fixes: c4f0935e4d95 ("KVM: arm64: Optimize host memory aborts") Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250625105548.984572-1-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
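The corrected loop shape described above can be sketched as follows. This is a simplified, self-contained model and not the kernel code: it assumes 4KiB pages with levels 0 to 3, and `granule_size()`, `level_supports_block()`, `range_included()` and `adjust_range()` are hypothetical stand-ins for the kernel helpers. The point it illustrates is that the block-mapping check is applied to the level being tried, not to the next one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LAST_LEVEL 3 /* assumption: 4KiB pages, levels 0..3 */

struct mem_range { uint64_t start, end; }; /* half-open: [start, end) */

/* Size covered by one entry at this level (4KiB at level 3, 2MiB at level 2, ...). */
static uint64_t granule_size(int level)
{
    return 1ULL << (12 + 9 * (LAST_LEVEL - level));
}

/* Assumption for this model: level 0 cannot hold a block mapping. */
static bool level_supports_block(int level)
{
    return level >= 1 && level <= LAST_LEVEL;
}

static bool range_included(const struct mem_range *child,
                           const struct mem_range *parent)
{
    return parent->start <= child->start && child->end <= parent->end;
}

/*
 * Starting at the level of the faulting entry, pick the largest granule
 * that (a) can be a block at the level being tried and (b) fits inside
 * *range. On success, narrow *range to that granule and return 0.
 */
static int adjust_range(uint64_t addr, int level, struct mem_range *range)
{
    assert(level >= 0);

    for (; level <= LAST_LEVEL; level++) {
        struct mem_range cur;
        uint64_t granule;

        if (!level_supports_block(level)) /* check the current level */
            continue;

        granule = granule_size(level);
        cur.start = addr & ~(granule - 1);
        cur.end = cur.start + granule;
        if (!range_included(&cur, range))
            continue;

        *range = cur;
        return 0;
    }
    return -1;
}
```

In this model, a fault at 0x40201000 inside a 4MiB region starting at 0x40000000 skips level 0 (no block support), rejects the 1GiB granule (it does not fit), and settles on the 2MiB granule at 0x40200000.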
2025-06-19  KVM: arm64: Remove ad-hoc CPTR manipulation from fpsimd_sve_sync()  Mark Rutland
There's no need for fpsimd_sve_sync() to write to CPTR/CPACR. All relevant traps are always disabled earlier within __kvm_vcpu_run(), when __deactivate_cptr_traps() configures CPTR/CPACR. With irrelevant details elided, the flow is:
    handle___kvm_vcpu_run(...)
    {
        flush_hyp_vcpu(...) {
            fpsimd_sve_flush(...);
        }
        __kvm_vcpu_run(...) {
            __activate_traps(...) {
                __activate_cptr_traps(...);
            }
            do {
                __guest_enter(...);
            } while (...);
            __deactivate_traps(...) {
                __deactivate_cptr_traps(...);
            }
        }
        sync_hyp_vcpu(...) {
            fpsimd_sve_sync(...);
        }
    }
Remove the unnecessary write to CPTR/CPACR. An ISB is still necessary, so a comment is added to describe this requirement.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-5-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19  KVM: arm64: Reorganise CPTR trap manipulation  Mark Rutland
The NVHE/HVHE and VHE modes have separate implementations of __activate_cptr_traps() and __deactivate_cptr_traps() in their respective switch.c files. There's some duplication of logic, and it's not currently possible to reuse this logic elsewhere. Move the logic into the common switch.h header so that it can be reused, and de-duplicate the common logic.
This rework changes the way SVE traps are deactivated in VHE mode, aligning it with NVHE/HVHE modes:
* Before this patch, VHE's __deactivate_cptr_traps() would unconditionally enable SVE for host EL2 (but not EL0), regardless of whether the ARM64_SVE cpucap was set.
* After this patch, VHE's __deactivate_cptr_traps() will take the ARM64_SVE cpucap into account. When ARM64_SVE is not set, SVE will be trapped from EL2 and below.
The old and new behaviour are both benign:
* When ARM64_SVE is not set, the host will not touch SVE state, and will not reconfigure SVE traps. Host EL0 access to SVE will be trapped as expected.
* When ARM64_SVE is set, the host will configure EL0 SVE traps before returning to EL0 as part of reloading the EL0 FPSIMD/SVE/SME state.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-4-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-11  Merge tag 'kvmarm-fixes-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  Paolo Bonzini
KVM/arm64 fixes for 6.16, take #2
- Rework of system register accessors for system registers that are directly written to memory, so that sanitisation of the in-memory value happens at the correct time (after the read, or before the write). For convenience, RMW-style accessors are also provided.
- Multiple fixes for the so-called "arch-timer-edge-cases" selftest, which was always broken.
2025-06-05  KVM: arm64: Add assignment-specific sysreg accessor  Marc Zyngier
Assigning a value to a system register doesn't do what it is supposed to be doing if that register is one that has RESx bits. The main problem is that we use __vcpu_sys_reg(), which can be used both as an lvalue and as an rvalue. When used as an lvalue, the bit masking occurs *before* the new value is assigned, meaning that we (1) do pointless work on the old value, and (2) potentially assign an invalid value as we fail to apply the masks to it. Fix this by providing a new __vcpu_assign_sys_reg() that does what it says on the tin, and sanitises the *new* value instead of the old one. This comes with a significant amount of churn. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
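The difference between the two access patterns can be illustrated with a small model. This is a sketch under stated assumptions, not the kernel implementation: `struct vcpu_model`, `sys_reg_lvalue()` and `assign_sys_reg()` are hypothetical names, and only RES0 bits are modelled (the real RESx handling also covers RES1).

```c
#include <assert.h>
#include <stdint.h>

#define NR_REGS 4

/* Hypothetical model: per-register storage plus a mask of RES0 bits. */
struct vcpu_model {
    uint64_t regs[NR_REGS];
    uint64_t res0[NR_REGS];
};

static uint64_t sanitise(const struct vcpu_model *v, int r, uint64_t val)
{
    return val & ~v->res0[r];
}

/*
 * Old pattern: sanitise whatever is currently stored, then hand out an
 * lvalue. A later store through the pointer bypasses the mask entirely.
 */
static uint64_t *sys_reg_lvalue(struct vcpu_model *v, int r)
{
    assert(r >= 0 && r < NR_REGS);
    v->regs[r] = sanitise(v, r, v->regs[r]);
    return &v->regs[r];
}

/* New pattern: sanitise the value being written, then store it. */
static void assign_sys_reg(struct vcpu_model *v, int r, uint64_t val)
{
    assert(r >= 0 && r < NR_REGS);
    v->regs[r] = sanitise(v, r, val);
}
```

With bits [7:4] of register 0 marked RES0, `*sys_reg_lvalue(&v, 0) = 0xff` leaves 0xff in memory, whereas `assign_sys_reg(&v, 0, 0xff)` stores 0x0f.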
2025-05-26  Merge tag 'kvmarm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  Paolo Bonzini
KVM/arm64 updates for 6.16
* New features:
  - Add large stage-2 mapping support for non-protected pKVM guests, clawing back some performance.
  - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes.
  - Enable nested virtualisation support on systems that support it (yes, it has been a long time coming), though it is disabled by default.
* Improvements, fixes and cleanups:
  - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. This ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture.
  - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above.
  - New selftest checking the pKVM ownership transition rules
  - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it.
  - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts.
  - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process.
  - Add a new selftest for the SVE host state being corrupted by a guest.
  - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
  - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised.
  - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion.
  - and the usual random cleanups.
2025-05-23  Merge branch kvm-arm64/misc-6.16 into kvmarm-master/next  Marc Zyngier
* kvm-arm64/misc-6.16:
: .
: Misc changes and improvements for 6.16:
:
: - Add a new selftest for the SVE host state being corrupted by a guest
:
: - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2,
:   ensuring that the window for interrupts is slightly bigger, and avoiding
:   a pretty bad erratum on the AmpereOne HW
:
: - Replace a couple of open-coded on/off strings with str_on_off()
:
: - Get rid of the pKVM memblock sorting, which now appears to be superfluous
:
: - Drop superfluous clearing of ICH_LR_EOI in the LR when nesting
:
: - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from
:   a pretty bad case of TLB corruption unless accesses to HCR_EL2 are
:   heavily synchronised
:
: - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables
:   in a human-friendly fashion
: .
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables
arm64: errata: Work around AmpereOne's erratum AC04_CPU_23
KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1
KVM: arm64: Drop sort_memblock_regions()
KVM: arm64: selftests: Add test for SVE host corruption
KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode
KVM: arm64: Replace ternary flags with str_on_off() helper
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23  Merge branch kvm-arm64/fgt-masks into kvmarm-master/next  Marc Zyngier
* kvm-arm64/fgt-masks: (43 commits)
: .
: Large rework of the way KVM deals with trap bits in conjunction with
: the CPU feature registers. It now draws a direct link between
: the feature set, the system registers that need to UNDEF to match
: the configuration and bits that need to behave as RES0 or RES1 in
: the trap registers that are visible to the guest.
:
: Best of all, these definitions are mostly automatically generated
: from the JSON description published by ARM under a permissive
: license.
: .
KVM: arm64: Handle TSB CSYNC traps
KVM: arm64: Add FGT descriptors for FEAT_FGT2
KVM: arm64: Allow sysreg ranges for FGT descriptors
KVM: arm64: Add context-switch for FEAT_FGT2 registers
KVM: arm64: Add trap routing for FEAT_FGT2 registers
KVM: arm64: Add sanitisation for FEAT_FGT2 registers
KVM: arm64: Add FEAT_FGT2 registers to the VNCR page
KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
KVM: arm64: Allow kvm_has_feat() to take variable arguments
KVM: arm64: Use FGT feature maps to drive RES0 bits
KVM: arm64: Validate FGT register descriptions against RES0 masks
KVM: arm64: Switch to table-driven FGU configuration
KVM: arm64: Handle PSB CSYNC traps
KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
KVM: arm64: Remove hand-crafted masks for FGT registers
KVM: arm64: Use computed FGT masks to setup FGT registers
KVM: arm64: Propagate FGT masks to the nVHE hypervisor
KVM: arm64: Unconditionally configure fine-grain traps
KVM: arm64: Use computed masks as sanitisers for FGT registers
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23  Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/next  Marc Zyngier
* kvm-arm64/ubsan-el2:
: .
: Add UBSAN support to the EL2 portion of KVM, reusing most of the
: existing logic provided by CONFIG_UBSAN_TRAP.
:
: Patches courtesy of Mostafa Saleh.
: .
KVM: arm64: Handle UBSAN faults
KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
ubsan: Remove regs from report_ubsan_failure()
arm64: Introduce esr_is_ubsan_brk()
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21  KVM: arm64: np-guest CMOs with PMD_SIZE fixmap  Vincent Donnefort
With the introduction of stage-2 huge mappings in the pKVM hypervisor, guest page CMOs are needed for PMD_SIZE ranges. The fixmap only supports PAGE_SIZE, and iterating over the huge page is time consuming (mostly due to the TLBI in hyp_fixmap_unmap), which is a problem for EL2 latency. Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap) to improve guest page CMOs when stage-2 huge mappings are installed. On a Pixel6, the iterative solution resulted in a latency of ~700us, while the PMD_SIZE fixmap reduces it to ~100us. Because of the horrendous private range allocation that would be necessary, this is disabled for systems with 64KiB pages. Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21  KVM: arm64: Stage-2 huge mappings for np-guests  Vincent Donnefort
Now that np-guest hypercalls with a range are supported, we can let the hypervisor install block mappings whenever the stage-1 allows it, that is, when the memory is backed by either hugetlbfs or THPs. The size of those block mappings is limited to PMD_SIZE. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-10-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21  KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()  Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add an nr_pages argument to the __pkvm_host_test_clear_young_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is, 512 on a system with 4K pages). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-7-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21  KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()  Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add an nr_pages argument to the __pkvm_host_wrprotect_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is, 512 on a system with 4K pages). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-6-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21  KVM: arm64: Add a range to __pkvm_host_unshare_guest()  Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add an nr_pages argument to the __pkvm_host_unshare_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is, 512 on a system with 4K pages). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-5-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21  KVM: arm64: Add a range to __pkvm_host_share_guest()  Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add an nr_pages argument to the __pkvm_host_share_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is, 512 on a system with 4K pages). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-4-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
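The "1 or PMD_SIZE / PAGE_SIZE" rule shared by these range hypercalls could be checked along the following lines. This is a minimal sketch assuming 4KiB pages and a 2MiB PMD_SIZE; `check_guest_range()` is a hypothetical helper, and the alignment requirement for the huge case is an assumption of this sketch that the commit messages above do not spell out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SHIFT 12                  /* assumption: 4KiB pages */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PMD_SIZE   (1ULL << 21)        /* assumption: 2MiB blocks */

/*
 * Accept only a single page or exactly one PMD-sized block.
 * On success, report the byte size of the range through *size.
 */
static int check_guest_range(uint64_t gfn, uint64_t nr_pages, uint64_t *size)
{
    assert(size);

    if (nr_pages == 1) {
        *size = PAGE_SIZE;
        return 0;
    }

    if (nr_pages != PMD_SIZE / PAGE_SIZE)
        return -EINVAL;

    /* Assumed here: a block-sized range must be block-aligned. */
    if ((gfn << PAGE_SHIFT) & (PMD_SIZE - 1))
        return -EINVAL;

    *size = PMD_SIZE;
    return 0;
}
```

Rejecting every other value keeps the hypervisor side simple: a range is either one page or one block, never an arbitrary run of pages.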
2025-05-21  KVM: arm64: Introduce for_each_hyp_page  Vincent Donnefort
Add a helper to iterate over the hypervisor vmemmap. This will be particularly handy with the introduction of huge mapping support for the np-guest stage-2. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-3-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21  KVM: arm64: Handle huge mappings for np-guest CMOs  Vincent Donnefort
clean_dcache_guest_page() and invalidate_icache_guest_page() accept a size as an argument. But they also rely on fixmap, which can only map a single PAGE_SIZE page. With the upcoming stage-2 huge mappings for pKVM np-guests, those callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis until the whole range is done. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-2-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
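Looping the maintenance on a PAGE_SIZE basis can be sketched as below. This is an illustrative model only: `fixmap_map_page()` and `fixmap_unmap_page()` are hypothetical stand-ins for a single-slot fixmap, the actual cache maintenance is left as a comment, and 4KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL /* assumption: 4KiB pages */

static unsigned long nr_fixmap_maps; /* counts single-page mappings */
static int fixmap_busy;              /* the slot holds one page at a time */

/* Hypothetical stand-ins for the single-page fixmap helpers. */
static void fixmap_map_page(uint64_t phys)
{
    assert(!fixmap_busy && (phys % PAGE_SIZE) == 0);
    fixmap_busy = 1;
    nr_fixmap_maps++;
}

static void fixmap_unmap_page(void)
{
    assert(fixmap_busy);
    fixmap_busy = 0;
}

/* Walk [phys, phys + size) one page at a time through the fixmap slot. */
static void cmo_range(uint64_t phys, size_t size)
{
    uint64_t start = phys & ~(PAGE_SIZE - 1);
    uint64_t end = (phys + size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

    for (uint64_t pa = start; pa < end; pa += PAGE_SIZE) {
        fixmap_map_page(pa);
        /* the cache maintenance on the mapped page would go here */
        fixmap_unmap_page();
    }
}
```

A 2MiB range costs 512 map/unmap round trips in this scheme, which is the per-page overhead that the later PMD_SIZE fixmap commit in this log sets out to reduce.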
2025-05-21  Merge branch kvm-arm64/pkvm-selftest-6.16 into kvm-arm64/pkvm-np-thp-6.16  Marc Zyngier
* kvm-arm64/pkvm-selftest-6.16:
: .
: pKVM selftests covering the memory ownership transitions by
: Quentin Perret. From the initial cover letter:
:
: "We have recently found a bug [1] in the pKVM memory ownership
: transitions by code inspection, but it could have been caught with a
: test.
:
: Introduce a boot-time selftest exercising all the known pKVM memory
: transitions and importantly checks the rejection of illegal transitions.
:
: The new test is hidden behind a new Kconfig option separate from
: CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
: transition checks ([1] doesn't reproduce with EL2 debug enabled).
:
: [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/"
: .
KVM: arm64: Extend pKVM selftest for np-guests
KVM: arm64: Selftest for pKVM transitions
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
KVM: arm64: Add .hyp.data section
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19  arm64: errata: Work around AmpereOne's erratum AC04_CPU_23  D Scott Phillips
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous translations for data addresses initiated by load/store instructions. Only instruction initiated translations are vulnerable, not translations from prefetches for example. A DSB before the store to HCR_EL2 is sufficient to prevent older instructions from hitting the window for corruption, and an ISB after is sufficient to prevent younger instructions from hitting the window for corruption. Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19  KVM: arm64: Add sanitisation for FEAT_FGT2 registers  Marc Zyngier
Just like the FEAT_FGT registers, treat the FGT2 variant the same way. This is a large update, but a fairly mechanical one. The config dependencies are extracted from the 2025-03 JSON drop. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-07  KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2  Mostafa Saleh
Add a new Kconfig option, CONFIG_UBSAN_KVM_EL2, which enables UBSAN for EL2 code (in protected/nVHE/hVHE modes). This re-uses the same checks enabled for the kernel for the hypervisor. The only difference is that for EL2 it always emits a "brk" instead of implementing hooks, as the hypervisor can't print reports. The KVM code re-uses the kernel's report_ubsan_failure(), so #ifdefs are changed to also build this code for CONFIG_UBSAN_KVM_EL2. Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250430162713.1997569-4-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-07  KVM: arm64: Fix memory check in host_stage2_set_owner_locked()  Mostafa Saleh
I found this simple bug while preparing some patches for pKVM. AFAICT, it should be harmless (besides crashing the kernel if it was misbehaving). Fixes: e94a7dea2972 ("KVM: arm64: Move host page ownership tracking to the hyp vmemmap") Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20250501162450.2784043-1-smostafa@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-06  KVM: arm64: Propagate FGT masks to the nVHE hypervisor  Marc Zyngier
The nVHE hypervisor needs to have access to its own view of the FGT masks, which unfortunately results in a bit of data duplication. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Extend pKVM selftest for np-guests  Quentin Perret
The pKVM selftest intends to test as many memory 'transitions' as possible, so extend it to cover sharing pages with non-protected guests, including in the case of multi-sharing. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-5-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Selftest for pKVM transitions  Quentin Perret
We have recently found a bug [1] in the pKVM memory ownership transitions by code inspection, but it could have been caught with a test. Introduce a boot-time selftest that exercises all the known pKVM memory transitions and, importantly, checks the rejection of illegal transitions. The new test is hidden behind a new Kconfig option separate from CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the transition checks ([1] doesn't reproduce with EL2 debug enabled). [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/ Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-4-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Don't WARN from __pkvm_host_share_guest()  Quentin Perret
We currently WARN() if the host attempts to share a page that is not in an acceptable state with a guest. This isn't strictly necessary and makes testing much harder, so drop the WARN and make sure to propagate the error code instead. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Add .hyp.data section  David Brazdil
The hypervisor has not needed its own .data section because all globals were either .rodata or .bss. To avoid having to initialize future data-structures at run-time, let's add a .data section to the hypervisor. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28  KVM: arm64: Unconditionally cross check hyp state  Quentin Perret
Now that the hypervisor's state is stored in the hyp_vmemmap, we no longer need an expensive page-table walk to read it. This means we can now afford to cross check the hyp-state during all memory ownership transitions where the hyp is involved unconditionally, hence avoiding problems such as [1]. [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/ Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-8-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28  KVM: arm64: Defer EL2 stage-1 mapping on share  Quentin Perret
We currently blindly map into EL2 stage-1 *any* page passed to the __pkvm_host_share_hyp() HVC. This is less than ideal from a security perspective as it makes exploitation of potential hypervisor gadgets easier than it should be. But interestingly, pKVM should never need to access SHARED_BORROWED pages that it hasn't previously pinned, so there is no need to map the page before that. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-7-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28  KVM: arm64: Move hyp state to hyp_vmemmap  Quentin Perret
Tracking the hypervisor's ownership state in struct hyp_page has several benefits, including allowing far more efficient lookups (no page-table walk needed) and decorrelating the state from the presence of a mapping. This will later allow mapping pages into EL2 stage-1 less proactively, which is generally a good thing for security. And in the future this will help with tracking the state of pages mapped into the hypervisor's private range without requiring an alias into the 'linear map' range. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-6-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28  KVM: arm64: Introduce {get,set}_host_state() helpers  Quentin Perret
Instead of directly accessing the host_state member in struct hyp_page, introduce static inline accessors to do it. The future hyp_state member will follow the same pattern as it will need some logic in the accessors. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-5-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
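The accessor pattern, together with the idea of keeping ownership state in a per-page array so that no page-table walk is needed, can be sketched as follows. Everything here is a simplified assumption for illustration: the struct layout, the state names, the array size and `pfn_to_page_model()` are hypothetical and do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ownership states for this model. */
enum page_state_model {
    STATE_OWNED = 0,
    STATE_SHARED_OWNED,
    STATE_SHARED_BORROWED,
};

/* Simplified per-page metadata entry (layout is an assumption). */
struct hyp_page_model {
    uint16_t refcount;
    uint8_t order;
    uint8_t host_state; /* only touched through the accessors below */
};

#define NR_PAGES_MODEL 16
static struct hyp_page_model vmemmap_model[NR_PAGES_MODEL];

/* O(1) lookup: index the array by page frame number. */
static struct hyp_page_model *pfn_to_page_model(uint64_t pfn)
{
    assert(pfn < NR_PAGES_MODEL);
    return &vmemmap_model[pfn];
}

static enum page_state_model get_host_state(const struct hyp_page_model *p)
{
    return (enum page_state_model)p->host_state;
}

static void set_host_state(struct hyp_page_model *p, enum page_state_model s)
{
    p->host_state = (uint8_t)s;
}
```

Routing every access through the two helpers means the stored representation can change later (for example, to add encoding logic) without touching the callers.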
2025-04-28  KVM: arm64: Track SVE state in the hypervisor vcpu structure  Fuad Tabba
When dealing with a guest with SVE enabled, make sure the host SVE state is pinned at EL2 S1, and that the hypervisor vCPU state is correctly initialised (and then unpinned on teardown). Co-authored-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-07  KVM: arm64: Use acquire/release to communicate FF-A version negotiation  Will Deacon
The pKVM FF-A proxy rejects FF-A requests other than FFA_VERSION until version negotiation is complete, which is signalled by setting the global 'has_version_negotiated' variable. To avoid excessive locking, this variable is checked directly from kvm_host_ffa_handler() in response to an FF-A call, but this can race against another CPU performing the negotiation and potentially lead to reading a torn value (incredibly unlikely for a 'bool') or problematic re-ordering of the accesses to 'has_version_negotiated' and 'hyp_ffa_version' whereby a stale version number could be read by __do_ffa_mem_xfer(). Use acquire/release primitives when writing 'has_version_negotiated' with the version lock held and when reading without the lock held. Cc: Sebastian Ene <sebastianene@google.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Quentin Perret <qperret@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Marc Zyngier <maz@kernel.org> Fixes: c9c012625e12 ("KVM: arm64: Trap FFA_VERSION host call in pKVM") Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250407152755.1041-1-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
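The publish/consume ordering described above can be illustrated with C11 atomics as a userspace analogue. This is a sketch, not the hypervisor code: the kernel uses its own acquire/release primitives, the negotiation lock is omitted, and `publish_version()` and `read_version()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t negotiated_version;    /* plain data, published by the flag */
static atomic_bool version_negotiated; /* starts out false */

/* Writer: store the data first, then set the flag with release semantics. */
static void publish_version(uint32_t version)
{
    negotiated_version = version;
    atomic_store_explicit(&version_negotiated, true, memory_order_release);
}

/*
 * Lock-free reader: an acquire load of the flag guarantees that, if the
 * flag is seen as set, the version written before it is visible as well.
 */
static bool read_version(uint32_t *out)
{
    assert(out);

    if (!atomic_load_explicit(&version_negotiated, memory_order_acquire))
        return false;

    *out = negotiated_version;
    return true;
}
```

Without the release/acquire pairing, a reader on another CPU could observe the flag as set and still read a stale version number, which is the re-ordering hazard the commit message calls out.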
2025-04-03  KVM: arm64: Don't translate FAR if invalid/unsafe  Oliver Upton
Don't re-walk the page tables if an SEA occurred during the faulting page table walk to avoid taking a fatal exception in the hyp. Additionally, check that FAR_EL2 is valid for SEAs not taken on PTW as the architecture doesn't guarantee it contains the fault VA. Finally, fix up the rest of the abort path by checking for SEAs early and bugging the VM if we get further along with an UNKNOWN fault IPA. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250402201725.2963645-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-03  arm64: Convert HPFAR_EL2 to sysreg table  Oliver Upton
Switch over to the typical sysreg table for HPFAR_EL2 as we're about to start using more fields in the register. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250402201725.2963645-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19  Merge branch 'kvm-arm64/pkvm-6.15' into kvmarm/next  Oliver Upton
* kvm-arm64/pkvm-6.15:
: pKVM updates for 6.15
:
: - SecPageTable stats for stage-2 table pages allocated by the protected
:   hypervisor (Vincent Donnefort)
:
: - HCRX_EL2 trap + vCPU initialization fixes for pKVM (Fuad Tabba)
KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
KVM: arm64: Initialize HCRX_EL2 traps in pKVM
KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
KVM: arm64: Count pKVM stage-2 usage in secondary pagetable stats
KVM: arm64: Distinct pKVM teardown memcache for stage-2
KVM: arm64: Add flags to kvm_hyp_memcache
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19  Merge branch 'kvm-arm64/writable-midr' into kvmarm/next  Oliver Upton
* kvm-arm64/writable-midr:
: Writable implementation ID registers, courtesy of Sebastian Ott
:
: Introduce a new capability that allows userspace to set the
: ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1,
: and AIDR_EL1. Also plug a hole in KVM's trap configuration where
: SMIDR_EL1 was readable at EL1, despite the fact that KVM does not
: support SME.
KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
KVM: arm64: Copy guest CTR_EL0 into hyp VM
KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
KVM: arm64: Allow userspace to change the implementation ID registers
KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value
KVM: arm64: Maintain per-VM copy of implementation ID regs
KVM: arm64: Set HCR_EL2.TID1 unconditionally
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-14  KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu  Fuad Tabba
Instead of creating and initializing _all_ hyp vcpus in pKVM when the first host vcpu runs for the first time, initialize _each_ hyp vcpu in conjunction with its corresponding host vcpu. Some of the host vcpu state (e.g., system registers and traps values) is not initialized until the first time the host vcpu is run. Therefore, a hyp vcpu initialized before its corresponding host vcpu has run for the first time might not see the complete host state of that vcpu. Additionally, this behavior is in line with non-protected modes. Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250314111832.4137161-5-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-14  KVM: arm64: Initialize HCRX_EL2 traps in pKVM  Fuad Tabba
Initialize and set the traps controlled by HCRX_EL2 in pKVM. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250314111832.4137161-3-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-14  KVM: arm64: Distinct pKVM teardown memcache for stage-2  Vincent Donnefort
In order to account for memory dedicated to the stage-2 page-tables, use a separate memcache when tearing down the VM. Meanwhile, rename reclaim_guest_pages to reflect the fact that it only reclaims page-table pages. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250313114038.1502357-3-vdonnefort@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-05  KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable  Oliver Upton
KVM recently added a capability that allows userspace to override the 'implementation ID' registers presented to the VM. MIDR_EL1 is a special example, where the hypervisor can directly set the value when read from EL1 using VPIDR_EL2. Copy the VM-wide value for MIDR_EL1 into the hyp VM for non-protected guests when the capability is enabled so VPIDR_EL2 gets set up correctly. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/kvmarm/ac594b9c-4bbb-46c8-9391-e7a68ce4de5b@sirena.org.uk/ Fixes: 3adaee783061 ("KVM: arm64: Allow userspace to change the implementation ID registers") Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250305230825.484091-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-05  KVM: arm64: Copy guest CTR_EL0 into hyp VM  Oliver Upton
Since commit 2843cae26644 ("KVM: arm64: Treat CTR_EL0 as a VM feature ID register") KVM has allowed userspace to configure the VM-wide view of CTR_EL0, falling back to trap-n-emulate if the value doesn't match hardware. It appears that this has worked by chance in protected-mode for some time, and on systems with FEAT_EVT protected-mode unconditionally sets TID4 (i.e. TID2 traps sans CTR_EL0). Forward the guest CTR_EL0 value through to the hyp VM and align the TID2/TID4 configuration with the non-protected setup. Fixes: 2843cae26644 ("KVM: arm64: Treat CTR_EL0 as a VM feature ID register") Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250305230825.484091-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-02  KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu()  Ahmed Genidi
When KVM is in protected mode, host calls to PSCI are proxied via EL2, and cold entries from CPU_ON, CPU_SUSPEND, and SYSTEM_SUSPEND bounce through __kvm_hyp_init_cpu() at EL2 before entering the host kernel's entry point at EL1. While __kvm_hyp_init_cpu() initializes SPSR_EL2 for the exception return to EL1, it does not initialize SCTLR_EL1. Due to this, it's possible to enter EL1 with SCTLR_EL1 in an UNKNOWN state. In practice this has been seen to result in kernel crashes after CPU_ON as a result of SCTLR_EL1.M being 1 in violation of the initial core configuration specified by PSCI. Fix this by initializing SCTLR_EL1 for cold entry to the host kernel. As it's necessary to write to SCTLR_EL12 in VHE mode, this initialization is moved into __kvm_host_psci_cpu_entry() where we can use write_sysreg_el1(). The remnants of the '__init_el2_nvhe_prepare_eret' macro are folded into its only caller, as this is clearer than having the macro. Fixes: cdf367192766ad11 ("KVM: arm64: Intercept host's CPU_ON SMCs") Reported-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ahmed Genidi <ahmed.genidi@arm.com> [ Mark: clarify commit message, handle E2H, move to C, remove macro ] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ahmed Genidi <ahmed.genidi@arm.com> Cc: Ben Horgan <ben.horgan@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250227180526.1204723-3-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-03-02  KVM: arm64: Initialize HCR_EL2.E2H early  Mark Rutland
On CPUs without FEAT_E2H0, HCR_EL2.E2H is RES1, but may hold an UNKNOWN value out of reset and consequently may not read as 1 unless it has been explicitly initialized. We handled this for the head.S boot code in commits 3944382fa6f22b54 ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative") and b3320142f3db9b3f ("arm64: Fix early handling of FEAT_E2H0 not being implemented"). Unfortunately, we forgot to apply a similar fix to the KVM PSCI entry points used when relaying CPU_ON, CPU_SUSPEND, and SYSTEM_SUSPEND. When KVM is entered via these entry points, the value of HCR_EL2.E2H may be consumed before it has been initialized (e.g. by the 'init_el2_state' macro). Initialize HCR_EL2.E2H early in these paths such that it can be consumed reliably. The existing code in head.S is factored out into a new 'init_el2_hcr' macro, and this is used in the __kvm_hyp_init_cpu() function common to all the relevant PSCI entry points. For clarity, I've tweaked the assembly used to check whether ID_AA64MMFR4_EL1.E2H0 is negative. The bitfield is extracted as a signed value, and this is checked with a signed-greater-or-equal (GE) comparison. As the hyp code will reconfigure HCR_EL2 later in ___kvm_hyp_init(), all bits other than E2H are initialized to zero in __kvm_hyp_init_cpu(). Fixes: 3944382fa6f22b54 ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative") Fixes: b3320142f3db9b3f ("arm64: Fix early handling of FEAT_E2H0 not being implemented") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ahmed Genidi <ahmed.genidi@arm.com> Cc: Ben Horgan <ben.horgan@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250227180526.1204723-2-mark.rutland@arm.com [maz: fixed LT->GE thinko] Signed-off-by: Marc Zyngier <maz@kernel.org>
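The signed extraction and GE comparison described in the message above can be sketched in plain C. This is only an illustration of the check, not the 'init_el2_hcr' macro itself: the helper names (extract_signed_field, initial_hcr_el2) are hypothetical, the E2H0 field position (bits [27:24]) is an assumption taken from the architecture's register layout, and the sketch ignores any other conditions the real assembly may test.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ID_AA64MMFR4_EL1.E2H0 is a 4-bit signed field at bits [27:24]. */
#define E2H0_SHIFT   24u
#define E2H0_WIDTH   4u
/* HCR_EL2.E2H is bit 34. */
#define HCR_EL2_E2H  (UINT64_C(1) << 34)

/* Hypothetical helper: extract a two's-complement bitfield as a signed value. */
static int64_t extract_signed_field(uint64_t reg, unsigned shift, unsigned width)
{
	assert(width >= 1 && width < 64 && shift + width <= 64);

	uint64_t mask  = (UINT64_C(1) << width) - 1;
	uint64_t field = (reg >> shift) & mask;
	uint64_t sign  = UINT64_C(1) << (width - 1);

	/* Flip the sign bit, then subtract it: a portable sign extension. */
	return (int64_t)(field ^ sign) - (int64_t)sign;
}

/*
 * Hypothetical helper: the early HCR_EL2 value for the PSCI entry path.
 * A non-negative E2H0 (the GE case) leaves E2H clear; a negative E2H0
 * means E2H is RES1, so it is set explicitly. All other bits are zero.
 */
static uint64_t initial_hcr_el2(uint64_t id_aa64mmfr4)
{
	if (extract_signed_field(id_aa64mmfr4, E2H0_SHIFT, E2H0_WIDTH) >= 0)
		return 0;

	return HCR_EL2_E2H;
}
```

Treating the field as signed means every negative encoding selects the RES1 path with a single comparison, which is presumably why the message describes the GE form as the clearer one.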
2025-02-26  KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value  Oliver Upton
Userspace will soon be able to change the value of MIDR_EL1. Prepare by loading VPIDR_EL2 with the guest value for non-nested VMs. Since VPIDR_EL2 is set for any VM, get rid of the NV-specific cleanup of reloading the hardware value on vcpu_put(). And for nVHE, load the hardware value before switching to the host. Link: https://lore.kernel.org/r/20250225005401.679536-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-13  KVM: arm64: Eagerly switch ZCR_EL{1,2}  Mark Rutland
In non-protected KVM modes, while the guest FPSIMD/SVE/SME state is live on the CPU, the host's active SVE VL may differ from the guest's maximum SVE VL:

* For VHE hosts, when a VM uses NV, ZCR_EL2 contains a value constrained by the guest hypervisor, which may be less than or equal to that guest's maximum VL.

  Note: in this case the value of ZCR_EL1 is immaterial due to E2H.

* For nVHE/hVHE hosts, ZCR_EL1 contains a value written by the guest, which may be less than or greater than the guest's maximum VL.

  Note: in this case hyp code traps host SVE usage and lazily restores ZCR_EL2 to the host's maximum VL, which may be greater than the guest's maximum VL.

This can be the case between exiting a guest and kvm_arch_vcpu_put_fp(). If a softirq is taken during this period and the softirq handler tries to use kernel-mode NEON, then the kernel will fail to save the guest's FPSIMD/SVE state, and will pend a SIGKILL for the current thread.

This happens because kvm_arch_vcpu_ctxsync_fp() binds the guest's live FPSIMD/SVE state with the guest's maximum SVE VL, and fpsimd_save_user_state() verifies that the live SVE VL is as expected before attempting to save the register state:

| if (WARN_ON(sve_get_vl() != vl)) {
|         force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
|         return;
| }

Fix this and make this a bit easier to reason about by always eagerly switching ZCR_EL{1,2} at hyp during guest<->host transitions. With this happening, there's no need to trap host SVE usage, and the nVHE/hVHE __deactivate_cptr_traps() logic can be simplified to enable host access to all present FPSIMD/SVE/SME features.

In protected nVHE/hVHE modes, the host's state is always saved/restored by hyp, and the guest's state is saved prior to exit to the host, so from the host's PoV the guest never has live FPSIMD/SVE/SME state, and the host's ZCR_EL1 is never clobbered by hyp.
Fixes: 8c8010d69c132273 ("KVM: arm64: Save/restore SVE state for nVHE") Fixes: 2e3cf82063a00ea0 ("KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-9-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
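The failure mode and the effect of eager switching can be illustrated with a toy model in C. This is not kernel code: struct cpu_model, can_save_guest_state(), exit_guest_lazy() and exit_guest_eager() are hypothetical names, a single integer stands in for the effective vector length configured through ZCR_EL{1,2}, and the model assumes the eager switch leaves the guest's maximum VL in effect on exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the effective SVE vector length (in bytes) on a CPU. */
struct cpu_model {
	unsigned int live_vl;
};

/*
 * Mirrors the shape of the quoted check: the save only proceeds when the
 * live VL equals the VL the guest state was bound with (its maximum VL).
 */
static bool can_save_guest_state(const struct cpu_model *cpu,
				 unsigned int guest_max_vl)
{
	assert(cpu != NULL);
	return cpu->live_vl == guest_max_vl;
}

/* Lazy behaviour: whatever VL the guest last selected stays in effect. */
static void exit_guest_lazy(struct cpu_model *cpu, unsigned int guest_written_vl)
{
	assert(cpu != NULL);
	cpu->live_vl = guest_written_vl;
}

/*
 * Eager behaviour (assumed here): the guest-selected value is ignored and
 * the guest's maximum VL is put in effect at the transition, so the check
 * above holds from the first host instruction.
 */
static void exit_guest_eager(struct cpu_model *cpu, unsigned int guest_written_vl,
			     unsigned int guest_max_vl)
{
	assert(cpu != NULL);
	(void)guest_written_vl;
	cpu->live_vl = guest_max_vl;
}
```

In this model, a guest that selected a 16-byte VL while bound to a 32-byte maximum fails the check after exit_guest_lazy() and passes it after exit_guest_eager(), which is the window the softirq scenario above falls into.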
2025-02-13  KVM: arm64: Refactor exit handlers  Mark Rutland
The hyp exit handling logic is largely shared between VHE and nVHE/hVHE, with common logic in arch/arm64/kvm/hyp/include/hyp/switch.h. The code in the header depends on function definitions provided by arch/arm64/kvm/hyp/vhe/switch.c and arch/arm64/kvm/hyp/nvhe/switch.c when they include the header.

This is an unusual header dependency, and prevents the use of arch/arm64/kvm/hyp/include/hyp/switch.h in other files as this would result in compiler warnings regarding missing definitions, e.g.

| In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8:
| ./arch/arm64/kvm/hyp/include/hyp/switch.h:733:31: warning: 'kvm_get_exit_handler_array' used but never defined
|   733 | static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu);
|       |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~
| ./arch/arm64/kvm/hyp/include/hyp/switch.h:735:13: warning: 'early_exit_filter' used but never defined
|   735 | static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code);
|       |             ^~~~~~~~~~~~~~~~~

Refactor the logic such that the header doesn't depend on anything from the C files. There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250210195226.1215254-7-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13  KVM: arm64: Refactor CPTR trap deactivation  Mark Rutland
For historical reasons, the VHE and nVHE/hVHE implementations of __activate_cptr_traps() pair with a common implementation of __kvm_reset_cptr_el2(), which ideally would be named __deactivate_cptr_traps(). Rename __kvm_reset_cptr_el2() to __deactivate_cptr_traps(), and split it into separate VHE and nVHE/hVHE variants so that each can be paired with its corresponding implementation of __activate_cptr_traps(). At the same time, fold kvm_write_cptr_el2() into its callers. This makes it clear in-context whether a write is made to the CPACR_EL1 encoding or the CPTR_EL2 encoding, and removes the possibility of confusion as to whether kvm_write_cptr_el2() reformats the sysreg fields as cpacr_clear_set() does. In the nVHE/hVHE implementation of __activate_cptr_traps(), placing the sysreg writes within the if-else blocks requires that the call to __activate_traps_fpsimd32() is moved earlier, but as this was always called before writing to CPTR_EL2/CPACR_EL1, this should not result in a functional change. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-6-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13  KVM: arm64: Remove host FPSIMD saving for non-protected KVM  Mark Rutland
Now that the host eagerly saves its own FPSIMD/SVE/SME state, non-protected KVM never needs to save the host FPSIMD/SVE/SME state, and the code to do this is never used. Protected KVM still needs to save/restore the host FPSIMD/SVE state to avoid leaking guest state to the host (and to avoid revealing to the host whether the guest used FPSIMD/SVE/SME), and that code needs to be retained. Remove the unused code and data structures. To avoid the need for a stub copy of kvm_hyp_save_fpsimd_host() in the VHE hyp code, the nVHE/hVHE version is moved into the shared switch header, where it is only invoked when KVM is in protected mode. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-3-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-10  KVM: arm64: Fix __pkvm_host_mkyoung_guest() return value  Marc Zyngier
Don't use an uninitialised stack variable, and just return 0 on the non-error path. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502100911.8c9DbtKD-lkp@intel.com/ Reviewed-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>