summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm/hyp
AgeCommit message (Collapse)Author
2025-05-06KVM: arm64: Propagate FGT masks to the nVHE hypervisorMarc Zyngier
The nVHE hypervisor needs to have access to its own view of the FGT masks, which unfortunately results in a bit of data duplication. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Unconditionally configure fine-grain trapsMark Rutland
... otherwise we can inherit the host configuration if this differs from the KVM configuration. Signed-off-by: Mark Rutland <mark.rutland@arm.com> [maz: simplified a couple of things] Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Don't treat HCRX_EL2 as a FGT registerMarc Zyngier
Treating HCRX_EL2 as yet another FGT register seems excessive, and gets in a way of further improvements. It is actually simpler to just be explicit about the masking, so just to that. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06arm64: sysreg: Replace HFGxTR_EL2 with HFG{R,W}TR_EL2Marc Zyngier
Treating HFGRTR_EL2 and HFGWTR_EL2 identically was a mistake. It makes things hard to reason about, has the potential to introduce bugs by giving a meaning to bits that are really reserved, and is in general a bad description of the architecture. Given that #defines are cheap, let's describe both registers as intended by the architecture, and repaint all the existing uses. Yes, this is painful. The registers themselves are generated from the JSON file in an automated way. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Extend pKVM selftest for np-guestsQuentin Perret
The pKVM selftest intends to test as many memory 'transitions' as possible, so extend it to cover sharing pages with non-protected guests, including in the case of multi-sharing. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-5-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Selftest for pKVM transitionsQuentin Perret
We have recently found a bug [1] in the pKVM memory ownership transitions by code inspection, but it could have been caught with a test. Introduce a boot-time selftest exercising all the known pKVM memory transitions and importantly checks the rejection of illegal transitions. The new test is hidden behind a new Kconfig option separate from CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the transition checks ([1] doesn't reproduce with EL2 debug enabled). [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/ Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-4-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Don't WARN from __pkvm_host_share_guest()Quentin Perret
We currently WARN() if the host attempts to share a page that is not in an acceptable state with a guest. This isn't strictly necessary and makes testing much harder, so drop the WARN and make sure to propage the error code instead. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Add .hyp.data sectionDavid Brazdil
The hypervisor has not needed its own .data section because all globals were either .rodata or .bss. To avoid having to initialize future data-structures at run-time, let's introduce add a .data section to the hypervisor. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE modeMarc Zyngier
We keep setting and clearing these bits depending on the role of the host kernel, mimicking what we do for nVHE. But that's actually pretty pointless, as we always want physical interrupts to make it to the host, at EL2. This has also two problems: - it prevents IRQs from being taken when these bits are cleared if the implementation has chosen to implement these bits as masks when HCR_EL2.{TGE,xMO}=={0,0} - it triggers a bad erratum on the AmpereOne HW, which catches fire on clearing these bits while an interrupt is being taken (AC03_CPU_36). Let's kill these two birds with a single stone, and permanently set the xMO bits when running VHE. This involves a bit of surgery on code paths that rely on flipping these bits on and off for other purposes. Note that the earliest setting of hcr_el2 (in the init_hcr_el2 macro) is left untouched as is runs extremely early, with interrupts disabled, and soon enough overwritten with the final value containing the xMO bits. Reported-by: D Scott Phillips <scott@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250429114326.3618875-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-05KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE modeMarc Zyngier
We keep setting and clearing these bits depending on the role of the host kernel, mimicking what we do for nVHE. But that's actually pretty pointless, as we always want physical interrupts to make it to the host, at EL2. This has also two problems: - it prevents IRQs from being taken when these bits are cleared if the implementation has chosen to implement these bits as masks when HCR_EL2.{TGE,xMO}=={0,0} - it triggers a bad erratum on the AmpereOne HW, which catches fire on clearing these bits while an interrupt is being taken (AC03_CPU_36). Let's kill these two birds with a single stone, and permanently set the xMO bits when running VHE. This involves a bit of surgery on code paths that rely on flipping these bits on and off for other purposes. Note that the earliest setting of hcr_el2 (in the init_hcr_el2 macro) is left untouched as is runs extremely early, with interrupts disabled, and soon enough overwritten with the final value containing the xMO bits. Reported-by: D Scott Phillips <scott@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250429114326.3618875-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-28KVM: arm64: Unconditionally cross check hyp stateQuentin Perret
Now that the hypervisor's state is stored in the hyp_vmemmap, we no longer need an expensive page-table walk to read it. This means we can now afford to cross check the hyp-state during all memory ownership transitions where the hyp is involved unconditionally, hence avoiding problems such as [1]. [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/ Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-8-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Defer EL2 stage-1 mapping on shareQuentin Perret
We currently blindly map into EL2 stage-1 *any* page passed to the __pkvm_host_share_hyp() HVC. This is less than ideal from a security perspective as it makes exploitation of potential hypervisor gadgets easier than it should be. But interestingly, pKVM should never need to access SHARED_BORROWED pages that it hasn't previously pinned, so there is no need to map the page before that. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-7-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Move hyp state to hyp_vmemmapQuentin Perret
Tracking the hypervisor's ownership state into struct hyp_page has several benefits, including allowing far more efficient lookups (no page-table walk needed) and de-corelating the state from the presence of a mapping. This will later allow to map pages into EL2 stage-1 less proactively which is generally a good thing for security. And in the future this will help with tracking the state of pages mapped into the hypervisor's private range without requiring an alias into the 'linear map' range. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-6-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Introduce {get,set}_host_state() helpersQuentin Perret
Instead of directly accessing the host_state member in struct hyp_page, introduce static inline accessors to do it. The future hyp_state member will follow the same pattern as it will need some logic in the accessors. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-5-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Use 0b11 for encoding PKVM_NOPAGEQuentin Perret
The page ownership state encoded as 0b11 is currently considered reserved for future use, and PKVM_NOPAGE uses bit 2. In order to simplify the relocation of the hyp ownership state into the vmemmap in later patches, let's use the 'reserved' encoding for the PKVM_NOPAGE state. The struct hyp_page layout isn't guaranteed stable at all, so there is no real reason to have 'reserved' encodings. No functional changes intended. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-4-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Fix pKVM page-tracking commentsQuentin Perret
Most of the comments relating to pKVM page-tracking in nvhe/memory.h are now either slightly outdated or outright wrong. Fix the comments. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Track SVE state in the hypervisor vcpu structureFuad Tabba
When dealing with a guest with SVE enabled, make sure the host SVE state is pinned at EL2 S1, and that the hypervisor vCPU state is correctly initialised (and then unpinned on teardown). Co-authored-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-07KVM: arm64: Use acquire/release to communicate FF-A version negotiationWill Deacon
The pKVM FF-A proxy rejects FF-A requests other than FFA_VERSION until version negotiation is complete, which is signalled by setting the global 'has_version_negotiated' variable. To avoid excessive locking, this variable is checked directly from kvm_host_ffa_handler() in response to an FF-A call, but this can race against another CPU performing the negotiation and potentially lead to reading a torn value (incredibly unlikely for a 'bool') or problematic re-ordering of the accesses to 'has_version_negotiated' and 'hyp_ffa_version' whereby a stale version number could be read by __do_ffa_mem_xfer(). Use acquire/release primitives when writing 'has_version_negotiated' with the version lock held and when reading without the lock held. Cc: Sebastian Ene <sebastianene@google.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Quentin Perret <qperret@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Marc Zyngier <maz@kernel.org> Fixes: c9c012625e12 ("KVM: arm64: Trap FFA_VERSION host call in pKVM") Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250407152755.1041-1-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-03KVM: arm64: Don't translate FAR if invalid/unsafeOliver Upton
Don't re-walk the page tables if an SEA occurred during the faulting page table walk to avoid taking a fatal exception in the hyp. Additionally, check that FAR_EL2 is valid for SEAs not taken on PTW as the architecture doesn't guarantee it contains the fault VA. Finally, fix up the rest of the abort path by checking for SEAs early and bugging the VM if we get further along with an UNKNOWN fault IPA. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250402201725.2963645-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-03arm64: Convert HPFAR_EL2 to sysreg tableOliver Upton
Switch over to the typical sysreg table for HPFAR_EL2 as we're about to start using more fields in the register. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250402201725.2963645-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-03KVM: arm64: Only read HPFAR_EL2 when value is architecturally validOliver Upton
KVM's logic for deciding when HPFAR_EL2 is UNKNOWN doesn't align with the architecture. Most notably, KVM assumes HPFAR_EL2 contains the faulting IPA even in the case of an SEA. Align the logic with the architecture rather than attempting to paraphrase it. Additionally, take the opportunity to improve the language around ARM erratum #834220 such that it actually describes the bug. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250402201725.2963645-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/pkvm-6.15' into kvmarm/nextOliver Upton
* kvm-arm64/pkvm-6.15: : pKVM updates for 6.15 : : - SecPageTable stats for stage-2 table pages allocated by the protected : hypervisor (Vincent Donnefort) : : - HCRX_EL2 trap + vCPU initialization fixes for pKVM (Fuad Tabba) KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu KVM: arm64: Factor out pKVM hyp vcpu creation to separate function KVM: arm64: Initialize HCRX_EL2 traps in pKVM KVM: arm64: Factor out setting HCRX_EL2 traps into separate function KVM: arm64: Count pKVM stage-2 usage in secondary pagetable stats KVM: arm64: Distinct pKVM teardown memcache for stage-2 KVM: arm64: Add flags to kvm_hyp_memcache Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/writable-midr' into kvmarm/nextOliver Upton
* kvm-arm64/writable-midr: : Writable implementation ID registers, courtesy of Sebastian Ott : : Introduce a new capability that allows userspace to set the : ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1, : and AIDR_EL1. Also plug a hole in KVM's trap configuration where : SMIDR_EL1 was readable at EL1, despite the fact that KVM does not : support SME. KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable KVM: arm64: Copy guest CTR_EL0 into hyp VM KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR KVM: arm64: Allow userspace to change the implementation ID registers KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value KVM: arm64: Maintain per-VM copy of implementation ID regs KVM: arm64: Set HCR_EL2.TID1 unconditionally Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/pmuv3-asahi' into kvmarm/nextOliver Upton
* kvm-arm64/pmuv3-asahi: : Support PMUv3 for KVM guests on Apple silicon : : Take advantage of some IMPLEMENTATION DEFINED traps available on Apple : parts to trap-and-emulate the PMUv3 registers on behalf of a KVM guest. : Constrain the vPMU to a cycle counter and single event counter, as the : Apple PMU has events that cannot be counted on every counter. : : There is a small new interface between the ARM PMU driver and KVM, where : the PMU driver owns the PMUv3 -> hardware event mappings. arm64: Enable IMP DEF PMUv3 traps on Apple M* KVM: arm64: Provide 1 event counter on IMPDEF hardware drivers/perf: apple_m1: Provide helper for mapping PMUv3 events KVM: arm64: Remap PMUv3 events onto hardware KVM: arm64: Advertise PMUv3 if IMPDEF traps are present KVM: arm64: Compute synthetic sysreg ESR for Apple PMUv3 traps KVM: arm64: Move PMUVer filtering into KVM code KVM: arm64: Use guard() to cleanup usage of arm_pmus_lock KVM: arm64: Drop kvm_arm_pmu_available static key KVM: arm64: Use a cpucap to determine if system supports FEAT_PMUv3 KVM: arm64: Always support SW_INCR PMU event KVM: arm64: Compute PMCEID from arm_pmu's event bitmaps drivers/perf: apple_m1: Support host/guest event filtering drivers/perf: apple_m1: Refactor event select/filter configuration Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/nv-vgic' into kvmarm/nextOliver Upton
* kvm-arm64/nv-vgic: : NV VGICv3 support, courtesy of Marc Zyngier : : Support for emulating the GIC hypervisor controls and managing shadow : VGICv3 state for the L1 hypervisor. As part of it, bring in support for : taking IRQs to the L1 and UAPI to manage the VGIC maintenance interrupt. KVM: arm64: nv: Fail KVM init if asking for NV without GICv3 KVM: arm64: nv: Allow userland to set VGIC maintenance IRQ KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup KVM: arm64: nv: Propagate used_lrs between L1 and L0 contexts KVM: arm64: nv: Request vPE doorbell upon nested ERET to L2 KVM: arm64: nv: Respect virtual HCR_EL2.TWx setting KVM: arm64: nv: Add Maintenance Interrupt emulation KVM: arm64: nv: Handle L2->L1 transition on interrupt injection KVM: arm64: nv: Nested GICv3 emulation KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses KVM: arm64: nv: Add ICH_*_EL2 registers to vpcu_sysreg KVM: arm64: nv: Load timer before the GIC arm64: sysreg: Add layout for ICH_MISR_EL2 arm64: sysreg: Add layout for ICH_VTR_EL2 arm64: sysreg: Add layout for ICH_HCR_EL2 Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-14KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpuFuad Tabba
Instead of creating and initializing _all_ hyp vcpus in pKVM when the first host vcpu runs for the first time, initialize _each_ hyp vcpu in conjunction with its corresponding host vcpu. Some of the host vcpu state (e.g., system registers and traps values) is not initialized until the first time the host vcpu is run. Therefore, initializing a hyp vcpu before its corresponding host vcpu has run for the first time might not view the complete host state of these vcpus. Additionally, this behavior is inline with non-protected modes. Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250314111832.4137161-5-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-14KVM: arm64: Initialize HCRX_EL2 traps in pKVMFuad Tabba
Initialize and set the traps controlled by the HCRX_EL2 in pKVM. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250314111832.4137161-3-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-14KVM: arm64: Distinct pKVM teardown memcache for stage-2Vincent Donnefort
In order to account for memory dedicated to the stage-2 page-tables, use a separated memcache when tearing down the VM. Meanwhile rename reclaim_guest_pages to reflect the fact it only reclaim page-table pages. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250313114038.1502357-3-vdonnefort@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-11KVM: arm64: Compute synthetic sysreg ESR for Apple PMUv3 trapsOliver Upton
Apple M* CPUs provide an IMPDEF trap for PMUv3 sysregs, where ESR_EL2.EC is a reserved value (0x3F) and a sysreg-like ISS is reported in AFSR1_EL2. Compute a synthetic ESR for these PMUv3 traps, giving the illusion of something architectural to the rest of KVM. Tested-by: Janne Grunau <j@jannau.net> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250305202641.428114-10-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-11KVM: arm64: Use a cpucap to determine if system supports FEAT_PMUv3Oliver Upton
KVM is about to learn some new tricks to virtualize PMUv3 on IMPDEF hardware. As part of that, we now need to differentiate host support from guest support for PMUv3. Add a cpucap to determine if an architectural PMUv3 is present to guard host usage of PMUv3 controls. Tested-by: Janne Grunau <j@jannau.net> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250305202641.428114-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-05KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writableOliver Upton
KVM recently added a capability that allows userspace to override the 'implementation ID' registers presented to the VM. MIDR_EL1 is a special example, where the hypervisor can directly set the value when read from EL1 using VPIDR_EL2. Copy the VM-wide value for MIDR_EL1 into the hyp VM for non-protected guests when the capability is enabled so VPIDR_EL2 gets set up correctly. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/kvmarm/ac594b9c-4bbb-46c8-9391-e7a68ce4de5b@sirena.org.uk/ Fixes: 3adaee783061 ("KVM: arm64: Allow userspace to change the implementation ID registers") Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250305230825.484091-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-05KVM: arm64: Copy guest CTR_EL0 into hyp VMOliver Upton
Since commit 2843cae26644 ("KVM: arm64: Treat CTR_EL0 as a VM feature ID register") KVM has allowed userspace to configure the VM-wide view of CTR_EL0, falling back to trap-n-emulate if the value doesn't match hardware. It appears that this has worked by chance in protected-mode for some time, and on systems with FEAT_EVT protected-mode unconditionally sets TID4 (i.e. TID2 traps sans CTR_EL0). Forward the guest CTR_EL0 value through to the hyp VM and align the TID2/TID4 configuration with the non-protected setup. Fixes: 2843cae26644 ("KVM: arm64: Treat CTR_EL0 as a VM feature ID register") Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250305230825.484091-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03KVM: arm64: nv: Nested GICv3 emulationMarc Zyngier
When entering a nested VM, we set up the hypervisor control interface based on what the guest hypervisor has set. Especially, we investigate each list register written by the guest hypervisor whether HW bit is set. If so, we translate hw irq number from the guest's point of view to the real hardware irq number if there is a mapping. Co-developed-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> [Christoffer: Redesigned execution flow around vcpu load/put] Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [maz: Rewritten to support GICv3 instead of GICv2, NV2 support] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03arm64: sysreg: Add layout for ICH_HCR_EL2Marc Zyngier
The ICH_HCR_EL2-related macros are missing a number of control bits that we are about to handle. Take this opportunity to fully describe the layout of that register as part of the automatic generation infrastructure. This results in a bit of churn, unfortunately. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-02KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu()Ahmed Genidi
When KVM is in protected mode, host calls to PSCI are proxied via EL2, and cold entries from CPU_ON, CPU_SUSPEND, and SYSTEM_SUSPEND bounce through __kvm_hyp_init_cpu() at EL2 before entering the host kernel's entry point at EL1. While __kvm_hyp_init_cpu() initializes SPSR_EL2 for the exception return to EL1, it does not initialize SCTLR_EL1. Due to this, it's possible to enter EL1 with SCTLR_EL1 in an UNKNOWN state. In practice this has been seen to result in kernel crashes after CPU_ON as a result of SCTLR_EL1.M being 1 in violation of the initial core configuration specified by PSCI. Fix this by initializing SCTLR_EL1 for cold entry to the host kernel. As it's necessary to write to SCTLR_EL12 in VHE mode, this initialization is moved into __kvm_host_psci_cpu_entry() where we can use write_sysreg_el1(). The remnants of the '__init_el2_nvhe_prepare_eret' macro are folded into its only caller, as this is clearer than having the macro. Fixes: cdf367192766ad11 ("KVM: arm64: Intercept host's CPU_ON SMCs") Reported-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ahmed Genidi <ahmed.genidi@arm.com> [ Mark: clarify commit message, handle E2H, move to C, remove macro ] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ahmed Genidi <ahmed.genidi@arm.com> Cc: Ben Horgan <ben.horgan@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250227180526.1204723-3-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-03-02KVM: arm64: Initialize HCR_EL2.E2H earlyMark Rutland
On CPUs without FEAT_E2H0, HCR_EL2.E2H is RES1, but may reset to an UNKNOWN value out of reset and consequently may not read as 1 unless it has been explicitly initialized. We handled this for the head.S boot code in commits: 3944382fa6f22b54 ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative") b3320142f3db9b3f ("arm64: Fix early handling of FEAT_E2H0 not being implemented") Unfortunately, we forgot to apply a similar fix to the KVM PSCI entry points used when relaying CPU_ON, CPU_SUSPEND, and SYSTEM SUSPEND. When KVM is entered via these entry points, the value of HCR_EL2.E2H may be consumed before it has been initialized (e.g. by the 'init_el2_state' macro). Initialize HCR_EL2.E2H early in these paths such that it can be consumed reliably. The existing code in head.S is factored out into a new 'init_el2_hcr' macro, and this is used in the __kvm_hyp_init_cpu() function common to all the relevant PSCI entry points. For clarity, I've tweaked the assembly used to check whether ID_AA64MMFR4_EL1.E2H0 is negative. The bitfield is extracted as a signed value, and this is checked with a signed-greater-or-equal (GE) comparison. As the hyp code will reconfigure HCR_EL2 later in ___kvm_hyp_init(), all bits other than E2H are initialized to zero in __kvm_hyp_init_cpu(). Fixes: 3944382fa6f22b54 ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative") Fixes: b3320142f3db9b3f ("arm64: Fix early handling of FEAT_E2H0 not being implemented") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ahmed Genidi <ahmed.genidi@arm.com> Cc: Ben Horgan <ben.horgan@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250227180526.1204723-2-mark.rutland@arm.com [maz: fixed LT->GE thinko] Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-26KVM: arm64: Allow userspace to change the implementation ID registersSebastian Ott
KVM's treatment of the ID registers that describe the implementation (MIDR, REVIDR, and AIDR) is interesting, to say the least. On the userspace-facing end of it, KVM presents the values of the boot CPU on all vCPUs and treats them as invariant. On the guest side of things KVM presents the hardware values of the local CPU, which can change during CPU migration in a big-little system. While one may call this fragile, there is at least some degree of predictability around it. For example, if a VMM wanted to present big-little to a guest, it could affine vCPUs accordingly to the correct clusters. All of this makes a giant mess out of adding support for making these implementation ID registers writable. Avoid breaking the rather subtle ABI around the old way of doing things by requiring opt-in from userspace to make the registers writable. When the cap is enabled, allow userspace to set MIDR, REVIDR, and AIDR to any non-reserved value and present those values consistently across all vCPUs. Signed-off-by: Sebastian Ott <sebott@redhat.com> [oliver: changelog, capability] Link: https://lore.kernel.org/r/20250225005401.679536-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 valueOliver Upton
Userspace will soon be able to change the value of MIDR_EL1. Prepare by loading VPIDR_EL2 with the guest value for non-nested VMs. Since VPIDR_EL2 is set for any VM, get rid of the NV-specific cleanup of reloading the hardware value on vcpu_put(). And for nVHE, load the hardware value before switching to the host. Link: https://lore.kernel.org/r/20250225005401.679536-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-14Merge tag 'kvmarm-fixes-6.14-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.14, take #2 - Large set of fixes for vector handling, specially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours. - Fix use of kernel VAs at EL2 when emulating timers with nVHE. - Small set of pKVM improvements and cleanups.
2025-02-13KVM: arm64: Convert timer offset VA when accessed in HYP codeMarc Zyngier
Now that EL2 has gained some early timer emulation, it accesses the offsets pointed to by the timer structure, both of which live in the KVM structure. Of course, these are *kernel* pointers, so the dereferencing of these pointers in non-kernel code must be itself be offset. Given switch.h its own version of timer_get_offset() and use that instead. Fixes: b86fc215dc26d ("KVM: arm64: Handle counter access early in non-HYP context") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Anders Roxell <anders.roxell@linaro.org> Link: https://lore.kernel.org/r/20250212173454.2864462-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Eagerly switch ZCR_EL{1,2}Mark Rutland
In non-protected KVM modes, while the guest FPSIMD/SVE/SME state is live on the CPU, the host's active SVE VL may differ from the guest's maximum SVE VL: * For VHE hosts, when a VM uses NV, ZCR_EL2 contains a value constrained by the guest hypervisor, which may be less than or equal to that guest's maximum VL. Note: in this case the value of ZCR_EL1 is immaterial due to E2H. * For nVHE/hVHE hosts, ZCR_EL1 contains a value written by the guest, which may be less than or greater than the guest's maximum VL. Note: in this case hyp code traps host SVE usage and lazily restores ZCR_EL2 to the host's maximum VL, which may be greater than the guest's maximum VL. This can be the case between exiting a guest and kvm_arch_vcpu_put_fp(). If a softirq is taken during this period and the softirq handler tries to use kernel-mode NEON, then the kernel will fail to save the guest's FPSIMD/SVE state, and will pend a SIGKILL for the current thread. This happens because kvm_arch_vcpu_ctxsync_fp() binds the guest's live FPSIMD/SVE state with the guest's maximum SVE VL, and fpsimd_save_user_state() verifies that the live SVE VL is as expected before attempting to save the register state: | if (WARN_ON(sve_get_vl() != vl)) { | force_signal_inject(SIGKILL, SI_KERNEL, 0, 0); | return; | } Fix this and make this a bit easier to reason about by always eagerly switching ZCR_EL{1,2} at hyp during guest<->host transitions. With this happening, there's no need to trap host SVE usage, and the nVHE/nVHE __deactivate_cptr_traps() logic can be simplified to enable host access to all present FPSIMD/SVE/SME features. In protected nVHE/hVHE modes, the host's state is always saved/restored by hyp, and the guest's state is saved prior to exit to the host, so from the host's PoV the guest never has live FPSIMD/SVE/SME state, and the host's ZCR_EL1 is never clobbered by hyp. Fixes: 8c8010d69c132273 ("KVM: arm64: Save/restore SVE state for nVHE") Fixes: 2e3cf82063a00ea0 ("KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-9-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Mark some header functions as inlineMark Rutland
The shared hyp switch header has a number of static functions which might not be used by all files that include the header, and when unused they will provoke compiler warnings, e.g. | In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8: | ./arch/arm64/kvm/hyp/include/hyp/switch.h:703:13: warning: 'kvm_hyp_handle_dabt_low' defined but not used [-Wunused-function] | 703 | static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:682:13: warning: 'kvm_hyp_handle_cp15_32' defined but not used [-Wunused-function] | 682 | static bool kvm_hyp_handle_cp15_32(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:662:13: warning: 'kvm_hyp_handle_sysreg' defined but not used [-Wunused-function] | 662 | static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:458:13: warning: 'kvm_hyp_handle_fpsimd' defined but not used [-Wunused-function] | 458 | static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:329:13: warning: 'kvm_hyp_handle_mops' defined but not used [-Wunused-function] | 329 | static bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~ Mark these functions as 'inline' to suppress this warning. This shouldn't result in any functional change. At the same time, avoid the use of __alias() in the header and alias kvm_hyp_handle_iabt_low() and kvm_hyp_handle_watchpt_low() to kvm_hyp_handle_memory_fault() using CPP, matching the style in the rest of the kernel. For consistency, kvm_hyp_handle_memory_fault() is also marked as 'inline'. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-8-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Refactor exit handlersMark Rutland
The hyp exit handling logic is largely shared between VHE and nVHE/hVHE, with common logic in arch/arm64/kvm/hyp/include/hyp/switch.h. The code in the header depends on function definitions provided by arch/arm64/kvm/hyp/vhe/switch.c and arch/arm64/kvm/hyp/nvhe/switch.c when they include the header. This is an unusual header dependency, and prevents the use of arch/arm64/kvm/hyp/include/hyp/switch.h in other files as this would result in compiler warnings regarding missing definitions, e.g. | In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8: | ./arch/arm64/kvm/hyp/include/hyp/switch.h:733:31: warning: 'kvm_get_exit_handler_array' used but never defined | 733 | static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu); | | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:735:13: warning: 'early_exit_filter' used but never defined | 735 | static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code); | | ^~~~~~~~~~~~~~~~~ Refactor the logic such that the header doesn't depend on anything from the C files. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-7-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Refactor CPTR trap deactivationMark Rutland
For historical reasons, the VHE and nVHE/hVHE implementations of __activate_cptr_traps() pair with a common implementation of __kvm_reset_cptr_el2(), which ideally would be named __deactivate_cptr_traps(). Rename __kvm_reset_cptr_el2() to __deactivate_cptr_traps(), and split it into separate VHE and nVHE/hVHE variants so that each can be paired with its corresponding implementation of __activate_cptr_traps(). At the same time, fold kvm_write_cptr_el2() into its callers. This makes it clear in-context whether a write is made to the CPACR_EL1 encoding or the CPTR_EL2 encoding, and removes the possibility of confusion as to whether kvm_write_cptr_el2() reformats the sysreg fields as cpacr_clear_set() does. In the nVHE/hVHE implementation of __activate_cptr_traps(), placing the sysreg writes within the if-else blocks requires that the call to __activate_traps_fpsimd32() is moved earlier, but as this was always called before writing to CPTR_EL2/CPACR_EL1, this should not result in a functional change. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-6-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Remove host FPSIMD saving for non-protected KVMMark Rutland
Now that the host eagerly saves its own FPSIMD/SVE/SME state, non-protected KVM never needs to save the host FPSIMD/SVE/SME state, and the code to do this is never used. Protected KVM still needs to save/restore the host FPSIMD/SVE state to avoid leaking guest state to the host (and to avoid revealing to the host whether the guest used FPSIMD/SVE/SME), and that code needs to be retained. Remove the unused code and data structures. To avoid the need for a stub copy of kvm_hyp_save_fpsimd_host() in the VHE hyp code, the nVHE/hVHE version is moved into the shared switch header, where it is only invoked when KVM is in protected mode. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-3-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-10KVM: arm64: Fix __pkvm_host_mkyoung_guest() return valueMarc Zyngier
Don't use an uninitialised stack variable, and just return 0 on the non-error path. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502100911.8c9DbtKD-lkp@intel.com/ Reviewed-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-09KVM: arm64: Simplify np-guest hypercallsQuentin Perret
When the handling of a guest stage-2 permission fault races with an MMU notifier, the faulting page might be gone from the guest's stage-2 by the point we attempt to call (p)kvm_pgtable_stage2_relax_perms(). In the normal KVM case, this leads to returning -EAGAIN which user_mem_abort() handles correctly by simply re-entering the guest. However, the pKVM hypercall implementation has additional logic to check the page state using __check_host_shared_guest() which gets confused with absence of a page mapped at the requested IPA and returns -ENOENT, hence breaking user_mem_abort() and hilarity ensues. Luckily, several of the hypercalls for managing the stage-2 page-table of NP guests have no effect on the pKVM ownership tracking (wrprotect, test_clear_young, mkyoung, and crucially relax_perms), so the extra state checking logic is in fact not strictly necessary. So, to fix the discrepancy between standard KVM and pKVM, let's just drop the superfluous __check_host_shared_guest() logic from those hypercalls and make the extra state checking a debug assertion dependent on CONFIG_NVHE_EL2_DEBUG as we already do for other transitions. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250207145438.1333475-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-09KVM: arm64: Improve error handling from check_host_shared_guest()Quentin Perret
The check_host_shared_guest() path expects to find a last-level valid PTE in the guest's stage-2 page-table. However, it checks the PTE's level before its validity, which makes it hard for callers to figure out what went wrong. To make error handling simpler, check the PTE's validity first. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250207145438.1333475-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04Merge tag 'kvmarm-fixes-6.14-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.14, take #1 - Correctly clean the BSS to the PoC before allowing EL2 to access it on nVHE/hVHE/protected configurations - Propagate ownership of debug registers in protected mode after the rework that landed in 6.14-rc1 - Stop pretending that we can run the protected mode without a GICv3 being present on the host - Fix a use-after-free situation that can occur if a vcpu fails to initialise the NV shadow S2 MMU contexts - Always evaluate the need to arm a background timer for fully emulated guest timers - Fix the emulation of EL1 timers in the absence of FEAT_ECV - Correctly handle the EL2 virtual timer, specially when HCR_EL2.E2H==0
2025-02-01KVM: arm64: Flush/sync debug state in protected modeOliver Upton
The recent changes to debug state management broke self-hosted debug for guests when running in protected mode, since both the debug owner and the debug state itself aren't shared with the hyp's view of the vcpu. Fix it by flushing/syncing the relevant bits with the hyp vcpu. Fixes: beb470d96cec ("KVM: arm64: Use debug_owner to track if debug regs need save/restore") Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/kvmarm/5f62740f-a065-42d9-9f56-8fb648b9c63f@sirena.org.uk/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250131222922.1548780-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>