summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm/sys_regs.c
AgeCommit message (Collapse)Author
2025-07-09KVM: arm64: Fix enforcement of upper bound on MDCR_EL2.HPMNBen Horgan
Previously, u64_replace_bits() was used to no effect as the return value was ignored. Convert to u64p_replace_bits() so the value is updated in place. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Fixes: efff9dd2fee7 ("KVM: arm64: Handle out-of-bound write to MDCR_EL2.HPMN") Link: https://lore.kernel.org/r/20250709093808.920284-2-ben.horgan@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-11Merge tag 'kvmarm-fixes-6.16-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.16, take #2 - Rework of system register accessors for system registers that are directly writen to memory, so that sanitisation of the in-memory value happens at the correct time (after the read, or before the write). For convenience, RMW-style accessors are also provided. - Multiple fixes for the so-called "arch-timer-edge-cases' selftest, which was always broken.
2025-06-05KVM: arm64: Add RMW specific sysreg accessorMarc Zyngier
In a number of cases, we perform a Read-Modify-Write operation on a system register, meaning that we would apply the RESx masks twice. Instead, provide a new accessor that performs this RMW operation, allowing the masks to be applied exactly once per operation. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05KVM: arm64: Add assignment-specific sysreg accessorMarc Zyngier
Assigning a value to a system register doesn't do what it is supposed to be doing if that register is one that has RESx bits. The main problem is that we use __vcpu_sys_reg(), which can be used both as a lvalue and rvalue. When used as a lvalue, the bit masking occurs *before* the new value is assigned, meaning that we (1) do pointless work on the old cvalue, and (2) potentially assign an invalid value as we fail to apply the masks to it. Fix this by providing a new __vcpu_assign_sys_reg() that does what it says on the tin, and sanitises the *new* value instead of the old one. This comes with a significant amount of churn. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-26Merge tag 'kvmarm-6.16' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.16 * New features: - Add large stage-2 mapping support for non-protected pKVM guests, clawing back some performance. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Enable nested virtualisation support on systems that support it (yes, it has been a long time coming), though it is disabled by default. * Improvements, fixes and cleanups: - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. This ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups.
2025-05-23Merge branch kvm-arm64/nv-nv into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-nv: : . : Flick the switch on the NV support by adding the missing piece : in the form of the VNCR page management. From the cover letter: : : "This is probably the most interesting bit of the whole NV adventure. : So far, everything else has been a walk in the park, but this one is : where the real fun takes place. : : With FEAT_NV2, most of the NV support revolves around tricking a guest : into accessing memory while it tries to access system registers. The : hypervisor's job is to handle the context switch of the actual : registers with the state in memory as needed." : . KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating KVM: arm64: Document NV caps and vcpu flags KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2* KVM: arm64: nv: Remove dead code from ERET handling KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2 KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2 KVM: arm64: nv: Handle VNCR_EL2-triggered faults KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2 KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2 KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting KVM: arm64: nv: Move TLBI range decoding to a helper KVM: arm64: nv: Snapshot S1 ASID tagging information during walk KVM: arm64: nv: Extract translation helper from the AT code KVM: arm64: nv: Allocate VNCR page when required arm64: sysreg: Add layout for VNCR_EL2 Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/fgt-masks into kvmarm-master/nextMarc Zyngier
* kvm-arm64/fgt-masks: (43 commits) : . : Large rework of the way KVM deals with trap bits in conjunction with : the CPU feature registers. It now draws a direct link between which : the feature set, the system registers that need to UNDEF to match : the configuration and bits that need to behave as RES0 or RES1 in : the trap registers that are visible to the guest. : : Best of all, these definitions are mostly automatically generated : from the JSON description published by ARM under a permissive : license. : . KVM: arm64: Handle TSB CSYNC traps KVM: arm64: Add FGT descriptors for FEAT_FGT2 KVM: arm64: Allow sysreg ranges for FGT descriptors KVM: arm64: Add context-switch for FEAT_FGT2 registers KVM: arm64: Add trap routing for FEAT_FGT2 registers KVM: arm64: Add sanitisation for FEAT_FGT2 registers KVM: arm64: Add FEAT_FGT2 registers to the VNCR page KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits KVM: arm64: Allow kvm_has_feat() to take variable arguments KVM: arm64: Use FGT feature maps to drive RES0 bits KVM: arm64: Validate FGT register descriptions against RES0 masks KVM: arm64: Switch to table-driven FGU configuration KVM: arm64: Handle PSB CSYNC traps KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask KVM: arm64: Remove hand-crafted masks for FGT registers KVM: arm64: Use computed FGT masks to setup FGT registers KVM: arm64: Propagate FGT masks to the nVHE hypervisor KVM: arm64: Unconditionally configure fine-grain traps KVM: arm64: Use computed masks as sanitisers for FGT registers ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/mte-frac into kvmarm-master/nextMarc Zyngier
* kvm-arm64/mte-frac: : . : Prevent FEAT_MTE_ASYNC from being accidently exposed to a guest, : courtesy of Ben Horgan. From the cover letter: : : "The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM. : However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0 : indicates that MTE_ASYNC is supported. On a host with : ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the : MTE capability enabled will incorrectly see MTE_ASYNC advertised as : supported. This series fixes that." : . KVM: selftests: Confirm exposing MTE_frac does not break migration KVM: arm64: Make MTE_frac masking conditional on MTE capability arm64/sysreg: Expose MTE_frac so that it is visible to KVM Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add sanitisation for FEAT_FGT2 registersMarc Zyngier
Just like the FEAT_FGT registers, treat the FGT2 variant the same way. THis is a large update, but a fairly mechanical one. The config dependencies are extracted from the 2025-03 JSON drop. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatchMarc Zyngier
Now that we have to handle TLBI S1E2 in the core code, plumb the sysinsn dispatch code into it, so that these instructions don't just UNDEF anymore. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-15-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2Marc Zyngier
Plug VNCR_EL2 in the vcpu_sysreg enum, define its RES0/RES1 bits, and make it accessible to userspace when the VM is configured to support FEAT_NV2. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Move TLBI range decoding to a helperMarc Zyngier
As we are about to expand out TLB invalidation capabilities to support recursive virtualisation, move the decoding of a TLBI by range into a helper that returns the base, the range and the ASID. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-16KVM: arm64: Make MTE_frac masking conditional on MTE capabilityBen Horgan
If MTE_frac is masked out unconditionally then the guest will always see ID_AA64PFR1_EL1_MTE_frac as 0. However, a value of 0 when ID_AA64PFR1_EL1_MTE is 2 indicates that MTE_ASYNC is supported. Hence, for a host with ID_AA64PFR1_EL1_MTE==2 and ID_AA64PFR1_EL1_MTE_frac==0xf (MTE_ASYNC unsupported) the guest would see MTE_ASYNC advertised as supported whilst the host does not support it. Hence, expose the sanitised value of MTE_frac to the guest and user-space. As MTE_frac was previously hidden, always 0, and KVM must accept values from KVM provided by user-space, when ID_AA64PFR1_EL1.MTE is 2 allow user-space to set ID_AA64PFR1_EL1.MTE_frac to 0. However, ignore it to avoid incorrectly claiming hardware support for MTE_ASYNC in the guest. Note that linux does not check the value of ID_AA64PFR1_EL1_MTE_frac and wrongly assumes that MTE async faults can be generated even on hardware that does nto support them. This issue is not addressed here. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Link: https://lore.kernel.org/r/20250512114112.359087-3-ben.horgan@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-10KVM: arm64: Validate FGT register descriptions against RES0 masksMarc Zyngier
In order to point out to the unsuspecting KVM hacker that they are missing something somewhere, validate that the known FGT bits do not intersect with the corresponding RES0 mask, as computed at boot time. THis check is also performed at boot time, ensuring that there is no runtime overhead. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-10KVM: arm64: Switch to table-driven FGU configurationMarc Zyngier
Defining the FGU behaviour is extremely tedious. It relies on matching each set of bits from FGT registers with am architectural feature, and adding them to the FGU list if the corresponding feature isn't advertised to the guest. It is however relatively easy to dump most of that information from the architecture JSON description, and use that to control the FGU bits. Let's introduce a new set of tables descripbing the mapping between FGT bits and features. Most of the time, this is only a lookup in an idreg field, with a few more complex exceptions. While this is obviously many more lines in a new file, this is mostly generated, and is pretty easy to maintain. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Plug FEAT_GCS handlingMarc Zyngier
We don't seem to be handling the GCS-specific exception class. Handle it by delivering an UNDEF to the guest, and populate the relevant trap bits. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Restrict ACCDATA_EL1 undef to FEAT_LS64_ACCDATA being disabledMarc Zyngier
We currently unconditionally make ACCDATA_EL1 accesses UNDEF. As we are about to support it, restrict the UNDEF behaviour to cases where FEAT_LS64_ACCDATA is not exposed to the guest. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06arm64: sysreg: Replace HFGxTR_EL2 with HFG{R,W}TR_EL2Marc Zyngier
Treating HFGRTR_EL2 and HFGWTR_EL2 identically was a mistake. It makes things hard to reason about, has the potential to introduce bugs by giving a meaning to bits that are really reserved, and is in general a bad description of the architecture. Given that #defines are cheap, let's describe both registers as intended by the architecture, and repaint all the existing uses. Yes, this is painful. The registers themselves are generated from the JSON file in an automated way. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-05KVM: arm64: Prevent userspace from disabling AArch64 support at any ↵Marc Zyngier
virtualisable EL A sorry excuse for a selftest is trying to disable AArch64 support. And yes, this goes as well as you can imagine. Let's forbid this sort of things. Normal userspace shouldn't get caught doing that. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250429114117.3618800-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-11KVM: arm64: Handle out-of-bound write to MDCR_EL2.HPMNMarc Zyngier
We don't really pay attention to what gets written to MDCR_EL2.HPMN, and funky guests could play ugly games on us. Restrict what gets written there, and limit the number of counters to what the PMU is allowed to have. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-11KVM: arm64: Don't let userspace write to PMCR_EL0.N when the vcpu has EL2Marc Zyngier
Now that userspace can provide its limit for hte maximum number of counters, prevent it from writing to PMCR_EL0.N, as the value should be derived from MDCR_EL2.HPMN in that case. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-11KVM: arm64: Fix MDCR_EL2.HPMN reset valueMarc Zyngier
The MDCR_EL2 documentation indicates that the HPMN field has the following behaviour: "On a Warm reset, this field resets to the expression NUM_PMU_COUNTERS." However, it appears we reset it to zero, which is not very useful. Add a reset helper for MDCR_EL2, and handle the case where userspace changes the target PMU, which may force us to change HPMN again. Reported-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-11KVM: arm64: Repaint pmcr_n into nr_pmu_countersMarc Zyngier
The pmcr_n field obviously refers to PMCR_EL0.N, but is generally used as the number of counters seen by the guest. Rename it accordingly. Suggested-by: Oliver upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-03-19Merge branch 'kvm-arm64/pmu-fixes' into kvmarm/nextOliver Upton
* kvm-arm64/pmu-fixes: : vPMU fixes for 6.15 courtesy of Akihiko Odaki : : Various fixes to KVM's vPMU implementation, notably ensuring : userspace-directed changes to the PMCs are reflected in the backing perf : events. KVM: arm64: PMU: Reload when resetting KVM: arm64: PMU: Reload when user modifies registers KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs KVM: arm64: PMU: Assume PMU presence in pmu-emul.c KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/pkvm-6.15' into kvmarm/nextOliver Upton
* kvm-arm64/pkvm-6.15: : pKVM updates for 6.15 : : - SecPageTable stats for stage-2 table pages allocated by the protected : hypervisor (Vincent Donnefort) : : - HCRX_EL2 trap + vCPU initialization fixes for pKVM (Fuad Tabba) KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu KVM: arm64: Factor out pKVM hyp vcpu creation to separate function KVM: arm64: Initialize HCRX_EL2 traps in pKVM KVM: arm64: Factor out setting HCRX_EL2 traps into separate function KVM: arm64: Count pKVM stage-2 usage in secondary pagetable stats KVM: arm64: Distinct pKVM teardown memcache for stage-2 KVM: arm64: Add flags to kvm_hyp_memcache Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/writable-midr' into kvmarm/nextOliver Upton
* kvm-arm64/writable-midr: : Writable implementation ID registers, courtesy of Sebastian Ott : : Introduce a new capability that allows userspace to set the : ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1, : and AIDR_EL1. Also plug a hole in KVM's trap configuration where : SMIDR_EL1 was readable at EL1, despite the fact that KVM does not : support SME. KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable KVM: arm64: Copy guest CTR_EL0 into hyp VM KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR KVM: arm64: Allow userspace to change the implementation ID registers KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value KVM: arm64: Maintain per-VM copy of implementation ID regs KVM: arm64: Set HCR_EL2.TID1 unconditionally Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/nv-idregs' into kvmarm/nextOliver Upton
* kvm-arm64/nv-idregs: : Changes to exposure of NV features, courtesy of Marc Zyngier : : Apply NV-specific feature restrictions at reset rather than at the point : of KVM_RUN. This makes the true feature set visible to userspace, a : necessary step towards save/restore support or NV VMs. : : Add an additional vCPU feature flag for selecting the E2H0 flavor of NV, : such that the VHE-ness of the VM can be applied to the feature set. KVM: arm64: selftests: Test that TGRAN*_2 fields are writable KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2 KVM: arm64: Advertise FEAT_ECV when possible KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writable KVM: arm64: Allow userspace to limit NV support to nVHE KVM: arm64: Move NV-specific capping to idreg sanitisation KVM: arm64: Enforce NV limits on a per-idregs basis KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely available KVM: arm64: Consolidate idreg callbacks KVM: arm64: Advertise NV2 in the boot messages KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0 KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspace arm64: cpufeature: Handle NV_frac as a synonym of NV2 Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-17KVM: arm64: PMU: Reload when resettingAkihiko Odaki
Replace kvm_pmu_vcpu_reset() with the generic PMU reloading mechanism to ensure the consistency with system registers and to reduce code size. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250315-pmc-v5-5-ecee87dab216@daynix.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-17KVM: arm64: PMU: Reload when user modifies registersAkihiko Odaki
Commit d0c94c49792c ("KVM: arm64: Restore PMU configuration on first run") added the code to reload the PMU configuration on first run. It is also important to keep the correct state even if system registers are modified after first run, specifically when debugging Windows on QEMU with GDB; QEMU tries to write back all visible registers when resuming the VM execution with GDB, corrupting the PMU state. Windows always uses the PMU so this can cause adverse effects on that particular OS. The usual register writes and reset are already handled independently, but register writes from userspace are not covered. Trigger the code to reload the PMU configuration for them instead so that PMU configuration changes made by users will be applied also after the first run. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250315-pmc-v5-4-ecee87dab216@daynix.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-17KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regsAkihiko Odaki
Reload the perf event when setting the vPMU counter (vPMC) registers (PMCCNTR_EL0 and PMEVCNTR<n>_EL0). This is a change corresponding to commit 9228b26194d1 ("KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to return the current value") but for SET_ONE_REG. Values of vPMC registers are saved in sysreg files on certain occasions. These saved values don't represent the current values of the vPMC registers if the perf events for the vPMCs count events after the save. The current values of those registers are the sum of the sysreg file value and the current perf event counter value. But, when userspace writes those registers (using KVM_SET_ONE_REG), KVM only updates the sysreg file value and leaves the current perf event counter value as is. It is also important to keep the correct state even if userspace writes them after first run, specifically when debugging Windows on QEMU with GDB; QEMU tries to write back all visible registers when resuming the VM execution with GDB, corrupting the PMU state. Windows always uses the PMU so this can cause adverse effects on that particular OS. Fix this by releasing the current perf event and trigger recreating one with KVM_REQ_RELOAD_PMU. Fixes: 051ff581ce70 ("arm64: KVM: Add access handler for event counter register") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250315-pmc-v5-3-ecee87dab216@daynix.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-17KVM: arm64: PMU: Assume PMU presence in pmu-emul.cAkihiko Odaki
Many functions in pmu-emul.c checks kvm_vcpu_has_pmu(vcpu). A favorable interpretation is defensive programming, but it also has downsides: - It is confusing as it implies these functions are called without PMU although most of them are called only when a PMU is present. - It makes semantics of functions fuzzy. For example, calling kvm_pmu_disable_counter_mask() without PMU may result in no-op as there are no enabled counters, but it's unclear what kvm_pmu_get_counter_value() returns when there is no PMU. - It allows callers without checking kvm_vcpu_has_pmu(vcpu), but it is often wrong to call these functions without PMU. - It is error-prone to duplicate kvm_vcpu_has_pmu(vcpu) checks into multiple functions. Many functions are called for system registers, and the system register infrastructure already employs less error-prone, comprehensive checks. Check kvm_vcpu_has_pmu(vcpu) in callers of these functions instead, and remove the obsolete checks from pmu-emul.c. The only exceptions are the functions that implement ioctls as they have definitive semantics even when the PMU is not present. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250315-pmc-v5-2-ecee87dab216@daynix.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-17KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, ↵Akihiko Odaki
PMOVS{SET,CLR} Commit a45f41d754e0 ("KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}") changed KVM_SET_ONE_REG to update the mentioned registers in a way matching with the behavior of guest register writes. This is a breaking change of a UAPI though the new semantics looks cleaner and VMMs are not prepared for this. Firecracker, QEMU, and crosvm perform migration by listing registers with KVM_GET_REG_LIST, getting their values with KVM_GET_ONE_REG and setting them with KVM_SET_ONE_REG. This algorithm assumes KVM_SET_ONE_REG restores the values retrieved with KVM_GET_ONE_REG without any alteration. However, bit operations added by the earlier commit do not preserve the values retried with KVM_GET_ONE_REG and potentially break migration. Remove the bit operations that alter the values retrieved with KVM_GET_ONE_REG. Cc: stable@vger.kernel.org Fixes: a45f41d754e0 ("KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250315-pmc-v5-1-ecee87dab216@daynix.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-14KVM: arm64: Factor out setting HCRX_EL2 traps into separate functionFuad Tabba
Factor out the code for setting a vcpu's HCRX_EL2 traps in to a separate inline function. This allows us to share the logic with pKVM when setting the traps in protected mode. No functional change intended. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250314111832.4137161-2-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-12KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2Sebastian Ott
Allow userspace to write the safe (NI) value for ID_AA64MMFR0_EL1.TGRAN*_2. Disallow to change these fields for NV since kvm provides a sanitized view for them based on the PAGE_SIZE. Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/kvmarm/20250306184013.30008-1-sebott@redhat.com/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03KVM: arm64: nv: Plumb handling of GICv3 EL2 accessesMarc Zyngier
Wire the handling of all GICv3 EL2 registers, and provide emulation for all the non memory-backed registers (ICC_SRE_EL2, ICH_VTR_EL2, ICH_MISR_EL2, ICH_ELRSR_EL2, and ICH_EISR_EL2). Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26KVM: arm64: Allow userspace to change the implementation ID registersSebastian Ott
KVM's treatment of the ID registers that describe the implementation (MIDR, REVIDR, and AIDR) is interesting, to say the least. On the userspace-facing end of it, KVM presents the values of the boot CPU on all vCPUs and treats them as invariant. On the guest side of things KVM presents the hardware values of the local CPU, which can change during CPU migration in a big-little system. While one may call this fragile, there is at least some degree of predictability around it. For example, if a VMM wanted to present big-little to a guest, it could affine vCPUs accordingly to the correct clusters. All of this makes a giant mess out of adding support for making these implementation ID registers writable. Avoid breaking the rather subtle ABI around the old way of doing things by requiring opt-in from userspace to make the registers writable. When the cap is enabled, allow userspace to set MIDR, REVIDR, and AIDR to any non-reserved value and present those values consistently across all vCPUs. Signed-off-by: Sebastian Ott <sebott@redhat.com> [oliver: changelog, capability] Link: https://lore.kernel.org/r/20250225005401.679536-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26KVM: arm64: Maintain per-VM copy of implementation ID regsSebastian Ott
Get ready to allow changes to the implementation ID registers by tracking the VM-wide values. Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20250225005401.679536-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26KVM: arm64: Set HCR_EL2.TID1 unconditionallyOliver Upton
commit 90807748ca3a ("KVM: arm64: Hide SME system registers from guests") added trap handling for SMIDR_EL1, treating it as UNDEFINED as KVM does not support SME. This is right for the most part, however KVM needs to set HCR_EL2.TID1 to _actually_ trap the register. Unfortunately, this comes with some collateral damage as TID1 forces REVIDR_EL1 and AIDR_EL1 to trap as well. KVM has long treated these registers as "invariant" which is an awful term for the following: - Userspace sees the boot CPU values on all vCPUs - The guest sees the hardware values of the CPU on which a vCPU is scheduled Keep the plates spinning by adding trap handling for the affected registers and repaint all of the "invariant" crud into terms of identifying an implementation. Yes, at this point we only need to set TID1 on SME hardware, but REVIDR_EL1 and AIDR_EL1 are about to become mutable anyway. Cc: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Fixes: 90807748ca3a ("KVM: arm64: Hide SME system registers from guests") [maz: handle traps from 32bit] Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225005401.679536-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writableMarc Zyngier
We want to make sure that it is possible for userspace to configure whether recursive NV is possible. Make NV_frac writable for that purpose. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-12-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Move NV-specific capping to idreg sanitisationMarc Zyngier
Instead of applying the NV idreg limits at run time, switch to doing it at the same time as the reset of the VM initialisation. This will make things much simpler once we introduce vcpu-driven variants of NV. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely availableMarc Zyngier
ID_REG_LIMIT_FIELD_ENUM() is a useful macro to limit the idreg features exposed to guest and userspace, and the NV code can make use of it. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-8-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Consolidate idreg callbacksMarc Zyngier
Most of the ID_DESC() users use the same callbacks, with only a few overrides. Consolidate the common callbacks in a macro, and consistently use it everywhere. Whilst we're at it, give ID_UNALLOCATED() a .name string, so that we can easily decode traces. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspaceMarc Zyngier
Since our take on FEAT_NV is to only support FEAT_NV2, we should never expose ID_AA64MMFR2_EL1.NV to a guest nor userspace. Make sure we mask this field for good. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250220134907.554085-3-maz@kernel.org [oliver: squash diff for NV field] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-04Merge tag 'kvmarm-fixes-6.14-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.14, take #1 - Correctly clean the BSS to the PoC before allowing EL2 to access it on nVHE/hVHE/protected configurations - Propagate ownership of debug registers in protected mode after the rework that landed in 6.14-rc1 - Stop pretending that we can run the protected mode without a GICv3 being present on the host - Fix a use-after-free situation that can occur if a vcpu fails to initialise the NV shadow S2 MMU contexts - Always evaluate the need to arm a background timer for fully emulated guest timers - Fix the emulation of EL1 timers in the absence of FEAT_ECV - Correctly handle the EL2 virtual timer, specially when HCR_EL2.E2H==0
2025-02-04KVM: arm64: timer: Don't adjust the EL2 virtual timer offsetMarc Zyngier
The way we deal with the EL2 virtual timer is a bit odd. We try to cope with E2H being flipped, and adjust which offset applies to that timer depending on the current E2H value. But that's a complexity we shouldn't have to worry about. What we have to deal with is either E2H being RES1, in which case there is no offset, or E2H being RES0, and the virtual timer simply does not exist. Drop the adjusting of the timer offset, which makes things a bit simpler. At the same time, make sure that accessing the HV timer when E2H is RES0 results in an UNDEF in the guest. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250204110050.150560-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-28Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull KVM/arm64 updates from Will Deacon: "New features: - Support for non-protected guest in protected mode, achieving near feature parity with the non-protected mode - Support for the EL2 timers as part of the ongoing NV support - Allow control of hardware tracing for nVHE/hVHE Improvements, fixes and cleanups: - Massive cleanup of the debug infrastructure, making it a bit less awkward and definitely easier to maintain. This should pave the way for further optimisations - Complete rewrite of pKVM's fixed-feature infrastructure, aligning it with the rest of KVM and making the code easier to follow - Large simplification of pKVM's memory protection infrastructure - Better handling of RES0/RES1 fields for memory-backed system registers - Add a workaround for Qualcomm's Snapdragon X CPUs, which suffer from a pretty nasty timer bug - Small collection of cleanups and low-impact fixes" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits) arm64/sysreg: Get rid of TRFCR_ELx SysregFields KVM: arm64: nv: Fix doc header layout for timers KVM: arm64: nv: Apply RESx settings to sysreg reset values KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors KVM: arm64: Fix selftests after sysreg field name update coresight: Pass guest TRFCR value to KVM KVM: arm64: Support trace filtering for guests KVM: arm64: coresight: Give TRBE enabled state to KVM coresight: trbe: Remove redundant disable call arm64/sysreg/tools: Move TRFCR definitions to sysreg tools: arm64: Update sysreg.h header files KVM: arm64: Drop pkvm_mem_transition for host/hyp donations KVM: arm64: Drop pkvm_mem_transition for host/hyp sharing KVM: arm64: Drop pkvm_mem_transition for FF-A KVM: arm64: Explicitly handle BRBE traps as UNDEFINED KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe() arm64: kvm: Introduce nvhe stack size constants KVM: arm64: Fix nVHE stacktrace VA bits mask KVM: arm64: Fix FEAT_MTE in pKVM Documentation: Update the behaviour of "kvm-arm.mode" ...
2025-01-20Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "We've got a little less than normal thanks to the holidays in December, but there's the usual summary below. The highlight is probably the 52-bit physical addressing (LPA2) clean-up from Ard. Confidential Computing: - Register a platform device when running in CCA realm mode to enable automatic loading of dependent modules CPU Features: - Update a bunch of system register definitions to pick up new field encodings from the architectural documentation - Add hwcaps and selftests for the new (2024) dpISA extensions Documentation: - Update EL3 (firmware) requirements for booting Linux on modern arm64 designs - Remove stale information about the kernel virtual memory map Miscellaneous: - Minor cleanups and typo fixes Memory management: - Fix vmemmap_check_pmd() to look at the PMD type bits - LPA2 (52-bit physical addressing) cleanups and minor fixes - Adjust physical address space depending upon whether or not LPA2 is enabled Perf and PMUs: - Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU - Extend AXI filtering support for the DDR PMU on NXP IMX SoCs - Fix Designware PCIe PMU event numbering - Add generic branch events for the Apple M1 CPU PMU - Add support for Marvell Odyssey DDR and LLC-TAD PMUs - Cleanups to the Hisilicon DDRC and Uncore PMU code - Advertise discard mode for the SPE PMU - Add the perf users mailing list to our MAINTAINERS entry" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits) Documentation: arm64: Remove stale and redundant virtual memory diagrams perf docs: arm_spe: Document new discard mode perf: arm_spe: Add format option for discard mode MAINTAINERS: Add perf list for drivers/perf/ arm64: Remove duplicate included header drivers/perf: apple_m1: Map generic branch events arm64: rsi: Add automatic arm-cca-guest module loading kselftest/arm64: Add 2024 dpISA extensions to hwcap test KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1 arm64/hwcap: Describe 2024 dpISA extensions to userspace arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12 arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu() arm64: mm: Test for pmd_sect() in vmemmap_check_pmd() arm64/mm: Replace open encodings with PXD_TABLE_BIT arm64/mm: Rename pte_mkpresent() as pte_mkvalid() arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09 arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09 arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09 ...
2025-01-17Merge branch kvm-arm64/misc-6.14 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-6.14: : . : Misc KVM/arm64 changes for 6.14 : : - Don't expose AArch32 EL0 capability when NV is enabled : : - Update documentation to reflect the full gamut of kvm-arm.mode : behaviours : : - Use the hypervisor VA bit width when dumping stacktraces : : - Decouple the hypervisor stack size from PAGE_SIZE, at least : on the surface... : : - Make use of str_enabled_disabled() when advertising GICv4.1 support : : - Explicitly handle BRBE traps as UNDEFINED : . KVM: arm64: Explicitly handle BRBE traps as UNDEFINED KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe() arm64: kvm: Introduce nvhe stack size constants KVM: arm64: Fix nVHE stacktrace VA bits mask Documentation: Update the behaviour of "kvm-arm.mode" KVM: arm64: nv: Advertise the lack of AArch32 EL0 support Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-17Merge branch kvm-arm64/nv-resx-fixes-6.14 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-resx-fixes-6.14: : . : Fixes for NV sysreg accessors. From the cover letter: : : "Joey recently reported that some rather basic tests were failing on : NV, and managed to track it down to critical register fields (such as : HCR_EL2.E2H) not having their expect value. : : Further investigation has outlined a couple of critical issues: : : - Evaluating HCR_EL2.E2H must always be done with a sanitising : accessor, no ifs, no buts. Given that KVM assumes a fixed value for : this bit, we cannot leave it to the guest to mess with. : : - Resetting the sysreg file must result in the RESx bits taking : effect. Otherwise, we may end-up making the wrong decision (see : above), and we definitely expose invalid values to the guest. Note : that because we compute the RESx masks very late in the VM setup, we : need to apply these masks at that particular point as well. : [...]" : . KVM: arm64: nv: Apply RESx settings to sysreg reset values KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors Signed-off-by: Marc Zyngier <maz@kernel.org> # Conflicts: # arch/arm64/kvm/nested.c
2025-01-17Merge branch kvm-arm64/pkvm-memshare-declutter into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm-memshare-declutter: : . : pKVM memory transition simplifications, courtesy of Quentin Perret. : : From the cover letter: : "Since its early days, pKVM has formalized memory 'transitions' (shares : and donations) using 'struct pkvm_mem_transition' and bunch of helpers : to manipulate it. The intention was for all transitions to use this : machinery to ensure we're checking things consistently. However, as : development progressed, it became clear that the rigidity of this model : made it really difficult to use in some use-cases which ended-up : side-stepping it entirely. That is the case for the : hyp_{un}pin_shared_mem() and host_{un}share_guest() paths upstream which : use lower level helpers directly, as well as for several other pKVM : features that should land upstream in the future (ex: when a guest : relinquishes a page during ballooning, when annotating a page that is : being DMA'd to, ...). On top of this, the pkvm_mem_transition machinery : requires a lot of boilerplate which makes the code hard to read, but : also adds layers of indirection that no compilers seems to see through, : hence leading to suboptimal generated code. : : Given all the above, this series removes the pkvm_mem_transition : machinery from mem_protect.c, and converts all its users to use : __*_{check,set}_page_state_range() low-level helpers directly." : . KVM: arm64: Drop pkvm_mem_transition for host/hyp donations KVM: arm64: Drop pkvm_mem_transition for host/hyp sharing KVM: arm64: Drop pkvm_mem_transition for FF-A KVM: arm64: Only apply PMCR_EL0.P to the guest range of counters KVM: arm64: nv: Reload PMU events upon MDCR_EL2.HPME change KVM: arm64: Use KVM_REQ_RELOAD_PMU to handle PMCR_EL0.E change KVM: arm64: Add unified helper for reprogramming counters by mask KVM: arm64: Always check the state from hyp_ack_unshare() KVM: arm64: Fix set_id_regs selftest for ASIDBITS becoming unwritable Signed-off-by: Marc Zyngier <maz@kernel.org>