summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm
AgeCommit message (Collapse)Author
2023-11-13KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIERSean Christopherson
Convert KVM_ARCH_WANT_MMU_NOTIFIER into a Kconfig and select it where appropriate to effectively maintain existing behavior. Using a proper Kconfig will simplify building more functionality on top of KVM's mmu_notifier infrastructure. Add a forward declaration of kvm_gfn_range to kvm_types.h so that including arch/powerpc/include/asm/kvm_ppc.h's with CONFIG_KVM=n doesn't generate warnings due to kvm_gfn_range being undeclared. PPC defines hooks for PR vs. HV without guarding them via #ifdeffery, e.g. bool (*unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*test_age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); Alternatively, PPC could forward declare kvm_gfn_range, but there's no good reason not to define it in common KVM. Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes LoongArch: - New architecture for kvm. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: - Support for the Smstateen and Zicond extensions - Support for virtualizing senvcfg - Support for virtualized SBI debug console (DBCN) S390: - Nested page table management can be monitored through tracepoints and statistics x86: - Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ - Avoid WARN on systems with Intel IPI virtualization - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. - Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. - Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. - Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. - Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. - Optimize injection of PMU interrupts that are simultaneous with NMIs. - Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: - Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. - Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. - Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: - Various updates and fixes, mostly for x86 - MTRR and PAT fixes and optimizations" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits) KVM: selftests: Avoid using forced target for generating arm64 headers tools headers arm64: Fix references to top srcdir in Makefile KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 KVM: arm64: selftest: Perform ISB before reading PAR_EL1 KVM: arm64: selftest: Add the missing .guest_prepare() KVM: arm64: Always invalidate TLB for stage-2 permission faults KVM: x86: Service NMI requests after PMI requests in VM-Enter path KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs KVM: arm64: Refine _EL2 system register list that require trap reinjection arm64: Add missing _EL2 encodings arm64: Add missing _EL12 encodings KVM: selftests: aarch64: vPMU test for validating user accesses KVM: selftests: aarch64: vPMU register test for unimplemented counters KVM: selftests: aarch64: vPMU register test for implemented counters KVM: selftests: aarch64: Introduce vpmu_counter_access test tools: Import arm_pmuv3.h KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} ...
2023-11-01Merge tag 'soc-drivers-6.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "The highlights for the driver support this time are - Qualcomm platforms gain support for the Qualcomm Secure Execution Environment firmware interface to access EFI variables on certain devices, and new features for multiple platform and firmware drivers. - Arm FF-A firmware support gains support for v1.1 specification features, in particular notification and memory transaction descriptor changes. - SCMI firmware support now support v3.2 features for clock and DVFS configuration and a new transport for Qualcomm platforms. - Minor cleanups and bugfixes are added to pretty much all the active platforms: qualcomm, broadcom, dove, ti-k3, rockchip, sifive, amlogic, atmel, tegra, aspeed, vexpress, mediatek, samsung and more. In particular, this contains portions of the treewide conversion to use __counted_by annotations and the device_get_match_data helper" * tag 'soc-drivers-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (156 commits) soc: qcom: pmic_glink_altmode: Print return value on error firmware: qcom: scm: remove unneeded 'extern' specifiers firmware: qcom: scm: add a missing forward declaration for struct device firmware: qcom: move Qualcomm code into its own directory soc: samsung: exynos-chipid: Convert to platform remove callback returning void soc: qcom: apr: Add __counted_by for struct apr_rx_buf and use struct_size() soc: qcom: pmic_glink: fix connector type to be DisplayPort soc: ti: k3-socinfo: Avoid overriding return value soc: ti: k3-socinfo: Fix typo in bitfield documentation soc: ti: knav_qmss_queue: Use device_get_match_data() firmware: ti_sci: Use device_get_match_data() firmware: qcom: qseecom: add missing include guards soc/pxa: ssp: Convert to platform remove callback returning void soc/mediatek: mtk-mmsys: Convert to platform remove callback returning void soc/mediatek: mtk-devapc: Convert to platform remove callback returning void soc/loongson: loongson2_guts: Convert to platform remove callback returning void soc/litex: litex_soc_ctrl: Convert to platform remove callback returning void soc/ixp4xx: ixp4xx-qmgr: Convert to platform remove callback returning void soc/ixp4xx: ixp4xx-npe: Convert to platform remove callback returning void soc/hisilicon: kunpeng_hccs: Convert to platform remove callback returning void ...
2023-11-01Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "No major architecture features this time around, just some new HWCAP definitions, support for the Ampere SoC PMUs and a few fixes/cleanups. The bulk of the changes is reworking of the CPU capability checking code (cpus_have_cap() etc). - Major refactoring of the CPU capability detection logic resulting in the removal of the cpus_have_const_cap() function and migrating the code to "alternative" branches where possible - Backtrace/kgdb: use IPIs and pseudo-NMI - Perf and PMU: - Add support for Ampere SoC PMUs - Multi-DTC improvements for larger CMN configurations with multiple Debug & Trace Controllers - Rework the Arm CoreSight PMU driver to allow separate registration of vendor backend modules - Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf driver; use device_get_match_data() in the xgene driver; fix NULL pointer dereference in the hisi driver caused by calling cpuhp_state_remove_instance(); use-after-free in the hisi driver - HWCAP updates: - FEAT_SVE_B16B16 (BFloat16) - FEAT_LRCPC3 (release consistency model) - FEAT_LSE128 (128-bit atomic instructions) - SVE: remove a couple of pseudo registers from the cpufeature code. There is logic in place already to detect mismatched SVE features - Miscellaneous: - Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA bouncing is needed. The buffer is still required for small kmalloc() buffers - Fix module PLT counting with !RANDOMIZE_BASE - Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move synchronisation code out of the set_ptes() loop - More compact cpufeature displaying enabled cores - Kselftest updates for the new CPU features" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits) arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper perf: hisi: Fix use-after-free when register pmu fails drivers/perf: hisi_pcie: Initialize event->cpu only on success drivers/perf: hisi_pcie: Check the type first in pmu::event_init() arm64: cpufeature: Change DBM to display enabled cores arm64: cpufeature: Display the set of cores with a feature perf/arm-cmn: Enable per-DTC counter allocation perf/arm-cmn: Rework DTC counters (again) perf/arm-cmn: Fix DTC domain detection drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init() drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process clocksource/drivers/arm_arch_timer: limit XGene-1 workaround arm64: Remove system_uses_lse_atomics() arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused drivers/perf: xgene: Use device_get_match_data() perf/amlogic: add missing MODULE_DEVICE_TABLE arm64/mm: Hoist synchronization out of set_ptes() loop ...
2023-10-31Merge tag 'kvmarm-6.7' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.7 - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes
2023-10-30Merge branch kvm-arm64/pmu_pmcr_n into kvmarm/nextOliver Upton
* kvm-arm64/pmu_pmcr_n: : User-defined PMC limit, courtesy Raghavendra Rao Ananta : : Certain VMMs may want to reserve some PMCs for host use while running a : KVM guest. This was a bit difficult before, as KVM advertised all : supported counters to the guest. Userspace can now limit the number of : advertised PMCs by writing to PMCR_EL0.N, as KVM's sysreg and PMU : emulation enforce the specified limit for handling guest accesses. KVM: selftests: aarch64: vPMU test for validating user accesses KVM: selftests: aarch64: vPMU register test for unimplemented counters KVM: selftests: aarch64: vPMU register test for implemented counters KVM: selftests: aarch64: Introduce vpmu_counter_access test tools: Import arm_pmuv3.h KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0 KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handler KVM: arm64: PMU: Introduce helpers to set the guest's PMU Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/mops into kvmarm/nextOliver Upton
* kvm-arm64/mops: : KVM support for MOPS, courtesy of Kristina Martsenko : : MOPS adds new instructions for accelerating memcpy(), memset(), and : memmove() operations in hardware. This series brings virtualization : support for KVM guests, and allows VMs to run on asymmetrict systems : that may have different MOPS implementations. KVM: arm64: Expose MOPS instructions to guests KVM: arm64: Add handler for MOPS exceptions Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/writable-id-regs into kvmarm/nextOliver Upton
* kvm-arm64/writable-id-regs: : Writable ID registers, courtesy of Jing Zhang : : This series significantly expands the architectural feature set that : userspace can manipulate via the ID registers. A new ioctl is defined : that makes the mutable fields in the ID registers discoverable to : userspace. KVM: selftests: Avoid using forced target for generating arm64 headers tools headers arm64: Fix references to top srcdir in Makefile KVM: arm64: selftests: Test for setting ID register from usersapce tools headers arm64: Update sysreg.h with kernel sources KVM: selftests: Generate sysreg-defs.h and add to include path perf build: Generate arm64's sysreg-defs.h and add to include path tools: arm64: Add a Makefile for generating sysreg-defs.h KVM: arm64: Document vCPU feature selection UAPIs KVM: arm64: Allow userspace to change ID_AA64ZFR0_EL1 KVM: arm64: Allow userspace to change ID_AA64PFR0_EL1 KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1 KVM: arm64: Allow userspace to change ID_AA64ISAR{0-2}_EL1 KVM: arm64: Bump up the default KVM sanitised debug version to v8p8 KVM: arm64: Reject attempts to set invalid debug arch version KVM: arm64: Advertise selected DebugVer in DBGDIDR.Version KVM: arm64: Use guest ID register values for the sake of emulation KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKS KVM: arm64: Allow userspace to get the writable masks for feature ID registers Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/sgi-injection into kvmarm/nextOliver Upton
* kvm-arm64/sgi-injection: : vSGI injection improvements + fixes, courtesy Marc Zyngier : : Avoid linearly searching for vSGI targets using a compressed MPIDR to : index a cache. While at it, fix some egregious bugs in KVM's mishandling : of vcpuid (user-controlled value) and vcpu_idx. KVM: arm64: Clarify the ordering requirements for vcpu/RD creation KVM: arm64: vgic-v3: Optimize affinity-based SGI injection KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available KVM: arm64: Build MPIDR to vcpu index cache at runtime KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff() KVM: arm64: Use vcpu_idx for invalidation tracking KVM: arm64: vgic: Use vcpu_idx for the debug information KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id KVM: arm64: vgic-v3: Refactor GICv3 SGI generation KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/stage2-vhe-load into kvmarm/nextOliver Upton
* kvm-arm64/stage2-vhe-load: : Setup stage-2 MMU from vcpu_load() for VHE : : Unlike nVHE, there is no need to switch the stage-2 MMU around on guest : entry/exit in VHE mode as the host is running at EL2. Despite this KVM : reloads the stage-2 on every guest entry, which is needless. : : This series moves the setup of the stage-2 MMU context to vcpu_load() : when running in VHE mode. This is likely to be a win across the board, : but also allows us to remove an ISB on the guest entry path for systems : with one of the speculative AT errata. KVM: arm64: Move VTCR_EL2 into struct s2_mmu KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe() KVM: arm64: Rename helpers for VHE vCPU load/put KVM: arm64: Reload stage-2 for VMID change on VHE KVM: arm64: Restore the stage-2 context in VHE's __tlb_switch_to_host() KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host() Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/nv-trap-fixes into kvmarm/nextOliver Upton
* kvm-arm64/nv-trap-fixes: : NV trap forwarding fixes, courtesy Miguel Luis and Marc Zyngier : : - Explicitly define the effects of HCR_EL2.NV on EL2 sysregs in the : NV trap encoding : : - Make EL2 registers that access AArch32 guest state UNDEF or RAZ/WI : where appropriate for NV guests KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs KVM: arm64: Refine _EL2 system register list that require trap reinjection arm64: Add missing _EL2 encodings arm64: Add missing _EL12 encodings Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/smccc-filter-cleanups into kvmarm/nextOliver Upton
* kvm-arm64/smccc-filter-cleanups: : Cleanup the management of KVM's SMCCC maple tree : : Avoid the cost of maintaining the SMCCC filter maple tree if userspace : hasn't writen a rule to the filter. While at it, rip out the now : unnecessary VM flag to indicate whether or not the SMCCC filter was : configured. KVM: arm64: Use mtree_empty() to determine if SMCCC filter configured KVM: arm64: Only insert reserved ranges when SMCCC filter is used KVM: arm64: Add a predicate for testing if SMCCC filter is configured Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/pmevtyper-filter into kvmarm/nextOliver Upton
* kvm-arm64/pmevtyper-filter: : Fixes to KVM's handling of the PMUv3 exception level filtering bits : : - NSH (count at EL2) and M (count at EL3) should be stateful when the : respective EL is advertised in the ID registers but have no effect on : event counting. : : - NSU and NSK modify the event filtering of EL0 and EL1, respectively. : Though the kernel may not use these bits, other KVM guests might. : Implement these bits exactly as written in the pseudocode if EL3 is : advertised. KVM: arm64: Add PMU event filter bits required if EL3 is implemented KVM: arm64: Make PMEVTYPER<n>_EL0.NSH RES0 if EL2 isn't advertised Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/feature-flag-refactor into kvmarm/nextOliver Upton
* kvm-arm64/feature-flag-refactor: : vCPU feature flag cleanup : : Clean up KVM's handling of vCPU feature flags to get rid of the : vCPU-scoped bitmaps and remove failure paths from kvm_reset_vcpu(). KVM: arm64: Get rid of vCPU-scoped feature bitmap KVM: arm64: Remove unused return value from kvm_reset_vcpu() KVM: arm64: Hoist NV+SVE check into KVM_ARM_VCPU_INIT ioctl handler KVM: arm64: Prevent NV feature flag on systems w/o nested virt KVM: arm64: Hoist PAuth checks into KVM_ARM_VCPU_INIT ioctl KVM: arm64: Hoist SVE check into KVM_ARM_VCPU_INIT ioctl handler KVM: arm64: Hoist PMUv3 check into KVM_ARM_VCPU_INIT ioctl handler KVM: arm64: Add generic check for system-supported vCPU features Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/misc into kvmarm/nextOliver Upton
* kvm-arm64/misc: : Miscellaneous updates : : - Put an upper bound on the number of I-cache invalidations by : cacheline to avoid soft lockups : : - Get rid of bogus refererence count transfer for THP mappings : : - Do a local TLB invalidation on permission fault race : : - Fixes for page_fault_test KVM selftest : : - Add a tracepoint for detecting MMIO instructions unsupported by KVM KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 KVM: arm64: selftest: Perform ISB before reading PAR_EL1 KVM: arm64: selftest: Add the missing .guest_prepare() KVM: arm64: Always invalidate TLB for stage-2 permission faults KVM: arm64: Do not transfer page refcount for THP adjustment KVM: arm64: Avoid soft lockups due to I-cache maintenance arm64: tlbflush: Rename MAX_TLBI_OPS KVM: arm64: Don't use kerneldoc comment for arm64_check_features() Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30KVM: arm64: Add tracepoint for MMIO accesses where ISV==0Oliver Upton
It is a pretty well known fact that KVM does not support MMIO emulation without valid instruction syndrome information (ESR_EL2.ISV == 0). The current kvm_pr_unimpl() is pretty useless, as it contains zero context to relate the event to a vCPU. Replace it with a precise tracepoint that dumps the relevant context so the user can make sense of what the guest is doing. Acked-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231026205306.3045075-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30KVM: arm64: Always invalidate TLB for stage-2 permission faultsOliver Upton
It is possible for multiple vCPUs to fault on the same IPA and attempt to resolve the fault. One of the page table walks will actually update the PTE and the rest will return -EAGAIN per our race detection scheme. KVM elides the TLB invalidation on the racing threads as the return value is nonzero. Before commit a12ab1378a88 ("KVM: arm64: Use local TLBI on permission relaxation") KVM always used broadcast TLB invalidations when handling permission faults, which had the convenient property of making the stage-2 updates visible to all CPUs in the system. However now we do a local invalidation, and TLBI elision leads to the vCPU thread faulting again on the stale entry. Remember that the architecture permits the TLB to cache translations that precipitate a permission fault. Invalidate the TLB entry responsible for the permission fault if the stage-2 descriptor has been relaxed, regardless of which thread actually did the job. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230922223229.1608155-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-26Merge branch 'for-next/cpus_have_const_cap' into for-next/coreCatalin Marinas
* for-next/cpus_have_const_cap: (38 commits) : cpus_have_const_cap() removal arm64: Remove cpus_have_const_cap() arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419 arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0 arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64} arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2 arm64: Avoid cpus_have_const_cap() for ARM64_SSBS arm64: Avoid cpus_have_const_cap() for ARM64_MTE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT ...
2023-10-25KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WIMarc Zyngier
When trapping accesses from a NV guest that tries to access SPSR_{irq,abt,und,fiq}, make sure we handle them as RAZ/WI, as if AArch32 wasn't implemented. This involves a bit of repainting to make the visibility handler more generic. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregsMarc Zyngier
DBGVCR32_EL2, DACR32_EL2, IFSR32_EL2 and FPEXC32_EL2 are required to UNDEF when AArch32 isn't implemented, which is definitely the case when running NV. Given that this is the only case where these registers can trap, unconditionally inject an UNDEF exception. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20231023095444.1587322-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25KVM: arm64: Refine _EL2 system register list that require trap reinjectionMiguel Luis
Implement a fine grained approach in the _EL2 sysreg range instead of the current wide cast trap. This ensures that we don't mistakenly inject the wrong exception into the guest. [maz: commit message massaging, dropped secure and AArch32 registers from the list] Fixes: d0fc0a2519a6 ("KVM: arm64: nv: Add trap forwarding for HCR_EL2") Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guestReiji Watanabe
KVM does not yet support userspace modifying PMCR_EL0.N (With the previous patch, KVM ignores what is written by userspace). Add support userspace limiting PMCR_EL0.N. Disallow userspace to set PMCR_EL0.N to a value that is greater than the host value as KVM doesn't support more event counters than what the host HW implements. Also, make this register immutable after the VM has started running. To maintain the existing expectations, instead of returning an error, KVM returns a success for these two cases. Finally, ignore writes to read-only bits that are cleared on vCPU reset, and RES{0,1} bits (including writable bits that KVM doesn't support yet), as those bits shouldn't be modified (at least with the current KVM). Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20231020214053.2144305-8-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first runRaghavendra Rao Ananta
For unimplemented counters, the registers PM{C,I}NTEN{SET,CLR} and PMOVS{SET,CLR} are expected to have the corresponding bits RAZ. Hence to ensure correct KVM's PMU emulation, mask out the RES0 bits. Defer this work to the point that userspace can no longer change the number of advertised PMCs. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231020214053.2144305-7-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}Raghavendra Rao Ananta
For unimplemented counters, the bits in PM{C,I}NTEN{SET,CLR} and PMOVS{SET,CLR} registers are expected to RAZ. To honor this, explicitly implement the {get,set}_user functions for these registers to mask out unimplemented counters for userspace reads and writes. Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20231020214053.2144305-6-rananta@google.com [Oliver: drop unnecessary locking] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMURaghavendra Rao Ananta
The number of PMU event counters is indicated in PMCR_EL0.N. For a vCPU with PMUv3 configured, the value is set to the same value as the current PE on every vCPU reset. Unless the vCPU is pinned to PEs that has the PMU associated to the guest from the initial vCPU reset, the value might be different from the PMU's PMCR_EL0.N on heterogeneous PMU systems. Fix this by setting the vCPU's PMCR_EL0.N to the PMU's PMCR_EL0.N value. Track the PMCR_EL0.N per guest, as only one PMU can be set for the guest (PMCR_EL0.N must be the same for all vCPUs of the guest), and it is convenient for updating the value. To achieve this, the patch introduces a helper, kvm_arm_pmu_get_max_counters(), that reads the maximum number of counters from the arm_pmu associated to the VM. Make the function global as upcoming patches will be interested to know the value while setting the PMCR.N of the guest from userspace. KVM does not yet support userspace modifying PMCR_EL0.N. The following patch will add support for that. Reviewed-by: Sebastian Ott <sebott@redhat.com> Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20231020214053.2144305-5-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0Reiji Watanabe
Add a helper to read a vCPU's PMCR_EL0, and use it whenever KVM reads a vCPU's PMCR_EL0. Currently, the PMCR_EL0 value is tracked per vCPU. The following patches will make (only) PMCR_EL0.N track per guest. Having the new helper will be useful to combine the PMCR_EL0.N field (tracked per guest) and the other fields (tracked per vCPU) to provide the value of PMCR_EL0. No functional change intended. Reviewed-by: Sebastian Ott <sebott@redhat.com> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231020214053.2144305-4-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handlerReiji Watanabe
Future changes to KVM's sysreg emulation will rely on having a valid PMU instance to determine the number of implemented counters (PMCR_EL0.N). This is earlier than when userspace is expected to modify the vPMU device attributes, where the default is selected today. Select the default PMU when handling KVM_ARM_VCPU_INIT such that it is available in time for sysreg emulation. Reviewed-by: Sebastian Ott <sebott@redhat.com> Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20231020214053.2144305-3-rananta@google.com [Oliver: rewrite changelog] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: Add PMU event filter bits required if EL3 is implementedOliver Upton
Suzuki noticed that KVM's PMU emulation is oblivious to the NSU and NSK event filter bits. On systems that have EL3 these bits modify the filter behavior in non-secure EL0 and EL1, respectively. Even though the kernel doesn't use these bits, it is entirely possible some other guest OS does. Additionally, it would appear that these and the M bit are required by the architecture if EL3 is implemented. Allow the EL3 event filter bits to be set if EL3 is advertised in the guest's ID register. Implement the behavior of NSU and NSK according to the pseudocode, and entirely ignore the M bit for perf event creation. Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20231019185618.3442949-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: Make PMEVTYPER<n>_EL0.NSH RES0 if EL2 isn't advertisedOliver Upton
The NSH bit, which filters event counting at EL2, is required by the architecture if an implementation has EL2. Even though KVM doesn't support nested virt yet, it makes no effort to hide the existence of EL2 from the ID registers. Userspace can, however, change the value of PFR0 to hide EL2. Align KVM's sysreg emulation with the architecture and make NSH RES0 if EL2 isn't advertised. Keep in mind the bit is ignored when constructing the backing perf event. While at it, build the event type mask using explicit field definitions instead of relying on ARMV8_PMU_EVTYPE_MASK. KVM probably should've been doing this in the first place, as it avoids changes to the aforementioned mask affecting sysreg emulation. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20231019185618.3442949-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: PMU: Introduce helpers to set the guest's PMUReiji Watanabe
Introduce new helper functions to set the guest's PMU (kvm->arch.arm_pmu) either to a default probed instance or to a caller requested one, and use it when the guest's PMU needs to be set. These helpers will make it easier for the following patches to modify the relevant code. No functional change intended. Reviewed-by: Sebastian Ott <sebott@redhat.com> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231020214053.2144305-2-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-23KVM: arm64: Move VTCR_EL2 into struct s2_mmuMarc Zyngier
We currently have a global VTCR_EL2 value for each guest, even if the guest uses NV. This implies that the guest's own S2 must fit in the host's. This is odd, for multiple reasons: - the PARange values and the number of IPA bits don't necessarily match: you can have 33 bits of IPA space, and yet you can only describe 32 or 36 bits of PARange - When userspace set the IPA space, it creates a contract with the kernel saying "this is the IPA space I'm prepared to handle". At no point does it constraint the guest's own IPA space as long as the guest doesn't try to use a [I]PA outside of the IPA space set by userspace - We don't even try to hide the value of ID_AA64MMFR0_EL1.PARange. And then there is the consequence of the above: if a guest tries to create a S2 that has for input address something that is larger than the IPA space defined by the host, we inject a fatal exception. This is no good. For all intent and purposes, a guest should be able to have the S2 it really wants, as long as the *output* address of that S2 isn't outside of the IPA space. For that, we need to have a per-s2_mmu VTCR_EL2 setting, which allows us to represent the full PARange. Move the vctr field into the s2_mmu structure, which has no impact whatsoever, except for NV. Note that once we are able to override ID_AA64MMFR0_EL1.PARange from userspace, we'll also be able to restrict the size of the shadow S2 that NV uses. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012205108.3937270-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()Oliver Upton
To date the VHE code has aggressively reloaded the stage-2 MMU context on every guest entry, despite the fact that this isn't necessary. This was probably done for consistency with the nVHE code, which needs to switch in/out the stage-2 MMU context as both the host and guest run at EL1. Hoist __load_stage2() into kvm_vcpu_load_vhe(), thus avoiding a reload on every guest entry/exit. This is likely to be beneficial to systems with one of the speculative AT errata, as there is now one fewer context synchronization event on the guest entry path. Additionally, it is possible that implementations have hitched correctness mitigations on writes to VTTBR_EL2, which are now elided on guest re-entry. Note that __tlb_switch_to_guest() is deliberately left untouched as it can be called outside the context of a running vCPU. Link: https://lore.kernel.org/r/20231018233212.2888027-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20KVM: arm64: Rename helpers for VHE vCPU load/putOliver Upton
The names for the helpers we expose to the 'generic' KVM code are a bit imprecise; we switch the EL0 + EL1 sysreg context and setup trap controls that do not need to change for every guest entry/exit. Rename + shuffle things around a bit in preparation for loading the stage-2 MMU context on vcpu_load(). Link: https://lore.kernel.org/r/20231018233212.2888027-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20KVM: arm64: Reload stage-2 for VMID change on VHEMarc Zyngier
Naturally, a change to the VMID for an MMU implies a new value for VTTBR. Reload on VMID change in anticipation of loading stage-2 on vcpu_load() instead of every guest entry. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231018233212.2888027-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20KVM: arm64: Restore the stage-2 context in VHE's __tlb_switch_to_host()Marc Zyngier
An MMU notifier could cause us to clobber the stage-2 context loaded on a CPU when we switch to another VM's context to invalidate. This isn't an issue right now as the stage-2 context gets reloaded on every guest entry, but is disastrous when moving __load_stage2() into the vcpu_load() path. Restore the previous stage-2 context on the way out of a TLB invalidation if we installed something else. Deliberately do this after TGE=1 is synchronized to keep things safe in light of the speculative AT errata. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231018233212.2888027-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host()Oliver Upton
HCR_EL2.TGE=0 is sufficient to disable stage-2 translation, so there's no need to explicitly zero VTTBR_EL2. Link: https://lore.kernel.org/r/20231018233212.2888027-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18clocksource/drivers/arm_arch_timer: limit XGene-1 workaroundAndre Przywara
The AppliedMicro XGene-1 CPU has an erratum where the timer condition would only consider TVAL, not CVAL. We currently apply a workaround when seeing the PartNum field of MIDR_EL1 being 0x000, under the assumption that this would match only the XGene-1 CPU model. However even the Ampere eMAG (aka XGene-3) uses that same part number, and only differs in the "Variant" and "Revision" fields: XGene-1's MIDR is 0x500f0000, our eMAG reports 0x503f0002. Experiments show the latter doesn't show the faulty behaviour. Increase the specificity of the check to only consider partnum 0x000 and variant 0x00, to exclude the Ampere eMAG. Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations") Reported-by: Ross Burton <ross.burton@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20231016153127.116101-1-andre.przywara@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16Merge tag 'ffa-updates-6.7' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers Arm FF-A updates for v6.7 The main addition is the initial support for the notifications and memory transaction descriptor changes added in FF-A v1.1 specification. The notification mechanism enables a requester/sender endpoint to notify a service provider/receiver endpoint about an event with non-blocking semantics. A notification is akin to the doorbell between two endpoints in a communication protocol that is based upon the doorbell/mailbox mechanism. The framework is responsible for the delivery of the notification from the ender to the receiver without blocking the sender. The receiver endpoint relies on the OS scheduler for allocation of CPU cycles to handle a notification. OS is referred as the receiver’s scheduler in the context of notifications. The framework is responsible for informing the receiver’s scheduler that the receiver must be run since it has a pending notification. The series also includes support for the new format of memory transaction descriptors introduced in v1.1 specification. Apart from the main additions, it includes minor fixes to re-enable FF-A drivers usage of 32bit mode of messaging and kernel warning due to the missing assignment of IDR allocation ID to the FFA device. It also adds emitting 'modalias' to the base attribute of FF-A devices. * tag 'ffa-updates-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: arm_ffa: Upgrade the driver version to v1.1 firmware: arm_ffa: Update memory descriptor to support v1.1 format firmware: arm_ffa: Switch to using ffa_mem_desc_offset() accessor KVM: arm64: FFA: Remove access of endpoint memory access descriptor array firmware: arm_ffa: Simplify the computation of transmit and fragment length firmware: arm_ffa: Add notification handling mechanism firmware: arm_ffa: Add interface to send a notification to a given partition firmware: arm_ffa: Add interfaces to request notification callbacks firmware: arm_ffa: Add schedule receiver callback mechanism firmware: arm_ffa: Initial support for scheduler receiver interrupt firmware: arm_ffa: Implement the NOTIFICATION_INFO_GET interface firmware: arm_ffa: Implement the FFA_NOTIFICATION_GET interface firmware: arm_ffa: Implement the FFA_NOTIFICATION_SET interface firmware: arm_ffa: Implement the FFA_RUN interface firmware: arm_ffa: Implement the notification bind and unbind interface firmware: arm_ffa: Implement notification bitmap create and destroy interfaces firmware: arm_ffa: Update the FF-A command list with v1.1 additions firmware: arm_ffa: Emit modalias for FF-A devices firmware: arm_ffa: Allow the FF-A drivers to use 32bit mode of messaging firmware: arm_ffa: Assign the missing IDR allocation ID to the FFA device Link: https://lore.kernel.org/r/20231010124354.1620064-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_BTIMark Rutland
In system_supports_bti() we use cpus_have_const_cap() to check for ARM64_HAS_BTI, but this is not necessary and alternative_has_cap_*() or cpus_have_final_*cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. When CONFIG_ARM64_BTI_KERNEL=y, the ARM64_HAS_BTI cpucap is a strict boot cpu feature which is detected and patched early on the boot cpu. All uses guarded by CONFIG_ARM64_BTI_KERNEL happen after the boot CPU has detected ARM64_HAS_BTI and patched boot alternatives, and hence can safely use alternative_has_cap_*() or cpus_have_final_boot_cap(). Regardless of CONFIG_ARM64_BTI_KERNEL, all other uses of ARM64_HAS_BTI happen after system capabilities have been finalized and alternatives have been patched. Hence these can safely use alternative_has_cap_*) or cpus_have_final_cap(). This patch splits system_supports_bti() into system_supports_bti() and system_supports_bti_kernel(), with the former handling where the cpucap affects userspace functionality, and ther latter handling where the cpucap affects kernel functionality. The use of cpus_have_const_cap() is replaced by cpus_have_final_cap() in cpus_have_const_cap, and cpus_have_final_boot_cap() in system_supports_bti_kernel(). This will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The use of cpus_have_final_cap() and cpus_have_final_boot_cap() will make it easier to spot if code is chaanged such that these run before the ARM64_HAS_BTI cpucap is guaranteed to have been finalized. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: kvm: Use cpus_have_final_cap() explicitlyMark Rutland
Much of the arm64 KVM code uses cpus_have_const_cap() to check for cpucaps, but this is unnecessary and it would be preferable to use cpus_have_final_cap(). For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. KVM is initialized after cpucaps have been finalized and alternatives have been patched. Since commit: d86de40decaa14e6 ("arm64: cpufeature: upgrade hyp caps to final") ... use of cpus_have_const_cap() in hyp code is automatically converted to use cpus_have_final_cap(): | static __always_inline bool cpus_have_const_cap(int num) | { | if (is_hyp_code()) | return cpus_have_final_cap(num); | else if (system_capabilities_finalized()) | return __cpus_have_const_cap(num); | else | return cpus_have_cap(num); | } Thus, converting hyp code to use cpus_have_final_cap() directly will not result in any functional change. Non-hyp KVM code is also not executed until cpucaps have been finalized, and it would be preferable to extent the same treatment to this code and use cpus_have_final_cap() directly. This patch converts instances of cpus_have_const_cap() in KVM-only code over to cpus_have_final_cap(). As all of this code runs after cpucaps have been finalized, there should be no functional change as a result of this patch, but the redundant instructions generated by cpus_have_const_cap() will be removed from the non-hyp KVM code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-12KVM: arm64: timers: Correctly handle TGE flip with CNTPOFF_EL2Marc Zyngier
Contrary to common belief, HCR_EL2.TGE has a direct and immediate effect on the way the EL0 physical counter is offset. Flipping TGE from 1 to 0 while at EL2 immediately changes the way the counter compared to the CVAL limit. This means that we cannot directly save/restore the guest's view of CVAL, but that we instead must treat it as if CNTPOFF didn't exist. Only in the world switch, once we figure out that we do have CNTPOFF, can we must the offset back and forth depending on the polarity of TGE. Fixes: 2b4825a86940 ("KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer") Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-10-12KVM: arm64: POR{E0}_EL1 do not need trap handlersJoey Gouly
These will not be trapped by KVM, so don't need a handler. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012123459.2820835-3-joey.gouly@arm.com
2023-10-12KVM: arm64: Add nPIR{E0}_EL1 to HFG trapsJoey Gouly
nPIR_EL1 and nPIREO_EL1 are part of the 'reverse polarity' set of bits, set them so that we disable the traps for a guest. Unfortunately, these bits are not yet described in the ARM ARM, but only live in the XML description. Also add them to the NV FGT forwarding infrastructure. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Fixes: e930694e6145 ("KVM: arm64: Restructure FGT register switching") Cc: Oliver Upton <oliver.upton@linux.dev> [maz: add entries to the NV FGT array, commit message update] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012123459.2820835-2-joey.gouly@arm.com
2023-10-12KVM: arm64: pmu: Drop redundant check for non-NULL kvm_pmu_eventsAnshuman Khandual
There is an allocated and valid struct kvm_pmu_events for each cpu on the system via DEFINE_PER_CPU(). Hence there cannot be a NULL pointer accessed via this_cpu_ptr() in the helper kvm_get_pmu_events(). Hence non-NULL check for pmu in such places are redundant and can be dropped. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: kvmarm@lists.linux.dev Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012064617.897346-1-anshuman.khandual@arm.com
2023-10-09KVM: arm64: Expose MOPS instructions to guestsKristina Martsenko
Expose the Armv8.8 FEAT_MOPS feature to guests in the ID register and allow the MOPS instructions to be run in a guest. Only expose MOPS if the whole system supports it. Note, it is expected that guests do not use these instructions on MMIO, similarly to other instructions where ESR_EL2.ISV==0 such as LDP/STP. Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230922112508.1774352-3-kristina.martsenko@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-09KVM: arm64: Add handler for MOPS exceptionsKristina Martsenko
An Armv8.8 FEAT_MOPS main or epilogue instruction will take an exception if executed on a CPU with a different MOPS implementation option (A or B) than the CPU where the preceding prologue instruction ran. In this case the OS exception handler is expected to reset the registers and restart execution from the prologue instruction. A KVM guest may use the instructions at EL1 at times when the guest is not able to handle the exception, expecting that the instructions will only run on one CPU (e.g. when running UEFI boot services in the guest). As KVM may reschedule the guest between different types of CPUs at any time (on an asymmetric system), it needs to also handle the resulting exception itself in case the guest is not able to. A similar situation will also occur in the future when live migrating a guest from one type of CPU to another. Add handling for the MOPS exception to KVM. The handling can be shared with the EL0 exception handler, as the logic and register layouts are the same. The exception can be handled right after exiting a guest, which avoids the cost of returning to the host exit handler. Similarly to the EL0 exception handler, in case the main or epilogue instruction is being single stepped, it makes sense to finish the step before executing the prologue instruction, so advance the single step state machine. Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230922112508.1774352-2-kristina.martsenko@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-08KVM: arm64: FFA: Remove access of endpoint memory access descriptor arraySudeep Holla
FF-A v1.1 removes the fixed location of endpoint memory access descriptor array within the memory transaction descriptor structure. In preparation to remove the ep_mem_access member from the ffa_mem_region structure, provide the accessor to fetch the offset and use the same in FF-A proxy implementation. The accessor take the FF-A version as the argument from which the memory access descriptor format can be determined. v1.0 uses the old format while v1.1 onwards use the new format specified in the v1.1 specification. Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Cc: Quentin Perret <qperret@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-14-cddd3237809c@arm.com Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2023-10-05KVM: arm64: Use mtree_empty() to determine if SMCCC filter configuredOliver Upton
The smccc_filter maple tree is only populated if userspace attempted to configure it. Use the state of the maple tree to determine if the filter has been configured, eliminating the VM flag. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231004234947.207507-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-05KVM: arm64: Only insert reserved ranges when SMCCC filter is usedOliver Upton
The reserved ranges are only useful for preventing userspace from adding a rule that intersects with functions we must handle in KVM. If userspace never writes to the SMCCC filter than this is all just wasted work/memory. Insert reserved ranges on the first call to KVM_ARM_VM_SMCCC_FILTER. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231004234947.207507-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-05KVM: arm64: Add a predicate for testing if SMCCC filter is configuredOliver Upton
Eventually we can drop the VM flag, move around the existing implementation for now. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231004234947.207507-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>