summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm
AgeCommit message (Collapse)Author
2021-11-26Merge tag 'kvmarm-fixes-5.16-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.16, take #2 - Fix constant sign extension affecting TCR_EL2 and preventing running on ARMv8.7 models due to spurious bits being set - Fix use of helpers using PSTATE early on exit by always sampling it as soon as the exit takes place - Move pkvm's 32bit handling into a common helper
2021-11-24KVM: arm64: Move pkvm's special 32bit handling into a generic infrastructureMarc Zyngier
Protected KVM is trying to turn AArch32 exceptions into an illegal exception entry. Unfortunately, it does that in a way that is a bit abrupt, and too early for PSTATE to be available. Instead, move it to the fixup code, which is a more reasonable place for it. This will also be useful for the NV code. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-11-24KVM: arm64: Save PSTATE early on exitMarc Zyngier
In order to be able to use primitives such as vcpu_mode_is_32bit(), we need to synchronize the guest PSTATE. However, this is currently done deep into the bowels of the world-switch code, and we do have helpers evaluating this much earlier (__vgic_v3_perform_cpuif_access and handle_aarch32_guest, for example). Move the saving of the guest pstate into the early fixups, which cures the first issue. The second one will be addressed separately. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-11-18KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()Vitaly Kuznetsov
Generally, it doesn't make sense to return the recommended maximum number of vCPUs which exceeds the maximum possible number of vCPUs. Note: ARM64 is special as the value returned by KVM_CAP_MAX_VCPUS differs depending on whether it is a system-wide ioctl or a per-VM one. Previously, KVM_CAP_NR_VCPUS didn't have this difference and it seems preferable to keep the status quo. Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus() which is what gets returned by system-wide KVM_CAP_MAX_VCPUS. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211116163443.88707-2-vkuznets@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-12Merge tag 'kvmarm-fixes-5.16-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 5.16, take #1 - Fix the host S2 finalization by solely iterating over the memblocks instead of the whole IPA space - Tighten the return value of kvm_vcpu_preferred_target() now that 32bit support is long gone - Make sure the extraction of ESR_ELx.EC is limited to the architected bits - Comment fixups
2021-11-08KVM: arm64: Fix host stage-2 finalizationQuentin Perret
We currently walk the hypervisor stage-1 page-table towards the end of hyp init in nVHE protected mode and adjust the host page ownership attributes in its stage-2 in order to get a consistent state from both point of views. The walk is done on the entire hyp VA space, and expects to only ever find page-level mappings. While this expectation is reasonable in the half of hyp VA space that maps memory with a fixed offset (see the loop in pkvm_create_mappings_locked()), it can be incorrect in the other half where nothing prevents the usage of block mappings. For instance, on systems where memory is physically aligned at an address that happens to maps to a PMD aligned VA in the hyp_vmemmap, kvm_pgtable_hyp_map() will install block mappings when backing the hyp_vmemmap, which will later cause finalize_host_mappings() to fail. Furthermore, it should be noted that all pages backing the hyp_vmemmap are also mapped in the 'fixed offset range' of the hypervisor, which implies that finalize_host_mappings() will walk both aliases and update the host stage-2 attributes twice. The order in which this happens is unpredictable, though, since the hyp VA layout is highly dependent on the position of the idmap page, hence resulting in a fragile mess at best. In order to fix all of this, let's restrict the finalization walk to only cover memory regions in the 'fixed-offset range' of the hyp VA space and nothing else. This not only fixes a correctness issue, but will also result in a slighlty faster hyp initialization overall. Fixes: 2c50166c62ba ("KVM: arm64: Mark host bss and rodata section as shared") Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211108154636.393384-1-qperret@google.com
2021-11-08KVM: arm64: Change the return type of kvm_vcpu_preferred_target()YueHaibing
kvm_vcpu_preferred_target() always return 0 because kvm_target_cpu() never returns a negative error code. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211105011500.16280-1-yuehaibing@huawei.com
2021-11-08KVM: arm64: nvhe: Fix a non-kernel-doc commentRandy Dunlap
Do not use kernel-doc "/**" notation when the comment is not in kernel-doc format. Fixes this docs build warning: arch/arm64/kvm/hyp/nvhe/sys_regs.c:478: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Handler for protected VM restricted exceptions. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211106032529.15057-1-rdunlap@infradead.org
2021-11-08KVM: arm64: Extract ESR_ELx.EC onlyMark Rutland
Since ARMv8.0 the upper 32 bits of ESR_ELx have been RES0, and recently some of the upper bits gained a meaning and can be non-zero. For example, when FEAT_LS64 is implemented, ESR_ELx[36:32] contain ISS2, which for an ST64BV or ST64BV0 can be non-zero. This can be seen in ARM DDI 0487G.b, page D13-3145, section D13.2.37. Generally, we must not rely on RES0 bit remaining zero in future, and when extracting ESR_ELx.EC we must mask out all other bits. All C code uses the ESR_ELx_EC() macro, which masks out the irrelevant bits, and therefore no alterations are required to C code to avoid consuming irrelevant bits. In a couple of places the KVM assembly extracts ESR_ELx.EC using LSR on an X register, and so could in theory consume previously RES0 bits. In both cases this is for comparison with EC values ESR_ELx_EC_HVC32 and ESR_ELx_EC_HVC64, for which the upper bits of ESR_ELx must currently be zero, but this could change in future. This patch adjusts the KVM vectors to use UBFX rather than LSR to extract ESR_ELx.EC, ensuring these are robust to future additions to ESR_ELx. Cc: stable@vger.kernel.org Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211103110545.4613-1-mark.rutland@arm.com
2021-11-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us RISC-V: - New KVM port. x86: - New API to control TSC offset from userspace - TSC scaling for nested hypervisors on SVM - Switch masterclock protection from raw_spin_lock to seqcount - Clean up function prototypes in the page fault code and avoid repeated memslot lookups - Convey the exit reason to userspace on emulation failure - Configure time between NX page recovery iterations - Expose Predictive Store Forwarding Disable CPUID leaf - Allocate page tracking data structures lazily (if the i915 KVM-GT functionality is not compiled in) - Cleanups, fixes and optimizations for the shadow MMU code s390: - SIGP Fixes - initial preparations for lazy destroy of secure VMs - storage key improvements/fixes - Log the guest CPNC Starting from this release, KVM-PPC patches will come from Michael Ellerman's PPC tree" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) RISC-V: KVM: fix boolreturn.cocci warnings RISC-V: KVM: remove unneeded semicolon RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions RISC-V: KVM: Factor-out FP virtualization into separate sources KVM: s390: add debug statement for diag 318 CPNC data KVM: s390: pv: properly handle page flags for protected guests KVM: s390: Fix handle_sske page fault handling KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol KVM: x86: On emulation failure, convey the exit reason, etc. to userspace KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info KVM: x86: Clarify the kvm_run.emulation_failure structure layout KVM: s390: Add a routine for setting userspace CPU state KVM: s390: Simplify SIGP Set Arch handling KVM: s390: pv: avoid stalls when making pages secure KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm KVM: s390: pv: avoid double free of sida page KVM: s390: pv: add macros for UVC CC values s390/mm: optimize reset_guest_reference_bit() s390/mm: optimize set_guest_storage_key() s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present ...
2021-11-01Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "There's the usual summary below, but the highlights are support for the Armv8.6 timer extensions, KASAN support for asymmetric MTE, the ability to kexec() with the MMU enabled and a second attempt at switching to the generic pfn_valid() implementation. Summary: - Support for the Arm8.6 timer extensions, including a self-synchronising view of the system registers to elide some expensive ISB instructions. - Exception table cleanup and rework so that the fixup handlers appear correctly in backtraces. - A handful of miscellaneous changes, the main one being selection of CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK. - More mm and pgtable cleanups. - KASAN support for "asymmetric" MTE, where tag faults are reported synchronously for loads (via an exception) and asynchronously for stores (via a register). - Support for leaving the MMU enabled during kexec relocation, which significantly speeds up the operation. - Minor improvements to our perf PMU drivers. - Improvements to the compat vDSO build system, particularly when building with LLVM=1. - Preparatory work for handling some Coresight TRBE tracing errata. - Cleanup and refactoring of the SVE code to pave the way for SME support in future. - Ensure SCS pages are unpoisoned immediately prior to freeing them when KASAN is enabled for the vmalloc area. - Try moving to the generic pfn_valid() implementation again now that the DMA mapping issue from last time has been resolved. - Numerous improvements and additions to our FPSIMD and SVE selftests" [ armv8.6 timer updates were in a shared branch and already came in through -tip in the timer pull - Linus ] * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (85 commits) arm64: Select POSIX_CPU_TIMERS_TASK_WORK arm64: Document boot requirements for FEAT_SME_FA64 arm64/sve: Fix warnings when SVE is disabled arm64/sve: Add stub for sve_max_virtualisable_vl() arm64: errata: Add detection for TRBE write to out-of-range arm64: errata: Add workaround for TSB flush failures arm64: errata: Add detection for TRBE overwrite in FILL mode arm64: Add Neoverse-N2, Cortex-A710 CPU part definition selftests: arm64: Factor out utility functions for assembly FP tests arm64: vmlinux.lds.S: remove `.fixup` section arm64: extable: add load_unaligned_zeropad() handler arm64: extable: add a dedicated uaccess handler arm64: extable: add `type` and `data` fields arm64: extable: use `ex` for `exception_table_entry` arm64: extable: make fixup_exception() return bool arm64: extable: consolidate definitions arm64: gpr-num: support W registers arm64: factor out GPR numbering helpers arm64: kvm: use kvm_exception_table_entry arm64: lib: __arch_copy_to_user(): fold fixups into body ...
2021-10-31Merge tag 'kvmarm-5.16' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.16 - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us
2021-10-29Merge branch 'for-next/sve' into for-next/coreWill Deacon
* for-next/sve: arm64/sve: Fix warnings when SVE is disabled arm64/sve: Add stub for sve_max_virtualisable_vl() arm64/sve: Track vector lengths for tasks in an array arm64/sve: Explicitly load vector length when restoring SVE state arm64/sve: Put system wide vector length information into structs arm64/sve: Use accessor functions for vector lengths in thread_struct arm64/sve: Rename find_supported_vector_length() arm64/sve: Make access to FFR optional arm64/sve: Make sve_state_size() static arm64/sve: Remove sve_load_from_fpsimd_state() arm64/fp: Reindent fpsimd_save()
2021-10-21arm64: kvm: use kvm_exception_table_entryMark Rutland
In subsequent patches we'll alter `struct exception_table_entry`, adding fields that are not needed for KVM exception fixups. In preparation for this, migrate KVM to its own `struct kvm_exception_table_entry`, which is identical to the current format of `struct exception_table_entry`. Comments are updated accordingly. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211019160219.5202-5-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21arm64/sve: Explicitly load vector length when restoring SVE stateMark Brown
Currently when restoring the SVE state we supply the SVE vector length as an argument to sve_load_state() and the underlying macros. This becomes inconvenient with the addition of SME since we may need to restore any combination of SVE and SME vector lengths, and we already separately restore the vector length in the KVM code. We don't need to know the vector length during the actual register load since the SME load instructions can index into the data array for us. Refactor the interface so we explicitly set the vector length separately to restoring the SVE registers in preparation for adding SME support, no functional change should be involved. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211019172247.3045838-9-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21arm64/sve: Put system wide vector length information into structsMark Brown
With the introduction of SME we will have a second vector length in the system, enumerated and configured in a very similar fashion to the existing SVE vector length. While there are a few differences in how things are handled this is a relatively small portion of the overall code so in order to avoid code duplication we factor out We create two structs, one vl_info for the static hardware properties and one vl_config for the runtime configuration, with an array instantiated for each and update all the users to reference these. Some accessor functions are provided where helpful for readability, and the write to set the vector length is put into a function since the system register being updated needs to be chosen at compile time. This is a mostly mechanical replacement, further work will be required to actually make things generic, ensuring that we handle those places where there are differences properly. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211019172247.3045838-8-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21arm64/sve: Make access to FFR optionalMark Brown
SME introduces streaming SVE mode in which FFR is not present and the instructions for accessing it UNDEF. In preparation for handling this update the low level SVE state access functions to take a flag specifying if FFR should be handled. When saving the register state we store a zero for FFR to guard against uninitialized data being read. No behaviour change should be introduced by this patch. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211019172247.3045838-5-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2021-10-18Merge commit 'kvm-pagedata-alloc-fixes' into HEADPaolo Bonzini
2021-10-18Merge branch kvm-arm64/pkvm/fixed-features into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm/fixed-features: (22 commits) : . : Add the pKVM fixed feature that allows a bunch of exceptions : to either be forbidden or be easily handled at EL2. : . KVM: arm64: pkvm: Give priority to standard traps over pvm handling KVM: arm64: pkvm: Pass vpcu instead of kvm to kvm_get_exit_handler_array() KVM: arm64: pkvm: Move kvm_handle_pvm_restricted around KVM: arm64: pkvm: Consolidate include files KVM: arm64: pkvm: Preserve pending SError on exit from AArch32 KVM: arm64: pkvm: Handle GICv3 traps as required KVM: arm64: pkvm: Drop sysregs that should never be routed to the host KVM: arm64: pkvm: Drop AArch32-specific registers KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WI KVM: arm64: pkvm: Use a single function to expose all id-regs KVM: arm64: Fix early exit ptrauth handling KVM: arm64: Handle protected guests at 32 bits KVM: arm64: Trap access to pVM restricted features KVM: arm64: Move sanitized copies of CPU features KVM: arm64: Initialize trap registers for protected VMs KVM: arm64: Add handlers for protected VM System Registers KVM: arm64: Simplify masking out MTE in feature id reg KVM: arm64: Add missing field descriptor for MDCR_EL2 KVM: arm64: Pass struct kvm to per-EC handlers KVM: arm64: Move early handlers to per-EC handlers ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-18KVM: arm64: pkvm: Give priority to standard traps over pvm handlingMarc Zyngier
Checking for pvm handling first means that we cannot handle ptrauth traps or apply any of the workarounds (GICv3 or TX2 #219). Flip the order around. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-12-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Pass vpcu instead of kvm to kvm_get_exit_handler_array()Marc Zyngier
Passing a VM pointer around is odd, and results in extra work on VHE. Follow the rest of the design that uses the vcpu instead, and let the nVHE code look into the struct kvm as required. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-11-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Move kvm_handle_pvm_restricted aroundMarc Zyngier
Place kvm_handle_pvm_restricted() next to its little friends such as kvm_handle_pvm_sysreg(). This allows to make inject_undef64() static. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-10-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Consolidate include filesMarc Zyngier
kvm_fixed_config.h is pkvm specific, and would be better placed near its users. At the same time, include/nvhe/sys_regs.h is now almost empty. Merge the two into arch/arm64/kvm/hyp/include/nvhe/fixed_config.h. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-9-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Preserve pending SError on exit from AArch32Marc Zyngier
Don't drop a potential SError when a guest gets caught red-handed running AArch32 code. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-8-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Handle GICv3 traps as requiredMarc Zyngier
Forward accesses to the ICV_*SGI*_EL1 registers to EL1, and emulate ICV_SRE_EL1 by returning a fixed value. This should be enough to support GICv3 in a protected guest. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-7-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Drop sysregs that should never be routed to the hostMarc Zyngier
A bunch of system registers (most of them MM related) should never trap to the host under any circumstance. Keep them close to our chest. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-6-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Drop AArch32-specific registersMarc Zyngier
All the SYS_*32_EL2 registers are AArch32-specific. Since we forbid AArch32, there is no need to handle those in any way. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-5-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WIMarc Zyngier
The ERR*/ERX* registers should be handled as RAZ/WI, and there should be no need to involve EL1 for that. Add a helper that handles such registers, and repaint the sysreg table to declare these registers as RAZ/WI. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-4-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Use a single function to expose all id-regsMarc Zyngier
Rather than exposing a whole set of helper functions to retrieve individual ID registers, use the existing decoding tree and expose a single helper instead. This allow a number of functions to be made static, and we now have a single entry point to maintain. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-3-maz@kernel.org
2021-10-18KVM: arm64: Fix early exit ptrauth handlingMarc Zyngier
The previous rework of the early exit code to provide an EC-based decoding tree missed the fact that we have two trap paths for ptrauth: the instructions (EC_PAC) and the sysregs (EC_SYS64). Rework the handlers to call the ptrauth handling code on both paths. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-2-maz@kernel.org
2021-10-17Merge branch kvm-arm64/memory-accounting into kvmarm-master/nextMarc Zyngier
* kvm-arm64/memory-accounting: : . : Sprinkle a bunch of GFP_KERNEL_ACCOUNT all over the code base : to better track memory allocation made on behalf of a VM. : . KVM: arm64: Add memcg accounting to KVM allocations KVM: arm64: vgic: Add memcg accounting to vgic allocations Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-17KVM: arm64: Add memcg accounting to KVM allocationsJia He
Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM allocations"), it would be better to make arm64 KVM consistent with common kvm codes. The memory allocations of VM scope should be charged into VM process cgroup, hence change GFP_KERNEL to GFP_KERNEL_ACCOUNT. There remain a few cases since these allocations are global, not in VM scope. Signed-off-by: Jia He <justin.he@arm.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210907123112.10232-3-justin.he@arm.com
2021-10-17KVM: arm64: vgic: Add memcg accounting to vgic allocationsJia He
Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM allocations"), it would be better to make arm64 vgic consistent with common kvm codes. The memory allocations of VM scope should be charged into VM process cgroup, hence change GFP_KERNEL to GFP_KERNEL_ACCOUNT. There remain a few cases since these allocations are global, not in VM scope. Signed-off-by: Jia He <justin.he@arm.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210907123112.10232-2-justin.he@arm.com
2021-10-17Merge branch kvm-arm64/vgic-fixes-5.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/vgic-fixes-5.16: : . : Multiple updates to the GICv3 emulation in order to better support : the dreadful Apple M1 that only implements half of it, and in a : broken way... : . KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocode KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEIS KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3 Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-17KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocodeMarc Zyngier
Having realised that a virtual LPI does transition through an active state that does not exist on bare metal, align the CPU interface emulation with the behaviour specified in the architecture pseudocode. The LPIs now transition to active on IAR read, and to inactive on EOI write. Special care is taken not to increment the EOIcount for an LPI that isn't present in the LRs. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010150910.2911495-6-maz@kernel.org
2021-10-17KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEISMarc Zyngier
Since we are trapping all sysreg accesses when ICH_VTR_EL2.SEIS is set, and that we never deliver an SError when emulating any of the GICv3 sysregs, don't advertise ICC_CTLR_EL1.SEIS. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010150910.2911495-5-maz@kernel.org
2021-10-17KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possibleMarc Zyngier
On systems that advertise ICH_VTR_EL2.SEIS, we trap all GICv3 sysreg accesses from the guest. From a performance perspective, this is OK as long as the guest doesn't hammer the GICv3 CPU interface. In most cases, this is fine, unless the guest actively uses priorities and switches PMR_EL1 very often. Which is exactly what happens when a Linux guest runs with irqchip.gicv3_pseudo_nmi=1. In these condition, the performance plumets as we hit PMR each time we mask/unmask interrupts. Not good. There is however an opportunity for improvement. Careful reading of the architecture specification indicates that the only GICv3 sysreg belonging to the common group (which contains the SGI registers, PMR, DIR, CTLR and RPR) that is allowed to generate a SError is DIR. Everything else is safe. It is thus possible to substitute the trapping of all the common group with just that of DIR if it supported by the implementation. Yes, that's yet another optional bit of the architecture. So let's just do that, as it leads to some impressive result on the M1: Without this change: bash-5.1# /host/home/maz/hackbench 100 process 1000 Running with 100*40 (== 4000) tasks. Time: 56.596 With this change: bash-5.1# /host/home/maz/hackbench 100 process 1000 Running with 100*40 (== 4000) tasks. Time: 8.649 which is a pretty convincing result. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20211010150910.2911495-4-maz@kernel.org
2021-10-17KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrorsMarc Zyngier
The infamous M1 has a feature nobody else ever implemented, in the form of the "GIC locally generated SError interrupts", also known as SEIS for short. These SErrors are generated when a guest does something that violates the GIC state machine. It would have been simpler to just *ignore* the damned thing, but that's not what this HW does. Oh well. This part of of the architecture is also amazingly under-specified. There is a whole 10 lines that describe the feature in a spec that is 930 pages long, and some of these lines are factually wrong. Oh, and it is deprecated, so the insentive to clarify it is low. Now, the spec says that this should be a *virtual* SError when HCR_EL2.AMO is set. As it turns out, that's not always the case on this CPU, and the SError sometimes fires on the host as a physical SError. Goodbye, cruel world. This clearly is a HW bug, and it means that a guest can easily take the host down, on demand. Thankfully, we have seen systems that were just as broken in the past, and we have the perfect vaccine for it. Apple M1, please meet the Cavium ThunderX workaround. All your GIC accesses will be trapped, sanitised, and emulated. Only the signalling aspect of the HW will be used. It won't be super speedy, but it will at least be safe. You're most welcome. Given that this has only ever been seen on this single implementation, that the spec is unclear at best and that we cannot trust it to ever be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS being set. Tested-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010150910.2911495-3-maz@kernel.org
2021-10-17KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3Marc Zyngier
Until now, we always let ID_AA64PFR0_EL1.GIC reflect the value visible on the host, even if we were running a GICv2-enabled VM on a GICv3+compat host. That's fine, but we also now have the case of a host that does not expose ID_AA64PFR0_EL1.GIC==1 despite having a vGIC. Yes, this is confusing. Thank you M1. Let's go back to first principles and expose ID_AA64PFR0_EL1.GIC=1 when a GICv3 is exposed to the guest. This also hides a GICv4.1 CPU interface from the guest which has no business knowing about the v4.1 extension. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010150910.2911495-2-maz@kernel.org
2021-10-11KVM: arm64: Handle protected guests at 32 bitsFuad Tabba
Protected KVM does not support protected AArch32 guests. However, it is possible for the guest to force run AArch32, potentially causing problems. Add an extra check so that if the hypervisor catches the guest doing that, it can prevent the guest from running again by resetting vcpu->arch.target and returning ARM_EXCEPTION_IL. If this were to happen, The VMM can try and fix it by re- initializing the vcpu with KVM_ARM_VCPU_INIT, however, this is likely not possible for protected VMs. Adapted from commit 22f553842b14 ("KVM: arm64: Handle Asymmetric AArch32 systems") Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-12-tabba@google.com
2021-10-11KVM: arm64: Trap access to pVM restricted featuresFuad Tabba
Trap accesses to restricted features for VMs running in protected mode. Access to feature registers are emulated, and only supported features are exposed to protected VMs. Accesses to restricted registers as well as restricted instructions are trapped, and an undefined exception is injected into the protected guests, i.e., with EC = 0x0 (unknown reason). This EC is the one used, according to the Arm Architecture Reference Manual, for unallocated or undefined system registers or instructions. Only affects the functionality of protected VMs. Otherwise, should not affect non-protected VMs when KVM is running in protected mode. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-11-tabba@google.com
2021-10-11KVM: arm64: Move sanitized copies of CPU featuresFuad Tabba
Move the sanitized copies of the CPU feature registers to the recently created sys_regs.c. This consolidates all copies in a more relevant file. No functional change intended. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-10-tabba@google.com
2021-10-11KVM: arm64: Initialize trap registers for protected VMsFuad Tabba
Protected VMs have more restricted features that need to be trapped. Moreover, the host should not be trusted to set the appropriate trapping registers and their values. Initialize the trapping registers, i.e., hcr_el2, mdcr_el2, and cptr_el2 at EL2 for protected guests, based on the values of the guest's feature id registers. No functional change intended as trap handlers introduced in the previous patch are still not hooked in to the guest exit handlers. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-9-tabba@google.com
2021-10-11KVM: arm64: Add handlers for protected VM System RegistersFuad Tabba
Add system register handlers for protected VMs. These cover Sys64 registers (including feature id registers), and debug. No functional change intended as these are not hooked in yet to the guest exit handlers introduced earlier. So when trapping is triggered, the exit handlers let the host handle it, as before. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-8-tabba@google.com
2021-10-11KVM: arm64: Simplify masking out MTE in feature id regFuad Tabba
Simplify code for hiding MTE support in feature id register when MTE is not enabled/supported by KVM. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-7-tabba@google.com
2021-10-11KVM: arm64: Pass struct kvm to per-EC handlersFuad Tabba
We need struct kvm to check for protected VMs to be able to pick the right handlers for them in subsequent patches. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-5-tabba@google.com
2021-10-11KVM: arm64: Move early handlers to per-EC handlersMarc Zyngier
Simplify the early exception handling by slicing the gigantic decoding tree into a more manageable set of functions, similar to what we have in handle_exit.c. This will also make the structure reusable for pKVM's own early exit handling. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211010145636.1950948-4-tabba@google.com
2021-10-11KVM: arm64: Don't include switch.h into nvhe/kvm-main.cMarc Zyngier
hyp-main.c includes switch.h while it only requires adjust-pc.h. Fix it to remove an unnecessary dependency. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211010145636.1950948-3-tabba@google.com
2021-10-11KVM: arm64: Move __get_fault_info() and co into their own include fileMarc Zyngier
In order to avoid including the whole of the switching helpers in unrelated files, move the __get_fault_info() and related helpers into their own include file. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211010145636.1950948-2-tabba@google.com
2021-10-11Merge branch kvm-arm64/raz-sysregs into kvmarm-master/nextMarc Zyngier
* kvm-arm64/raz-sysregs: : . : Simplify the handling of RAZ register, removing pointless indirections. : . KVM: arm64: Replace get_raz_id_reg() with get_raz_reg() KVM: arm64: Use get_raz_reg() for userspace reads of PMSWINC_EL0 KVM: arm64: Return early from read_id_reg() if register is RAZ Signed-off-by: Marc Zyngier <maz@kernel.org>