summaryrefslogtreecommitdiff
path: root/arch/arm64/include/asm/kvm_emulate.h
AgeCommit message (Collapse)Author
2024-05-03Merge branch kvm-arm64/pkvm-6.10 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm-6.10: (25 commits) : . : At last, a bunch of pKVM patches, courtesy of Fuad Tabba. : From the cover letter: : : "This series is a bit of a bombay-mix of patches we've been : carrying. There's no one overarching theme, but they do improve : the code by fixing existing bugs in pKVM, refactoring code to : make it more readable and easier to re-use for pKVM, or adding : functionality to the existing pKVM code upstream." : . KVM: arm64: Force injection of a data abort on NISV MMIO exit KVM: arm64: Restrict supported capabilities for protected VMs KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap() KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst KVM: arm64: Rename firmware pseudo-register documentation file KVM: arm64: Reformat/beautify PTP hypercall documentation KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit KVM: arm64: Introduce and use predicates that check for protected VMs KVM: arm64: Add is_pkvm_initialized() helper KVM: arm64: Simplify vgic-v3 hypercalls KVM: arm64: Move setting the page as dirty out of the critical section KVM: arm64: Change kvm_handle_mmio_return() return polarity KVM: arm64: Fix comment for __pkvm_vcpu_init_traps() KVM: arm64: Prevent kmemleak from accessing .hyp.data KVM: arm64: Do not map the host fpsimd state to hyp in pKVM KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE KVM: arm64: Support TLB invalidation in guest context KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE KVM: arm64: Check for PTE validity when checking for executable/cacheable KVM: arm64: Avoid BUG-ing from the host abort path ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-03Merge branch kvm-arm64/nv-eret-pauth into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-eret-pauth: : . : Add NV support for the ERETAA/ERETAB instructions. From the cover letter: : : "Although the current upstream NV support has *some* support for : correctly emulating ERET, that support is only partial as it doesn't : support the ERETAA and ERETAB variants. : : Supporting these instructions was cast aside for a long time as it : involves implementing some form of PAuth emulation, something I wasn't : overly keen on. But I have reached a point where enough of the : infrastructure is there that it actually makes sense. So here it is!" : . KVM: arm64: nv: Work around lack of pauth support in old toolchains KVM: arm64: Drop trapping of PAuth instructions/keys KVM: arm64: nv: Advertise support for PAuth KVM: arm64: nv: Handle ERETA[AB] instructions KVM: arm64: nv: Add emulation for ERETAx instructions KVM: arm64: nv: Add kvm_has_pauth() helper KVM: arm64: nv: Reinject PAC exceptions caused by HCR_EL2.API==0 KVM: arm64: nv: Handle HCR_EL2.{API,APK} independently KVM: arm64: nv: Honor HFGITR_EL2.ERET being set KVM: arm64: nv: Fast-track 'InHost' exception returns KVM: arm64: nv: Add trap forwarding for ERET and SMC KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2 KVM: arm64: nv: Drop VCPU_HYP_CONTEXT flag KVM: arm64: Constraint PAuth support to consistent implementations KVM: arm64: Add helpers for ESR_ELx_ERET_ISS_ERET* KVM: arm64: Harden __ctxt_sys_reg() against out-of-range values Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01KVM: arm64: Refactor checks for FP state ownershipFuad Tabba
To avoid direct comparison against the fp_owner enum, add a new function that performs the check, host_owns_fp_regs(), to complement the existing guest_owns_fp_regs(). To check for fpsimd state ownership, use the helpers instead of directly using the enums. No functional change intended. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-04-20KVM: arm64: Drop trapping of PAuth instructions/keysMarc Zyngier
We currently insist on disabling PAuth on vcpu_load(), and get to enable it on first guest use of an instruction or a key (ignoring the NV case for now). It isn't clear at all what this is trying to achieve: guests tend to use PAuth when available, and nothing forces you to expose it to the guest if you don't want to. This also isn't totally free: we take a full GPR save/restore between host and guest, only to write ten 64bit registers. The "value proposition" escapes me. So let's forget this stuff and enable PAuth eagerly if exposed to the guest. This results in much simpler code. Performance wise, that's not bad either (tested on M2 Pro running a fully automated Debian installer as the workload): - On a non-NV guest, I can see reduction of 0.24% in the number of cycles (measured with perf over 10 consecutive runs) - On a NV guest (L2), I see a 2% reduction in wall-clock time (measured with 'time', as M2 doesn't have a PMUv3 and NV doesn't support it either) So overall, a much reduced complexity and a (small) performance improvement. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240419102935.1935571-16-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-04-20KVM: arm64: nv: Handle HCR_EL2.{API,APK} independentlyMarc Zyngier
Although KVM couples API and APK for simplicity, the architecture makes no such requirement, and the two can be independently set or cleared. Check for which of the two possible reasons we have trapped here, and if the corresponding L1 control bit isn't set, delegate the handling for forwarding. Otherwise, set this exact bit in HCR_EL2 and resume the guest. Of course, in the non-NV case, we keep setting both bits and be done with it. Note that the entry core already saves/restores the keys should any of the two control bits be set. This results in a bit of rework, and the removal of the (trivial) vcpu_ptrauth_enable() helper. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240419102935.1935571-10-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-04-12KVM: arm64: Exclude FP ownership from kvm_vcpu_archMarc Zyngier
In retrospect, it is fairly obvious that the FP state ownership is only meaningful for a given CPU, and that locating this information in the vcpu was just a mistake. Move the ownership tracking into the host data structure, and rename it from fp_state to fp_owner, which is a better description (name suggested by Mark Brown). Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "S390: - Changes to FPU handling came in via the main s390 pull request - Only deliver to the guest the SCLP events that userspace has requested - More virtual vs physical address fixes (only a cleanup since virtual and physical address spaces are currently the same) - Fix selftests undefined behavior x86: - Fix a restriction that the guest can't program a PMU event whose encoding matches an architectural event that isn't included in the guest CPUID. The enumeration of an architectural event only says that if a CPU supports an architectural event, then the event can be programmed *using the architectural encoding*. The enumeration does NOT say anything about the encoding when the CPU doesn't report support the event *in general*. It might support it, and it might support it using the same encoding that made it into the architectural PMU spec - Fix a variety of bugs in KVM's emulation of RDPMC (more details on individual commits) and add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID and therefore are easier to validate with selftests than with custom guests (aka kvm-unit-tests) - Zero out PMU state on AMD if the virtual PMU is disabled, it does not cause any bug but it wastes time in various cases where KVM would check if a PMC event needs to be synthesized - Optimize triggering of emulated events, with a nice ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit - Fix a bug where KVM would report stale/bogus exit qualification information when exiting to userspace with an internal error exit code - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support - Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for read, e.g. to avoid serializing vCPUs when userspace deletes a memslot - Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB). KVM doesn't support yielding in the middle of processing a zap, and 1GiB granularity resulted in multi-millisecond lags that are quite impolite for CONFIG_PREEMPT kernels - Allocate write-tracking metadata on-demand to avoid the memory overhead when a kernel is built with i915 virtualization support but the workloads use neither shadow paging nor i915 virtualization - Explicitly initialize a variety of on-stack variables in the emulator that triggered KMSAN false positives - Fix the debugregs ABI for 32-bit KVM - Rework the "force immediate exit" code so that vendor code ultimately decides how and when to force the exit, which allowed some optimization for both Intel and AMD - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if vCPU creation ultimately failed, causing extra unnecessary work - Cleanup the logic for checking if the currently loaded vCPU is in-kernel - Harden against underflowing the active mmu_notifier invalidation count, so that "bad" invalidations (usually due to bugs elsehwere in the kernel) are detected earlier and are less likely to hang the kernel x86 Xen emulation: - Overlay pages can now be cached based on host virtual address, instead of guest physical addresses. This removes the need to reconfigure and invalidate the cache if the guest changes the gpa but the underlying host virtual address remains the same - When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation - Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior) - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs RISC-V: - Support exception and interrupt handling in selftests - New self test for RISC-V architectural timer (Sstc extension) - New extension support (Ztso, Zacas) - Support userspace emulation of random number seed CSRs ARM: - Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it - Conversion of KVM's representation of LPIs to an xarray, utilized to address serialization some of the serialization on the LPI injection path - Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register - Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests LoongArch: - Set reserved bits as zero in CPUCFG - Start SW timer only when vcpu is blocking - Do not restart SW timer when it is expired - Remove unnecessary CSR register saving during enter guest - Misc cleanups and fixes as usual Generic: - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically always true on all architectures except MIPS (where Kconfig determines the available depending on CPU capabilities). It is replaced either by an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM) everywhere else - Factor common "select" statements in common code instead of requiring each architecture to specify it - Remove thoroughly obsolete APIs from the uapi headers - Move architecture-dependent stuff to uapi/asm/kvm.h - Always flush the async page fault workqueue when a work item is being removed, especially during vCPU destruction, to ensure that there are no workers running in KVM code when all references to KVM-the-module are gone, i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded - Grab a reference to the VM's mm_struct in the async #PF worker itself instead of gifting the worker a reference, so that there's no need to remember to *conditionally* clean up after the worker Selftests: - Reduce boilerplate especially when utilize selftest TAP infrastructure - Add basic smoke tests for SEV and SEV-ES, along with a pile of library support for handling private/encrypted/protected memory - Fix benign bugs where tests neglect to close() guest_memfd files" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) selftests: kvm: remove meaningless assignments in Makefiles KVM: riscv: selftests: Add Zacas extension to get-reg-list test RISC-V: KVM: Allow Zacas extension for Guest/VM KVM: riscv: selftests: Add Ztso extension to get-reg-list test RISC-V: KVM: Allow Ztso extension for Guest/VM RISC-V: KVM: Forward SEED CSR access to user space KVM: riscv: selftests: Add sstc timer test KVM: riscv: selftests: Change vcpu_has_ext to a common function KVM: riscv: selftests: Add guest helper to get vcpu id KVM: riscv: selftests: Add exception handling support LoongArch: KVM: Remove unnecessary CSR register saving during enter guest LoongArch: KVM: Do not restart SW timer when it is expired LoongArch: KVM: Start SW timer only when vcpu is blocking LoongArch: KVM: Set reserved bits as zero in CPUCFG KVM: selftests: Explicitly close guest_memfd files in some gmem tests KVM: x86/xen: fix recursive deadlock in timer injection KVM: pfncache: simplify locking and make more self-contained KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled KVM: x86/xen: improve accuracy of Xen timers ...
2024-02-16arm64: Add ESR decoding for exceptions involving translation level -1Ard Biesheuvel
The LPA2 feature introduces new FSC values to report abort exceptions related to translation level -1. Define these and wire them up. Reuse the new ESR FSC classification helpers that arrived via the KVM arm64 tree, and update the one for translation faults to check specifically for a translation fault at level -1. (Access flag or permission faults cannot occur at level -1 because they alway involve a descriptor at the superior level so changing those helpers is not needed). Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-73-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-08KVM: arm64: Force guest's HCR_EL2.E2H RES1 when NV1 is not implementedMarc Zyngier
If NV1 isn't supported on a system, make sure we always evaluate the guest's HCR_EL2.E2H as RES1, irrespective of what the guest may have written there. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240122181344.258974-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-12-19Merge branch kvm-arm64/nv-6.8-prefix into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-6.8-prefix: : . : Nested Virtualization support update, focussing on the : NV2 support (VNCR mapping and such). : . KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() KVM: arm64: nv: Map VNCR-capable registers to a separate page KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers KVM: arm64: Introduce a bad_trap() primitive for unexpected trap handling KVM: arm64: nv: Add include containing the VNCR_EL2 offsets KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR KVM: arm64: nv: Compute NV view of idregs as a one-off KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt() arm64: cpufeatures: Restrict NV support to FEAT_NV2 Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-19KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt()Marc Zyngier
A rather common idiom when writing NV code as part of KVM is to have things such has: if (vcpu_has_nv(vcpu) && is_hyp_ctxt(vcpu)) { [...] } to check that we are in a hyp-related context. The second part of the conjunction would be enough, but the first one contains a static key that allows the rest of the checkis to be elided when in a non-NV environment. Rewrite is_hyp_ctxt() to directly use vcpu_has_nv(). The result is the same, and the code easier to read. The one occurence of this that is already merged is rewritten in the process. In order to avoid nasty cirtular dependencies between kvm_emulate.h and kvm_nested.h, vcpu_has_feature() is itself hoisted into kvm_host.h, at the cost of some #deferry... Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-11-30KVM: arm64: Use helpers to classify exception types reported via ESRArd Biesheuvel
Currently, we rely on the fact that exceptions can be trivially classified by applying a mask/value pair to the syndrome value reported via the ESR register, but this will no longer be true once we enable support for 5 level paging. So introduce a couple of helpers that encapsulate this mask/value pair matching, and wire them up in the code. No functional change intended, the actual handling of translation level -1 will be added in a subsequent patch. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> [maz: folded in changes suggested by Mark] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231128140400.3132145-2-ardb@google.com
2023-11-27KVM: arm64: Support up to 5 levels of translation in kvm_pgtableRyan Roberts
FEAT_LPA2 increases the maximum levels of translation from 4 to 5 for the 4KB page case, when IA is >48 bits. While we can still use 4 levels for stage2 translation in this case (due to stage2 allowing concatenated page tables for first level lookup), the same kvm_pgtable library is used for the hyp stage1 page tables and stage1 does not support concatenation. Therefore, modify the library to support up to 5 levels. Previous patches already laid the groundwork for this by refactoring code to work in terms of KVM_PGTABLE_FIRST_LEVEL and KVM_PGTABLE_LAST_LEVEL. So we just need to change these macros. The hardware sometimes encodes the new level differently from the others: One such place is when reading the level from the FSC field in the ESR_EL2 register. We never expect to see the lowest level (-1) here since the stage 2 page tables always use concatenated tables for first level lookup and therefore only use 4 levels of lookup. So we get away with just adding a comment to explain why we are not being careful about decoding level -1. For stage2 VTCR_EL2.SL2 is introduced to encode the new start level. However, since we always use concatenated page tables for first level look up at stage2 (and therefore we will never need the new extra level) we never touch this new field. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231127111737.1897081-10-ryan.roberts@arm.com
2023-11-27KVM: arm64: Convert translation level parameter to s8Ryan Roberts
With the introduction of FEAT_LPA2, the Arm ARM adds a new level of translation, level -1, so levels can now be in the range [-1;3]. 3 is always the last level and the first level is determined based on the number of VA bits in use. Convert level variables to use a signed type in preparation for supporting this new level -1. Since the last level is always anchored at 3, and the first level varies to suit the number of VA/IPA bits, take the opportunity to replace KVM_PGTABLE_MAX_LEVELS with the 2 macros KVM_PGTABLE_FIRST_LEVEL and KVM_PGTABLE_LAST_LEVEL. This removes the assumption from the code that levels run from 0 to KVM_PGTABLE_MAX_LEVELS - 1, which will soon no longer be true. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231127111737.1897081-9-ryan.roberts@arm.com
2023-11-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes LoongArch: - New architecture for kvm. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: - Support for the Smstateen and Zicond extensions - Support for virtualizing senvcfg - Support for virtualized SBI debug console (DBCN) S390: - Nested page table management can be monitored through tracepoints and statistics x86: - Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ - Avoid WARN on systems with Intel IPI virtualization - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. - Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. - Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. - Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. - Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. - Optimize injection of PMU interrupts that are simultaneous with NMIs. - Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: - Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. - Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. - Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: - Various updates and fixes, mostly for x86 - MTRR and PAT fixes and optimizations" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits) KVM: selftests: Avoid using forced target for generating arm64 headers tools headers arm64: Fix references to top srcdir in Makefile KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 KVM: arm64: selftest: Perform ISB before reading PAR_EL1 KVM: arm64: selftest: Add the missing .guest_prepare() KVM: arm64: Always invalidate TLB for stage-2 permission faults KVM: x86: Service NMI requests after PMI requests in VM-Enter path KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs KVM: arm64: Refine _EL2 system register list that require trap reinjection arm64: Add missing _EL2 encodings arm64: Add missing _EL12 encodings KVM: selftests: aarch64: vPMU test for validating user accesses KVM: selftests: aarch64: vPMU register test for unimplemented counters KVM: selftests: aarch64: vPMU register test for implemented counters KVM: selftests: aarch64: Introduce vpmu_counter_access test tools: Import arm_pmuv3.h KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} ...
2023-10-30Merge branch kvm-arm64/sgi-injection into kvmarm/nextOliver Upton
* kvm-arm64/sgi-injection: : vSGI injection improvements + fixes, courtesy Marc Zyngier : : Avoid linearly searching for vSGI targets using a compressed MPIDR to : index a cache. While at it, fix some egregious bugs in KVM's mishandling : of vcpuid (user-controlled value) and vcpu_idx. KVM: arm64: Clarify the ordering requirements for vcpu/RD creation KVM: arm64: vgic-v3: Optimize affinity-based SGI injection KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available KVM: arm64: Build MPIDR to vcpu index cache at runtime KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff() KVM: arm64: Use vcpu_idx for invalidation tracking KVM: arm64: vgic: Use vcpu_idx for the debug information KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id KVM: arm64: vgic-v3: Refactor GICv3 SGI generation KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-16arm64: kvm: Use cpus_have_final_cap() explicitlyMark Rutland
Much of the arm64 KVM code uses cpus_have_const_cap() to check for cpucaps, but this is unnecessary and it would be preferable to use cpus_have_final_cap(). For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. KVM is initialized after cpucaps have been finalized and alternatives have been patched. Since commit: d86de40decaa14e6 ("arm64: cpufeature: upgrade hyp caps to final") ... use of cpus_have_const_cap() in hyp code is automatically converted to use cpus_have_final_cap(): | static __always_inline bool cpus_have_const_cap(int num) | { | if (is_hyp_code()) | return cpus_have_final_cap(num); | else if (system_capabilities_finalized()) | return __cpus_have_const_cap(num); | else | return cpus_have_cap(num); | } Thus, converting hyp code to use cpus_have_final_cap() directly will not result in any functional change. Non-hyp KVM code is also not executed until cpucaps have been finalized, and it would be preferable to extent the same treatment to this code and use cpus_have_final_cap() directly. This patch converts instances of cpus_have_const_cap() in KVM-only code over to cpus_have_final_cap(). As all of this code runs after cpucaps have been finalized, there should be no functional change as a result of this patch, but the redundant instructions generated by cpus_have_const_cap() will be removed from the non-hyp KVM code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-09-30KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff()Marc Zyngier
By definition, MPIDR_EL1 cannot be modified by the guest. This means it is pointless to check whether this is loaded on the CPU. Simplify the kvm_vcpu_get_mpidr_aff() helper to directly access the in-memory value. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-8-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-09-21KVM: arm64: Get rid of vCPU-scoped feature bitmapOliver Upton
The vCPU-scoped feature bitmap was left in place a couple of releases ago in case the change to VM-scoped vCPU features broke anyone. Nobody has complained and the interop between VM and vCPU bitmaps is pretty gross. Throw it out. Link: https://lore.kernel.org/r/20230920195036.1169791-9-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-26KVM: arm64: Fix resetting SME trap values on reset for (h)VHEFuad Tabba
Ensure that SME traps are disabled for (h)VHE when getting the reset value for the architectural feature control register. Fixes: 75c76ab5a641 ("KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration") Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230724123829.2929609-9-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-26KVM: arm64: Fix resetting SVE trap values on reset for hVHEFuad Tabba
Ensure that SVE traps are disabled for hVHE, if the FPSIMD state isn't owned by the guest, when getting the reset value for the architectural feature control register. Fixes: 75c76ab5a641 ("KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration") Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230724123829.2929609-8-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-26KVM: arm64: Helper to write to appropriate feature trap register based on modeFuad Tabba
Factor out the code that decides whether to write to the feature trap registers, CPTR_EL2 or CPACR_EL1, based on the KVM mode, i.e., (h)VHE or nVHE. This function will be used in the subsequent patch. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230724123829.2929609-6-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15Merge branch kvm-arm64/misc into kvmarm/nextOliver Upton
* kvm-arm64/misc: : Miscellaneous updates : : - Avoid trapping CTR_EL0 on systems with FEAT_EVT, as the register is : commonly read by userspace : : - Make use of FEAT_BTI at hyp stage-1, setting the Guard Page bit to 1 : for executable mappings : : - Use a separate set of pointer authentication keys for the hypervisor : when running in protected mode (i.e. pKVM) : : - Plug a few holes in timer initialization where KVM fails to free the : timer IRQ(s) KVM: arm64: Use different pointer authentication keys for pKVM KVM: arm64: timers: Fix resource leaks in kvm_timer_hyp_init() KVM: arm64: Use BTI for nvhe KVM: arm64: Relax trapping of CTR_EL0 when FEAT_EVT is available Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15Merge branch kvm-arm64/configurable-id-regs into kvmarm/nextOliver Upton
* kvm-arm64/configurable-id-regs: : Configurable ID register infrastructure, courtesy of Jing Zhang : : Create generalized infrastructure for allowing userspace to select the : supported feature set for a VM, so long as the feature set is a subset : of what hardware + KVM allows. This does not add any new features that : are user-configurable, and instead focuses on the necessary refactoring : to enable future work. : : As a consequence of the series, feature asymmetry is now deliberately : disallowed for KVM. It is unlikely that VMMs ever configured VMs with : asymmetry, nor does it align with the kernel's overall stance that : features must be uniform across all cores in the system. : : Furthermore, KVM incorrectly advertised an IMP_DEF PMU to guests for : some time. Migrations from affected kernels was supported by explicitly : allowing such an ID register value from userspace, and forwarding that : along to the guest. KVM now allows an IMP_DEF PMU version to be restored : through the ID register interface, but reinterprets the user value as : not implemented (0). KVM: arm64: Rip out the vestiges of the 'old' ID register scheme KVM: arm64: Handle ID register reads using the VM-wide values KVM: arm64: Use generic sanitisation for ID_AA64PFR0_EL1 KVM: arm64: Use generic sanitisation for ID_(AA64)DFR0_EL1 KVM: arm64: Use arm64_ftr_bits to sanitise ID register writes KVM: arm64: Save ID registers' sanitized value per guest KVM: arm64: Reuse fields of sys_reg_desc for idreg KVM: arm64: Rewrite IMPDEF PMU version as NI KVM: arm64: Make vCPU feature flags consistent VM-wide KVM: arm64: Relax invariance of KVM_ARM_VCPU_POWER_OFF KVM: arm64: Separate out feature sanitisation and initialisation Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12KVM: arm64: Force HCR_E2H in guest context when ARM64_KVM_HVHE is setMarc Zyngier
Also make sure HCR_EL2.E2H is set when switching HCR_EL2 in guest context. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230609162200.2024064-16-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12KVM: arm64: Rework CPTR_EL2 programming for HVHE configurationMarc Zyngier
Just like we repainted the early arm64 code, we need to update the CPTR_EL2 accesses that are taking place in the nVHE code when hVHE is used, making them look as if they were CPACR_EL1 accesses. Just like the VHE code. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230609162200.2024064-14-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12KVM: arm64: Make vCPU feature flags consistent VM-wideOliver Upton
To date KVM has allowed userspace to construct asymmetric VMs where particular features may only be supported on a subset of vCPUs. This wasn't really the intened usage pattern, and it is a total pain in the ass to keep working in the kernel. What's more, this is at odds with CPU features in host userspace, where asymmetric features are largely hidden or disabled. It's time to put an end to the whole game. Require all vCPUs in the VM to have the same feature set, rejecting deviants in the KVM_ARM_VCPU_INIT ioctl. Preserve some of the vestiges of per-vCPU feature flags in case we need to reinstate the old behavior for some limited configurations. Yes, this is a sign of cowardice around a user-visibile change. Hoist all of the 32-bit limitations into kvm_vcpu_init_check_features() to avoid nested attempts to acquire the config_lock, which won't end well. Link: https://lore.kernel.org/r/20230609190054.1542113-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-05-21KVM: arm64: Relax trapping of CTR_EL0 when FEAT_EVT is availableMarc Zyngier
CTR_EL0 can often be used in userspace, and it would be nice if KVM didn't have to emulate it unnecessarily. While it isn't possible to trap the cache configuration registers independently from CTR_EL0 in the base ARMv8.0 architecture, FEAT_EVT allows these cache configuration registers (CCSIDR_EL1, CCSIDR2_EL1, CLIDR_EL1 and CSSELR_EL1) to be trapped independently by setting HCR_EL2.TID4. Switch to using TID4 instead of TID2 in the cases where FEAT_EVT is available *and* that KVM doesn't need to sanitise CTR_EL0 to paper over mismatched cache configurations. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230515170016.965378-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-13Merge branch kvm-arm64/nv-prefix into kvmarm/nextOliver Upton
* kvm-arm64/nv-prefix: : Preamble to NV support, courtesy of Marc Zyngier. : : This brings in a set of prerequisite patches for supporting nested : virtualization in KVM/arm64. Of course, there is a long way to go until : NV is actually enabled in KVM. : : - Introduce cpucap / vCPU feature flag to pivot the NV code on : : - Add support for EL2 vCPU register state : : - Basic nested exception handling : : - Hide unsupported features from the ID registers for NV-capable VMs KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table arm64: Add ARM64_HAS_NESTED_VIRT cpufeature Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-13Merge branch kvm-arm64/virtual-cache-geometry into kvmarm/nextOliver Upton
* kvm-arm64/virtual-cache-geometry: : Virtualized cache geometry for KVM guests, courtesy of Akihiko Odaki. : : KVM/arm64 has always exposed the host cache geometry directly to the : guest, even though non-secure software should never perform CMOs by : Set/Way. This was slightly wrong, as the cache geometry was derived from : the PE on which the vCPU thread was running and not a sanitized value. : : All together this leads to issues migrating VMs on heterogeneous : systems, as the cache geometry saved/restored could be inconsistent. : : KVM/arm64 now presents 1 level of cache with 1 set and 1 way. The cache : geometry is entirely controlled by userspace, such that migrations from : older kernels continue to work. KVM: arm64: Mark some VM-scoped allocations as __GFP_ACCOUNT KVM: arm64: Normalize cache configuration KVM: arm64: Mask FEAT_CCIDX KVM: arm64: Always set HCR_TID2 arm64/cache: Move CLIDR macro definitions arm64/sysreg: Add CCSIDR2_EL1 arm64/sysreg: Convert CCSIDR_EL1 to automatic generation arm64: Allow the definition of UNKNOWN system register fields Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Support virtual EL2 exceptionsJintack Lim
Support injecting exceptions and performing exception returns to and from virtual EL2. This must be done entirely in software except when taking an exception from vEL0 to vEL2 when the virtual HCR_EL2.{E2H,TGE} == {1,1} (a VHE guest hypervisor). [maz: switch to common exception injection framework, illegal exeption return handling] Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU stateChristoffer Dall
When running a nested hypervisor we commonly have to figure out if the VCPU mode is running in the context of a guest hypervisor or guest guest, or just a normal guest. Add convenient primitives for this. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-8-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-01-12KVM: arm64: Always set HCR_TID2Akihiko Odaki
Always set HCR_TID2 to trap CTR_EL0, CCSIDR2_EL1, CLIDR_EL1, and CSSELR_EL1. This saves a few lines of code and allows to employ their access trap handlers for more purposes anticipated by the old condition for setting HCR_TID2. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Link: https://lore.kernel.org/r/20230112023852.42012-6-akihiko.odaki@daynix.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-01-03KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*Marc Zyngier
The former is an AArch32 legacy, so let's move over to the verbose (and strictly identical) version. This involves moving some of the #defines that were private to KVM into the more generic esr.h. Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-01-03KVM: arm64: Fix S1PTW handling on RO memslotsMarc Zyngier
A recent development on the EFI front has resulted in guests having their page tables baked in the firmware binary, and mapped into the IPA space as part of a read-only memslot. Not only is this legitimate, but it also results in added security, so thumbs up. It is possible to take an S1PTW translation fault if the S1 PTs are unmapped at stage-2. However, KVM unconditionally treats S1PTW as a write to correctly handle hardware AF/DB updates to the S1 PTs. Furthermore, KVM injects an exception into the guest for S1PTW writes. In the aforementioned case this results in the guest taking an abort it won't recover from, as the S1 PTs mapping the vectors suffer from the same problem. So clearly our handling is... wrong. Instead, switch to a two-pronged approach: - On S1PTW translation fault, handle the fault as a read - On S1PTW permission fault, handle the fault as a write This is of no consequence to SW that *writes* to its PTs (the write will trigger a non-S1PTW fault), and SW that uses RO PTs will not use HW-assisted AF/DB anyway, as that'd be wrong. Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch") do we end-up with two back-to-back faults (page being evicted and faulted back). I don't think this is a case worth optimising for. Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Regression-tested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
2022-06-29KVM: arm64: Warn when PENDING_EXCEPTION and INCREMENT_PC are set togetherMarc Zyngier
We really don't want PENDING_EXCEPTION and INCREMENT_PC to ever be set at the same time, as they are mutually exclusive. Add checks that will generate a warning should this ever happen. Reviewed-by: Fuad Tabba <tabba@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-10KVM: arm64: Move vcpu PC/Exception flags to the input flag setMarc Zyngier
The PC update flags (which also deal with exception injection) is one of the most complicated use of the flag we have. Make it more fool prof by: - moving it over to the new accessors and assign it to the input flag set - turn the combination of generic ELx flags with another flag indicating the target EL itself into an explicit set of flags for each EL and vector combination - add a new accessor to pend the exception This is otherwise a pretty straightformward conversion. Reviewed-by: Fuad Tabba <tabba@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "S390: - ultravisor communication device driver - fix TEID on terminating storage key ops RISC-V: - Added Sv57x4 support for G-stage page table - Added range based local HFENCE functions - Added remote HFENCE functions based on VCPU requests - Added ISA extension registers in ONE_REG interface - Updated KVM RISC-V maintainers entry to cover selftests support ARM: - Add support for the ARMv8.6 WFxT extension - Guard pages for the EL2 stacks - Trap and emulate AArch32 ID registers to hide unsupported features - Ability to select and save/restore the set of hypercalls exposed to the guest - Support for PSCI-initiated suspend in collaboration with userspace - GICv3 register-based LPI invalidation support - Move host PMU event merging into the vcpu data structure - GICv3 ITS save/restore fixes - The usual set of small-scale cleanups and fixes x86: - New ioctls to get/set TSC frequency for a whole VM - Allow userspace to opt out of hypercall patching - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr AMD SEV improvements: - Add KVM_EXIT_SHUTDOWN metadata for SEV-ES - V_TSC_AUX support Nested virtualization improvements for AMD: - Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE, nested vGIF) - Allow AVIC to co-exist with a nested guest running - Fixes for LBR virtualizations when a nested guest is running, and nested LBR virtualization support - PAUSE filtering for nested hypervisors Guest support: - Decoupling of vcpu_is_preempted from PV spinlocks" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (199 commits) KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest KVM: selftests: x86: Sync the new name of the test case to .gitignore Documentation: kvm: reorder ARM-specific section about KVM_SYSTEM_EVENT_SUSPEND x86, kvm: use correct GFP flags for preemption disabled KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer x86/kvm: Alloc dummy async #PF token outside of raw spinlock KVM: x86: avoid calling x86 emulator without a decoded instruction KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave) s390/uv_uapi: depend on CONFIG_S390 KVM: selftests: x86: Fix test failure on arch lbr capable platforms KVM: LAPIC: Trace LAPIC timer expiration on every vmentry KVM: s390: selftest: Test suppression indication on key prot exception KVM: s390: Don't indicate suppression on dirtying, failing memop selftests: drivers/s390x: Add uvdevice tests drivers/s390/char: Add Ultravisor io device MAINTAINERS: Update KVM RISC-V entry to cover selftests support RISC-V: KVM: Introduce ISA extension register RISC-V: KVM: Cleanup stale TLB entries when host CPU changes RISC-V: KVM: Add remote HFENCE functions based on VCPU requests ...
2022-05-23Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Initial support for the ARMv9 Scalable Matrix Extension (SME). SME takes the approach used for vectors in SVE and extends this to provide architectural support for matrix operations. No KVM support yet, SME is disabled in guests. - Support for crashkernel reservations above ZONE_DMA via the 'crashkernel=X,high' command line option. - btrfs search_ioctl() fix for live-lock with sub-page faults. - arm64 perf updates: support for the Hisilicon "CPA" PMU for monitoring coherent I/O traffic, support for Arm's CMN-650 and CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup. - Kselftest updates for SME, BTI, MTE. - Automatic generation of the system register macros from a 'sysreg' file describing the register bitfields. - Update the type of the function argument holding the ESR_ELx register value to unsigned long to match the architecture register size (originally 32-bit but extended since ARMv8.0). - stacktrace cleanups. - ftrace cleanups. - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(), avoid executable mappings in kexec/hibernate code, drop TLB flushing from get_clear_flush() (and rename it to get_clear_contig()), ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits) arm64/sysreg: Generate definitions for FAR_ELx arm64/sysreg: Generate definitions for DACR32_EL2 arm64/sysreg: Generate definitions for CSSELR_EL1 arm64/sysreg: Generate definitions for CPACR_ELx arm64/sysreg: Generate definitions for CONTEXTIDR_ELx arm64/sysreg: Generate definitions for CLIDR_EL1 arm64/sve: Move sve_free() into SVE code section arm64: Kconfig.platforms: Add comments arm64: Kconfig: Fix indentation and add comments arm64: mm: avoid writable executable mappings in kexec/hibernate code arm64: lds: move special code sections out of kernel exec segment arm64/hugetlb: Implement arm64 specific huge_ptep_get() arm64/hugetlb: Use ptep_get() to get the pte value of a huge page arm64: kdump: Do not allocate crash low memory if not needed arm64/sve: Generate ZCR definitions arm64/sme: Generate defintions for SVCR arm64/sme: Generate SMPRI_EL1 definitions arm64/sme: Automatically generate SMPRIMAP_EL2 definitions arm64/sme: Automatically generate SMIDR_EL1 defines arm64/sme: Automatically generate defines for SMCR ...
2022-05-04Merge branch kvm-arm64/aarch32-idreg-trap into kvmarm-master/nextMarc Zyngier
* kvm-arm64/aarch32-idreg-trap: : . : Add trapping/sanitising infrastructure for AArch32 systen registers, : allowing more control over what we actually expose (such as the PMU). : : Patches courtesy of Oliver and Alexandru. : . KVM: arm64: Fix new instances of 32bit ESRs KVM: arm64: Hide AArch32 PMU registers when not available KVM: arm64: Start trapping ID registers for 32 bit guests KVM: arm64: Plumb cp10 ID traps through the AArch64 sysreg handler KVM: arm64: Wire up CP15 feature registers to their AArch64 equivalents KVM: arm64: Don't write to Rt unless sys_reg emulation succeeds KVM: arm64: Return a bool from emulate_cp() Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-03KVM: arm64: Start trapping ID registers for 32 bit guestsOliver Upton
To date KVM has not trapped ID register accesses from AArch32, meaning that guests get an unconstrained view of what hardware supports. This can be a serious problem because we try to base the guest's feature registers on values that are safe system-wide. Furthermore, KVM does not implement the latest ISA in the PMU and Debug architecture, so we constrain these fields to supported values. Since KVM now correctly handles CP15 and CP10 register traps, we no longer need to clear HCR_EL2.TID3 for 32 bit guests and will instead emulate reads with their safe values. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220503060205.2823727-6-oupton@google.com
2022-04-29KVM: arm64: Treat ESR_EL2 as a 64-bit registerAlexandru Elisei
ESR_EL2 was defined as a 32-bit register in the initial release of the ARM Architecture Manual for Armv8-A, and was later extended to 64 bits, with bits [63:32] RES0. ARMv8.7 introduced FEAT_LS64, which makes use of bits [36:32]. KVM treats ESR_EL1 as a 64-bit register when saving and restoring the guest context, but ESR_EL2 is handled as a 32-bit register. Start treating ESR_EL2 as a 64-bit register to allow KVM to make use of the most significant 32 bits in the future. The type chosen to represent ESR_EL2 is u64, as that is consistent with the notation KVM overwhelmingly uses today (u32), and how the rest of the registers are declared. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220425114444.368693-5-alexandru.elisei@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-27KVM: arm64: Inject exception on out-of-IPA-range translation faultMarc Zyngier
When taking a translation fault for an IPA that is outside of the range defined by the hypervisor (between the HW PARange and the IPA range), we stupidly treat it as an IO and forward the access to userspace. Of course, userspace can't do much with it, and things end badly. Arguably, the guest is braindead, but we should at least catch the case and inject an exception. Check the faulting IPA against: - the sanitised PARange: inject an address size fault - the IPA size: inject an abort Reported-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-04-06KVM: arm64: mixed-width check should be skipped for uninitialized vCPUsReiji Watanabe
KVM allows userspace to configure either all EL1 32bit or 64bit vCPUs for a guest. At vCPU reset, vcpu_allowed_register_width() checks if the vcpu's register width is consistent with all other vCPUs'. Since the checking is done even against vCPUs that are not initialized (KVM_ARM_VCPU_INIT has not been done) yet, the uninitialized vCPUs are erroneously treated as 64bit vCPU, which causes the function to incorrectly detect a mixed-width VM. Introduce KVM_ARCH_FLAG_EL1_32BIT and KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED bits for kvm->arch.flags. A value of the EL1_32BIT bit indicates that the guest needs to be configured with all 32bit or 64bit vCPUs, and a value of the REG_WIDTH_CONFIGURED bit indicates if a value of the EL1_32BIT bit is valid (already set up). Values in those bits are set at the first KVM_ARM_VCPU_INIT for the guest based on KVM_ARM_VCPU_EL1_32BIT configuration for the vCPU. Check vcpu's register width against those new bits at the vcpu's KVM_ARM_VCPU_INIT (instead of against other vCPUs' register width). Fixes: 66e94d5cafd4 ("KVM: arm64: Prevent mixed-width VM creation") Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220329031924.619453-2-reijiw@google.com
2022-01-07Merge tag 'kvmarm-5.17' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.16 - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update
2021-12-20KVM: arm64: Use defined value for SCTLR_ELx_EEFuad Tabba
Replace the hardcoded value with the existing definition. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208192810.657360-1-tabba@google.com
2021-12-08KVM: arm64: Move vGIC v4 handling for WFI out arch callback hookSean Christopherson
Move the put and reload of the vGIC out of the block/unblock callbacks and into a dedicated WFI helper. Functionally, this is nearly a nop as the block hook is called at the very beginning of kvm_vcpu_block(), and the only code in kvm_vcpu_block() after the unblock hook is to update the halt-polling controls, i.e. can only affect the next WFI. Back when the arch (un)blocking hooks were added by commits 3217f7c25bca ("KVM: Add kvm_arch_vcpu_{un}blocking callbacks) and d35268da6687 ("arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block"), the hooks were invoked only when KVM was about to "block", i.e. schedule out the vCPU. The use case at the time was to schedule a timer in the host based on the earliest timer in the guest in order to wake the blocking vCPU when the emulated guest timer fired. Commit accb99bcd0ca ("KVM: arm/arm64: Simplify bg_timer programming") reworked the timer logic to be even more precise, by waiting until the vCPU was actually scheduled out, and so move the timer logic from the (un)blocking hooks to vcpu_load/put. In the meantime, the hooks gained usage for enabling vGIC v4 doorbells in commit df9ba95993b9 ("KVM: arm/arm64: GICv4: Use the doorbell interrupt as an unblocking source"), and added related logic for the VMCR in commit 5eeaf10eec39 ("KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block"). Finally, commit 07ab0f8d9a12 ("KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence") hoisted the (un)blocking hooks so that they wrapped KVM's halt-polling logic in addition to the core "block" logic. In other words, the original need for arch hooks to take action _only_ in the block path is long since gone. Cc: Oliver Upton <oupton@google.com> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-12KVM: arm64: Fix reporting of endianess when the access originates at EL0Marc Zyngier
We currently check SCTLR_EL1.EE when computing the address of a faulting guest access. However, the fault could have occured at EL0, in which case the right bit to check would be SCTLR_EL1.E0E. This is pretty unlikely to cause any issue in practice: You'd have to have a guest with a LE EL1 and a BE EL0 (or the other way around), and have mapped a device into the EL0 page tables. Good luck with that! Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211012112312.1247467-1-maz@kernel.org
2021-06-22KVM: arm64: Introduce MTE VM featureSteven Price
Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging for a VM. This will expose the feature to the guest and automatically tag memory pages touched by the VM as PG_mte_tagged (and clear the tag storage) to ensure that the guest cannot see stale tags, and so that the tags are correctly saved/restored across swap. Actually exposing the new capability to user space happens in a later patch. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> [maz: move VM_SHARED sampling into the critical section] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210621111716.37157-3-steven.price@arm.com
2021-05-27KVM: arm64: Prevent mixed-width VM creationMarc Zyngier
It looks like we have tolerated creating mixed-width VMs since... forever. However, that was never the intention, and we'd rather not have to support that pointless complexity. Forbid such a setup by making sure all the vcpus have the same register width. Reported-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20210524170752.1549797-1-maz@kernel.org