path: root/arch/arm64/kvm/arm.c
2023-03-30  KVM: arm64: timers: Allow userspace to set the global counter offset  (Marc Zyngier)
And this is the moment you have all been waiting for: setting the counter offset from userspace. We expose a brand new capability that reports the ability to set the offset for both the virtual and physical sides.

In keeping with the architecture, the offset is expressed as a delta that is subtracted from the physical counter value. Once this new API is used, there is no going back, and the counters cannot be written to in order to set the offsets implicitly (such writes are instead ignored).

Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230330174800.2677007-8-maz@kernel.org
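A minimal sketch of the arithmetic this describes, with hypothetical names (these are not KVM's actual fields or helpers):

    #include <stdint.h>

    /*
     * The offset is a delta subtracted from the physical counter, so
     * the guest observes the host counter value minus the offset.
     */
    static inline uint64_t guest_counter(uint64_t host_cnt, uint64_t offset)
    {
        return host_cnt - offset;
    }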
2023-03-30  KVM: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM  (Marc Zyngier)
Being able to lock/unlock all vcpus in one go is a feature that only the vgic has enjoyed so far. Let's be brave and expose it to the world.

Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230330174800.2677007-7-maz@kernel.org
2023-03-29  KVM: arm64: Use config_lock to protect data ordered against KVM_RUN  (Oliver Upton)
There are various bits of VM-scoped data that can only be configured before the first call to KVM_RUN, such as the hypercall bitmaps and the PMU. As these fields are protected by the kvm->lock and accessed while holding vcpu->mutex, this is yet another example of lock inversion.

Change out the kvm->lock for kvm->arch.config_lock in all of these instances. Opportunistically simplify the locking mechanics of the PMU configuration by holding the config_lock for the entirety of kvm_arm_pmu_v3_set_attr().

Note that this also addresses a couple of bugs. There is an unguarded read of the PMU version in KVM_ARM_VCPU_PMU_V3_FILTER which could race with KVM_ARM_VCPU_PMU_V3_SET_PMU. Additionally, until now writes to the per-vCPU vPMU irq were not serialized VM-wide, meaning concurrent calls to KVM_ARM_VCPU_PMU_V3_IRQ could lead to a false positive in pmu_irq_is_valid().

Cc: stable@vger.kernel.org
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230327164747.2466958-4-oliver.upton@linux.dev
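A hedged sketch of the locking pattern this describes (kernel context assumed; pmu_config_update() is a hypothetical stand-in for the actual configuration work):

    static int set_vm_scoped_config(struct kvm_vcpu *vcpu,
                                    struct kvm_device_attr *attr)
    {
        struct kvm *kvm = vcpu->kvm;
        int ret;

        /* config_lock nests safely inside vcpu->mutex */
        mutex_lock(&kvm->arch.config_lock);
        ret = pmu_config_update(vcpu, attr);
        mutex_unlock(&kvm->arch.config_lock);

        return ret;
    }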
2023-03-29  KVM: arm64: Avoid lock inversion when setting the VM register width  (Oliver Upton)
kvm->lock must be taken outside of the vcpu->mutex. Of course, the locking documentation for KVM makes this abundantly clear. Nonetheless, the locking order in KVM/arm64 has been wrong for quite a while; we acquire the kvm->lock while holding the vcpu->mutex all over the shop.

All was seemingly fine until commit 42a90008f890 ("KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule") caught us with our pants down, leading to lockdep barfing:

  ======================================================
  WARNING: possible circular locking dependency detected
  6.2.0-rc7+ #19 Not tainted
  ------------------------------------------------------
  qemu-system-aar/859 is trying to acquire lock:
  ffff5aa69269eba0 (&host_kvm->lock){+.+.}-{3:3}, at: kvm_reset_vcpu+0x34/0x274

  but task is already holding lock:
  ffff5aa68768c0b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x8c/0xba0

  which lock already depends on the new lock.

Add a dedicated lock to serialize writes to VM-scoped configuration from the context of a vCPU. Protect the register width flags with the new lock, thus avoiding the need to grab the kvm->lock while holding vcpu->mutex in kvm_reset_vcpu().

Cc: stable@vger.kernel.org
Reported-by: Jeremy Linton <jeremy.linton@arm.com>
Link: https://lore.kernel.org/kvmarm/f6452cdd-65ff-34b8-bab0-5c06416da5f6@arm.com/
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230327164747.2466958-3-oliver.upton@linux.dev
2023-03-29  KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON  (Oliver Upton)
KVM/arm64 had the lock ordering backwards on vcpu->mutex and kvm->lock from the very beginning. One such example is the way vCPU resets are handled: the kvm->lock is acquired while handling a guest CPU_ON PSCI call.

Add a dedicated lock to serialize writes to kvm_vcpu_arch::{mp_state, reset_state}. Promote all accessors of mp_state to {READ,WRITE}_ONCE() as readers do not acquire the mp_state_lock. While at it, plug yet another race by taking the mp_state_lock in the KVM_SET_MP_STATE ioctl handler.

As changes to MP state are now guarded with a dedicated lock, drop the kvm->lock acquisition from the PSCI CPU_ON path. Similarly, move the reader of reset_state outside of the kvm->lock and instead protect it with the mp_state_lock. Note that writes to reset_state::reset have been demoted to regular stores as both readers and writers acquire the mp_state_lock.

While the kvm->lock inversion still exists in kvm_reset_vcpu(), at least now PSCI CPU_ON no longer depends on it for serializing vCPU reset.

Cc: stable@vger.kernel.org
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230327164747.2466958-2-oliver.upton@linux.dev
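A sketch of the accessor discipline this describes (kernel context assumed; function names are illustrative): writers serialize on the new lock, and lock-free readers pair READ_ONCE() with the writers' WRITE_ONCE():

    static void mp_state_set(struct kvm_vcpu *vcpu, u32 state)
    {
        spin_lock(&vcpu->arch.mp_state_lock);
        WRITE_ONCE(vcpu->arch.mp_state.mp_state, state);
        spin_unlock(&vcpu->arch.mp_state_lock);
    }

    static u32 mp_state_get(struct kvm_vcpu *vcpu)
    {
        return READ_ONCE(vcpu->arch.mp_state.mp_state);
    }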
2023-03-16  KVM: Change return type of kvm_arch_vm_ioctl() to "int"  (Thomas Huth)
All kvm_arch_vm_ioctl() implementations now only deal with "int" types as return values, so we can change the return type of these functions to use "int" instead of "long".

Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <20230208140105.655814-7-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
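Concretely, the change boils down to this prototype update (sketch):

    /* Before: a "long" return, even though implementations only produced int values */
    long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg);

    /* After */
    int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg);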
2023-02-13  Merge branch kvm-arm64/nv-prefix into kvmarm/next  (Oliver Upton)
* kvm-arm64/nv-prefix:
  : Preamble to NV support, courtesy of Marc Zyngier.
  :
  : This brings in a set of prerequisite patches for supporting nested
  : virtualization in KVM/arm64. Of course, there is a long way to go until
  : NV is actually enabled in KVM.
  :
  : - Introduce cpucap / vCPU feature flag to pivot the NV code on
  :
  : - Add support for EL2 vCPU register state
  :
  : - Basic nested exception handling
  :
  : - Hide unsupported features from the ID registers for NV-capable VMs

  KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Handle SMCs taken from virtual EL2
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: Use the S2 MMU context to iterate over S2 table
  arm64: Add ARM64_HAS_NESTED_VIRT cpufeature

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-13  Merge branch kvm-arm64/misc into kvmarm/next  (Oliver Upton)
* kvm-arm64/misc:
  : Miscellaneous updates
  :
  : - Convert CPACR_EL1_TTA to the new, generated system register
  :   definitions.
  :
  : - Serialize toggling CPACR_EL1.SMEN to avoid unexpected exceptions when
  :   accessing SVCR in the host.
  :
  : - Avoid quiescing the guest if a vCPU accesses its own redistributor's
  :   SGIs/PPIs, eliminating the need to IPI. Largely an optimization for
  :   nested virtualization, as the L1 accesses the affected registers
  :   rather often.
  :
  : - Conversion to kstrtobool()
  :
  : - Common definition of INVALID_GPA across architectures
  :
  : - Enable CONFIG_USERFAULTFD for CI runs of KVM selftests

  KVM: arm64: Fix non-kerneldoc comments
  KVM: selftests: Enable USERFAULTFD
  KVM: selftests: Remove redundant setbuf()
  arm64/sysreg: clean up some inconsistent indenting
  KVM: MMU: Make the definition of 'INVALID_GPA' common
  KVM: arm64: vgic-v3: Use kstrtobool() instead of strtobool()
  KVM: arm64: vgic-v3: Limit IPI-ing when accessing GICR_{C,S}ACTIVER0
  KVM: arm64: Synchronize SMEN on vcpu schedule out
  KVM: arm64: Kill CPACR_EL1_TTA definition

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-13  Merge branch kvm-arm64/psci-relay-fixes into kvmarm/next  (Oliver Upton)
* kvm-arm64/psci-relay-fixes:
  : Fixes for CPU on/resume with pKVM, courtesy Quentin Perret.
  :
  : A consequence of deprivileging the host is that pKVM relays PSCI calls
  : on behalf of the host. pKVM's CPU initialization failed to fully
  : initialize the CPU's EL2 state, which notably led to unexpected SVE
  : traps resulting in a hyp panic.
  :
  : The issue is addressed by reusing parts of __finalise_el2 to restore CPU
  : state in the PSCI relay.

  KVM: arm64: Finalise EL2 state from pKVM PSCI relay
  KVM: arm64: Use sanitized values in __check_override in nVHE
  KVM: arm64: Introduce finalise_el2_state macro
  KVM: arm64: Provide sanitized SYS_ID_AA64SMFR0_EL1 to nVHE
2023-02-13  Merge branch kvm-arm64/virtual-cache-geometry into kvmarm/next  (Oliver Upton)
* kvm-arm64/virtual-cache-geometry:
  : Virtualized cache geometry for KVM guests, courtesy of Akihiko Odaki.
  :
  : KVM/arm64 has always exposed the host cache geometry directly to the
  : guest, even though non-secure software should never perform CMOs by
  : Set/Way. This was slightly wrong, as the cache geometry was derived from
  : the PE on which the vCPU thread was running and not a sanitized value.
  :
  : All together this leads to issues migrating VMs on heterogeneous
  : systems, as the cache geometry saved/restored could be inconsistent.
  :
  : KVM/arm64 now presents 1 level of cache with 1 set and 1 way. The cache
  : geometry is entirely controlled by userspace, such that migrations from
  : older kernels continue to work.

  KVM: arm64: Mark some VM-scoped allocations as __GFP_ACCOUNT
  KVM: arm64: Normalize cache configuration
  KVM: arm64: Mask FEAT_CCIDX
  KVM: arm64: Always set HCR_TID2
  arm64/cache: Move CLIDR macro definitions
  arm64/sysreg: Add CCSIDR2_EL1
  arm64/sysreg: Convert CCSIDR_EL1 to automatic generation
  arm64: Allow the definition of UNKNOWN system register fields

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11  arm64: Add ARM64_HAS_NESTED_VIRT cpufeature  (Jintack Lim)
Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the CPU has the ARMv8.3 nested virtualization capability, together with the 'kvm-arm.mode=nested' command line option.

This will be used to support nested virtualization in KVM.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: moved the command-line option to kvm-arm.mode]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230209175820.1939006-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-07  KVM: arm64: Mark some VM-scoped allocations as __GFP_ACCOUNT  (Oliver Upton)
Generally speaking, any memory allocations that can be associated with a particular VM should be charged to the cgroup of its process. Nonetheless, there are a couple spots in KVM/arm64 that aren't currently accounted:

 - the ccsidr array containing the virtualized cache hierarchy

 - the cpumask of supported cpus, for use of the vPMU on heterogeneous systems

Go ahead and set __GFP_ACCOUNT for these allocations.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20230206235229.4174711-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
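An illustrative before/after of the change described here; GFP_KERNEL_ACCOUNT expands to GFP_KERNEL | __GFP_ACCOUNT, charging the allocation to the current process's memory cgroup:

    /* Before: the allocation is not charged to the VM's cgroup */
    ptr = kzalloc(size, GFP_KERNEL);

    /* After: accounted against the VM process's memcg */
    ptr = kzalloc(size, GFP_KERNEL_ACCOUNT);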
2023-02-07  KVM: arm64: Fix non-kerneldoc comments  (Marc Zyngier)
The robots amongst us have started spitting out irritating emails about random errors such as:

<quote>
arch/arm64/kvm/arm.c:2207: warning: expecting prototype for Initialize Hyp(). Prototype was for kvm_arm_init() instead
</quote>

which makes little sense until you finally grok what they are on about: comments that look like a kerneldoc, but that aren't.

Let's address this before I get even more irritated... ;-)

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/63e139e1.J5AHO6vmxaALh7xv%25lkp@intel.com
Link: https://lore.kernel.org/r/20230207094321.1238600-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
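For reference, the distinction the robots care about:

    /**
     * A comment opened with a double star is parsed as kerneldoc, so
     * scripts/kernel-doc expects it to document the function that
     * immediately follows, in kerneldoc syntax.
     */

    /*
     * A comment opened with a single star is ignored by the kerneldoc
     * tooling; demoting the offending comments to this form is the fix.
     */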
2023-02-02  KVM: arm64: Provide sanitized SYS_ID_AA64SMFR0_EL1 to nVHE  (Quentin Perret)
We will need a sanitized copy of SYS_ID_AA64SMFR0_EL1 from the nVHE EL2 code shortly, so make sure to provide it with a copy.

Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230201103755.1398086-2-qperret@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2022-12-29  KVM: Drop kvm_arch_check_processor_compat() hook  (Sean Christopherson)
Drop kvm_arch_check_processor_compat() and its support code now that all architecture implementations are nops.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390
Acked-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20221130230934.1014142-33-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29  KVM: Drop kvm_arch_{init,exit}() hooks  (Sean Christopherson)
Drop kvm_arch_init() and kvm_arch_exit() now that all implementations are nops.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <20221130230934.1014142-30-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
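The implementations had been reduced to stubs along these lines, which is what allowed the hooks themselves to go (a sketch of the pre-removal state):

    int kvm_arch_init(void *opaque)
    {
        return 0;
    }

    void kvm_arch_exit(void)
    {
    }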
2022-12-29  KVM: arm64: Mark kvm_arm_init() and its unique descendants as __init  (Sean Christopherson)
Tag kvm_arm_init() and its unique helper as __init, and tag data that is only ever modified under the kvm_arm_init() umbrella as read-only after init.

Opportunistically name the boolean param in kvm_timer_hyp_init()'s prototype to match its definition.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-21-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29  KVM: arm64: Do arm/arch initialization without bouncing through kvm_init()  (Sean Christopherson)
Do arm/arch specific initialization directly in arm's module_init(), now called kvm_arm_init(), instead of bouncing through kvm_init() to reach kvm_arch_init(). Invoking kvm_arch_init() is the very first action performed by kvm_init(), so from an initialization perspective this is a glorified nop.

Avoiding kvm_arch_init() also fixes a mostly benign bug as kvm_arch_exit() doesn't properly unwind if a later stage of kvm_init() fails. While the soon-to-be-deleted comment about compiling as a module being unsupported is correct, kvm_arch_exit() can still be called by kvm_init() if any step after the call to kvm_arch_init() succeeds.

Add a FIXME to call out that pKVM initialization isn't unwound if kvm_init() fails, which is a pre-existing problem inherited from kvm_arch_exit().

Making kvm_arch_init() a nop will also allow dropping kvm_arch_init() and kvm_arch_exit() entirely once all other architectures follow suit.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-20-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29  KVM: arm64: Unregister perf callbacks if hypervisor finalization fails  (Sean Christopherson)
Undo everything done by init_subsystems() if a later initialization step fails, i.e. unregister perf callbacks in addition to unregistering the power management notifier.

Fixes: bfa79a805454 ("KVM: arm64: Elevate hypervisor mappings creation at EL2")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-19-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29  KVM: arm64: Free hypervisor allocations if vector slot init fails  (Sean Christopherson)
Teardown hypervisor mode if vector slot setup fails in order to avoid leaking any allocations done by init_hyp_mode().

Fixes: b881cdce77b4 ("KVM: arm64: Allocate hyp vectors statically")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29  KVM: arm64: Simplify the CPUHP logic  (Marc Zyngier)
For a number of historical reasons, the KVM/arm64 hotplug setup is pretty complicated, and we have two extra CPUHP notifiers for vGIC and timers. It looks pretty pointless, and gets in the way of further changes. So let's just expose some helpers that can be called from the core CPUHP callback, and get rid of everything else.

This gives us the opportunity to drop a useless notifier entry, as well as tidy-up the timer enable/disable, which was a bit odd.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-17-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29  KVM: Drop arch hardware (un)setup hooks  (Sean Christopherson)
Drop kvm_arch_hardware_setup() and kvm_arch_hardware_unsetup() now that all implementations are nops.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <20221130230934.1014142-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-09  Merge tag 'kvmarm-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini)
KVM/arm64 updates for 6.2

- Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu.

- Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load.

- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on.

- Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private.

- Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the not-quite-compliant CHAIN-ed counter support for the machines that actually exist out there.

- Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages.

- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints, stage-2 faults and access tracking. You name it, we got it, we probably broke it.

- Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those.

As a side effect, this tag also drags:

- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring series

- A shared branch with the arm64 tree that repaints all the system registers to match the ARM ARM's naming, and resulting in interesting conflicts
2022-12-05  Merge branch kvm-arm64/pmu-unchained into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/pmu-unchained:
  : .
  : PMUv3 fixes and improvements:
  :
  : - Make the CHAIN event handling strictly follow the architecture
  :
  : - Add support for PMUv3p5 (64bit counters all the way)
  :
  : - Various fixes and cleanups
  : .

  KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
  KVM: arm64: PMU: Sanitise PMCR_EL0.LP on first vcpu run
  KVM: arm64: PMU: Simplify PMCR_EL0 reset handling
  KVM: arm64: PMU: Replace version number '0' with ID_AA64DFR0_EL1_PMUVer_NI
  KVM: arm64: PMU: Make kvm_pmc the main data structure
  KVM: arm64: PMU: Simplify vcpu computation on perf overflow notification
  KVM: arm64: PMU: Allow PMUv3p5 to be exposed to the guest
  KVM: arm64: PMU: Implement PMUv3p5 long counter support
  KVM: arm64: PMU: Allow ID_DFR0_EL1.PerfMon to be set from userspace
  KVM: arm64: PMU: Allow ID_AA64DFR0_EL1.PMUver to be set from userspace
  KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation
  KVM: arm64: PMU: Do not let AArch32 change the counters' top 32 bits
  KVM: arm64: PMU: Simplify setting a counter to a specific value
  KVM: arm64: PMU: Add counter_index_to_*reg() helpers
  KVM: arm64: PMU: Only narrow counters that are not 64bit wide
  KVM: arm64: PMU: Narrow the overflow checking when required
  KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow
  KVM: arm64: PMU: Always advertise the CHAIN event
  KVM: arm64: PMU: Align chained counter implementation with architecture pseudocode
  arm64: Add ID_DFR0_EL1.PerfMon values for PMUv3p7 and IMP_DEF

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05  Merge branch kvm-arm64/pkvm-vcpu-state into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/pkvm-vcpu-state: (25 commits)
  : .
  : Large drop of pKVM patches from Will Deacon and co, adding
  : a private vm/vcpu state at EL2, managed independently from
  : the EL1 state. From the cover letter:
  :
  : "This is version six of the pKVM EL2 state series, extending the pKVM
  : hypervisor code so that it can dynamically instantiate and manage VM
  : data structures without the host being able to access them directly.
  : These structures consist of a hyp VM, a set of hyp vCPUs and the stage-2
  : page-table for the MMU. The pages used to hold the hypervisor structures
  : are returned to the host when the VM is destroyed."
  : .

  KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run()
  KVM: arm64: Don't unnecessarily map host kernel sections at EL2
  KVM: arm64: Explicitly map 'kvm_vgic_global_state' at EL2
  KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2
  KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host
  KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache
  KVM: arm64: Instantiate guest stage-2 page-tables at EL2
  KVM: arm64: Consolidate stage-2 initialisation into a single function
  KVM: arm64: Add generic hyp_memcache helpers
  KVM: arm64: Provide I-cache invalidation by virtual address at EL2
  KVM: arm64: Initialise hypervisor copies of host symbols unconditionally
  KVM: arm64: Add per-cpu fixmap infrastructure at EL2
  KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1
  KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  KVM: arm64: Rename 'host_kvm' to 'host_mmu'
  KVM: arm64: Add hyp_spinlock_t static initializer
  KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h
  KVM: arm64: Add helpers to pin memory shared with the hypervisor at EL2
  KVM: arm64: Prevent the donation of no-map pages
  KVM: arm64: Implement do_donate() helper for donating memory
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05  Merge branch kvm-arm64/dirty-ring into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/dirty-ring:
  : .
  : Add support for the "per-vcpu dirty-ring tracking with a bitmap
  : and sprinkles on top", courtesy of Gavin Shan.
  :
  : This branch drags the kvmarm-fixes-6.1-3 tag which was already
  : merged in 6.1-rc4 so that the branch is in a working state.
  : .

  KVM: Push dirty information unconditionally to backup bitmap
  KVM: selftests: Automate choosing dirty ring size in dirty_log_test
  KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
  KVM: selftests: Use host page size to map ring buffer in dirty_log_test
  KVM: arm64: Enable ring-based dirty memory tracking
  KVM: Support dirty ring in conjunction with bitmap
  KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
  KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-11-19  KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation  (Marc Zyngier)
As further patches will enable the selection of a PMU revision from userspace, sample the supported PMU revision at VM creation time, rather than building it each time the ID_AA64DFR0_EL1 register is accessed.

This shouldn't result in any change in behaviour.

Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221113163832.3154370-11-maz@kernel.org
2022-11-11  KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2  (Will Deacon)
Sharing 'kvm_arm_vmid_bits' between EL1 and EL2 allows the host to modify the variable arbitrarily, potentially leading to all sorts of shenanigans as this is used to configure the VTTBR register for the guest stage-2.

In preparation for unmapping host sections entirely from EL2, maintain a copy of 'kvm_arm_vmid_bits' in the pKVM hypervisor and initialise it from the host value while it is still trusted.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110190259.26861-23-will@kernel.org
2022-11-11  KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host  (Quentin Perret)
When pKVM is enabled, the hypervisor at EL2 does not trust the host at EL1 and must therefore prevent it from having unrestricted access to internal hypervisor state. The 'kvm_arm_hyp_percpu_base' array holds the offsets for hypervisor per-cpu allocations, so move this into the nVHE code where it cannot be modified by the untrusted host at EL1.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110190259.26861-22-will@kernel.org
2022-11-11  KVM: arm64: Consolidate stage-2 initialisation into a single function  (Quentin Perret)
The initialisation of guest stage-2 page-tables is currently split across two functions: kvm_init_stage2_mmu() and kvm_arm_setup_stage2(). That is presumably for historical reasons as kvm_arm_setup_stage2() originates from the (now defunct) KVM port for 32-bit Arm.

Simplify this code path by merging both functions into one, taking care to map the 'struct kvm' into the hypervisor stage-1 early on in order to simplify the failure path.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110190259.26861-19-will@kernel.org
2022-11-11  KVM: arm64: Provide I-cache invalidation by virtual address at EL2  (Will Deacon)
In preparation for handling cache maintenance of guest pages from within the pKVM hypervisor at EL2, introduce an EL2 copy of icache_inval_pou() which will later be plumbed into the stage-2 page-table cache maintenance callbacks, ensuring that the initial contents of pages mapped as executable into the guest stage-2 page-table are visible to the instruction fetcher.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110190259.26861-17-will@kernel.org
2022-11-11  KVM: arm64: Initialise hypervisor copies of host symbols unconditionally  (Will Deacon)
The nVHE object at EL2 maintains its own copies of some host variables so that, when pKVM is enabled, the host cannot directly modify the hypervisor state. When running in normal nVHE mode, however, these variables are still mirrored at EL2 but are not initialised.

Initialise the hypervisor symbols from the host copies regardless of pKVM, ensuring that any reference to this data at EL2 with normal nVHE will return a sensibly initialised value.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110190259.26861-16-will@kernel.org
2022-11-11  KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1  (Fuad Tabba)
With the pKVM hypervisor at EL2 now offering hypercalls to the host for creating and destroying VM and vCPU structures, plumb these in to the existing arm64 KVM backend to ensure that the hypervisor data structures are allocated and initialised on first vCPU run for a pKVM guest.

In the host, 'struct kvm_protected_vm' is introduced to hold the handle of the pKVM VM instance as well as to track references to the memory donated to the hypervisor so that it can be freed back to the host allocator following VM teardown. The stage-2 page-table, hypervisor VM and vCPU structures are allocated separately so as to avoid the need for a large physically-contiguous allocation in the host at run-time.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110190259.26861-14-will@kernel.org
2022-11-10  KVM: arm64: Fix kvm init failure when mode!=vhe and VA_BITS=52.  (Ryan Roberts)
For nvhe and protected modes, the hyp stage 1 page-tables were previously configured to have the same number of VA bits as the kernel's idmap. However, for kernel configs with VA_BITS=52 and where the kernel is loaded in physical memory below 48 bits, the number of idmap VA bits is actually smaller than the kernel's normal stage 1 VA bits. This can lead to kernel addresses that can't be mapped into the hypervisor, leading to kvm initialization failure during boot:

  kvm [1]: IPA Size Limit: 48 bits
  kvm [1]: Cannot map world-switch code
  kvm [1]: error initializing Hyp mode: -34

Fix this by ensuring that the hyp stage 1 VA size is the maximum of what's used for the idmap and the regular kernel stage 1. At the same time, refactor the code so that the hyp VA size is only calculated in one place.

Prior to 7ba8f2b2d652, the idmap was always 52 bits for a 52 VA bits kernel and therefore the hyp stage1 was also always 52 bits.

Fixes: 7ba8f2b2d652 ("arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
[maz: commit message fixes]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221103150507.32948-2-ryan.roberts@arm.com
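A hedged sketch of the sizing rule the fix describes (helper and parameter names hypothetical):

    /* Cover both the idmap and the kernel's regular stage 1 VAs */
    static u32 get_hyp_va_bits(u32 idmap_va_bits, u32 kernel_va_bits)
    {
        return max(idmap_va_bits, kernel_va_bits);
    }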
2022-11-10  KVM: arm64: Enable ring-based dirty memory tracking  (Gavin Shan)
Enable ring-based dirty memory tracking on ARM64:

- Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL.

- Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP.

- Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page offset.

- Add ARM64 specific kvm_arch_allow_write_without_running_vcpu() to keep the site of saving vgic/its tables out of the no-running-vcpu radar.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110104914.31280-5-gshan@redhat.com
2022-11-09  KVM: replace direct irq.h inclusion  (Paolo Bonzini)
virt/kvm/irqchip.c is including "irq.h" from the arch-specific KVM source directory (i.e. not from arch/*/include) for the sole purpose of retrieving irqchip_in_kernel.

Making the function inline in a header that is already included, such as asm/kvm_host.h, is not possible because it needs to look at struct kvm which is defined after asm/kvm_host.h is included. So add a kvm_arch_irqchip_in_kernel non-inline function; irqchip_in_kernel() is only performance critical on arm64 and x86, and the non-inline function is enough on all other architectures.

irq.h can then be deleted from all architectures except x86.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
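Per the description above, the new hook amounts to a trivial out-of-line wrapper living in arch code, where irqchip_in_kernel() is visible (sketch):

    bool kvm_arch_irqchip_in_kernel(struct kvm *kvm)
    {
        return irqchip_in_kernel(kvm);
    }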
2022-10-03  Merge tag 'kvmarm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini)
KVM/arm64 updates for v6.1

- Fixes for single-stepping in the presence of an async exception as well as the preservation of PSTATE.SS

- Better handling of AArch32 ID registers on AArch64-only systems

- Fixes for the dirty-ring API, allowing it to work on architectures with relaxed memory ordering

- Advertise the new kvmarm mailing list

- Various minor cleanups and spelling fixes
2022-09-30  Merge tag 'kvm-x86-6.1-2' of https://github.com/sean-jc/linux into HEAD  (Paolo Bonzini)
KVM x86 updates for 6.1, batch #2:

- Misc PMU fixes and cleanups.

- Fixes for Hyper-V hypercall selftest
2022-09-26  KVM: remove KVM_REQ_UNHALT  (Paolo Bonzini)
KVM_REQ_UNHALT is now unnecessary because it is replaced by the return value of kvm_vcpu_block/kvm_vcpu_halt. Remove it.

No functional change intended.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20220921003201.1441511-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: arm64: Ignore kvm-arm.mode if !is_hyp_mode_available()  (Elliot Berman)
Ignore kvm-arm.mode if !is_hyp_mode_available(). Specifically, we want to avoid switching kvm_mode to KVM_MODE_PROTECTED if hypervisor mode is not available. This prevents the "Protected KVM" cpu capability from being reported when Linux is booting in EL1 and would not have KVM enabled.

Reasonably though, we should warn if the command line is requesting a KVM mode at all when KVM isn't actually available. Allow "kvm-arm.mode=none" to skip the warning, since this would disable KVM anyway.

Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220920190658.2880184-1-quic_eberman@quicinc.com
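A sketch of the resulting early-param flow, under the assumption that the real parser handles additional modes and messages that are elided here:

    static int __init early_kvm_mode_cfg(char *arg)
    {
        if (!arg)
            return -EINVAL;

        /* "none" disables KVM anyway, so skip the availability warning */
        if (strcmp(arg, "none") == 0) {
            kvm_mode = KVM_MODE_NONE;
            return 0;
        }

        /* Booted at EL1: warn once and ignore the requested mode */
        if (!is_hyp_mode_available()) {
            pr_warn_once("KVM is not available. Ignoring kvm-arm.mode\n");
            return 0;
        }

        if (strcmp(arg, "protected") == 0) {
            kvm_mode = KVM_MODE_PROTECTED;
            return 0;
        }

        return -EINVAL;
    }
    early_param("kvm-arm.mode", early_kvm_mode_cfg);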
2022-09-19  KVM: arm64: Use kmemleak_free_part_phys() to unregister hyp_mem_base  (Zenghui Yu)
With commit 0c24e061196c ("mm: kmemleak: add rbtree and store physical address for objects allocated with PA"), kmemleak started to put the objects allocated with physical address onto the object_phys_tree_root tree. kmemleak_free_part() therefore no longer worked as expected on physically allocated objects (hyp_mem_base in this case) as it attempted to search and remove things in the object_tree_root tree.

Fix it by using kmemleak_free_part_phys() to unregister hyp_mem_base. This fixes an immediate crash when booting a KVM host in protected mode with kmemleak enabled.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220908130659.2021-1-yuzenghui@huawei.com
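Based on the description above, the fix is essentially a one-line substitution (sketch):

    /* Before: hyp_mem_base is a physical address, but this searches
     * kmemleak's virtual-address object tree and finds nothing */
    kmemleak_free_part(__va(hyp_mem_base), hyp_mem_size);

    /* After: unregister from the physical-address tree instead */
    kmemleak_free_part_phys(hyp_mem_base, hyp_mem_size);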
2022-08-19  Merge tag 'kvmarm-fixes-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini)
KVM/arm64 fixes for 6.0, take #1

- Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK

- Tidy-up handling of AArch32 on asymmetric systems
2022-08-17  KVM: arm64: Treat PMCR_EL1.LC as RES1 on asymmetric systems  (Oliver Upton)
KVM does not support AArch32 on asymmetric systems. To that end, enforce AArch64-only behavior on PMCR_EL1.LC when on an asymmetric system.

Fixes: 2122a833316f ("arm64: Allow mismatched 32-bit EL0 support")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220816192554.1455559-2-oliver.upton@linux.dev
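A hedged sketch of the enforcement (wrapper name hypothetical): without AArch32 at EL0, PMCR_EL0.LC is treated as RES1 by forcing the bit on guest writes:

    static u64 sanitise_pmcr(u64 val)
    {
        if (!kvm_supports_32bit_el0())
            val |= ARMV8_PMU_PMCR_LC;
        return val;
    }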
2022-08-01  Merge tag 'kvmarm-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini)
KVM/arm64 updates for 5.20:

- Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack

- Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to align with the rest of the infrastructure

- Disaggregation of the vcpu flags in separate sets to better track their use model

- A fix for the GICv2-on-v3 selftest

- A small set of cosmetic fixes
2022-07-27  Merge branch kvm-arm64/nvhe-stacktrace into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/nvhe-stacktrace: (27 commits)
  : .
  : Add an overflow stack to the nVHE EL2 code, allowing
  : the implementation of an unwinder, courtesy of
  : Kalesh Singh. From the cover letter (slightly edited):
  :
  : "nVHE has two modes of operation: protected (pKVM) and unprotected
  : (conventional nVHE). Depending on the mode, a slightly different approach
  : is used to dump the hypervisor stacktrace but the core unwinding logic
  : remains the same.
  :
  : * Protected nVHE (pKVM) stacktraces:
  :
  : In protected nVHE mode, the host cannot directly access hypervisor memory.
  :
  : The hypervisor stack unwinding happens in EL2 and is made accessible to
  : the host via a shared buffer. Symbolizing and printing the stacktrace
  : addresses is delegated to the host and happens in EL1.
  :
  : * Non-protected (Conventional) nVHE stacktraces:
  :
  : In non-protected mode, the host is able to directly access the hypervisor
  : stack pages.
  :
  : The hypervisor stack unwinding and dumping of the stacktrace is performed
  : by the host in EL1, as this avoids the memory overhead of setting up
  : shared buffers between the host and hypervisor."
  :
  : Additional patches from Oliver Upton and Marc Zyngier, tidying up
  : the initial series.
  : .

  arm64: Update 'unwinder howto'
  KVM: arm64: Don't open code ARRAY_SIZE()
  KVM: arm64: Move nVHE-only helpers into kvm/stacktrace.c
  KVM: arm64: Make unwind()/on_accessible_stack() per-unwinder functions
  KVM: arm64: Move nVHE stacktrace unwinding into its own compilation unit
  KVM: arm64: Move PROTECTED_NVHE_STACKTRACE around
  KVM: arm64: Introduce pkvm_dump_backtrace()
  KVM: arm64: Implement protected nVHE hyp stack unwinder
  KVM: arm64: Save protected-nVHE (pKVM) hyp stacktrace
  KVM: arm64: Stub implementation of pKVM HYP stack unwinder
  KVM: arm64: Allocate shared pKVM hyp stacktrace buffers
  KVM: arm64: Add PROTECTED_NVHE_STACKTRACE Kconfig
  KVM: arm64: Introduce hyp_dump_backtrace()
  KVM: arm64: Implement non-protected nVHE hyp stack unwinder
  KVM: arm64: Prepare non-protected nVHE hypervisor stacktrace
  KVM: arm64: Stub implementation of non-protected nVHE HYP stack unwinder
  KVM: arm64: On stack overflow switch to hyp overflow_stack
  arm64: stacktrace: Add description of stacktrace/common.h
  arm64: stacktrace: Factor out common unwind()
  arm64: stacktrace: Handle frame pointer from different address spaces
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-07-26  KVM: arm64: Implement non-protected nVHE hyp stack unwinder  (Kalesh Singh)
Implements the common framework necessary for unwind() to work for non-protected nVHE mode:

 - on_accessible_stack()
 - on_overflow_stack()
 - unwind_next()

Non-protected nVHE unwind() is used to unwind and dump the hypervisor stacktrace by the host in EL1.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-11-kaleshsingh@google.com
2022-07-17  Merge branch kvm-arm64/sysreg-cleanup-5.20 into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/sysreg-cleanup-5.20:
  : .
  : Long overdue cleanup of the sysreg userspace access,
  : with extra scrubbing on the vgic side of things.
  : From the cover letter:
  :
  : "Schspa Shi recently reported[1] that some of the vgic code interacting
  : with userspace was reading uninitialised stack memory, and although
  : that read wasn't used any further, it prompted me to revisit this part
  : of the code.
  :
  : Needless to say, this area of the kernel is pretty crufty, and shows a
  : bunch of issues in other parts of the KVM/arm64 infrastructure. This
  : series tries to remedy a bunch of them:
  :
  : - Sanitise the way we deal with sysregs from userspace: at the moment,
  :   each and every .set_user/.get_user callback has to implement its own
  :   userspace accesses (directly or indirectly). It'd be much better if
  :   that was centralised so that we can reason about it.
  :
  : - Enforce that all AArch64 sysregs are 64bit. Always. This was sort of
  :   implied by the code, but it took some effort to convince myself that
  :   this was actually the case.
  :
  : - Move the vgic-v3 sysreg userspace accessors to the userspace
  :   callbacks instead of hijacking the vcpu trap callback. This allows
  :   us to reuse the sysreg infrastructure.
  :
  : - Consolidate userspace accesses for both GICv2, GICv3 and common code
  :   as much as possible.
  :
  : - Cleanup a bunch of not-very-useful helpers, tidy up some of the code
  :   as we touch it.
  :
  : [1] https://lore.kernel.org/r/m2h740zz1i.fsf@gmail.com"
  : .

  KVM: arm64: Get rid or outdated comments
  KVM: arm64: Descope kvm_arm_sys_reg_{get,set}_reg()
  KVM: arm64: Get rid of find_reg_by_id()
  KVM: arm64: vgic: Tidy-up calls to vgic_{get,set}_common_attr()
  KVM: arm64: vgic: Consolidate userspace access for base address setting
  KVM: arm64: vgic-v2: Add helper for legacy dist/cpuif base address setting
  KVM: arm64: vgic: Use {get,put}_user() instead of copy_{from.to}_user
  KVM: arm64: vgic-v2: Consolidate userspace access for MMIO registers
  KVM: arm64: vgic-v3: Consolidate userspace access for MMIO registers
  KVM: arm64: vgic-v3: Use u32 to manage the line level from userspace
  KVM: arm64: vgic-v3: Convert userspace accessors over to FIELD_GET/FIELD_PREP
  KVM: arm64: vgic-v3: Make the userspace accessors use sysreg API
  KVM: arm64: vgic-v3: Push user access into vgic_v3_cpu_sysregs_uaccess()
  KVM: arm64: vgic-v3: Simplify vgic_v3_has_cpu_sysregs_attr()
  KVM: arm64: Get rid of reg_from/to_user()
  KVM: arm64: Consolidate sysreg userspace accesses
  KVM: arm64: Rely on index_to_param() for size checks on userspace access
  KVM: arm64: Introduce generic get_user/set_user helpers for system registers
  KVM: arm64: Reorder handling of invariant sysregs from userspace
  KVM: arm64: Add get_reg_by_id() as a sys_reg_desc retrieving helper

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-07-17  KVM: arm64: vgic-v2: Add helper for legacy dist/cpuif base address setting  (Marc Zyngier)
We carry a legacy interface to set the base addresses for GICv2. As this is currently plumbed into the same handling code as the modern interface, it limits the evolution we can make there.

Add a helper dedicated to this handling, with a view of maybe removing this in the future.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29  Merge branch kvm-arm64/burn-the-flags into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/burn-the-flags:
  : .
  : Rework the per-vcpu flags to make them more manageable,
  : splitting them in different sets that have specific
  : uses:
  :
  : - configuration flags
  : - input to the world-switch
  : - state bookkeeping for the kernel itself
  :
  : The FP tracking is also simplified and tracked outside
  : of the flags as a separate state.
  : .

  KVM: arm64: Move the handling of !FP outside of the fast path
  KVM: arm64: Document why pause cannot be turned into a flag
  KVM: arm64: Reduce the size of the vcpu flag members
  KVM: arm64: Add build-time sanity checks for flags
  KVM: arm64: Warn when PENDING_EXCEPTION and INCREMENT_PC are set together
  KVM: arm64: Convert vcpu sysregs_loaded_on_cpu to a state flag
  KVM: arm64: Kill unused vcpu flags field
  KVM: arm64: Move vcpu WFIT flag to the state flag set
  KVM: arm64: Move vcpu ON_UNSUPPORTED_CPU flag to the state flag set
  KVM: arm64: Move vcpu SVE/SME flags to the state flag set
  KVM: arm64: Move vcpu debug/SPE/TRBE flags to the input flag set
  KVM: arm64: Move vcpu PC/Exception flags to the input flag set
  KVM: arm64: Move vcpu configuration flags into their own set
  KVM: arm64: Add three sets of flags to the vcpu state
  KVM: arm64: Add helpers to manipulate vcpu flags among a set
  KVM: arm64: Move FP state ownership from flag to a tristate
  KVM: arm64: Drop FP_FOREIGN_STATE from the hypervisor code

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29  KVM: arm64: Move the handling of !FP outside of the fast path  (Marc Zyngier)
We currently start by assuming that the host owns the FP unit at load time, then check again whether this is the case as we are about to run. Only at this point do we account for the fact that there is a (vanishingly small) chance that we're running on a system without a FPSIMD unit (yes, this is madness).

We can actually move this FPSIMD check as early as load-time, and drop the check at run time.

No intended change in behaviour.

Suggested-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
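A sketch of the load-time check this describes (kernel context assumed; the helper name is illustrative and elides the rest of the load path):

    static void vcpu_load_fp_state(struct kvm_vcpu *vcpu)
    {
        /* Default: the host owns the FP/SIMD state at load time... */
        vcpu->arch.fp_state = FP_STATE_HOST_OWNED;

        /* ...unless there is no FPSIMD unit at all, checked once here
         * rather than on every vcpu run */
        if (!system_supports_fpsimd())
            vcpu->arch.fp_state = FP_STATE_FREE;
    }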