From c726200dd106d4c58a281eea7159b8ba28a4ab34 Mon Sep 17 00:00:00 2001 From: Christoffer Dall Date: Fri, 11 Oct 2019 13:07:05 +0200 Subject: KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace For a long time, if a guest accessed memory outside of a memslot using any of the load/store instructions in the architecture which doesn't supply decoding information in the ESR_EL2 (the ISV bit is not set), the kernel would print the following message and terminate the VM as a result of returning -ENOSYS to userspace: load/store instruction decoding not implemented The reason behind this message is that KVM assumes that all accesses outside a memslot is an MMIO access which should be handled by userspace, and we originally expected to eventually implement some sort of decoding of load/store instructions where the ISV bit was not set. However, it turns out that many of the instructions which don't provide decoding information on abort are not safe to use for MMIO accesses, and the remaining few that would potentially make sense to use on MMIO accesses, such as those with register writeback, are not used in practice. It also turns out that fetching an instruction from guest memory can be a pretty horrible affair, involving stopping all CPUs on SMP systems, handling multiple corner cases of address translation in software, and more. It doesn't appear likely that we'll ever implement this in the kernel. What is much more common is that a user has misconfigured his/her guest and is actually not accessing an MMIO region, but just hitting some random hole in the IPA space. In this scenario, the error message above is almost misleading and has led to a great deal of confusion over the years. It is, nevertheless, ABI to userspace, and we therefore need to introduce a new capability that userspace explicitly enables to change behavior. This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV) which does exactly that, and introduces a new exit reason to report the event to userspace. User space can then emulate an exception to the guest, restart the guest, suspend the guest, or take any other appropriate action as per the policy of the running system. Reported-by: Heinrich Schuchardt Signed-off-by: Christoffer Dall Reviewed-by: Alexander Graf Signed-off-by: Marc Zyngier --- Documentation/virt/kvm/api.txt | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt index 4833904d32a5..7403f15657c2 100644 --- a/Documentation/virt/kvm/api.txt +++ b/Documentation/virt/kvm/api.txt @@ -4468,6 +4468,39 @@ Hyper-V SynIC state change. Notification is used to remap SynIC event/message pages and to enable/disable SynIC messages/events processing in userspace. + /* KVM_EXIT_ARM_NISV */ + struct { + __u64 esr_iss; + __u64 fault_ipa; + } arm_nisv; + +Used on arm and arm64 systems. If a guest accesses memory not in a memslot, +KVM will typically return to userspace and ask it to do MMIO emulation on its +behalf. However, for certain classes of instructions, no instruction decode +(direction, length of memory access) is provided, and fetching and decoding +the instruction from the VM is overly complicated to live in the kernel. + +Historically, when this situation occurred, KVM would print a warning and kill +the VM. KVM assumed that if the guest accessed non-memslot memory, it was +trying to do I/O, which just couldn't be emulated, and the warning message was +phrased accordingly. However, what happened more often was that a guest bug +caused access outside the guest memory areas which should lead to a more +meaningful warning message and an external abort in the guest, if the access +did not fall within an I/O window. + +Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable +this capability at VM creation. Once this is done, these types of errors will +instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from +the HSR (arm) and ESR_EL2 (arm64) in the esr_iss field, and the faulting IPA +in the fault_ipa field. Userspace can either fix up the access if it's +actually an I/O access by decoding the instruction from guest memory (if it's +very brave) and continue executing the guest, or it can decide to suspend, +dump, or restart the guest. + +Note that KVM does not skip the faulting instruction as it does for +KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state +if it decides to decode and emulate the instruction. + /* Fix the size of the union. */ char padding[256]; }; -- cgit From da345174ceca052469e4775e4ae263b5f27a9355 Mon Sep 17 00:00:00 2001 From: Christoffer Dall Date: Fri, 11 Oct 2019 13:07:06 +0200 Subject: KVM: arm/arm64: Allow user injection of external data aborts In some scenarios, such as buggy guest or incorrect configuration of the VMM and firmware description data, userspace will detect a memory access to a portion of the IPA, which is not mapped to any MMIO region. For this purpose, the appropriate action is to inject an external abort to the guest. The kernel already has functionality to inject an external abort, but we need to wire up a signal from user space that lets user space tell the kernel to do this. It turns out, we already have the set event functionality which we can perfectly reuse for this. Signed-off-by: Christoffer Dall Signed-off-by: Marc Zyngier --- Documentation/virt/kvm/api.txt | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt index 7403f15657c2..bd29d44af32b 100644 --- a/Documentation/virt/kvm/api.txt +++ b/Documentation/virt/kvm/api.txt @@ -1002,12 +1002,18 @@ Specifying exception.has_esr on a system that does not support it will return -EINVAL. Setting anything other than the lower 24bits of exception.serror_esr will return -EINVAL. +It is not possible to read back a pending external abort (injected via +KVM_SET_VCPU_EVENTS or otherwise) because such an exception is always delivered +directly to the virtual CPU). + + struct kvm_vcpu_events { struct { __u8 serror_pending; __u8 serror_has_esr; + __u8 ext_dabt_pending; /* Align it to 8 bytes */ - __u8 pad[6]; + __u8 pad[5]; __u64 serror_esr; } exception; __u32 reserved[12]; @@ -1051,9 +1057,23 @@ contain a valid state and shall be written into the VCPU. ARM/ARM64: +User space may need to inject several types of events to the guest. + Set the pending SError exception state for this VCPU. It is not possible to 'cancel' an Serror that has been made pending. +If the guest performed an access to I/O memory which could not be handled by +userspace, for example because of missing instruction syndrome decode +information or because there is no device mapped at the accessed IPA, then +userspace can ask the kernel to inject an external abort using the address +from the exiting fault on the VCPU. It is a programming error to set +ext_dabt_pending after an exit which was not either KVM_EXIT_MMIO or +KVM_EXIT_ARM_NISV. This feature is only available if the system supports +KVM_CAP_ARM_INJECT_EXT_DABT. This is a helper which provides commonality in +how userspace reports accesses for the above cases to guests, across different +userspace implementations. Nevertheless, userspace can still emulate all Arm +exceptions by manipulating individual registers using the KVM_SET_ONE_REG API. + See KVM_GET_VCPU_EVENTS for the data structure. -- cgit From 6a7458485b390f48e481fcd4a0b20e6c5c843d2e Mon Sep 17 00:00:00 2001 From: Steven Price Date: Mon, 21 Oct 2019 16:28:14 +0100 Subject: KVM: arm64: Document PV-time interface Introduce a paravirtualization interface for KVM/arm64 based on the "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A. This only adds the details about "Stolen Time" as the details of "Live Physical Time" have not been fully agreed. User space can specify a reserved area of memory for the guest and inform KVM to populate the memory with information on time that the host kernel has stolen from the guest. A hypercall interface is provided for the guest to interrogate the hypervisor's support for this interface and the location of the shared memory structures. Signed-off-by: Steven Price Signed-off-by: Marc Zyngier --- Documentation/virt/kvm/arm/pvtime.rst | 80 +++++++++++++++++++++++++++++++++ Documentation/virt/kvm/devices/vcpu.txt | 14 ++++++ 2 files changed, 94 insertions(+) create mode 100644 Documentation/virt/kvm/arm/pvtime.rst (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/arm/pvtime.rst b/Documentation/virt/kvm/arm/pvtime.rst new file mode 100644 index 000000000000..2357dd2d8655 --- /dev/null +++ b/Documentation/virt/kvm/arm/pvtime.rst @@ -0,0 +1,80 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Paravirtualized time support for arm64 +====================================== + +Arm specification DEN0057/A defines a standard for paravirtualised time +support for AArch64 guests: + +https://developer.arm.com/docs/den0057/a + +KVM/arm64 implements the stolen time part of this specification by providing +some hypervisor service calls to support a paravirtualized guest obtaining a +view of the amount of time stolen from its execution. + +Two new SMCCC compatible hypercalls are defined: + +* PV_TIME_FEATURES: 0xC5000020 +* PV_TIME_ST: 0xC5000021 + +These are only available in the SMC64/HVC64 calling convention as +paravirtualized time is not available to 32 bit Arm guests. The existence of +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES +mechanism before calling it. + +PV_TIME_FEATURES + ============= ======== ========== + Function ID: (uint32) 0xC5000020 + PV_call_id: (uint32) The function to query for support. + Currently only PV_TIME_ST is supported. + Return value: (int64) NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant + PV-time feature is supported by the hypervisor. + ============= ======== ========== + +PV_TIME_ST + ============= ======== ========== + Function ID: (uint32) 0xC5000021 + Return value: (int64) IPA of the stolen time data structure for this + VCPU. On failure: + NOT_SUPPORTED (-1) + ============= ======== ========== + +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory +with inner and outer write back caching attributes, in the inner shareable +domain. A total of 16 bytes from the IPA returned are guaranteed to be +meaningfully filled by the hypervisor (see structure below). + +PV_TIME_ST returns the structure for the calling VCPU. + +Stolen Time +----------- + +The structure pointed to by the PV_TIME_ST hypercall is as follows: + ++-------------+-------------+-------------+----------------------------+ +| Field | Byte Length | Byte Offset | Description | ++=============+=============+=============+============================+ +| Revision | 4 | 0 | Must be 0 for version 1.0 | ++-------------+-------------+-------------+----------------------------+ +| Attributes | 4 | 4 | Must be 0 | ++-------------+-------------+-------------+----------------------------+ +| Stolen time | 8 | 8 | Stolen time in unsigned | +| | | | nanoseconds indicating how | +| | | | much time this VCPU thread | +| | | | was involuntarily not | +| | | | running on a physical CPU. | ++-------------+-------------+-------------+----------------------------+ + +All values in the structure are stored little-endian. + +The structure will be updated by the hypervisor prior to scheduling a VCPU. It +will be present within a reserved region of the normal memory given to the +guest. The guest should not attempt to write into this memory. There is a +structure per VCPU of the guest. + +It is advisable that one or more 64k pages are set aside for the purpose of +these structures and not used for other purposes, this enables the guest to map +the region using 64k pages and avoids conflicting attributes with other memory. + +For the user space interface see Documentation/virt/kvm/devices/vcpu.txt +section "3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL". diff --git a/Documentation/virt/kvm/devices/vcpu.txt b/Documentation/virt/kvm/devices/vcpu.txt index 2b5dab16c4f2..6f3bd64a05b0 100644 --- a/Documentation/virt/kvm/devices/vcpu.txt +++ b/Documentation/virt/kvm/devices/vcpu.txt @@ -60,3 +60,17 @@ time to use the number provided for a given timer, overwriting any previously configured values on other VCPUs. Userspace should configure the interrupt numbers on at least one VCPU after creating all VCPUs and before running any VCPUs. + +3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL +Architectures: ARM64 + +3.1 ATTRIBUTE: KVM_ARM_VCPU_PVTIME_IPA +Parameters: 64-bit base address +Returns: -ENXIO: Stolen time not implemented + -EEXIST: Base address already set for this VCPU + -EINVAL: Base address not 64 byte aligned + +Specifies the base address of the stolen time structure for this VCPU. The +base address must be 64 byte aligned and exist within a valid guest memory +region. See Documentation/virt/kvm/arm/pvtime.txt for more information +including the layout of the stolen time structure. -- cgit