diff options
Diffstat (limited to 'Documentation/virt/kvm')
-rw-r--r-- | Documentation/virt/kvm/api.rst | 113 | ||||
-rw-r--r-- | Documentation/virt/kvm/devices/arm-vgic-v3.rst | 77 | ||||
-rw-r--r-- | Documentation/virt/kvm/review-checklist.rst | 95 | ||||
-rw-r--r-- | Documentation/virt/kvm/x86/intel-tdx.rst | 15 |
4 files changed, 283 insertions, 17 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 1bd2d42e6424..6aa40ee05a4a 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -2006,7 +2006,14 @@ frequency is KHz. If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also be used as a vm ioctl to set the initial tsc frequency of subsequently -created vCPUs. +created vCPUs. Note, the vm ioctl is only allowed prior to creating vCPUs. + +For TSC protected Confidential Computing (CoCo) VMs where TSC frequency +is configured once at VM scope and remains unchanged during VM's +lifetime, the vm ioctl should be used to configure the TSC frequency +and the vcpu ioctl is not supported. + +Example of such CoCo VMs: TDX guests. 4.56 KVM_GET_TSC_KHZ -------------------- @@ -6645,7 +6652,8 @@ to the byte array. .. note:: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN, - KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding + KVM_EXIT_EPR, KVM_EXIT_HYPERCALL, KVM_EXIT_TDX, + KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. @@ -7176,6 +7184,69 @@ The valid value for 'flags' is: :: + /* KVM_EXIT_TDX */ + struct { + __u64 flags; + __u64 nr; + union { + struct { + u64 ret; + u64 data[5]; + } unknown; + struct { + u64 ret; + u64 gpa; + u64 size; + } get_quote; + struct { + u64 ret; + u64 leaf; + u64 r11, r12, r13, r14; + } get_tdvmcall_info; + struct { + u64 ret; + u64 vector; + } setup_event_notify; + }; + } tdx; + +Process a TDVMCALL from the guest. KVM forwards select TDVMCALL based +on the Guest-Hypervisor Communication Interface (GHCI) specification; +KVM bridges these requests to the userspace VMM with minimal changes, +placing the inputs in the union and copying them back to the guest +on re-entry. + +Flags are currently always zero, whereas ``nr`` contains the TDVMCALL +number from register R11. The remaining field of the union provide the +inputs and outputs of the TDVMCALL. Currently the following values of +``nr`` are defined: + + * ``TDVMCALL_GET_QUOTE``: the guest has requested to generate a TD-Quote + signed by a service hosting TD-Quoting Enclave operating on the host. + Parameters and return value are in the ``get_quote`` field of the union. + The ``gpa`` field and ``size`` specify the guest physical address + (without the shared bit set) and the size of a shared-memory buffer, in + which the TDX guest passes a TD Report. The ``ret`` field represents + the return value of the GetQuote request. When the request has been + queued successfully, the TDX guest can poll the status field in the + shared-memory area to check whether the Quote generation is completed or + not. When completed, the generated Quote is returned via the same buffer. + + * ``TDVMCALL_GET_TD_VM_CALL_INFO``: the guest has requested the support + status of TDVMCALLs. The output values for the given leaf should be + placed in fields from ``r11`` to ``r14`` of the ``get_tdvmcall_info`` + field of the union. + + * ``TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT``: the guest has requested to + set up a notification interrupt for vector ``vector``. + +KVM may add support for more values in the future that may cause a userspace +exit, even without calls to ``KVM_ENABLE_CAP`` or similar. In this case, +it will enter with output fields already valid; in the common case, the +``unknown.ret`` field of the union will be ``TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED``. +Userspace need not do anything if it does not wish to support a TDVMCALL. +:: + /* Fix the size of the union. */ char padding[256]; }; @@ -7780,6 +7851,7 @@ Valid bits in args[0] are:: #define KVM_X86_DISABLE_EXITS_HLT (1 << 1) #define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2) #define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3) + #define KVM_X86_DISABLE_EXITS_APERFMPERF (1 << 4) Enabling this capability on a VM provides userspace with a way to no longer intercept some instructions for improved latency in some @@ -7790,6 +7862,28 @@ all such vmexits. Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits. +Virtualizing the ``IA32_APERF`` and ``IA32_MPERF`` MSRs requires more +than just disabling APERF/MPERF exits. While both Intel and AMD +document strict usage conditions for these MSRs--emphasizing that only +the ratio of their deltas over a time interval (T0 to T1) is +architecturally defined--simply passing through the MSRs can still +produce an incorrect ratio. + +This erroneous ratio can occur if, between T0 and T1: + +1. The vCPU thread migrates between logical processors. +2. Live migration or suspend/resume operations take place. +3. Another task shares the vCPU's logical processor. +4. C-states lower than C0 are emulated (e.g., via HLT interception). +5. The guest TSC frequency doesn't match the host TSC frequency. + +Due to these complexities, KVM does not automatically associate this +passthrough capability with the guest CPUID bit, +``CPUID.6:ECX.APERFMPERF[bit 0]``. Userspace VMMs that deem this +mechanism adequate for virtualizing the ``IA32_APERF`` and +``IA32_MPERF`` MSRs must set the guest CPUID bit explicitly. + + 7.14 KVM_CAP_S390_HPAGE_1M -------------------------- @@ -8316,7 +8410,7 @@ core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest. 7.36 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL ---------------------------------------------------------- -:Architectures: x86, arm64 +:Architectures: x86, arm64, riscv :Type: vm :Parameters: args[0] - size of the dirty log ring @@ -8528,7 +8622,7 @@ ENOSYS for the others. When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request. -7.37 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS +7.42 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS ------------------------------------- :Architectures: arm64 @@ -8557,6 +8651,17 @@ given VM. When this capability is enabled, KVM resets the VCPU when setting MP_STATE_INIT_RECEIVED through IOCTL. The original MP_STATE is preserved. +7.43 KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED +------------------------------------------- + +:Architectures: arm64 +:Target: VM +:Parameters: None + +This capability indicate to the userspace whether a PFNMAP memory region +can be safely mapped as cacheable. This relies on the presence of +force write back (FWB) feature support on the hardware. + 8. Other capabilities. ====================== diff --git a/Documentation/virt/kvm/devices/arm-vgic-v3.rst b/Documentation/virt/kvm/devices/arm-vgic-v3.rst index e860498b1e35..ff02102f7141 100644 --- a/Documentation/virt/kvm/devices/arm-vgic-v3.rst +++ b/Documentation/virt/kvm/devices/arm-vgic-v3.rst @@ -78,6 +78,8 @@ Groups: -ENXIO The group or attribute is unknown/unsupported for this device or hardware support is missing. -EFAULT Invalid user pointer for attr->addr. + -EBUSY Attempt to write a register that is read-only after + initialization ======= ============================================================= @@ -120,6 +122,12 @@ Groups: Note that distributor fields are not banked, but return the same value regardless of the mpidr used to access the register. + Userspace is allowed to write the following register fields prior to + initialization of the VGIC: + + * GICD_IIDR.Revision + * GICD_TYPER2.nASSGIcap + GICD_IIDR.Revision is updated when the KVM implementation is changed in a way directly observable by the guest or userspace. Userspace should read GICD_IIDR from KVM and write back the read value to confirm its expected @@ -128,6 +136,12 @@ Groups: behavior. + GICD_TYPER2.nASSGIcap allows userspace to control the support of SGIs + without an active state. At VGIC creation the field resets to the + maximum capability of the system. Userspace is expected to read the field + to determine the supported value(s) before writing to the field. + + The GICD_STATUSR and GICR_STATUSR registers are architecturally defined such that a write of a clear bit has no effect, whereas a write with a set bit clears that value. To allow userspace to freely set the values of these two @@ -202,16 +216,69 @@ Groups: KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS accesses the CPU interface registers for the CPU specified by the mpidr field. - CPU interface registers access is not implemented for AArch32 mode. - Error -ENXIO is returned when accessed in AArch32 mode. + The available registers are: + + =============== ==================================================== + ICC_PMR_EL1 + ICC_BPR0_EL1 + ICC_AP0R0_EL1 + ICC_AP0R1_EL1 when the host implements at least 6 bits of priority + ICC_AP0R2_EL1 when the host implements 7 bits of priority + ICC_AP0R3_EL1 when the host implements 7 bits of priority + ICC_AP1R0_EL1 + ICC_AP1R1_EL1 when the host implements at least 6 bits of priority + ICC_AP1R2_EL1 when the host implements 7 bits of priority + ICC_AP1R3_EL1 when the host implements 7 bits of priority + ICC_BPR1_EL1 + ICC_CTLR_EL1 + ICC_SRE_EL1 + ICC_IGRPEN0_EL1 + ICC_IGRPEN1_EL1 + =============== ==================================================== + + When EL2 is available for the guest, these registers are also available: + + ============= ==================================================== + ICH_AP0R0_EL2 + ICH_AP0R1_EL2 when the host implements at least 6 bits of priority + ICH_AP0R2_EL2 when the host implements 7 bits of priority + ICH_AP0R3_EL2 when the host implements 7 bits of priority + ICH_AP1R0_EL2 + ICH_AP1R1_EL2 when the host implements at least 6 bits of priority + ICH_AP1R2_EL2 when the host implements 7 bits of priority + ICH_AP1R3_EL2 when the host implements 7 bits of priority + ICH_HCR_EL2 + ICC_SRE_EL2 + ICH_VTR_EL2 + ICH_VMCR_EL2 + ICH_LR0_EL2 + ICH_LR1_EL2 + ICH_LR2_EL2 + ICH_LR3_EL2 + ICH_LR4_EL2 + ICH_LR5_EL2 + ICH_LR6_EL2 + ICH_LR7_EL2 + ICH_LR8_EL2 + ICH_LR9_EL2 + ICH_LR10_EL2 + ICH_LR11_EL2 + ICH_LR12_EL2 + ICH_LR13_EL2 + ICH_LR14_EL2 + ICH_LR15_EL2 + ============= ==================================================== + + CPU interface registers are only described using the AArch64 + encoding. Errors: - ======= ===================================================== - -ENXIO Getting or setting this register is not yet supported + ======= ================================================= + -ENXIO Getting or setting this register is not supported -EBUSY VCPU is running -EINVAL Invalid mpidr or register value supplied - ======= ===================================================== + ======= ================================================= KVM_DEV_ARM_VGIC_GRP_NR_IRQS diff --git a/Documentation/virt/kvm/review-checklist.rst b/Documentation/virt/kvm/review-checklist.rst index dc01aea4057b..debac54e14e7 100644 --- a/Documentation/virt/kvm/review-checklist.rst +++ b/Documentation/virt/kvm/review-checklist.rst @@ -7,7 +7,7 @@ Review checklist for kvm patches 1. The patch must follow Documentation/process/coding-style.rst and Documentation/process/submitting-patches.rst. -2. Patches should be against kvm.git master branch. +2. Patches should be against kvm.git master or next branches. 3. If the patch introduces or modifies a new userspace API: - the API must be documented in Documentation/virt/kvm/api.rst @@ -18,10 +18,10 @@ Review checklist for kvm patches 5. New features must default to off (userspace should explicitly request them). Performance improvements can and should default to on. -6. New cpu features should be exposed via KVM_GET_SUPPORTED_CPUID2 +6. New cpu features should be exposed via KVM_GET_SUPPORTED_CPUID2, + or its equivalent for non-x86 architectures -7. Emulator changes should be accompanied by unit tests for qemu-kvm.git - kvm/test directory. +7. The feature should be testable (see below). 8. Changes should be vendor neutral when possible. Changes to common code are better than duplicating changes to vendor code. @@ -36,6 +36,87 @@ Review checklist for kvm patches 11. New guest visible features must either be documented in a hardware manual or be accompanied by documentation. -12. Features must be robust against reset and kexec - for example, shared - host/guest memory must be unshared to prevent the host from writing to - guest memory that the guest has not reserved for this purpose. +Testing of KVM code +------------------- + +All features contributed to KVM, and in many cases bugfixes too, should be +accompanied by some kind of tests and/or enablement in open source guests +and VMMs. KVM is covered by multiple test suites: + +*Selftests* + These are low level tests that allow granular testing of kernel APIs. + This includes API failure scenarios, invoking APIs after specific + guest instructions, and testing multiple calls to ``KVM_CREATE_VM`` + within a single test. They are included in the kernel tree at + ``tools/testing/selftests/kvm``. + +``kvm-unit-tests`` + A collection of small guests that test CPU and emulated device features + from a guest's perspective. They run under QEMU or ``kvmtool``, and + are generally not KVM-specific: they can be run with any accelerator + that QEMU support or even on bare metal, making it possible to compare + behavior across hypervisors and processor families. + +Functional test suites + Various sets of functional tests exist, such as QEMU's ``tests/functional`` + suite and `avocado-vt <https://avocado-vt.readthedocs.io/en/latest/>`__. + These typically involve running a full operating system in a virtual + machine. + +The best testing approach depends on the feature's complexity and +operation. Here are some examples and guidelines: + +New instructions (no new registers or APIs) + The corresponding CPU features (if applicable) should be made available + in QEMU. If the instructions require emulation support or other code in + KVM, it is worth adding coverage to ``kvm-unit-tests`` or selftests; + the latter can be a better choice if the instructions relate to an API + that already has good selftest coverage. + +New hardware features (new registers, no new APIs) + These should be tested via ``kvm-unit-tests``; this more or less implies + supporting them in QEMU and/or ``kvmtool``. In some cases selftests + can be used instead, similar to the previous case, or specifically to + test corner cases in guest state save/restore. + +Bug fixes and performance improvements + These usually do not introduce new APIs, but it's worth sharing + any benchmarks and tests that will validate your contribution, + ideally in the form of regression tests. Tests and benchmarks + can be included in either ``kvm-unit-tests`` or selftests, depending + on the specifics of your change. Selftests are especially useful for + regression tests because they are included directly in Linux's tree. + +Large scale internal changes + While it's difficult to provide a single policy, you should ensure that + the changed code is covered by either ``kvm-unit-tests`` or selftests. + In some cases the affected code is run for any guests and functional + tests suffice. Explain your testing process in the cover letter, + as that can help identify gaps in existing test suites. + +New APIs + It is important to demonstrate your use case. This can be as simple as + explaining that the feature is already in use on bare metal, or it can be + a proof-of-concept implementation in userspace. The latter need not be + open source, though that is of course preferrable for easier testing. + Selftests should test corner cases of the APIs, and should also cover + basic host and guest operation if no open source VMM uses the feature. + +Bigger features, usually spanning host and guest + These should be supported by Linux guests, with limited exceptions for + Hyper-V features that are testable on Windows guests. It is strongly + suggested that the feature be usable with an open source host VMM, such + as at least one of QEMU or crosvm, and guest firmware. Selftests should + test at least API error cases. Guest operation can be covered by + either selftests of ``kvm-unit-tests`` (this is especially important for + paravirtualized and Windows-only features). Strong selftest coverage + can also be a replacement for implementation in an open source VMM, + but this is generally not recommended. + +Following the above suggestions for testing in selftests and +``kvm-unit-tests`` will make it easier for the maintainers to review +and accept your code. In fact, even before you contribute your changes +upstream it will make it easier for you to develop for KVM. + +Of course, the KVM maintainers reserve the right to require more tests, +though they may also waive the requirement from time to time. diff --git a/Documentation/virt/kvm/x86/intel-tdx.rst b/Documentation/virt/kvm/x86/intel-tdx.rst index 76bdd95334d6..5efac62c92c7 100644 --- a/Documentation/virt/kvm/x86/intel-tdx.rst +++ b/Documentation/virt/kvm/x86/intel-tdx.rst @@ -79,7 +79,20 @@ to be configured to the TDX guest. struct kvm_tdx_capabilities { __u64 supported_attrs; __u64 supported_xfam; - __u64 reserved[254]; + + /* TDG.VP.VMCALL hypercalls executed in kernel and forwarded to + * userspace, respectively + */ + __u64 kernel_tdvmcallinfo_1_r11; + __u64 user_tdvmcallinfo_1_r11; + + /* TDG.VP.VMCALL instruction executions subfunctions executed in kernel + * and forwarded to userspace, respectively + */ + __u64 kernel_tdvmcallinfo_1_r12; + __u64 user_tdvmcallinfo_1_r12; + + __u64 reserved[250]; /* Configurable CPUID bits for userspace */ struct kvm_cpuid2 cpuid; |