summaryrefslogtreecommitdiff
path: root/Documentation/virt/kvm
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/virt/kvm')
-rw-r--r--Documentation/virt/kvm/api.rst144
-rw-r--r--Documentation/virt/kvm/devices/vcpu.rst24
-rw-r--r--Documentation/virt/kvm/review-checklist.rst95
-rw-r--r--Documentation/virt/kvm/x86/index.rst1
-rw-r--r--Documentation/virt/kvm/x86/intel-tdx.rst268
5 files changed, 519 insertions, 13 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 47c7c3f92314..544fb11351d9 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1411,6 +1411,9 @@ the memory region are automatically reflected into the guest. For example, an
mmap() that affects the region will be made visible immediately. Another
example is madvise(MADV_DROP).
+For TDX guest, deleting/moving memory region loses guest memory contents.
+Read only region isn't supported. Only as-id 0 is supported.
+
Note: On arm64, a write generated by the page-table walker (to update
the Access and Dirty flags, for example) never results in a
KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
@@ -2005,6 +2008,13 @@ If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also
be used as a vm ioctl to set the initial tsc frequency of subsequently
created vCPUs.
+For TSC protected Confidential Computing (CoCo) VMs where TSC frequency
+is configured once at VM scope and remains unchanged during VM's
+lifetime, the vm ioctl should be used to configure the TSC frequency
+and the vcpu ioctl is not supported.
+
+Example of such CoCo VMs: TDX guests.
+
4.56 KVM_GET_TSC_KHZ
--------------------
@@ -3460,7 +3470,8 @@ The initial values are defined as:
- FPSIMD/NEON registers: set to 0
- SVE registers: set to 0
- System registers: Reset to their architecturally defined
- values as for a warm reset to EL1 (resp. SVC)
+ values as for a warm reset to EL1 (resp. SVC) or EL2 (in the
+ case of EL2 being enabled).
Note that because some registers reflect machine topology, all vcpus
should be created before this ioctl is invoked.
@@ -3527,6 +3538,17 @@ Possible features:
- the KVM_REG_ARM64_SVE_VLS pseudo-register is immutable, and can
no longer be written using KVM_SET_ONE_REG.
+ - KVM_ARM_VCPU_HAS_EL2: Enable Nested Virtualisation support,
+ booting the guest from EL2 instead of EL1.
+ Depends on KVM_CAP_ARM_EL2.
+ The VM is running with HCR_EL2.E2H being RES1 (VHE) unless
+ KVM_ARM_VCPU_HAS_EL2_E2H0 is also set.
+
+ - KVM_ARM_VCPU_HAS_EL2_E2H0: Restrict Nested Virtualisation
+ support to HCR_EL2.E2H being RES0 (non-VHE).
+ Depends on KVM_CAP_ARM_EL2_E2H0.
+ KVM_ARM_VCPU_HAS_EL2 must also be set.
+
4.83 KVM_ARM_PREFERRED_TARGET
-----------------------------
@@ -4768,7 +4790,7 @@ H_GET_CPU_CHARACTERISTICS hypercall.
:Capability: basic
:Architectures: x86
-:Type: vm
+:Type: vm ioctl, vcpu ioctl
:Parameters: an opaque platform specific structure (in/out)
:Returns: 0 on success; -1 on error
@@ -4776,9 +4798,11 @@ If the platform supports creating encrypted VMs then this ioctl can be used
for issuing platform-specific memory encryption commands to manage those
encrypted VMs.
-Currently, this ioctl is used for issuing Secure Encrypted Virtualization
-(SEV) commands on AMD Processors. The SEV commands are defined in
-Documentation/virt/kvm/x86/amd-memory-encryption.rst.
+Currently, this ioctl is used for issuing both Secure Encrypted Virtualization
+(SEV) commands on AMD Processors and Trusted Domain Extensions (TDX) commands
+on Intel Processors. The detailed commands are defined in
+Documentation/virt/kvm/x86/amd-memory-encryption.rst and
+Documentation/virt/kvm/x86/intel-tdx.rst.
4.111 KVM_MEMORY_ENCRYPT_REG_REGION
-----------------------------------
@@ -6628,7 +6652,8 @@ to the byte array.
.. note::
For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
- KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
+ KVM_EXIT_EPR, KVM_EXIT_HYPERCALL, KVM_EXIT_TDX,
+ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN. The kernel side will first finish
incomplete operations and then check for pending signals.
@@ -6827,6 +6852,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
#define KVM_SYSTEM_EVENT_WAKEUP 4
#define KVM_SYSTEM_EVENT_SUSPEND 5
#define KVM_SYSTEM_EVENT_SEV_TERM 6
+ #define KVM_SYSTEM_EVENT_TDX_FATAL 7
__u32 type;
__u32 ndata;
__u64 data[16];
@@ -6853,6 +6879,11 @@ Valid values for 'type' are:
reset/shutdown of the VM.
- KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
The guest physical address of the guest's GHCB is stored in `data[0]`.
+ - KVM_SYSTEM_EVENT_TDX_FATAL -- a TDX guest reported a fatal error state.
+ KVM doesn't do any parsing or conversion, it just dumps 16 general-purpose
+ registers to userspace, in ascending order of the 4-bit indices for x86-64
+ general-purpose registers in instruction encoding, as defined in the Intel
+ SDM.
- KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
KVM has recognized a wakeup event. Userspace may honor this event by
marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
@@ -7153,6 +7184,69 @@ The valid value for 'flags' is:
::
+ /* KVM_EXIT_TDX */
+ struct {
+ __u64 flags;
+ __u64 nr;
+ union {
+ struct {
+ u64 ret;
+ u64 data[5];
+ } unknown;
+ struct {
+ u64 ret;
+ u64 gpa;
+ u64 size;
+ } get_quote;
+ struct {
+ u64 ret;
+ u64 leaf;
+ u64 r11, r12, r13, r14;
+ } get_tdvmcall_info;
+ struct {
+ u64 ret;
+ u64 vector;
+ } setup_event_notify;
+ };
+ } tdx;
+
+Process a TDVMCALL from the guest. KVM forwards select TDVMCALL based
+on the Guest-Hypervisor Communication Interface (GHCI) specification;
+KVM bridges these requests to the userspace VMM with minimal changes,
+placing the inputs in the union and copying them back to the guest
+on re-entry.
+
+Flags are currently always zero, whereas ``nr`` contains the TDVMCALL
+number from register R11. The remaining field of the union provide the
+inputs and outputs of the TDVMCALL. Currently the following values of
+``nr`` are defined:
+
+ * ``TDVMCALL_GET_QUOTE``: the guest has requested to generate a TD-Quote
+ signed by a service hosting TD-Quoting Enclave operating on the host.
+ Parameters and return value are in the ``get_quote`` field of the union.
+ The ``gpa`` field and ``size`` specify the guest physical address
+ (without the shared bit set) and the size of a shared-memory buffer, in
+ which the TDX guest passes a TD Report. The ``ret`` field represents
+ the return value of the GetQuote request. When the request has been
+ queued successfully, the TDX guest can poll the status field in the
+ shared-memory area to check whether the Quote generation is completed or
+ not. When completed, the generated Quote is returned via the same buffer.
+
+ * ``TDVMCALL_GET_TD_VM_CALL_INFO``: the guest has requested the support
+ status of TDVMCALLs. The output values for the given leaf should be
+ placed in fields from ``r11`` to ``r14`` of the ``get_tdvmcall_info``
+ field of the union.
+
+ * ``TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT``: the guest has requested to
+ set up a notification interrupt for vector ``vector``.
+
+KVM may add support for more values in the future that may cause a userspace
+exit, even without calls to ``KVM_ENABLE_CAP`` or similar. In this case,
+it will enter with output fields already valid; in the common case, the
+``unknown.ret`` field of the union will be ``TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED``.
+Userspace need not do anything if it does not wish to support a TDVMCALL.
+::
+
/* Fix the size of the union. */
char padding[256];
};
@@ -7978,6 +8072,11 @@ apply some other policy-based mitigation. When exiting to userspace, KVM sets
KVM_RUN_X86_BUS_LOCK in vcpu-run->flags, and conditionally sets the exit_reason
to KVM_EXIT_X86_BUS_LOCK.
+Due to differences in the underlying hardware implementation, the vCPU's RIP at
+the time of exit diverges between Intel and AMD. On Intel hosts, RIP points at
+the next instruction, i.e. the exit is trap-like. On AMD hosts, RIP points at
+the offending instruction, i.e. the exit is fault-like.
+
Note! Detected bus locks may be coincident with other exits to userspace, i.e.
KVM_RUN_X86_BUS_LOCK should be checked regardless of the primary exit reason if
userspace wants to take action on all detected bus locks.
@@ -8194,6 +8293,28 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the
and 0x489), as KVM does now allow them to
be set by userspace (KVM sets them based on
guest CPUID, for safety purposes).
+
+KVM_X86_QUIRK_IGNORE_GUEST_PAT By default, on Intel platforms, KVM ignores
+ guest PAT and forces the effective memory
+ type to WB in EPT. The quirk is not available
+ on Intel platforms which are incapable of
+ safely honoring guest PAT (i.e., without CPU
+ self-snoop, KVM always ignores guest PAT and
+ forces effective memory type to WB). It is
+ also ignored on AMD platforms or, on Intel,
+ when a VM has non-coherent DMA devices
+ assigned; KVM always honors guest PAT in
+ such case. The quirk is needed to avoid
+ slowdowns on certain Intel Xeon platforms
+ (e.g. ICX, SPR) where self-snoop feature is
+ supported but UC is slow enough to cause
+ issues with some older guests that use
+ UC instead of WC to map the video RAM.
+ Userspace can disable the quirk to honor
+ guest PAT if it knows that there is no such
+ guest software, for example if it does not
+ expose a bochs graphics device (which is
+ known to have had a buggy driver).
=================================== ============================================
7.32 KVM_CAP_MAX_VCPU_ID
@@ -8496,6 +8617,17 @@ aforementioned registers before the first KVM_RUN. These registers are VM
scoped, meaning that the same set of values are presented on all vCPUs in a
given VM.
+7.43 KVM_CAP_RISCV_MP_STATE_RESET
+---------------------------------
+
+:Architectures: riscv
+:Type: VM
+:Parameters: None
+:Returns: 0 on success, -EINVAL if arg[0] is not zero
+
+When this capability is enabled, KVM resets the VCPU when setting
+MP_STATE_INIT_RECEIVED through IOCTL. The original MP_STATE is preserved.
+
8. Other capabilities.
======================
diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 31a9576c07af..60bf205cb373 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -137,6 +137,30 @@ exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct by setting
hardare_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED and
the cpu field to the processor id.
+1.5 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS
+--------------------------------------------------
+
+:Parameters: in kvm_device_attr.addr the address to an unsigned int
+ representing the maximum value taken by PMCR_EL0.N
+
+:Returns:
+
+ ======= ====================================================
+ -EBUSY PMUv3 already initialized, a VCPU has already run or
+ an event filter has already been set
+ -EFAULT Error accessing the value pointed to by addr
+ -ENODEV PMUv3 not supported or GIC not initialized
+ -EINVAL No PMUv3 explicitly selected, or value of N out of
+ range
+ ======= ====================================================
+
+Set the number of implemented event counters in the virtual PMU. This
+mandates that a PMU has explicitly been selected via
+KVM_ARM_VCPU_PMU_V3_SET_PMU, and will fail when no PMU has been
+explicitly selected, or the number of counters is out of range for the
+selected PMU. Selecting a new PMU cancels the effect of setting this
+attribute.
+
2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
=================================
diff --git a/Documentation/virt/kvm/review-checklist.rst b/Documentation/virt/kvm/review-checklist.rst
index dc01aea4057b..debac54e14e7 100644
--- a/Documentation/virt/kvm/review-checklist.rst
+++ b/Documentation/virt/kvm/review-checklist.rst
@@ -7,7 +7,7 @@ Review checklist for kvm patches
1. The patch must follow Documentation/process/coding-style.rst and
Documentation/process/submitting-patches.rst.
-2. Patches should be against kvm.git master branch.
+2. Patches should be against kvm.git master or next branches.
3. If the patch introduces or modifies a new userspace API:
- the API must be documented in Documentation/virt/kvm/api.rst
@@ -18,10 +18,10 @@ Review checklist for kvm patches
5. New features must default to off (userspace should explicitly request them).
Performance improvements can and should default to on.
-6. New cpu features should be exposed via KVM_GET_SUPPORTED_CPUID2
+6. New cpu features should be exposed via KVM_GET_SUPPORTED_CPUID2,
+ or its equivalent for non-x86 architectures
-7. Emulator changes should be accompanied by unit tests for qemu-kvm.git
- kvm/test directory.
+7. The feature should be testable (see below).
8. Changes should be vendor neutral when possible. Changes to common code
are better than duplicating changes to vendor code.
@@ -36,6 +36,87 @@ Review checklist for kvm patches
11. New guest visible features must either be documented in a hardware manual
or be accompanied by documentation.
-12. Features must be robust against reset and kexec - for example, shared
- host/guest memory must be unshared to prevent the host from writing to
- guest memory that the guest has not reserved for this purpose.
+Testing of KVM code
+-------------------
+
+All features contributed to KVM, and in many cases bugfixes too, should be
+accompanied by some kind of tests and/or enablement in open source guests
+and VMMs. KVM is covered by multiple test suites:
+
+*Selftests*
+ These are low level tests that allow granular testing of kernel APIs.
+ This includes API failure scenarios, invoking APIs after specific
+ guest instructions, and testing multiple calls to ``KVM_CREATE_VM``
+ within a single test. They are included in the kernel tree at
+ ``tools/testing/selftests/kvm``.
+
+``kvm-unit-tests``
+ A collection of small guests that test CPU and emulated device features
+ from a guest's perspective. They run under QEMU or ``kvmtool``, and
+ are generally not KVM-specific: they can be run with any accelerator
+ that QEMU support or even on bare metal, making it possible to compare
+ behavior across hypervisors and processor families.
+
+Functional test suites
+ Various sets of functional tests exist, such as QEMU's ``tests/functional``
+ suite and `avocado-vt <https://avocado-vt.readthedocs.io/en/latest/>`__.
+ These typically involve running a full operating system in a virtual
+ machine.
+
+The best testing approach depends on the feature's complexity and
+operation. Here are some examples and guidelines:
+
+New instructions (no new registers or APIs)
+ The corresponding CPU features (if applicable) should be made available
+ in QEMU. If the instructions require emulation support or other code in
+ KVM, it is worth adding coverage to ``kvm-unit-tests`` or selftests;
+ the latter can be a better choice if the instructions relate to an API
+ that already has good selftest coverage.
+
+New hardware features (new registers, no new APIs)
+ These should be tested via ``kvm-unit-tests``; this more or less implies
+ supporting them in QEMU and/or ``kvmtool``. In some cases selftests
+ can be used instead, similar to the previous case, or specifically to
+ test corner cases in guest state save/restore.
+
+Bug fixes and performance improvements
+ These usually do not introduce new APIs, but it's worth sharing
+ any benchmarks and tests that will validate your contribution,
+ ideally in the form of regression tests. Tests and benchmarks
+ can be included in either ``kvm-unit-tests`` or selftests, depending
+ on the specifics of your change. Selftests are especially useful for
+ regression tests because they are included directly in Linux's tree.
+
+Large scale internal changes
+ While it's difficult to provide a single policy, you should ensure that
+ the changed code is covered by either ``kvm-unit-tests`` or selftests.
+ In some cases the affected code is run for any guests and functional
+ tests suffice. Explain your testing process in the cover letter,
+ as that can help identify gaps in existing test suites.
+
+New APIs
+ It is important to demonstrate your use case. This can be as simple as
+ explaining that the feature is already in use on bare metal, or it can be
+ a proof-of-concept implementation in userspace. The latter need not be
+ open source, though that is of course preferrable for easier testing.
+ Selftests should test corner cases of the APIs, and should also cover
+ basic host and guest operation if no open source VMM uses the feature.
+
+Bigger features, usually spanning host and guest
+ These should be supported by Linux guests, with limited exceptions for
+ Hyper-V features that are testable on Windows guests. It is strongly
+ suggested that the feature be usable with an open source host VMM, such
+ as at least one of QEMU or crosvm, and guest firmware. Selftests should
+ test at least API error cases. Guest operation can be covered by
+ either selftests of ``kvm-unit-tests`` (this is especially important for
+ paravirtualized and Windows-only features). Strong selftest coverage
+ can also be a replacement for implementation in an open source VMM,
+ but this is generally not recommended.
+
+Following the above suggestions for testing in selftests and
+``kvm-unit-tests`` will make it easier for the maintainers to review
+and accept your code. In fact, even before you contribute your changes
+upstream it will make it easier for you to develop for KVM.
+
+Of course, the KVM maintainers reserve the right to require more tests,
+though they may also waive the requirement from time to time.
diff --git a/Documentation/virt/kvm/x86/index.rst b/Documentation/virt/kvm/x86/index.rst
index 9ece6b8dc817..851e99174762 100644
--- a/Documentation/virt/kvm/x86/index.rst
+++ b/Documentation/virt/kvm/x86/index.rst
@@ -11,6 +11,7 @@ KVM for x86 systems
cpuid
errata
hypercalls
+ intel-tdx
mmu
msr
nested-vmx
diff --git a/Documentation/virt/kvm/x86/intel-tdx.rst b/Documentation/virt/kvm/x86/intel-tdx.rst
new file mode 100644
index 000000000000..5efac62c92c7
--- /dev/null
+++ b/Documentation/virt/kvm/x86/intel-tdx.rst
@@ -0,0 +1,268 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Intel Trust Domain Extensions (TDX)
+===================================
+
+Overview
+========
+Intel's Trust Domain Extensions (TDX) protect confidential guest VMs from the
+host and physical attacks. A CPU-attested software module called 'the TDX
+module' runs inside a new CPU isolated range to provide the functionalities to
+manage and run protected VMs, a.k.a, TDX guests or TDs.
+
+Please refer to [1] for the whitepaper, specifications and other resources.
+
+This documentation describes TDX-specific KVM ABIs. The TDX module needs to be
+initialized before it can be used by KVM to run any TDX guests. The host
+core-kernel provides the support of initializing the TDX module, which is
+described in the Documentation/arch/x86/tdx.rst.
+
+API description
+===============
+
+KVM_MEMORY_ENCRYPT_OP
+---------------------
+:Type: vm ioctl, vcpu ioctl
+
+For TDX operations, KVM_MEMORY_ENCRYPT_OP is re-purposed to be generic
+ioctl with TDX specific sub-ioctl() commands.
+
+::
+
+ /* Trust Domain Extensions sub-ioctl() commands. */
+ enum kvm_tdx_cmd_id {
+ KVM_TDX_CAPABILITIES = 0,
+ KVM_TDX_INIT_VM,
+ KVM_TDX_INIT_VCPU,
+ KVM_TDX_INIT_MEM_REGION,
+ KVM_TDX_FINALIZE_VM,
+ KVM_TDX_GET_CPUID,
+
+ KVM_TDX_CMD_NR_MAX,
+ };
+
+ struct kvm_tdx_cmd {
+ /* enum kvm_tdx_cmd_id */
+ __u32 id;
+ /* flags for sub-command. If sub-command doesn't use this, set zero. */
+ __u32 flags;
+ /*
+ * data for each sub-command. An immediate or a pointer to the actual
+ * data in process virtual address. If sub-command doesn't use it,
+ * set zero.
+ */
+ __u64 data;
+ /*
+ * Auxiliary error code. The sub-command may return TDX SEAMCALL
+ * status code in addition to -Exxx.
+ */
+ __u64 hw_error;
+ };
+
+KVM_TDX_CAPABILITIES
+--------------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Return the TDX capabilities that current KVM supports with the specific TDX
+module loaded in the system. It reports what features/capabilities are allowed
+to be configured to the TDX guest.
+
+- id: KVM_TDX_CAPABILITIES
+- flags: must be 0
+- data: pointer to struct kvm_tdx_capabilities
+- hw_error: must be 0
+
+::
+
+ struct kvm_tdx_capabilities {
+ __u64 supported_attrs;
+ __u64 supported_xfam;
+
+ /* TDG.VP.VMCALL hypercalls executed in kernel and forwarded to
+ * userspace, respectively
+ */
+ __u64 kernel_tdvmcallinfo_1_r11;
+ __u64 user_tdvmcallinfo_1_r11;
+
+ /* TDG.VP.VMCALL instruction executions subfunctions executed in kernel
+ * and forwarded to userspace, respectively
+ */
+ __u64 kernel_tdvmcallinfo_1_r12;
+ __u64 user_tdvmcallinfo_1_r12;
+
+ __u64 reserved[250];
+
+ /* Configurable CPUID bits for userspace */
+ struct kvm_cpuid2 cpuid;
+ };
+
+
+KVM_TDX_INIT_VM
+---------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Perform TDX specific VM initialization. This needs to be called after
+KVM_CREATE_VM and before creating any VCPUs.
+
+- id: KVM_TDX_INIT_VM
+- flags: must be 0
+- data: pointer to struct kvm_tdx_init_vm
+- hw_error: must be 0
+
+::
+
+ struct kvm_tdx_init_vm {
+ __u64 attributes;
+ __u64 xfam;
+ __u64 mrconfigid[6]; /* sha384 digest */
+ __u64 mrowner[6]; /* sha384 digest */
+ __u64 mrownerconfig[6]; /* sha384 digest */
+
+ /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */
+ __u64 reserved[12];
+
+ /*
+ * Call KVM_TDX_INIT_VM before vcpu creation, thus before
+ * KVM_SET_CPUID2.
+ * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the
+ * TDX module directly virtualizes those CPUIDs without VMM. The user
+ * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
+ * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of
+ * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX
+ * module doesn't virtualize.
+ */
+ struct kvm_cpuid2 cpuid;
+ };
+
+
+KVM_TDX_INIT_VCPU
+-----------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Perform TDX specific VCPU initialization.
+
+- id: KVM_TDX_INIT_VCPU
+- flags: must be 0
+- data: initial value of the guest TD VCPU RCX
+- hw_error: must be 0
+
+KVM_TDX_INIT_MEM_REGION
+-----------------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Initialize @nr_pages TDX guest private memory starting from @gpa with userspace
+provided data from @source_addr.
+
+Note, before calling this sub command, memory attribute of the range
+[gpa, gpa + nr_pages] needs to be private. Userspace can use
+KVM_SET_MEMORY_ATTRIBUTES to set the attribute.
+
+If KVM_TDX_MEASURE_MEMORY_REGION flag is specified, it also extends measurement.
+
+- id: KVM_TDX_INIT_MEM_REGION
+- flags: currently only KVM_TDX_MEASURE_MEMORY_REGION is defined
+- data: pointer to struct kvm_tdx_init_mem_region
+- hw_error: must be 0
+
+::
+
+ #define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0)
+
+ struct kvm_tdx_init_mem_region {
+ __u64 source_addr;
+ __u64 gpa;
+ __u64 nr_pages;
+ };
+
+
+KVM_TDX_FINALIZE_VM
+-------------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Complete measurement of the initial TD contents and mark it ready to run.
+
+- id: KVM_TDX_FINALIZE_VM
+- flags: must be 0
+- data: must be 0
+- hw_error: must be 0
+
+
+KVM_TDX_GET_CPUID
+-----------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Get the CPUID values that the TDX module virtualizes for the TD guest.
+When it returns -E2BIG, the user space should allocate a larger buffer and
+retry. The minimum buffer size is updated in the nent field of the
+struct kvm_cpuid2.
+
+- id: KVM_TDX_GET_CPUID
+- flags: must be 0
+- data: pointer to struct kvm_cpuid2 (in/out)
+- hw_error: must be 0 (out)
+
+::
+
+ struct kvm_cpuid2 {
+ __u32 nent;
+ __u32 padding;
+ struct kvm_cpuid_entry2 entries[0];
+ };
+
+ struct kvm_cpuid_entry2 {
+ __u32 function;
+ __u32 index;
+ __u32 flags;
+ __u32 eax;
+ __u32 ebx;
+ __u32 ecx;
+ __u32 edx;
+ __u32 padding[3];
+ };
+
+KVM TDX creation flow
+=====================
+In addition to the standard KVM flow, new TDX ioctls need to be called. The
+control flow is as follows:
+
+#. Check system wide capability
+
+ * KVM_CAP_VM_TYPES: Check if VM type is supported and if KVM_X86_TDX_VM
+ is supported.
+
+#. Create VM
+
+ * KVM_CREATE_VM
+ * KVM_TDX_CAPABILITIES: Query TDX capabilities for creating TDX guests.
+ * KVM_CHECK_EXTENSION(KVM_CAP_MAX_VCPUS): Query maximum VCPUs the TD can
+ support at VM level (TDX has its own limitation on this).
+ * KVM_SET_TSC_KHZ: Configure TD's TSC frequency if a different TSC frequency
+ than host is desired. This is Optional.
+ * KVM_TDX_INIT_VM: Pass TDX specific VM parameters.
+
+#. Create VCPU
+
+ * KVM_CREATE_VCPU
+ * KVM_TDX_INIT_VCPU: Pass TDX specific VCPU parameters.
+ * KVM_SET_CPUID2: Configure TD's CPUIDs.
+ * KVM_SET_MSRS: Configure TD's MSRs.
+
+#. Initialize initial guest memory
+
+ * Prepare content of initial guest memory.
+ * KVM_TDX_INIT_MEM_REGION: Add initial guest memory.
+ * KVM_TDX_FINALIZE_VM: Finalize the measurement of the TDX guest.
+
+#. Run VCPU
+
+References
+==========
+
+https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/documentation.html