Diffstat (limited to 'Documentation/virt/kvm/x86')
-rw-r--r-- Documentation/virt/kvm/x86/amd-memory-encryption.rst | 169
-rw-r--r-- Documentation/virt/kvm/x86/errata.rst | 30
-rw-r--r-- Documentation/virt/kvm/x86/index.rst | 1
-rw-r--r-- Documentation/virt/kvm/x86/intel-tdx.rst | 255
4 files changed, 451 insertions, 4 deletions
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 84335d119ff1..1ddb6a86ce7f 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -76,15 +76,56 @@ are defined in ``<linux/psp-dev.h>``.
KVM implements the following commands to support common lifecycle events of SEV
guests, such as launching, running, snapshotting, migrating and decommissioning.
-1. KVM_SEV_INIT
----------------
+1. KVM_SEV_INIT2
+----------------
-The KVM_SEV_INIT command is used by the hypervisor to initialize the SEV platform
+The KVM_SEV_INIT2 command is used by the hypervisor to initialize the SEV platform
context. In a typical workflow, this command should be the first command issued.
+For this command to be accepted, either KVM_X86_SEV_VM or KVM_X86_SEV_ES_VM
+must have been passed to the KVM_CREATE_VM ioctl. A virtual machine created
+with those machine types in turn cannot be run until KVM_SEV_INIT2 is invoked.
+
+Parameters: struct kvm_sev_init (in)
Returns: 0 on success, -negative on error
+::
+
+ struct kvm_sev_init {
+ __u64 vmsa_features; /* initial value of features field in VMSA */
+ __u32 flags; /* must be 0 */
+ __u16 ghcb_version; /* maximum guest GHCB version allowed */
+ __u16 pad1;
+ __u32 pad2[8];
+ };
+
+It is an error if the hypervisor does not support any of the bits that
+are set in ``flags`` or ``vmsa_features``. ``vmsa_features`` must be
+0 for SEV virtual machines, as they do not have a VMSA.
+
+``ghcb_version`` must be 0 for SEV virtual machines, as they do not issue GHCB
+requests. If ``ghcb_version`` is 0 for any other guest type, then the maximum
+allowed guest GHCB protocol will default to version 2.
+
+This command replaces the deprecated KVM_SEV_INIT and KVM_SEV_ES_INIT commands.
+Those commands take no parameters (the ``data`` field is unused) and
+only work for the KVM_X86_DEFAULT_VM machine type (0).
+
+They behave as if:
+
+* the VM type is KVM_X86_SEV_VM for KVM_SEV_INIT, or KVM_X86_SEV_ES_VM for
+ KVM_SEV_ES_INIT
+
+* the ``flags`` and ``vmsa_features`` fields of ``struct kvm_sev_init`` are
+ set to zero, and ``ghcb_version`` is set to 0 for KVM_SEV_INIT and 1 for
+ KVM_SEV_ES_INIT.
+
+If the ``KVM_X86_SEV_VMSA_FEATURES`` attribute does not exist, the hypervisor only
+supports KVM_SEV_INIT and KVM_SEV_ES_INIT. In that case, note that KVM_SEV_ES_INIT
+might set the debug swap VMSA feature (bit 5) depending on the value of the
+``debug_swap`` parameter of ``kvm-amd.ko``.
+
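The constraints on ``struct kvm_sev_init`` described above can be sketched as a small userspace pre-check. This is a minimal illustration, not KVM's implementation: ``struct sev_init_args``, ``enum sev_vm_kind`` and both helper functions are hypothetical names, the struct is a local copy of the documented layout rather than the one from ``<linux/kvm.h>``, and ``supported_vmsa_features`` stands for whatever mask the ``KVM_X86_SEV_VMSA_FEATURES`` attribute reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local, illustrative mirror of struct kvm_sev_init as documented above. */
struct sev_init_args {
	uint64_t vmsa_features;	/* initial value of features field in VMSA */
	uint32_t flags;		/* must be 0 */
	uint16_t ghcb_version;	/* maximum guest GHCB version allowed */
	uint16_t pad1;
	uint32_t pad2[8];
};

static_assert(sizeof(struct sev_init_args) == 48, "unexpected layout");

/* Hypothetical stand-ins for KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM. */
enum sev_vm_kind { SEV_KIND_SEV, SEV_KIND_SEV_ES };

/* Return true if the arguments satisfy the rules stated in the text. */
bool sev_init_args_valid(const struct sev_init_args *a, enum sev_vm_kind kind,
			 uint64_t supported_vmsa_features)
{
	if (a->flags != 0)
		return false;	/* no flag bits are documented as valid */
	if (a->vmsa_features & ~supported_vmsa_features)
		return false;	/* unsupported VMSA feature requested */
	if (kind == SEV_KIND_SEV && (a->vmsa_features || a->ghcb_version))
		return false;	/* plain SEV has no VMSA and no GHCB */
	return true;
}

/* A ghcb_version of 0 on a non-SEV guest type defaults to version 2. */
unsigned int sev_effective_ghcb_version(const struct sev_init_args *a,
					enum sev_vm_kind kind)
{
	if (kind == SEV_KIND_SEV)
		return 0;	/* plain SEV guests do not issue GHCB requests */
	return a->ghcb_version ? a->ghcb_version : 2;
}
```

A VMM could run a check like this before issuing KVM_SEV_INIT2 to report a clearer error than the ioctl's return value alone; the kernel remains the authority on what is accepted.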
2. KVM_SEV_LAUNCH_START
-----------------------
@@ -425,6 +466,124 @@ issued by the hypervisor to make the guest ready for execution.
Returns: 0 on success, -negative on error
+18. KVM_SEV_SNP_LAUNCH_START
+----------------------------
+
+The KVM_SEV_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. It must be called prior to issuing
+KVM_SEV_SNP_LAUNCH_UPDATE or KVM_SEV_SNP_LAUNCH_FINISH.
+
+Parameters (in): struct kvm_sev_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_start {
+ __u64 policy; /* Guest policy to use. */
+ __u8 gosvw[16]; /* Guest OS visible workarounds. */
+ __u16 flags; /* Must be zero. */
+ __u8 pad0[6];
+ __u64 pad1[4];
+ };
+
+See SNP_LAUNCH_START in the SEV-SNP specification [snp-fw-abi]_ for further
+details on the input parameters in ``struct kvm_sev_snp_launch_start``.
+
+19. KVM_SEV_SNP_LAUNCH_UPDATE
+-----------------------------
+
+The KVM_SEV_SNP_LAUNCH_UPDATE command loads userspace-provided data into a
+guest GPA range, measures the contents into the SNP guest context created by
+KVM_SEV_SNP_LAUNCH_START, and then encrypts/validates that GPA range so that
+it is immediately readable using the encryption key associated with the guest
+context once the guest is booted. From that point, the guest can attest the
+measurement associated with its context before unlocking any secrets.
+
+It is required that the GPA ranges initialized by this command have had the
+KVM_MEMORY_ATTRIBUTE_PRIVATE attribute set in advance. See the documentation
+for KVM_SET_MEMORY_ATTRIBUTES for more details on this aspect.
+
+Upon success, this command is not guaranteed to have processed the entire
+range requested. Instead, the ``gfn_start``, ``uaddr``, and ``len`` fields of
+``struct kvm_sev_snp_launch_update`` will be updated to correspond to the
+remaining range that has yet to be processed. The caller should continue
+calling this command until those fields indicate the entire range has been
+processed, e.g. ``len`` is 0, ``gfn_start`` is equal to the last GFN in the
+range plus 1, and ``uaddr`` is the last byte of the userspace-provided source
+buffer address plus 1. In the case where ``type`` is KVM_SEV_SNP_PAGE_TYPE_ZERO,
+``uaddr`` will be ignored completely.
+
+Parameters (in): struct kvm_sev_snp_launch_update
+
+Returns: 0 on success, < 0 on error, -EAGAIN if caller should retry
+
+::
+
+ struct kvm_sev_snp_launch_update {
+ __u64 gfn_start; /* Guest page number to load/encrypt data into. */
+ __u64 uaddr; /* Userspace address of data to be loaded/encrypted. */
+ __u64 len; /* 4k-aligned length in bytes to copy into guest memory.*/
+ __u8 type; /* The type of the guest pages being initialized. */
+ __u8 pad0;
+ __u16 flags; /* Must be zero. */
+ __u32 pad1;
+ __u64 pad2[4];
+
+ };
+
+where the allowed values for ``type`` are #define'd as::
+
+ KVM_SEV_SNP_PAGE_TYPE_NORMAL
+ KVM_SEV_SNP_PAGE_TYPE_ZERO
+ KVM_SEV_SNP_PAGE_TYPE_UNMEASURED
+ KVM_SEV_SNP_PAGE_TYPE_SECRETS
+ KVM_SEV_SNP_PAGE_TYPE_CPUID
+
+See the SEV-SNP spec [snp-fw-abi]_ for further details on how each page type is
+used/measured.
+
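The partial-progress contract of KVM_SEV_SNP_LAUNCH_UPDATE (call again until ``len`` reaches 0, retry on -EAGAIN) can be sketched as the loop below. This is a minimal sketch under stated assumptions: ``struct snp_launch_update_args`` is a local copy of the documented layout, ``snp_update_fn`` is a hypothetical callback that issues the command once and returns 0 or a negative errno value, and ``fake_issue_one_page`` is only a stand-in for the real ioctl so the loop can be exercised without SNP hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local, illustrative mirror of struct kvm_sev_snp_launch_update. */
struct snp_launch_update_args {
	uint64_t gfn_start;
	uint64_t uaddr;
	uint64_t len;
	uint8_t type;
	uint8_t pad0;
	uint16_t flags;		/* must be zero */
	uint32_t pad1;
	uint64_t pad2[4];
};

static_assert(sizeof(struct snp_launch_update_args) == 64, "unexpected layout");

/* Hypothetical hook: issue the command once, return 0 or -errno. */
typedef int (*snp_update_fn)(struct snp_launch_update_args *args, void *ctx);

/* Keep issuing the command until the whole range has been processed. */
int snp_launch_update_all(struct snp_launch_update_args *args,
			  snp_update_fn issue, void *ctx)
{
	while (args->len != 0) {
		int ret = issue(args, ctx);

		if (ret == -EAGAIN)
			continue;	/* documented: caller should retry */
		if (ret < 0)
			return ret;	/* any other error is final */
		/* ret == 0: gfn_start, uaddr and len now describe what is left */
	}
	return 0;
}

/* Stand-in for the real ioctl: one retry, then one 4 KiB page per call. */
int fake_issue_one_page(struct snp_launch_update_args *args, void *ctx)
{
	unsigned int *calls = ctx;

	if ((*calls)++ == 0)
		return -EAGAIN;
	args->gfn_start += 1;
	args->uaddr += 4096;
	args->len -= 4096;
	return 0;
}
```

In a real VMM the callback would wrap the KVM_MEMORY_ENCRYPT_OP ioctl and translate a ``-1``/``errno`` failure into a negative errno value. The loop assumes each successful call makes forward progress.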
+20. KVM_SEV_SNP_LAUNCH_FINISH
+-----------------------------
+
+After completion of the SNP guest launch flow, the KVM_SEV_SNP_LAUNCH_FINISH
+command can be issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 vcek_disabled;
+ __u8 host_data[32];
+ __u8 pad0[3];
+ __u16 flags; /* Must be zero */
+ __u64 pad1[4];
+ };
+
+
+See SNP_LAUNCH_FINISH in the SEV-SNP specification [snp-fw-abi]_ for further
+details on the input parameters in ``struct kvm_sev_snp_launch_finish``.
+
+Device attribute API
+====================
+
+Attributes of the SEV implementation can be retrieved through the
+``KVM_HAS_DEVICE_ATTR`` and ``KVM_GET_DEVICE_ATTR`` ioctls on the ``/dev/kvm``
+device node, using group ``KVM_X86_GRP_SEV``.
+
+Currently only one attribute is implemented:
+
+* ``KVM_X86_SEV_VMSA_FEATURES``: return the set of all bits that
+ are accepted in the ``vmsa_features`` of ``KVM_SEV_INIT2``.
+
Firmware Management
===================
@@ -444,9 +603,11 @@ References
==========
-See [white-paper]_, [api-spec]_, [amd-apm]_ and [kvm-forum]_ for more info.
+See [white-paper]_, [api-spec]_, [amd-apm]_, [kvm-forum]_, and [snp-fw-abi]_
+for more info.
.. [white-paper] https://developer.amd.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
.. [api-spec] https://support.amd.com/TechDocs/55766_SEV-KM_API_Specification.pdf
.. [amd-apm] https://support.amd.com/TechDocs/24593.pdf (section 15.34)
.. [kvm-forum] https://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
+.. [snp-fw-abi] https://www.amd.com/system/files/TechDocs/56860.pdf
diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
index 49a05f24747b..37c79362a48f 100644
--- a/Documentation/virt/kvm/x86/errata.rst
+++ b/Documentation/virt/kvm/x86/errata.rst
@@ -33,6 +33,18 @@ Note however that any software (e.g ``WIN87EM.DLL``) expecting these features
to be present likely predates these CPUID feature bits, and therefore
doesn't know to check for them anyway.
+``KVM_SET_VCPU_EVENTS`` issue
+-----------------------------
+
+Invalid KVM_SET_VCPU_EVENTS input with respect to error codes *may* result in
+failed VM-Entry on Intel CPUs. Pre-CET Intel CPUs require that exception
+injection through the VMCS correctly set the "error code valid" flag, e.g.
+require the flag be set when injecting a #GP, clear when injecting a #UD,
+clear when injecting a soft exception, etc. Intel CPUs that enumerate
+IA32_VMX_BASIC[56] as '1' relax VMX's consistency checks, and AMD CPUs have no
+restrictions whatsoever. KVM_SET_VCPU_EVENTS doesn't sanity check the vector
+versus "has_error_code", i.e. KVM's ABI follows AMD behavior.
+
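Because KVM_SET_VCPU_EVENTS does not cross-check the vector against "has_error_code", userspace may want its own sanity check before injecting. The sketch below is an illustration only: both function names are hypothetical, and the vector list covers the architectural exceptions that push an error code on pre-CET CPUs in protected mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True for hardware exception vectors that deliver an error code. */
bool vector_has_error_code(unsigned int vector)
{
	/* #DF(8), #TS(10), #NP(11), #SS(12), #GP(13), #PF(14), #AC(17) */
	const uint32_t mask = (1u << 8) | (1u << 10) | (1u << 11) |
			      (1u << 12) | (1u << 13) | (1u << 14) |
			      (1u << 17);

	return vector < 32 && (mask & (1u << vector)) != 0;
}

/*
 * Check that a (vector, has_error_code) pair, as it would be placed in the
 * exception fields of KVM_SET_VCPU_EVENTS input, is self-consistent.
 */
bool exception_error_code_consistent(uint8_t nr, uint8_t has_error_code)
{
	return vector_has_error_code(nr) == (has_error_code != 0);
}
```

For example, a #GP (vector 13) with ``has_error_code`` set passes the check, while a #UD (vector 6) with ``has_error_code`` set does not.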
Nested virtualization features
------------------------------
@@ -48,3 +60,21 @@ have the same physical APIC ID, KVM will deliver events targeting that APIC ID
only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is
not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
matching the target APIC ID receive the interrupt).
+
+MTRRs
+-----
+KVM does not virtualize guest MTRR memory types. KVM emulates accesses to MTRR
+MSRs, i.e. {RD,WR}MSR in the guest will behave as expected, but KVM does not
+honor guest MTRRs when determining the effective memory type, and instead
+treats all of guest memory as having Writeback (WB) MTRRs.
+
+CR0.CD
+------
+KVM does not virtualize CR0.CD on Intel CPUs. Similar to MTRR MSRs, KVM
+emulates CR0.CD accesses so that loads and stores from/to CR0 behave as
+expected, but setting CR0.CD=1 has no impact on the cacheability of guest
+memory.
+
+Note, this erratum does not affect AMD CPUs, which fully virtualize CR0.CD in
+hardware, i.e. put the CPU caches into "no fill" mode when CR0.CD=1, even when
+running in the guest.
\ No newline at end of file
diff --git a/Documentation/virt/kvm/x86/index.rst b/Documentation/virt/kvm/x86/index.rst
index 9ece6b8dc817..851e99174762 100644
--- a/Documentation/virt/kvm/x86/index.rst
+++ b/Documentation/virt/kvm/x86/index.rst
@@ -11,6 +11,7 @@ KVM for x86 systems
cpuid
errata
hypercalls
+ intel-tdx
mmu
msr
nested-vmx
diff --git a/Documentation/virt/kvm/x86/intel-tdx.rst b/Documentation/virt/kvm/x86/intel-tdx.rst
new file mode 100644
index 000000000000..76bdd95334d6
--- /dev/null
+++ b/Documentation/virt/kvm/x86/intel-tdx.rst
@@ -0,0 +1,255 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Intel Trust Domain Extensions (TDX)
+===================================
+
+Overview
+========
+Intel's Trust Domain Extensions (TDX) protect confidential guest VMs from the
+host and physical attacks. A CPU-attested software module called 'the TDX
+module' runs inside a new CPU isolated range to provide the functionality to
+manage and run protected VMs, a.k.a. TDX guests or TDs.
+
+Please refer to the References section for the whitepaper, specifications and
+other resources.
+
+This documentation describes TDX-specific KVM ABIs. The TDX module needs to be
+initialized before it can be used by KVM to run any TDX guests. The host
+core kernel provides support for initializing the TDX module, which is
+described in Documentation/arch/x86/tdx.rst.
+
+API description
+===============
+
+KVM_MEMORY_ENCRYPT_OP
+---------------------
+:Type: vm ioctl, vcpu ioctl
+
+For TDX operations, KVM_MEMORY_ENCRYPT_OP is repurposed as a generic
+ioctl with TDX-specific sub-ioctl() commands.
+
+::
+
+ /* Trust Domain Extensions sub-ioctl() commands. */
+ enum kvm_tdx_cmd_id {
+ KVM_TDX_CAPABILITIES = 0,
+ KVM_TDX_INIT_VM,
+ KVM_TDX_INIT_VCPU,
+ KVM_TDX_INIT_MEM_REGION,
+ KVM_TDX_FINALIZE_VM,
+ KVM_TDX_GET_CPUID,
+
+ KVM_TDX_CMD_NR_MAX,
+ };
+
+ struct kvm_tdx_cmd {
+ /* enum kvm_tdx_cmd_id */
+ __u32 id;
+ /* flags for sub-command. If sub-command doesn't use this, set zero. */
+ __u32 flags;
+ /*
+ * data for each sub-command. An immediate or a pointer to the actual
+ * data in process virtual address. If sub-command doesn't use it,
+ * set zero.
+ */
+ __u64 data;
+ /*
+ * Auxiliary error code. The sub-command may return TDX SEAMCALL
+ * status code in addition to -Exxx.
+ */
+ __u64 hw_error;
+ };
+
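How the ``data`` field carries either a pointer or an immediate can be sketched with two small helpers. This is an illustration under assumptions: ``struct tdx_cmd`` and ``enum tdx_cmd_id`` are local copies of the layout and IDs listed above, and the helper names are hypothetical. A real VMM would use the definitions from ``<linux/kvm.h>`` and pass the command to the KVM_MEMORY_ENCRYPT_OP ioctl.

```c
#include <assert.h>
#include <stdint.h>

/* Sub-command IDs, mirroring enum kvm_tdx_cmd_id as listed above. */
enum tdx_cmd_id {
	TDX_CAPABILITIES = 0,
	TDX_INIT_VM,
	TDX_INIT_VCPU,
	TDX_INIT_MEM_REGION,
	TDX_FINALIZE_VM,
	TDX_GET_CPUID,
	TDX_CMD_NR_MAX,
};

/* Local, illustrative mirror of struct kvm_tdx_cmd. */
struct tdx_cmd {
	uint32_t id;
	uint32_t flags;
	uint64_t data;
	uint64_t hw_error;
};

static_assert(sizeof(struct tdx_cmd) == 24, "unexpected layout");

/* Build a command whose data field points at a payload in process memory. */
struct tdx_cmd tdx_cmd_with_payload(enum tdx_cmd_id id, uint32_t flags,
				    const void *payload)
{
	struct tdx_cmd cmd = {
		.id = (uint32_t)id,
		.flags = flags,
		.data = (uint64_t)(uintptr_t)payload,
		.hw_error = 0,	/* documented as "must be 0" on input */
	};
	return cmd;
}

/* Build a command whose data field is an immediate (0 when unused). */
struct tdx_cmd tdx_cmd_with_immediate(enum tdx_cmd_id id, uint64_t value)
{
	struct tdx_cmd cmd = {
		.id = (uint32_t)id,
		.flags = 0,
		.data = value,
		.hw_error = 0,
	};
	return cmd;
}
```

With these helpers, a KVM_TDX_INIT_VM command would use the payload form, while KVM_TDX_INIT_VCPU (initial RCX value) and KVM_TDX_FINALIZE_VM (zero) would use the immediate form.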
+KVM_TDX_CAPABILITIES
+--------------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Return the TDX capabilities that the current KVM supports with the specific TDX
+module loaded in the system. It reports which features/capabilities can be
+configured for the TDX guest.
+
+- id: KVM_TDX_CAPABILITIES
+- flags: must be 0
+- data: pointer to struct kvm_tdx_capabilities
+- hw_error: must be 0
+
+::
+
+ struct kvm_tdx_capabilities {
+ __u64 supported_attrs;
+ __u64 supported_xfam;
+ __u64 reserved[254];
+
+ /* Configurable CPUID bits for userspace */
+ struct kvm_cpuid2 cpuid;
+ };
+
+
+KVM_TDX_INIT_VM
+---------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Perform TDX specific VM initialization. This needs to be called after
+KVM_CREATE_VM and before creating any VCPUs.
+
+- id: KVM_TDX_INIT_VM
+- flags: must be 0
+- data: pointer to struct kvm_tdx_init_vm
+- hw_error: must be 0
+
+::
+
+ struct kvm_tdx_init_vm {
+ __u64 attributes;
+ __u64 xfam;
+ __u64 mrconfigid[6]; /* sha384 digest */
+ __u64 mrowner[6]; /* sha384 digest */
+ __u64 mrownerconfig[6]; /* sha384 digest */
+
+ /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */
+ __u64 reserved[12];
+
+ /*
+ * Call KVM_TDX_INIT_VM before vcpu creation, thus before
+ * KVM_SET_CPUID2.
+ * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the
+ * TDX module directly virtualizes those CPUIDs without VMM. The user
+ * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
+ * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of
+ * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX
+ * module doesn't virtualize.
+ */
+ struct kvm_cpuid2 cpuid;
+ };
+
+
+KVM_TDX_INIT_VCPU
+-----------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Perform TDX specific VCPU initialization.
+
+- id: KVM_TDX_INIT_VCPU
+- flags: must be 0
+- data: initial value of the guest TD VCPU RCX
+- hw_error: must be 0
+
+KVM_TDX_INIT_MEM_REGION
+-----------------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Initialize @nr_pages pages of TDX guest private memory starting at @gpa with
+userspace-provided data from @source_addr.
+
+Note, before calling this sub-command, the memory attribute of the range
+[gpa, gpa + nr_pages * PAGE_SIZE) needs to be private. Userspace can use
+KVM_SET_MEMORY_ATTRIBUTES to set the attribute.
+
+If the KVM_TDX_MEASURE_MEMORY_REGION flag is specified, the command also
+extends the measurement.
+
+- id: KVM_TDX_INIT_MEM_REGION
+- flags: currently only KVM_TDX_MEASURE_MEMORY_REGION is defined
+- data: pointer to struct kvm_tdx_init_mem_region
+- hw_error: must be 0
+
+::
+
+ #define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0)
+
+ struct kvm_tdx_init_mem_region {
+ __u64 source_addr;
+ __u64 gpa;
+ __u64 nr_pages;
+ };
+
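The byte range that must be private before KVM_TDX_INIT_MEM_REGION is called follows from ``gpa`` and ``nr_pages``. The helper below is a minimal sketch that assumes 4 KiB pages and a page-aligned ``gpa``; neither the helper name nor ``TDX_PAGE_SIZE`` comes from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TDX_PAGE_SIZE 4096ULL	/* assumption: 4 KiB guest pages */

/*
 * Compute the exclusive end of the region [gpa, end). Returns false if the
 * inputs are unusable: unaligned gpa, empty region, or address overflow.
 */
bool tdx_region_end(uint64_t gpa, uint64_t nr_pages, uint64_t *end)
{
	if (gpa % TDX_PAGE_SIZE != 0 || nr_pages == 0)
		return false;
	if (nr_pages > (UINT64_MAX - gpa) / TDX_PAGE_SIZE)
		return false;	/* gpa + nr_pages * 4096 would wrap */
	*end = gpa + nr_pages * TDX_PAGE_SIZE;
	return true;
}
```

The resulting ``[gpa, end)`` range is what a VMM would pass to KVM_SET_MEMORY_ATTRIBUTES before adding the initial memory.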
+
+KVM_TDX_FINALIZE_VM
+-------------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Complete measurement of the initial TD contents and mark it ready to run.
+
+- id: KVM_TDX_FINALIZE_VM
+- flags: must be 0
+- data: must be 0
+- hw_error: must be 0
+
+
+KVM_TDX_GET_CPUID
+-----------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Get the CPUID values that the TDX module virtualizes for the TD guest.
+If the command returns -E2BIG, userspace should allocate a larger buffer and
+retry. The minimum required number of entries is written to the ``nent`` field
+of struct kvm_cpuid2.
+
+- id: KVM_TDX_GET_CPUID
+- flags: must be 0
+- data: pointer to struct kvm_cpuid2 (in/out)
+- hw_error: must be 0 (out)
+
+::
+
+ struct kvm_cpuid2 {
+ __u32 nent;
+ __u32 padding;
+ struct kvm_cpuid_entry2 entries[0];
+ };
+
+ struct kvm_cpuid_entry2 {
+ __u32 function;
+ __u32 index;
+ __u32 flags;
+ __u32 eax;
+ __u32 ebx;
+ __u32 ecx;
+ __u32 edx;
+ __u32 padding[3];
+ };
+
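The -E2BIG handshake of KVM_TDX_GET_CPUID can be sketched as a grow-and-retry loop. This is an illustration under assumptions: the structs are local copies of the layouts shown above, ``get_cpuid_fn`` is a hypothetical callback that issues the command once and returns 0 or a negative errno value, and ``fake_get_cpuid`` merely imitates the documented behaviour so the loop can be exercised without TDX hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Local, illustrative mirrors of kvm_cpuid_entry2 and kvm_cpuid2. */
struct cpuid_entry {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

struct cpuid_buf {
	uint32_t nent;
	uint32_t padding;
	struct cpuid_entry entries[];
};

/* Hypothetical hook: issue the command once, return 0 or -errno. */
typedef int (*get_cpuid_fn)(struct cpuid_buf *buf, void *ctx);

/* Allocate, query, and grow the buffer on -E2BIG. Caller frees the result. */
struct cpuid_buf *tdx_get_cpuid_retry(uint32_t nent, get_cpuid_fn issue,
				      void *ctx)
{
	for (;;) {
		struct cpuid_buf *buf;
		uint32_t needed;
		int ret;

		buf = calloc(1, sizeof(*buf) + nent * sizeof(buf->entries[0]));
		if (!buf)
			return NULL;
		buf->nent = nent;

		ret = issue(buf, ctx);
		if (ret == 0)
			return buf;

		needed = buf->nent;	/* updated by the command on -E2BIG */
		free(buf);
		if (ret != -E2BIG || needed <= nent)
			return NULL;	/* other error, or no usable hint */
		nent = needed;
	}
}

/* Stand-in for the real ioctl: ctx points at the required entry count. */
int fake_get_cpuid(struct cpuid_buf *buf, void *ctx)
{
	uint32_t need = *(uint32_t *)ctx;
	int ret = buf->nent < need ? -E2BIG : 0;

	buf->nent = need;
	return ret;
}
```

The ``needed <= nent`` guard is a defensive choice of this sketch, not something the documentation specifies; it prevents an endless loop if the size hint does not grow.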
+KVM TDX creation flow
+=====================
+In addition to the standard KVM flow, new TDX ioctls need to be called. The
+control flow is as follows:
+
+#. Check system wide capability
+
+ * KVM_CAP_VM_TYPES: Check that VM types are supported and that KVM_X86_TDX_VM
+ is among them.
+
+#. Create VM
+
+ * KVM_CREATE_VM
+ * KVM_TDX_CAPABILITIES: Query TDX capabilities for creating TDX guests.
+ * KVM_CHECK_EXTENSION(KVM_CAP_MAX_VCPUS): Query maximum VCPUs the TD can
+ support at VM level (TDX has its own limitation on this).
+ * KVM_SET_TSC_KHZ: Configure TD's TSC frequency if a different TSC frequency
+ than the host's is desired. This is optional.
+ * KVM_TDX_INIT_VM: Pass TDX specific VM parameters.
+
+#. Create VCPU
+
+ * KVM_CREATE_VCPU
+ * KVM_TDX_INIT_VCPU: Pass TDX specific VCPU parameters.
+ * KVM_SET_CPUID2: Configure TD's CPUIDs.
+ * KVM_SET_MSRS: Configure TD's MSRs.
+
+#. Initialize initial guest memory
+
+ * Prepare content of initial guest memory.
+ * KVM_TDX_INIT_MEM_REGION: Add initial guest memory.
+ * KVM_TDX_FINALIZE_VM: Finalize the measurement of the TDX guest.
+
+#. Run VCPU
+
+References
+==========
+
+https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/documentation.html