3 files changed, 206 insertions, 17 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 43ed57e048a8..6aa40ee05a4a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2006,7 +2006,14 @@ frequency is KHz.
 
 If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also
 be used as a vm ioctl to set the initial tsc frequency of subsequently
-created vCPUs.
+created vCPUs.  Note, the vm ioctl is only allowed prior to creating vCPUs.
+
+For TSC protected Confidential Computing (CoCo) VMs where TSC frequency
+is configured once at VM scope and remains unchanged during the VM's
+lifetime, the vm ioctl should be used to configure the TSC frequency
+and the vcpu ioctl is not supported.
+
+Example of such CoCo VMs: TDX guests.
 
 4.56 KVM_GET_TSC_KHZ
 --------------------
@@ -7230,8 +7237,8 @@ inputs and outputs of the TDVMCALL.  Currently the following values of
    placed in fields from ``r11`` to ``r14`` of the ``get_tdvmcall_info``
    field of the union.
 
-* ``TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT``: the guest has requested to
-set up a notification interrupt for vector ``vector``.
+ * ``TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT``: the guest has requested to
+   set up a notification interrupt for vector ``vector``.
 
 KVM may add support for more values in the future that may cause a userspace
 exit, even without calls to ``KVM_ENABLE_CAP`` or similar.  In this case,
@@ -7844,6 +7851,7 @@ Valid bits in args[0] are::
   #define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)
   #define KVM_X86_DISABLE_EXITS_PAUSE            (1 << 2)
   #define KVM_X86_DISABLE_EXITS_CSTATE           (1 << 3)
+  #define KVM_X86_DISABLE_EXITS_APERFMPERF       (1 << 4)
 
 Enabling this capability on a VM provides userspace with a way to no
 longer intercept some instructions for improved latency in some
@@ -7854,6 +7862,28 @@ all such vmexits.
 
 Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
 
+Virtualizing the ``IA32_APERF`` and ``IA32_MPERF`` MSRs requires more
+than just disabling APERF/MPERF exits. While both Intel and AMD
+document strict usage conditions for these MSRs--emphasizing that only
+the ratio of their deltas over a time interval (T0 to T1) is
+architecturally defined--simply passing through the MSRs can still
+produce an incorrect ratio.
+
+This erroneous ratio can occur if, between T0 and T1:
+
+1. The vCPU thread migrates between logical processors.
+2. Live migration or suspend/resume operations take place.
+3. Another task shares the vCPU's logical processor.
+4. C-states lower than C0 are emulated (e.g., via HLT interception).
+5. The guest TSC frequency doesn't match the host TSC frequency.
+
+Due to these complexities, KVM does not automatically associate this
+passthrough capability with the guest CPUID bit,
+``CPUID.6:ECX.APERFMPERF[bit 0]``. Userspace VMMs that deem this
+mechanism adequate for virtualizing the ``IA32_APERF`` and
+``IA32_MPERF`` MSRs must set the guest CPUID bit explicitly.
+
+
 7.14 KVM_CAP_S390_HPAGE_1M
 --------------------------
 
@@ -8380,7 +8410,7 @@ core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
 7.36 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 ----------------------------------------------------------
 
-:Architectures: x86, arm64
+:Architectures: x86, arm64, riscv
 :Type: vm
 :Parameters: args[0] - size of the dirty log ring
 
@@ -8592,7 +8622,7 @@ ENOSYS for the others.
 When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
 type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
 
-7.37 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
+7.42 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
 -------------------------------------
 
 :Architectures: arm64
@@ -8621,6 +8651,17 @@ given VM.
 When this capability is enabled, KVM resets the VCPU when setting
 MP_STATE_INIT_RECEIVED through IOCTL.  The original MP_STATE is preserved.
 
+7.43 KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED
+-------------------------------------------
+
+:Architectures: arm64
+:Target: VM
+:Parameters: None
+
+This capability indicates to userspace whether a PFNMAP memory region
+can be safely mapped as cacheable. This relies on the presence of
+force write back (FWB) feature support on the hardware.
+
 8. Other capabilities.
 ======================
 
diff --git a/Documentation/virt/kvm/devices/arm-vgic-v3.rst b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
index e860498b1e35..ff02102f7141 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-v3.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
@@ -78,6 +78,8 @@ Groups:
     -ENXIO   The group or attribute is unknown/unsupported for this device
              or hardware support is missing.
     -EFAULT  Invalid user pointer for attr->addr.
+    -EBUSY   Attempt to write a register that is read-only after
+             initialization
     =======  =============================================================
 
 
@@ -120,6 +122,12 @@ Groups:
     Note that distributor fields are not banked, but return the same value
     regardless of the mpidr used to access the register.
 
+    Userspace is allowed to write the following register fields prior to
+    initialization of the VGIC:
+
+      * GICD_IIDR.Revision
+      * GICD_TYPER2.nASSGIcap
+
     GICD_IIDR.Revision is updated when the KVM implementation is changed in a
     way directly observable by the guest or userspace.  Userspace should read
     GICD_IIDR from KVM and write back the read value to confirm its expected
@@ -128,6 +136,12 @@ Groups:
     behavior.
 
 
+    GICD_TYPER2.nASSGIcap allows userspace to control the support of SGIs
+    without an active state. At VGIC creation the field resets to the
+    maximum capability of the system. Userspace is expected to read the field
+    to determine the supported value(s) before writing to the field.
+
+
     The GICD_STATUSR and GICR_STATUSR registers are architecturally defined such
     that a write of a clear bit has no effect, whereas a write with a set bit
     clears that value.  To allow userspace to freely set the values of these two
@@ -202,16 +216,69 @@ Groups:
     KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS accesses the CPU interface registers for the
     CPU specified by the mpidr field.
 
-    CPU interface registers access is not implemented for AArch32 mode.
-    Error -ENXIO is returned when accessed in AArch32 mode.
+    The available registers are:
+
+    ===============  ====================================================
+    ICC_PMR_EL1
+    ICC_BPR0_EL1
+    ICC_AP0R0_EL1
+    ICC_AP0R1_EL1    when the host implements at least 6 bits of priority
+    ICC_AP0R2_EL1    when the host implements 7 bits of priority
+    ICC_AP0R3_EL1    when the host implements 7 bits of priority
+    ICC_AP1R0_EL1
+    ICC_AP1R1_EL1    when the host implements at least 6 bits of priority
+    ICC_AP1R2_EL1    when the host implements 7 bits of priority
+    ICC_AP1R3_EL1    when the host implements 7 bits of priority
+    ICC_BPR1_EL1
+    ICC_CTLR_EL1
+    ICC_SRE_EL1
+    ICC_IGRPEN0_EL1
+    ICC_IGRPEN1_EL1
+    ===============  ====================================================
+
+    When EL2 is available for the guest, these registers are also available:
+
+    =============  ====================================================
+    ICH_AP0R0_EL2
+    ICH_AP0R1_EL2  when the host implements at least 6 bits of priority
+    ICH_AP0R2_EL2  when the host implements 7 bits of priority
+    ICH_AP0R3_EL2  when the host implements 7 bits of priority
+    ICH_AP1R0_EL2
+    ICH_AP1R1_EL2  when the host implements at least 6 bits of priority
+    ICH_AP1R2_EL2  when the host implements 7 bits of priority
+    ICH_AP1R3_EL2  when the host implements 7 bits of priority
+    ICH_HCR_EL2
+    ICC_SRE_EL2
+    ICH_VTR_EL2
+    ICH_VMCR_EL2
+    ICH_LR0_EL2
+    ICH_LR1_EL2
+    ICH_LR2_EL2
+    ICH_LR3_EL2
+    ICH_LR4_EL2
+    ICH_LR5_EL2
+    ICH_LR6_EL2
+    ICH_LR7_EL2
+    ICH_LR8_EL2
+    ICH_LR9_EL2
+    ICH_LR10_EL2
+    ICH_LR11_EL2
+    ICH_LR12_EL2
+    ICH_LR13_EL2
+    ICH_LR14_EL2
+    ICH_LR15_EL2
+    =============  ====================================================
+
+    CPU interface registers are only described using the AArch64
+    encoding.
 
   Errors:
 
-    =======  =====================================================
-    -ENXIO   Getting or setting this register is not yet supported
+    =======  =================================================
+    -ENXIO   Getting or setting this register is not supported
     -EBUSY   VCPU is running
     -EINVAL  Invalid mpidr or register value supplied
-    =======  =====================================================
+    =======  =================================================
 
 
   KVM_DEV_ARM_VGIC_GRP_NR_IRQS
diff --git a/Documentation/virt/kvm/review-checklist.rst b/Documentation/virt/kvm/review-checklist.rst
index dc01aea4057b..debac54e14e7 100644
--- a/Documentation/virt/kvm/review-checklist.rst
+++ b/Documentation/virt/kvm/review-checklist.rst
@@ -7,7 +7,7 @@ Review checklist for kvm patches
 1.  The patch must follow Documentation/process/coding-style.rst and
     Documentation/process/submitting-patches.rst.
 
-2.  Patches should be against kvm.git master branch.
+2.  Patches should be against kvm.git master or next branches.
 
 3.  If the patch introduces or modifies a new userspace API:
     - the API must be documented in Documentation/virt/kvm/api.rst
@@ -18,10 +18,10 @@ Review checklist for kvm patches
 5.  New features must default to off (userspace should explicitly request them).
     Performance improvements can and should default to on.
 
-6.  New cpu features should be exposed via KVM_GET_SUPPORTED_CPUID2
+6.  New cpu features should be exposed via KVM_GET_SUPPORTED_CPUID2,
+    or its equivalent for non-x86 architectures
 
-7.  Emulator changes should be accompanied by unit tests for qemu-kvm.git
-    kvm/test directory.
+7.  The feature should be testable (see below).
 
 8.  Changes should be vendor neutral when possible.  Changes to common code
     are better than duplicating changes to vendor code.
@@ -36,6 +36,87 @@ Review checklist for kvm patches
 11. New guest visible features must either be documented in a hardware manual
     or be accompanied by documentation.
 
-12. Features must be robust against reset and kexec - for example, shared
-    host/guest memory must be unshared to prevent the host from writing to
-    guest memory that the guest has not reserved for this purpose.
+Testing of KVM code
+-------------------
+
+All features contributed to KVM, and in many cases bugfixes too, should be
+accompanied by some kind of tests and/or enablement in open source guests
+and VMMs.  KVM is covered by multiple test suites:
+
+*Selftests*
+  These are low level tests that allow granular testing of kernel APIs.
+  This includes API failure scenarios, invoking APIs after specific
+  guest instructions, and testing multiple calls to ``KVM_CREATE_VM``
+  within a single test.  They are included in the kernel tree at
+  ``tools/testing/selftests/kvm``.
+
+``kvm-unit-tests``
+  A collection of small guests that test CPU and emulated device features
+  from a guest's perspective.  They run under QEMU or ``kvmtool``, and
+  are generally not KVM-specific: they can be run with any accelerator
+  that QEMU supports or even on bare metal, making it possible to compare
+  behavior across hypervisors and processor families.
+
+Functional test suites
+  Various sets of functional tests exist, such as QEMU's ``tests/functional``
+  suite and `avocado-vt <https://avocado-vt.readthedocs.io/en/latest/>`__.
+  These typically involve running a full operating system in a virtual
+  machine.
+
+The best testing approach depends on the feature's complexity and
+operation. Here are some examples and guidelines:
+
+New instructions (no new registers or APIs)
+  The corresponding CPU features (if applicable) should be made available
+  in QEMU.  If the instructions require emulation support or other code in
+  KVM, it is worth adding coverage to ``kvm-unit-tests`` or selftests;
+  the latter can be a better choice if the instructions relate to an API
+  that already has good selftest coverage.
+
+New hardware features (new registers, no new APIs)
+  These should be tested via ``kvm-unit-tests``; this more or less implies
+  supporting them in QEMU and/or ``kvmtool``.  In some cases selftests
+  can be used instead, similar to the previous case, or specifically to
+  test corner cases in guest state save/restore.
+
+Bug fixes and performance improvements
+  These usually do not introduce new APIs, but it's worth sharing
+  any benchmarks and tests that will validate your contribution,
+  ideally in the form of regression tests.  Tests and benchmarks
+  can be included in either ``kvm-unit-tests`` or selftests, depending
+  on the specifics of your change.  Selftests are especially useful for
+  regression tests because they are included directly in Linux's tree.
+
+Large scale internal changes
+  While it's difficult to provide a single policy, you should ensure that
+  the changed code is covered by either ``kvm-unit-tests`` or selftests.
+  In some cases the affected code is run for any guest and functional
+  tests suffice.  Explain your testing process in the cover letter,
+  as that can help identify gaps in existing test suites.
+
+New APIs
+  It is important to demonstrate your use case.  This can be as simple as
+  explaining that the feature is already in use on bare metal, or it can be
+  a proof-of-concept implementation in userspace.  The latter need not be
+  open source, though that is of course preferable for easier testing.
+  Selftests should test corner cases of the APIs, and should also cover
+  basic host and guest operation if no open source VMM uses the feature.
+
+Bigger features, usually spanning host and guest
+  These should be supported by Linux guests, with limited exceptions for
+  Hyper-V features that are testable on Windows guests.  It is strongly
+  suggested that the feature be usable with an open source host VMM, such
+  as at least one of QEMU or crosvm, and guest firmware.  Selftests should
+  test at least API error cases.  Guest operation can be covered by
+  either selftests or ``kvm-unit-tests`` (this is especially important for
+  paravirtualized and Windows-only features).  Strong selftest coverage
+  can also be a replacement for implementation in an open source VMM,
+  but this is generally not recommended.
+
+Following the above suggestions for testing in selftests and
+``kvm-unit-tests`` will make it easier for the maintainers to review
+and accept your code.  In fact, even before you contribute your changes
+upstream it will make it easier for you to develop for KVM.
+
+Of course, the KVM maintainers reserve the right to require more tests,
+though they may also waive the requirement from time to time.