summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-07-12KVM: x86: Query vcpu->vcpu_idx directly and drop its accessor, againSean Christopherson
Read vcpu->vcpu_idx directly instead of bouncing through the one-line wrapper, kvm_vcpu_get_idx(), and drop the wrapper. The wrapper is a remnant of the original implementation and serves no purpose; remove it (again) before it gains more users. kvm_vcpu_get_idx() was removed in the not-too-distant past by commit 4eeef2424153 ("KVM: x86: Query vcpu->vcpu_idx directly and drop its accessor"), but was unintentionally re-introduced by commit a54d806688fe ("KVM: Keep memslots in tree-based structures instead of array-based ones"), likely due to a rebase goof. The wrapper then managed to gain users in KVM's Xen code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20220614225615.3843835-1-seanjc@google.com
2022-07-12KVM: x86/mmu: Replace UNMAPPED_GVA with INVALID_GPA for gva_to_gpa()Hou Wenlong
The result of gva_to_gpa() is physical address not virtual address, it is odd that UNMAPPED_GVA macro is used as the result for physical address. Replace UNMAPPED_GVA with INVALID_GPA and drop UNMAPPED_GVA macro. No functional change intended. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/6104978956449467d3c68f1ad7f2c2f6d771d0ee.1656667239.git.houwenlong.hwl@antgroup.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-12KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1Vitaly Kuznetsov
Windows 10/11 guests with Hyper-V role (WSL2) enabled are observed to hang upon boot or shortly after when a non-default TSC frequency was set for L1. The issue is observed on a host where TSC scaling is supported. The problem appears to be that Windows doesn't use TSC scaling for its guests, even when the feature is advertised, and KVM filters SECONDARY_EXEC_TSC_SCALING out when creating L2 controls from L1's VMCS. This leads to L2 running with the default frequency (matching host's) while L1 is running with an altered one. Keep SECONDARY_EXEC_TSC_SCALING in secondary exec controls for L2 when it was set for L1. TSC_MULTIPLIER is already correctly computed and written by prepare_vmcs02(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Fixes: d041b5ea93352b ("KVM: nVMX: Enable nested TSC scaling") Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220712135009.952805-1-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-11s390: Add attestation query informationSteffen Eiden
We have information about the supported attestation header version and plaintext attestation flag bits. Let's expose it via the sysfs files. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/lkml/20220601100245.3189993-1-seiden@linux.ibm.com/ Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-07-11KVM: s390: drop unexpected word 'and' in the commentsJiang Jian
there is an unexpected word 'and' in the comments that need to be dropped file: arch/s390/kvm/interrupt.c line: 705 * Subsystem damage are the only two and and are indicated by changed to: * Subsystem damage are the only two and are indicated by Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> Link: https://lore.kernel.org/lkml/20220622140720.7617-1-jiangjian@cdjrlc.com/ Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-07-11Merge tag 'kvm-s390-pci-5.20' into kernelorgnextChristian Borntraeger
KVM: s390/pci: enable zPCI for interpretive execution Add the necessary code in s390 base, pci and KVM to enable interpretion of PCI pasthru.
2022-07-11MAINTAINERS: additional files related kvm s390 pci passthroughMatthew Rosato
Add entries from the s390 kvm subdirectory related to pci passthrough. Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Link: https://lore.kernel.org/r/20220606203325.110625-22-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devicesMatthew Rosato
The KVM_S390_ZPCI_OP ioctl provides a mechanism for managing hardware-assisted virtualization features for s390x zPCI passthrough. Add the first 2 operations, which can be used to enable/disable the specified device for Adapter Event Notification interpretation. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Link: https://lore.kernel.org/r/20220606203325.110625-21-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11vfio-pci/zdev: different maxstbl for interpreted devicesMatthew Rosato
When doing load/store interpretation, the maximum store block length is determined by the underlying firmware, not the host kernel API. Reflect that in the associated Query PCI Function Group clp capability and let userspace decide which is appropriate to present to the guest. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220606203325.110625-20-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11vfio-pci/zdev: add function handle to clp base capabilityMatthew Rosato
The function handle is a system-wide unique identifier for a zPCI device. With zPCI instruction interpretation, the host will no longer be executing the zPCI instructions on behalf of the guest. As a result, the guest needs to use the real function handle in order for firmware to associate the instruction with the proper PCI function. Let's provide that handle to the guest. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220606203325.110625-19-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11vfio-pci/zdev: add open/close device hooksMatthew Rosato
During vfio-pci open_device, pass the KVM associated with the vfio group (if one exists). This is needed in order to pass a special indicator (GISA) to firmware to allow zPCI interpretation facilities to be used for only the specific KVM associated with the vfio-pci device. During vfio-pci close_device, unregister the notifier. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-18-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11KVM: s390: pci: add routines to start/stop interpretive executionMatthew Rosato
These routines will be invoked at the time an s390x vfio-pci device is associated with a KVM (or when the association is removed), allowing the zPCI device to enable or disable load/store intepretation mode; this requires the host zPCI device to inform firmware of the unique token (GISA designation) that is associated with the owning KVM. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Pierre Morel <pmorel@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-17-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11KVM: s390: pci: provide routines for enabling/disabling interrupt forwardingMatthew Rosato
These routines will be wired into a kvm ioctl in order to respond to requests to enable / disable a device for Adapter Event Notifications / Adapter Interuption Forwarding. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-16-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11KVM: s390: mechanism to enable guest zPCI InterpretationMatthew Rosato
The guest must have access to certain facilities in order to allow interpretive execution of zPCI instructions and adapter event notifications. However, there are some cases where a guest might disable interpretation -- provide a mechanism via which we can defer enabling the associated zPCI interpretation facilities until the guest indicates it wishes to use them. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-15-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11KVM: s390: pci: enable host forwarding of Adapter Event NotificationsMatthew Rosato
In cases where interrupts are not forwarded to the guest via firmware, KVM is responsible for ensuring delivery. When an interrupt presents with the forwarding bit, we must process the forwarding tables until all interrupts are delivered. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-14-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11KVM: s390: pci: do initial setup for AEN interpretationMatthew Rosato
Initial setup for Adapter Event Notification Interpretation for zPCI passthrough devices. Specifically, allocate a structure for forwarding of adapter events and pass the address of this structure to firmware. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-13-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11KVM: s390: pci: add basic kvm_zdev structureMatthew Rosato
This structure will be used to carry kvm passthrough information related to zPCI devices. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-12-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11vfio/pci: introduce CONFIG_VFIO_PCI_ZDEV_KVMMatthew Rosato
The current contents of vfio-pci-zdev are today only useful in a KVM environment; let's tie everything currently under vfio-pci-zdev to this Kconfig statement and require KVM in this case, reducing complexity (e.g. symbol lookups). Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-11-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11s390/pci: stash dtsm and maxstblMatthew Rosato
Store information about what IOAT designation types are supported by underlying hardware as well as the largest store block size allowed. These values will be needed by passthrough. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-10-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11s390/pci: stash associated GISA designationMatthew Rosato
For passthrough devices, we will need to know the GISA designation of the guest if interpretation facilities are to be used. Setup to stash this in the zdev and set a default of 0 (no GISA designation) for now; a subsequent patch will set a valid GISA designation for passthrough devices. Also, extend mpcific routines to specify this stashed designation as part of the mpcific command. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-9-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11s390/pci: externalize the SIC operation controls and routineMatthew Rosato
A subsequent patch will be issuing SIC from KVM -- export the necessary routine and make the operation control definitions available from a header. Because the routine will now be exported, let's rename __zpci_set_irq_ctrl to zpci_set_irq_ctrl and get rid of the zero'd iib wrapper function of the same name. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-8-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11s390/airq: allow for airq structure that uses an input vectorMatthew Rosato
When doing device passthrough where interrupts are being forwarded from host to guest, we wish to use a pinned section of guest memory as the vector (the same memory used by the guest as the vector). To accomplish this, add a new parameter for airq_iv_create which allows passing an existing vector to be used instead of allocating a new one. The caller is responsible for ensuring the vector is pinned in memory as well as for unpinning the memory when the vector is no longer needed. A subsequent patch will use this new parameter for zPCI interpretation. Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-7-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11s390/airq: pass more TPI info to airq handlersMatthew Rosato
A subsequent patch will introduce an airq handler that requires additional TPI information beyond directed vs floating, so pass the entire tpi_info structure via the handler. Only pci actually uses this information today, for the other airq handlers this is effectively a no-op. Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-6-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11s390/sclp: detect the AISI facilityMatthew Rosato
Detect the Adapter Interruption Suppression Interpretation facility. Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-5-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11s390/sclp: detect the AENI facilityMatthew Rosato
Detect the Adapter Event Notification Interpretation facility. Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-4-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11s390/sclp: detect the AISII facilityMatthew Rosato
Detect the Adapter Interruption Source ID Interpretation facility. Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-3-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11s390/sclp: detect the zPCI load/store interpretation facilityMatthew Rosato
Detect the zPCI Load/Store Interpretation facility. Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-2-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-08KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()Vitaly Kuznetsov
'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally not needed for APIC_DM_REMRD, they're still referenced by __apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize the structure to avoid consuming random stack memory. Fixes: a183b638b61c ("KVM: x86: make apic_accept_irq tracepoint more generic") Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220708125147.593975-1-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-08KVM: x86: Fix handling of APIC LVT updates when userspace changes MCG_CAPSean Christopherson
Add a helper to update KVM's in-kernel local APIC in response to MCG_CAP being changed by userspace to fix multiple bugs. First and foremost, KVM needs to check that there's an in-kernel APIC prior to dereferencing vcpu->arch.apic. Beyond that, any "new" LVT entries need to be masked, and the APIC version register needs to be updated as it reports out the number of LVT entries. Fixes: 4b903561ec49 ("KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.") Reported-by: syzbot+8cdad6430c24f396f158@syzkaller.appspotmail.com Cc: Siddh Raman Pant <code@siddh.me> Cc: Jue Wang <juew@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-08KVM: x86: Initialize number of APIC LVT entries during APIC creationSean Christopherson
Initialize the number of LVT entries during APIC creation, else the field will be incorrectly left '0' if userspace never invokes KVM_X86_SETUP_MCE. Add and use a helper to calculate the number of entries even though MCG_CMCI_P is not set by default in vcpu->arch.mcg_cap. Relying on that to always be true is unnecessarily risky, and subtle/confusing as well. Fixes: 4b903561ec49 ("KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.") Reported-by: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Jue Wang <juew@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-08Merge branch 'kvm-5.20-msr-eperm'Sean Christopherson
Merge a bug fix and cleanups for {g,s}et_msr_mce() using a base that predates commit 281b52780b57 ("KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs."), which was written with the intention that it be applied _after_ the bug fix and cleanups. The bug fix in particular needs to be sent to stable trees; give them a stable hash to use.
2022-07-08KVM: x86: Add helpers to identify CTL and STATUS MCi MSRsSean Christopherson
Add helpers to identify CTL (control) and STATUS MCi MSR types instead of open coding the checks using the offset. Using the offset is perfectly safe, but unintuitive, as understanding what the code does requires knowing that the offset calcuation will not affect the lower three bits. Opportunistically comment the STATUS logic to save readers a trip to Intel's SDM or AMD's APM to understand the "data != 0" check. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20220512222716.4112548-4-seanjc@google.com
2022-07-08KVM: x86: Use explicit case-statements for MCx banks in {g,s}et_msr_mce()Sean Christopherson
Use an explicit case statement to grab the full range of MCx bank MSRs in {g,s}et_msr_mce(), and manually check only the "end" (the number of banks configured by userspace may be less than the max). The "default" trick works, but is a bit odd now, and will be quite odd if/when support for accessing MCx_CTL2 MSRs is added, which has near identical logic. Hoist "offset" to function scope so as to avoid curly braces for the case statement, and because MCx_CTL2 support will need the same variables. Opportunstically clean up the comment about allowing bit 10 to be cleared from bank 4. No functional change intended. Cc: Jue Wang <juew@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20220512222716.4112548-3-seanjc@google.com
2022-07-08KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)Sean Christopherson
Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or MCi_STATUS MSR. The behavior of "all zeros' or "all ones" for CTL MSRs is architectural, as is the "only zeros" behavior for STATUS MSRs. I.e. the intent is to inject a #GP, not exit to userspace due to an unhandled emulation case. Returning '-1' gets interpreted as -EPERM up the stack and effecitvely kills the guest. Fixes: 890ca9aefa78 ("KVM: Add MCE support") Fixes: 9ffd986c6e4e ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.com
2022-06-25KVM: x86/mmu: Buffer nested MMU split_desc_cache only by default capacitySean Christopherson
Buffer split_desc_cache, the cache used to allcoate rmap list entries, only by the default cache capacity (currently 40), not by doubling the minimum (513). Aliasing L2 GPAs to L1 GPAs is uncommon, thus eager page splitting is unlikely to need 500+ entries. And because each object is a non-trivial 128 bytes (see struct pte_list_desc), those extra ~500 entries means KVM is in all likelihood wasting ~64kb of memory per VM. Link: https://lore.kernel.org/all/YrTDcrsn0%2F+alpzf@google.com Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220624171808.2845941-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-25KVM: x86/mmu: Use "unsigned int", not "u32", for SPTEs' @access infoSean Christopherson
Use an "unsigned int" for @access parameters instead of a "u32", mostly to be consistent throughout KVM, but also because "u32" is misleading. @access can actually squeeze into a u8, i.e. doesn't need 32 bits, but is as an "unsigned int" because sp->role.access is an unsigned int. No functional change intended. Link: https://lore.kernel.org/all/YqyZxEfxXLsHGoZ%2F@google.com Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220624171808.2845941-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SEV-ES: reuse advance_sev_es_emulated_ins for OUT tooPaolo Bonzini
complete_emulator_pio_in() only has to be called by complete_sev_es_emulated_ins() now; therefore, all that the function does now is adjust sev_pio_count and sev_pio_data. Which is the same for both IN and OUT. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: de-underscorify __emulator_pio_inPaolo Bonzini
Now all callers except emulator_pio_in_emulated are using __emulator_pio_in/complete_emulator_pio_in explicitly. Move the "either copy the result or attempt PIO" logic in emulator_pio_in_emulated, and rename __emulator_pio_in to just emulator_pio_in. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: wean fast IN from emulator_pio_inPaolo Bonzini
Use __emulator_pio_in() directly for fast PIO instead of bouncing through emulator_pio_in() now that __emulator_pio_in() fills "val" when handling in-kernel PIO. vcpu->arch.pio.count is guaranteed to be '0', so this a pure nop. emulator_pio_in_emulated is now the last caller of emulator_pio_in. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: wean in-kernel PIO from vcpu->arch.pio*Paolo Bonzini
Make emulator_pio_in_out operate directly on the provided buffer as long as PIO is handled inside KVM. For input operations, this means that, in the case of in-kernel PIO, __emulator_pio_in() does not have to be always followed by complete_emulator_pio_in(). This affects emulator_pio_in() and kvm_sev_es_ins(); for the latter, that is why the call moves from advance_sev_es_emulated_ins() to complete_sev_es_emulated_ins(). For output, it means that vcpu->pio.count is never set unnecessarily and there is no need to clear it; but also vcpu->pio.size must not be used in kvm_sev_es_outs(), because it will not be updated for in-kernel OUT. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: move all vcpu->arch.pio* setup in emulator_pio_in_out()Paolo Bonzini
For now, this is basically an excuse to add back the void* argument to the function, while removing some knowledge of vcpu->arch.pio* from its callers. The WARN that vcpu->arch.pio.count is zero is also extended to OUT operations. The vcpu->arch.pio* fields still need to be filled even when the PIO is handled in-kernel as __emulator_pio_in() is always followed by complete_emulator_pio_in(). But after fixing that, it will be possible to to only populate the vcpu->arch.pio* fields on userspace exits. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: drop PIO from unregistered devicesPaolo Bonzini
KVM protects the device list with SRCU, and therefore different calls to kvm_io_bus_read()/kvm_io_bus_write() can very well see different incarnations of kvm->buses. If userspace unregisters a device while vCPUs are running there is no well-defined result. This patch applies a safe fallback by returning early from emulator_pio_in_out(). This corresponds to returning zeroes from IN, and dropping the writes on the floor for OUT. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: inline kernel_pio into its sole callerPaolo Bonzini
The caller of kernel_pio already has arguments for most of what kernel_pio fishes out of vcpu->arch.pio. This is the first step towards ensuring that vcpu->arch.pio.* is only used when exiting to userspace. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: complete fast IN directly with complete_emulator_pio_in()Paolo Bonzini
Use complete_emulator_pio_in() directly when completing fast PIO, there's no need to bounce through emulator_pio_in(): the comment about ECX changing doesn't apply to fast PIO, which isn't used for string I/O. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: nSVM: optimize svm_set_x2apic_msr_interceptionMaxim Levitsky
- Avoid toggling the x2apic msr interception if it is already up to date. - Avoid touching L0 msr bitmap when AVIC is inhibited on entry to the guest mode, because in this case the guest usually uses its own msr bitmap. Later on VM exit, the 1st optimization will allow KVM to skip touching the L0 msr bitmap as well. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220519102709.24125-18-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Add AVIC doorbell tracepointSuravee Suthikulpanit
Add a tracepoint to track number of doorbells being sent to signal a running vCPU to process IRQ after being injected. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-17-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possibleSuravee Suthikulpanit
For x2AVIC, the index from incomplete IPI #vmexit info is invalid for logical cluster mode. Only ICRH/ICRL values can be used to determine the IPI destination APIC ID. Since QEMU defines guest physical APIC ID to be the same as vCPU ID, it can be used to quickly identify the target vCPU to deliver IPI, and avoid the overhead from searching through all vCPUs to match the target vCPU. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-16-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: Warning APICv inconsistency only when vcpu APIC mode is validSuravee Suthikulpanit
When launching a VM with x2APIC and specify more than 255 vCPUs, the guest kernel can disable x2APIC (e.g. specify nox2apic kernel option). The VM fallbacks to xAPIC mode, and disable the vCPU ID 255 and greater. In this case, APICV is deactivated for the disabled vCPUs. However, the current APICv consistency warning does not account for this case, which results in a warning. Therefore, modify warning logic to report only when vCPU APIC mode is valid. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-15-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Introduce hybrid-AVIC modeSuravee Suthikulpanit
Currently, AVIC is inhibited when booting a VM w/ x2APIC support. because AVIC cannot virtualize x2APIC MSR register accesses. However, the AVIC doorbell can be used to accelerate interrupt injection into a running vCPU, while all guest accesses to x2APIC MSRs will be intercepted and emulated by KVM. With hybrid-AVIC support, the APICV_INHIBIT_REASON_X2APIC is no longer enforced. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-14-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpuSuravee Suthikulpanit
Originalliy, this WARN_ON is designed to detect when calling avic_vcpu_load() on an already running vcpu in AVIC mode (i.e. the AVIC is_running bit is set). However, for x2AVIC, the vCPU can switch from xAPIC to x2APIC mode while in running state, in which the avic_vcpu_load() will be called from svm_refresh_apicv_exec_ctrl(). Therefore, remove this warning since it is no longer appropriate. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-13-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>