2021-10-22  KVM: emulate: Comment on difference between RDPMC implementation and manual  (Wanpeng Li)

The SDM gives the following pseudocode for RDPMC:

    IF (((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0))
            and (ECX indicates a supported counter))
        THEN
            EAX := counter[31:0];
            EDX := ZeroExtend(counter[MSCB:32]);
        ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
            #GP(0);
    FI;

Let's add a comment explaining why CR0.PE isn't tested: it is impossible for CPL to be > 0 if CR0.PE = 0.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1634724836-73721-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
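A minimal sketch of the resulting permission check (illustrative names, not KVM's actual emulator code), showing why the CR0.PE clause can be dropped:

    /*
     * Sketch of the RDPMC permission check. CR0.PE does not need to
     * be tested: with CR0.PE = 0 (real mode) the CPL is always 0, so
     * the cpl == 0 clause already covers that case.
     */
    static bool rdpmc_allowed(bool cr4_pce, int cpl, bool counter_valid)
    {
        return (cr4_pce || cpl == 0) && counter_valid;
    }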
2021-10-22  KVM: x86: Add vendor name to kvm_x86_ops, use it for error messages  (Sean Christopherson)

Paul pointed out that the error messages when KVM fails to load are unhelpful for understanding exactly what went wrong if userspace probes the "wrong" module. Add a mandatory kvm_x86_ops field to track the vendor module names, kvm_intel and kvm_amd, and use the name in the relevant error messages so that the user knows which module failed to load.

Opportunistically tweak the "disabled by bios" error message to clarify that _support_ was disabled, not that the module itself was magically disabled by BIOS.

Suggested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211018183929.897461-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22  kvm: x86: mmu: Make NX huge page recovery period configurable  (Junaid Shahid)

Currently, the NX huge page recovery thread wakes up every minute and zaps 1/nx_huge_pages_recovery_ratio of the total number of split NX huge pages at a time. This is intended to ensure that only a relatively small number of pages get zapped at a time. But for very large VMs (or more specifically, VMs with a large number of executable pages), a period of 1 minute could still result in this number being too high (unless the ratio is changed significantly, but that can result in split pages lingering on for too long).

This change makes the period configurable instead of fixing it at 1 minute. Users of large VMs can then adjust the period and/or the ratio to reduce the number of pages zapped at one time while still maintaining the same overall duration for cycling through the entire list. By default, KVM derives a period from the ratio such that a page will remain on the list for 1 hour on average.

Signed-off-by: Junaid Shahid <junaids@google.com>
Message-Id: <20211020010627.305925-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
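A sketch of the default derivation described above (illustrative names, not the exact kernel code): if 1/ratio of the pages are zapped per wakeup, spacing the wakeups at one hour divided by the ratio keeps the average lifetime of a page on the list at one hour.

    /*
     * Sketch: derive the wakeup period (in ms) from the recovery
     * ratio so that cycling through the whole list takes ~1 hour.
     */
    static unsigned int nx_recovery_period_ms(unsigned int ratio)
    {
        if (!ratio)
            return 0;                      /* recovery disabled */
        return 60u * 60u * 1000u / ratio;  /* 1 hour split across ratio wakeups */
    }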
2021-10-22  KVM: vPMU: Fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0  (Wanpeng Li)

SDM section 18.2.3 mentions that:

    "IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of any general-purpose or fixed-function counters via a single WRMSR."

The SDM documents the MSR as R/W, but reading it on bare metal during perf testing always returns 0 on the ICX/SKX boxes on hand. Let's fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL with 0 to match hardware behavior and drop the global_ovf_ctrl variable.

Tested-by: Like Xu <likexu@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1634631160-67276-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
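The resulting read path is trivial; a hedged sketch (illustrative, not the literal vPMU code):

    /*
     * Reads of MSR_CORE_PERF_GLOBAL_OVF_CTRL report 0, matching the
     * behavior observed on bare metal, so no cached value is kept.
     */
    static int pmu_get_global_ovf_ctrl(uint64_t *data)
    {
        *data = 0;
        return 0;
    }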
2021-10-22  KVM: x86/mmu: Rename slot_handle_leaf to slot_handle_level_4k  (David Matlack)

slot_handle_leaf is a misnomer because it only operates on 4K SPTEs, whereas "leaf" is used to describe any valid terminal SPTE (4K or large page). Rename slot_handle_leaf to slot_handle_level_4k to avoid confusion.

Making this change makes it more obvious that there is a benign discrepancy between the legacy MMU and the TDP MMU when it comes to dirty logging. The legacy MMU only iterates through 4K SPTEs when zapping for collapsing and when clearing D-bits. The TDP MMU, on the other hand, iterates through SPTEs on all levels. The TDP MMU behavior of zapping SPTEs at all levels is technically overkill for its current dirty logging implementation, which always demotes to 4K SPTEs, but both the TDP MMU and the legacy MMU zap if and only if the SPTE can be replaced by a larger page, i.e. will not spuriously zap 2M (or larger) SPTEs.

Opportunistically add comments to explain this discrepancy in the code.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211019162223.3935109-1-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22  KVM: VMX: RTIT_CTL_BRANCH_EN has no dependency on other CPUID bit  (Xiaoyao Li)

Per the Intel SDM, the RTIT_CTL_BRANCH_EN bit has no dependency on any bit reported in CPUID leaf 0x14.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-5-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22  KVM: VMX: Rename pt_desc.addr_range to pt_desc.num_address_ranges  (Xiaoyao Li)

Rename the field to better convey its meaning and to match the PT_CAP_num_address_ranges constant.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-4-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22  KVM: VMX: Use precomputed vmx->pt_desc.addr_range  (Xiaoyao Li)

The number of valid PT ADDR MSRs for the guest is precomputed in vmx->pt_desc.addr_range. Use it instead of calculating it again.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-3-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22  KVM: VMX: Restore host's MSR_IA32_RTIT_CTL when it's not zero  (Xiaoyao Li)

A minor optimization: only WRMSR MSR_IA32_RTIT_CTL when necessary, i.e. when the host value to be restored is non-zero.

Opportunistically refine the comment to call out that KVM requires VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-2-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22  KVM: x86/mmu: clean up prefetch/prefault/speculative naming  (Paolo Bonzini)

"prefetch", "prefault" and "speculative" are used throughout KVM to mean the same thing. Use a single name, standardizing on "prefetch", which is already used by various functions such as direct_pte_prefetch, FNAME(prefetch_gpte), FNAME(pte_prefetch), etc.

Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22  KVM: cleanup allocation of rmaps and page tracking data  (David Stevens)

Unify the flags for rmaps and page tracking data, using a single flag in struct kvm_arch and a single loop to go over all the address spaces and memslots. This avoids code duplication between alloc_all_memslots_rmaps and kvm_page_track_enable_mmu_write_tracking.

Signed-off-by: David Stevens <stevensd@chromium.org>
[This patch is the delta between David's v2 and v3, with conflicts fixed and my own commit message. - Paolo]
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-21  Merge branch kvm/selftests/memslot into kvmarm-master/next  (Marc Zyngier)

* kvm/selftests/memslot:
  : .
  : Enable KVM memslot selftests on arm64, making them less
  : x86 specific.
  : .
  KVM: selftests: Build the memslot tests for arm64
  KVM: selftests: Make memslot_perf_test arch independent

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-21  KVM: selftests: Build the memslot tests for arm64  (Ricardo Koller)

Add memslot_perf_test and memslot_modification_stress_test to the list of aarch64 selftests.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210907180957.609966-3-ricarkol@google.com
2021-10-21  KVM: selftests: Make memslot_perf_test arch independent  (Ricardo Koller)

memslot_perf_test uses ucalls for synchronization between guest and host. The ucalls API is architecture independent: tests do not need to know details like what kind of exit they generate on a specific arch. More specifically, there is no need to check whether an exit is KVM_EXIT_IO in x86 for the host to know that the exit is ucall related, as get_ucall() already makes that check.

Change memslot_perf_test to not require specifying what exit a ucall generates. Also add a missing ucall_init.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210907180957.609966-2-ricarkol@google.com
2021-10-20  KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpu  (Halil Pasic)

Changing the deliverable mask in __airqs_kick_single_vcpu() is a bug. If one idle vcpu can't take the interrupts we want to deliver, we should look for another vcpu that can, instead of saying that we don't want to deliver these interrupts by clearing the bits from the deliverable_mask.

Fixes: 9f30f6216378 ("KVM: s390: add gib_alert_irq_handler()")
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20211019175401.3757927-3-pasic@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-20  KVM: s390: clear kicked_mask before sleeping again  (Halil Pasic)

The idea behind the kicked mask is that we should not re-kick a vcpu that is already in the "kick" process, i.e. that was kicked and is about to be dispatched if certain conditions are met.

The problem with the current implementation is that it assumes the kicked vcpu is going to enter SIE shortly. But under certain circumstances, the vcpu we just kicked will be deemed non-runnable and will remain in wait state. This can happen if the interrupt(s) this vcpu got kicked to deal with have already been cleared (because the interrupts got delivered to another vcpu). In this case kvm_arch_vcpu_runnable() would return false, and the vcpu would remain in kvm_vcpu_block(), but this time with its kicked_mask bit set. So next time around we wouldn't kick the vcpu from __airqs_kick_single_vcpu(), but would assume that we just kicked it.

Let us make sure the kicked_mask is cleared before we give up on re-dispatching the vcpu.

Fixes: 9f30f6216378 ("KVM: s390: add gib_alert_irq_handler()")
Reported-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20211019175401.3757927-2-pasic@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
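In essence (a sketch assuming the kicked_mask layout described above, not the literal patch): before the vcpu blocks again, its kicked_mask bit is cleared so that a later __airqs_kick_single_vcpu() will consider it again.

    /* Sketch: give up the "kick in progress" claim before sleeping. */
    static void vcpu_pre_block(struct kvm_vcpu *vcpu)
    {
        clear_bit(kvm_vcpu_get_idx(vcpu),
                  vcpu->kvm->arch.gisa_int.kicked_mask);
    }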
2021-10-18  selftests: KVM: Introduce system counter offset test  (Oliver Upton)

Introduce a KVM selftest to verify that userspace manipulation of the TSC (via the new vCPU attribute) results in the correct behavior within the guest.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181555.973085-6-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18  selftests: KVM: Add helpers for vCPU device attributes  (Oliver Upton)

vCPU file descriptors are abstracted away from test code in KVM selftests, meaning that tests cannot directly access a vCPU's device attributes. Add helpers that tests can use to get at vCPU device attributes.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181555.973085-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18  selftests: KVM: Fix kvm device helper ioctl assertions  (Oliver Upton)

The KVM_CREATE_DEVICE and KVM_{GET,SET}_DEVICE_ATTR ioctls are defined to return a value of zero on success. As such, tighten the assertions in the helper functions to only pass if the return code is zero.

Suggested-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181555.973085-4-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
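A sketch of the tightened check, in the selftest helper style (exact helper names may differ):

    int ret = ioctl(dev_fd, KVM_SET_DEVICE_ATTR, &attr);

    /* The ioctl is defined to return exactly 0 on success, so assert
     * ret == 0 rather than merely ret >= 0. */
    TEST_ASSERT(!ret, "KVM_SET_DEVICE_ATTR failed, rc: %i errno: %i",
                ret, errno);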
2021-10-18  selftests: KVM: Add test for KVM_{GET,SET}_CLOCK  (Oliver Upton)

Add a selftest for the newly introduced KVM clock UAPI. Ensure that the KVM clock is consistent between userspace and the guest, and that a difference in realtime will only ever cause the KVM clock to advance forward.

Cc: Andrew Jones <drjones@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181555.973085-3-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18  tools: arch: x86: pull in pvclock headers  (Oliver Upton)

Copy over approximately clean versions of the pvclock headers into tools. Reconcile headers/symbols that are missing in tools and are not needed there.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181555.973085-2-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18  KVM: x86: Expose TSC offset controls to userspace  (Oliver Upton)

To date, VMM-directed TSC synchronization and migration have been a bit messy. KVM has some baked-in heuristics around TSC writes to infer if the VMM is attempting to synchronize. This is problematic, as it depends on host userspace writing to the guest's TSC within 1 second of the last write.

A much cleaner approach to configuring the guest's views of the TSC is to simply migrate the TSC offset for every vCPU. Offsets are idempotent, and thus not subject to change depending on when the VMM actually reads/writes values from/to KVM. The VMM can then read the TSC once with KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when the guest is paused.

Cc: David Matlack <dmatlack@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-8-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
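From userspace, saving an offset then looks roughly like this (a sketch using the vCPU device-attribute ioctls; KVM_VCPU_TSC_CTRL and KVM_VCPU_TSC_OFFSET are the group/attribute names introduced by this series):

    uint64_t tsc_offset;
    struct kvm_device_attr attr = {
        .group = KVM_VCPU_TSC_CTRL,
        .attr  = KVM_VCPU_TSC_OFFSET,
        .addr  = (uint64_t)&tsc_offset,
    };

    /* Save the offset on the source; restore it on the destination
     * vCPU with KVM_SET_DEVICE_ATTR. */
    if (ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr))
        perror("KVM_GET_DEVICE_ATTR(TSC_OFFSET)");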
2021-10-18  KVM: x86: Refactor tsc synchronization code  (Oliver Upton)

Refactor kvm_synchronize_tsc to make a new function that allows callers to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly for the sake of participating in TSC synchronization.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181538.968978-7-oupton@google.com>
[Make sure kvm->arch.cur_tsc_generation and vcpu->arch.this_tsc_generation are equal at the end of __kvm_synchronize_tsc, if matched is false. Reported by Maxim Levitsky. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18  kvm: x86: protect masterclock with a seqcount  (Paolo Bonzini)

Protect the reference point for kvmclock with a seqcount, so that kvmclock updates for all vCPUs can proceed in parallel. Xen runstate updates will also run in parallel and not bounce the kvmclock cacheline.

Of the variables that were protected by pvclock_gtod_sync_lock, nr_vcpus_matched_tsc is different because it is updated outside pvclock_update_vm_gtod_copy and read inside it. Therefore, we need to keep it protected by a spinlock. In fact it must now be a raw spinlock, because pvclock_update_vm_gtod_copy, being the write-side of a seqcount, is non-preemptible. Since we already have tsc_write_lock, which is a raw spinlock, we can just use tsc_write_lock as the lock that protects the write-side of the seqcount.

Co-developed-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181538.968978-6-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
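The resulting locking is the classic seqcount split; a sketch (field names follow the description above, simplified from the actual code):

    /* Write side: serialized by the raw tsc_write_lock, non-preemptible. */
    raw_spin_lock_irqsave(&ka->tsc_write_lock, flags);
    write_seqcount_begin(&ka->pvclock_sc);
    /* ... recompute the kvmclock reference point ... */
    write_seqcount_end(&ka->pvclock_sc);
    raw_spin_unlock_irqrestore(&ka->tsc_write_lock, flags);

    /* Read side: lockless; retry if a writer slipped in, so per-vCPU
     * kvmclock updates no longer contend on a lock. */
    do {
        seq = read_seqcount_begin(&ka->pvclock_sc);
        /* ... snapshot the reference point ... */
    } while (read_seqcount_retry(&ka->pvclock_sc, seq));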
2021-10-18  KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK  (Oliver Upton)

Handling the migration of TSCs correctly is difficult, in part because Linux does not provide userspace with the ability to retrieve a (TSC, realtime) clock pair for a single instant in time. In lieu of a more convenient facility, KVM can report similar information in the kvm_clock structure.

Provide userspace with a host TSC & realtime pair iff the realtime clock is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid realtime value, advance the KVM clock by the amount of elapsed time. Do not step the KVM clock backwards, though, as it is a monotonic oscillator.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
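Userspace consumption might look like this (a sketch; the KVM_CLOCK_REALTIME flag signals that the realtime field is valid):

    struct kvm_clock_data data = {};

    if (ioctl(vm_fd, KVM_GET_CLOCK, &data))
        perror("KVM_GET_CLOCK");

    /* If the flag is set, .clock and .realtime were sampled at the
     * same instant, so the pair lets the destination advance the
     * clock by the realtime elapsed during migration. */
    if (data.flags & KVM_CLOCK_REALTIME)
        printf("kvmclock=%llu realtime=%llu\n",
               (unsigned long long)data.clock,
               (unsigned long long)data.realtime);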
2021-10-18  KVM: x86: avoid warning with -Wbitwise-instead-of-logical  (Paolo Bonzini)

This is a new warning in clang top-of-tree (will be clang 14):

    In file included from arch/x86/kvm/mmu/mmu.c:27:
    arch/x86/kvm/mmu/spte.h:318:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
            return __is_bad_mt_xwr(rsvd_check, spte) |
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                   ||
    arch/x86/kvm/mmu/spte.h:318:9: note: cast one or both operands to int to silence this warning

The code is fine, but change it anyway to shut up this clever clogs of a compiler.

Reported-by: torvic9@mailbox.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
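The fix is a one-character change from bitwise to logical OR; since both operands are bool the result is identical, only the warning goes away:

    /* Before: bitwise '|' on two bool operands (warns on clang >= 14). */
    return __is_bad_mt_xwr(rsvd_check, spte) |
           __is_rsvd_bits_set(rsvd_check, spte, level);

    /* After: logical '||', same behavior, no warning. */
    return __is_bad_mt_xwr(rsvd_check, spte) ||
           __is_rsvd_bits_set(rsvd_check, spte, level);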
2021-10-18  Merge commit 'kvm-pagedata-alloc-fixes' into HEAD  (Paolo Bonzini)
2021-10-18  KVM: X86: fix lazy allocation of rmaps  (Paolo Bonzini)

If allocation of rmaps fails, but some of the pointers have already been written, those pointers can be cleaned up when the memslot is freed, or even reused later for another attempt at allocating the rmaps. Therefore there is no need to WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be skipped lest KVM overwrite the previous pointer and indeed leak memory.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
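The rule, in sketch form (simplified from memslot_rmap_alloc; lpages() stands in for the per-level size computation and is hypothetical):

    for (i = 0; i < KVM_NR_PAGE_SIZES; i++) {
        /* A pointer left over from an earlier, partially failed
         * attempt: skip it instead of overwriting (and leaking) it,
         * and don't WARN -- this is an expected situation. */
        if (slot->arch.rmap[i])
            continue;

        slot->arch.rmap[i] = kvcalloc(lpages(slot, i), sz,
                                      GFP_KERNEL_ACCOUNT);
        if (!slot->arch.rmap[i])
            return -ENOMEM;
    }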
2021-10-18  Merge branch kvm-arm64/pkvm/fixed-features into kvmarm-master/next  (Marc Zyngier)

* kvm-arm64/pkvm/fixed-features: (22 commits)
  : .
  : Add the pKVM fixed feature that allows a bunch of exceptions
  : to either be forbidden or be easily handled at EL2.
  : .
  KVM: arm64: pkvm: Give priority to standard traps over pvm handling
  KVM: arm64: pkvm: Pass vcpu instead of kvm to kvm_get_exit_handler_array()
  KVM: arm64: pkvm: Move kvm_handle_pvm_restricted around
  KVM: arm64: pkvm: Consolidate include files
  KVM: arm64: pkvm: Preserve pending SError on exit from AArch32
  KVM: arm64: pkvm: Handle GICv3 traps as required
  KVM: arm64: pkvm: Drop sysregs that should never be routed to the host
  KVM: arm64: pkvm: Drop AArch32-specific registers
  KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WI
  KVM: arm64: pkvm: Use a single function to expose all id-regs
  KVM: arm64: Fix early exit ptrauth handling
  KVM: arm64: Handle protected guests at 32 bits
  KVM: arm64: Trap access to pVM restricted features
  KVM: arm64: Move sanitized copies of CPU features
  KVM: arm64: Initialize trap registers for protected VMs
  KVM: arm64: Add handlers for protected VM System Registers
  KVM: arm64: Simplify masking out MTE in feature id reg
  KVM: arm64: Add missing field descriptor for MDCR_EL2
  KVM: arm64: Pass struct kvm to per-EC handlers
  KVM: arm64: Move early handlers to per-EC handlers
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-18  KVM: arm64: pkvm: Give priority to standard traps over pvm handling  (Marc Zyngier)

Checking for pvm handling first means that we cannot handle ptrauth traps or apply any of the workarounds (GICv3 or TX2 #219). Flip the order around.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-12-maz@kernel.org
2021-10-18  KVM: arm64: pkvm: Pass vcpu instead of kvm to kvm_get_exit_handler_array()  (Marc Zyngier)

Passing a VM pointer around is odd, and results in extra work on VHE. Follow the rest of the design that uses the vcpu instead, and let the nVHE code look into the struct kvm as required.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-11-maz@kernel.org
2021-10-18  KVM: arm64: pkvm: Move kvm_handle_pvm_restricted around  (Marc Zyngier)

Place kvm_handle_pvm_restricted() next to its little friends such as kvm_handle_pvm_sysreg(). This allows inject_undef64() to be made static.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-10-maz@kernel.org
2021-10-18  KVM: arm64: pkvm: Consolidate include files  (Marc Zyngier)

kvm_fixed_config.h is pkvm specific, and would be better placed near its users. At the same time, include/nvhe/sys_regs.h is now almost empty. Merge the two into arch/arm64/kvm/hyp/include/nvhe/fixed_config.h.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-9-maz@kernel.org
2021-10-18  KVM: arm64: pkvm: Preserve pending SError on exit from AArch32  (Marc Zyngier)

Don't drop a potential SError when a guest gets caught red-handed running AArch32 code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-8-maz@kernel.org
2021-10-18  KVM: arm64: pkvm: Handle GICv3 traps as required  (Marc Zyngier)

Forward accesses to the ICV_*SGI*_EL1 registers to EL1, and emulate ICV_SRE_EL1 by returning a fixed value. This should be enough to support GICv3 in a protected guest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-7-maz@kernel.org
2021-10-18  KVM: arm64: pkvm: Drop sysregs that should never be routed to the host  (Marc Zyngier)

A bunch of system registers (most of them MM related) should never trap to the host under any circumstance. Keep them close to our chest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-6-maz@kernel.org
2021-10-18  KVM: arm64: pkvm: Drop AArch32-specific registers  (Marc Zyngier)

All the SYS_*32_EL2 registers are AArch32-specific. Since we forbid AArch32, there is no need to handle those in any way.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-5-maz@kernel.org
2021-10-18  KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WI  (Marc Zyngier)

The ERR*/ERX* registers should be handled as RAZ/WI, and there should be no need to involve EL1 for that. Add a helper that handles such registers, and repaint the sysreg table to declare these registers as RAZ/WI.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-4-maz@kernel.org
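A sketch of such a handler (signature modeled on the arm64 sys_reg handlers; the name is illustrative): reads return zero, writes are discarded, and EL1 never sees the trap.

    static bool pvm_access_raz_wi(struct kvm_vcpu *vcpu,
                                  struct sys_reg_params *p,
                                  const struct sys_reg_desc *r)
    {
        if (!p->is_write)
            p->regval = 0;   /* reads-as-zero */

        /* Writes ignored; report the access as fully handled so the
         * trapped instruction is skipped at EL2. */
        return true;
    }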
2021-10-18  KVM: arm64: pkvm: Use a single function to expose all id-regs  (Marc Zyngier)

Rather than exposing a whole set of helper functions to retrieve individual ID registers, use the existing decoding tree and expose a single helper instead. This allows a number of functions to be made static, and we now have a single entry point to maintain.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-3-maz@kernel.org
2021-10-18  KVM: arm64: Fix early exit ptrauth handling  (Marc Zyngier)

The previous rework of the early exit code to provide an EC-based decoding tree missed the fact that we have two trap paths for ptrauth: the instructions (EC_PAC) and the sysregs (EC_SYS64). Rework the handlers to call the ptrauth handling code on both paths.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-2-maz@kernel.org
2021-10-18  KVM: x86/mmu: kvm_faultin_pfn has to return false if pfn is returned  (Andrei Vagin)

This looks like a typo in 8f32d5e563cb. That change wasn't intended to make any functional changes. The problem was caught by gVisor tests.

Fixes: 8f32d5e563cb ("KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Message-Id: <20211015163221.472508-1-avagin@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-17  Merge branch kvm-arm64/memory-accounting into kvmarm-master/next  (Marc Zyngier)

* kvm-arm64/memory-accounting:
  : .
  : Sprinkle a bunch of GFP_KERNEL_ACCOUNT all over the code base
  : to better track memory allocation made on behalf of a VM.
  : .
  KVM: arm64: Add memcg accounting to KVM allocations
  KVM: arm64: vgic: Add memcg accounting to vgic allocations

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-17  KVM: arm64: Add memcg accounting to KVM allocations  (Jia He)

Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM allocations"), make arm64 KVM consistent with the common KVM code: memory allocations with VM scope should be charged to the VM process's cgroup, so change GFP_KERNEL to GFP_KERNEL_ACCOUNT. A few cases remain unchanged because those allocations are global, not VM scoped.

Signed-off-by: Jia He <justin.he@arm.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210907123112.10232-3-justin.he@arm.com
2021-10-17  KVM: arm64: vgic: Add memcg accounting to vgic allocations  (Jia He)

Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM allocations"), make the arm64 vgic consistent with the common KVM code: memory allocations with VM scope should be charged to the VM process's cgroup, so change GFP_KERNEL to GFP_KERNEL_ACCOUNT. A few cases remain unchanged because those allocations are global, not VM scoped.

Signed-off-by: Jia He <justin.he@arm.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210907123112.10232-2-justin.he@arm.com
2021-10-17  Merge branch kvm-arm64/selftest/timer into kvmarm-master/next  (Marc Zyngier)

* kvm-arm64/selftest/timer:
  : .
  : Add a set of selftests for the KVM/arm64 timer emulation.
  : Comes with a minimal GICv3 infrastructure.
  : .
  KVM: arm64: selftests: arch_timer: Support vCPU migration
  KVM: arm64: selftests: Add arch_timer test
  KVM: arm64: selftests: Add host support for vGIC
  KVM: arm64: selftests: Add basic GICv3 support
  KVM: arm64: selftests: Add light-weight spinlock support
  KVM: arm64: selftests: Add guest support to get the vcpuid
  KVM: arm64: selftests: Maintain consistency for vcpuid type
  KVM: arm64: selftests: Add support to disable and enable local IRQs
  KVM: arm64: selftests: Add basic support to generate delays
  KVM: arm64: selftests: Add basic support for arch_timers
  KVM: arm64: selftests: Add support for cpu_relax
  KVM: arm64: selftests: Introduce ARM64_SYS_KVM_REG
  tools: arm64: Import sysreg.h
  KVM: arm64: selftests: Add MMIO readl/writel support

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-17  KVM: arm64: selftests: arch_timer: Support vCPU migration  (Raghavendra Rao Ananta)

Since the timer stack (hardware and KVM) is per-CPU, races can occur when the scheduler decides to migrate a vCPU thread to a different physical CPU. Hence, include an option to stress-test this part as well, by forcing the vCPUs to migrate across physical CPUs in the system at a particular rate.

Originally, the bug fixed by commit 3134cc8beb69d0d ("KVM: arm64: vgic: Resample HW pending state on deactivation") was discovered using the arch_timer test with vCPU migrations, and can easily be reproduced with it.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-16-rananta@google.com
2021-10-17  KVM: arm64: selftests: Add arch_timer test  (Raghavendra Rao Ananta)

Add a KVM selftest to validate the arch_timer functionality. Primarily, the test sets up periodic timer interrupts and validates the basic architectural expectations upon its receipt. The test provides command-line options to configure the period of the timer, the number of iterations, and the number of vCPUs.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-15-rananta@google.com
2021-10-17  KVM: arm64: selftests: Add host support for vGIC  (Raghavendra Rao Ananta)

Implement a simple library to perform vGIC-v3 setup from a host point of view. This includes creating a vGIC device, setting up distributor and redistributor attributes, and mapping the guest physical addresses. The definition of REDIST_REGION_ATTR_ADDR is taken from the aarch64/vgic_init test; hence, replace the definition by including vgic.h in the test file.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-14-rananta@google.com
2021-10-17  KVM: arm64: selftests: Add basic GICv3 support  (Raghavendra Rao Ananta)

Add basic support for the ARM Generic Interrupt Controller v3. The support allows guests to set up interrupts. The work is inspired by kvm-unit-tests and the kernel's GIC driver (drivers/irqchip/irq-gic-v3.c).

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-13-rananta@google.com
2021-10-17  KVM: arm64: selftests: Add light-weight spinlock support  (Raghavendra Rao Ananta)

Add a simpler version of spinlock support for ARM64 for the guests to use. The implementation is loosely based on the spinlock implementation in kvm-unit-tests.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-12-rananta@google.com
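For flavor, an equivalent sketch of such a minimal guest spinlock using compiler atomics (the selftest itself is written with ldaxr/stxr inline assembly):

    struct spinlock {
        int v;
    };

    static void spin_lock(struct spinlock *lock)
    {
        /* Spin until we are the one to flip 0 -> 1; acquire ordering
         * makes the previous holder's critical section visible. */
        while (__atomic_exchange_n(&lock->v, 1, __ATOMIC_ACQUIRE))
            ;
    }

    static void spin_unlock(struct spinlock *lock)
    {
        /* Release ordering publishes our writes before the unlock. */
        __atomic_store_n(&lock->v, 0, __ATOMIC_RELEASE);
    }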