summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2021-10-22KVM: VMX: Rename pt_desc.addr_range to pt_desc.num_address_rangesXiaoyao Li
To better self explain the meaning of this field and match the PT_CAP_num_address_ranges constatn. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20210827070249.924633-4-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22KVM: VMX: Use precomputed vmx->pt_desc.addr_rangeXiaoyao Li
The number of valid PT ADDR MSRs for the guest is precomputed in vmx->pt_desc.addr_range. Use it instead of calculating again. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20210827070249.924633-3-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22KVM: VMX: Restore host's MSR_IA32_RTIT_CTL when it's not zeroXiaoyao Li
A minor optimization to WRMSR MSR_IA32_RTIT_CTL when necessary. Opportunistically refine the comment to call out that KVM requires VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20210827070249.924633-2-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22KVM: x86/mmu: clean up prefetch/prefault/speculative namingPaolo Bonzini
"prefetch", "prefault" and "speculative" are used throughout KVM to mean the same thing. Use a single name, standardizing on "prefetch" which is already used by various functions such as direct_pte_prefetch, FNAME(prefetch_gpte), FNAME(pte_prefetch), etc. Suggested-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-22KVM: cleanup allocation of rmaps and page tracking dataDavid Stevens
Unify the flags for rmaps and page tracking data, using a single flag in struct kvm_arch and a single loop to go over all the address spaces and memslots. This avoids code duplication between alloc_all_memslots_rmaps and kvm_page_track_enable_mmu_write_tracking. Signed-off-by: David Stevens <stevensd@chromium.org> [This patch is the delta between David's v2 and v3, with conflicts fixed and my own commit message. - Paolo] Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-20KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpuHalil Pasic
Changing the deliverable mask in __airqs_kick_single_vcpu() is a bug. If one idle vcpu can't take the interrupts we want to deliver, we should look for another vcpu that can, instead of saying that we don't want to deliver these interrupts by clearing the bits from the deliverable_mask. Fixes: 9f30f6216378 ("KVM: s390: add gib_alert_irq_handler()") Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20211019175401.3757927-3-pasic@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-20KVM: s390: clear kicked_mask before sleeping againHalil Pasic
The idea behind kicked mask is that we should not re-kick a vcpu that is already in the "kick" process, i.e. that was kicked and is is about to be dispatched if certain conditions are met. The problem with the current implementation is, that it assumes the kicked vcpu is going to enter SIE shortly. But under certain circumstances, the vcpu we just kicked will be deemed non-runnable and will remain in wait state. This can happen, if the interrupt(s) this vcpu got kicked to deal with got already cleared (because the interrupts got delivered to another vcpu). In this case kvm_arch_vcpu_runnable() would return false, and the vcpu would remain in kvm_vcpu_block(), but this time with its kicked_mask bit set. So next time around we wouldn't kick the vcpu form __airqs_kick_single_vcpu(), but would assume that we just kicked it. Let us make sure the kicked_mask is cleared before we give up on re-dispatching the vcpu. Fixes: 9f30f6216378 ("KVM: s390: add gib_alert_irq_handler()") Reported-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20211019175401.3757927-2-pasic@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-18KVM: x86: Expose TSC offset controls to userspaceOliver Upton
To date, VMM-directed TSC synchronization and migration has been a bit messy. KVM has some baked-in heuristics around TSC writes to infer if the VMM is attempting to synchronize. This is problematic, as it depends on host userspace writing to the guest's TSC within 1 second of the last write. A much cleaner approach to configuring the guest's views of the TSC is to simply migrate the TSC offset for every vCPU. Offsets are idempotent, and thus not subject to change depending on when the VMM actually reads/writes values from/to KVM. The VMM can then read the TSC once with KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when the guest is paused. Cc: David Matlack <dmatlack@google.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210916181538.968978-8-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18KVM: x86: Refactor tsc synchronization codeOliver Upton
Refactor kvm_synchronize_tsc to make a new function that allows callers to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly for the sake of participating in TSC synchronization. Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20210916181538.968978-7-oupton@google.com> [Make sure kvm->arch.cur_tsc_generation and vcpu->arch.this_tsc_generation are equal at the end of __kvm_synchronize_tsc, if matched is false. Reported by Maxim Levitsky. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18kvm: x86: protect masterclock with a seqcountPaolo Bonzini
Protect the reference point for kvmclock with a seqcount, so that kvmclock updates for all vCPUs can proceed in parallel. Xen runstate updates will also run in parallel and not bounce the kvmclock cacheline. Of the variables that were protected by pvclock_gtod_sync_lock, nr_vcpus_matched_tsc is different because it is updated outside pvclock_update_vm_gtod_copy and read inside it. Therefore, we need to keep it protected by a spinlock. In fact it must now be a raw spinlock, because pvclock_update_vm_gtod_copy, being the write-side of a seqcount, is non-preemptible. Since we already have tsc_write_lock which is a raw spinlock, we can just use tsc_write_lock as the lock that protects the write-side of the seqcount. Co-developed-by: Oliver Upton <oupton@google.com> Message-Id: <20210916181538.968978-6-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCKOliver Upton
Handling the migration of TSCs correctly is difficult, in part because Linux does not provide userspace with the ability to retrieve a (TSC, realtime) clock pair for a single instant in time. In lieu of a more convenient facility, KVM can report similar information in the kvm_clock structure. Provide userspace with a host TSC & realtime pair iff the realtime clock is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid realtime value, advance the KVM clock by the amount of elapsed time. Do not step the KVM clock backwards, though, as it is a monotonic oscillator. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210916181538.968978-5-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18KVM: x86: avoid warning with -Wbitwise-instead-of-logicalPaolo Bonzini
This is a new warning in clang top-of-tree (will be clang 14): In file included from arch/x86/kvm/mmu/mmu.c:27: arch/x86/kvm/mmu/spte.h:318:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical] return __is_bad_mt_xwr(rsvd_check, spte) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ || arch/x86/kvm/mmu/spte.h:318:9: note: cast one or both operands to int to silence this warning The code is fine, but change it anyway to shut up this clever clogs of a compiler. Reported-by: torvic9@mailbox.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18Merge commit 'kvm-pagedata-alloc-fixes' into HEADPaolo Bonzini
2021-10-18KVM: X86: fix lazy allocation of rmapsPaolo Bonzini
If allocation of rmaps fails, but some of the pointers have already been written, those pointers can be cleaned up when the memslot is freed, or even reused later for another attempt at allocating the rmaps. Therefore there is no need to WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be skipped lest KVM will overwrite the previous pointer and will indeed leak memory. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18Merge branch kvm-arm64/pkvm/fixed-features into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm/fixed-features: (22 commits) : . : Add the pKVM fixed feature that allows a bunch of exceptions : to either be forbidden or be easily handled at EL2. : . KVM: arm64: pkvm: Give priority to standard traps over pvm handling KVM: arm64: pkvm: Pass vpcu instead of kvm to kvm_get_exit_handler_array() KVM: arm64: pkvm: Move kvm_handle_pvm_restricted around KVM: arm64: pkvm: Consolidate include files KVM: arm64: pkvm: Preserve pending SError on exit from AArch32 KVM: arm64: pkvm: Handle GICv3 traps as required KVM: arm64: pkvm: Drop sysregs that should never be routed to the host KVM: arm64: pkvm: Drop AArch32-specific registers KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WI KVM: arm64: pkvm: Use a single function to expose all id-regs KVM: arm64: Fix early exit ptrauth handling KVM: arm64: Handle protected guests at 32 bits KVM: arm64: Trap access to pVM restricted features KVM: arm64: Move sanitized copies of CPU features KVM: arm64: Initialize trap registers for protected VMs KVM: arm64: Add handlers for protected VM System Registers KVM: arm64: Simplify masking out MTE in feature id reg KVM: arm64: Add missing field descriptor for MDCR_EL2 KVM: arm64: Pass struct kvm to per-EC handlers KVM: arm64: Move early handlers to per-EC handlers ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-18KVM: arm64: pkvm: Give priority to standard traps over pvm handlingMarc Zyngier
Checking for pvm handling first means that we cannot handle ptrauth traps or apply any of the workarounds (GICv3 or TX2 #219). Flip the order around. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-12-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Pass vpcu instead of kvm to kvm_get_exit_handler_array()Marc Zyngier
Passing a VM pointer around is odd, and results in extra work on VHE. Follow the rest of the design that uses the vcpu instead, and let the nVHE code look into the struct kvm as required. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-11-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Move kvm_handle_pvm_restricted aroundMarc Zyngier
Place kvm_handle_pvm_restricted() next to its little friends such as kvm_handle_pvm_sysreg(). This allows to make inject_undef64() static. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-10-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Consolidate include filesMarc Zyngier
kvm_fixed_config.h is pkvm specific, and would be better placed near its users. At the same time, include/nvhe/sys_regs.h is now almost empty. Merge the two into arch/arm64/kvm/hyp/include/nvhe/fixed_config.h. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-9-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Preserve pending SError on exit from AArch32Marc Zyngier
Don't drop a potential SError when a guest gets caught red-handed running AArch32 code. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-8-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Handle GICv3 traps as requiredMarc Zyngier
Forward accesses to the ICV_*SGI*_EL1 registers to EL1, and emulate ICV_SRE_EL1 by returning a fixed value. This should be enough to support GICv3 in a protected guest. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-7-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Drop sysregs that should never be routed to the hostMarc Zyngier
A bunch of system registers (most of them MM related) should never trap to the host under any circumstance. Keep them close to our chest. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-6-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Drop AArch32-specific registersMarc Zyngier
All the SYS_*32_EL2 registers are AArch32-specific. Since we forbid AArch32, there is no need to handle those in any way. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-5-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WIMarc Zyngier
The ERR*/ERX* registers should be handled as RAZ/WI, and there should be no need to involve EL1 for that. Add a helper that handles such registers, and repaint the sysreg table to declare these registers as RAZ/WI. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-4-maz@kernel.org
2021-10-18KVM: arm64: pkvm: Use a single function to expose all id-regsMarc Zyngier
Rather than exposing a whole set of helper functions to retrieve individual ID registers, use the existing decoding tree and expose a single helper instead. This allow a number of functions to be made static, and we now have a single entry point to maintain. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-3-maz@kernel.org
2021-10-18KVM: arm64: Fix early exit ptrauth handlingMarc Zyngier
The previous rework of the early exit code to provide an EC-based decoding tree missed the fact that we have two trap paths for ptrauth: the instructions (EC_PAC) and the sysregs (EC_SYS64). Rework the handlers to call the ptrauth handling code on both paths. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211013120346.2926621-2-maz@kernel.org
2021-10-18KVM: x86/mmu: kvm_faultin_pfn has to return false if pfh is returnedAndrei Vagin
This looks like a typo in 8f32d5e563cb. This change didn't intend to do any functional changes. The problem was caught by gVisor tests. Fixes: 8f32d5e563cb ("KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code") Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Message-Id: <20211015163221.472508-1-avagin@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-17Merge branch kvm-arm64/memory-accounting into kvmarm-master/nextMarc Zyngier
* kvm-arm64/memory-accounting: : . : Sprinkle a bunch of GFP_KERNEL_ACCOUNT all over the code base : to better track memory allocation made on behalf of a VM. : . KVM: arm64: Add memcg accounting to KVM allocations KVM: arm64: vgic: Add memcg accounting to vgic allocations Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-17KVM: arm64: Add memcg accounting to KVM allocationsJia He
Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM allocations"), it would be better to make arm64 KVM consistent with common kvm codes. The memory allocations of VM scope should be charged into VM process cgroup, hence change GFP_KERNEL to GFP_KERNEL_ACCOUNT. There remain a few cases since these allocations are global, not in VM scope. Signed-off-by: Jia He <justin.he@arm.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210907123112.10232-3-justin.he@arm.com
2021-10-17KVM: arm64: vgic: Add memcg accounting to vgic allocationsJia He
Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM allocations"), it would be better to make arm64 vgic consistent with common kvm codes. The memory allocations of VM scope should be charged into VM process cgroup, hence change GFP_KERNEL to GFP_KERNEL_ACCOUNT. There remain a few cases since these allocations are global, not in VM scope. Signed-off-by: Jia He <justin.he@arm.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210907123112.10232-2-justin.he@arm.com
2021-10-17Merge branch kvm-arm64/vgic-fixes-5.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/vgic-fixes-5.16: : . : Multiple updates to the GICv3 emulation in order to better support : the dreadful Apple M1 that only implements half of it, and in a : broken way... : . KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocode KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEIS KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3 Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-17KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocodeMarc Zyngier
Having realised that a virtual LPI does transition through an active state that does not exist on bare metal, align the CPU interface emulation with the behaviour specified in the architecture pseudocode. The LPIs now transition to active on IAR read, and to inactive on EOI write. Special care is taken not to increment the EOIcount for an LPI that isn't present in the LRs. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010150910.2911495-6-maz@kernel.org
2021-10-17KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEISMarc Zyngier
Since we are trapping all sysreg accesses when ICH_VTR_EL2.SEIS is set, and that we never deliver an SError when emulating any of the GICv3 sysregs, don't advertise ICC_CTLR_EL1.SEIS. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010150910.2911495-5-maz@kernel.org
2021-10-17KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possibleMarc Zyngier
On systems that advertise ICH_VTR_EL2.SEIS, we trap all GICv3 sysreg accesses from the guest. From a performance perspective, this is OK as long as the guest doesn't hammer the GICv3 CPU interface. In most cases, this is fine, unless the guest actively uses priorities and switches PMR_EL1 very often. Which is exactly what happens when a Linux guest runs with irqchip.gicv3_pseudo_nmi=1. In these condition, the performance plumets as we hit PMR each time we mask/unmask interrupts. Not good. There is however an opportunity for improvement. Careful reading of the architecture specification indicates that the only GICv3 sysreg belonging to the common group (which contains the SGI registers, PMR, DIR, CTLR and RPR) that is allowed to generate a SError is DIR. Everything else is safe. It is thus possible to substitute the trapping of all the common group with just that of DIR if it supported by the implementation. Yes, that's yet another optional bit of the architecture. So let's just do that, as it leads to some impressive result on the M1: Without this change: bash-5.1# /host/home/maz/hackbench 100 process 1000 Running with 100*40 (== 4000) tasks. Time: 56.596 With this change: bash-5.1# /host/home/maz/hackbench 100 process 1000 Running with 100*40 (== 4000) tasks. Time: 8.649 which is a pretty convincing result. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20211010150910.2911495-4-maz@kernel.org
2021-10-17KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrorsMarc Zyngier
The infamous M1 has a feature nobody else ever implemented, in the form of the "GIC locally generated SError interrupts", also known as SEIS for short. These SErrors are generated when a guest does something that violates the GIC state machine. It would have been simpler to just *ignore* the damned thing, but that's not what this HW does. Oh well. This part of of the architecture is also amazingly under-specified. There is a whole 10 lines that describe the feature in a spec that is 930 pages long, and some of these lines are factually wrong. Oh, and it is deprecated, so the insentive to clarify it is low. Now, the spec says that this should be a *virtual* SError when HCR_EL2.AMO is set. As it turns out, that's not always the case on this CPU, and the SError sometimes fires on the host as a physical SError. Goodbye, cruel world. This clearly is a HW bug, and it means that a guest can easily take the host down, on demand. Thankfully, we have seen systems that were just as broken in the past, and we have the perfect vaccine for it. Apple M1, please meet the Cavium ThunderX workaround. All your GIC accesses will be trapped, sanitised, and emulated. Only the signalling aspect of the HW will be used. It won't be super speedy, but it will at least be safe. You're most welcome. Given that this has only ever been seen on this single implementation, that the spec is unclear at best and that we cannot trust it to ever be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS being set. Tested-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010150910.2911495-3-maz@kernel.org
2021-10-17KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3Marc Zyngier
Until now, we always let ID_AA64PFR0_EL1.GIC reflect the value visible on the host, even if we were running a GICv2-enabled VM on a GICv3+compat host. That's fine, but we also now have the case of a host that does not expose ID_AA64PFR0_EL1.GIC==1 despite having a vGIC. Yes, this is confusing. Thank you M1. Let's go back to first principles and expose ID_AA64PFR0_EL1.GIC=1 when a GICv3 is exposed to the guest. This also hides a GICv4.1 CPU interface from the guest which has no business knowing about the v4.1 extension. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010150910.2911495-2-maz@kernel.org
2021-10-15Merge tag 'kvmarm-fixes-5.15-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.15, take #2 - Properly refcount pages used as a concatenated stage-2 PGD - Fix missing unlock when detecting the use of MTE+VM_SHARED
2021-10-15KVM: SEV-ES: fix length of string I/OPaolo Bonzini
The size of the data in the scratch buffer is not divided by the size of each port I/O operation, so vcpu->arch.pio.count ends up being larger than it should be by a factor of size. Cc: stable@vger.kernel.org Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest") Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-12Merge branch kvm-arm64/misc-5.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-5.16: : . : - Allow KVM to be disabled from the command-line : - Clean up CONFIG_KVM vs CONFIG_HAVE_KVM : - Fix endianess evaluation on MMIO access from EL0 : . KVM: arm64: Fix reporting of endianess when the access originates at EL0 Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-12KVM: arm64: Fix reporting of endianess when the access originates at EL0Marc Zyngier
We currently check SCTLR_EL1.EE when computing the address of a faulting guest access. However, the fault could have occured at EL0, in which case the right bit to check would be SCTLR_EL1.E0E. This is pretty unlikely to cause any issue in practice: You'd have to have a guest with a LE EL1 and a BE EL0 (or the other way around), and have mapped a device into the EL0 page tables. Good luck with that! Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211012112312.1247467-1-maz@kernel.org
2021-10-11KVM: arm64: Handle protected guests at 32 bitsFuad Tabba
Protected KVM does not support protected AArch32 guests. However, it is possible for the guest to force run AArch32, potentially causing problems. Add an extra check so that if the hypervisor catches the guest doing that, it can prevent the guest from running again by resetting vcpu->arch.target and returning ARM_EXCEPTION_IL. If this were to happen, The VMM can try and fix it by re- initializing the vcpu with KVM_ARM_VCPU_INIT, however, this is likely not possible for protected VMs. Adapted from commit 22f553842b14 ("KVM: arm64: Handle Asymmetric AArch32 systems") Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-12-tabba@google.com
2021-10-11KVM: arm64: Trap access to pVM restricted featuresFuad Tabba
Trap accesses to restricted features for VMs running in protected mode. Access to feature registers are emulated, and only supported features are exposed to protected VMs. Accesses to restricted registers as well as restricted instructions are trapped, and an undefined exception is injected into the protected guests, i.e., with EC = 0x0 (unknown reason). This EC is the one used, according to the Arm Architecture Reference Manual, for unallocated or undefined system registers or instructions. Only affects the functionality of protected VMs. Otherwise, should not affect non-protected VMs when KVM is running in protected mode. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-11-tabba@google.com
2021-10-11KVM: arm64: Move sanitized copies of CPU featuresFuad Tabba
Move the sanitized copies of the CPU feature registers to the recently created sys_regs.c. This consolidates all copies in a more relevant file. No functional change intended. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-10-tabba@google.com
2021-10-11KVM: arm64: Initialize trap registers for protected VMsFuad Tabba
Protected VMs have more restricted features that need to be trapped. Moreover, the host should not be trusted to set the appropriate trapping registers and their values. Initialize the trapping registers, i.e., hcr_el2, mdcr_el2, and cptr_el2 at EL2 for protected guests, based on the values of the guest's feature id registers. No functional change intended as trap handlers introduced in the previous patch are still not hooked in to the guest exit handlers. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-9-tabba@google.com
2021-10-11KVM: arm64: Add handlers for protected VM System RegistersFuad Tabba
Add system register handlers for protected VMs. These cover Sys64 registers (including feature id registers), and debug. No functional change intended as these are not hooked in yet to the guest exit handlers introduced earlier. So when trapping is triggered, the exit handlers let the host handle it, as before. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-8-tabba@google.com
2021-10-11KVM: arm64: Simplify masking out MTE in feature id regFuad Tabba
Simplify code for hiding MTE support in feature id register when MTE is not enabled/supported by KVM. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-7-tabba@google.com
2021-10-11KVM: arm64: Add missing field descriptor for MDCR_EL2Fuad Tabba
It's not currently used. Added for completeness. No functional change intended. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-6-tabba@google.com
2021-10-11KVM: arm64: Pass struct kvm to per-EC handlersFuad Tabba
We need struct kvm to check for protected VMs to be able to pick the right handlers for them in subsequent patches. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-5-tabba@google.com
2021-10-11KVM: arm64: Move early handlers to per-EC handlersMarc Zyngier
Simplify the early exception handling by slicing the gigantic decoding tree into a more manageable set of functions, similar to what we have in handle_exit.c. This will also make the structure reusable for pKVM's own early exit handling. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211010145636.1950948-4-tabba@google.com
2021-10-11KVM: arm64: Don't include switch.h into nvhe/kvm-main.cMarc Zyngier
hyp-main.c includes switch.h while it only requires adjust-pc.h. Fix it to remove an unnecessary dependency. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20211010145636.1950948-3-tabba@google.com