summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm
AgeCommit message (Collapse)Author
2024-06-20KVM: arm64: nv: Forward FP/ASIMD traps to guest hypervisorJintack Lim
Give precedence to the guest hypervisor's trap configuration when routing an FP/ASIMD trap taken to EL2. Take advantage of the infrastructure for translating CPTR_EL2 into the VHE (i.e. EL1) format and base the trap decision solely on the VHE view of the register. The in-memory value of CPTR_EL2 will always be up to date for the guest hypervisor (more on that later), so just read it directly from memory. Bury all of this behind a macro keyed off of the CPTR bitfield in anticipation of supporting other traps (e.g. SVE). [maz: account for HCR_EL2.E2H when testing for TFP/FPEN, with all the hard work actually being done by Chase Conklin] [ oliver: translate nVHE->VHE format for testing traps; macro for reuse in other CPTR_EL2.xEN fields ] Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240620164653.1130714-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: nVHE: Support CONFIG_CFI_CLANG at EL2Pierre-Clément Tosi
The compiler implements kCFI by adding type information (u32) above every function that might be indirectly called and, whenever a function pointer is called, injects a read-and-compare of that u32 against the value corresponding to the expected type. In case of a mismatch, a BRK instruction gets executed. When the hypervisor triggers such an exception in nVHE, it panics and triggers and exception return to EL1. Therefore, teach nvhe_hyp_panic_handler() to detect kCFI errors from the ESR and report them. If necessary, remind the user that EL2 kCFI is not affected by CONFIG_CFI_PERMISSIVE. Pass $(CC_FLAGS_CFI) to the compiler when building the nVHE hyp code. Use SYM_TYPED_FUNC_START() for __pkvm_init_switch_pgd, as nVHE can't call it directly and must use a PA function pointer from C (because it is part of the idmap page), which would trigger a kCFI failure if the type ID wasn't present. Signed-off-by: Pierre-Clément Tosi <ptosi@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240610063244.2828978-9-ptosi@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Introduce print_nvhe_hyp_panic helperPierre-Clément Tosi
Add a helper to display a panic banner soon to also be used for kCFI failures, to ensure that we remain consistent. Signed-off-by: Pierre-Clément Tosi <ptosi@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240610063244.2828978-8-ptosi@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20arm64: Introduce esr_brk_comment, esr_is_cfi_brkPierre-Clément Tosi
As it is already used in two places, move esr_comment() to a header for re-use, with a clearer name. Introduce esr_is_cfi_brk() to detect kCFI BRK syndromes, currently used by early_brk64() but soon to also be used by hypervisor code. Signed-off-by: Pierre-Clément Tosi <ptosi@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240610063244.2828978-7-ptosi@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: VHE: Mark __hyp_call_panic __noreturnPierre-Clément Tosi
Given that the sole purpose of __hyp_call_panic() is to call panic(), a __noreturn function, give it the __noreturn attribute, removing the need for its caller to use unreachable(). Signed-off-by: Pierre-Clément Tosi <ptosi@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240610063244.2828978-6-ptosi@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: nVHE: gen-hyprel: Skip R_AARCH64_ABS32Pierre-Clément Tosi
Ignore R_AARCH64_ABS32 relocations, instead of panicking, when emitting the relocation table of the hypervisor. The toolchain might produce them when generating function calls with kCFI to represent the 32-bit type ID which can then be resolved across compilation units at link time. These are NOT actual 32-bit addresses and are therefore not needed in the final (runtime) relocation table (which is unlikely to use 32-bit absolute addresses for arm64 anyway). Signed-off-by: Pierre-Clément Tosi <ptosi@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240610063244.2828978-5-ptosi@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: nVHE: Simplify invalid_host_el2_vectPierre-Clément Tosi
The invalid_host_el2_vect macro is used by EL2{t,h} handlers in nVHE *host* context, which should never run with a guest context loaded. Therefore, remove the superfluous vCPU context check and branch unconditionally to hyp_panic. Signed-off-by: Pierre-Clément Tosi <ptosi@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240610063244.2828978-4-ptosi@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Fix __pkvm_init_switch_pgd call ABIPierre-Clément Tosi
Fix the mismatch between the (incorrect) C signature, C call site, and asm implementation by aligning all three on an API passing the parameters (pgd and SP) separately, instead of as a bundled struct. Remove the now unnecessary memory accesses while the MMU is off from the asm, which simplifies the C caller (as it does not need to convert a VA struct pointer to PA) and makes the code slightly more robust by offsetting the struct fields from C and properly expressing the call to the C compiler (e.g. type checker and kCFI). Fixes: f320bc742bc2 ("KVM: arm64: Prepare the creation of s1 mappings at EL2") Signed-off-by: Pierre-Clément Tosi <ptosi@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240610063244.2828978-3-ptosi@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Fix clobbered ELR in sync abort/SErrorPierre-Clément Tosi
When the hypervisor receives a SError or synchronous exception (EL2h) while running with the __kvm_hyp_vector and if ELR_EL2 doesn't point to an extable entry, it panics indirectly by overwriting ELR with the address of a panic handler in order for the asm routine it returns to to ERET into the handler. However, this clobbers ELR_EL2 for the handler itself. As a result, hyp_panic(), when retrieving what it believes to be the PC where the exception happened, actually ends up reading the address of the panic handler that called it! This results in an erroneous and confusing panic message where the source of any synchronous exception (e.g. BUG() or kCFI) appears to be __guest_exit_panic, making it hard to locate the actual BRK instruction. Therefore, store the original ELR_EL2 in the per-CPU kvm_hyp_ctxt and point the sysreg to a routine that first restores it to its previous value before running __guest_exit_panic. Fixes: 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context") Signed-off-by: Pierre-Clément Tosi <ptosi@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240610063244.2828978-2-ptosi@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: rename functions for invariant sys regsSebastian Ott
Invariant system id registers are populated with host values at initialization time using their .reset function cb. These are currently called get_* which is usually used by the functions implementing the .get_user callback. Change their function names to reset_* to reflect what they are used for. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-10-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: show writable masks for feature registersSebastian Ott
Instead of using ~0UL provide the actual writable mask for non-id feature registers in the output of the KVM_ARM_GET_REG_WRITABLE_MASKS ioctl. This changes the mask for the CTR_EL0 and CLIDR_EL1 registers. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-9-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Treat CTR_EL0 as a VM feature ID registerSebastian Ott
CTR_EL0 is currently handled as an invariant register, thus guests will be presented with the host value of that register. Add emulation for CTR_EL0 based on a per VM value. Userspace can switch off DIC and IDC bits and reduce DminLine and IminLine sizes. Naturally, ensure CTR_EL0 is trapped (HCR_EL2.TID2=1) any time that a VM's CTR_EL0 differs from hardware. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: unify code to prepare trapsSebastian Ott
There are 2 functions to calculate traps via HCR_EL2: * kvm_init_sysreg() called via KVM_RUN (before the 1st run or when the pid changes) * vcpu_reset_hcr() called via KVM_ARM_VCPU_INIT To unify these 2 and to support traps that are dependent on the ID register configuration, move the code from vcpu_reset_hcr() to sys_regs.c and call it via kvm_init_sysreg(). We still have to keep the non-FWB handling stuff in vcpu_reset_hcr(). Also the initialization with HCR_GUEST_FLAGS is kept there but guarded by !vcpu_has_run_once() to ensure that previous calculated values don't get overwritten. While at it rename kvm_init_sysreg() to kvm_calculate_traps() to better reflect what it's doing. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: nv: Use accessors for modifying ID registersOliver Upton
In the interest of abstracting away the underlying storage of feature ID registers, rework the nested code to go through the accessors instead of directly iterating the id_regs array. This means we now lose the property that ID registers unknown to the nested code get zeroed, but we really ought to be handling those explicitly going forward. Link: https://lore.kernel.org/r/20240619174036.483943-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Add helper for writing ID regsOliver Upton
Replace the remaining usage of IDREG() with a new helper for setting the value of a feature ID register, with the benefit of cramming in some extra sanity checks. Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Use read-only helper for reading VM ID registersOliver Upton
IDREG() expands to the storage of a particular ID reg, which can be useful for handling both reads and writes. However, outside of a select few situations, the ID registers should be considered read only. Replace current readers with a new macro that expands to the value of the field rather than the field itself. Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Make idregs debugfs iterator search sysreg table directlyOliver Upton
CTR_EL0 complicates the existing scheme for iterating feature ID registers, as it is not in the contiguous range that we presently support. Just search the sysreg table for the Nth feature ID register in anticipation of this. Yes, the debugfs interface has quadratic time completixy now. Boo hoo. Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Get sys_reg encoding from descriptor in idregs_debug_show()Oliver Upton
KVM is about to add support for more VM-scoped feature ID regs that live outside of the id_regs[] array, which means the index of the debugfs iterator may not actually be an index into the array. Prepare by getting the sys_reg encoding from the descriptor itself. Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Use GFP_KERNEL_ACCOUNT for sysreg_masks allocationOliver Upton
Of course, userspace is in the driver's seat for struct kvm and associated allocations. Make sure the sysreg_masks allocation participates in kmem accounting. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240617181018.2054332-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Add handling of NXS-flavoured TLBI operationsMarc Zyngier
Latest kid on the block: NXS (Non-eXtra-Slow) TLBI operations. Let's add those in bulk (NSH, ISH, OSH, both normal and range) as they directly map to their XS (the standard ones) counterparts. Not a lot to say about them, they are basically useless. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-17-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Add handling of range-based TLBI operationsMarc Zyngier
We already support some form of range operation by handling FEAT_TTL, but so far the "arbitrary" range operations are unsupported. Let's fix that. For EL2 S1, this is simple enough: we just map both NSH, ISH and OSH instructions onto the ISH version for EL1. For TLBI instructions affecting EL1 S1, we use the same model as their non-range counterpart to invalidate in the context of the correct VMID. For TLBI instructions affecting S2, we interpret the data passed by the guest to compute the range and use that to tear-down part of the shadow S2 range and invalidate the TLBs. Finally, we advertise FEAT_TLBIRANGE if the host supports it. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-16-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Add handling of outer-shareable TLBI operationsMarc Zyngier
Our handling of outer-shareable TLBIs is pretty basic: we just map them to the existing inner-shareable ones, because we really don't have anything else. The only significant change is that we can now advertise FEAT_TLBIOS support if the host supports it. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like informationMarc Zyngier
In order to be able to make S2 TLB invalidations more performant on NV, let's use a scheme derived from the FEAT_TTL extension. If bits [56:55] in the leaf descriptor translating the address in the corresponding shadow S2 are non-zero, they indicate a level which can be used as an invalidation range. This allows further reduction of the systematic over-invalidation that takes place otherwise. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-14-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Tag shadow S2 entries with guest's leaf S2 levelMarc Zyngier
Populate bits [56:55] of the leaf entry with the level provided by the guest's S2 translation. This will allow us to better scope the invalidation by remembering the mapping size. Of course, this assume that the guest will issue an invalidation with an address that falls into the same leaf. If the guest doesn't, we'll over-invalidate. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-13-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle FEAT_TTL hinted TLB operationsMarc Zyngier
Support guest-provided information information to size the range of required invalidation. This helps with reducing over-invalidation, provided that the guest actually provides accurate information. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-12-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operationsMarc Zyngier
TLBI IPAS2E1* are the last class of TLBI instructions we need to handle. For each matching S2 MMU context, we invalidate a range corresponding to the largest possible mapping for that context. At this stage, we don't handle TTL, which means we are likely over-invalidating. Further patches will aim at making this a bit better. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-11-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle TLBI ALLE1{,IS} operationsMarc Zyngier
TLBI ALLE1* is a pretty big hammer that invalides all S1/S2 TLBs. This translates into the unmapping of all our shadow S2 PTs, itself resulting in the corresponding TLB invalidations. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operationsMarc Zyngier
Emulating TLBI VMALLS12E1* results in tearing down all the shadow S2 PTs that match the current VMID, since our shadow S2s are just some form of SW-managed TLBs. That teardown itself results in a full TLB invalidation for both S1 and S2. This can result in over-invalidation if two vcpus use the same VMID to tag private S2 PTs, but this is still correct from an architecture perspective. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1Marc Zyngier
While dealing with TLB invalidation targeting the guest hypervisor's own stage-1 was easy, doing the same thing for its own guests is a bit more involved. Since such an invalidation is scoped by VMID, it needs to apply to all s2_mmu contexts that have been tagged by that VMID, irrespective of the value of VTTBR_EL2.BADDR. So for each s2_mmu context matching that VMID, we invalidate the corresponding TLBs, each context having its own "physical" VMID. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-8-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidationMarc Zyngier
Due to the way FEAT_NV2 suppresses traps when accessing EL2 system registers, we can't track when the guest changes its HCR_EL2.TGE setting. This means we always trap EL1 TLBIs, even if they don't affect any L2 guest. Given that invalidating the EL2 TLBs doesn't require any messing with the shadow stage-2 page-tables, we can simply emulate the instructions early and return directly to the guest. This is conditioned on the instruction being an EL1 one and the guest's HCR_EL2.{E2H,TGE} being {1,1} (indicating that the instruction targets the EL2 S1 TLBs), or the instruction being one of the EL2 ones (which are not ambiguous). EL1 TLBIs issued with HCR_EL2.{E2H,TGE}={1,0} are not handled here, and cause a full exit so that they can be handled in the context of a VMID. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Add Stage-1 EL2 invalidation primitivesMarc Zyngier
Provide the primitives required to handle TLB invalidation for Stage-1 EL2 TLBs, which by definition do not require messing with the Stage-2 page tables. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Unmap/flush shadow stage 2 page tablesChristoffer Dall
Unmap/flush shadow stage 2 page tables for the nested VMs as well as the stage 2 page table for the guest hypervisor. Note: A bunch of the code in mmu.c relating to MMU notifiers is currently dealt with in an extremely abrupt way, for example by clearing out an entire shadow stage-2 table. This will be handled in a more efficient way using the reverse mapping feature in a later version of the patch series. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle shadow stage 2 page faultsMarc Zyngier
If we are faulting on a shadow stage 2 translation, we first walk the guest hypervisor's stage 2 page table to see if it has a mapping. If not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we create a mapping in the shadow stage 2 page table. Note that we have to deal with two IPAs when we got a shadow stage 2 page fault. One is the address we faulted on, and is in the L2 guest phys space. The other is from the guest stage-2 page table walk, and is in the L1 guest phys space. To differentiate them, we rename variables so that fault_ipa is used for the former and ipa is used for the latter. When mapping a page in a shadow stage-2, special care must be taken not to be more permissive than the guest is. Co-developed-by: Christoffer Dall <christoffer.dall@linaro.org> Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Implement nested Stage-2 page table walk logicChristoffer Dall
Based on the pseudo-code in the ARM ARM, implement a stage 2 software page table walker. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Support multiple nested Stage-2 mmu structuresMarc Zyngier
Add Stage-2 mmu data structures for virtual EL2 and for nested guests. We don't yet populate shadow Stage-2 page tables, but we now have a framework for getting to a shadow Stage-2 pgd. We allocate twice the number of vcpus as Stage-2 mmu structures because that's sufficient for each vcpu running two translation regimes without having to flush the Stage-2 page tables. Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-18KVM: Introduce vcpu->wants_to_runDavid Matlack
Introduce vcpu->wants_to_run to indicate when a vCPU is in its core run loop, i.e. when the vCPU is running the KVM_RUN ioctl and immediate_exit was not set. Replace all references to vcpu->run->immediate_exit with !vcpu->wants_to_run to avoid TOCTOU races with userspace. For example, a malicious userspace could invoked KVM_RUN with immediate_exit=true and then after KVM reads it to set wants_to_run=false, flip it to false. This would result in the vCPU running in KVM_RUN with wants_to_run=false. This wouldn't cause any real bugs today but is a dangerous landmine. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240503181734.1467938-2-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-14KVM: arm64: Use FF-A 1.1 with pKVMSebastian Ene
Now that the layout of the structures is compatible with 1.1 it is time to probe the 1.1 version of the FF-A protocol inside the hypervisor. If the TEE doesn't support it, it should return the minimum supported version. Signed-off-by: Sebastian Ene <sebastianene@google.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240613132035.1070360-5-sebastianene@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-14KVM: arm64: Update the identification range for the FF-A smcsSebastian Ene
The FF-A spec 1.2 reserves the following ranges for identifying FF-A calls: 0x84000060-0x840000FF: FF-A 32-bit calls 0xC4000060-0xC40000FF: FF-A 64-bit calls. Use the range identification according to the spec and allow calls that are currently out of the range(eg. FFA_MSG_SEND_DIRECT_REQ2) to be identified correctly. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Sebastian Ene <sebastianene@google.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240613132035.1070360-4-sebastianene@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-14KVM: arm64: Add support for FFA_PARTITION_INFO_GETSebastian Ene
Handle the FFA_PARTITION_INFO_GET host call inside the pKVM hypervisor and copy the response message back to the host buffers. Signed-off-by: Sebastian Ene <sebastianene@google.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240613132035.1070360-3-sebastianene@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-14KVM: arm64: Trap FFA_VERSION host call in pKVMSebastian Ene
The pKVM hypervisor initializes with FF-A version 1.0. The spec requires that no other FF-A calls to be issued before the version negotiation phase is complete. Split the hypervisor proxy initialization code in two parts so that we can move the later one after the host negotiates its version. Without trapping the call, the host drivers can negotiate a higher version number with TEE which can result in a different memory layout described during the memory sharing calls. Signed-off-by: Sebastian Ene <sebastianene@google.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240613132035.1070360-2-sebastianene@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-14KVM: arm64: nv: Fix RESx behaviour of disabled FGTs with negative polarityMarc Zyngier
The Fine Grained Trap extension is pretty messy as it doesn't consistently use the same polarity for all trap bits. A bunch of them, added later in the life of the architecture, have a *negative* priority. So if these bits are disabled, they must be RES1 and not RES0. But that's not what the code implements, making the traps for these negative trap bits being always on instead of disabled. Fix the relevant bits, and stick a brown paper bag on my head for the rest of the day... Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614125858.78361-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-14KVM: arm64: Add early_param to control WFx trappingColton Lewis
Add an early_params to control WFI and WFE trapping. This is to control the degree guests can wait for interrupts on their own without being trapped by KVM. Options for each param are trap and notrap. trap enables the trap. notrap disables the trap. Note that when enabled, traps are allowed but not guaranteed by the CPU architecture. Absent an explicitly set policy, default to current behavior: disabling the trap if only a single task is running and enabling otherwise. Signed-off-by: Colton Lewis <coltonlewis@google.com> Reviewed-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20240523174056.1565133-1-coltonlewis@google.com [ oliver: rework kvm_vcpu_should_clear_tw*() for readability ] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-11KVM: arm64: FFA: Release hyp rx bufferVincent Donnefort
According to the FF-A spec (Buffer states and ownership), after a producer has written into a buffer, it is "full" and now owned by the consumer. The producer won't be able to use that buffer, until the consumer hands it over with an invocation such as RX_RELEASE. It is clear in the following paragraph (Transfer of buffer ownership), that MEM_RETRIEVE_RESP is transferring the ownership from producer (in our case SPM) to consumer (hypervisor). RX_RELEASE is therefore mandatory here. It is less clear though what is happening with MEM_FRAG_TX. But this invocation, as a response to MEM_FRAG_RX writes into the same hypervisor RX buffer (see paragraph "Transmission of transaction descriptor in fragments"). Also this is matching the TF-A implementation where the RX buffer is marked "full" during a MEM_FRAG_RX. Release the RX hypervisor buffer in those two cases. This will unblock later invocations using this buffer which would otherwise fail. (RETRIEVE_REQ, MEM_FRAG_RX and PARTITION_INFO_GET). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20240611175317.1220842-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-06KVM: arm64: Disassociate vcpus from redistributor region on teardownMarc Zyngier
When tearing down a redistributor region, make sure we don't have any dangling pointer to that region stored in a vcpu. Fixes: e5a35635464b ("kvm: arm64: vgic-v3: Introduce vgic_v3_free_redist_region()") Reported-by: Alexander Potapenko <glider@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240605175637.1635653-1-maz@kernel.org Cc: stable@vger.kernel.org
2024-06-04KVM: arm64: Ensure that SME controls are disabled in protected modeFuad Tabba
KVM (and pKVM) do not support SME guests. Therefore KVM ensures that the host's SME state is flushed and that SME controls for enabling access to ZA storage and for streaming are disabled. pKVM needs to protect against a buggy/malicious host. Ensure that it wouldn't run a guest when protected mode is enabled should any of the SME controls be enabled. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-10-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx formatFuad Tabba
When setting/clearing CPACR bits for EL0 and EL1, use the ELx format of the bits, which covers both. This makes the code clearer, and reduces the chances of accidentally missing a bit. No functional change intended. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-9-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVMFuad Tabba
Now that we have introduced finalize_init_hyp_mode(), lets consolidate the initializing of the host_data fpsimd_state and sve state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240603122852.3923848-8-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Eagerly restore host fpsimd/sve state in pKVMFuad Tabba
When running in protected mode we don't want to leak protected guest state to the host, including whether a guest has used fpsimd/sve. Therefore, eagerly restore the host state on guest exit when running in protected mode, which happens only if the guest has used fpsimd/sve. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVMFuad Tabba
Protected mode needs to maintain (save/restore) the host's sve state, rather than relying on the host kernel to do that. This is to avoid leaking information to the host about guests and the type of operations they are performing. As a first step towards that, allocate memory mapped at hyp, per cpu, for the host sve state. The following patch will use this memory to save/restore the host state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Specialize handling of host fpsimd state on trapFuad Tabba
In subsequent patches, n/vhe will diverge on saving the host fpsimd/sve state when taking a guest fpsimd/sve trap. Add a specialized helper to handle it. No functional change intended. Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>