summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
11 daysKVM: arm64: nv: Fix ATS12 handling of single-stage translationMarc Zyngier
Volodymyr reports that using a Xen DomU as a nested guest (where HCR_EL2.E2H == 0), ATS12 results in a translation that stops at the L2's S1, which isn't something you'd normally expects. Comparing the code against the spec proves to be illuminating, and suggests that the author of such code must have been tired, cross-eyed, drunk, or maybe all of the above. The gist of it is that, apart from HCR_EL2.VM or HCR_EL2.DC being 0, only the use of the EL2&0 translation regime limits the walk to S1 only, and that we must finish the S2 walk in any other case. Which solves the above issue, as E2H==0 indicates that ATS12 walks the EL1&0 translation regime. Explicitly checking for EL2&0 fixes this. Reported-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Fixes: be04cebf3e788 ("KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W}") Link: https://lore.kernel.org/r/20250806141707.3479194-2-volodymyr_babchuk@epam.com Link: https://lore.kernel.org/r/20250809144811.2314038-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
11 daysKVM: arm64: Remove __vcpu_{read,write}_sys_reg_{from,to}_cpu()Marc Zyngier
There is no point having __vcpu_{read,write}_sys_reg_{from,to}_cpu() exposed to the rest of the kernel, as the only callers are in sys_regs.c. Move them where they below, which is another opportunity to simplify things a bit. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250817121926.217900-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
11 daysKVM: arm64: Fix vcpu_{read,write}_sys_reg() accessorsMarc Zyngier
Volodymyr reports (again!) that under some circumstances (E2H==0, walking S1 PTs), PAR_EL1 doesn't report the value of the latest walk in the CPU register, but that instead the value is written to the backing store. Further investigation indicates that the root cause of this is that a group of registers (PAR_EL1, TPIDR*_EL{0,1}, the *32_EL2 dregs) should always be considered as "on CPU", as they are not remapped between EL1 and EL2. We fail to treat them accordingly, and end-up considering that the register (PAR_EL1 in this example) should be written to memory instead of in the register. While it would be possible to quickly work around it, it is obvious that the way we track these things at the moment is pretty horrible, and could do with some improvement. Revamp the whole thing by: - defining a location for a register (memory, cpu), potentially depending on the state of the vcpu - define a transformation for this register (mapped register, potential translation, special register needing some particular attention) - convey this information in a structure that can be easily passed around As a result, the accessors themselves become much simpler, as the state is explicit instead of being driven by hard-to-understand conventions. We get rid of the "pure EL2 register" notion, which wasn't very useful, and add sanitisation of the values by applying the RESx masks as required, something that was missing so far. And of course, we add the missing registers to the list, with the indication that they are always loaded. Reported-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Fixes: fedc612314acf ("KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()") Link: https://lore.kernel.org/r/20250806141707.3479194-3-volodymyr_babchuk@epam.com Link: https://lore.kernel.org/r/20250817121926.217900-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
11 daysKVM: arm64: Simplify sysreg access on exception deliveryMarc Zyngier
Distinguishing between NV and VHE is slightly pointless, and only serves as an extra complication, or a way to introduce bugs, such as the way SPSR_EL1 gets written without checking for the state being resident. Get rid if this silly distinction, and fix the bug in one go. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250817121926.217900-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
11 daysKVM: arm64: Check for SYSREGS_ON_CPU before accessing the 32bit stateMarc Zyngier
Just like c6e35dff58d3 ("KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state") fixed the 64bit state access, add a check for the 32bit state actually being on the CPU before writing it. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250817121926.217900-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-21KVM: arm64: selftests: Sync ID_AA64MMFR3_EL1 in set_id_regsMark Brown
When we added coverage for ID_AA64MMFR3_EL1 we didn't add it to the list of registers we read in the guest, do so. Fixes: 0b593ef12afc ("KVM: arm64: selftests: Catch up set_id_regs with the kernel") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250818-kvm-arm64-selftests-mmfr3-idreg-v1-1-2f85114d0163@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-21KVM: arm64: Get rid of ARM64_FEATURE_MASK()Marc Zyngier
The ARM64_FEATURE_MASK() macro was a hack introduce whilst the automatic generation of sysreg encoding was introduced, and was too unreliable to be entirely trusted. We are in a better place now, and we could really do without this macro. Get rid of it altogether. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250817202158.395078-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-21KVM: arm64: Make ID_AA64PFR1_EL1.RAS_frac writableMarc Zyngier
Allow userspace to write to RAS_frac, under the condition that the host supports RASv1p1 with RAS_frac==1. Other configurations will result in RAS_frac being exposed as 0, and therefore implicitly not writable. To avoid the clutter, the ID_AA64PFR1_EL1 sanitisation is moved to its own function. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20250817202158.395078-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-21KVM: arm64: Make ID_AA64PFR0_EL1.RAS writableMarc Zyngier
Make ID_AA64PFR0_EL1.RAS writable so that we can restore a VM from a system without RAS to a RAS-equipped machine (or disable RAS in the guest). Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20250817202158.395078-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-21KVM: arm64: Ignore HCR_EL2.FIEN set by L1 guest's EL2Marc Zyngier
An EL2 guest can set HCR_EL2.FIEN, which gives access to the RASv1p1 fault injection mechanism. This would allow an EL1 guest to inject error records into the system, which does sound like a terrible idea. Prevent this situation by added FIEN to the list of bits we silently exclude from being inserted into the host configuration. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250817202158.395078-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-21KVM: arm64: Handle RASv1p1 registersMarc Zyngier
FEAT_RASv1p1 system registeres are not handled at all so far. KVM will give an embarassed warning on the console and inject an UNDEF, despite RASv1p1 being exposed to the guest on suitable HW. Handle these registers similarly to FEAT_RAS, with the added fun that there are *two* way to indicate the presence of FEAT_RASv1p1. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20250817202158.395078-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-21arm64: Add capability denoting FEAT_RASv1p1Marc Zyngier
Detecting FEAT_RASv1p1 is rather complicated, as there are two ways for the architecture to advertise the same thing (always a delight...). Add a capability that will advertise this in a synthetic way to the rest of the kernel. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20250817202158.395078-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-21KVM: arm64: Reschedule as needed when destroying the stage-2 page-tablesRaghavendra Rao Ananta
When a large VM, specifically one that holds a significant number of PTEs, gets abruptly destroyed, the following warning is seen during the page-table walk: sched: CPU 0 need_resched set for > 100018840 ns (100 ticks) without schedule CPU: 0 UID: 0 PID: 9617 Comm: kvm_page_table_ Tainted: G O 6.16.0-smp-DEV #3 NONE Tainted: [O]=OOT_MODULE Call trace: show_stack+0x20/0x38 (C) dump_stack_lvl+0x3c/0xb8 dump_stack+0x18/0x30 resched_latency_warn+0x7c/0x88 sched_tick+0x1c4/0x268 update_process_times+0xa8/0xd8 tick_nohz_handler+0xc8/0x168 __hrtimer_run_queues+0x11c/0x338 hrtimer_interrupt+0x104/0x308 arch_timer_handler_phys+0x40/0x58 handle_percpu_devid_irq+0x8c/0x1b0 generic_handle_domain_irq+0x48/0x78 gic_handle_irq+0x1b8/0x408 call_on_irq_stack+0x24/0x30 do_interrupt_handler+0x54/0x78 el1_interrupt+0x44/0x88 el1h_64_irq_handler+0x18/0x28 el1h_64_irq+0x84/0x88 stage2_free_walker+0x30/0xa0 (P) __kvm_pgtable_walk+0x11c/0x258 __kvm_pgtable_walk+0x180/0x258 __kvm_pgtable_walk+0x180/0x258 __kvm_pgtable_walk+0x180/0x258 kvm_pgtable_walk+0xc4/0x140 kvm_pgtable_stage2_destroy+0x5c/0xf0 kvm_free_stage2_pgd+0x6c/0xe8 kvm_uninit_stage2_mmu+0x24/0x48 kvm_arch_flush_shadow_all+0x80/0xa0 kvm_mmu_notifier_release+0x38/0x78 __mmu_notifier_release+0x15c/0x250 exit_mmap+0x68/0x400 __mmput+0x38/0x1c8 mmput+0x30/0x68 exit_mm+0xd4/0x198 do_exit+0x1a4/0xb00 do_group_exit+0x8c/0x120 get_signal+0x6d4/0x778 do_signal+0x90/0x718 do_notify_resume+0x70/0x170 el0_svc+0x74/0xd8 el0t_64_sync_handler+0x60/0xc8 el0t_64_sync+0x1b0/0x1b8 The warning is seen majorly on the host kernels that are configured not to force-preempt, such as CONFIG_PREEMPT_NONE=y. To avoid this, instead of walking the entire page-table in one go, split it into smaller ranges, by checking for cond_resched() between each range. Since the path is executed during VM destruction, after the page-table structure is unlinked from the KVM MMU, relying on cond_resched_rwlock_write() isn't necessary. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250820162242.2624752-3-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-21KVM: arm64: Split kvm_pgtable_stage2_destroy()Raghavendra Rao Ananta
Split kvm_pgtable_stage2_destroy() into two: - kvm_pgtable_stage2_destroy_range(), that performs the page-table walk and free the entries over a range of addresses. - kvm_pgtable_stage2_destroy_pgd(), that frees the PGD. This refactoring enables subsequent patches to free large page-tables in chunks, calling cond_resched() between each chunk, to yield the CPU as necessary. Existing callers of kvm_pgtable_stage2_destroy(), that probably cannot take advantage of this (such as nVMHE), will continue to function as is. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250820162242.2624752-2-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-15KVM: arm64: Correctly populate FAR_EL2 on nested SEA injectionMarc Zyngier
vcpu_write_sys_reg()'s signature is not totally obvious, and it is rather easy to write something that looks correct, except that... Oh wait... Swap addr and FAR_EL2 to restore some sanity in the nested SEA department. Fixes: 9aba641b9ec2a ("KVM: arm64: nv: Respect exception routing rules for SEAs") Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250813163747.2591317-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-08KVM: arm64: selftest: Add standalone test checking for KVM's own UUIDMarc Zyngier
Tinkering with UUIDs is a perilious task, and the KVM UUID gets broken at times. In order to spot this early enough, add a selftest that will shout if the expected value isn't found. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20250721130558.50823-1-jackabt.amazon@gmail.com Link: https://lore.kernel.org/r/20250806171341.1521210-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-08arm64: vgic-v2: Fix guest endianness check in hVHE modeFuad Tabba
In hVHE when running at the hypervisor, SCTLR_EL1 refers to the hypervisor's System Control Register rather than the guest's. Make sure to access the guest's register to determine its endianness. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250807120133.871892-4-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-08KVM: arm64: Sync protected guest VBAR_EL1 on injecting an undef exceptionFuad Tabba
In pKVM, a race condition can occur if a guest updates its VBAR_EL1 register and, before a vCPU exit synchronizes this change, the hypervisor needs to inject an undefined exception into a protected guest. In this scenario, the vCPU still holds the stale VBAR_EL1 value from before the guest's update. When pKVM injects the exception, it ends up using the stale value. Explicitly read the live value of VBAR_EL1 from the guest and update the vCPU value immediately before pending the exception. This ensures the vCPU's value is the same as the guest's and that the exception will be handled at the correct address upon resuming the guest. Reported-by: Keir Fraser <keirf@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250807120133.871892-3-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-08KVM: arm64: Handle AIDR_EL1 and REVIDR_EL1 in host for protected VMsFuad Tabba
Since commit 17efc1acee62 ("arm64: Expose AIDR_EL1 via sysfs"), AIDR_EL1 is read early during boot. Therefore, a guest running as a protected VM will fail to boot because when it attempts to access AIDR_EL1, access to that register is restricted in pKVM for protected guests. Similar to how MIDR_EL1 is handled by the host for protected VMs, let the host handle accesses to AIDR_EL1 as well as REVIDR_EL1. However note that, unlike MIDR_EL1, AIDR_EL1 and REVIDR_EL1 are trapped by HCR_EL2.TID1. Therefore, explicitly mark them as handled by the host for protected VMs. TID1 is always set in pKVM, because it needs to restrict access to SMIDR_EL1, which is also trapped by that bit. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250807120133.871892-2-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-08kvm: arm64: use BUG() instead of BUG_ON(1)Arnd Bergmann
The BUG_ON() macro adds a little bit of complexity over BUG(), and in some cases this ends up confusing the compiler's control flow analysis in a way that results in a warning. This one now shows up with clang-21: arch/arm64/kvm/vgic/vgic-mmio.c:1094:3: error: variable 'len' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] 1094 | BUG_ON(1); Change both instances of BUG_ON(1) to a plain BUG() in the arm64 kvm code, to avoid the false-positive warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250807072132.4170088-1-arnd@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-04KVM: arm64: nv: Handle SEAs due to VNCR redirectionOliver Upton
System register accesses redirected to the VNCR page can also generate external aborts just like any other form of memory access. Route to kvm_handle_guest_sea() for potential APEI handling, falling back to a vSError if the kernel didn't handle the abort. Take the opportunity to throw out the useless kvm_ras.h which provided a helper with a single callsite... Cc: Jiaqi Yan <jiaqiyan@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250729182342.3281742-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-04KVM: arm64: nv: Properly check ESR_EL2.VNCR on taking a VNCR_EL2 related faultMarc Zyngier
Instead of checking for the ESR_EL2.VNCR bit being set (the only case we should be here), we are actually testing random bits in ESR_EL2.DFSC. 13 obviously being a lucky number, it matches both permission and translation fault status codes, which explains why we never saw it failing. This was found by inspection, while reviewing a vaguely related patch. Whilst we're at it, turn the BUG_ON() into a WARN_ON_ONCE(), as exploding here is just silly. Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults") Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250730101828.1168707-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-29KVM: arm64: Don't attempt vLPI mappings when vPE allocation is disabledRaghavendra Rao Ananta
commit c652887a9288 ("KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap") makes the allocation of vPEs depend on nASSGIcap for GICv4.1 hosts. While the vGIC v4 initialization and teardown is handled correctly, it erroneously attempts to establish a vLPI mapping to a VM that has no vPEs allocated: Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8 Mem abort info: ESR = 0x0000000096000044 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000073a453b000 [00000000000000a8] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000044 [#1] SMP pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : its_irq_set_vcpu_affinity+0x58c/0x95c lr : its_irq_set_vcpu_affinity+0x1e0/0x95c sp : ffff8001029bb9e0 pmr_save: 00000060 x29: ffff8001029bba20 x28: ffff0001ca5e28c0 x27: 0000000000000000 x26: 0000000000000000 x25: ffff00019eee9f80 x24: ffff0001992b3f00 x23: ffff8001029bbab8 x22: ffff00001159fb80 x21: 00000000000024a7 x20: 00000000000024a7 x19: ffff00019eee9fb4 x18: 0000000000000494 x17: 000000000000000e x16: 0000000000000494 x15: 0000000000000002 x14: ffff0001a7f34600 x13: ffffccaad1203000 x12: 0000000000000018 x11: ffff000011991000 x10: 0000000000000000 x9 : 00000000000000a2 x8 : 00000000000020a8 x7 : 0000000000000000 x6 : 000000000000003f x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004 x2 : 0000000000000000 x1 : ffff8001029bbab8 x0 : 00000000000000a8 Call trace: its_irq_set_vcpu_affinity+0x58c/0x95c irq_set_vcpu_affinity+0x74/0xc8 its_map_vlpi+0x4c/0x94 kvm_vgic_v4_set_forwarding+0x134/0x298 kvm_arch_irq_bypass_add_producer+0x28/0x34 irq_bypass_register_producer+0xf8/0x1d8 vfio_msi_set_vector_signal+0x2c8/0x308 vfio_pci_set_msi_trigger+0x198/0x2d4 vfio_pci_set_irqs_ioctl+0xf0/0x104 vfio_pci_core_ioctl+0x6ac/0xc5c vfio_device_fops_unl_ioctl+0x128/0x370 __arm64_sys_ioctl+0x98/0xd0 el0_svc_common+0xd8/0x1d8 do_el0_svc+0x28/0x34 el0_svc+0x40/0xb8 el0t_64_sync_handler+0x70/0xbc el0t_64_sync+0x1a8/0x1ac Code: 321f0129 f940094a 8b080148 d1400900 (39000009) ---[ end trace 0000000000000000 ]--- Fix it by moving the GICv4.1 special-casing to vgic_supports_direct_msis(), returning false if the user explicitly disabled nASSGIcap for the VM. Fixes: c652887a9288 ("KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap") Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20250729210644.830364-1-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-listOliver Upton
VDISR_EL2 and VSESR_EL2 are now visible to userspace for nested VMs. Add them to get-reg-list. Link: https://lore.kernel.org/r/20250728152603.2823699-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28Merge branch 'kvm-arm64/vgic-v4-ctl' into kvmarm/nextOliver Upton
* kvm-arm64/vgic-v4-ctl: : Userspace control of nASSGIcap, courtesy of Raghavendra Rao Ananta : : Allow userspace to decide if support for SGIs without an active state is : advertised to the guest, allowing VMs from GICv3-only hardware to be : migrated to to GICv4.1 capable machines. Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init KVM: arm64: selftests: Add test for nASSGIcap attribute KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling KVM: arm64: Disambiguate support for vSGIs v. vLPIs Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28Merge branch 'kvm-arm64/el2-reg-visibility' into kvmarm/nextOliver Upton
* kvm-arm64/el2-reg-visibility: : Fixes to EL2 register visibility, courtesy of Marc Zyngier : : - Expose EL2 VGICv3 registers via the VGIC attributes accessor, not the : KVM_{GET,SET}_ONE_REG ioctls : : - Condition visibility of FGT registers on the presence of FEAT_FGT in : the VM KVM: arm64: selftest: vgic-v3: Add basic GICv3 sysreg userspace access test KVM: arm64: Enforce the sorting of the GICv3 system register table KVM: arm64: Clarify the check for reset callback in check_sysreg_table() KVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2 KVM: arm64: Document registers exposed via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS KVM: arm64: selftests: get-reg-list: Add base EL2 registers KVM: arm64: selftests: get-reg-list: Simplify feature dependency KVM: arm64: Advertise FGT2 registers to userspace KVM: arm64: Condition FGT registers on feature availability KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS KVM: arm64: Let GICv3 save/restore honor visibility attribute KVM: arm64: Define helper for ICH_VTR_EL2 KVM: arm64: Define constant value for ICC_SRE_EL2 KVM: arm64: Don't advertise ICH_*_EL2 registers through GET_ONE_REG KVM: arm64: Make RVBAR_EL2 accesses UNDEF Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28Merge branch 'kvm-arm64/config-masks' into kvmarm/nextOliver Upton
* kvm-arm64/config-masks: : More config-driven mask computation, courtesy of Marc Zyngier : : Converts more system registers to the config-driven computation of RESx : masks based on the advertised feature set KVM: arm64: Tighten the definition of FEAT_PMUv3p9 KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation arm64: sysreg: Add THE/ASID2 controls to TCR2_ELx Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26Merge branch 'kvm-arm64/misc' into kvmarm/nextOliver Upton
* kvm-arm64/misc: : Miscellaneous fixes/cleanups for KVM/arm64 : : - Fixes for computing POE output permissions : : - Return ENXIO for invalid VGIC device attribute : : - String helper conversions arm64: kvm: trace_handle_exit: use string choices helper arm64: kvm: sys_regs: use string choices helper KVM: arm64: Follow specification when implementing WXN KVM: arm64: Remove the wi->{e0,}poe vs wr->{p,u}ov confusion KVM: arm64: vgic-its: Return -ENXIO to invalid KVM_DEV_ARM_VGIC_GRP_CTRL attrs Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26Merge branch 'kvm-arm64/gcie-legacy' into kvmarm/nextOliver Upton
* kvm-arm64/gcie-legacy: : Support for GICv3 emulation on GICv5, courtesy of Sascha Bischoff : : FEAT_GCIE_LEGACY adds the necessary hardware for GICv5 systems to : support the legacy GICv3 for VMs, including a backwards-compatible VGIC : implementation that we all know and love. : : As a starting point for GICv5 enablement in KVM, enable + use the : GICv3-compatible feature when running VMs on GICv5 hardware. KVM: arm64: gic-v5: Probe for GICv5 KVM: arm64: gic-v5: Support GICv3 compat arm64/sysreg: Add ICH_VCTLR_EL2 irqchip/gic-v5: Populate struct gic_kvm_info irqchip/gic-v5: Skip deactivate for forwarded PPI interrupts Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26Merge tag 'irqchip-gic-v5-host' into kvmarm/nextOliver Upton
GICv5 initial host support Add host kernel support for the new arm64 GICv5 architecture, which is quite a departure from the previous ones. Include support for the full gamut of the architecture (interrupt routing and delivery to CPUs, wired interrupts, MSIs, and interrupt translation). * tag 'irqchip-gic-v5-host': (32 commits) arm64: smp: Fix pNMI setup after GICv5 rework arm64: Kconfig: Enable GICv5 docs: arm64: gic-v5: Document booting requirements for GICv5 irqchip/gic-v5: Add GICv5 IWB support irqchip/gic-v5: Add GICv5 ITS support irqchip/msi-lib: Add IRQ_DOMAIN_FLAG_FWNODE_PARENT handling irqchip/gic-v3: Rename GICv3 ITS MSI parent PCI/MSI: Add pci_msi_map_rid_ctlr_node() helper function of/irq: Add of_msi_xlate() helper function irqchip/gic-v5: Enable GICv5 SMP booting irqchip/gic-v5: Add GICv5 LPI/IPI support irqchip/gic-v5: Add GICv5 IRS/SPI support irqchip/gic-v5: Add GICv5 PPI support arm64: Add support for GICv5 GSB barriers arm64: smp: Support non-SGIs for IPIs arm64: cpucaps: Add GICv5 CPU interface (GCIE) capability arm64: cpucaps: Rename GICv3 CPU interface capability arm64: Disable GICv5 read/write/instruction traps arm64/sysreg: Add ICH_HFGITR_EL2 arm64/sysreg: Add ICH_HFGWTR_EL2 ... Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26Merge branch 'kvm-arm64/doublefault2' into kvmarm/nextOliver Upton
* kvm-arm64/doublefault2: (33 commits) : NV Support for FEAT_RAS + DoubleFault2 : : Delegate the vSError context to the guest hypervisor when in a nested : state, including registers related to ESR propagation. Additionally, : catch up KVM's external abort infrastructure to the architecture, : implementing the effects of FEAT_DoubleFault2. : : This has some impact on non-nested guests, as SErrors deemed unmasked at : the time they're made pending are now immediately injected with an : emulated exception entry rather than using the VSE bit. KVM: arm64: Make RAS registers UNDEF when RAS isn't advertised KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately KVM: arm64: selftests: Test ESR propagation for vSError injection KVM: arm64: Populate ESR_ELx.EC for emulated SError injection KVM: arm64: selftests: Catch up set_id_regs with the kernel KVM: arm64: selftests: Add SCTLR2_EL1 to get-reg-list KVM: arm64: selftests: Test SEAs are taken to SError vector when EASE=1 KVM: arm64: selftests: Add basic SError injection test KVM: arm64: Don't retire MMIO instruction w/ pending (emulated) SError KVM: arm64: Advertise support for FEAT_DoubleFault2 KVM: arm64: Advertise support for FEAT_SCTLR2 KVM: arm64: nv: Enable vSErrors when HCRX_EL2.TMEA is set KVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set KVM: arm64: Route SEAs to the SError vector when EASE is set KVM: arm64: nv: Ensure Address size faults affect correct ESR KVM: arm64: Factor out helper for selecting exception target EL KVM: arm64: Describe SCTLR2_ELx RESx masks ... Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26Merge branch 'kvm-arm64/cacheable-pfnmap' into kvmarm/nextOliver Upton
* kvm-arm64/cacheable-pfnmap: : Cacheable PFNMAP support at stage-2, courtesy of Ankit Agrawal : : For historical reasons, KVM only allows cacheable mappings at stage-2 : when a kernel alias exists in the direct map for the memory region. On : hardware without FEAT_S2FWB, this is necessary as KVM must do cache : maintenance to keep guest/host accesses coherent. : : This is unnecessarily restrictive on systems with FEAT_S2FWB and : CTR_EL0.DIC, as KVM no longer needs to perform cache maintenance to : maintain correctness. : : Allow cacheable mappings at stage-2 on supporting hardware when the : corresponding VMA has cacheable memory attributes and advertise a : capability to userspace such that a VMM can determine if a stage-2 : mapping can be established (e.g. VFIO device). KVM: arm64: Expose new KVM cap for cacheable PFNMAP KVM: arm64: Allow cacheable stage 2 mapping using VMA flags KVM: arm64: Block cacheable PFNMAP mapping KVM: arm64: Assume non-PFNMAP/MIXEDMAP VMAs can be mapped cacheable KVM: arm64: Rename the device variable to s2_force_noncacheable Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26Documentation: KVM: arm64: Describe VGICv3 registers writable pre-initOliver Upton
KVM allows userspace to control GICD_IIDR.Revision and GICD_TYPER2.nASSGIcap prior to initialization for the sake of provisioning the guest-visible feature set. Document the userspace expectations surrounding accesses to these registers. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250724062805.2658919-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26KVM: arm64: selftests: Add test for nASSGIcap attributeRaghavendra Rao Ananta
Extend vgic_init to test the nASSGIcap attribute, asserting that it is configurable (within reason) prior to initializing the VGIC. Additionally, check that userspace cannot set the attribute after the VGIC has been initialized. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250724062805.2658919-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcapRaghavendra Rao Ananta
KVM unconditionally advertises GICD_TYPER2.nASSGIcap (which internally implies vSGIs) on GICv4.1 systems. Allow userspace to change whether a VM supports the feature. Only allow changes prior to VGIC initialization as at that point vPEs need to be allocated for the VM. For convenience, bundle support for vLPIs and vSGIs behind this feature, allowing userspace to control vPE allocation for VMs in environments that may be constrained on vPE IDs. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250724062805.2658919-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initializationOliver Upton
KVM allows userspace to write GICD_IIDR for backwards-compatibility with older kernels, where new implementation revisions have new features. Unfortunately this is allowed to happen at runtime, and ripping features out from underneath a running guest is a terrible idea. While we can't do anything about the ABI, prepare for more ID-like registers by allowing access to GICD_IIDR prior to VGIC initialization. Hoist initializaiton of the default value to kvm_vgic_create() and discard the incorrect comment that assumed userspace could access the register before initialization (until now). Subsequent changes will allow the VMM to further provision the GIC feature set, e.g. the presence of nASSGIcap. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250724062805.2658919-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handlingOliver Upton
Consolidate the duplicated handling of the VGICv3 maintenance IRQ attribute as a regular GICv3 attribute, as it is neither a register nor a common attribute. As this is now handled separately from the VGIC registers, the locking is relaxed to only acquire the intended config_lock. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250724062805.2658919-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26KVM: arm64: Disambiguate support for vSGIs v. vLPIsOliver Upton
vgic_supports_direct_msis() is a bit of a misnomer, as it returns true if either vSGIs or vLPIs are supported. Pick it apart into a few predicates and replace some open-coded checks for vSGIs, including an opportunistic fix to always check if the CPUIF is capable of handling vSGIs. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250724062805.2658919-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26KVM: arm64: selftest: vgic-v3: Add basic GICv3 sysreg userspace access testMarc Zyngier
We have a lot of more or less useful vgic tests, but none of them tracks the availability of GICv3 system registers, which is a bit annoying. Add one such test, which covers both EL1 and EL2 registers. Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com> Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20250718111154.104029-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26KVM: arm64: Enforce the sorting of the GICv3 system register tableMarc Zyngier
In order to avoid further embarassing bugs, enforce that the GICv3 sysreg table is actually sorted, just like all the other tables. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20250718111154.104029-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26KVM: arm64: Clarify the check for reset callback in check_sysreg_table()Marc Zyngier
check_sysreg_table() has a wonky 'is_32" parameter, which is really an indication that we should enforce the presence of a reset helper. Clean this up by naming the variable accordingly and inverting the condition. Contrary to popular belief, system instructions don't have a reset value (duh!), and therefore do not need to be checked for reset (they escaped the check through luck...). Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20250718111154.104029-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26KVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2Marc Zyngier
The sysreg tables are supposed to be sorted so that a binary search can easily find them. However, ICH_HCR_EL2 is obviously at the wrong spot. Move it where it belongs. Fixes: 9fe9663e47e21 ("KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS") Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20250718111154.104029-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-23arm64: kvm: trace_handle_exit: use string choices helperKuninori Morimoto
We can use string choices helper, let's use it. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://lore.kernel.org/r/87o6ti5ksx.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-23arm64: kvm: sys_regs: use string choices helperKuninori Morimoto
We can use string choices helper, let's use it. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://lore.kernel.org/r/87pldy5ktb.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-23KVM: arm64: Follow specification when implementing WXNMarc Zyngier
The R_QXXPC and R_NPBXC rules have some interesting (and pretty sharp) corners when defining the behaviour of of WXN at S1: - when S1 overlay is enabled, WXN applies to the overlay and will remove W - when S1 overlay is disabled, WXN applies to the base permissions and will remove X. Today, we lumb the two together in a way that doesn't really match the rules, making things awkward to follow what is happening, in particular when overlays are enabled. Split these two rules over two distinct paths, which makes things a lot easier to read and validate against the architecture rules. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250701151648.754785-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-23KVM: arm64: Remove the wi->{e0,}poe vs wr->{p,u}ov confusionMarc Zyngier
Some of the POE computation is a bit confused. Specifically, there is an element of confusion between what wi->{e0,}poe an wr->{p,u}ov actually represent. - wi->{e0,}poe is an *input* to the walk, and indicates whether POE is enabled at EL0 or EL{1,2} - wr->{p,u}ov is a *result* of the walk, and indicates whether overlays are enabled. Crutially, it is possible to have POE enabled, and yet overlays disabled, while the converse isn't true What this all means is that once the base permissions have been established, checking for wi->{e0,}poe makes little sense, because the truth about overlays resides in wr->{p,u}ov. So constructs checking for (wi->poe && wr->pov) only add perplexity. Refactor compute_s1_overlay_permissions() and the way it is called according to the above principles. Take the opportunity to avoid reading registers that are not strictly required. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250701151648.754785-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-23KVM: arm64: vgic-its: Return -ENXIO to invalid KVM_DEV_ARM_VGIC_GRP_CTRL attrsDavid Woodhouse
A preliminary version of a hack to invoke unmap_all_vpes() from an ioctl didn't work very well. We eventually determined this was because we were invoking it on the wrong file descriptor, but not getting an error. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/bbbddd56135399baf699bc46ffb6e7f08d9f8c9f.camel@infradead.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-21KVM: arm64: Make RAS registers UNDEF when RAS isn't advertisedMarc Zyngier
We currently always expose FEAT_RAS when available on the host. As we are about to make this feature selectable from userspace, check for it being present before emulating register accesses as RAZ/WI, and inject an UNDEF otherwise. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250721101955.535159-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-21KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor contextMarc Zyngier
Most HCR_EL2 bits are not supposed to affect EL2 at all, but only the guest. However, we gladly merge these bits with the host's HCR_EL2 configuration, irrespective of entering L1 or L2. This leads to some funky behaviour, such as L1 trying to inject a virtual SError for L2, and getting a taste of its own medecine. Not quite what the architecture anticipated. In the end, the only bits that matter are those we have defined as invariants, either because we've made them RESx (E2H, HCD...), or that we actively refuse to merge because the mess with KVM's own logic. Use the sanitisation infrastructure to get the RES1 bits, and let things rip in a safer way. Fixes: 04ab519bb86df ("KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250721101955.535159-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-21KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU stateMarc Zyngier
Mark Brown reports that since we commit to making exceptions visible without the vcpu being loaded, the external abort selftest fails. Upon investigation, it turns out that the code that makes registers affected by an exception visible to the guest is completely broken on VHE, as we don't check whether the system registers are loaded on the CPU at this point. We managed to get away with this so far, but that's obviously as bad as it gets, Add the required checksm and document the absolute need to check for the SYSREGS_ON_CPU flag before calling into any of the __vcpu_write_sys_reg_to_cpu()__vcpu_read_sys_reg_from_cpu() helpers. Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/18535df8-e647-4643-af9a-bb780af03a70@sirena.org.uk Link: https://lore.kernel.org/r/20250720102229.179114-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>