summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm/nested.c
AgeCommit message (Collapse)Author
2025-02-04KVM: arm64: Fix nested S2 MMU structures reallocationMarc Zyngier
For each vcpu that userspace creates, we allocate a number of s2_mmu structures that will eventually contain our shadow S2 page tables. Since this is a dynamically allocated array, we reallocate the array and initialise the newly allocated elements. Once everything is correctly initialised, we adjust pointer and size in the kvm structure, and move on. But should that initialisation fail *and* the reallocation triggered a copy to another location, we end-up returning early, with the kvm structure still containing the (now stale) old pointer. Weeee! Cure it by assigning the pointer early, and use this to perform the initialisation. If everything succeeds, we adjust the size. Otherwise, we just leave the size as it was, no harm done, and the new memory is as good as the ol' one (we hope...). Fixes: 4f128f8e1aaac ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/r/20250204145554.774427-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-17Merge branch kvm-arm64/misc-6.14 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-6.14: : . : Misc KVM/arm64 changes for 6.14 : : - Don't expose AArch32 EL0 capability when NV is enabled : : - Update documentation to reflect the full gamut of kvm-arm.mode : behaviours : : - Use the hypervisor VA bit width when dumping stacktraces : : - Decouple the hypervisor stack size from PAGE_SIZE, at least : on the surface... : : - Make use of str_enabled_disabled() when advertising GICv4.1 support : : - Explicitly handle BRBE traps as UNDEFINED : . KVM: arm64: Explicitly handle BRBE traps as UNDEFINED KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe() arm64: kvm: Introduce nvhe stack size constants KVM: arm64: Fix nVHE stacktrace VA bits mask Documentation: Update the behaviour of "kvm-arm.mode" KVM: arm64: nv: Advertise the lack of AArch32 EL0 support Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-17Merge branch kvm-arm64/nv-resx-fixes-6.14 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-resx-fixes-6.14: : . : Fixes for NV sysreg accessors. From the cover letter: : : "Joey recently reported that some rather basic tests were failing on : NV, and managed to track it down to critical register fields (such as : HCR_EL2.E2H) not having their expect value. : : Further investigation has outlined a couple of critical issues: : : - Evaluating HCR_EL2.E2H must always be done with a sanitising : accessor, no ifs, no buts. Given that KVM assumes a fixed value for : this bit, we cannot leave it to the guest to mess with. : : - Resetting the sysreg file must result in the RESx bits taking : effect. Otherwise, we may end-up making the wrong decision (see : above), and we definitely expose invalid values to the guest. Note : that because we compute the RESx masks very late in the VM setup, we : need to apply these masks at that particular point as well. : [...]" : . KVM: arm64: nv: Apply RESx settings to sysreg reset values KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors Signed-off-by: Marc Zyngier <maz@kernel.org> # Conflicts: # arch/arm64/kvm/nested.c
2025-01-17Merge branch kvm-arm64/nv-timers into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-timers: : . : Nested Virt support for the EL2 timers. From the initial cover letter: : : "Here's another batch of NV-related patches, this time bringing in most : of the timer support for EL2 as well as nested guests. : : The code is pretty convoluted for a bunch of reasons: : : - FEAT_NV2 breaks the timer semantics by redirecting HW controls to : memory, meaning that a guest could setup a timer and never see it : firing until the next exit : : - We go try hard to reflect the timer state in memory, but that's not : great. : : - With FEAT_ECV, we can finally correctly emulate the virtual timer, : but this emulation is pretty costly : : - As a way to make things suck less, we handle timer reads as early as : possible, and only defer writes to the normal trap handling : : - Finally, some implementations are badly broken, and require some : hand-holding, irrespective of NV support. So we try and reuse the NV : infrastructure to make them usable. This could be further optimised, : but I'm running out of patience for this sort of HW. : : [...]" : . KVM: arm64: nv: Fix doc header layout for timers KVM: arm64: nv: Document EL2 timer API KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosity KVM: arm64: nv: Sanitise CNTHCTL_EL2 KVM: arm64: nv: Propagate CNTHCTL_EL2.EL1NV{P,V}CT bits KVM: arm64: nv: Add trap routing for CNTHCTL_EL2.EL1{NVPCT,NVVCT,TVT,TVCT} KVM: arm64: Handle counter access early in non-HYP context KVM: arm64: nv: Accelerate EL0 counter accesses from hypervisor context KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV in use KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state KVM: arm64: nv: Sync nested timer state with FEAT_NV2 KVM: arm64: nv: Add handling of EL2-specific timer registers Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-14KVM: arm64: nv: Apply RESx settings to sysreg reset valuesMarc Zyngier
While we have sanitisation in place for the guest sysregs, we lack that sanitisation out of reset. So some of the fields could be evaluated and not reflect their RESx status, which sounds like a very bad idea. Apply the RESx masks to the the sysreg file in two situations: - when going via a reset of the sysregs - after having computed the RESx masks Having this separate reset phase from the actual reset handling is a bit grotty, but we need to apply this after the ID registers are final. Tested-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250112165029.1181056-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: nv: Sanitise CNTHCTL_EL2Marc Zyngier
Inject some sanity in CNTHCTL_EL2, ensuring that we don't handle more than we advertise to the guest. Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-11-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-22KVM: arm64: nv: Advertise the lack of AArch32 EL0 supportMarc Zyngier
Although we never supported 32bit anywhere in NV, we fail to advertise so for EL0, probably owing to the relative lack of hardware supporting both NV2 and 32bit EL0. Add some sanitising to ID_AA64PFR0_EL1.EL0, and reaffirm that "in 64bit-only we trust". Reported-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20KVM: arm64: Use kvm_vcpu_has_feature() directly for struct kvmFuad Tabba
Now that we have introduced kvm_vcpu_has_feature(), use it in the remaining code that checks for features in struct kvm, instead of using the __vcpu_has_feature() helper. No functional change intended. Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-18-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-11-20KVM: arm64: Mark set_sysreg_masks() as inline to avoid build failureMarc Zyngier
When compiling with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, set_sysreg_masks() fails to compile thanks to: BUILD_BUG_ON(!__builtin_constant_p(sr)); as the compiler doesn't identify sr as a constant, despite all the callers passing constants. Fix the issue by always inlining this function, which allows GCC to do the right thing. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202411201857.ZNudtGJl-lkp@intel.com/ Fixes: a0162020095e2 ("KVM: arm64: Extend masking facility to arbitrary registers") Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241120111516.304250-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11Merge branch kvm-arm64/nv-pmu into kvmarm/nextOliver Upton
* kvm-arm64/nv-pmu: : Support for vEL2 PMU controls : : Align the vEL2 PMU support with the current state of non-nested KVM, : including: : : - Trap routing, with the annoying complication of EL2 traps that apply : in Host EL0 : : - PMU emulation, using the correct configuration bits depending on : whether a counter falls in the hypervisor or guest range of PMCs : : - Perf event swizzling across nested boundaries, as the event filtering : needs to be remapped to cope with vEL2 KVM: arm64: nv: Reprogram PMU events affected by nested transition KVM: arm64: nv: Apply EL2 event filtering when in hyp context KVM: arm64: nv: Honor MDCR_EL2.HLP KVM: arm64: nv: Honor MDCR_EL2.HPME KVM: arm64: Add helpers to determine if PMC counts at a given EL KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN KVM: arm64: Rename kvm_pmu_valid_counter_mask() KVM: arm64: nv: Advertise support for FEAT_HPMN0 KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0 KVM: arm64: nv: Reinject traps that take effect in Host EL0 KVM: arm64: nv: Rename BEHAVE_FORWARD_ANY KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2 arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1 arm64: sysreg: Migrate MDCR_EL2 definition to table arm64: sysreg: Describe ID_AA64DFR2_EL1 fields Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Advertise support for FEAT_HPMN0Oliver Upton
Everything is in place now for KVM to actually handle MDCR_EL2.HPMN. Not only that, the emulation is capable of doing FEAT_HPMN0. Advertise support for the feature in the VM's ID registers. It is possible to emulate FEAT_HPMN0 on hardware that doesn't support it since KVM currently traps all PMU registers. Having said that, let's only advertise the feature on supporting hardware in case KVM ever provides 'direct' PMU support to VMs w/o involving host perf. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-12-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2Oliver Upton
Add support for sanitising MDCR_EL2 and describe the RES0/RES1 bits according to the feature set exposed to the VM. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add kvm_has_s1poe() helperMarc Zyngier
Just like we have kvm_has_s1pie(), add its S1POE counterpart, making the code slightly more readable. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-31-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Hide S1PIE registers from userspace when disabled for guestsMark Brown
When the guest does not support S1PIE we should not allow any access to the system registers it adds in order to ensure that we do not create spurious issues with guest migration. Add a visibility operation for these registers. Fixes: 86f9de9db178 ("KVM: arm64: Save/restore PIE registers") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240822-kvm-arm64-hide-pie-regs-v2-3-376624fa829c@kernel.org [maz: simplify by using __el2_visibility(), kvm_has_s1pie() throughout] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-26-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Hide TCR2_EL1 from userspace when disabled for guestsMark Brown
When the guest does not support FEAT_TCR2 we should not allow any access to it in order to ensure that we do not create spurious issues with guest migration. Add a visibility operation for it. Fixes: fbff56068232 ("KVM: arm64: Save/restore TCR2_EL1") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240822-kvm-arm64-hide-pie-regs-v2-2-376624fa829c@kernel.org [maz: simplify by using __el2_visibility(), kvm_has_tcr2() throughout] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-25-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Extend masking facility to arbitrary registersMarc Zyngier
We currently only use the masking (RES0/RES1) facility for VNCR registers, as they are memory-based and thus easy to sanitise. But we could apply the same thing to other registers if we: - split the sanitisation from __VNCR_START__ - apply the sanitisation when reading from a HW register This involves a new "marker" in the vcpu_sysreg enum, which defines the point at which the sanitisation applies (the VNCR registers being of course after this marker). Whle we are at it, rename kvm_vcpu_sanitise_vncr_reg() to kvm_vcpu_apply_reg_masks(), which is vaguely more explicit, and harden set_sysreg_masks() against setting masks for random registers... Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Sanitise TCR2_EL2Marc Zyngier
TCR2_EL2 is a bag of control bits, all of which are only valid if certain features are present, and RES0 otherwise. Describe these constraints and register them with the masking infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-13-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-08KVM: arm64: nv: Punt stage-2 recycling to a vCPU requestOliver Upton
Currently, when a nested MMU is repurposed for some other MMU context, KVM unmaps everything during vcpu_load() while holding the MMU lock for write. This is quite a performance bottleneck for large nested VMs, as all vCPU scheduling will spin until the unmap completes. Start punting the MMU cleanup to a vCPU request, where it is then possible to periodically release the MMU lock and CPU in the presence of contention. Ensure that no vCPU winds up using a stale MMU by tracking the pending unmap on the S2 MMU itself and requesting an unmap on every vCPU that finds it. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-4-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08KVM: arm64: nv: Do not block when unmapping stage-2 if disallowedOliver Upton
Right now the nested code allows unmap operations on a shadow stage-2 to block unconditionally. This is wrong in a couple places, such as a non-blocking MMU notifier or on the back of a sched_in() notifier as part of shadow MMU recycling. Carry through whether or not blocking is allowed to kvm_pgtable_stage2_unmap(). This 'fixes' an issue where stage-2 MMU reclaim would precipitate a stack overflow from a pile of kvm_sched_in() callbacks, all trying to recycle a stage-2 MMU. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08KVM: arm64: nv: Keep reference on stage-2 MMU when scheduled outOliver Upton
If a vCPU is scheduling out and not in WFI emulation, it is highly likely it will get scheduled again soon and reuse the MMU it had before. Dropping the MMU at vcpu_put() can have some unfortunate consequences, as the MMU could get reclaimed and used in a different context, forcing another 'cold start' on an otherwise active MMU. Avoid that altogether by keeping a reference on the MMU if the vCPU is scheduling out, ensuring that another vCPU cannot reclaim it while the current vCPU is away. Since there are more MMUs than vCPUs, this does not affect the guarantee that an unused MMU is available at any time. Furthermore, this makes the vcpu->arch.hw_mmu ~stable in preemptible code, at least for where it matters in the stage-2 abort path. Yes, the MMU can change across WFI emulation, but there isn't even a use case where this would matter. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-21Merge tag 'mm-stable-2024-09-20-02-31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Along with the usual shower of singleton patches, notable patch series in this pull request are: - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification. - "Some cleanups for shmem" from Baolin Wang. No functional changes - mode code reuse, better function naming, logic simplifications. - "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only. - "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup. - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage. - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature, it adds new feilds to /proc/vmstat such as $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but partivularly useful for "the dynamic kernel stack project". - "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory. - "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters". - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident. - "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded. - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector. - "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalizaion and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness. - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance. - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo. - "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page(). - "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown. - "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature, - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have notied yet. - "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code. - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code. - "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated. - "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation. - "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code. - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes. - "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into simgle-page folios when swapping out shmem. - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios. - "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios. - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal(). - "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type! - "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed. - "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store. - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operqtion is occurring during an unrelated vma tree walk. - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making ot cleaner, more testable and better tested. - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests. - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions. - "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats. - "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning. - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs. - "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization. - "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates. - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags. - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy. This was overprovisioning THPs in sparsely accessed memory areas. - "zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning. - "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas. - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups. - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support. - "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory. - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memry. - "mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios" * tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits) zram: free secondary algorithms names uprobes: turn xol_area->pages[2] into xol_area->page uprobes: introduce the global struct vm_special_mapping xol_mapping Revert "uprobes: use vm_special_mapping close() functionality" mm: support large folios swap-in for sync io devices mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios mm: fix swap_read_folio_zeromap() for large folios with partial zeromap mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries set_memory: add __must_check to generic stubs mm/vma: return the exact errno in vms_gather_munmap_vmas() memcg: cleanup with !CONFIG_MEMCG_V1 mm/show_mem.c: report alloc tags in human readable units mm: support poison recovery from copy_present_page() mm: support poison recovery from do_cow_fault() resource, kunit: add test case for region_intersects() resource: make alloc_free_mem_region() works for iomem_resource mm: z3fold: deprecate CONFIG_Z3FOLD vfio/pci: implement huge_fault support mm/arm64: support large pfn mappings mm/x86: support large pfn mappings ...
2024-09-12Merge branch kvm-arm64/nv-at-pan into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-at-pan: : . : Add NV support for the AT family of instructions, which mostly results : in adding a page table walker that deals with most of the complexity : of the architecture. : : From the cover letter: : : "Another task that a hypervisor supporting NV on arm64 has to deal with : is to emulate the AT instruction, because we multiplex all the S1 : translations on a single set of registers, and the guest S2 is never : truly resident on the CPU. : : So given that we lie about page tables, we also have to lie about : translation instructions, hence the emulation. Things are made : complicated by the fact that guest S1 page tables can be swapped out, : and that our shadow S2 is likely to be incomplete. So while using AT : to emulate AT is tempting (and useful), it is not going to always : work, and we thus need a fallback in the shape of a SW S1 walker." : . KVM: arm64: nv: Add support for FEAT_ATS1A KVM: arm64: nv: Plumb handling of AT S1* traps from EL2 KVM: arm64: nv: Make AT+PAN instructions aware of FEAT_PAN3 KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configuration KVM: arm64: nv: Add SW walker for AT S1 emulation KVM: arm64: nv: Make ps_to_output_size() generally available KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W} KVM: arm64: nv: Add basic emulation of AT S1E2{R,W} KVM: arm64: nv: Add basic emulation of AT S1E1{R,W}P KVM: arm64: nv: Add basic emulation of AT S1E{0,1}{R,W} KVM: arm64: nv: Honor absence of FEAT_PAN2 KVM: arm64: nv: Turn upper_attr for S2 walk into the full descriptor KVM: arm64: nv: Enforce S2 alignment when contiguous bit is set arm64: Add ESR_ELx_FSC_ADDRSZ_L() helper arm64: Add system register encoding for PSTATE.PAN arm64: Add PAR_EL1 field description arm64: Add missing APTable and TCR_ELx.HPD masks KVM: arm64: Make kvm_at() take an OP_AT_* Signed-off-by: Marc Zyngier <maz@kernel.org> # Conflicts: # arch/arm64/kvm/nested.c
2024-09-01mm: kvmalloc: align kvrealloc() with krealloc()Danilo Krummrich
Besides the obvious (and desired) difference between krealloc() and kvrealloc(), there is some inconsistency in their function signatures and behavior: - krealloc() frees the memory when the requested size is zero, whereas kvrealloc() simply returns a pointer to the existing allocation. - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas kvrealloc() does not accept a NULL pointer at all and, if passed, would fault instead. - krealloc() is self-contained, whereas kvrealloc() relies on the caller to provide the size of the previous allocation. Inconsistent behavior throughout allocation APIs is error prone, hence make kvrealloc() behave like krealloc(), which seems superior in all mentioned aspects. Besides that, implementing kvrealloc() by making use of krealloc() and vrealloc() provides oppertunities to grow (and shrink) allocations more efficiently. For instance, vrealloc() can be optimized to allocate and map additional pages to grow the allocation or unmap and free unused pages to shrink the allocation. [dakr@kernel.org: document concurrency restrictions] Link: https://lkml.kernel.org/r/20240725125442.4957-1-dakr@kernel.org [dakr@kernel.org: disable KASAN when switching to vmalloc] Link: https://lkml.kernel.org/r/20240730185049.6244-2-dakr@kernel.org [dakr@kernel.org: properly document __GFP_ZERO behavior] Link: https://lkml.kernel.org/r/20240730185049.6244-5-dakr@kernel.org Link: https://lkml.kernel.org/r/20240722163111.4766-3-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Christian König <christian.koenig@amd.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <kees@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-08-30KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configurationMarc Zyngier
Ensure that SCTLR_EL1.EPAN is RES0 when FEAT_PAN3 isn't supported. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: nv: Make ps_to_output_size() generally availableMarc Zyngier
Make this helper visible to at.c, we are going to need it. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: nv: Turn upper_attr for S2 walk into the full descriptorMarc Zyngier
The upper_attr attribute has been badly named, as it most of the time carries the full "last walked descriptor". Rename it to "desc" and make ti contain the full 64bit descriptor. This will be used by the S1 PTW. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: nv: Enforce S2 alignment when contiguous bit is setMarc Zyngier
Despite KVM not using the contiguous bit for anything related to TLBs, the spec does require that the alignment defined by the contiguous bit for the page size and the level is enforced. Add the required checks to offset the point where PA and VA merge. Fixes: 61e30b9eef7f ("KVM: arm64: nv: Implement nested Stage-2 page table walk logic") Reported-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Add helper for last ditch idreg adjustmentsMarc Zyngier
We already have to perform a set of last-chance adjustments for NV purposes. We will soon have to do the same for the GIC, so introduce a helper for that exact purpose. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-02KVM: arm64: free kvm->arch.nested_mmus with kvfree()Danilo Krummrich
kvm->arch.nested_mmus is allocated with kvrealloc(), hence free it with kvfree() instead of kfree(). Fixes: 4f128f8e1aaa ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures") Signed-off-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240723142204.758796-1-dakr@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14Merge branch kvm-arm64/nv-sve into kvmarm/nextOliver Upton
* kvm-arm64/nv-sve: : CPTR_EL2, FPSIMD/SVE support for nested : : This series brings support for honoring the guest hypervisor's CPTR_EL2 : trap configuration when running a nested guest, along with support for : FPSIMD/SVE usage at L1 and L2. KVM: arm64: Allow the use of SVE+NV KVM: arm64: nv: Add additional trap setup for CPTR_EL2 KVM: arm64: nv: Add trap description for CPTR_EL2 KVM: arm64: nv: Add TCPAC/TTA to CPTR->CPACR conversion helper KVM: arm64: nv: Honor guest hypervisor's FP/SVE traps in CPTR_EL2 KVM: arm64: nv: Load guest FP state for ZCR_EL2 trap KVM: arm64: nv: Handle CPACR_EL1 traps KVM: arm64: Spin off helper for programming CPTR traps KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state KVM: arm64: nv: Use guest hypervisor's max VL when running nested guest KVM: arm64: nv: Save guest's ZCR_EL2 when in hyp context KVM: arm64: nv: Load guest hyp's ZCR into EL1 state KVM: arm64: nv: Handle ZCR_EL2 traps KVM: arm64: nv: Forward SVE traps to guest hypervisor KVM: arm64: nv: Forward FP/ASIMD traps to guest hypervisor Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14Merge branch kvm-arm64/ctr-el0 into kvmarm/nextOliver Upton
* kvm-arm64/ctr-el0: : Support for user changes to CTR_EL0, courtesy of Sebastian Ott : : Allow userspace to change the guest-visible value of CTR_EL0 for a VM, : so long as the requested value represents a subset of features supported : by hardware. In other words, prevent the VMM from over-promising the : capabilities of hardware. : : Make this happen by fitting CTR_EL0 into the existing infrastructure for : feature ID registers. KVM: selftests: Assert that MPIDR_EL1 is unchanged across vCPU reset KVM: arm64: nv: Unfudge ID_AA64PFR0_EL1 masking KVM: selftests: arm64: Test writes to CTR_EL0 KVM: arm64: rename functions for invariant sys regs KVM: arm64: show writable masks for feature registers KVM: arm64: Treat CTR_EL0 as a VM feature ID register KVM: arm64: unify code to prepare traps KVM: arm64: nv: Use accessors for modifying ID registers KVM: arm64: Add helper for writing ID regs KVM: arm64: Use read-only helper for reading VM ID registers KVM: arm64: Make idregs debugfs iterator search sysreg table directly KVM: arm64: Get sys_reg encoding from descriptor in idregs_debug_show() Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14Merge branch kvm-arm64/shadow-mmu into kvmarm/nextOliver Upton
* kvm-arm64/shadow-mmu: : Shadow stage-2 MMU support for NV, courtesy of Marc Zyngier : : Initial implementation of shadow stage-2 page tables to support a guest : hypervisor. In the author's words: : : So here's the 10000m (approximately 30000ft for those of you stuck : with the wrong units) view of what this is doing: : : - for each {VMID,VTTBR,VTCR} tuple the guest uses, we use a : separate shadow s2_mmu context. This context has its own "real" : VMID and a set of page tables that are the combination of the : guest's S2 and the host S2, built dynamically one fault at a time. : : - these shadow S2 contexts are ephemeral, and behave exactly as : TLBs. For all intent and purposes, they *are* TLBs, and we discard : them pretty often. : : - TLB invalidation takes three possible paths: : : * either this is an EL2 S1 invalidation, and we directly emulate : it as early as possible : : * or this is an EL1 S1 invalidation, and we need to apply it to : the shadow S2s (plural!) that match the VMID set by the L1 guest : : * or finally, this is affecting S2, and we need to teardown the : corresponding part of the shadow S2s, which invalidates the TLBs KVM: arm64: nv: Truely enable nXS TLBI operations KVM: arm64: nv: Add handling of NXS-flavoured TLBI operations KVM: arm64: nv: Add handling of range-based TLBI operations KVM: arm64: nv: Add handling of outer-shareable TLBI operations KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information KVM: arm64: nv: Tag shadow S2 entries with guest's leaf S2 level KVM: arm64: nv: Handle FEAT_TTL hinted TLB operations KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operations KVM: arm64: nv: Handle TLBI ALLE1{,IS} operations KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operations KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1 KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidation KVM: arm64: nv: Add Stage-1 EL2 invalidation primitives KVM: arm64: nv: Unmap/flush shadow stage 2 page tables KVM: arm64: nv: Handle shadow stage 2 page faults KVM: arm64: nv: Implement nested Stage-2 page table walk logic KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-08Revert "KVM: arm64: nv: Fix RESx behaviour of disabled FGTs with negative ↵Oliver Upton
polarity" This reverts commit eb9d53d4a949c6d6d7c9f130e537f6b5687fedf9. As Marc pointed out on the list [*], this patch is wrong, and those who find themselves in the SOB chain should have their heads checked. Annoyingly, the architecture has some FGT trap bits that are negative (i.e. 0 implies trap), and there was some confusion how KVM handles this for nested guests. However, it is clear now that KVM honors the RES0-ness of FGT traps already, meaning traps for features never exposed to the guest hypervisor get handled at L0. As they should. Link: https://lore.kernel.org/kvmarm/86bk3c3uss.wl-maz@kernel.org/T/#mb9abb3dd79f6a4544a91cb35676bd637c3a5e836 Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-03KVM: arm64: nv: Truely enable nXS TLBI operationsMarc Zyngier
Although we now have support for nXS-flavoured TLBI instructions, we still don't expose the feature to the guest thanks to a mixture of misleading comment and use of a bunch of magic values. Fix the comment and correctly express the masking of LS64, which is enough to expose nXS to the world. Not that anyone cares... Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240703154743.824824-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-22KVM: arm64: nv: Unfudge ID_AA64PFR0_EL1 maskingOliver Upton
Marc reports that L1 VMs aren't booting with the NV series applied to today's kvmarm/next. After bisecting the issue, it appears that 44241f34fac9 ("KVM: arm64: nv: Use accessors for modifying ID registers") is to blame. Poking around at the issue a bit further, it'd appear that the value for ID_AA64PFR0_EL1 is complete garbage, as 'val' still contains the value we set ID_AA64ISAR1_EL1 to. Fix the read-modify-write pattern to actually use ID_AA64PFR0_EL1 as the starting point. Excuse me as I return to my shame cube. Reported-by: Marc Zyngier <maz@kernel.org> Fixes: 44241f34fac9 ("KVM: arm64: nv: Use accessors for modifying ID registers") Acked-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240621224044.2465901-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Allow the use of SVE+NVOliver Upton
Allow SVE and NV to mix now that everything is in place to handle it correctly. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240620164653.1130714-16-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: nv: Use accessors for modifying ID registersOliver Upton
In the interest of abstracting away the underlying storage of feature ID registers, rework the nested code to go through the accessors instead of directly iterating the id_regs array. This means we now lose the property that ID registers unknown to the nested code get zeroed, but we really ought to be handling those explicitly going forward. Link: https://lore.kernel.org/r/20240619174036.483943-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Add helper for writing ID regsOliver Upton
Replace the remaining usage of IDREG() with a new helper for setting the value of a feature ID register, with the benefit of cramming in some extra sanity checks. Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Use GFP_KERNEL_ACCOUNT for sysreg_masks allocationOliver Upton
Of course, userspace is in the driver's seat for struct kvm and associated allocations. Make sure the sysreg_masks allocation participates in kmem accounting. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240617181018.2054332-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Add handling of range-based TLBI operationsMarc Zyngier
We already support some form of range operation by handling FEAT_TTL, but so far the "arbitrary" range operations are unsupported. Let's fix that. For EL2 S1, this is simple enough: we just map both NSH, ISH and OSH instructions onto the ISH version for EL1. For TLBI instructions affecting EL1 S1, we use the same model as their non-range counterpart to invalidate in the context of the correct VMID. For TLBI instructions affecting S2, we interpret the data passed by the guest to compute the range and use that to tear-down part of the shadow S2 range and invalidate the TLBs. Finally, we advertise FEAT_TLBIRANGE if the host supports it. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-16-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Add handling of outer-shareable TLBI operationsMarc Zyngier
Our handling of outer-shareable TLBIs is pretty basic: we just map them to the existing inner-shareable ones, because we really don't have anything else. The only significant change is that we can now advertise FEAT_TLBIOS support if the host supports it. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like informationMarc Zyngier
In order to be able to make S2 TLB invalidations more performant on NV, let's use a scheme derived from the FEAT_TTL extension. If bits [56:55] in the leaf descriptor translating the address in the corresponding shadow S2 are non-zero, they indicate a level which can be used as an invalidation range. This allows further reduction of the systematic over-invalidation that takes place otherwise. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-14-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle FEAT_TTL hinted TLB operationsMarc Zyngier
Support guest-provided information information to size the range of required invalidation. This helps with reducing over-invalidation, provided that the guest actually provides accurate information. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-12-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1Marc Zyngier
While dealing with TLB invalidation targeting the guest hypervisor's own stage-1 was easy, doing the same thing for its own guests is a bit more involved. Since such an invalidation is scoped by VMID, it needs to apply to all s2_mmu contexts that have been tagged by that VMID, irrespective of the value of VTTBR_EL2.BADDR. So for each s2_mmu context matching that VMID, we invalidate the corresponding TLBs, each context having its own "physical" VMID. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-8-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Unmap/flush shadow stage 2 page tablesChristoffer Dall
Unmap/flush shadow stage 2 page tables for the nested VMs as well as the stage 2 page table for the guest hypervisor. Note: A bunch of the code in mmu.c relating to MMU notifiers is currently dealt with in an extremely abrupt way, for example by clearing out an entire shadow stage-2 table. This will be handled in a more efficient way using the reverse mapping feature in a later version of the patch series. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle shadow stage 2 page faultsMarc Zyngier
If we are faulting on a shadow stage 2 translation, we first walk the guest hypervisor's stage 2 page table to see if it has a mapping. If not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we create a mapping in the shadow stage 2 page table. Note that we have to deal with two IPAs when we got a shadow stage 2 page fault. One is the address we faulted on, and is in the L2 guest phys space. The other is from the guest stage-2 page table walk, and is in the L1 guest phys space. To differentiate them, we rename variables so that fault_ipa is used for the former and ipa is used for the latter. When mapping a page in a shadow stage-2, special care must be taken not to be more permissive than the guest is. Co-developed-by: Christoffer Dall <christoffer.dall@linaro.org> Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Implement nested Stage-2 page table walk logicChristoffer Dall
Based on the pseudo-code in the ARM ARM, implement a stage 2 software page table walker. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Support multiple nested Stage-2 mmu structuresMarc Zyngier
Add Stage-2 mmu data structures for virtual EL2 and for nested guests. We don't yet populate shadow Stage-2 page tables, but we now have a framework for getting to a shadow Stage-2 pgd. We allocate twice the number of vcpus as Stage-2 mmu structures because that's sufficient for each vcpu running two translation regimes without having to flush the Stage-2 page tables. Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-14KVM: arm64: nv: Fix RESx behaviour of disabled FGTs with negative polarityMarc Zyngier
The Fine Grained Trap extension is pretty messy as it doesn't consistently use the same polarity for all trap bits. A bunch of them, added later in the life of the architecture, have a *negative* priority. So if these bits are disabled, they must be RES1 and not RES0. But that's not what the code implements, making the traps for these negative trap bits being always on instead of disabled. Fix the relevant bits, and stick a brown paper bag on my head for the rest of the day... Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614125858.78361-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-05-30KVM: arm64: nv: Expose BTI and CSV_frac to a guest hypervisorMarc Zyngier
Now that we expose PAC to NV guests, we can also expose BTI (as the two as joined at the hip, due to some of the PAC instructions being landing pads). While we're at it, also propagate CSV_frac, which requires no particular emulation. Fixes: f4f6a95bac49 ("KVM: arm64: nv: Advertise support for PAuth") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240528100632.1831995-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>