summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm
AgeCommit message (Collapse)Author
2025-05-06KVM: arm64: Unconditionally configure fine-grain trapsMark Rutland
... otherwise we can inherit the host configuration if this differs from the KVM configuration. Signed-off-by: Mark Rutland <mark.rutland@arm.com> [maz: simplified a couple of things] Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Use computed masks as sanitisers for FGT registersMarc Zyngier
Now that we have computed RES0 bits, use them to sanitise the guest view of FGT registers. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Add description of FGT bits leading to EC!=0x18Marc Zyngier
The current FTP tables are only concerned with the bits generating ESR_ELx.EC==0x18. However, we want an exhaustive view of what KVM really knows about. So let's add another small table that provides that extra information. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Compute FGT masks from KVM's own FGT tablesMarc Zyngier
In the process of decoupling KVM's view of the FGT bits from the wider architectural state, use KVM's own FGT tables to build a synthetic view of what is actually known. This allows for some checking along the way. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Plug FEAT_GCS handlingMarc Zyngier
We don't seem to be handling the GCS-specific exception class. Handle it by delivering an UNDEF to the guest, and populate the relevant trap bits. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Don't treat HCRX_EL2 as a FGT registerMarc Zyngier
Treating HCRX_EL2 as yet another FGT register seems excessive, and gets in a way of further improvements. It is actually simpler to just be explicit about the masking, so just to that. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Restrict ACCDATA_EL1 undef to FEAT_LS64_ACCDATA being disabledMarc Zyngier
We currently unconditionally make ACCDATA_EL1 accesses UNDEF. As we are about to support it, restrict the UNDEF behaviour to cases where FEAT_LS64_ACCDATA is not exposed to the guest. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Handle trapping of FEAT_LS64* instructionsMarc Zyngier
We generally don't expect FEAT_LS64* instructions to trap, unless they are trapped by a guest hypervisor. Otherwise, this is just the guest playing tricks on us by using an instruction that isn't advertised, which we handle with a well deserved UNDEF. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Simplify handling of negative FGT bitsMarc Zyngier
check_fgt_bit() and triage_sysreg_trap() implement the same thing twice for no good reason. We have to lookup the FGT register twice, as we don't communicate it. Similarly, we extract the register value at the wrong spot. Reorganise the code in a more logical way so that things are done at the correct location, removing a lot of duplication. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Tighten handling of unknown FGT groupsMarc Zyngier
triage_sysreg_trap() assumes that it knows all the possible values for FGT groups, which won't be the case as we start adding more FGT registers (unless we add everything in one go, which is obviously undesirable). At the same time, it doesn't offer much in terms of debugging info when things go wrong. Turn the "__NR_FGT_GROUP_IDS__" case into a default, covering any unhandled value, and give the kernel hacker a bit of a clue about what's wrong (system register and full trap descriptor). Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06arm64: sysreg: Replace HFGxTR_EL2 with HFG{R,W}TR_EL2Marc Zyngier
Treating HFGRTR_EL2 and HFGWTR_EL2 identically was a mistake. It makes things hard to reason about, has the potential to introduce bugs by giving a meaning to bits that are really reserved, and is in general a bad description of the architecture. Given that #defines are cheap, let's describe both registers as intended by the architecture, and repaint all the existing uses. Yes, this is painful. The registers themselves are generated from the JSON file in an automated way. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Extend pKVM selftest for np-guestsQuentin Perret
The pKVM selftest intends to test as many memory 'transitions' as possible, so extend it to cover sharing pages with non-protected guests, including in the case of multi-sharing. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-5-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Selftest for pKVM transitionsQuentin Perret
We have recently found a bug [1] in the pKVM memory ownership transitions by code inspection, but it could have been caught with a test. Introduce a boot-time selftest exercising all the known pKVM memory transitions and importantly checks the rejection of illegal transitions. The new test is hidden behind a new Kconfig option separate from CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the transition checks ([1] doesn't reproduce with EL2 debug enabled). [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/ Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-4-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Don't WARN from __pkvm_host_share_guest()Quentin Perret
We currently WARN() if the host attempts to share a page that is not in an acceptable state with a guest. This isn't strictly necessary and makes testing much harder, so drop the WARN and make sure to propage the error code instead. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Add .hyp.data sectionDavid Brazdil
The hypervisor has not needed its own .data section because all globals were either .rodata or .bss. To avoid having to initialize future data-structures at run-time, let's introduce add a .data section to the hypervisor. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE modeMarc Zyngier
We keep setting and clearing these bits depending on the role of the host kernel, mimicking what we do for nVHE. But that's actually pretty pointless, as we always want physical interrupts to make it to the host, at EL2. This has also two problems: - it prevents IRQs from being taken when these bits are cleared if the implementation has chosen to implement these bits as masks when HCR_EL2.{TGE,xMO}=={0,0} - it triggers a bad erratum on the AmpereOne HW, which catches fire on clearing these bits while an interrupt is being taken (AC03_CPU_36). Let's kill these two birds with a single stone, and permanently set the xMO bits when running VHE. This involves a bit of surgery on code paths that rely on flipping these bits on and off for other purposes. Note that the earliest setting of hcr_el2 (in the init_hcr_el2 macro) is left untouched as is runs extremely early, with interrupts disabled, and soon enough overwritten with the final value containing the xMO bits. Reported-by: D Scott Phillips <scott@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250429114326.3618875-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Replace ternary flags with str_on_off() helperSeongsu Park
Replace repetitive ternary expressions with the str_on_off() helper function. This change improves code readability and ensures consistency in tracepoint string formatting Signed-off-by: Seongsu Park <sgsu.park@samsung.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1891546521.01744691102904.JavaMail.epsvc@epcpadp1new Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-05KVM: arm64: Prevent userspace from disabling AArch64 support at any ↵Marc Zyngier
virtualisable EL A sorry excuse for a selftest is trying to disable AArch64 support. And yes, this goes as well as you can imagine. Let's forbid this sort of things. Normal userspace shouldn't get caught doing that. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250429114117.3618800-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE modeMarc Zyngier
We keep setting and clearing these bits depending on the role of the host kernel, mimicking what we do for nVHE. But that's actually pretty pointless, as we always want physical interrupts to make it to the host, at EL2. This has also two problems: - it prevents IRQs from being taken when these bits are cleared if the implementation has chosen to implement these bits as masks when HCR_EL2.{TGE,xMO}=={0,0} - it triggers a bad erratum on the AmpereOne HW, which catches fire on clearing these bits while an interrupt is being taken (AC03_CPU_36). Let's kill these two birds with a single stone, and permanently set the xMO bits when running VHE. This involves a bit of surgery on code paths that rely on flipping these bits on and off for other purposes. Note that the earliest setting of hcr_el2 (in the init_hcr_el2 macro) is left untouched as is runs extremely early, with interrupts disabled, and soon enough overwritten with the final value containing the xMO bits. Reported-by: D Scott Phillips <scott@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250429114326.3618875-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()Sebastian Ott
Commit fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") made the initialization of the local memcache variable in user_mem_abort() conditional, leaving a codepath where it is used uninitialized via kvm_pgtable_stage2_map(). This can fail on any path that requires a stage-2 allocation without transition via a permission fault or dirty logging. Fix this by making sure that memcache is always valid. Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/kvmarm/3f5db4c7-ccce-fb95-595c-692fa7aad227@redhat.com/ Link: https://lore.kernel.org/r/20250505173148.33900-1-sebott@redhat.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-30arm64: drop binutils version checksArnd Bergmann
Now that gcc-8 and binutils-2.30 are the minimum versions, a lot of the individual feature checks can go away for simplification. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-04-29Merge branch kvm-arm64/nv-pmu-fixes into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-pmu-fixes: : . : Fixes for NV PMU emulation. From the cover letter: : : "Joey reports that some of his PMU tests do not behave quite as : expected: : : - MDCR_EL2.HPMN is set to 0 out of reset : : - PMCR_EL0.P should reset all the counters when written from EL2 : : Oliver points out that setting PMCR_EL0.N from userspace by writing to : the register is silly with NV, and that we need a new PMU attribute : instead. : : On top of that, I figured out that we had a number of little gotchas: : : - It is possible for a guest to write an HPMN value that is out of : bound, and it seems valuable to limit it : : - PMCR_EL0.N should be the maximum number of counters when read from : EL2, and MDCR_EL2.HPMN when read from EL0/EL1 : : - Prevent userspace from updating PMCR_EL0.N when EL2 is available" : . KVM: arm64: Let kvm_vcpu_read_pmcr() return an EL-dependent value for PMCR_EL0.N KVM: arm64: Handle out-of-bound write to MDCR_EL2.HPMN KVM: arm64: Don't let userspace write to PMCR_EL0.N when the vcpu has EL2 KVM: arm64: Allow userspace to limit the number of PMU counters for EL2 VMs KVM: arm64: Contextualise the handling of PMCR_EL0.P writes KVM: arm64: Fix MDCR_EL2.HPMN reset value KVM: arm64: Repaint pmcr_n into nr_pmu_counters Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Unconditionally cross check hyp stateQuentin Perret
Now that the hypervisor's state is stored in the hyp_vmemmap, we no longer need an expensive page-table walk to read it. This means we can now afford to cross check the hyp-state during all memory ownership transitions where the hyp is involved unconditionally, hence avoiding problems such as [1]. [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/ Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-8-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Defer EL2 stage-1 mapping on shareQuentin Perret
We currently blindly map into EL2 stage-1 *any* page passed to the __pkvm_host_share_hyp() HVC. This is less than ideal from a security perspective as it makes exploitation of potential hypervisor gadgets easier than it should be. But interestingly, pKVM should never need to access SHARED_BORROWED pages that it hasn't previously pinned, so there is no need to map the page before that. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-7-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Move hyp state to hyp_vmemmapQuentin Perret
Tracking the hypervisor's ownership state into struct hyp_page has several benefits, including allowing far more efficient lookups (no page-table walk needed) and de-corelating the state from the presence of a mapping. This will later allow to map pages into EL2 stage-1 less proactively which is generally a good thing for security. And in the future this will help with tracking the state of pages mapped into the hypervisor's private range without requiring an alias into the 'linear map' range. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-6-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Introduce {get,set}_host_state() helpersQuentin Perret
Instead of directly accessing the host_state member in struct hyp_page, introduce static inline accessors to do it. The future hyp_state member will follow the same pattern as it will need some logic in the accessors. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-5-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Use 0b11 for encoding PKVM_NOPAGEQuentin Perret
The page ownership state encoded as 0b11 is currently considered reserved for future use, and PKVM_NOPAGE uses bit 2. In order to simplify the relocation of the hyp ownership state into the vmemmap in later patches, let's use the 'reserved' encoding for the PKVM_NOPAGE state. The struct hyp_page layout isn't guaranteed stable at all, so there is no real reason to have 'reserved' encodings. No functional changes intended. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-4-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Fix pKVM page-tracking commentsQuentin Perret
Most of the comments relating to pKVM page-tracking in nvhe/memory.h are now either slightly outdated or outright wrong. Fix the comments. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Track SVE state in the hypervisor vcpu structureFuad Tabba
When dealing with a guest with SVE enabled, make sure the host SVE state is pinned at EL2 S1, and that the hypervisor vCPU state is correctly initialised (and then unpinned on teardown). Co-authored-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-24KVM: arm64, x86: make kvm_arch_has_irq_bypass() inlinePaolo Bonzini
kvm_arch_has_irq_bypass() is a small function and even though it does not appear in any *really* hot paths, it's also not entirely rare. Make it inline---it also works out nicely in preparation for using it in kvm-intel.ko and kvm-amd.ko, since the function is not currently exported. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-11KVM: arm64: Let kvm_vcpu_read_pmcr() return an EL-dependent value for PMCR_EL0.NMarc Zyngier
When EL2 is present, PMCR_EL0.N is the effective value of MDCR_EL2.HPMN when accessed from EL1 or EL0. Make sure we honor this requirement. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-11KVM: arm64: Handle out-of-bound write to MDCR_EL2.HPMNMarc Zyngier
We don't really pay attention to what gets written to MDCR_EL2.HPMN, and funky guests could play ugly games on us. Restrict what gets written there, and limit the number of counters to what the PMU is allowed to have. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-11KVM: arm64: Don't let userspace write to PMCR_EL0.N when the vcpu has EL2Marc Zyngier
Now that userspace can provide its limit for hte maximum number of counters, prevent it from writing to PMCR_EL0.N, as the value should be derived from MDCR_EL2.HPMN in that case. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-11KVM: arm64: Allow userspace to limit the number of PMU counters for EL2 VMsMarc Zyngier
As long as we had purely EL1 VMs, we could easily update the number of guest-visible counters by letting userspace write to PMCR_EL0.N. With VMs started at EL2, PMCR_EL1.N only reflects MDCR_EL2.HPMN, and we don't have a good way to limit it. For this purpose, introduce a new PMUv3 attribute that allows limiting the maximum number of counters. This requires the explicit selection of a PMU. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-11KVM: arm64: Contextualise the handling of PMCR_EL0.P writesMarc Zyngier
Contrary to what the comment says in kvm_pmu_handle_pmcr(), writing PMCR_EL0.P==1 has the following effects: <quote> The event counters affected by this field are: * All event counters in the first range. * If any of the following are true, all event counters in the second range: - EL2 is disabled or not implemented in the current Security state. - The PE is executing at EL2 or EL3. </quote> where the "first range" represent the counters in the [0..HPMN-1] range, and the "second range" the counters in the [HPMN..MAX] range. It so appears that writing P from EL2 should nuke all counters, and not just the "guest" view. Just do that, and nuke the misleading comment. Reported-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-11KVM: arm64: Fix MDCR_EL2.HPMN reset valueMarc Zyngier
The MDCR_EL2 documentation indicates that the HPMN field has the following behaviour: "On a Warm reset, this field resets to the expression NUM_PMU_COUNTERS." However, it appears we reset it to zero, which is not very useful. Add a reset helper for MDCR_EL2, and handle the case where userspace changes the target PMU, which may force us to change HPMN again. Reported-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-11KVM: arm64: Repaint pmcr_n into nr_pmu_countersMarc Zyngier
The pmcr_n field obviously refers to PMCR_EL0.N, but is generally used as the number of counters seen by the guest. Rename it accordingly. Suggested-by: Oliver upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk stage-1 page tables) to align with the architecture. This avoids possibly taking an SEA at EL2 on the page table walk or using an architecturally UNKNOWN fault IPA - Use acquire/release semantics in the KVM FF-A proxy to avoid reading a stale value for the FF-A version - Fix KVM guest driver to match PV CPUID hypercall ABI - Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM selftests, which is the only memory type for which atomic instructions are architecturally guaranteed to work s390: - Don't use %pK for debug printing and tracepoints x86: - Use a separate subclass when acquiring KVM's per-CPU posted interrupts wakeup lock in the scheduled out path, i.e. when adding a vCPU on the list of vCPUs to wake, to workaround a false positive deadlock. The schedule out code runs with a scheduler lock that the wakeup handler takes in the opposite order; but it does so with IRQs disabled and cannot run concurrently with a wakeup - Explicitly zero-initialize on-stack CPUID unions - Allow building irqbypass.ko as as module when kvm.ko is a module - Wrap relatively expensive sanity check with KVM_PROVE_MMU - Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses selftests: - Add more scenarios to the MONITOR/MWAIT test - Add option to rseq test to override /dev/cpu_dma_latency - Bring list of exit reasons up to date - Cleanup Makefile to list once tests that are valid on all architectures Other: - Documentation fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits) KVM: arm64: Use acquire/release to communicate FF-A version negotiation KVM: arm64: selftests: Explicitly set the page attrs to Inner-Shareable KVM: arm64: selftests: Introduce and use hardware-definition macros KVM: VMX: Use separate subclasses for PI wakeup lock to squash false positive KVM: VMX: Assert that IRQs are disabled when putting vCPU on PI wakeup list KVM: x86: Explicitly zero-initialize on-stack CPUID unions KVM: Allow building irqbypass.ko as as module when kvm.ko is a module KVM: x86/mmu: Wrap sanity check on number of TDP MMU pages with KVM_PROVE_MMU KVM: selftests: Add option to rseq test to override /dev/cpu_dma_latency KVM: x86: Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses Documentation: kvm: remove KVM_CAP_MIPS_TE Documentation: kvm: organize capabilities in the right section Documentation: kvm: fix some definition lists Documentation: kvm: drop "Capability" heading from capabilities Documentation: kvm: give correct name for KVM_CAP_SPAPR_MULTITCE Documentation: KVM: KVM_GET_SUPPORTED_CPUID now exposes TSC_DEADLINE selftests: kvm: list once tests that are valid on all architectures selftests: kvm: bring list of exit reasons up to date selftests: kvm: revamp MONITOR/MWAIT tests KVM: arm64: Don't translate FAR if invalid/unsafe ...
2025-04-07KVM: arm64: Use acquire/release to communicate FF-A version negotiationWill Deacon
The pKVM FF-A proxy rejects FF-A requests other than FFA_VERSION until version negotiation is complete, which is signalled by setting the global 'has_version_negotiated' variable. To avoid excessive locking, this variable is checked directly from kvm_host_ffa_handler() in response to an FF-A call, but this can race against another CPU performing the negotiation and potentially lead to reading a torn value (incredibly unlikely for a 'bool') or problematic re-ordering of the accesses to 'has_version_negotiated' and 'hyp_ffa_version' whereby a stale version number could be read by __do_ffa_mem_xfer(). Use acquire/release primitives when writing 'has_version_negotiated' with the version lock held and when reading without the lock held. Cc: Sebastian Ene <sebastianene@google.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Quentin Perret <qperret@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Marc Zyngier <maz@kernel.org> Fixes: c9c012625e12 ("KVM: arm64: Trap FFA_VERSION host call in pKVM") Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250407152755.1041-1-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-03KVM: arm64: Don't translate FAR if invalid/unsafeOliver Upton
Don't re-walk the page tables if an SEA occurred during the faulting page table walk to avoid taking a fatal exception in the hyp. Additionally, check that FAR_EL2 is valid for SEAs not taken on PTW as the architecture doesn't guarantee it contains the fault VA. Finally, fix up the rest of the abort path by checking for SEAs early and bugging the VM if we get further along with an UNKNOWN fault IPA. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250402201725.2963645-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-03arm64: Convert HPFAR_EL2 to sysreg tableOliver Upton
Switch over to the typical sysreg table for HPFAR_EL2 as we're about to start using more fields in the register. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250402201725.2963645-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-03KVM: arm64: Only read HPFAR_EL2 when value is architecturally validOliver Upton
KVM's logic for deciding when HPFAR_EL2 is UNKNOWN doesn't align with the architecture. Most notably, KVM assumes HPFAR_EL2 contains the faulting IPA even in the case of an SEA. Align the logic with the architecture rather than attempting to paraphrase it. Additionally, take the opportunity to improve the language around ARM erratum #834220 such that it actually describes the bug. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250402201725.2963645-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-01Merge tag 'mm-stable-2025-03-30-16-52' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. * tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits) mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex() x86/mm: restore early initialization of high_memory for 32-bits mm/vmscan: don't try to reclaim hwpoison folio mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper cgroup: docs: add pswpin and pswpout items in cgroup v2 doc mm: vmscan: split proactive reclaim statistics from direct reclaim statistics selftests/mm: speed up split_huge_page_test selftests/mm: uffd-unit-tests support for hugepages > 2M docs/mm/damon/design: document active DAMOS filter type mm/damon: implement a new DAMOS filter type for active pages fs/dax: don't disassociate zero page entries MM documentation: add "Unaccepted" meminfo entry selftests/mm: add commentary about 9pfs bugs fork: use __vmalloc_node() for stack allocation docs/mm: Physical Memory: Populate the "Zones" section xen: balloon: update the NR_BALLOON_PAGES state hv_balloon: update the NR_BALLOON_PAGES state balloon_compaction: update the NR_BALLOON_PAGES state meminfo: add a per node counter for balloon drivers mm: remove references to folio in __memcg_kmem_uncharge_page() ...
2025-03-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Nested virtualization support for VGICv3, giving the nested hypervisor control of the VGIC hardware when running an L2 VM - Removal of 'late' nested virtualization feature register masking, making the supported feature set directly visible to userspace - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers - Paravirtual interface for discovering the set of CPU implementations where a VM may run, addressing a longstanding issue of guest CPU errata awareness in big-little systems and cross-implementation VM migration - Userspace control of the registers responsible for identifying a particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1), allowing VMs to be migrated cross-implementation - pKVM updates, including support for tracking stage-2 page table allocations in the protected hypervisor in the 'SecPageTable' stat - Fixes to vPMU, ensuring that userspace updates to the vPMU after KVM_RUN are reflected into the backing perf events LoongArch: - Remove unnecessary header include path - Assume constant PGD during VM context switch - Add perf events support for guest VM RISC-V: - Disable the kernel perf counter during configure - KVM selftests improvements for PMU - Fix warning at the time of KVM module removal x86: - Add support for aging of SPTEs without holding mmu_lock. Not taking mmu_lock allows multiple aging actions to run in parallel, and more importantly avoids stalling vCPUs. This includes an implementation of per-rmap-entry locking; aging the gfn is done with only a per-rmap single-bin spinlock taken, whereas locking an rmap for write requires taking both the per-rmap spinlock and the mmu_lock. Note that this decreases slightly the accuracy of accessed-page information, because changes to the SPTE outside aging might not use atomic operations even if they could race against a clear of the Accessed bit. This is deliberate because KVM and mm/ tolerate false positives/negatives for accessed information, and testing has shown that reducing the latency of aging is far more beneficial to overall system performance than providing "perfect" young/old information. - Defer runtime CPUID updates until KVM emulates a CPUID instruction, to coalesce updates when multiple pieces of vCPU state are changing, e.g. as part of a nested transition - Fix a variety of nested emulation bugs, and add VMX support for synthesizing nested VM-Exit on interception (instead of injecting #UD into L2) - Drop "support" for async page faults for protected guests that do not set SEND_ALWAYS (i.e. that only want async page faults at CPL3) - Bring a bit of sanity to x86's VM teardown code, which has accumulated a lot of cruft over the years. Particularly, destroy vCPUs before the MMU, despite the latter being a VM-wide operation - Add common secure TSC infrastructure for use within SNP and in the future TDX - Block KVM_CAP_SYNC_REGS if guest state is protected. It does not make sense to use the capability if the relevant registers are not available for reading or writing - Don't take kvm->lock when iterating over vCPUs in the suspend notifier to fix a largely theoretical deadlock - Use the vCPU's actual Xen PV clock information when starting the Xen timer, as the cached state in arch.hv_clock can be stale/bogus - Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend notifier only accounts for kvmclock, and there's no evidence that the flag is actually supported by Xen guests - Clean up the per-vCPU "cache" of its reference pvclock, and instead only track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately expensive to compute, and rarely changes for modern setups) - Don't write to the Xen hypercall page on MSR writes that are initiated by the host (userspace or KVM) to fix a class of bugs where KVM can write to guest memory at unexpected times, e.g. during vCPU creation if userspace has set the Xen hypercall MSR index to collide with an MSR that KVM emulates - Restrict the Xen hypercall MSR index to the unofficial synthetic range to reduce the set of possible collisions with MSRs that are emulated by KVM (collisions can still happen as KVM emulates Hyper-V MSRs, which also reside in the synthetic range) - Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config - Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID entries when updating PV clocks; there is no guarantee PV clocks will be updated between TSC frequency changes and CPUID emulation, and guest reads of the TSC leaves should be rare, i.e. are not a hot path x86 (Intel): - Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1 - Pass XFD_ERR as the payload when injecting #NM, as a preparatory step for upcoming FRED virtualization support - Decouple the EPT entry RWX protection bit macros from the EPT Violation bits, both as a general cleanup and in anticipation of adding support for emulating Mode-Based Execution Control (MBEC) - Reject KVM_RUN if userspace manages to gain control and stuff invalid guest state while KVM is in the middle of emulating nested VM-Enter - Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs in anticipation of adding sanity checks for secondary exit controls (the primary field is out of bits) x86 (AMD): - Ensure the PSP driver is initialized when both the PSP and KVM modules are built-in (the initcall framework doesn't handle dependencies) - Use long-term pins when registering encrypted memory regions, so that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to excessive fragmentation - Add macros and helpers for setting GHCB return/error codes - Add support for Idle HLT interception, which elides interception if the vCPU has a pending, unmasked virtual IRQ when HLT is executed - Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical address - Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g. because the vCPU was "destroyed" via SNP's AP Creation hypercall - Reject SNP AP Creation if the requested SEV features for the vCPU don't match the VM's configured set of features Selftests: - Fix again the Intel PMU counters test; add a data load and do CLFLUSH{OPT} on the data instead of executing code. The theory is that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters - Fix a flaw in the Intel PMU counters test where it asserts that an event is counting correctly without actually knowing what the event counts on the underlying hardware - Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and improve its coverage by collecting all dirty entries on each iteration - Fix a few minor bugs related to handling of stats FDs - Add infrastructure to make vCPU and VM stats FDs available to tests by default (open the FDs during VM/vCPU creation) - Relax an assertion on the number of HLT exits in the xAPIC IPI test when running on a CPU that supports AMD's Idle HLT (which elides interception of HLT if a virtual IRQ is pending and unmasked)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits) RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed RISC-V: KVM: Teardown riscv specific bits after kvm_exit LoongArch: KVM: Register perf callbacks for guest LoongArch: KVM: Implement arch-specific functions for guest perf LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel() LoongArch: KVM: Remove PGD saving during VM context switch LoongArch: KVM: Remove unnecessary header include path KVM: arm64: Tear down vGIC on failed vCPU creation KVM: arm64: PMU: Reload when resetting KVM: arm64: PMU: Reload when user modifies registers KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs KVM: arm64: PMU: Assume PMU presence in pmu-emul.c KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu KVM: arm64: Factor out pKVM hyp vcpu creation to separate function KVM: arm64: Initialize HCRX_EL2 traps in pKVM KVM: arm64: Factor out setting HCRX_EL2 traps into separate function KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected KVM: x86: Add infrastructure for secure TSC KVM: x86: Push down setting vcpu.arch.user_set_tsc ...
2025-03-25Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Nothing major this time around. Apart from the usual perf/PMU updates, some page table cleanups, the notable features are average CPU frequency based on the AMUv1 counters, CONFIG_HOTPLUG_SMT and MOPS instructions (memcpy/memset) in the uaccess routines. Perf and PMUs: - Support for the 'Rainier' CPU PMU from Arm - Preparatory driver changes and cleanups that pave the way for BRBE support - Support for partial virtualisation of the Apple-M1 PMU - Support for the second event filter in Arm CSPMU designs - Minor fixes and cleanups (CMN and DWC PMUs) - Enable EL2 requirements for FEAT_PMUv3p9 Power, CPU topology: - Support for AMUv1-based average CPU frequency - Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It adds a generic topology_is_primary_thread() function overridden by x86 and powerpc New(ish) features: - MOPS (memcpy/memset) support for the uaccess routines Security/confidential compute: - Fix the DMA address for devices used in Realms with Arm CCA. The CCA architecture uses the address bit to differentiate between shared and private addresses - Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by default Memory management clean-ups: - Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs - Some minor page table accessor clean-ups - PIE/POE (permission indirection/overlay) helpers clean-up Kselftests: - MTE: skip hugetlb tests if MTE is not supported on such mappings and user correct naming for sync/async tag checking modes Miscellaneous: - Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people request) - Sysreg updates for new register fields - CPU type info for some Qualcomm Kryo cores" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits) arm64: mm: Don't use %pK through printk perf/arm_cspmu: Fix missing io.h include arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists arm64: cputype: Add MIDR_CORTEX_A76AE arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list arm64/sysreg: Enforce whole word match for open/close tokens arm64/sysreg: Fix unbalanced closing block arm64: Kconfig: Enable HOTPLUG_SMT arm64: topology: Support SMT control on ACPI based system arch_topology: Support SMT control for OF based system cpu/SMT: Provide a default topology_is_primary_thread() arm64/mm: Define PTDESC_ORDER perf/arm_cspmu: Add PMEVFILT2R support perf/arm_cspmu: Generalise event filtering perf/arm_cspmu: Move register definitons to header arm64/kernel: Always use level 2 or higher for early mappings arm64/mm: Drop PXD_TABLE_BIT arm64/mm: Check pmd_table() in pmd_trans_huge() ...
2025-03-25Merge branches 'for-next/amuv1-avg-freq', 'for-next/pkey_unrestricted', ↵Catalin Marinas
'for-next/sysreg', 'for-next/misc', 'for-next/pgtable-cleanups', 'for-next/kselftest', 'for-next/uaccess-mops', 'for-next/pie-poe-cleanup', 'for-next/cputype-kryo', 'for-next/cca-dma-address', 'for-next/drop-pxd_table_bit' and 'for-next/spectre-bhb-assume-vulnerable', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf/arm_cspmu: Fix missing io.h include perf/arm_cspmu: Add PMEVFILT2R support perf/arm_cspmu: Generalise event filtering perf/arm_cspmu: Move register definitons to header drivers/perf: apple_m1: Support host/guest event filtering drivers/perf: apple_m1: Refactor event select/filter configuration perf/dwc_pcie: fix duplicate pci_dev devices perf/dwc_pcie: fix some unreleased resources perf/arm-cmn: Minor event type housekeeping perf: arm_pmu: Move PMUv3-specific data perf: apple_m1: Don't disable counter in m1_pmu_enable_event() perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event() perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event() perf: arm_pmu: Don't disable counter in armpmu_add() perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters perf: arm_pmuv3: Add support for ARM Rainier PMU * for-next/amuv1-avg-freq: : Add support for AArch64 AMUv1-based average freq arm64: Utilize for_each_cpu_wrap for reference lookup arm64: Update AMU-based freq scale factor on entering idle arm64: Provide an AMU-based version of arch_freq_get_on_cpu cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry cpufreq: Allow arch_freq_get_on_cpu to return an error arch_topology: init capacity_freq_ref to 0 * for-next/pkey_unrestricted: : mm/pkey: Add PKEY_UNRESTRICTED macro selftest/powerpc/mm/pkey: fix build-break introduced by commit 00894c3fc917 selftests/powerpc: Use PKEY_UNRESTRICTED macro selftests/mm: Use PKEY_UNRESTRICTED macro mm/pkey: Add PKEY_UNRESTRICTED macro * for-next/sysreg: : arm64 sysreg updates arm64/sysreg: Enforce whole word match for open/close tokens arm64/sysreg: Fix unbalanced closing block arm64/sysreg: Add register fields for HFGWTR2_EL2 arm64/sysreg: Add register fields for HFGRTR2_EL2 arm64/sysreg: Add register fields for HFGITR2_EL2 arm64/sysreg: Add register fields for HDFGWTR2_EL2 arm64/sysreg: Add register fields for HDFGRTR2_EL2 arm64/sysreg: Update register fields for ID_AA64MMFR0_EL1 * for-next/misc: : Miscellaneous arm64 patches arm64: mm: Don't use %pK through printk arm64/fpsimd: Remove unused declaration fpsimd_kvm_prepare() * for-next/pgtable-cleanups: : arm64 pgtable accessors cleanup arm64/mm: Define PTDESC_ORDER arm64/kernel: Always use level 2 or higher for early mappings arm64/hugetlb: Consistently use pud_sect_supported() arm64/mm: Convert __pte_to_phys() and __phys_to_pte_val() as functions * for-next/kselftest: : arm64 kselftest updates kselftest/arm64: mte: Skip the hugetlb tests if MTE not supported on such mappings kselftest/arm64: mte: Use the correct naming for tag check modes in check_hugetlb_options.c * for-next/uaccess-mops: : Implement the uaccess memory copy/set using MOPS instructions arm64: lib: Use MOPS for usercopy routines arm64: mm: Handle PAN faults on uaccess CPY* instructions arm64: extable: Add fixup handling for uaccess CPY* instructions * for-next/pie-poe-cleanup: : PIE/POE helpers cleanup arm64/sysreg: Move POR_EL0_INIT to asm/por.h arm64/sysreg: Rename POE_RXW to POE_RWX arm64/sysreg: Improve PIR/POR helpers * for-next/cputype-kryo: : Add cputype info for some Qualcomm Kryo cores arm64: cputype: Add comments about Qualcomm Kryo 5XX and 6XX cores arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD * for-next/cca-dma-address: : Fix DMA address for devices used in realms with Arm CCA arm64: realm: Use aliased addresses for device DMA to shared buffers dma: Introduce generic dma_addr_*crypted helpers dma: Fix encryption bit clearing for dma_to_phys * for-next/drop-pxd_table_bit: : Drop the arm64 PXD_TABLE_BIT (clean-up in preparation for 128-bit PTEs) arm64/mm: Drop PXD_TABLE_BIT arm64/mm: Check pmd_table() in pmd_trans_huge() arm64/mm: Check PUD_TYPE_TABLE in pud_bad() arm64/mm: Check PXD_TYPE_TABLE in [p4d|pgd]_bad() arm64/mm: Clear PXX_TYPE_MASK and set PXD_TYPE_SECT in [pmd|pud]_mkhuge() arm64/mm: Clear PXX_TYPE_MASK in mk_[pmd|pud]_sect_prot() arm64/ptdump: Test PMD_TYPE_MASK for block mapping KVM: arm64: ptdump: Test PMD_TYPE_MASK for block mapping * for-next/spectre-bhb-assume-vulnerable: : Rework Spectre BHB mitigations to not assume "safe" arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists arm64: cputype: Add MIDR_CORTEX_A76AE arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list
2025-03-25Merge tag 'timers-cleanups-2025-03-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide hrtimer timer cleanup hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member" * tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) wifi: rt2x00: Switch to use hrtimer_update_function() io_uring: Use helper function hrtimer_update_function() serial: xilinx_uartps: Use helper function hrtimer_update_function() ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup() RDMA: Switch to use hrtimer_setup() virtio: mem: Switch to use hrtimer_setup() drm/vmwgfx: Switch to use hrtimer_setup() drm/xe/oa: Switch to use hrtimer_setup() drm/vkms: Switch to use hrtimer_setup() drm/msm: Switch to use hrtimer_setup() drm/i915/request: Switch to use hrtimer_setup() drm/i915/uncore: Switch to use hrtimer_setup() drm/i915/pmu: Switch to use hrtimer_setup() drm/i915/perf: Switch to use hrtimer_setup() drm/i915/gvt: Switch to use hrtimer_setup() drm/i915/huc: Switch to use hrtimer_setup() drm/amdgpu: Switch to use hrtimer_setup() stm class: heartbeat: Switch to use hrtimer_setup() i2c: Switch to use hrtimer_setup() iio: Switch to use hrtimer_setup() ...
2025-03-19Merge branch 'kvm-arm64/pmu-fixes' into kvmarm/nextOliver Upton
* kvm-arm64/pmu-fixes: : vPMU fixes for 6.15 courtesy of Akihiko Odaki : : Various fixes to KVM's vPMU implementation, notably ensuring : userspace-directed changes to the PMCs are reflected in the backing perf : events. KVM: arm64: PMU: Reload when resetting KVM: arm64: PMU: Reload when user modifies registers KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs KVM: arm64: PMU: Assume PMU presence in pmu-emul.c KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/pkvm-6.15' into kvmarm/nextOliver Upton
* kvm-arm64/pkvm-6.15: : pKVM updates for 6.15 : : - SecPageTable stats for stage-2 table pages allocated by the protected : hypervisor (Vincent Donnefort) : : - HCRX_EL2 trap + vCPU initialization fixes for pKVM (Fuad Tabba) KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu KVM: arm64: Factor out pKVM hyp vcpu creation to separate function KVM: arm64: Initialize HCRX_EL2 traps in pKVM KVM: arm64: Factor out setting HCRX_EL2 traps into separate function KVM: arm64: Count pKVM stage-2 usage in secondary pagetable stats KVM: arm64: Distinct pKVM teardown memcache for stage-2 KVM: arm64: Add flags to kvm_hyp_memcache Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/writable-midr' into kvmarm/nextOliver Upton
* kvm-arm64/writable-midr: : Writable implementation ID registers, courtesy of Sebastian Ott : : Introduce a new capability that allows userspace to set the : ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1, : and AIDR_EL1. Also plug a hole in KVM's trap configuration where : SMIDR_EL1 was readable at EL1, despite the fact that KVM does not : support SME. KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable KVM: arm64: Copy guest CTR_EL0 into hyp VM KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR KVM: arm64: Allow userspace to change the implementation ID registers KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value KVM: arm64: Maintain per-VM copy of implementation ID regs KVM: arm64: Set HCR_EL2.TID1 unconditionally Signed-off-by: Oliver Upton <oliver.upton@linux.dev>