summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-23Merge branch kvm-arm64/mte-frac into kvmarm-master/nextMarc Zyngier
* kvm-arm64/mte-frac: : . : Prevent FEAT_MTE_ASYNC from being accidently exposed to a guest, : courtesy of Ben Horgan. From the cover letter: : : "The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM. : However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0 : indicates that MTE_ASYNC is supported. On a host with : ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the : MTE capability enabled will incorrectly see MTE_ASYNC advertised as : supported. This series fixes that." : . KVM: selftests: Confirm exposing MTE_frac does not break migration KVM: arm64: Make MTE_frac masking conditional on MTE capability arm64/sysreg: Expose MTE_frac so that it is visible to KVM Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/ubsan-el2: : . : Add UBSAN support to the EL2 portion of KVM, reusing most of the : existing logic provided by CONFIG_IBSAN_TRAP. : : Patches courtesy of Mostafa Saleh. : . KVM: arm64: Handle UBSAN faults KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2 ubsan: Remove regs from report_ubsan_failure() arm64: Introduce esr_is_ubsan_brk() Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/pkvm-np-thp-6.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm-np-thp-6.16: (21 commits) : . : Large mapping support for non-protected pKVM guests, courtesy of : Vincent Donnefort. From the cover letter: : : "This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM : np-guests, that is installing PMD-level mappings in the stage-2, : whenever the stage-1 is backed by either Hugetlbfs or THPs." : . KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: Extend pKVM selftest for np-guests KVM: arm64: Selftest for pKVM transitions KVM: arm64: Don't WARN from __pkvm_host_share_guest() KVM: arm64: Add .hyp.data section KVM: arm64: Unconditionally cross check hyp state KVM: arm64: Defer EL2 stage-1 mapping on share KVM: arm64: Move hyp state to hyp_vmemmap KVM: arm64: Introduce {get,set}_host_state() helpers KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE KVM: arm64: Fix pKVM page-tracking comments ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: np-guest CMOs with PMD_SIZE fixmapVincent Donnefort
With the introduction of stage-2 huge mappings in the pKVM hypervisor, guest pages CMO is needed for PMD_SIZE size. Fixmap only supports PAGE_SIZE and iterating over the huge-page is time consuming (mostly due to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency. Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap) to improve guest page CMOs when stage-2 huge mappings are installed. On a Pixel6, the iterative solution resulted in a latency of ~700us, while the PMD_SIZE fixmap reduces it to ~100us. Because of the horrendous private range allocation that would be necessary, this is disabled for 64KiB pages systems. Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Stage-2 huge mappings for np-guestsVincent Donnefort
Now np-guests hypercalls with range are supported, we can let the hypervisor to install block mappings whenever the Stage-1 allows it, that is when backed by either Hugetlbfs or THPs. The size of those block mappings is limited to PMD_SIZE. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-10-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to pkvm_mappingsQuentin Perret
In preparation for supporting stage-2 huge mappings for np-guest, add a nr_pages member for pkvm_mappings to allow EL1 to track the size of the stage-2 mapping. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-9-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Convert pkvm_mappings to interval treeQuentin Perret
In preparation for supporting stage-2 huge mappings for np-guest, let's convert pgt.pkvm_mappings to an interval tree. No functional change intended. Suggested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-8-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_test_clear_young_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-7-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_wrprotect_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-6-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to __pkvm_host_unshare_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_unshare_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-5-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to __pkvm_host_share_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_share_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-4-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Introduce for_each_hyp_pageVincent Donnefort
Add a helper to iterate over the hypervisor vmemmap. This will be particularly handy with the introduction of huge mapping support for the np-guest stage-2. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-3-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Handle huge mappings for np-guest CMOsVincent Donnefort
clean_dcache_guest_page() and invalidate_icache_guest_page() accept a size as an argument. But they also rely on fixmap, which can only map a single PAGE_SIZE page. With the upcoming stage-2 huge mappings for pKVM np-guests, those callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis until the whole range is done. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-2-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21Merge branch kvm-arm64/pkvm-selftest-6.16 into kvm-arm64/pkvm-np-thp-6.16Marc Zyngier
* kvm-arm64/pkvm-selftest-6.16: : . : pKVM selftests covering the memory ownership transitions by : Quentin Perret. From the initial cover letter: : : "We have recently found a bug [1] in the pKVM memory ownership : transitions by code inspection, but it could have been caught with a : test. : : Introduce a boot-time selftest exercising all the known pKVM memory : transitions and importantly checks the rejection of illegal transitions. : : The new test is hidden behind a new Kconfig option separate from : CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the : transition checks ([1] doesn't reproduce with EL2 debug enabled). : : [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/" : . KVM: arm64: Extend pKVM selftest for np-guests KVM: arm64: Selftest for pKVM transitions KVM: arm64: Don't WARN from __pkvm_host_share_guest() KVM: arm64: Add .hyp.data section Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21Merge branch kvm-arm64/pkvm-6.16 into kvm-arm64/pkvm-np-thp-6.16Marc Zyngier
* kvm-arm64/pkvm-6.16: : . : pKVM memory management cleanups, courtesy of Quentin Perret. : From the cover letter: : : "This series moves the hypervisor's ownership state to the hyp_vmemmap, : as discussed in [1]. The two main benefits are: : : 1. much cheaper hyp state lookups, since we can avoid the hyp stage-1 : page-table walk; : : 2. de-correlates the hyp state from the presence of a mapping in the : linear map range of the hypervisor; which enables a bunch of : clean-ups in the existing code and will simplify the introduction of : other features in the future (hyp tracing, ...)" : . KVM: arm64: Unconditionally cross check hyp state KVM: arm64: Defer EL2 stage-1 mapping on share KVM: arm64: Move hyp state to hyp_vmemmap KVM: arm64: Introduce {get,set}_host_state() helpers KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE KVM: arm64: Fix pKVM page-tracking comments KVM: arm64: Track SVE state in the hypervisor vcpu structure Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-16KVM: selftests: Confirm exposing MTE_frac does not break migrationBen Horgan
When MTE is supported but MTE_ASYMM is not (ID_AA64PFR1_EL1.MTE == 2) ID_AA64PFR1_EL1.MTE_frac == 0xF indicates MTE_ASYNC is unsupported and MTE_frac == 0 indicates it is supported. As MTE_frac was previously unconditionally read as 0 from the guest and user-space, check that using SET_ONE_REG to set it to 0 succeeds but does not change MTE_frac from unsupported (0xF) to supported (0). This is required as values originating from KVM from user-space must be accepted to avoid breaking migration. Also, to allow this MTE field to be tested, enable KVM_ARM_CAP_MTE for the set_id_regs test. No effect on existing tests is expected. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Link: https://lore.kernel.org/r/20250512114112.359087-4-ben.horgan@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-16KVM: arm64: Make MTE_frac masking conditional on MTE capabilityBen Horgan
If MTE_frac is masked out unconditionally then the guest will always see ID_AA64PFR1_EL1_MTE_frac as 0. However, a value of 0 when ID_AA64PFR1_EL1_MTE is 2 indicates that MTE_ASYNC is supported. Hence, for a host with ID_AA64PFR1_EL1_MTE==2 and ID_AA64PFR1_EL1_MTE_frac==0xf (MTE_ASYNC unsupported) the guest would see MTE_ASYNC advertised as supported whilst the host does not support it. Hence, expose the sanitised value of MTE_frac to the guest and user-space. As MTE_frac was previously hidden, always 0, and KVM must accept values from KVM provided by user-space, when ID_AA64PFR1_EL1.MTE is 2 allow user-space to set ID_AA64PFR1_EL1.MTE_frac to 0. However, ignore it to avoid incorrectly claiming hardware support for MTE_ASYNC in the guest. Note that linux does not check the value of ID_AA64PFR1_EL1_MTE_frac and wrongly assumes that MTE async faults can be generated even on hardware that does nto support them. This issue is not addressed here. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Link: https://lore.kernel.org/r/20250512114112.359087-3-ben.horgan@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-16arm64/sysreg: Expose MTE_frac so that it is visible to KVMBen Horgan
KVM exposes the sanitised ID registers to guests. Currently these ignore the ID_AA64PFR1_EL1.MTE_frac field, meaning guests always see a value of zero. This is a problem for platforms without the MTE_ASYNC feature where ID_AA64PFR1_EL1.MTE==0x2 and ID_AA64PFR1_EL1.MTE_frac==0xf. KVM forces MTE_frac to zero, meaning the guest believes MTE_ASYNC is supported, when no async fault will ever occur. Before KVM can fix this, the architecture needs to sanitise the ID register field for MTE_frac. Linux itself does not use MTE_frac field and just assumes MTE async faults can be generated if MTE is supported. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Link: https://lore.kernel.org/r/20250512114112.359087-2-ben.horgan@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-07KVM: arm64: Handle UBSAN faultsMostafa Saleh
As now UBSAN can be enabled, handle brk64 exits from UBSAN. Re-use the decoding code from the kernel, and panic with UBSAN message. Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250430162713.1997569-5-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-07KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2Mostafa Saleh
Add a new Kconfig CONFIG_UBSAN_KVM_EL2 for KVM which enables UBSAN for EL2 code (in protected/nvhe/hvhe) modes. This will re-use the same checks enabled for the kernel for the hypervisor. The only difference is that for EL2 it always emits a "brk" instead of implementing hooks as the hypervisor can't print reports. The KVM code will re-use the same code for the kernel "report_ubsan_failure()" so #ifdefs are changed to also have this code for CONFIG_UBSAN_KVM_EL2 Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250430162713.1997569-4-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-07ubsan: Remove regs from report_ubsan_failure()Mostafa Saleh
report_ubsan_failure() doesn't use argument regs, and soon it will be called from the hypervisor context were regs are not available. So, remove the unused argument. Signed-off-by: Mostafa Saleh <smostafa@google.com> Acked-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250430162713.1997569-3-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-07arm64: Introduce esr_is_ubsan_brk()Mostafa Saleh
Soon, KVM is going to use this logic for hypervisor panics, so add it in a wrapper that can be used by the hypervisor exit handler to decode hyp panics. Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250430162713.1997569-2-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Extend pKVM selftest for np-guestsQuentin Perret
The pKVM selftest intends to test as many memory 'transitions' as possible, so extend it to cover sharing pages with non-protected guests, including in the case of multi-sharing. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-5-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Selftest for pKVM transitionsQuentin Perret
We have recently found a bug [1] in the pKVM memory ownership transitions by code inspection, but it could have been caught with a test. Introduce a boot-time selftest exercising all the known pKVM memory transitions and importantly checks the rejection of illegal transitions. The new test is hidden behind a new Kconfig option separate from CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the transition checks ([1] doesn't reproduce with EL2 debug enabled). [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/ Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-4-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Don't WARN from __pkvm_host_share_guest()Quentin Perret
We currently WARN() if the host attempts to share a page that is not in an acceptable state with a guest. This isn't strictly necessary and makes testing much harder, so drop the WARN and make sure to propage the error code instead. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06KVM: arm64: Add .hyp.data sectionDavid Brazdil
The hypervisor has not needed its own .data section because all globals were either .rodata or .bss. To avoid having to initialize future data-structures at run-time, let's introduce add a .data section to the hypervisor. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-29Merge branch kvm-arm64/nv-pmu-fixes into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-pmu-fixes: : . : Fixes for NV PMU emulation. From the cover letter: : : "Joey reports that some of his PMU tests do not behave quite as : expected: : : - MDCR_EL2.HPMN is set to 0 out of reset : : - PMCR_EL0.P should reset all the counters when written from EL2 : : Oliver points out that setting PMCR_EL0.N from userspace by writing to : the register is silly with NV, and that we need a new PMU attribute : instead. : : On top of that, I figured out that we had a number of little gotchas: : : - It is possible for a guest to write an HPMN value that is out of : bound, and it seems valuable to limit it : : - PMCR_EL0.N should be the maximum number of counters when read from : EL2, and MDCR_EL2.HPMN when read from EL0/EL1 : : - Prevent userspace from updating PMCR_EL0.N when EL2 is available" : . KVM: arm64: Let kvm_vcpu_read_pmcr() return an EL-dependent value for PMCR_EL0.N KVM: arm64: Handle out-of-bound write to MDCR_EL2.HPMN KVM: arm64: Don't let userspace write to PMCR_EL0.N when the vcpu has EL2 KVM: arm64: Allow userspace to limit the number of PMU counters for EL2 VMs KVM: arm64: Contextualise the handling of PMCR_EL0.P writes KVM: arm64: Fix MDCR_EL2.HPMN reset value KVM: arm64: Repaint pmcr_n into nr_pmu_counters Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Unconditionally cross check hyp stateQuentin Perret
Now that the hypervisor's state is stored in the hyp_vmemmap, we no longer need an expensive page-table walk to read it. This means we can now afford to cross check the hyp-state during all memory ownership transitions where the hyp is involved unconditionally, hence avoiding problems such as [1]. [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/ Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-8-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Defer EL2 stage-1 mapping on shareQuentin Perret
We currently blindly map into EL2 stage-1 *any* page passed to the __pkvm_host_share_hyp() HVC. This is less than ideal from a security perspective as it makes exploitation of potential hypervisor gadgets easier than it should be. But interestingly, pKVM should never need to access SHARED_BORROWED pages that it hasn't previously pinned, so there is no need to map the page before that. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-7-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Move hyp state to hyp_vmemmapQuentin Perret
Tracking the hypervisor's ownership state into struct hyp_page has several benefits, including allowing far more efficient lookups (no page-table walk needed) and de-corelating the state from the presence of a mapping. This will later allow to map pages into EL2 stage-1 less proactively which is generally a good thing for security. And in the future this will help with tracking the state of pages mapped into the hypervisor's private range without requiring an alias into the 'linear map' range. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-6-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Introduce {get,set}_host_state() helpersQuentin Perret
Instead of directly accessing the host_state member in struct hyp_page, introduce static inline accessors to do it. The future hyp_state member will follow the same pattern as it will need some logic in the accessors. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-5-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Use 0b11 for encoding PKVM_NOPAGEQuentin Perret
The page ownership state encoded as 0b11 is currently considered reserved for future use, and PKVM_NOPAGE uses bit 2. In order to simplify the relocation of the hyp ownership state into the vmemmap in later patches, let's use the 'reserved' encoding for the PKVM_NOPAGE state. The struct hyp_page layout isn't guaranteed stable at all, so there is no real reason to have 'reserved' encodings. No functional changes intended. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-4-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Fix pKVM page-tracking commentsQuentin Perret
Most of the comments relating to pKVM page-tracking in nvhe/memory.h are now either slightly outdated or outright wrong. Fix the comments. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28KVM: arm64: Track SVE state in the hypervisor vcpu structureFuad Tabba
When dealing with a guest with SVE enabled, make sure the host SVE state is pinned at EL2 S1, and that the hypervisor vCPU state is correctly initialised (and then unpinned on teardown). Co-authored-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-27Linux 6.15-rc4v6.15-rc4Linus Torvalds
2025-04-26Merge tag 'pci-v6.15-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI fixes from Bjorn Helgaas: - When releasing a start-aligned resource, e.g., a bridge window, save start/end/flags for the next assignment attempt; fixes a v6.15-rc1 regression (Ilpo Järvinen) - Move set_pcie_speed.sh from TEST_PROGS to TEST_FILE; fixes a bwctrl selftest v6.15-rc1 regression (Ilpo Järvinen) - Add Manivannan Sadhasivam as maintainer of native host bridge and endpoint drivers (Manivannan Sadhasivam) - In endpoint test driver, defer IRQ allocation from .probe() until ioctl() to fix a regression on platforms where the Vendor/Device ID match doesn't include driver_data (Niklas Cassel) * tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: misc: pci_endpoint_test: Defer IRQ allocation until ioctl(PCITEST_SET_IRQTYPE) MAINTAINERS: Move Manivannan Sadhasivam as PCI Native host bridge and endpoint maintainer selftests/pcie_bwctrl: Fix test progs list PCI: Restore assigned resources fully after release
2025-04-26Merge tag 'nfsd-6.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Revert a v6.15 patch due to a report of SELinux test failures * tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "sunrpc: clean cache_detail immediately when flush is written frequently"
2025-04-26Merge tag 'x86-urgent-2025-04-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix 32-bit kernel boot crash if passed physical memory with more than 32 address bits - Fix Xen PV crash - Work around build bug in certain limited build environments - Fix CTEST instruction decoding in insn_decoder_test * tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn: Fix CTEST instruction decoding x86/boot: Work around broken busybox 'truncate' tool x86/mm: Fix _pgd_alloc() for Xen PV mode x86/e820: Discard high memory that can't be addressed by 32-bit systems
2025-04-26Merge tag 'sched-urgent-2025-04-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix sporadic crashes in dequeue_entities() due to ... bad math. [ Arguably if pick_eevdf()/pick_next_entity() was less trusting of complex math being correct it could have de-escalated a crash into a warning, but that's for a different patch ]" * tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash
2025-04-26Merge tag 'perf-urgent-2025-04-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc perf events fixes from Ingo Molnar: - Use POLLERR for events in error state, instead of the ambiguous POLLHUP error value - Fix non-sampling (counting) events on certain x86 platforms * tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix non-sampling (counting) events on certain x86 platforms perf/core: Change to POLLERR for pinned events with error
2025-04-26Merge tag 'irq-urgent-2025-04-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "Fix crashes in the gic-v2m irqchip driver, caused by an incorrect __init annotation" * tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()
2025-04-26Merge tag 'loongarch-fixes-6.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Add a missing Kconfig option, fix some bugs in exception handlers, memory management and KVM" * tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally LoongArch: KVM: Fully clear some CSRs when VM reboot LoongArch: KVM: Fix multiple typos of KVM code LoongArch: Return NULL from huge_pte_offset() for invalid PMD LoongArch: Remove a bogus reference to ZONE_DMA LoongArch: Handle fp, lsx, lasx and lbt assembly symbols LoongArch: Make do_xyz() exception handlers more robust LoongArch: Make regs_irqs_disabled() more clear LoongArch: Select ARCH_USE_MEMTEST
2025-04-26Merge tag 'for-linus' of https://github.com/openrisc/linuxLinus Torvalds
Pull OpenRISC updates from Stafford Horne: - Support for cacheinfo API to expose OpenRISC cache info via sysfs, this also translated to some cleanups to OpenRISC cache flush and invalidate API's - Documentation updates for new mailing list and toolchain binaries * tag 'for-linus' of https://github.com/openrisc/linux: Documentation: openrisc: Update toolchain binaries URL Documentation: openrisc: Update mailing list openrisc: Add cacheinfo support openrisc: Introduce new utility functions to flush and invalidate caches openrisc: Refactor struct cpuinfo_or1k to reduce duplication
2025-04-26Revert "sunrpc: clean cache_detail immediately when flush is written frequently"Chuck Lever
Ondrej reports that certain SELinux tests are failing after commit fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is written frequently"), merged during the v6.15 merge window. Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Fixes: fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is written frequently") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-04-26Merge tag 'move-lib-kunit-v6.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kunit fix from Kees Cook: "A single fix for the kunit lib/tests/ relocation: - Ensure prime numbers tests are included in KUnit test runs (Mark Brown)" * tag 'move-lib-kunit-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lib: Ensure prime numbers tests are included in KUnit test runs
2025-04-26Merge tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly drm fixes, mostly amdgpu, with some exynos cleanups and a couple of minor fixes, seems a bit quiet, but probably some lag from Easter holidays. amdgpu: - P2P DMA fixes - Display reset fixes - DCN 3.5 fixes - ACPI EDID fix - LTTPR fix - mode_valid() fix exynos: - fix spelling error - remove redundant error handling in exynos_drm_vidi.c module - marks struct decon_data as const in the exynos7_drm_decon driver since it is only read - Remove unnecessary checking in exynos_drm_drv.c module meson: - Fix VCLK calculation panel: - jd9365a: Fix reset polarity" * tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel: drm/exynos: Fix spelling mistake "enqueu" -> "enqueue" drm/exynos: exynos7_drm_decon: Consstify struct decon_data drm/exynos: fixed a spelling error drm/exynos/vidi: Remove redundant error handling in vidi_get_modes() drm/exynos: Remove unnecessary checking drm/amd/display: do not copy invalid CRTC timing info drm/amd/display: Default IPS to RCG_IN_ACTIVE_IPS2_IN_OFF drm/amd/display: Use 16ms AUX read interval for LTTPR with old sinks drm/amd/display: Fix ACPI edid parsing on some Lenovo systems drm/amdgpu: Allow P2P access through XGMI drm/amd/display: Enable urgent latency adjustment on DCN35 drm/amd/display: Force full update in gpu reset drm/amd/display: Fix gpu reset in multidisplay config drm/amdgpu: Don't pin VRAM without DMABUF_MOVE_NOTIFY drm/amdgpu: Use allowed_domains for pinning dmabufs drm: panel: jd9365da: fix reset signal polarity in unprepare drm/meson: use unsigned long long / Hz for frequency types Revert "drm/meson: vclk: fix calculation of 59.94 fractional rates"
2025-04-26sched/eevdf: Fix se->slice being set to U64_MAX and resulting crashOmar Sandoval
There is a code path in dequeue_entities() that can set the slice of a sched_entity to U64_MAX, which sometimes results in a crash. The offending case is when dequeue_entities() is called to dequeue a delayed group entity, and then the entity's parent's dequeue is delayed. In that case: 1. In the if (entity_is_task(se)) else block at the beginning of dequeue_entities(), slice is set to cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX. 2. The first for_each_sched_entity() loop dequeues the entity. 3. If the entity was its parent's only child, then the next iteration tries to dequeue the parent. 4. If the parent's dequeue needs to be delayed, then it breaks from the first for_each_sched_entity() loop _without updating slice_. 5. The second for_each_sched_entity() loop sets the parent's ->slice to the saved slice, which is still U64_MAX. This throws off subsequent calculations with potentially catastrophic results. A manifestation we saw in production was: 6. In update_entity_lag(), se->slice is used to calculate limit, which ends up as a huge negative number. 7. limit is used in se->vlag = clamp(vlag, -limit, limit). Because limit is negative, vlag > limit, so se->vlag is set to the same huge negative number. 8. In place_entity(), se->vlag is scaled, which overflows and results in another huge (positive or negative) number. 9. The adjusted lag is subtracted from se->vruntime, which increases or decreases se->vruntime by a huge number. 10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which incorrectly returns false because the vruntime is so far from the other vruntimes on the queue, causing the (vruntime - cfs_rq->min_vruntime) * load calulation to overflow. 11. Nothing appears to be eligible, so pick_eevdf() returns NULL. 12. pick_next_entity() tries to dereference the return value of pick_eevdf() and crashes. Dumping the cfs_rq states from the core dumps with drgn showed tell-tale huge vruntime ranges and bogus vlag values, and I also traced se->slice being set to U64_MAX on live systems (which was usually "benign" since the rest of the runqueue needed to be in a particular state to crash). Fix it in dequeue_entities() by always setting slice from the first non-empty cfs_rq. Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/f0c2d1072be229e1bdddc73c0703919a8b00c652.1745570998.git.osandov@fb.com
2025-04-26irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()Suzuki K Poulose
With ACPI in place, gicv2m_get_fwnode() is registered with the pci subsystem as pci_msi_get_fwnode_cb(), which may get invoked at runtime during a PCI host bridge probe. But, the call back is wrongly marked as __init, causing it to be freed, while being registered with the PCI subsystem and could trigger: Unable to handle kernel paging request at virtual address ffff8000816c0400 gicv2m_get_fwnode+0x0/0x58 (P) pci_set_bus_msi_domain+0x74/0x88 pci_register_host_bridge+0x194/0x548 This is easily reproducible on a Juno board with ACPI boot. Retain the function for later use. Fixes: 0644b3daca28 ("irqchip/gic-v2m: acpi: Introducing GICv2m ACPI support") Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
2025-04-26LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finallyBibo Mao
In function kvm_pre_enter_guest(), it prepares to enter guest and check whether there are pending signals or events. And it will not enter guest if there are, PMU pass-through preparation for guest should be cancelled and host should own PMU hardware. Cc: stable@vger.kernel.org Fixes: f4e40ea9f78f ("LoongArch: KVM: Add PMU support for guest") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: KVM: Fully clear some CSRs when VM rebootBibo Mao
Some registers such as LOONGARCH_CSR_ESTAT and LOONGARCH_CSR_GINTC are partly cleared with function _kvm_setcsr(). This comes from the hardware specification, some bits are read only in VM mode, and however they can be written in host mode. So they are partly cleared in VM mode, and can be fully cleared in host mode. These read only bits show pending interrupt or exception status. When VM reset, the read-only bits should be cleared, otherwise vCPU will receive unknown interrupts in boot stage. Here registers LOONGARCH_CSR_ESTAT/LOONGARCH_CSR_GINTC are fully cleared in ioctl KVM_REG_LOONGARCH_VCPU_RESET vCPU reset path. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>