Age | Commit message (Collapse) | Author |
|
* kvm-arm64/mte-frac:
: .
: Prevent FEAT_MTE_ASYNC from being accidently exposed to a guest,
: courtesy of Ben Horgan. From the cover letter:
:
: "The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM.
: However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0
: indicates that MTE_ASYNC is supported. On a host with
: ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the
: MTE capability enabled will incorrectly see MTE_ASYNC advertised as
: supported. This series fixes that."
: .
KVM: selftests: Confirm exposing MTE_frac does not break migration
KVM: arm64: Make MTE_frac masking conditional on MTE capability
arm64/sysreg: Expose MTE_frac so that it is visible to KVM
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/ubsan-el2:
: .
: Add UBSAN support to the EL2 portion of KVM, reusing most of the
: existing logic provided by CONFIG_IBSAN_TRAP.
:
: Patches courtesy of Mostafa Saleh.
: .
KVM: arm64: Handle UBSAN faults
KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
ubsan: Remove regs from report_ubsan_failure()
arm64: Introduce esr_is_ubsan_brk()
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pkvm-np-thp-6.16: (21 commits)
: .
: Large mapping support for non-protected pKVM guests, courtesy of
: Vincent Donnefort. From the cover letter:
:
: "This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM
: np-guests, that is installing PMD-level mappings in the stage-2,
: whenever the stage-1 is backed by either Hugetlbfs or THPs."
: .
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: Extend pKVM selftest for np-guests
KVM: arm64: Selftest for pKVM transitions
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
KVM: arm64: Add .hyp.data section
KVM: arm64: Unconditionally cross check hyp state
KVM: arm64: Defer EL2 stage-1 mapping on share
KVM: arm64: Move hyp state to hyp_vmemmap
KVM: arm64: Introduce {get,set}_host_state() helpers
KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE
KVM: arm64: Fix pKVM page-tracking comments
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
With the introduction of stage-2 huge mappings in the pKVM hypervisor,
guest pages CMO is needed for PMD_SIZE size. Fixmap only supports
PAGE_SIZE and iterating over the huge-page is time consuming (mostly due
to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency.
Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap)
to improve guest page CMOs when stage-2 huge mappings are installed.
On a Pixel6, the iterative solution resulted in a latency of ~700us,
while the PMD_SIZE fixmap reduces it to ~100us.
Because of the horrendous private range allocation that would be
necessary, this is disabled for 64KiB pages systems.
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Now np-guests hypercalls with range are supported, we can let the
hypervisor to install block mappings whenever the Stage-1 allows it,
that is when backed by either Hugetlbfs or THPs. The size of those block
mappings is limited to PMD_SIZE.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-10-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest, add a
nr_pages member for pkvm_mappings to allow EL1 to track the size of the
stage-2 mapping.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-9-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest, let's
convert pgt.pkvm_mappings to an interval tree.
No functional change intended.
Suggested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-8-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_test_clear_young_guest hypercall.
This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is
512 on a 4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-7-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_wrprotect_guest hypercall. This
range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512
on a 4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-6-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_unshare_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a
4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-5-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_share_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a
4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-4-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add a helper to iterate over the hypervisor vmemmap. This will be
particularly handy with the introduction of huge mapping support
for the np-guest stage-2.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-3-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
clean_dcache_guest_page() and invalidate_icache_guest_page() accept a
size as an argument. But they also rely on fixmap, which can only map a
single PAGE_SIZE page.
With the upcoming stage-2 huge mappings for pKVM np-guests, those
callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis
until the whole range is done.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-2-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pkvm-selftest-6.16:
: .
: pKVM selftests covering the memory ownership transitions by
: Quentin Perret. From the initial cover letter:
:
: "We have recently found a bug [1] in the pKVM memory ownership
: transitions by code inspection, but it could have been caught with a
: test.
:
: Introduce a boot-time selftest exercising all the known pKVM memory
: transitions and importantly checks the rejection of illegal transitions.
:
: The new test is hidden behind a new Kconfig option separate from
: CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
: transition checks ([1] doesn't reproduce with EL2 debug enabled).
:
: [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/"
: .
KVM: arm64: Extend pKVM selftest for np-guests
KVM: arm64: Selftest for pKVM transitions
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
KVM: arm64: Add .hyp.data section
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pkvm-6.16:
: .
: pKVM memory management cleanups, courtesy of Quentin Perret.
: From the cover letter:
:
: "This series moves the hypervisor's ownership state to the hyp_vmemmap,
: as discussed in [1]. The two main benefits are:
:
: 1. much cheaper hyp state lookups, since we can avoid the hyp stage-1
: page-table walk;
:
: 2. de-correlates the hyp state from the presence of a mapping in the
: linear map range of the hypervisor; which enables a bunch of
: clean-ups in the existing code and will simplify the introduction of
: other features in the future (hyp tracing, ...)"
: .
KVM: arm64: Unconditionally cross check hyp state
KVM: arm64: Defer EL2 stage-1 mapping on share
KVM: arm64: Move hyp state to hyp_vmemmap
KVM: arm64: Introduce {get,set}_host_state() helpers
KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE
KVM: arm64: Fix pKVM page-tracking comments
KVM: arm64: Track SVE state in the hypervisor vcpu structure
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When MTE is supported but MTE_ASYMM is not (ID_AA64PFR1_EL1.MTE == 2)
ID_AA64PFR1_EL1.MTE_frac == 0xF indicates MTE_ASYNC is unsupported
and MTE_frac == 0 indicates it is supported.
As MTE_frac was previously unconditionally read as 0 from the guest
and user-space, check that using SET_ONE_REG to set it to 0 succeeds
but does not change MTE_frac from unsupported (0xF) to supported (0).
This is required as values originating from KVM from user-space must
be accepted to avoid breaking migration.
Also, to allow this MTE field to be tested, enable KVM_ARM_CAP_MTE
for the set_id_regs test. No effect on existing tests is expected.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Link: https://lore.kernel.org/r/20250512114112.359087-4-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
If MTE_frac is masked out unconditionally then the guest will always
see ID_AA64PFR1_EL1_MTE_frac as 0. However, a value of 0 when
ID_AA64PFR1_EL1_MTE is 2 indicates that MTE_ASYNC is supported. Hence, for
a host with ID_AA64PFR1_EL1_MTE==2 and ID_AA64PFR1_EL1_MTE_frac==0xf
(MTE_ASYNC unsupported) the guest would see MTE_ASYNC advertised as
supported whilst the host does not support it. Hence, expose the sanitised
value of MTE_frac to the guest and user-space.
As MTE_frac was previously hidden, always 0, and KVM must accept values
from KVM provided by user-space, when ID_AA64PFR1_EL1.MTE is 2 allow
user-space to set ID_AA64PFR1_EL1.MTE_frac to 0. However, ignore it to
avoid incorrectly claiming hardware support for MTE_ASYNC in the guest.
Note that linux does not check the value of ID_AA64PFR1_EL1_MTE_frac and
wrongly assumes that MTE async faults can be generated even on hardware
that does nto support them. This issue is not addressed here.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Link: https://lore.kernel.org/r/20250512114112.359087-3-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
KVM exposes the sanitised ID registers to guests. Currently these ignore
the ID_AA64PFR1_EL1.MTE_frac field, meaning guests always see a value of
zero.
This is a problem for platforms without the MTE_ASYNC feature where
ID_AA64PFR1_EL1.MTE==0x2 and ID_AA64PFR1_EL1.MTE_frac==0xf. KVM forces
MTE_frac to zero, meaning the guest believes MTE_ASYNC is supported, when
no async fault will ever occur.
Before KVM can fix this, the architecture needs to sanitise the ID
register field for MTE_frac.
Linux itself does not use MTE_frac field and just assumes MTE async faults
can be generated if MTE is supported.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Link: https://lore.kernel.org/r/20250512114112.359087-2-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
As now UBSAN can be enabled, handle brk64 exits from UBSAN.
Re-use the decoding code from the kernel, and panic with
UBSAN message.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-5-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add a new Kconfig CONFIG_UBSAN_KVM_EL2 for KVM which enables
UBSAN for EL2 code (in protected/nvhe/hvhe) modes.
This will re-use the same checks enabled for the kernel for
the hypervisor. The only difference is that for EL2 it always
emits a "brk" instead of implementing hooks as the hypervisor
can't print reports.
The KVM code will re-use the same code for the kernel
"report_ubsan_failure()" so #ifdefs are changed to also have this
code for CONFIG_UBSAN_KVM_EL2
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-4-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
report_ubsan_failure() doesn't use argument regs, and soon it will
be called from the hypervisor context were regs are not available.
So, remove the unused argument.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Acked-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-3-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Soon, KVM is going to use this logic for hypervisor panics,
so add it in a wrapper that can be used by the hypervisor exit
handler to decode hyp panics.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-2-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The pKVM selftest intends to test as many memory 'transitions' as
possible, so extend it to cover sharing pages with non-protected guests,
including in the case of multi-sharing.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-5-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We have recently found a bug [1] in the pKVM memory ownership
transitions by code inspection, but it could have been caught with a
test.
Introduce a boot-time selftest exercising all the known pKVM memory
transitions and importantly checks the rejection of illegal transitions.
The new test is hidden behind a new Kconfig option separate from
CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
transition checks ([1] doesn't reproduce with EL2 debug enabled).
[1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-4-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We currently WARN() if the host attempts to share a page that is not in
an acceptable state with a guest. This isn't strictly necessary and
makes testing much harder, so drop the WARN and make sure to propage the
error code instead.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-3-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The hypervisor has not needed its own .data section because all globals
were either .rodata or .bss. To avoid having to initialize future
data-structures at run-time, let's introduce add a .data section to the
hypervisor.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/nv-pmu-fixes:
: .
: Fixes for NV PMU emulation. From the cover letter:
:
: "Joey reports that some of his PMU tests do not behave quite as
: expected:
:
: - MDCR_EL2.HPMN is set to 0 out of reset
:
: - PMCR_EL0.P should reset all the counters when written from EL2
:
: Oliver points out that setting PMCR_EL0.N from userspace by writing to
: the register is silly with NV, and that we need a new PMU attribute
: instead.
:
: On top of that, I figured out that we had a number of little gotchas:
:
: - It is possible for a guest to write an HPMN value that is out of
: bound, and it seems valuable to limit it
:
: - PMCR_EL0.N should be the maximum number of counters when read from
: EL2, and MDCR_EL2.HPMN when read from EL0/EL1
:
: - Prevent userspace from updating PMCR_EL0.N when EL2 is available"
: .
KVM: arm64: Let kvm_vcpu_read_pmcr() return an EL-dependent value for PMCR_EL0.N
KVM: arm64: Handle out-of-bound write to MDCR_EL2.HPMN
KVM: arm64: Don't let userspace write to PMCR_EL0.N when the vcpu has EL2
KVM: arm64: Allow userspace to limit the number of PMU counters for EL2 VMs
KVM: arm64: Contextualise the handling of PMCR_EL0.P writes
KVM: arm64: Fix MDCR_EL2.HPMN reset value
KVM: arm64: Repaint pmcr_n into nr_pmu_counters
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Now that the hypervisor's state is stored in the hyp_vmemmap, we no
longer need an expensive page-table walk to read it. This means we can
now afford to cross check the hyp-state during all memory ownership
transitions where the hyp is involved unconditionally, hence avoiding
problems such as [1].
[1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-8-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We currently blindly map into EL2 stage-1 *any* page passed to the
__pkvm_host_share_hyp() HVC. This is less than ideal from a security
perspective as it makes exploitation of potential hypervisor gadgets
easier than it should be. But interestingly, pKVM should never need to
access SHARED_BORROWED pages that it hasn't previously pinned, so there
is no need to map the page before that.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-7-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Tracking the hypervisor's ownership state into struct hyp_page has
several benefits, including allowing far more efficient lookups (no
page-table walk needed) and de-corelating the state from the presence
of a mapping. This will later allow to map pages into EL2 stage-1 less
proactively which is generally a good thing for security. And in the
future this will help with tracking the state of pages mapped into the
hypervisor's private range without requiring an alias into the 'linear
map' range.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-6-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Instead of directly accessing the host_state member in struct hyp_page,
introduce static inline accessors to do it. The future hyp_state member
will follow the same pattern as it will need some logic in the accessors.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-5-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The page ownership state encoded as 0b11 is currently considered
reserved for future use, and PKVM_NOPAGE uses bit 2. In order to
simplify the relocation of the hyp ownership state into the
vmemmap in later patches, let's use the 'reserved' encoding for
the PKVM_NOPAGE state. The struct hyp_page layout isn't guaranteed
stable at all, so there is no real reason to have 'reserved' encodings.
No functional changes intended.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-4-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Most of the comments relating to pKVM page-tracking in nvhe/memory.h are
now either slightly outdated or outright wrong. Fix the comments.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-3-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When dealing with a guest with SVE enabled, make sure the host SVE
state is pinned at EL2 S1, and that the hypervisor vCPU state is
correctly initialised (and then unpinned on teardown).
Co-authored-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- When releasing a start-aligned resource, e.g., a bridge window, save
start/end/flags for the next assignment attempt; fixes a v6.15-rc1
regression (Ilpo Järvinen)
- Move set_pcie_speed.sh from TEST_PROGS to TEST_FILE; fixes a bwctrl
selftest v6.15-rc1 regression (Ilpo Järvinen)
- Add Manivannan Sadhasivam as maintainer of native host bridge and
endpoint drivers (Manivannan Sadhasivam)
- In endpoint test driver, defer IRQ allocation from .probe() until
ioctl() to fix a regression on platforms where the Vendor/Device ID
match doesn't include driver_data (Niklas Cassel)
* tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
misc: pci_endpoint_test: Defer IRQ allocation until ioctl(PCITEST_SET_IRQTYPE)
MAINTAINERS: Move Manivannan Sadhasivam as PCI Native host bridge and endpoint maintainer
selftests/pcie_bwctrl: Fix test progs list
PCI: Restore assigned resources fully after release
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Revert a v6.15 patch due to a report of SELinux test failures
* tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "sunrpc: clean cache_detail immediately when flush is written frequently"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix 32-bit kernel boot crash if passed physical memory with more than
32 address bits
- Fix Xen PV crash
- Work around build bug in certain limited build environments
- Fix CTEST instruction decoding in insn_decoder_test
* tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/insn: Fix CTEST instruction decoding
x86/boot: Work around broken busybox 'truncate' tool
x86/mm: Fix _pgd_alloc() for Xen PV mode
x86/e820: Discard high memory that can't be addressed by 32-bit systems
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix sporadic crashes in dequeue_entities() due to ... bad math.
[ Arguably if pick_eevdf()/pick_next_entity() was less trusting of
complex math being correct it could have de-escalated a crash into
a warning, but that's for a different patch ]"
* tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc perf events fixes from Ingo Molnar:
- Use POLLERR for events in error state, instead of the ambiguous
POLLHUP error value
- Fix non-sampling (counting) events on certain x86 platforms
* tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix non-sampling (counting) events on certain x86 platforms
perf/core: Change to POLLERR for pinned events with error
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"Fix crashes in the gic-v2m irqchip driver, caused by an incorrect
__init annotation"
* tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Add a missing Kconfig option, fix some bugs in exception handlers,
memory management and KVM"
* tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally
LoongArch: KVM: Fully clear some CSRs when VM reboot
LoongArch: KVM: Fix multiple typos of KVM code
LoongArch: Return NULL from huge_pte_offset() for invalid PMD
LoongArch: Remove a bogus reference to ZONE_DMA
LoongArch: Handle fp, lsx, lasx and lbt assembly symbols
LoongArch: Make do_xyz() exception handlers more robust
LoongArch: Make regs_irqs_disabled() more clear
LoongArch: Select ARCH_USE_MEMTEST
|
|
Pull OpenRISC updates from Stafford Horne:
- Support for cacheinfo API to expose OpenRISC cache info via sysfs,
this also translated to some cleanups to OpenRISC cache flush and
invalidate API's
- Documentation updates for new mailing list and toolchain binaries
* tag 'for-linus' of https://github.com/openrisc/linux:
Documentation: openrisc: Update toolchain binaries URL
Documentation: openrisc: Update mailing list
openrisc: Add cacheinfo support
openrisc: Introduce new utility functions to flush and invalidate caches
openrisc: Refactor struct cpuinfo_or1k to reduce duplication
|
|
Ondrej reports that certain SELinux tests are failing after commit
fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is
written frequently"), merged during the v6.15 merge window.
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Fixes: fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is written frequently")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kunit fix from Kees Cook:
"A single fix for the kunit lib/tests/ relocation:
- Ensure prime numbers tests are included in KUnit test runs (Mark Brown)"
* tag 'move-lib-kunit-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib: Ensure prime numbers tests are included in KUnit test runs
|
|
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, mostly amdgpu, with some exynos cleanups and a
couple of minor fixes, seems a bit quiet, but probably some lag from
Easter holidays.
amdgpu:
- P2P DMA fixes
- Display reset fixes
- DCN 3.5 fixes
- ACPI EDID fix
- LTTPR fix
- mode_valid() fix
exynos:
- fix spelling error
- remove redundant error handling in exynos_drm_vidi.c module
- marks struct decon_data as const in the exynos7_drm_decon driver
since it is only read
- Remove unnecessary checking in exynos_drm_drv.c module
meson:
- Fix VCLK calculation
panel:
- jd9365a: Fix reset polarity"
* tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel:
drm/exynos: Fix spelling mistake "enqueu" -> "enqueue"
drm/exynos: exynos7_drm_decon: Consstify struct decon_data
drm/exynos: fixed a spelling error
drm/exynos/vidi: Remove redundant error handling in vidi_get_modes()
drm/exynos: Remove unnecessary checking
drm/amd/display: do not copy invalid CRTC timing info
drm/amd/display: Default IPS to RCG_IN_ACTIVE_IPS2_IN_OFF
drm/amd/display: Use 16ms AUX read interval for LTTPR with old sinks
drm/amd/display: Fix ACPI edid parsing on some Lenovo systems
drm/amdgpu: Allow P2P access through XGMI
drm/amd/display: Enable urgent latency adjustment on DCN35
drm/amd/display: Force full update in gpu reset
drm/amd/display: Fix gpu reset in multidisplay config
drm/amdgpu: Don't pin VRAM without DMABUF_MOVE_NOTIFY
drm/amdgpu: Use allowed_domains for pinning dmabufs
drm: panel: jd9365da: fix reset signal polarity in unprepare
drm/meson: use unsigned long long / Hz for frequency types
Revert "drm/meson: vclk: fix calculation of 59.94 fractional rates"
|
|
There is a code path in dequeue_entities() that can set the slice of a
sched_entity to U64_MAX, which sometimes results in a crash.
The offending case is when dequeue_entities() is called to dequeue a
delayed group entity, and then the entity's parent's dequeue is delayed.
In that case:
1. In the if (entity_is_task(se)) else block at the beginning of
dequeue_entities(), slice is set to
cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then
it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX.
2. The first for_each_sched_entity() loop dequeues the entity.
3. If the entity was its parent's only child, then the next iteration
tries to dequeue the parent.
4. If the parent's dequeue needs to be delayed, then it breaks from the
first for_each_sched_entity() loop _without updating slice_.
5. The second for_each_sched_entity() loop sets the parent's ->slice to
the saved slice, which is still U64_MAX.
This throws off subsequent calculations with potentially catastrophic
results. A manifestation we saw in production was:
6. In update_entity_lag(), se->slice is used to calculate limit, which
ends up as a huge negative number.
7. limit is used in se->vlag = clamp(vlag, -limit, limit). Because limit
is negative, vlag > limit, so se->vlag is set to the same huge
negative number.
8. In place_entity(), se->vlag is scaled, which overflows and results in
another huge (positive or negative) number.
9. The adjusted lag is subtracted from se->vruntime, which increases or
decreases se->vruntime by a huge number.
10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which
incorrectly returns false because the vruntime is so far from the
other vruntimes on the queue, causing the
(vruntime - cfs_rq->min_vruntime) * load calulation to overflow.
11. Nothing appears to be eligible, so pick_eevdf() returns NULL.
12. pick_next_entity() tries to dereference the return value of
pick_eevdf() and crashes.
Dumping the cfs_rq states from the core dumps with drgn showed tell-tale
huge vruntime ranges and bogus vlag values, and I also traced se->slice
being set to U64_MAX on live systems (which was usually "benign" since
the rest of the runqueue needed to be in a particular state to crash).
Fix it in dequeue_entities() by always setting slice from the first
non-empty cfs_rq.
Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/f0c2d1072be229e1bdddc73c0703919a8b00c652.1745570998.git.osandov@fb.com
|
|
With ACPI in place, gicv2m_get_fwnode() is registered with the pci
subsystem as pci_msi_get_fwnode_cb(), which may get invoked at runtime
during a PCI host bridge probe. But, the call back is wrongly marked as
__init, causing it to be freed, while being registered with the PCI
subsystem and could trigger:
Unable to handle kernel paging request at virtual address ffff8000816c0400
gicv2m_get_fwnode+0x0/0x58 (P)
pci_set_bus_msi_domain+0x74/0x88
pci_register_host_bridge+0x194/0x548
This is easily reproducible on a Juno board with ACPI boot.
Retain the function for later use.
Fixes: 0644b3daca28 ("irqchip/gic-v2m: acpi: Introducing GICv2m ACPI support")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
|
|
In function kvm_pre_enter_guest(), it prepares to enter guest and check
whether there are pending signals or events. And it will not enter guest
if there are, PMU pass-through preparation for guest should be cancelled
and host should own PMU hardware.
Cc: stable@vger.kernel.org
Fixes: f4e40ea9f78f ("LoongArch: KVM: Add PMU support for guest")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Some registers such as LOONGARCH_CSR_ESTAT and LOONGARCH_CSR_GINTC are
partly cleared with function _kvm_setcsr(). This comes from the hardware
specification, some bits are read only in VM mode, and however they can
be written in host mode. So they are partly cleared in VM mode, and can
be fully cleared in host mode.
These read only bits show pending interrupt or exception status. When VM
reset, the read-only bits should be cleared, otherwise vCPU will receive
unknown interrupts in boot stage.
Here registers LOONGARCH_CSR_ESTAT/LOONGARCH_CSR_GINTC are fully cleared
in ioctl KVM_REG_LOONGARCH_VCPU_RESET vCPU reset path.
Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|