HCRX_EL2.TMEA further modifies the external abort behavior where
unmasked aborts are taken to EL1 and masked aborts are taken to EL2.
It's rather weird when you consider that SEAs are, well, *synchronous*
and therefore not actually maskable. However, for the purposes of
exception routing, they're considered "masked" if the A flag is set.
This gets a bit hairier when considering the fact that TMEA
also enables vSErrors, i.e. KVM has delegated the HW vSError context to
the guest hypervisor. We can keep the vSError context delegation as-is
by taking advantage of a couple properties:
- If SErrors are unmasked, the 'physical' SError can be taken
in-context immediately. In other words, KVM can emulate the EL1
SError while preserving vEL2's ownership of the vSError context.
- If SErrors are masked, the 'physical' SError is taken to EL2
immediately and needs the usual nested exception entry.
Note that the new in-context handling has the benign effect where
unmasked SError injections are emulated even for non-nested VMs.
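A minimal sketch of the resulting routing decision; the helper names below are assumptions, not the actual KVM functions:
	/* Illustrative only; helper names are assumptions. */
	if (!vcpu_serrors_masked(vcpu)) {
		/* Unmasked: emulate the EL1 SError in-context, leaving the
		 * HW vSError context under vEL2's control. */
		sketch_pend_serror(vcpu, PSR_MODE_EL1h);
	} else {
		/* Masked: take the 'physical' SError to (v)EL2 via the usual
		 * nested exception entry. */
		sketch_pend_serror(vcpu, PSR_MODE_EL2h);
	}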
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-19-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
One of the finest additions of FEAT_DoubleFault2 is the ability for
software to request that *synchronous* external aborts be taken to the
SError vector, which is of course *asynchronous* in nature.
Opinions be damned, implement the architecture and send SEAs to the
SError vector if EASE is set for the target context.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-18-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
For historical reasons, address size faults are first injected into the
guest as an SEA and ESR_EL1 is subsequently modified to reflect the
correct FSC. Of course, when dealing with a vEL2 this should poke
ESR_EL2.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-17-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Pull out the exception target selection from pend_sync_exception() for
general use. Use PSR_MODE_ELxh as a shorthand for the target EL, as
SP_ELx selection is handled further along in the hyp's exception
emulation.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-16-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
External abort injection will soon rely on a sanitised view of
SCTLR2_ELx to determine exception routing. Compute the RESx masks.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-15-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Restore SCTLR2_EL1 with the correct value for the given context when
FEAT_SCTLR2 is advertised to the guest.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-13-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Set up the sysreg descriptors for SCTLR2_ELx, along with the associated
storage and VNCR mapping.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-12-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Add the complete trap description for SCTLR2_EL1, including FGT and the
inverted HCRX bit.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-11-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Now that the missing bits for vSError injection/deferral have been added,
we can merrily claim support for FEAT_RAS.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-10-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
When HCR_EL2.AMO is set, physical SErrors are routed to EL2 and virtual
SError injection is enabled for EL1. Conceptually treating
host-initiated SErrors as 'physical', this means we can delegate control
of the vSError injection context to the guest hypervisor when nesting &&
AMO is set.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-9-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Prepare to implement RAS for NV by adding the missing EL2 sysregs for
the vSError context.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-8-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
To date KVM has used HCR_EL2.VSE to track the state of a pending SError
for the guest. With this bit set, hardware respects the EL1 exception
routing / masking rules and injects the vSError when appropriate.
This isn't correct for NV guests as hardware is oblivious to vEL2's
intentions for SErrors. Better yet, with FEAT_NV2 the guest can change
the routing behind our back as HCR_EL2 is redirected to memory. Cope
with this mess by:
- Using a flag (instead of HCR_EL2.VSE) to track the pending SError
state when SErrors are unconditionally masked for the current context
- Resampling the routing / masking of a pending SError on every guest
entry/exit
- Emulating exception entry when SError routing implies a translation
regime change
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Synchronous external aborts are taken to EL2 if ELIsInHost() or
HCR_EL2.TEA=1. Rework the SEA injection plumbing to respect the imposed
routing of the guest hypervisor and opportunistically rephrase things to
make their function a bit more obvious.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Per R_VRLPB, a pending SError is a WFI wakeup event regardless of
PSTATE.A, meaning that the vCPU is runnable. Sample VSE in addition to
the other IRQ lines.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
A common idiom in the KVM code is to check if we are currently
dealing with a "nested" context, defined as having NV enabled,
but being in the EL1&0 translation regime.
This is usually expressed as:
if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu) ... )
which is a mouthful and a bit hard to read, especially when followed
by additional conditions.
Introduce a new helper that encapsulates these two terms, allowing
the above to be written as
if (is_nested_context(vcpu) ... )
which is both shorter and easier to read, and makes more obvious
the potential for simplification on some code paths.
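A minimal sketch of the helper, built directly from the two predicates quoted above:
	static inline bool is_nested_context(struct kvm_vcpu *vcpu)
	{
		return vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu);
	}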
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
While BRBE can record branches within guests, the host recording
branches in guests is not supported by perf (though events are).
Support for BRBE in guests will be provided by giving guests direct
access to BRBE. That is how x86 LBR works for guests.
Therefore, BRBE needs to be disabled on guest entry and restored on
exit.
For nVHE, this requires explicit handling for guests. Before
entering a guest, save the BRBE state and disable it. When
returning to the host, restore the state.
For VHE, explicit handling is not necessary. We initialize
BRBCR_EL1.{E1BRE,E0BRE}=={0,0} at boot time, and HCR_EL2.TGE==1 while
running in the host. We configure BRBCR_EL2.{E2BRE,E0HBRE} to enable
branch recording in the host. When entering the guest, we set
HCR_EL2.TGE==0 which means BRBCR_EL1 is used instead of BRBCR_EL2.
Consequently for VHE, BRBE recording is disabled at EL1 and EL0 when
running a guest.
Should recording in guests (by the host) ever be desired, the perf ABI
will need to be extended to distinguish guest addresses (struct
perf_branch_entry.priv) for starters. BRBE records would also need to be
invalidated on guest entry/exit as guest/host EL1 and EL0 records can't
be distinguished.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Co-developed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: James Clark <james.clark@linaro.org>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-3-e7775563036e@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Introduce a new KVM capability to expose to the userspace whether
cacheable mapping of PFNMAP is supported.
The ability to safely do the cacheable mapping of PFNMAP is contingent
on S2FWB and ARM64_HAS_CACHE_DIC. S2FWB allows KVM to avoid flushing
the D-cache, while ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the
icache and turns icache_inval_pou() into a NOP. The cap would be false if
those requirements are missing and is checked by making use of
kvm_arch_supports_cacheable_pfnmap.
This capability would allow userspace to discover the support.
It could for instance be used by userspace to prevent live-migration
across FWB and non-FWB hosts.
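A hedged sketch of the underlying check; ARM64_HAS_CACHE_DIC is named above, while the exact S2FWB cpucap spelling is an assumption:
	static bool sketch_supports_cacheable_pfnmap(void)
	{
		/* No D-cache CMOs needed with FEAT_S2FWB, and no icache
		 * maintenance needed when CTR_EL0.DIC is set. */
		return cpus_have_final_cap(ARM64_HAS_STAGE2_FWB) &&
		       cpus_have_final_cap(ARM64_HAS_CACHE_DIC);
	}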
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Jason Gunthorpe <jgg@nvidia.com>
CC: Oliver Upton <oliver.upton@linux.dev>
CC: David Hildenbrand <david@redhat.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250705071717.5062-7-ankita@nvidia.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
KVM currently forces non-cacheable memory attributes (either Normal-NC
or Device-nGnRE) for a region based on pfn_is_map_memory(), i.e. whether
or not the kernel has a cacheable alias for it. This is necessary in
situations where KVM needs to perform CMOs on the region but is
unnecessarily restrictive when hardware obviates the need for CMOs.
KVM doesn't need to perform any CMOs on hardware with FEAT_S2FWB and
CTR_EL0.DIC. As luck would have it, there are implementations in the
wild that need to map regions of a device with cacheable attributes to
function properly. An example of this is Nvidia's Grace Hopper/Blackwell
systems where GPU memory is interchangeable with DDR and retains
properties such as cacheability, unaligned accesses, atomics and
handling of executable faults. Of course, for this to work in a VM the
GPU memory needs to have a cacheable mapping at stage-2.
Allow cacheable stage-2 mappings to be created on supporting hardware
when the VMA has cacheable memory attributes. Check these preconditions
during memslot creation (in addition to fault handling) to potentially
'fail-fast' as a courtesy to userspace.
CC: Oliver Upton <oliver.upton@linux.dev>
CC: Sean Christopherson <seanjc@google.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250705071717.5062-6-ankita@nvidia.com
[ Oliver: refine changelog, squash kvm_supports_cacheable_pfnmap() patch ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Fixes a security bug caused by mismatched memory attributes between the
S1 and S2 mappings.
Currently, it is possible for a region to be cacheable in the userspace
VMA, but mapped non-cacheable at S2. This creates a potential issue where
the VMM may sanitize cacheable memory across VMs using cacheable stores,
ensuring it is zeroed. However, if KVM subsequently assigns this memory
to a VM as uncached, the VM could end up accessing stale, non-zeroed data
from a previous VM, leading to unintended data exposure. This is a
security risk.
Block such mismatched attributes by returning -EINVAL when userspace
tries to map a PFNMAP region as cacheable. Only allow NORMAL_NC and
DEVICE_*.
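A minimal sketch of the rejection described above, with hypothetical helpers for the VMA checks:
	/* Sketch only: helper names are hypothetical. */
	if (sketch_vma_is_pfnmap(vma) && sketch_vma_is_cacheable(vma))
		return -EINVAL;	/* only NORMAL_NC and DEVICE_* are allowed */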
CC: Oliver Upton <oliver.upton@linux.dev>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Sean Christopherson <seanjc@google.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250705071717.5062-4-ankita@nvidia.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Despite its name, kvm_is_device_pfn() is actually used to determine if a
given PFN has a kernel mapping that can be used to perform cache
maintenance, as it calls pfn_is_map_memory() internally.
Expand the helper into its single callsite and further condition the
check on the VMA having either VM_PFNMAP or VM_MIXEDMAP set. VMAs that
set neither of these flags must always contain Normal, struct page
backed memory with valid aliases in the kernel address space.
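Roughly, the expanded check amounts to the following sketch, with vma/pfn assumed to come from the surrounding fault-handler context:
	/* No usable kernel alias for CMOs only if the VMA is PFNMAP/MIXEDMAP
	 * and the PFN is not part of the kernel's memory map. */
	bool no_kernel_alias = (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) &&
			       !pfn_is_map_memory(pfn);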
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250705071717.5062-3-ankita@nvidia.com
[ Oliver: fixed typos, refined changelog ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
To perform cache maintenance on a region of memory, KVM/arm64 relies on
that region having a cacheable alias in the kernel's address space which
can be used with CMO instructions.
The 'device' variable is somewhat of a misnomer, as it actually
indicates whether or not the stage-2 alias is allowed to have cacheable
memory attributes. The resulting stage-2 memory attributes are further
modified by VM_ALLOW_ANY_UNCACHED, selecting between Normal-NC or
Device-nGnRE depending on what the endpoint supports.
Rename the variable to s2_force_noncacheable so that its purpose is a
bit more obvious.
CC: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250705071717.5062-2-ankita@nvidia.com
[ Oliver: addressed typos, wound up rewriting changelog ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Historically KVM hyp code saved the host's FPSIMD state into the host's
fpsimd_state memory, and so it was necessary to map this into the hyp
Stage-1 mappings before running a vCPU.
This is no longer necessary as of commits:
* fbc7e61195e2 ("KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state")
* 8eca7f6d5100 ("KVM: arm64: Remove host FPSIMD saving for non-protected KVM")
Since those commits, we eagerly save the host's FPSIMD state before
calling into hyp to run a vCPU, and hyp code never reads nor writes the
host's fpsimd_state memory. There's no longer any need to map the host's
fpsimd_state memory into the hyp Stage-1, and kvm_arch_vcpu_run_map_fp()
is unnecessary but benign.
Remove kvm_arch_vcpu_run_map_fp(). Currently there is no code to perform
a corresponding unmap, and we never mapped the host's SVE or SME state
into the hyp Stage-1, so no other code needs to be removed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Cc: kvmarm@lists.linux.dev
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250619134817.4075340-1-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Booting an EL2 guest on a system only supporting a subset of the
possible page sizes leads to interesting situations.
For example, on a system that only supports 4kB and 64kB, and is
booted with a 4kB kernel, we end up advertising 16kB support at
stage-2, which is pretty weird.
That's because we consider that any S2 bigger than our base granule
is fair game, irrespective of what the HW actually supports. While this
is not impossible to support (KVM would happily handle it), it is likely
to be confusing for the guest.
Add new checks to verify that a granule size is actually supported by
the HW before advertising it to the guest.
Fixes: e7ef6ed4583ea ("KVM: arm64: Enforce NV limits on a per-idregs basis")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Expose the MTE_STORE_ONLY feature to the guest.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250618092957.2069907-6-yeoreum.yun@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Expose the FEAT_MTE_TAGGED_FAR feature to the guest.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250618084513.1761345-4-yeoreum.yun@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Marc reported that enabling protected mode on a device with GICv2
doesn't fail gracefully as one would expect, and leads to a host
kernel crash.
As it turns out, the first half of pKVM init happens before the vgic
probe, and so by the time we find out we have a GICv2 we're already
committed to keeping the pKVM vectors installed at EL2 -- pKVM rejects
stub HVCs for obvious security reasons. However, the error path on KVM
init leads to teardown_hyp_mode() which unconditionally frees hypervisor
allocations (including the EL2 stacks and per-cpu pages) under the
assumption that a previous cpu_hyp_uninit() execution has reset the
vectors back to the stubs, which is false with pKVM.
Interestingly, host stage-2 protection is not enabled yet at this point,
so this use-after-free may go unnoticed for a while. The issue becomes
more obvious after the finalize_pkvm() call.
Fix this by keeping track of the CPUs on which pKVM is initialized in
the kvm_hyp_initialized per-cpu variable, and use it from
teardown_hyp_mode() to skip freeing pages that are in fact used.
Fixes: a770ee80e662 ("KVM: arm64: pkvm: Disable GICv2 support")
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250626101014.1519345-1-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In the unlikely case pKVM fails to allocate its carveout, the error path
tries to access a NULL pointer when it dereferences the SVE state from
the uninitialized nVHE per-cpu base.
[ 1.575420] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 1.576010] pc : teardown_hyp_mode+0xe4/0x180
[ 1.576920] lr : teardown_hyp_mode+0xd0/0x180
[ 1.577308] sp : ffff8000826fb9d0
[ 1.577600] x29: ffff8000826fb9d0 x28: 0000000000000000 x27: ffff80008209b000
[ 1.578383] x26: ffff800081dde000 x25: ffff8000820493c0 x24: ffff80008209eb00
[ 1.579180] x23: 0000000000000040 x22: 0000000000000001 x21: 0000000000000000
[ 1.579881] x20: 0000000000000002 x19: ffff800081d540b8 x18: 0000000000000000
[ 1.580544] x17: ffff800081205230 x16: 0000000000000152 x15: 00000000fffffff8
[ 1.581183] x14: 0000000000000008 x13: fff00000ff7f6880 x12: 000000000000003e
[ 1.581813] x11: 0000000000000002 x10: 00000000000000ff x9 : 0000000000000000
[ 1.582503] x8 : 0000000000000000 x7 : 7f7f7f7f7f7f7f7f x6 : 43485e525851ff30
[ 1.583140] x5 : fff00000ff6e9030 x4 : fff00000ff6e8f80 x3 : 0000000000000000
[ 1.583780] x2 : 0000000000000000 x1 : 0000000000000002 x0 : 0000000000000000
[ 1.584526] Call trace:
[ 1.584945] teardown_hyp_mode+0xe4/0x180 (P)
[ 1.585578] init_hyp_mode+0x920/0x994
[ 1.586005] kvm_arm_init+0xb4/0x25c
[ 1.586387] do_one_initcall+0xe0/0x258
[ 1.586819] do_initcall_level+0xa0/0xd4
[ 1.587224] do_initcalls+0x54/0x94
[ 1.587606] do_basic_setup+0x1c/0x28
[ 1.587998] kernel_init_freeable+0xc8/0x130
[ 1.588409] kernel_init+0x20/0x1a4
[ 1.588768] ret_from_fork+0x10/0x20
[ 1.589568] Code: f875db48 8b1c0109 f100011f 9a8903e8 (f9463100)
[ 1.590332] ---[ end trace 0000000000000000 ]---
As Quentin pointed out, the order of freeing is also wrong: we need to
free the SVE state first, before freeing the per-CPU pointers.
I initially observed this on 6.12, but I could also reproduce it on master.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Fixes: 66d5b53e20a6 ("KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM")
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250625123058.875179-1-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
host_stage2_adjust_range() tries to find the largest block mapping that
fits within a memory or mmio region (represented by a kvm_mem_range in
this function) during host stage-2 faults under pKVM. To do so, it walks
the host stage-2 page-table, finds the faulting PTE and its level, and
then progressively increments the level until it finds a granule of the
appropriate size. However, the condition in the loop implementing the
above is broken as it checks kvm_level_supports_block_mapping() for the
next level instead of the current, so pKVM may attempt to map a region
larger than can be covered with a single block.
This is not a security problem and is quite rare in practice (the
kvm_mem_range check usually forces host_stage2_adjust_range() to choose a
smaller granule), but this is clearly not the expected behaviour.
Refactor the loop to fix the bug and improve readability.
Fixes: c4f0935e4d95 ("KVM: arm64: Optimize host memory aborts")
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250625105548.984572-1-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The state of the vcpu's MI line should be asserted when its
ICH_HCR_EL2.En is set and ICH_MISR_EL2 is non-zero. Using bitwise AND
(&=) directly for this calculation will not give us the correct result
when the LSB of the vcpu's ICH_MISR_EL2 isn't set. Correct this by
directly computing the line level with a logical AND operation.
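A small illustration of the difference, using illustrative values rather than the actual vgic code:
	u64 hcr = ICH_HCR_EL2_En;	/* assume the En bit is set */
	u64 misr = 0x2;			/* a MISR value with the LSB clear */
	bool level;

	level = (hcr & ICH_HCR_EL2_En);	/* true */
	level &= misr;			/* 1 & 0x2 == 0: line wrongly deasserted */

	level = (hcr & ICH_HCR_EL2_En) && misr;	/* correct: any non-zero MISR asserts the line */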
Signed-off-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw>
Link: https://lore.kernel.org/r/20250625084709.3968844-1-r09922117@csie.ntu.edu.tw
[maz: drop the level check from the original code]
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Fold kvm_arch_irqfd_route_changed() into kvm_arch_update_irqfd_routing().
Calling arch code to know whether or not to call arch code is absurd.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250611224604.313496-35-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Don't bother WARNing if updating an IRTE route fails now that vendor code
provides much more precise WARNs. The generic WARN doesn't provide enough
information to actually debug the problem, and has obviously done nothing
to surface the myriad bugs in KVM x86's implementation.
Drop all of the associated return code plumbing that existed just so that
common KVM could WARN.
Link: https://lore.kernel.org/r/20250611224604.313496-34-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When updating IRTEs in response to a GSI routing or IRQ bypass change,
pass the new/current routing information along with the associated irqfd.
This will allow KVM x86 to harden, simplify, and deduplicate its code.
Since adding/removing a bypass producer is now conveniently protected with
irqfds.lock, i.e. can't run concurrently with kvm_irq_routing_update(),
use the routing information cached in the irqfd instead of looking up
the information in the current GSI routing tables.
Opportunistically convert an existing printk() to pr_info() and put its
string onto a single line (old code that strictly adhered to 80 chars).
Link: https://lore.kernel.org/r/20250611224604.313496-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When unmapping a vLPI, WARN if nullifying vCPU affinity fails, not just if
failure occurs when freeing an ITE. If undoing vCPU affinity fails, then
odds are very good that vLPI state tracking has gotten out of whack,
i.e. that KVM and the GIC disagree on the state of an IRQ/vLPI. At best,
inconsistent state means there is a lurking bug/flaw somewhere. At worst,
the inconsistency could eventually be fatal to the host, e.g. if an ITS
command fails because KVM's view of things doesn't match reality/hardware.
Note, only the call from kvm_arch_irq_bypass_del_producer() by way of
kvm_vgic_v4_unset_forwarding() doesn't already WARN. Common KVM's
kvm_irq_routing_update() WARNs if kvm_arch_update_irqfd_routing() fails.
For that path, if its_unmap_vlpi() fails in kvm_vgic_v4_unset_forwarding(),
the only possible causes are that the GIC doesn't have a v4 ITS (from
its_irq_set_vcpu_affinity()):
	/* Need a v4 ITS */
	if (!is_v4(its_dev->its))
		return -EINVAL;

	guard(raw_spinlock)(&its_dev->event_map.vlpi_lock);

	/* Unmap request? */
	if (!info)
		return its_vlpi_unmap(d);
or that KVM has gotten out of sync with the GIC/ITS (from its_vlpi_unmap()):
	if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d))
		return -EINVAL;
All of the above failure scenarios are warnable offences, as they should
never occur absent a kernel/KVM bug.
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/all/aFWY2LTVIxz5rfhh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The VHE hyp code has recently gained a few ISBs. Simplify this to one
unconditional ISB in __kvm_vcpu_run_vhe(), and remove the unnecessary
ISB from the kvm_call_hyp_ret() macro.
kvm_call_hyp_ret() is also used to invoke __vgic_v3_get_gic_config(),
but no ISB is necessary in that case either.
For the moment, an ISB is left in kvm_call_hyp(), as there are many more
users, and removing the ISB would require a more thorough audit.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-8-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The hyp code FPSIMD/SVE/SME trap handling logic has some rather messy
open-coded manipulation of CPTR/CPACR. This is benign for non-nested
guests, but broken for nested guests, as the guest hypervisor's CPTR
configuration is not taken into account.
Consider the case where L0 provides FPSIMD+SVE to an L1 guest
hypervisor, and the L1 guest hypervisor only provides FPSIMD to an L2
guest (with L1 configuring CPTR/CPACR to trap SVE usage from L2). If the
L2 guest triggers an FPSIMD trap to the L0 hypervisor,
kvm_hyp_handle_fpsimd() will see that the vCPU supports FPSIMD+SVE, and
will configure CPTR/CPACR to NOT trap FPSIMD+SVE before returning to the
L2 guest. Consequently the L2 guest would be able to manipulate SVE
state even though the L1 hypervisor had configured CPTR/CPACR to forbid
this.
Clean this up, and fix the nested virt issue by always using
__deactivate_cptr_traps() and __activate_cptr_traps() to manage the CPTR
traps. This removes the need for the ad-hoc fixup in
kvm_hyp_save_fpsimd_host(), and ensures that any guest hypervisor
configuration of CPTR/CPACR is taken into account.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-6-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
There's no need for fpsimd_sve_sync() to write to CPTR/CPACR. All
relevant traps are always disabled earlier within __kvm_vcpu_run(), when
__deactivate_cptr_traps() configures CPTR/CPACR.
With irrelevant details elided, the flow is:
	handle___kvm_vcpu_run(...)
	{
		flush_hyp_vcpu(...) {
			fpsimd_sve_flush(...);
		}

		__kvm_vcpu_run(...) {
			__activate_traps(...) {
				__activate_cptr_traps(...);
			}

			do {
				__guest_enter(...);
			} while (...);

			__deactivate_traps(....) {
				__deactivate_cptr_traps(...);
			}
		}

		sync_hyp_vcpu(...) {
			fpsimd_sve_sync(...);
		}
	}
Remove the unnecessary write to CPTR/CPACR. An ISB is still necessary,
so a comment is added to describe this requirement.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-5-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The NVHE/HVHE and VHE modes have separate implementations of
__activate_cptr_traps() and __deactivate_cptr_traps() in their
respective switch.c files. There's some duplication of logic, and it's
not currently possible to reuse this logic elsewhere.
Move the logic into the common switch.h header so that it can be reused,
and de-duplicate the common logic.
This rework changes the way SVE traps are deactivated in VHE mode,
aligning it with NVHE/HVHE modes:
* Before this patch, VHE's __deactivate_cptr_traps() would
unconditionally enable SVE for host EL2 (but not EL0), regardless of
whether the ARM64_SVE cpucap was set.
* After this patch, VHE's __deactivate_cptr_traps() will take the
ARM64_SVE cpucap into account. When ARM64_SVE is not set, SVE will be
trapped from EL2 and below.
The old and new behaviour are both benign:
* When ARM64_SVE is not set, the host will not touch SVE state, and will
not reconfigure SVE traps. Host EL0 access to SVE will be trapped as
expected.
* When ARM64_SVE is set, the host will configure EL0 SVE traps before
returning to EL0 as part of reloading the EL0 FPSIMD/SVE/SME state.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-4-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Currently there is no ISB between __deactivate_cptr_traps() disabling
traps that affect EL2 and fpsimd_lazy_switch_to_host() manipulating
registers potentially affected by CPTR traps.
When NV is not in use, this is safe because the relevant registers are
only accessed when guest_owns_fp_regs() && vcpu_has_sve(vcpu), and this
also implies that SVE traps affecting EL2 have been deactivated prior to
__guest_entry().
When NV is in use, a guest hypervisor may have configured SVE traps for
a nested context, and so it is necessary to have an ISB between
__deactivate_cptr_traps() and fpsimd_lazy_switch_to_host().
Due to the current lack of an ISB, when a guest hypervisor enables SVE
traps in CPTR, the host can take an unexpected SVE trap from within
fpsimd_lazy_switch_to_host(), e.g.
| Unhandled 64-bit el1h sync exception on CPU1, ESR 0x0000000066000000 -- SVE
| CPU: 1 UID: 0 PID: 164 Comm: kvm-vcpu-0 Not tainted 6.15.0-rc4-00138-ga05e0f012c05 #3 PREEMPT
| Hardware name: FVP Base RevC (DT)
| pstate: 604023c9 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __kvm_vcpu_run+0x6f4/0x844
| lr : __kvm_vcpu_run+0x150/0x844
| sp : ffff800083903a60
| x29: ffff800083903a90 x28: ffff000801f4a300 x27: 0000000000000000
| x26: 0000000000000000 x25: ffff000801f90000 x24: ffff000801f900f0
| x23: ffff800081ff7720 x22: 0002433c807d623f x21: ffff000801f90000
| x20: ffff00087f730730 x19: 0000000000000000 x18: 0000000000000000
| x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
| x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
| x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
| x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff000801f90d70
| x5 : 0000000000001000 x4 : ffff8007fd739000 x3 : ffff000801f90000
| x2 : 0000000000000000 x1 : 00000000000003cc x0 : ffff800082f9d000
| Kernel panic - not syncing: Unhandled exception
| CPU: 1 UID: 0 PID: 164 Comm: kvm-vcpu-0 Not tainted 6.15.0-rc4-00138-ga05e0f012c05 #3 PREEMPT
| Hardware name: FVP Base RevC (DT)
| Call trace:
| show_stack+0x18/0x24 (C)
| dump_stack_lvl+0x60/0x80
| dump_stack+0x18/0x24
| panic+0x168/0x360
| __panic_unhandled+0x68/0x74
| el1h_64_irq_handler+0x0/0x24
| el1h_64_sync+0x6c/0x70
| __kvm_vcpu_run+0x6f4/0x844 (P)
| kvm_arm_vcpu_enter_exit+0x64/0xa0
| kvm_arch_vcpu_ioctl_run+0x21c/0x870
| kvm_vcpu_ioctl+0x1a8/0x9d0
| __arm64_sys_ioctl+0xb4/0xf4
| invoke_syscall+0x48/0x104
| el0_svc_common.constprop.0+0x40/0xe0
| do_el0_svc+0x1c/0x28
| el0_svc+0x30/0xcc
| el0t_64_sync_handler+0x10c/0x138
| el0t_64_sync+0x198/0x19c
| SMP: stopping secondary CPUs
| Kernel Offset: disabled
| CPU features: 0x0000,000002c0,02df4fb9,97ee773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: Unhandled exception ]---
Fix this by adding an ISB between __deactivate_traps() and
fpsimd_lazy_switch_to_host().
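The shape of the fix, as described above (surrounding code abridged):
	__deactivate_traps(vcpu);
	/* Ensure the CPTR/CPACR trap disables above have taken effect before
	 * the FPSIMD/SVE register accesses below. */
	isb();
	fpsimd_lazy_switch_to_host(vcpu);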
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-3-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When KVM runs in non-protected VHE mode, there's no context
synchronization event between __debug_switch_to_host() restoring the
host debug registers and __kvm_vcpu_run() unmasking debug exceptions.
Due to this, it's theoretically possible for the host to take an
unexpected debug exception due to the stale guest configuration.
This cannot happen in NVHE/HVHE mode as debug exceptions are masked in
the hyp code, and the exception return to the host will provide the
necessary context synchronization before debug exceptions can be taken.
For now, avoid the problem by adding an ISB after VHE hyp code restores
the host debug registers.
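The shape of the workaround, as described above (the VHE path is abridged here):
	__debug_switch_to_host(vcpu);
	/* Context synchronization before debug exceptions are unmasked;
	 * exact placement in the VHE hyp code is abridged in this sketch. */
	isb();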
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250617133718.4014181-2-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Explicitly treat type differences as GSI routing changes, as comparing MSI
data between two entries could get a false negative, e.g. if userspace
changed the type but left the type-specific data as-is.
Note, the same bug was fixed in x86 by commit bcda70c56f3e ("KVM: x86:
Explicitly treat routing entry type changes as changes").
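A hedged sketch of the comparison this implies; the surrounding function is illustrative, though the fields follow the common kvm_kernel_irq_routing_entry layout:
	/* A type change alone must count as a routing change. */
	if (old->type != new->type)
		return true;

	return memcmp(&old->msi, &new->msi, sizeof(new->msi));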
Fixes: 4bf3693d36af ("KVM: arm64: Unmap vLPIs affected by changes to GSI routing information")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250611224604.313496-3-seanjc@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Wei-Lin reports that the tracking of shadow list registers is
majorly broken when resync'ing the L2 state after a run, as
we confuse the guest's LR index with the host's, potentially
losing the interrupt state.
While this could be fixed by adding yet another side index to
track it (Wei-Lin's fix), it may be better to refactor this
code to avoid having a side index altogether, limiting the
risk to introduce this class of bugs.
A key observation is that the shadow index is always the number
of bits in the lr_map bitmap. With that, the parallel indexing
scheme can be completely dropped.
While doing this, introduce a couple of helpers that abstract
the index conversion and some of the LR repainting, making the
whole exercise much simpler.
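The key observation boils down to something like the following sketch (names assumed):
	/* The shadow slot for the next guest LR is simply the number of LRs
	 * already tracked in the lr_map bitmap. */
	unsigned int shadow_idx = hweight16(shadow_if->lr_map);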
Reported-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw>
Reviewed-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250614145721.2504524-1-r09922117@csie.ntu.edu.tw
Link: https://lore.kernel.org/r/86qzzkc5xa.wl-maz@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.16, take #2
- Rework of system register accessors for system registers that are
directly written to memory, so that sanitisation of the in-memory
value happens at the correct time (after the read, or before the
write). For convenience, RMW-style accessors are also provided.
- Multiple fixes for the so-called "arch-timer-edge-cases" selftest,
which was always broken.
|
|
We are about to prevent the use of __vcpu_sys_reg() as an lvalue,
and getting the address of an rvalue is not a thing.
Update the couple of places where we do this to use the __ctxt_sys_reg()
accessor, which returns the address of a register.
Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In a number of cases, we perform a Read-Modify-Write operation on
a system register, meaning that we would apply the RESx masks twice.
Instead, provide a new accessor that performs this RMW operation,
allowing the masks to be applied exactly once per operation.
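A minimal sketch of such an RMW accessor; the macro name is an assumption, and the point is that sanitisation happens exactly once, on the value being written back:
	/* Sketch only: the macro name is assumed, not taken from the patch. */
	#define sketch_vcpu_rmw_sys_reg(vcpu, reg, op, val)			\
	do {									\
		u64 __v = __vcpu_sys_reg((vcpu), (reg));			\
		__v op##= (val);						\
		__vcpu_assign_sys_reg((vcpu), (reg), __v);			\
	} while (0)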
Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Assigning a value to a system register doesn't do what it is
supposed to do if that register is one that has RESx bits.
The main problem is that we use __vcpu_sys_reg(), which can be used
both as an lvalue and an rvalue. When used as an lvalue, the bit masking
occurs *before* the new value is assigned, meaning that we (1) do
pointless work on the old value, and (2) potentially assign an
invalid value as we fail to apply the masks to it.
Fix this by providing a new __vcpu_assign_sys_reg() that does
what it says on the tin, and sanitises the *new* value instead of
the old one. This comes with a significant amount of churn.
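A hedged sketch of what the new accessor does differently: the RESx masks are applied to the *new* value before it is stored (the mask-lookup helper is an assumption):
	#define sketch_vcpu_assign_sys_reg(vcpu, reg, val)			\
	do {									\
		u64 __new = (val);						\
		/* Sanitise the new value, not the stale contents. */		\
		__new = sketch_apply_resx_masks((vcpu), (reg), __new);		\
		*__ctxt_sys_reg(&(vcpu)->arch.ctxt, (reg)) = __new;		\
	} while (0)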
Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Support for Virtual Trust Level (VTL) on arm64 (Roman Kisel)
- Fixes for Hyper-V UIO driver (Long Li)
- Fixes for Hyper-V PCI driver (Michael Kelley)
- Select CONFIG_SYSFB for Hyper-V guests (Michael Kelley)
- Documentation updates for Hyper-V VMBus (Michael Kelley)
- Enhance logging for hv_kvp_daemon (Shradha Gupta)
* tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (23 commits)
Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests
Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir
Documentation: hyperv: Update VMBus doc with new features and info
PCI: hv: Remove unnecessary flex array in struct pci_packet
Drivers: hv: Remove hv_alloc/free_* helpers
Drivers: hv: Use kzalloc for panic page allocation
uio_hv_generic: Align ring size to system page
uio_hv_generic: Use correct size for interrupt and monitor pages
Drivers: hv: Allocate interrupt and monitor pages aligned to system page boundary
arch/x86: Provide the CPU number in the wakeup AP callback
x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap()
PCI: hv: Get vPCI MSI IRQ domain from DeviceTree
ACPI: irq: Introduce acpi_get_gsi_dispatcher()
Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device()
Drivers: hv: vmbus: Get the IRQ number from DeviceTree
dt-bindings: microsoft,vmbus: Add interrupt and DMA coherence properties
arm64, x86: hyperv: Report the VTL the system boots in
arm64: hyperv: Initialize the Virtual Trust Level field
Drivers: hv: Provide arch-neutral implementation of get_vtl()
Drivers: hv: Enable VTL mode for arm64
...
|
|
Pull more kvm updates from Paolo Bonzini:
 "Generic:
- Clean up locking of all vCPUs for a VM by using the *_nest_lock()
family of functions, and move duplicated code to virt/kvm/. kernel/
patches acked by Peter Zijlstra
- Add MGLRU support to the access tracking perf test
ARM fixes:
- Make the irqbypass hooks resilient to changes in the GSI<->MSI
routing, avoiding stale vLPI mappings being left behind. The
fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
and nuking the vLPI mapping upon a routing change
- Close another VGIC race where vCPU creation races with VGIC
creation, leading to in-flight vCPUs entering the kernel w/o
private IRQs allocated
- Fix a build issue triggered by the recently added workaround for
Ampere's AC04_CPU_23 erratum
- Correctly sign-extend the VA when emulating a TLBI instruction
potentially targeting a VNCR mapping
- Avoid dereferencing a NULL pointer in the VGIC debug code, which
can happen if the device doesn't have any mapping yet
s390:
- Fix interaction between some filesystems and Secure Execution
- Some cleanups and refactorings, preparing for an upcoming big
series
x86:
- Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
to fix a race between AP destroy and VMRUN
- Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for
the VM
- Refine and harden handling of spurious faults
- Add support for ALLOWED_SEV_FEATURES
- Add #VMGEXIT to the set of handlers special cased for
CONFIG_RETPOLINE=y
- Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing
features that utilize those bits
- Don't account temporary allocations in sev_send_update_data()
- Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock
Threshold
- Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU
IBPB, between SVM and VMX
- Advertise support to userspace for WRMSRNS and PREFETCHI
- Rescan I/O APIC routes after handling EOI that needed to be
intercepted due to the old/previous routing, but not the
new/current routing
- Add a module param to control and enumerate support for device
posted interrupts
- Fix a potential overflow with nested virt on Intel systems running
32-bit kernels
- Flush shadow VMCSes on emergency reboot
- Add support for SNP to the various SEV selftests
- Add a selftest to verify fastops instructions via forced emulation
- Refine and optimize KVM's software processing of the posted
interrupt bitmap, and share the harvesting code between KVM and the
kernel's Posted MSI handler"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
rtmutex_api: provide correct extern functions
KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer
KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race
KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock
KVM: arm64: Use lock guard in vgic_v4_set_forwarding()
KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
KVM: s390: Simplify and move pv code
KVM: s390: Refactor and split some gmap helpers
KVM: s390: Remove unneeded srcu lock
s390: Remove unneeded includes
s390/uv: Improve splitting of large folios that cannot be split while dirty
s390/uv: Always return 0 from s390_wiggle_split_folio() if successful
s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful
rust: add helper for mutex_trylock
RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs
KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation
...
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.16, take #1
- Make the irqbypass hooks resilient to changes in the GSI<->MSI
routing, avoiding stale vLPI mappings being left behind. The
fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
and nuking the vLPI mapping upon a routing change.
- Close another VGIC race where vCPU creation races with VGIC
creation, leading to in-flight vCPUs entering the kernel w/o private
IRQs allocated.
- Fix a build issue triggered by the recently added workaround for
Ampere's AC04_CPU_23 erratum.
- Correctly sign-extend the VA when emulating a TLBI instruction
potentially targeting a VNCR mapping.
- Avoid dereferencing a NULL pointer in the VGIC debug code, which can
happen if the device doesn't have any mapping yet.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull compiler version requirement update from Arnd Bergmann:
"Require gcc-8 and binutils-2.30
x86 already uses gcc-8 as the minimum version; this changes all other
architectures to the same version. gcc-8 is used in Debian 10 and Red
Hat Enterprise Linux 8, both of which are still supported, and
binutils 2.30 is the oldest corresponding version on those.
Ubuntu Pro 18.04 and SUSE Linux Enterprise Server 15 both use gcc-7 as
the system compiler but additionally include toolchains that remain
supported.
With the new minimum toolchain versions, a number of workarounds for
older versions can be dropped, in particular on x86_64 and arm64.
Importantly, the updated compiler version allows removing two of the
five remaining gcc plugins, as support for sancov and structeak
features is already included in modern compiler versions.
I tried collecting the known changes that are possible based on the
new toolchain version, but expect that more cleanups will be possible.
Since this touches multiple architectures, I merged the patches
through the asm-generic tree."
* tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
Makefile.kcov: apply needed compiler option unconditionally in CFLAGS_KCOV
Documentation: update binutils-2.30 version reference
gcc-plugins: remove SANCOV gcc plugin
Kbuild: remove structleak gcc plugin
arm64: drop binutils version checks
raid6: skip avx512 checks
kbuild: require gcc-8 and binutils-2.30
|
|
Dan reports that iterating over a device's ITEs can legitimately lead
to a NULL pointer, and that the NULL check is placed *after* the
pointer has already been dereferenced.
Hoist the pointer check as early as possible and be done with it.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 30deb51a677b ("KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables")
Link: https://lore.kernel.org/r/aDBylI1YnjPatAbr@stanley.mountain
Cc: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20250530091647.1152489-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|