Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks
Pull kthread updates from Frederic Weisbecker:
"Kthreads affinity follow either of 4 existing different patterns:
1) Per-CPU kthreads must stay affine to a single CPU and never
execute relevant code on any other CPU. This is currently handled
by smpboot code which takes care of CPU-hotplug operations.
Affinity here is a correctness constraint.
2) Some kthreads _have_ to be affine to a specific set of CPUs and
can't run anywhere else. The affinity is set through
kthread_bind_mask() and the subsystem takes care by itself to
handle CPU-hotplug operations. Affinity here is assumed to be a
correctness constraint.
3) Per-node kthreads _prefer_ to be affine to a specific NUMA node.
This is not a correctness constraint but merely a preference in
terms of memory locality. kswapd and kcompactd both fall into this
category. The affinity is set manually like for any other task and
CPU-hotplug is supposed to be handled by the relevant subsystem so
that the task is properly reaffined whenever a given CPU from the
node comes up. Also care should be taken so that the node affinity
doesn't cross isolated (nohz_full) cpumask boundaries.
4) Similar to the previous point except kthreads have a _preferred_
affinity different than a node. Both RCU boost kthreads and RCU
exp kworkers fall into this category as they refer to "RCU nodes"
from a distinctly distributed tree.
Currently the preferred affinity patterns (3 and 4) have at least 4
identified users, with more or less success when it comes to handle
CPU-hotplug operations and CPU isolation. Each of which do it in its
own ad-hoc way.
This is an infrastructure proposal to handle this with the following
API changes:
- kthread_create_on_node() automatically affines the created kthread
to its target node unless it has been set as per-cpu or bound with
kthread_bind[_mask]() before the first wake-up.
- kthread_affine_preferred() is a new function that can be called
right after kthread_create_on_node() to specify a preferred
affinity different than the specified node.
When the preferred affinity can't be applied because the possible
targets are offline or isolated (nohz_full), the kthread is affine to
the housekeeping CPUs (which means to all online CPUs most of the time
or only the non-nohz_full CPUs when nohz_full= is set).
kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been
converted, along with a few old drivers.
Summary of the changes:
- Consolidate a bunch of ad-hoc implementations of
kthread_run_on_cpu()
- Introduce task_cpu_fallback_mask() that defines the default last
resort affinity of a task to become nohz_full aware
- Add some correctness check to ensure kthread_bind() is always
called before the first kthread wake up.
- Default affine kthread to its preferred node.
- Convert kswapd / kcompactd and remove their halfway working ad-hoc
affinity implementation
- Implement kthreads preferred affinity
- Unify kthread worker and kthread API's style
- Convert RCU kthreads to the new API and remove the ad-hoc affinity
implementation"
* tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks:
kthread: modify kernel-doc function name to match code
rcu: Use kthread preferred affinity for RCU exp kworkers
treewide: Introduce kthread_run_worker[_on_cpu]()
kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format
rcu: Use kthread preferred affinity for RCU boost
kthread: Implement preferred affinity
mm: Create/affine kswapd to its preferred node
mm: Create/affine kcompactd to its preferred node
kthread: Default affine kthread to its preferred NUMA node
kthread: Make sure kthread hasn't started while binding it
sched,arm64: Handle CPU isolation on last resort fallback rq selection
arm64: Exclude nohz_full CPUs from 32bits el0 support
lib: test_objpool: Use kthread_run_on_cpu()
kallsyms: Use kthread_run_on_cpu()
soc/qman: test: Use kthread_run_on_cpu()
arm/bL_switcher: Use kthread_run_on_cpu()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Have fprobes built on top of function graph infrastructure
The fprobe logic is an optimized kprobe that uses ftrace to attach to
functions when a probe is needed at the start or end of the function.
The fprobe and kretprobe logic implements a similar method as the
function graph tracer to trace the end of the function. That is to
hijack the return address and jump to a trampoline to do the trace
when the function exits. To do this, a shadow stack needs to be
created to store the original return address. Fprobes and function
graph do this slightly differently. Fprobes (and kretprobes) has
slots per callsite that are reserved to save the return address. This
is fine when just a few points are traced. But users of fprobes, such
as BPF programs, are starting to add many more locations, and this
method does not scale.
The function graph tracer was created to trace all functions in the
kernel. In order to do this, when function graph tracing is started,
every task gets its own shadow stack to hold the return address that
is going to be traced. The function graph tracer has been updated to
allow multiple users to use its infrastructure. Now have fprobes be
one of those users. This will also allow for the fprobe and kretprobe
methods to trace the return address to become obsolete. With new
technologies like CFI that need to know about these methods of
hijacking the return address, going toward a solution that has only
one method of doing this will make the kernel less complex.
- Cleanup with guard() and free() helpers
There were several places in the code that had a lot of "goto out" in
the error paths to either unlock a lock or free some memory that was
allocated. But this is error prone. Convert the code over to use the
guard() and free() helpers that let the compiler unlock locks or free
memory when the function exits.
- Remove disabling of interrupts in the function graph tracer
When function graph tracer was first introduced, it could race with
interrupts and NMIs. To prevent that race, it would disable
interrupts and not trace NMIs. But the code has changed to allow NMIs
and also interrupts. This change was done a long time ago, but the
disabling of interrupts was never removed. Remove the disabling of
interrupts in the function graph tracer is it is not needed. This
greatly improves its performance.
- Allow the :mod: command to enable tracing module functions on the
kernel command line.
The function tracer already has a way to enable functions to be
traced in modules by writing ":mod:<module>" into set_ftrace_filter.
That will enable either all the functions for the module if it is
loaded, or if it is not, it will cache that command, and when the
module is loaded that matches <module>, its functions will be
enabled. This also allows init functions to be traced. But currently
events do not have that feature.
Because enabling function tracing can be done very early at boot up
(before scheduling is enabled), the commands that can be done when
function tracing is started is limited. Having the ":mod:" command to
trace module functions as they are loaded is very useful. Update the
kernel command line function filtering to allow it.
* tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
ftrace: Implement :mod: cache filtering on kernel command line
tracing: Adopt __free() and guard() for trace_fprobe.c
bpf: Use ftrace_get_symaddr() for kprobe_multi probes
ftrace: Add ftrace_get_symaddr to convert fentry_ip to symaddr
Documentation: probes: Update fprobe on function-graph tracer
selftests/ftrace: Add a test case for repeating register/unregister fprobe
selftests: ftrace: Remove obsolate maxactive syntax check
tracing/fprobe: Remove nr_maxactive from fprobe
fprobe: Add fprobe_header encoding feature
fprobe: Rewrite fprobe on function-graph tracer
s390/tracing: Enable HAVE_FTRACE_GRAPH_FUNC
ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
tracing: Add ftrace_fill_perf_regs() for perf event
tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
fprobe: Use ftrace_regs in fprobe exit handler
fprobe: Use ftrace_regs in fprobe entry handler
fgraph: Pass ftrace_regs to retfunc
fgraph: Replace fgraph_ret_regs with ftrace_regs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"We've got a little less than normal thanks to the holidays in
December, but there's the usual summary below. The highlight is
probably the 52-bit physical addressing (LPA2) clean-up from Ard.
Confidential Computing:
- Register a platform device when running in CCA realm mode to enable
automatic loading of dependent modules
CPU Features:
- Update a bunch of system register definitions to pick up new field
encodings from the architectural documentation
- Add hwcaps and selftests for the new (2024) dpISA extensions
Documentation:
- Update EL3 (firmware) requirements for booting Linux on modern
arm64 designs
- Remove stale information about the kernel virtual memory map
Miscellaneous:
- Minor cleanups and typo fixes
Memory management:
- Fix vmemmap_check_pmd() to look at the PMD type bits
- LPA2 (52-bit physical addressing) cleanups and minor fixes
- Adjust physical address space depending upon whether or not LPA2 is
enabled
Perf and PMUs:
- Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU
- Extend AXI filtering support for the DDR PMU on NXP IMX SoCs
- Fix Designware PCIe PMU event numbering
- Add generic branch events for the Apple M1 CPU PMU
- Add support for Marvell Odyssey DDR and LLC-TAD PMUs
- Cleanups to the Hisilicon DDRC and Uncore PMU code
- Advertise discard mode for the SPE PMU
- Add the perf users mailing list to our MAINTAINERS entry"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
Documentation: arm64: Remove stale and redundant virtual memory diagrams
perf docs: arm_spe: Document new discard mode
perf: arm_spe: Add format option for discard mode
MAINTAINERS: Add perf list for drivers/perf/
arm64: Remove duplicate included header
drivers/perf: apple_m1: Map generic branch events
arm64: rsi: Add automatic arm-cca-guest module loading
kselftest/arm64: Add 2024 dpISA extensions to hwcap test
KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1
arm64/hwcap: Describe 2024 dpISA extensions to userspace
arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12
arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented
drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association
arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()
arm64: mm: Test for pmd_sect() in vmemmap_check_pmd()
arm64/mm: Replace open encodings with PXD_TABLE_BIT
arm64/mm: Rename pte_mkpresent() as pte_mkvalid()
arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09
...
|
|
* for-next/mm:
arm64: mm: Test for pmd_sect() in vmemmap_check_pmd()
arm64/mm: Replace open encodings with PXD_TABLE_BIT
arm64/mm: Rename pte_mkpresent() as pte_mkvalid()
arm64: Kconfig: force ARM64_PAN=y when enabling TTBR0 sw PAN
arm64/kvm: Avoid invalid physical addresses to signal owner updates
arm64/kvm: Configure HYP TCR.PS/DS based on host stage1
arm64/mm: Override PARange for !LPA2 and use it consistently
arm64/mm: Reduce PA space to 48 bits when LPA2 is not enabled
|
|
* for-next/misc:
arm64: Remove duplicate included header
arm64/Kconfig: Drop EXECMEM dependency from ARCH_WANTS_EXECMEM_LATE
arm64: asm: Fix typo in pgtable.h
arm64/mm: Ensure adequate HUGE_MAX_HSTATE
arm64/mm: Replace open encodings with PXD_TABLE_BIT
arm64/mm: Drop INIT_MM_CONTEXT()
|
|
* for-next/cpufeature:
kselftest/arm64: Add 2024 dpISA extensions to hwcap test
KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1
arm64/hwcap: Describe 2024 dpISA extensions to userspace
arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12
arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented
arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()
arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64ISAR3_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64PFR2_EL1 to DDI0601 2024-09
arm64/sysreg: Get rid of CPACR_ELx SysregFields
arm64/sysreg: Convert *_EL12 accessors to Mapping
arm64/sysreg: Get rid of the TCR2_EL1x SysregFields
arm64/sysreg: Allow a 'Mapping' descriptor for system registers
arm64/cpufeature: Refactor conditional logic in init_cpu_ftr_reg()
arm64: cpufeature: Add HAFT to cpucap_is_possible()
|
|
* kvm-arm64/misc-6.14:
: .
: Misc KVM/arm64 changes for 6.14
:
: - Don't expose AArch32 EL0 capability when NV is enabled
:
: - Update documentation to reflect the full gamut of kvm-arm.mode
: behaviours
:
: - Use the hypervisor VA bit width when dumping stacktraces
:
: - Decouple the hypervisor stack size from PAGE_SIZE, at least
: on the surface...
:
: - Make use of str_enabled_disabled() when advertising GICv4.1 support
:
: - Explicitly handle BRBE traps as UNDEFINED
: .
KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()
arm64: kvm: Introduce nvhe stack size constants
KVM: arm64: Fix nVHE stacktrace VA bits mask
Documentation: Update the behaviour of "kvm-arm.mode"
KVM: arm64: nv: Advertise the lack of AArch32 EL0 support
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/nv-resx-fixes-6.14:
: .
: Fixes for NV sysreg accessors. From the cover letter:
:
: "Joey recently reported that some rather basic tests were failing on
: NV, and managed to track it down to critical register fields (such as
: HCR_EL2.E2H) not having their expect value.
:
: Further investigation has outlined a couple of critical issues:
:
: - Evaluating HCR_EL2.E2H must always be done with a sanitising
: accessor, no ifs, no buts. Given that KVM assumes a fixed value for
: this bit, we cannot leave it to the guest to mess with.
:
: - Resetting the sysreg file must result in the RESx bits taking
: effect. Otherwise, we may end-up making the wrong decision (see
: above), and we definitely expose invalid values to the guest. Note
: that because we compute the RESx masks very late in the VM setup, we
: need to apply these masks at that particular point as well.
: [...]"
: .
KVM: arm64: nv: Apply RESx settings to sysreg reset values
KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors
Signed-off-by: Marc Zyngier <maz@kernel.org>
# Conflicts:
# arch/arm64/kvm/nested.c
|
|
* kvm-arm64/coresight-6.14:
: .
: Trace filtering update from James Clark. From the cover letter:
:
: "The guest filtering rules from the Perf session are now honored for both
: nVHE and VHE modes. This is done by either writing to TRFCR_EL12 at the
: start of the Perf session and doing nothing else further, or caching the
: guest value and writing it at guest switch for nVHE. In pKVM, trace is
: now be disabled for both protected and unprotected guests."
: .
KVM: arm64: Fix selftests after sysreg field name update
coresight: Pass guest TRFCR value to KVM
KVM: arm64: Support trace filtering for guests
KVM: arm64: coresight: Give TRBE enabled state to KVM
coresight: trbe: Remove redundant disable call
arm64/sysreg/tools: Move TRFCR definitions to sysreg
tools: arm64: Update sysreg.h header files
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/nv-timers:
: .
: Nested Virt support for the EL2 timers. From the initial cover letter:
:
: "Here's another batch of NV-related patches, this time bringing in most
: of the timer support for EL2 as well as nested guests.
:
: The code is pretty convoluted for a bunch of reasons:
:
: - FEAT_NV2 breaks the timer semantics by redirecting HW controls to
: memory, meaning that a guest could setup a timer and never see it
: firing until the next exit
:
: - We go try hard to reflect the timer state in memory, but that's not
: great.
:
: - With FEAT_ECV, we can finally correctly emulate the virtual timer,
: but this emulation is pretty costly
:
: - As a way to make things suck less, we handle timer reads as early as
: possible, and only defer writes to the normal trap handling
:
: - Finally, some implementations are badly broken, and require some
: hand-holding, irrespective of NV support. So we try and reuse the NV
: infrastructure to make them usable. This could be further optimised,
: but I'm running out of patience for this sort of HW.
:
: [...]"
: .
KVM: arm64: nv: Fix doc header layout for timers
KVM: arm64: nv: Document EL2 timer API
KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosity
KVM: arm64: nv: Sanitise CNTHCTL_EL2
KVM: arm64: nv: Propagate CNTHCTL_EL2.EL1NV{P,V}CT bits
KVM: arm64: nv: Add trap routing for CNTHCTL_EL2.EL1{NVPCT,NVVCT,TVT,TVCT}
KVM: arm64: Handle counter access early in non-HYP context
KVM: arm64: nv: Accelerate EL0 counter accesses from hypervisor context
KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV in use
KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state
KVM: arm64: nv: Sync nested timer state with FEAT_NV2
KVM: arm64: nv: Add handling of EL2-specific timer registers
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
While we have sanitisation in place for the guest sysregs, we lack
that sanitisation out of reset. So some of the fields could be
evaluated and not reflect their RESx status, which sounds like
a very bad idea.
Apply the RESx masks to the the sysreg file in two situations:
- when going via a reset of the sysregs
- after having computed the RESx masks
Having this separate reset phase from the actual reset handling is
a bit grotty, but we need to apply this after the ID registers are
final.
Tested-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250112165029.1181056-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
A lot of the NV code depends on HCR_EL2.{E2H,TGE}, and we assume
in places that at least HCR_EL2.E2H is invariant for a given guest.
However, we make a point in *not* using the sanitising accessor
that would enforce this, and are at the mercy of the guest doing
stupid things. Clearly, that's not good.
Rework the HCR_EL2 accessors to use __vcpu_sys_reg() instead,
guaranteeing that the RESx settings get applied, specially
when HCR_EL2.E2H is evaluated. This results in fewer accessors
overall.
Huge thanks to Joey who spent a long time tracking this bug down.
Reported-by: Joey Gouly <Joey.Gouly@arm.com>
Tested-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250112165029.1181056-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
For nVHE, switch the filter value in and out if the Coresight driver
asks for it. This will support filters for guests when sinks other than
TRBE are used.
For VHE, just write the filter directly to TRFCR_EL1 where trace can be
used even with TRBE sinks.
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250106142446.628923-7-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Currently in nVHE, KVM has to check if TRBE is enabled on every guest
switch even if it was never used. Because it's a debug feature and is
more likely to not be used than used, give KVM the TRBE buffer status to
allow a much simpler and faster do-nothing path in the hyp.
Protected mode now disables trace regardless of TRBE (because
trfcr_while_in_guest is always 0), which was not previously done.
However, it continues to flush whenever the buffer is enabled
regardless of the filter status. This avoids the hypothetical case of a
host that had disabled the filter but not flushed which would arise if
only doing the flush when the filter was enabled.
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250106142446.628923-6-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Convert TRFCR to automatic generation. Add separate definitions for ELx
and EL2 as TRFCR_EL1 doesn't have CX. This also mirrors the previous
definition so no code change is required.
Also add TRFCR_EL12 which will start to be used in a later commit.
Unfortunately, to avoid breaking the Perf build with duplicate
definition errors, the tools copy of the sysreg.h header needs to be
updated at the same time rather than the usual second commit. This is
because the generated version of sysreg
(arch/arm64/include/generated/asm/sysreg-defs.h), is currently shared
and tools/ does not have its own copy.
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250106142446.628923-4-james.clark@linaro.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pkvm-fixed-features-6.14: (24 commits)
: .
: Complete rework of the pKVM handling of features, catching up
: with the rest of the code deals with it these days.
: Patches courtesy of Fuad Tabba. From the cover letter:
:
: "This patch series uses the vm's feature id registers to track the
: supported features, a framework similar to nested virt to set the
: trap values, and removes the need to store cptr_el2 per vcpu in
: favor of setting its value when traps are activated, as VHE mode
: does."
:
: This branch drags the arm64/for-next/cpufeature branch to solve
: ugly conflicts in -next.
: .
KVM: arm64: Fix FEAT_MTE in pKVM
KVM: arm64: Use kvm_vcpu_has_feature() directly for struct kvm
KVM: arm64: Convert the SVE guest vcpu flag to a vm flag
KVM: arm64: Remove PtrAuth guest vcpu flag
KVM: arm64: Fix the value of the CPTR_EL2 RES1 bitmask for nVHE
KVM: arm64: Refactor kvm_reset_cptr_el2()
KVM: arm64: Calculate cptr_el2 traps on activating traps
KVM: arm64: Remove redundant setting of HCR_EL2 trap bit
KVM: arm64: Remove fixed_config.h header
KVM: arm64: Rework specifying restricted features for protected VMs
KVM: arm64: Set protected VM traps based on its view of feature registers
KVM: arm64: Fix RAS trapping in pKVM for protected VMs
KVM: arm64: Initialize feature id registers for protected VMs
KVM: arm64: Use KVM extension checks for allowed protected VM capabilities
KVM: arm64: Remove KVM_ARM_VCPU_POWER_OFF from protected VMs allowed features in pKVM
KVM: arm64: Move checking protected vcpu features to a separate function
KVM: arm64: Group setting traps for protected VMs by control register
KVM: arm64: Consolidate allowed and restricted VM feature checks
arm64/sysreg: Get rid of CPACR_ELx SysregFields
arm64/sysreg: Convert *_EL12 accessors to Mapping
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
# Conflicts:
# arch/arm64/kvm/fpsimd.c
# arch/arm64/kvm/hyp/nvhe/pkvm.c
|
|
* kvm-arm64/pkvm-np-guest:
: .
: pKVM support for non-protected guests using the standard MM
: infrastructure, courtesy of Quentin Perret. From the cover letter:
:
: "This series moves the stage-2 page-table management of non-protected
: guests to EL2 when pKVM is enabled. This is only intended as an
: incremental step towards a 'feature-complete' pKVM, there is however a
: lot more that needs to come on top.
:
: With that series applied, pKVM provides near-parity with standard KVM
: from a functional perspective all while Linux no longer touches the
: stage-2 page-tables itself at EL1. The majority of mm-related KVM
: features work out of the box, including MMU notifiers, dirty logging,
: RO memslots and things of that nature. There are however two gotchas:
:
: - We don't support mapping devices into guests: this requires
: additional hypervisor support for tracking the 'state' of devices,
: which will come in a later series. No device assignment until then.
:
: - Stage-2 mappings are forced to page-granularity even when backed by a
: huge page for the sake of simplicity of this series. I'm only aiming
: at functional parity-ish (from userspace's PoV) for now, support for
: HP can be added on top later as a perf improvement."
: .
KVM: arm64: Plumb the pKVM MMU in KVM
KVM: arm64: Introduce the EL1 pKVM MMU
KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()
KVM: arm64: Introduce __pkvm_host_wrprotect_guest()
KVM: arm64: Introduce __pkvm_host_relax_guest_perms()
KVM: arm64: Introduce __pkvm_host_unshare_guest()
KVM: arm64: Introduce __pkvm_host_share_guest()
KVM: arm64: Introduce __pkvm_vcpu_{load,put}()
KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers
KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function
KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms
KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung
KVM: arm64: Move host page ownership tracking to the hyp vmemmap
KVM: arm64: Make hyp_page::order a u8
KVM: arm64: Move enum pkvm_page_state to memory.h
KVM: arm64: Change the layout of enum pkvm_page_state
Signed-off-by: Marc Zyngier <maz@kernel.org>
# Conflicts:
# arch/arm64/kvm/arm.c
|
|
kvm-arm64/pkvm-fixed-features-6.14
Merge arm64/for-next/cpufeature to solve extensive conflicts
caused by the CPACR_ELx->CPACR_EL1 repainting.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The header asm/unistd_compat_32.h is included whether CONFIG_COMPAT is
defined or not.
Include it only once and remove the following make includecheck warning:
asm/unistd_compat_32.h is included more than once
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250109104636.124507-2-thorsten.blum@linux.dev
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Remove all hyperv-tlfs.h files. These are no longer included
anywhere. hyperv/hvhdk.h serves the same role, but with an easier
path for adding new definitions.
Remove the relevant lines in MAINTAINERS.
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://lore.kernel.org/r/1732577084-2122-6-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1732577084-2122-6-git-send-email-nunodasneves@linux.microsoft.com>
|
|
Switch to using hvhdk.h everywhere in the kernel. This header
includes all the new Hyper-V headers in include/hyperv, which form a
superset of the definitions found in hyperv-tlfs.h.
This makes it easier to add new Hyper-V interfaces without being
restricted to those in the TLFS doc (reflected in hyperv-tlfs.h).
To be more consistent with the original Hyper-V code, the names of
some definitions are changed slightly. Update those where needed.
Update comments in mshyperv.h files to point to include/hyperv for
adding new definitions.
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://lore.kernel.org/r/1732577084-2122-5-git-send-email-nunodasneves@linux.microsoft.com
Link: https://lore.kernel.org/r/20250108222138.1623703-3-romank@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
When a kthread or any other task has an affinity mask that is fully
offline or unallowed, the scheduler reaffines the task to all possible
CPUs as a last resort.
This default decision doesn't mix up very well with nohz_full CPUs that
are part of the possible cpumask but don't want to be disturbed by
unbound kthreads or even detached pinned user tasks.
Make the fallback affinity setting aware of nohz_full.
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
|
The TSM module provides guest identification and attestation when a
guest runs in CCA realm mode. By creating a dummy platform device,
let's ensure the module is automatically loaded. The udev daemon loads
the TSM module after it receives a device addition event. Once that
happens, it can be used earlier in the boot process to decrypt the
rootfs.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20241220181236.172060-2-jeremy.linton@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The 2024 dpISA introduces a number of architecture features all of which
only add new instructions so only require the addition of hwcaps and ID
register visibility.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250107-arm64-2024-dpisa-v5-3-7578da51fc3d@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Refactor nvhe stack code to use NVHE_STACK_SIZE/SHIFT constants,
instead of directly using PAGE_SIZE/SHIFT. This makes the code a bit
easier to read, without introducing any functional changes.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Link: https://lore.kernel.org/r/20241112003336.1375584-1-kaleshsingh@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The hypervisor VA space size depends on both the ID map's
(IDMAP_VA_BITS) and the kernel stage-1 (VA_BITS). However, the
hypervisor stacktrace decoding is solely relying on VA_BITS. This is
especially an issue when VA_BITS < IDMAP_VA_BITS (i.e. VA_BITS is
39-bit): the hypervisor may have addresses bigger than the stacktrace is
masking.
Align this mask with hyp_va_bits.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250107112821.416591-1-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
[pgd|p4d]_bad() helpers have open encodings for their respective table bits
which can be replaced with corresponding macros. This makes things clearer,
thus improving their readability as well.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20250107015529.798319-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
pte_present() is no longer synonymous with pte_valid() as it also tests for
pte_present_invalid() as well. Hence pte_mkpresent() is misleading, because
all that does is make an entry mapped, via setting PTE_VALID. Hence rename
the helper as pte_mkvalid() which reflects its functionality appropriately.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250107023016.829416-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
It appears that on Qualcomm's x1e CPU, CNTVOFF_EL2 doesn't really
work, specially with HCR_EL2.E2H=1.
A non-zero offset results in a screaming virtual timer interrupt,
to the tune of a few 100k interrupts per second on a 4 vcpu VM.
This is also evidenced by this CPU's inability to correctly run
any of the timer selftests.
The only case this doesn't break is when this register is set to 0,
which breaks VM migration.
When HCR_EL2.E2H=0, the timer seems to behave normally, and does
not result in an interrupt storm.
As a workaround, use the fact that this CPU implements FEAT_ECV,
and trap all accesses to the virtual timer and counter, keeping
CNTVOFF_EL2 set to zero, and emulate accesses to CVAL/TVAL/CTL
and the counter itself, fixing up the timer to account for the
missing offset.
And if you think this is disgusting, you'd probably be right.
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-12-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Inject some sanity in CNTHCTL_EL2, ensuring that we don't handle
more than we advertise to the guest.
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add the required handling for EL2 and EL02 registers, as
well as EL1 registers used in the E2H context. This includes
handling the virtual timer accesses when CNTHCTL_EL2.EL1TVT
or CNTHCTL_EL2.EL1TVCT are set.
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
This introduces ftrace_get_symaddr() which tries to convert fentry_ip
passed by ftrace or fgraph callback to symaddr without calling
kallsyms API. It returns the symbol address or 0 if it fails to
convert it.
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/173519011487.391279.5450806886342723151.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412061423.K79V55Hd-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202412061804.5VRzF14E-lkp@intel.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Fprobe store its data structure address and size on the fgraph return stack
by __fprobe_header. But most 64bit architecture can combine those to
one unsigned long value because 4 MSB in the kernel address are the same.
With this encoding, fprobe can consume less space on ret_stack.
This introduces asm/fprobe.h to define arch dependent encode/decode
macros. Note that since fprobe depends on CONFIG_HAVE_FUNCTION_GRAPH_FREGS,
currently only arm64, loongarch, riscv, s390 and x86 are supported.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173519005783.391279.5307910947400277525.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Rewrite fprobe implementation on function-graph tracer.
Major API changes are:
- 'nr_maxactive' field is deprecated.
- This depends on CONFIG_DYNAMIC_FTRACE_WITH_ARGS or
!CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and
CONFIG_HAVE_FUNCTION_GRAPH_FREGS. So currently works only
on x86_64.
- Currently the entry size is limited in 15 * sizeof(long).
- If there is too many fprobe exit handler set on the same
function, it will fail to probe.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/173519003970.391279.14406792285453830996.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add ftrace_fill_perf_regs() which should be compatible with the
perf_fetch_caller_regs(). In other words, the pt_regs returned from the
ftrace_fill_perf_regs() must satisfy 'user_mode(regs) == false' and can be
used for stack tracing.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/173518997908.391279.15910334347345106424.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add ftrace_partial_regs() which converts the ftrace_regs to pt_regs.
This is for the eBPF which needs this to keep the same pt_regs interface
to access registers.
Thus when replacing the pt_regs with ftrace_regs in fprobes (which is
used by kprobe_multi eBPF event), this will be used.
If the architecture defines its own ftrace_regs, this copies partial
registers to pt_regs and returns it. If not, ftrace_regs is the same as
pt_regs and ftrace_partial_regs() will return ftrace_regs::regs.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Florent Revest <revest@chromium.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Link: https://lore.kernel.org/173518996761.391279.4987911298206448122.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Use ftrace_regs instead of fgraph_ret_regs for tracing return value
on function_graph tracer because of simplifying the callback interface.
The CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by
CONFIG_HAVE_FUNCTION_GRAPH_FREGS.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173518991508.391279.16635322774382197642.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The vcpu flag GUEST_HAS_SVE is per-vcpu, but it is based on what
is now a per-vm feature. Make the flag per-vm.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-17-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The vcpu flag GUEST_HAS_PTRAUTH is always associated with the
vcpu PtrAuth features, which are defined per vm rather than per
vcpu.
Remove the flag, and replace it with checks for the features
instead.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-16-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Since the introduction of SME, bit 12 in CPTR_EL2 (nVHE) is TSM
for trapping SME, instead of RES1, as per ARM ARM DDI 0487K.a,
section D23.2.34.
Fix the value of CPTR_NVHE_EL2_RES1 to reflect that, and adjust
the code that relies on it accordingly.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-15-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Fold kvm_get_reset_cptr_el2() into kvm_reset_cptr_el2(), since it
is its only caller. Add a comment to clarify that this function
is meant for the host value of cptr_el2.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-14-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Similar to VHE, calculate the value of cptr_el2 from scratch on
activate traps. This removes the need to store cptr_el2 in every
vcpu structure. Moreover, some traps, such as whether the guest
owns the fp registers, need to be set on every vcpu run.
Reported-by: James Clark <james.clark@linaro.org>
Fixes: 5294afdbf45a ("KVM: arm64: Exclude FP ownership from kvm_vcpu_arch")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-13-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The existing code didn't properly distinguish between signed and
unsigned features, and was difficult to read and to maintain.
Rework it using the same method used in other parts of KVM when
handling vcpu features.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-10-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Use KVM extension checks as the source for determining which
capabilities are allowed for protected VMs. KVM extension checks
is the natural place for this, since it is also the interface
exposed to users.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Introduce the KVM_PGT_CALL() helper macro to allow switching from the
traditional pgtable code to the pKVM version easily in mmu.c. The cost
of this 'indirection' is expected to be very minimal due to
is_protected_kvm_enabled() being backed by a static key.
With this, everything is in place to allow the delegation of
non-protected guest stage-2 page-tables to pKVM, so let's stop using the
host's kvm_s2_mmu from EL2 and enjoy the ride.
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-19-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Introduce a set of helper functions allowing to manipulate the pKVM
guest stage-2 page-tables from EL1 using pKVM's HVC interface.
Each helper has an exact one-to-one correspondance with the traditional
kvm_pgtable_stage2_*() functions from pgtable.c, with a strictly
matching prototype. This will ease plumbing later on in mmu.c.
These callbacks track the gfn->pfn mappings in a simple rb_tree indexed
by IPA in lieu of a page-table. This rb-tree is kept in sync with pKVM's
state and is protected by the mmu_lock like a traditional stage-2
page-table.
Signed-off-by: Quentin Perret <qperret@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-18-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Introduce a new hypercall to flush the TLBs of non-protected guests. The
host kernel will be responsible for issuing this hypercall after changing
stage-2 permissions using the __pkvm_host_relax_guest_perms() or
__pkvm_host_wrprotect_guest() paths. This is left under the host's
responsibility for performance reasons.
Note however that the TLB maintenance for all *unmap* operations still
remains entirely under the hypervisor's responsibility for security
reasons -- an unmapped page may be donated to another entity, so a stale
TLB entry could be used to leak private data.
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-17-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Plumb the kvm_pgtable_stage2_mkyoung() callback into pKVM for
non-protected guests. It will be called later from the fault handling
path.
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-16-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Plumb the kvm_stage2_test_clear_young() callback into pKVM for
non-protected guest. It will be later be called from MMU notifiers.
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-15-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Introduce a new hypercall to remove the write permission from a
non-protected guest stage-2 mapping. This will be used for e.g. enabling
dirty logging.
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-14-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|