summaryrefslogtreecommitdiff
path: root/arch/arm64/include
AgeCommit message (Collapse)Author
2024-09-04Merge remote-tracking branch 'kvmarm/arm64-shared-6.12' into for-next/poeWill Deacon
Pull in the AT instruction conversion patch from the KVM arm64 tree, as this is a shared dependency between the POE series from Joey and the AT emulation series for Nested Virtualisation from Marc.
2024-09-03arch, mm: move definition of node_data to generic codeMike Rapoport (Microsoft)
Every architecture that supports NUMA defines node_data in the same way: struct pglist_data *node_data[MAX_NUMNODES]; No reason to keep multiple copies of this definition and its forward declarations, especially when such forward declaration is the only thing in include/asm/mmzone.h for many architectures. Add definition and declaration of node_data to generic code and drop architecture-specific versions. Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-08-30arm64: mm: Add confidential computing hook to ioremap_prot()Will Deacon
Confidential Computing environments such as pKVM and Arm's CCA distinguish between shared (i.e. emulated) and private (i.e. assigned) MMIO regions. Introduce a hook into our implementation of ioremap_prot() so that MMIO regions can be shared if necessary. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-6-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30arm64: mm: Add top-level dispatcher for internal mem_encrypt APIWill Deacon
Implementing the internal mem_encrypt API for arm64 depends entirely on the Confidential Computing environment in which the kernel is running. Introduce a simple dispatcher so that backend hooks can be registered depending upon the environment in which the kernel finds itself. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-4-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30drivers/virt: pkvm: Add initial support for running as a protected guestWill Deacon
Implement a pKVM protected guest driver to probe the presence of pKVM and determine the memory protection granule using the HYP_MEMINFO hypercall. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-3-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30firmware/smccc: Call arch-specific hook on discovering KVM servicesMarc Zyngier
arm64 will soon require its own callback to initialise services that are only available on this architecture. Introduce a hook that can be overloaded by the architecture. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-2-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30arm64: errata: Enable the AC03_CPU_38 workaround for ampere1aD Scott Phillips
The ampere1a cpu is affected by erratum AC04_CPU_10 which is the same bug as AC03_CPU_38. Add ampere1a to the AC03_CPU_38 workaround midr list. Cc: <stable@vger.kernel.org> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827211701.2216719-1-scott@os.amperecomputing.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30KVM: arm64: nv: Add support for FEAT_ATS1AMarc Zyngier
Handling FEAT_ATS1A (which provides the AT S1E{1,2}A instructions) is pretty easy, as it is just the usual AT without the permission check. This basically amounts to plumbing the instructions in the various dispatch tables, and handling FEAT_ATS1A being disabled in the ID registers. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: nv: Make ps_to_output_size() generally availableMarc Zyngier
Make this helper visible to at.c, we are going to need it. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W}Marc Zyngier
On the face of it, AT S12E{0,1}{R,W} is pretty simple. It is the combination of AT S1E{0,1}{R,W}, followed by an extra S2 walk. However, there is a great deal of complexity coming from combining the S1 and S2 attributes to report something consistent in PAR_EL1. This is an absolute mine field, and I have a splitting headache. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: nv: Add basic emulation of AT S1E2{R,W}Marc Zyngier
Similar to our AT S1E{0,1} emulation, we implement the AT S1E2 handling. This emulation of course suffers from the same problems, but is somehow simpler due to the lack of PAN2 and the fact that we are guaranteed to execute it from the correct context. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: nv: Add basic emulation of AT S1E{0,1}{R,W}Marc Zyngier
Emulating AT instructions is one the tasks devolved to the host hypervisor when NV is on. Here, we take the basic approach of emulating AT S1E{0,1}{R,W} using the AT instructions themselves. While this mostly work, it doesn't *always* work: - S1 page tables can be swapped out - shadow S2 can be incomplete and not contain mappings for the S1 page tables We are not trying to handle these case here, and defer it to a later patch. Suitable comments indicate where we are in dire need of better handling. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: nv: Turn upper_attr for S2 walk into the full descriptorMarc Zyngier
The upper_attr attribute has been badly named, as it most of the time carries the full "last walked descriptor". Rename it to "desc" and make ti contain the full 64bit descriptor. This will be used by the S1 PTW. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: nv: Enforce S2 alignment when contiguous bit is setMarc Zyngier
Despite KVM not using the contiguous bit for anything related to TLBs, the spec does require that the alignment defined by the contiguous bit for the page size and the level is enforced. Add the required checks to offset the point where PA and VA merge. Fixes: 61e30b9eef7f ("KVM: arm64: nv: Implement nested Stage-2 page table walk logic") Reported-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30arm64: Add ESR_ELx_FSC_ADDRSZ_L() helperMarc Zyngier
Although we have helpers that encode the level of a given fault type, the Address Size fault type is missing it. While we're at it, fix the bracketting for ESR_ELx_FSC_ACCESS_L() and ESR_ELx_FSC_PERM_L(). Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30arm64: Add system register encoding for PSTATE.PANMarc Zyngier
Although we already have the primitives to set PSTATE.PAN with an immediate, we don't have a way to read the current state nor set it ot an arbitrary value (i.e. we can generally save/restore it). Thankfully, all that is missing for this is the definition for the PAN pseudo system register, here named SYS_PSTATE_PAN. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30arm64: Add PAR_EL1 field descriptionMarc Zyngier
As KVM is about to grow a full emulation for the AT instructions, add the layout of the PAR_EL1 register in its non-D128 configuration. Note that the constants are a bit ugly, as the register has two layouts, based on the state of the F bit. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30arm64: Add missing APTable and TCR_ELx.HPD masksMarc Zyngier
Although Linux doesn't make use of hierarchical permissions (TFFT!), KVM needs to know where the various bits related to this feature live in the TCR_ELx registers as well as in the page tables. Add the missing bits. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30KVM: arm64: Make kvm_at() take an OP_AT_*Joey Gouly
To allow using newer instructions that current assemblers don't know about, replace the `at` instruction with the underlying SYS instruction. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Add ICH_HCR_EL2 to the vcpu stateMarc Zyngier
As we are about to describe the trap routing for ICH_HCR_EL2, add the register to the vcpu state in its VNCR form, as well as reset Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27arm64: Implement prctl(PR_{G,S}ET_TSC)Peter Collingbourne
On arm64, this prctl controls access to CNTVCT_EL0, CNTVCTSS_EL0 and CNTFRQ_EL0 via CNTKCTL_EL1.EL0VCTEN. Since this bit is also used to implement various erratum workarounds, check whether the CPU needs a workaround whenever we potentially need to change it. This is needed for a correct implementation of non-instrumenting record-replay debugging on arm64 (i.e. rr; https://rr-project.org/). rr must trap and record any sources of non-determinism from the userspace program's perspective so it can be replayed later. This includes the results of syscalls as well as the results of access to architected timers exposed directly to the program. This prctl was originally added for x86 by commit 8fb402bccf20 ("generic, x86: add prctl commands PR_GET_TSC and PR_SET_TSC"), and rr uses it to trap RDTSC on x86 for the same reason. We also considered exposing this as a PTRACE_EVENT. However, prctl seems like a better choice for these reasons: 1) In general an in-process control seems more useful than an out-of-process control, since anything that you would be able to do with ptrace could also be done with prctl (tracer can inject a call to the prctl and handle signal-delivery-stops), and it avoids needing an additional process (which will complicate debugging of the ptraced process since it cannot have more than one tracer, and will be incompatible with ptrace_scope=3) in cases where that is not otherwise necessary. 2) Consistency with x86_64. Note that on x86_64, RDTSC has been there since the start, so it's the same situation as on arm64. Signed-off-by: Peter Collingbourne <pcc@google.com> Link: https://linux-review.googlesource.com/id/I233a1867d1ccebe2933a347552e7eae862344421 Link: https://lore.kernel.org/r/20240824015415.488474-1-pcc@google.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-27KVM: arm64: Add save/restore support for FPMRMarc Zyngier
Just like the rest of the FP/SIMD state, FPMR needs to be context switched. The only interesting thing here is that we need to treat the pKVM part a bit differently, as the host FP state is never written back to the vcpu thread, but instead stored locally and eagerly restored. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Move FPMR into the sysreg arrayMarc Zyngier
Just like SVCR, FPMR is currently stored at the wrong location. Let's move it where it belongs. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Add predicate for FPMR support in a VMMarc Zyngier
As we are about to check for the advertisement of FPMR support to a guest in a number of places, add a predicate that will gate most of the support code for FPMR. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Move SVCR into the sysreg arrayMarc Zyngier
SVCR is just a system register, and has no purpose being outside of the sysreg array. If anything, it only makes it more difficult to eventually support SME one day. If ever. Move it into the array with its little friends, and associate it with a visibility predicate. Although this is dead code, it at least paves the way for the next set of FP-related extensions. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-16Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Fix the arm64 __get_mem_asm() to use the _ASM_EXTABLE_##type##ACCESS() macro instead of the *_ERR() one in order to avoid writing -EFAULT to the value register in case of a fault - Initialise all elements of the acpi_early_node_map[] to NUMA_NO_NODE. Prior to this fix, only the first element was initialised - Move the KASAN random tag seed initialisation after the per-CPU areas have been initialised (prng_state is __percpu) * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix KASAN random tag seed initialization arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE arm64: uaccess: correct thinko in __get_mem_asm()
2024-08-16perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counterRob Herring (Arm)
Armv9.4/8.9 PMU adds optional support for a fixed instruction counter similar to the fixed cycle counter. Support for the feature is indicated in the ID_AA64DFR1_EL1 register PMICNTR field. The counter is not accessible in AArch32. Existing userspace using direct counter access won't know how to handle the fixed instruction counter, so we have to avoid using the counter when user access is requested. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-7-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL accessRob Herring (Arm)
ARMV8_PMU_COUNTER_MASK is really a mask for the PMSELR_EL0.SEL register field. Make that clear by adding a standard sysreg definition for the register, and using it instead. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-4-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16KVM: arm64: pmu: Use arm_pmuv3.h register accessorsRob Herring (Arm)
Commit df29ddf4f04b ("arm64: perf: Abstract system register accesses away") split off PMU register accessor functions to a standalone header. Let's use it for KVM PMU code and get rid one copy of the ugly switch macro. Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-3-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16perf: arm_pmuv3: Prepare for more than 32 countersRob Herring (Arm)
Various PMUv3 registers which are a mask of counters are 64-bit registers, but the accessor functions take a u32. This has been fine as the upper 32-bits have been RES0 as there has been a maximum of 32 counters prior to Armv9.4/8.9. With Armv9.4/8.9, a 33rd counter is added. Update the accessor functions to use a u64 instead. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-2-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16arm64/sve: Remove unused declaration read_smcr_features()Yue Haibing
Commit 391208485c3a ("arm64/sve: Remove SMCR pseudo register from cpufeature code") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20240810093944.2587809-1-yuehaibing@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16arm64: mm: Remove unused declaration early_io_map()Yue Haibing
Commit bf4b558eba92 ("arm64: add early_ioremap support") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240805140038.1366033-1-yuehaibing@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16arm64: el2_setup.h: Rename some labels to be more diff-friendlyDave Martin
A minor anti-pattern has established itself in __init_el2_fgt, where each block of instructions is skipped by jumping to a label named for the next (typically unrelated) block. This makes diffs more noisy than necessary, since appending each new block to deal with some new architecture feature now requires altering a branch destination in the existing code. Fix it by naming the affected labels based on the block that is skipping itself instead, as is done elsewhere in the el2_setup code. No functional change. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Link: https://lore.kernel.org/r/20240729162542.367059-1-Dave.Martin@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16arm64: signal: Fix some under-bracketed UAPI macrosDave Martin
A few SME-related sigcontext UAPI macros leave an argument unprotected from misparsing during macro expansion. Add parentheses around references to macro arguments where appropriate. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Fixes: ee072cf70804 ("arm64/sme: Implement signal handling for ZT") Fixes: 39782210eb7e ("arm64/sme: Implement ZA signal handling") Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240729152005.289844-1-Dave.Martin@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16arm64/mm: Drop PMD_SECT_VALIDAnshuman Khandual
This just drops off the macro PMD_SECT_VALID which remains unused. Because macro PMD_TYPE_SECT with same value (_AT(pmdval_t, 1) << 0), gets used for creating or updating given block mappings. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20240724044712.602210-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-14arm64: uaccess: correct thinko in __get_mem_asm()Mark Rutland
In the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y version of __get_mem_asm(), we incorrectly use _ASM_EXTABLE_##type##ACCESS_ERR() such that upon a fault the extable fixup handler writes -EFAULT into "%w0", which is the register containing 'x' (the result of the load). This was a thinko in commit: 86a6a68febfcf57b ("arm64: start using 'asm goto' for get_user() when available") Prior to that commit _ASM_EXTABLE_##type##ACCESS_ERR_ZERO() was used such that the extable fixup handler wrote -EFAULT into "%w0" (the register containing 'err'), and zero into "%w1" (the register containing 'x'). When the 'err' variable was removed, the extable entry was updated incorrectly. Writing -EFAULT to the value register is unnecessary but benign: * We never want -EFAULT in the value register, and previously this would have been zeroed in the extable fixup handler. * In __get_user_error() the value is overwritten with zero explicitly in the error path. * The asm goto outputs cannot be used when the goto label is taken, as older compilers (e.g. clang < 16.0.0) do not guarantee that asm goto outputs are usable in this path and may use a stale value rather than the value in an output register. Consequently, zeroing in the extable fixup handler is insufficient to ensure callers see zero in the error path. * The expected usage of unsafe_get_user() and get_kernel_nofault() requires that the value is not consumed in the error path. Some versions of GCC would mis-compile asm goto with outputs, and erroneously omit subsequent assignments, breaking the error path handling in __get_user_error(). This was discussed at: https://lore.kernel.org/lkml/ZpfxLrJAOF2YNqCk@J2N7QTR9R3.cambridge.arm.com/ ... and was fixed by removing support for asm goto with outputs on those broken compilers in commit: f2f6a8e887172503 ("init/Kconfig: remove CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND") With that out of the way, we can safely replace the usage of _ASM_EXTABLE_##type##ACCESS_ERR() with _ASM_EXTABLE_##type##ACCESS(), leaving the value register unchanged in the case a fault is taken, as was originally intended. This matches other architectures and matches our __put_mem_asm(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240807103731.2498893-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-13Merge tag 'kvmarm-fixes-6.11-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.11, round #1 - Use kvfree() for the kvmalloc'd nested MMUs array - Set of fixes to address warnings in W=1 builds - Make KVM depend on assembler support for ARMv8.4 - Fix for vgic-debug interface for VMs without LPIs - Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest - Minor code / comment cleanups for configuring PAuth traps - Take kvm->arch.config_lock to prevent destruction / initialization race for a vCPU's CPUIF which may lead to a UAF
2024-08-07KVM: arm64: Tidying up PAuth code in KVMFuad Tabba
Tidy up some of the PAuth trapping code to clear up some comments and avoid clang/checkpatch warnings. Also, don't bother setting PAuth HCR_EL2 bits in pKVM, since it's handled by the hypervisor. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240722163311.1493879-1-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-02arm64: jump_label: Ensure patched jump_labels are visible to all CPUsWill Deacon
Although the Arm architecture permits concurrent modification and execution of NOP and branch instructions, it still requires some synchronisation to ensure that other CPUs consistently execute the newly written instruction: > When the modified instructions are observable, each PE that is > executing the modified instructions must execute an ISB or perform a > context synchronizing event to ensure execution of the modified > instructions Prior to commit f6cc0c501649 ("arm64: Avoid calling stop_machine() when patching jump labels"), the arm64 jump_label patching machinery performed synchronisation using stop_machine() after each modification, however this was problematic when flipping static keys from atomic contexts (namely, the arm_arch_timer CPU hotplug startup notifier) and so we switched to the _nosync() patching routines to avoid "scheduling while atomic" BUG()s during boot. In hindsight, the analysis of the issue in f6cc0c501649 isn't quite right: it cites the use of IPIs in the default patching routines as the cause of the lockup, whereas stop_machine() does not rely on IPIs and the I-cache invalidation is performed using __flush_icache_range(), which elides the call to kick_all_cpus_sync(). In fact, the blocking wait for other CPUs is what triggers the BUG() and the problem remains even after f6cc0c501649, for example because we could block on the jump_label_mutex. Eventually, the arm_arch_timer driver was fixed to avoid the static key entirely in commit a862fc2254bd ("clocksource/arm_arch_timer: Remove use of workaround static key"). This all leaves the jump_label patching code in a funny situation on arm64 as we do not synchronise with other CPUs to reduce the likelihood of a bug which no longer exists. Consequently, toggling a static key on one CPU cannot be assumed to take effect on other CPUs, leading to potential issues, for example with missing preempt notifiers. Rather than revert f6cc0c501649 and go back to stop_machine() for each patch site, implement arch_jump_label_transform_apply() and kick all the other CPUs with an IPI at the end of patching. Cc: Alexander Potapenko <glider@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Fixes: f6cc0c501649 ("arm64: Avoid calling stop_machine() when patching jump labels") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240731133601.3073-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-01arm64: cputype: Add Cortex-A725 definitionsMark Rutland
Add cputype definitions for Cortex-A725. These will be used for errata detection in subsequent patches. These values can be found in the Cortex-A725 TRM: https://developer.arm.com/documentation/107652/0001/ ... in table A-247 ("MIDR_EL1 bit descriptions"). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20240801101803.1982459-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-01arm64: cputype: Add Cortex-X1C definitionsMark Rutland
Add cputype definitions for Cortex-X1C. These will be used for errata detection in subsequent patches. These values can be found in the Cortex-X1C TRM: https://developer.arm.com/documentation/101968/0002/ ... in section B2.107 ("MIDR_EL1, Main ID Register, EL1"). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20240801101803.1982459-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-07-26Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The usual summary below, but the main fix is for the fast GUP lockless page-table walk when we have a combination of compile-time and run-time folding of the p4d and the pud respectively. - Remove some redundant Kconfig conditionals - Fix string output in ptrace selftest - Fix fast GUP crashes in some page-table configurations - Remove obsolete linker option when building the vDSO - Fix some sysreg field definitions for the GIC" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mm: Fix lockless walks with static and dynamic page-table folding arm64/sysreg: Correct the values for GICv4.1 arm64/vdso: Remove --hash-style=sysv kselftest: missing arg in ptrace.c arm64/Kconfig: Remove redundant 'if HAVE_FUNCTION_GRAPH_TRACER' arm64: remove redundant 'if HAVE_ARCH_KASAN' in Kconfig
2024-07-25arm64: mm: Fix lockless walks with static and dynamic page-table foldingWill Deacon
Lina reports random oopsen originating from the fast GUP code when 16K pages are used with 4-level page-tables, the fourth level being folded at runtime due to lack of LPA2. In this configuration, the generic implementation of p4d_offset_lockless() will return a 'p4d_t *' corresponding to the 'pgd_t' allocated on the stack of the caller, gup_fast_pgd_range(). This is normally fine, but when the fourth level of page-table is folded at runtime, pud_offset_lockless() will offset from the address of the 'p4d_t' to calculate the address of the PUD in the same page-table page. This results in a stray stack read when the 'p4d_t' has been allocated on the stack and can send the walker into the weeds. Fix the problem by providing our own definition of p4d_offset_lockless() when CONFIG_PGTABLE_LEVELS <= 4 which returns the real page-table pointer rather than the address of the local stack variable. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/50360968-13fb-4e6f-8f52-1725b3177215@asahilina.net Fixes: 0dd4f60a2c76 ("arm64: mm: Add support for folding PUDs at runtime") Reported-by: Asahi Lina <lina@asahilina.net> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240725090345.28461-1-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...
2024-07-18Merge tag 'ftrace-v6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: "Rewrite of function graph tracer to allow multiple users Up until now, the function graph tracer could only have a single user attached to it. If another user tried to attach to the function graph tracer while one was already attached, it would fail. Allowing function graph tracer to have more than one user has been asked for since 2009, but it required a rewrite to the logic to pull it off so it never happened. Until now! There's three systems that trace the return of a function. That is kretprobes, function graph tracer, and BPF. kretprobes and function graph tracing both do it similarly. The difference is that kretprobes uses a shadow stack per callback and function graph tracer creates a shadow stack for all tasks. The function graph tracer method makes it possible to trace the return of all functions. As kretprobes now needs that feature too, allowing it to use function graph tracer was needed. BPF also wants to trace the return of many probes and its method doesn't scale either. Having it use function graph tracer would improve that. By allowing function graph tracer to have multiple users allows both kretprobes and BPF to use function graph tracer in these cases. This will allow kretprobes code to be removed in the future as it's version will no longer be needed. Note, function graph tracer is only limited to 16 simultaneous users, due to shadow stack size and allocated slots" * tag 'ftrace-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (49 commits) fgraph: Use str_plural() in test_graph_storage_single() function_graph: Add READ_ONCE() when accessing fgraph_array[] ftrace: Add missing kerneldoc parameters to unregister_ftrace_direct() function_graph: Everyone uses HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, remove it function_graph: Fix up ftrace_graph_ret_addr() function_graph: Make fgraph_update_pid_func() a stub for !DYNAMIC_FTRACE function_graph: Rename BYTE_NUMBER to CHAR_NUMBER in selftests fgraph: Remove some unused functions ftrace: Hide one more entry in stack trace when ftrace_pid is enabled function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled function_graph: Make fgraph_do_direct static key static ftrace: Fix prototypes for ftrace_startup/shutdown_subops() ftrace: Assign RCU list variable with rcu_assign_ptr() ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU ftrace: Declare function_trace_op in header to quiet sparse warning ftrace: Add comments to ftrace_hash_move() and friends ftrace: Convert "inc" parameter to bool in ftrace_hash_rec_update_modify() ftrace: Add comments to ftrace_hash_rec_disable/enable() ftrace: Remove "filter_hash" parameter from __ftrace_hash_rec_update() ftrace: Rename dup_hash() and comment it ...
2024-07-16Merge tag 'asm-generic-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "Most of this is part of my ongoing work to clean up the system call tables. In this bit, all of the newer architectures are converted to use the machine readable syscall.tbl format instead in place of complex macros in include/uapi/asm-generic/unistd.h. This follows an earlier series that fixed various API mismatches and in turn is used as the base for planned simplifications. The other two patches are dead code removal and a warning fix" * tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: vmlinux.lds.h: catch .bss..L* sections into BSS") fixmap: Remove unused set_fixmap_offset_io() riscv: convert to generic syscall table openrisc: convert to generic syscall table nios2: convert to generic syscall table loongarch: convert to generic syscall table hexagon: use new system call table csky: convert to generic syscall table arm64: rework compat syscall macros arm64: generate 64-bit syscall.tbl arm64: convert unistd_32.h to syscall.tbl format arc: convert to generic syscall table clone3: drop __ARCH_WANT_SYS_CLONE3 macro kbuild: add syscall table generation to scripts/Makefile.asm-headers kbuild: verify asm-generic header list loongarch: avoid generating extra header files um: don't generate asm/bpf_perf_event.h csky: drop asm/gpio.h wrapper syscalls: add generic scripts/syscall.tbl
2024-07-16Merge tag 'kvm-x86-generic-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM generic changes for 6.11 - Enable halt poll shrinking by default, as Intel found it to be a clear win. - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86. - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out(). - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs. - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout. - A few minor cleanups
2024-07-16Merge tag 'kvmarm-6.11' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.11 - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates
2024-07-15Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "The biggest part is the virtual CPU hotplug that touches ACPI, irqchip. We also have some GICv3 optimisation for pseudo-NMIs that has been queued via the arm64 tree. Otherwise the usual perf updates, kselftest, various small cleanups. Core: - Virtual CPU hotplug support for arm64 ACPI systems - cpufeature infrastructure cleanups and making the FEAT_ECBHB ID bits visible to guests - CPU errata: expand the speculative SSBS workaround to more CPUs - GICv3, use compile-time PMR values: optimise the way regular IRQs are masked/unmasked when GICv3 pseudo-NMIs are used, removing the need for a static key in fast paths by using a priority value chosen dynamically at boot time ACPI: - 'acpi=nospcr' option to disable SPCR as default console for arm64 - Move some ACPI code (cpuidle, FFH) to drivers/acpi/arm64/ Perf updates: - Rework of the IMX PMU driver to enable support for I.MX95 - Enable support for tertiary match groups in the CMN PMU driver - Initial refactoring of the CPU PMU code to prepare for the fixed instruction counter introduced by Arm v9.4 - Add missing PMU driver MODULE_DESCRIPTION() strings - Hook up DT compatibles for recent CPU PMUs Kselftest updates: - Kernel mode NEON fp-stress - Cleanups, spelling mistakes Miscellaneous: - arm64 Documentation update with a minor clarification on TBI - Fix missing IPI statistics - Implement raw_smp_processor_id() using thread_info rather than a per-CPU variable (better code generation) - Make MTE checking of in-kernel asynchronous tag faults conditional on KASAN being enabled - Minor cleanups, typos" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (69 commits) selftests: arm64: tags: remove the result script selftests: arm64: tags_test: conform test to TAP output perf: add missing MODULE_DESCRIPTION() macros arm64: smp: Fix missing IPI statistics irqchip/gic-v3: Fix 'broken_rdists' unused warning when !SMP and !ACPI ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64 Documentation: arm64: Update memory.rst for TBI arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1 KVM: arm64: Replace custom macros with fields from ID_AA64PFR0_EL1 perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h perf: arm_v6/7_pmu: Drop non-DT probe support perf/arm: Move 32-bit PMU drivers to drivers/perf/ perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold arm64: Kconfig: Fix dependencies to enable ACPI_HOTPLUG_CPU perf: imx_perf: add support for i.MX95 platform perf: imx_perf: fix counter start and config sequence perf: imx_perf: refactor driver for imx93 perf: imx_perf: let the driver manage the counter usage rather the user perf: imx_perf: add macro definitions for parsing config attr ...