summaryrefslogtreecommitdiff
path: root/arch/arm64/include
AgeCommit message (Collapse)Author
13 daysMerge tag 'mm-stable-2025-05-31-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ...
14 daysMerge tag 'gcc-minimum-version-6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull compiler version requirement update from Arnd Bergmann: "Require gcc-8 and binutils-2.30 x86 already uses gcc-8 as the minimum version, this changes all other architectures to the same version. gcc-8 is used is Debian 10 and Red Hat Enterprise Linux 8, both of which are still supported, and binutils 2.30 is the oldest corresponding version on those. Ubuntu Pro 18.04 and SUSE Linux Enterprise Server 15 both use gcc-7 as the system compiler but additionally include toolchains that remain supported. With the new minimum toolchain versions, a number of workarounds for older versions can be dropped, in particular on x86_64 and arm64. Importantly, the updated compiler version allows removing two of the five remaining gcc plugins, as support for sancov and structeak features is already included in modern compiler versions. I tried collecting the known changes that are possible based on the new toolchain version, but expect that more cleanups will be possible. Since this touches multiple architectures, I merged the patches through the asm-generic tree." * tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: Makefile.kcov: apply needed compiler option unconditionally in CFLAGS_KCOV Documentation: update binutils-2.30 version reference gcc-plugins: remove SANCOV gcc plugin Kbuild: remove structleak gcc plugin arm64: drop binutils version checks raid6: skip avx512 checks kbuild: require gcc-8 and binutils-2.30
2025-05-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "As far as x86 goes this pull request "only" includes TDX host support. Quotes are appropriate because (at 6k lines and 100+ commits) it is much bigger than the rest, which will come later this week and consists mostly of bugfixes and selftests. s390 changes will also come in the second batch. ARM: - Add large stage-2 mapping (THP) support for non-protected guests when pKVM is enabled, clawing back some performance. - Enable nested virtualisation support on systems that support it, though it is disabled by default. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. While this has no functional impact, it ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups. LoongArch: - Don't flush tlb if the host supports hardware page table walks. - Add KVM selftests support. RISC-V: - Add vector registers to get-reg-list selftest - VCPU reset related improvements - Remove scounteren initialization from VCPU reset - Support VCPU reset from userspace using set_mpstate() ioctl x86: - Initial support for TDX in KVM. This finally makes it possible to use the TDX module to run confidential guests on Intel processors. This is quite a large series, including support for private page tables (managed by the TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs to userspace, and handling several special VM exits from the TDX module. This has been in the works for literally years and it's not really possible to describe everything here, so I'll defer to the various merge commits up to and including commit 7bcf7246c42a ('Merge branch 'kvm-tdx-finish-initial' into HEAD')" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits) x86/tdx: mark tdh_vp_enter() as __flatten Documentation: virt/kvm: remove unreferenced footnote RISC-V: KVM: lock the correct mp_state during reset KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET RISC-V: KVM: Remove scounteren initialization KVM: RISC-V: remove unnecessary SBI reset state ...
2025-05-28Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The headline feature is the re-enablement of support for Arm's Scalable Matrix Extension (SME) thanks to a bumper crop of fixes from Mark Rutland. If matrices aren't your thing, then Ryan's page-table optimisation work is much more interesting. Summary: ACPI, EFI and PSCI: - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls - Fix a node refcount imbalance in the PSCI device-tree code CPU Features: - Ensure register sanitisation is applied to fields in ID_AA64MMFR4 - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code Entry code: - Hook up TIF_NEED_RESCHED_LAZY so that CONFIG_PREEMPT_LAZY can be selected Memory management: - Prevent BSS exports from being used by the early PI code - Propagate level and stride information to the low-level TLB invalidation routines when operating on hugetlb entries - Use the page-table contiguous hint for vmap() mappings with VM_ALLOW_HUGE_VMAP where possible - Optimise vmalloc()/vmap() page-table updates to use "lazy MMU mode" and hook this up on arm64 so that the trailing DSB (used to publish the updates to the hardware walker) can be deferred until the end of the mapping operation - Extend mmap() randomisation for 52-bit virtual addresses (on par with 48-bit addressing) and remove limited support for randomisation of the linear map Perf and PMUs: - Add support for probing the CMN-S3 driver using ACPI - Minor driver fixes to the CMN, Arm-NI and amlogic PMU drivers Selftests: - Fix FPSIMD and SME tests to align with the freshly re-enabled SME support - Fix default setting of the OUTPUT variable so that tests are installed in the right location vDSO: - Replace raw counter access from inline assembly code with a call to the the __arch_counter_get_cntvct() helper function Miscellaneous: - Add some missing header inclusions to the CCA headers - Rework rendering of /proc/cpuinfo to follow the x86-approach and avoid repeated buffer expansion (the user-visible format remains identical) - Remove redundant selection of CONFIG_CRC32 - Extend early error message when failing to map the device-tree blob" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits) arm64: cputype: Add cputype definition for HIP12 arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again perf/arm-cmn: Add CMN S3 ACPI binding arm64/boot: Disallow BSS exports to startup code arm64/boot: Move global CPU override variables out of BSS arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace perf/arm-cmn: Initialise cmn->cpu earlier kselftest/arm64: Set default OUTPUT path when undefined arm64: Update comment regarding values in __boot_cpu_mode arm64: mm: Drop redundant check in pmd_trans_huge() arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1 arm64/mm: Permit lazy_mmu_mode to be nested arm64/mm: Disable barrier batching in interrupt contexts arm64/cpuinfo: only show one cpu's info in c_show() arm64/mm: Batch barriers when updating kernel mappings mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes arm64/mm: Support huge pte-mapped pages in vmap mm/vmalloc: Gracefully unmap huge ptes mm/vmalloc: Warn on improper use of vunmap_range() arm64/mm: Hoist barriers out of set_ptes_anysz() loop ...
2025-05-27Merge branch 'for-next/vdso' into for-next/coreWill Deacon
* for-next/vdso: arm64: vdso: Use __arch_counter_get_cntvct()
2025-05-27Merge branch 'for-next/sme-fixes' into for-next/coreWill Deacon
* for-next/sme-fixes: (35 commits) arm64/fpsimd: Allow CONFIG_ARM64_SME to be selected arm64/fpsimd: ptrace: Gracefully handle errors arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state arm64/fpsimd: ptrace: Do not present register data for inactive mode arm64/fpsimd: ptrace: Save task state before generating SVE header arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid state arm64/fpsimd: ptrace/prctl: Ensure VL changes do not resurrect stale data arm64/fpsimd: Make clone() compatible with ZA lazy saving arm64/fpsimd: Clear PSTATE.SM during clone() arm64/fpsimd: Consistently preserve FPSIMD state during clone() arm64/fpsimd: Remove redundant task->mm check arm64/fpsimd: signal: Use SMSTOP behaviour in setup_return() arm64/fpsimd: Add task_smstop_sm() arm64/fpsimd: Factor out {sve,sme}_state_size() helpers arm64/fpsimd: Clarify sve_sync_*() functions arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE arm64/fpsimd: signal: Consistently read FPSIMD context arm64/fpsimd: signal: Mandate SVE payload for streaming-mode state arm64/fpsimd: signal: Clear PSTATE.SM when restoring FPSIMD frame only arm64/fpsimd: Do not discard modified SVE state ...
2025-05-27Merge branch 'for-next/mm' into for-next/coreWill Deacon
* for-next/mm: arm64/boot: Disallow BSS exports to startup code arm64/boot: Move global CPU override variables out of BSS arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace arm64: mm: Drop redundant check in pmd_trans_huge() arm64/mm: Permit lazy_mmu_mode to be nested arm64/mm: Disable barrier batching in interrupt contexts arm64/mm: Batch barriers when updating kernel mappings mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes arm64/mm: Support huge pte-mapped pages in vmap mm/vmalloc: Gracefully unmap huge ptes mm/vmalloc: Warn on improper use of vunmap_range() arm64/mm: Hoist barriers out of set_ptes_anysz() loop arm64: hugetlb: Use __set_ptes_anysz() and __ptep_get_and_clear_anysz() arm64/mm: Refactor __set_ptes() and __ptep_get_and_clear() mm/page_table_check: Batch-check pmds/puds just like ptes arm64: hugetlb: Refine tlb maintenance scope arm64: hugetlb: Cleanup huge_pte size discovery mechanisms arm64: pageattr: Explicitly bail out when changing permissions for vmalloc_huge mappings arm64: Support ARM64_VA_BITS=52 when setting ARCH_MMAP_RND_BITS_MAX arm64/mm: Remove randomization of the linear map
2025-05-27Merge branch 'for-next/misc' into for-next/coreWill Deacon
* for-next/misc: arm64/cpuinfo: only show one cpu's info in c_show() arm64: Extend pr_crit message on invalid FDT arm64: Kconfig: remove unnecessary selection of CRC32 arm64: Add missing includes for mem_encrypt
2025-05-27Merge branch 'for-next/entry' into for-next/coreWill Deacon
* for-next/entry: arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again arm64: Update comment regarding values in __boot_cpu_mode arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1 arm64: enable PREEMPT_LAZY
2025-05-26Merge tag 'kvmarm-6.16' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.16 * New features: - Add large stage-2 mapping support for non-protected pKVM guests, clawing back some performance. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Enable nested virtualisation support on systems that support it (yes, it has been a long time coming), though it is disabled by default. * Improvements, fixes and cleanups: - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. This ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups.
2025-05-23Merge branch kvm-arm64/misc-6.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-6.16: : . : Misc changes and improvements for 6.16: : : - Add a new selftest for the SVE host state being corrupted by a guest : : - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, : ensuring that the window for interrupts is slightly bigger, and avoiding : a pretty bad erratum on the AmpereOne HW : : - Replace a couple of open-coded on/off strings with str_on_off() : : - Get rid of the pKVM memblock sorting, which now appears to be superflous : : - Drop superflous clearing of ICH_LR_EOI in the LR when nesting : : - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from : a pretty bad case of TLB corruption unless accesses to HCR_EL2 are : heavily synchronised : : - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables : in a human-friendly fashion : . KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables arm64: errata: Work around AmpereOne's erratum AC04_CPU_23 KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1 KVM: arm64: Drop sort_memblock_regions() KVM: arm64: selftests: Add test for SVE host corruption KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode KVM: arm64: Replace ternary flags with str_on_off() helper Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/nv-nv into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-nv: : . : Flick the switch on the NV support by adding the missing piece : in the form of the VNCR page management. From the cover letter: : : "This is probably the most interesting bit of the whole NV adventure. : So far, everything else has been a walk in the park, but this one is : where the real fun takes place. : : With FEAT_NV2, most of the NV support revolves around tricking a guest : into accessing memory while it tries to access system registers. The : hypervisor's job is to handle the context switch of the actual : registers with the state in memory as needed." : . KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating KVM: arm64: Document NV caps and vcpu flags KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2* KVM: arm64: nv: Remove dead code from ERET handling KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2 KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2 KVM: arm64: nv: Handle VNCR_EL2-triggered faults KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2 KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2 KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting KVM: arm64: nv: Move TLBI range decoding to a helper KVM: arm64: nv: Snapshot S1 ASID tagging information during walk KVM: arm64: nv: Extract translation helper from the AT code KVM: arm64: nv: Allocate VNCR page when required arm64: sysreg: Add layout for VNCR_EL2 Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/fgt-masks into kvmarm-master/nextMarc Zyngier
* kvm-arm64/fgt-masks: (43 commits) : . : Large rework of the way KVM deals with trap bits in conjunction with : the CPU feature registers. It now draws a direct link between which : the feature set, the system registers that need to UNDEF to match : the configuration and bits that need to behave as RES0 or RES1 in : the trap registers that are visible to the guest. : : Best of all, these definitions are mostly automatically generated : from the JSON description published by ARM under a permissive : license. : . KVM: arm64: Handle TSB CSYNC traps KVM: arm64: Add FGT descriptors for FEAT_FGT2 KVM: arm64: Allow sysreg ranges for FGT descriptors KVM: arm64: Add context-switch for FEAT_FGT2 registers KVM: arm64: Add trap routing for FEAT_FGT2 registers KVM: arm64: Add sanitisation for FEAT_FGT2 registers KVM: arm64: Add FEAT_FGT2 registers to the VNCR page KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits KVM: arm64: Allow kvm_has_feat() to take variable arguments KVM: arm64: Use FGT feature maps to drive RES0 bits KVM: arm64: Validate FGT register descriptions against RES0 masks KVM: arm64: Switch to table-driven FGU configuration KVM: arm64: Handle PSB CSYNC traps KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask KVM: arm64: Remove hand-crafted masks for FGT registers KVM: arm64: Use computed FGT masks to setup FGT registers KVM: arm64: Propagate FGT masks to the nVHE hypervisor KVM: arm64: Unconditionally configure fine-grain traps KVM: arm64: Use computed masks as sanitisers for FGT registers ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/ubsan-el2: : . : Add UBSAN support to the EL2 portion of KVM, reusing most of the : existing logic provided by CONFIG_IBSAN_TRAP. : : Patches courtesy of Mostafa Saleh. : . KVM: arm64: Handle UBSAN faults KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2 ubsan: Remove regs from report_ubsan_failure() arm64: Introduce esr_is_ubsan_brk() Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/pkvm-np-thp-6.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm-np-thp-6.16: (21 commits) : . : Large mapping support for non-protected pKVM guests, courtesy of : Vincent Donnefort. From the cover letter: : : "This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM : np-guests, that is installing PMD-level mappings in the stage-2, : whenever the stage-1 is backed by either Hugetlbfs or THPs." : . KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: Extend pKVM selftest for np-guests KVM: arm64: Selftest for pKVM transitions KVM: arm64: Don't WARN from __pkvm_host_share_guest() KVM: arm64: Add .hyp.data section KVM: arm64: Unconditionally cross check hyp state KVM: arm64: Defer EL2 stage-1 mapping on share KVM: arm64: Move hyp state to hyp_vmemmap KVM: arm64: Introduce {get,set}_host_state() helpers KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE KVM: arm64: Fix pKVM page-tracking comments ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-22arm64: cputype: Add cputype definition for HIP12Yicong Yang
Add MIDR encoding for HiSilicon HIP12 which is used on HiSilicon HIP12 SoCs. Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250425033845.57671-2-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-22arm64: el2_setup.h: Make __init_el2_fgt labels consistent, againRob Herring (Arm)
Commit 5b39db6037e7 ("arm64: el2_setup.h: Rename some labels to be more diff-friendly") reworked the labels in __init_el2_fgt to say what's skipped rather than what the target location is. The exception was "set_fgt_" which is where registers are written. In reviewing the BRBE additions, Will suggested "set_debug_fgt_" where HDFGxTR_EL2 are written. Doing that would partially revert commit 5b39db6037e7 undoing the goal of minimizing additions here, but it would follow the convention for labels where registers are written. So let's do both. Branches that skip something go to a "skip" label and places that set registers have a "set" label. This results in some double labels, but it makes things entirely consistent. While we're here, the SME skip label was incorrectly named, so fix it. Reported-by: Will Deacon <will@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20250520-arm-brbe-v19-v22-2-c1ddde38e7f8@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-05-21KVM: arm64: np-guest CMOs with PMD_SIZE fixmapVincent Donnefort
With the introduction of stage-2 huge mappings in the pKVM hypervisor, guest pages CMO is needed for PMD_SIZE size. Fixmap only supports PAGE_SIZE and iterating over the huge-page is time consuming (mostly due to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency. Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap) to improve guest page CMOs when stage-2 huge mappings are installed. On a Pixel6, the iterative solution resulted in a latency of ~700us, while the PMD_SIZE fixmap reduces it to ~100us. Because of the horrendous private range allocation that would be necessary, this is disabled for 64KiB pages systems. Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to pkvm_mappingsQuentin Perret
In preparation for supporting stage-2 huge mappings for np-guest, add a nr_pages member for pkvm_mappings to allow EL1 to track the size of the stage-2 mapping. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-9-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Convert pkvm_mappings to interval treeQuentin Perret
In preparation for supporting stage-2 huge mappings for np-guest, let's convert pgt.pkvm_mappings to an interval tree. No functional change intended. Suggested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-8-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21Merge branch kvm-arm64/pkvm-selftest-6.16 into kvm-arm64/pkvm-np-thp-6.16Marc Zyngier
* kvm-arm64/pkvm-selftest-6.16: : . : pKVM selftests covering the memory ownership transitions by : Quentin Perret. From the initial cover letter: : : "We have recently found a bug [1] in the pKVM memory ownership : transitions by code inspection, but it could have been caught with a : test. : : Introduce a boot-time selftest exercising all the known pKVM memory : transitions and importantly checks the rejection of illegal transitions. : : The new test is hidden behind a new Kconfig option separate from : CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the : transition checks ([1] doesn't reproduce with EL2 debug enabled). : : [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/" : . KVM: arm64: Extend pKVM selftest for np-guests KVM: arm64: Selftest for pKVM transitions KVM: arm64: Don't WARN from __pkvm_host_share_guest() KVM: arm64: Add .hyp.data section Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19arm64: errata: Work around AmpereOne's erratum AC04_CPU_23D Scott Phillips
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous translations for data addresses initiated by load/store instructions. Only instruction initiated translations are vulnerable, not translations from prefetches for example. A DSB before the store to HCR_EL2 is sufficient to prevent older instructions from hitting the window for corruption, and an ISB after is sufficient to prevent younger instructions from hitting the window for corruption. Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Handle TSB CSYNC trapsMarc Zyngier
The architecture introduces a trap for TSB CSYNC that fits in the same EC as LS64 and PSB CSYNC. Let's deal with it in a similar way. It's not that we expect this to be useful any time soon anyway. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add sanitisation for FEAT_FGT2 registersMarc Zyngier
Just like the FEAT_FGT registers, treat the FGT2 variant the same way. THis is a large update, but a fairly mechanical one. The config dependencies are extracted from the 2025-03 JSON drop. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add FEAT_FGT2 registers to the VNCR pageMarc Zyngier
The FEAT_FGT2 registers are part of the VNCR page. Describe the corresponding offsets and add them to the vcpu sysreg enumeration. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Allow kvm_has_feat() to take variable argumentsMarc Zyngier
In order to be able to write more compact (and easier to read) code, let kvm_has_feat() and co take variable arguments. This enables constructs such as: #define FEAT_SME ID_AA64PFR1_EL1, SME, IMP if (kvm_has_feat(kvm, FEAT_SME)) [...] which is admitedly more readable. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Use FGT feature maps to drive RES0 bitsMarc Zyngier
Another benefit of mapping bits to features is that it becomes trivial to define which bits should be handled as RES0. Let's apply this principle to the guest's view of the FGT registers. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*Marc Zyngier
Since we're (almost) feature complete, let's allow userspace to request KVM_ARM_VCPU_EL2* by bumping KVM_VCPU_MAX_FEATURES up. We also now advertise the features to userspace with new capabilities. It's going to be great... Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250514103501.2225951-17-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2Marc Zyngier
A TLBI by VA for S1 must take effect on our pseudo-TLB for VNCR and potentially knock the fixmap mapping. Even worse, that TLBI must be able to work cross-vcpu. For that, we track on a per-VM basis if any VNCR is mapped, using an atomic counter. Whenever a TLBI S1E2 occurs and that this counter is non-zero, we take the long road all the way back to the core code. There, we iterate over all vcpus and check whether this particular invalidation has any damaging effect. If it does, we nuke the pseudo TLB and the corresponding fixmap. Yes, this is costly. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-14-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2Marc Zyngier
Now that we can handle faults triggered through VNCR_EL2, we need to map the corresponding page at EL2. But where, you'll ask? Since each CPU in the system can run a vcpu, we need a per-CPU mapping. For that, we carve a NR_CPUS range in the fixmap, giving us a per-CPU va at which to map the guest's VNCR's page. The mapping occurs both on vcpu load and on the back of a fault, both generating a request that will take care of the mapping. That mapping will also get dropped on vcpu put. Yes, this is a bit heavy handed, but it is simple. Eventually, we may want to have a per-VM, per-CPU mapping, which would avoid all the TLBI overhead. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-11-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Handle VNCR_EL2-triggered faultsMarc Zyngier
As VNCR_EL2.BADDR contains a VA, it is bound to trigger faults. These faults can have multiple source: - We haven't mapped anything on the host: we need to compute the resulting translation, populate a TLB, and eventually map the corresponding page - The permissions are out of whack: we need to tell the guest about this state of affairs Note that the kernel doesn't support S1POE for itself yet, so the particular case of a VNCR page mapped with no permissions or with write-only permissions is not correctly handled yet. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-10-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2Marc Zyngier
Plug VNCR_EL2 in the vcpu_sysreg enum, define its RES0/RES1 bits, and make it accessible to userspace when the VM is configured to support FEAT_NV2. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2Marc Zyngier
FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR is a virtual address in the EL2&0 (or EL2, but we thankfully ignore this) translation regime. As we need to replicate such mapping in the real EL2, it means that we need to remember that there is such a translation, and that any TLBI affecting EL2 can possibly affect this translation. It also means that any invalidation driven by an MMU notifier must be able to shoot down any such mapping. All in all, we need a data structure that represents this mapping, and that is extremely close to a TLB. Given that we can only use one of those per vcpu at any given time, we only allocate one. No effort is made to keep that structure small. If we need to start caching multiple of them, we may want to revisit that design point. But for now, it is kept simple so that we can reason about it. Oh, and add a braindump of how things are supposed to work, because I will definitely page this out at some point. Yes, pun intended. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nestingMarc Zyngier
We currently check for HCR_EL2.NV being set to decide whether we need to repaint PSTATE.M to say EL2 instead of EL1 on exit. However, this isn't correct when L2 is itself a hypervisor, and that L1 as set its own HCR_EL2.NV. That's because we "flatten" the state and inherit parts of the guest's own setup. In that case, we shouldn't adjust PSTATE.M, as this is really EL1 for both us and the guest. Instead of trying to try and work out how we ended-up with HCR_EL2.NV being set by introspecting both the host and guest states, use a per-CPU flag to remember the context (HYP or not), and use that information to decide whether PSTATE needs tweaking. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Move TLBI range decoding to a helperMarc Zyngier
As we are about to expand out TLB invalidation capabilities to support recursive virtualisation, move the decoding of a TLBI by range into a helper that returns the base, the range and the ASID. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Snapshot S1 ASID tagging information during walkMarc Zyngier
We currently completely ignore any sort of ASID tagging during a S1 walk, as AT doesn't care about it. However, such information is required if we are going to create anything that looks like a TLB from this walk. Let's capture it both the nG and ASID information while walking the page tables. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Extract translation helper from the AT codeMarc Zyngier
The address translation infrastructure is currently pretty tied to the AT emulation. However, we also need to features that require the use of VAs, such as VNCR_EL2 (and maybe one of these days SPE), meaning that we need a slightly more generic infrastructure. Start this by introducing a new helper (__kvm_translate_va()) that performs a S1 walk for a given translation regime, EL and PAN settings. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19arm64: sysreg: Add layout for VNCR_EL2Marc Zyngier
Now that we're about to emulate VNCR_EL2, we need its full layout. Add it to the sysreg file. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-16arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespaceArd Biesheuvel
init_pgdir[] is only referenced from the startup code, but lives after BSS in the linker map. Before tightening the rules about accessing BSS from startup code, move init_pgdir[] into the __pi_ namespace, so it does not need to be exported explicitly. For symmetry, do the same with init_idmap_pgdir[], although it lives before BSS. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Link: https://lore.kernel.org/r/20250508114328.2460610-6-ardb+git@google.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-16arm64: Update comment regarding values in __boot_cpu_modeBen Horgan
The values stored in __boot_cpu_mode were changed without updating the comment. Rectify that. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Link: https://lore.kernel.org/r/20250513124525.677736-1-ben.horgan@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-16arm64: mm: Drop redundant check in pmd_trans_huge()Gavin Shan
pmd_val(pmd) is redundant because a positive pmd_present(pmd) ensures a positive pmd_val(pmd) according to their definitions like below. #define pmd_val(x) ((x).pmd) #define pmd_present(pmd) pte_present(pmd_pte(pmd)) #define pte_present(pte) (pte_valid(pte) || pte_present_invalid(pte)) #define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID)) #define pte_present_invalid(pte) \ ((pte_val(pte) & (PTE_VALID | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID) pte_present() can't be positive unless either of the flag PTE_VALID or PTE_PRESENT_INVALID is set. In this case, pmd_val(pmd) should be positive either. So lets drop the redundant check pmd_val(pmd) and no functional changes intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250508085251.204282-1-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-14arm64/mm: Permit lazy_mmu_mode to be nestedRyan Roberts
lazy_mmu_mode is not supposed to permit nesting. But in practice this does happen with CONFIG_DEBUG_PAGEALLOC, where a page allocation inside a lazy_mmu_mode section (such as zap_pte_range()) will change permissions on the linear map with apply_to_page_range(), which re-enters lazy_mmu_mode (see stack trace below). The warning checking that nesting was not happening was previously being triggered due to this. So let's relax by removing the warning and tolerate nesting in the arm64 implementation. The first (inner) call to arch_leave_lazy_mmu_mode() will flush and clear the flag such that the remainder of the work in the outer nest behaves as if outside of lazy mmu mode. This is safe and keeps tracking simple. Code review suggests powerpc deals with this issue in the same way. ------------[ cut here ]------------ WARNING: CPU: 6 PID: 1 at arch/arm64/include/asm/pgtable.h:89 __apply_to_page_range+0x85c/0x9f8 Modules linked in: ip_tables x_tables ipv6 CPU: 6 UID: 0 PID: 1 Comm: systemd Not tainted 6.15.0-rc5-00075-g676795fe9cf6 #1 PREEMPT Hardware name: QEMU KVM Virtual Machine, BIOS 2024.08-4 10/25/2024 pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __apply_to_page_range+0x85c/0x9f8 lr : __apply_to_page_range+0x2b4/0x9f8 sp : ffff80008009b3c0 x29: ffff80008009b460 x28: ffff0000c43a3000 x27: ffff0001ff62b108 x26: ffff0000c43a4000 x25: 0000000000000001 x24: 0010000000000001 x23: ffffbf24c9c209c0 x22: ffff80008009b4d0 x21: ffffbf24c74a3b20 x20: ffff0000c43a3000 x19: ffff0001ff609d18 x18: 0000000000000001 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000003 x14: 0000000000000028 x13: ffffbf24c97c1000 x12: ffff0000c43a3fff x11: ffffbf24cacc9a70 x10: ffff0000c43a3fff x9 : ffff0001fffff018 x8 : 0000000000000012 x7 : ffff0000c43a4000 x6 : ffff0000c43a4000 x5 : ffffbf24c9c209c0 x4 : ffff0000c43a3fff x3 : ffff0001ff609000 x2 : 0000000000000d18 x1 : ffff0000c03e8000 x0 : 0000000080000000 Call trace: __apply_to_page_range+0x85c/0x9f8 (P) apply_to_page_range+0x14/0x20 set_memory_valid+0x5c/0xd8 __kernel_map_pages+0x84/0xc0 get_page_from_freelist+0x1110/0x1340 __alloc_frozen_pages_noprof+0x114/0x1178 alloc_pages_mpol+0xb8/0x1d0 alloc_frozen_pages_noprof+0x48/0xc0 alloc_pages_noprof+0x10/0x60 get_free_pages_noprof+0x14/0x90 __tlb_remove_folio_pages_size.isra.0+0xe4/0x140 __tlb_remove_folio_pages+0x10/0x20 unmap_page_range+0xa1c/0x14c0 unmap_single_vma.isra.0+0x48/0x90 unmap_vmas+0xe0/0x200 vms_clear_ptes+0xf4/0x140 vms_complete_munmap_vmas+0x7c/0x208 do_vmi_align_munmap+0x180/0x1a8 do_vmi_munmap+0xac/0x188 __vm_munmap+0xe0/0x1e0 __arm64_sys_munmap+0x20/0x38 invoke_syscall+0x48/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x4c/0x16c el0t_64_sync_handler+0x10c/0x140 el0t_64_sync+0x198/0x19c irq event stamp: 281312 hardirqs last enabled at (281311): [<ffffbf24c780fd04>] bad_range+0x164/0x1c0 hardirqs last disabled at (281312): [<ffffbf24c89c4550>] el1_dbg+0x24/0x98 softirqs last enabled at (281054): [<ffffbf24c752d99c>] handle_softirqs+0x4cc/0x518 softirqs last disabled at (281019): [<ffffbf24c7450694>] __do_softirq+0x14/0x20 ---[ end trace 0000000000000000 ]--- Fixes: 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings") Reported-by: Catalin Marinas <catalin.marinas@arm.com> Closes: https://lore.kernel.org/linux-arm-kernel/aCH0TLRQslXHin5Q@arm.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250512150333.5589-1-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-14arm64/mm: Disable barrier batching in interrupt contextsRyan Roberts
Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF flags in order to defer barriers until exiting the mode. At the same time, it added warnings to check that pte manipulations were never performed in interrupt context, because the tracking implementation could not deal with nesting. But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC) do manipulate ptes in softirq context, which triggered the warnings. So let's take the simplest and safest route and disable the batching optimization in interrupt contexts. This makes these users no worse off than prior to the optimization. Additionally the known offenders are debug features that only manipulate a single PTE, so there is no performance gain anyway. There may be some obscure case of encrypted/decrypted DMA with the dma_free_coherent called from an interrupt context, but again, this is no worse off than prior to the commit. Some options for supporting nesting were considered, but there is a difficult to solve problem if any code manipulates ptes within interrupt context but *outside of* a lazy mmu region. If this case exists, the code would expect the updates to be immediate, but because the task context may have already been in lazy mmu mode, the updates would be deferred, which could cause incorrect behaviour. This problem is avoided by always ensuring updates within interrupt context are immediate. Fixes: 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings") Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-arm-kernel/681f2a09.050a0220.f2294.0006.GAE@google.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250512102242.4156463-1-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-11arm64/mm: define ptdesc_tAnshuman Khandual
Define ptdesc_t type which describes the basic page table descriptor layout on arm64 platform. Subsequently all level specific pxxval_t descriptors are derived from ptdesc_t thus establishing a common original format, which can also be appropriate for page table entries, masks and protection values etc which are used at all page table levels. Link: https://lkml.kernel.org/r/20250407053113.746295-4-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm/ptdump: split note_page() into level specific callbacksAnshuman Khandual
Patch series "mm/ptdump: Drop assumption that pxd_val() is u64", v2. Last argument passed down in note_page() is u64 assuming pxd_val() returned value (all page table levels) is 64 bit - which might not be the case going ahead when D128 page tables is enabled on arm64 platform. Besides pxd_val() is very platform specific and its type should not be assumed in generic MM. A similar problem exists for effective_prot(), although it is restricted to x86 platform. This series splits note_page() and effective_prot() into individual page table level specific callbacks which accepts corresponding pxd_t page table entry as an argument instead and later on all subscribing platforms could derive pxd_val() from the table entries as required and proceed as before. Define ptdesc_t type which describes the basic page table descriptor layout on arm64 platform. Subsequently all level specific pxxval_t descriptors are derived from ptdesc_t thus establishing a common original format, which can also be appropriate for page table entries, masks and protection values etc which are used at all page table levels. This patch (of 3): Last argument passed down in note_page() is u64 assuming pxd_val() returned value (all page table levels) is 64 bit - which might not be the case going ahead when D128 page tables is enabled on arm64 platform. Besides pxd_val() is very platform specific and its type should not be assumed in generic MM. Split note_page() into individual page table level specific callbacks which accepts corresponding pxd_t argument instead and then subscribing platforms just derive pxd_val() from the entries as required and proceed as earlier. Also add a note_page_flush() callback for flushing the last page table page that was being handled earlier via level = -1. Link: https://lkml.kernel.org/r/20250407053113.746295-1-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/20250407053113.746295-2-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11syscall.h: introduce syscall_set_nr()Dmitry V. Levin
Similar to syscall_set_arguments() that complements syscall_get_arguments(), introduce syscall_set_nr() that complements syscall_get_nr(). syscall_set_nr() is going to be needed along with syscall_set_arguments() on all HAVE_ARCH_TRACEHOOK architectures to implement PTRACE_SET_SYSCALL_INFO API. Link: https://lkml.kernel.org/r/20250303112020.GD24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> # mips Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexey Gladkov (Intel) <legion@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11syscall.h: add syscall_set_arguments()Dmitry V. Levin
This function is going to be needed on all HAVE_ARCH_TRACEHOOK architectures to implement PTRACE_SET_SYSCALL_INFO API. This partially reverts commit 7962c2eddbfe ("arch: remove unused function syscall_set_arguments()") by reusing some of old syscall_set_arguments() implementations. [nathan@kernel.org: fix compile time fortify checks] Link: https://lkml.kernel.org/r/20250408213131.GA2872426@ax162 Link: https://lkml.kernel.org/r/20250303112009.GC24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> [mips] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexey Gladkov (Intel) <legion@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11arch: remove mk_pmd()Matthew Wilcox (Oracle)
There are now no callers of mk_huge_pmd() and mk_pmd(). Remove them. Link: https://lkml.kernel.org/r/20250402181709.2386022-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm: introduce a common definition of mk_pte()Matthew Wilcox (Oracle)
Most architectures simply call pfn_pte(). Centralise that as the normal definition and remove the definition of mk_pte() from the architectures which have either that exact definition or something similar. Link: https://lkml.kernel.org/r/20250402181709.2386022-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Cc: Zi Yan <ziy@nvidia.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11Merge tag 'arm64_cbpf_mitigation_2025_05_08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 cBPF BHB mitigation from James Morse: "This adds the BHB mitigation into the code JITted for cBPF programs as these can be loaded by unprivileged users via features like seccomp. The existing mechanisms to disable the BHB mitigation will also prevent the mitigation being JITted. In addition, cBPF programs loaded by processes with the SYS_ADMIN capability are not mitigated as these could equally load an eBPF program that does the same thing. For good measure, the list of 'k' values for CPU's local mitigations is updated from the version on arm's website" * tag 'arm64_cbpf_mitigation_2025_05_08' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: proton-pack: Add new CPUs 'k' values for branch mitigation arm64: bpf: Only mitigate cBPF programs loaded by unprivileged users arm64: bpf: Add BHB mitigation to the epilogue for cBPF programs arm64: proton-pack: Expose whether the branchy loop k value arm64: proton-pack: Expose whether the platform is mitigated by firmware arm64: insn: Add support for encoding DSB