path: root/arch
Age  Commit message  Author
2025-07-10  KVM: x86: Rename lapic get/set_reg() helpers  (Neeraj Upadhyay)
In preparation for moving kvm-internal __kvm_lapic_set_reg(), __kvm_lapic_get_reg() to apic.h for use in Secure AVIC APIC driver, rename them as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-8-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10  KVM: x86: Rename find_highest_vector()  (Neeraj Upadhyay)
In preparation for moving kvm-internal find_highest_vector() to apic.h for use in Secure AVIC APIC driver, rename find_highest_vector() to apic_find_highest_vector() as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-7-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10  KVM: x86: Change lapic regs base address to void pointer  (Neeraj Upadhyay)
Change the APIC base address from "char *" to "void *" in KVM lapic's set/get helper functions. Pointer arithmetic on "void *" and "char *" operates identically, but with "void *" there is less chance of doing the wrong thing, e.g. neglecting to cast and reading a byte instead of the desired APIC register size. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-6-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
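For illustration, a minimal userspace sketch of the pattern (the helper names and bodies here are assumptions, not KVM's actual code):

    #include <stdint.h>
    #include <string.h>

    /*
     * Hypothetical stand-ins for the lapic get/set helpers: with a
     * "void *" base, pointer arithmetic still advances one byte at a
     * time (a GNU C extension the kernel relies on), but the access
     * width is pinned to u32 by the typed copy, so a forgotten cast
     * can't silently turn a register read into a byte read.
     */
    static inline uint32_t apic_get_reg(void *regs, int reg_off)
    {
            uint32_t val;

            memcpy(&val, regs + reg_off, sizeof(val));
            return val;
    }

    static inline void apic_set_reg(void *regs, int reg_off, uint32_t val)
    {
            memcpy(regs + reg_off, &val, sizeof(val));
    }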
2025-07-10  KVM: x86: Rename VEC_POS/REG_POS macro usages  (Neeraj Upadhyay)
In preparation for moving most of the KVM's lapic helpers which use VEC_POS/REG_POS macros to common APIC header for use in Secure AVIC APIC driver, rename all VEC_POS/REG_POS macro usages to APIC_VECTOR_TO_BIT_NUMBER/APIC_VECTOR_TO_REG_OFFSET and remove VEC_POS/REG_POS. While at it, clean up line wrap in find_highest_vector(). No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-5-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10  x86/apic: KVM: Deduplicate APIC vector => register+bit math  (Sean Christopherson)
Consolidate KVM's {REG,VEC}_POS() macros and lapic_vector_set_in_irr()'s open-coded equivalent logic in anticipation of the kernel gaining more usage of vector => reg+bit lookups.

Use lapic_vector_set_in_irr()'s math, as using divides for both the bit number and register offset makes it easier to connect the dots, and for at least one user, fixup_irqs(), "/ 32 * 0x10" generates ever so slightly better code with gcc-14 (shaves a whole 3 bytes from the code stream):

  ((v) >> 5) << 4:
    c1 ef 05                shr    $0x5,%edi
    c1 e7 04                shl    $0x4,%edi
    81 c7 00 02 00 00       add    $0x200,%edi

  (v) / 32 * 0x10:
    c1 ef 05                shr    $0x5,%edi
    83 c7 20                add    $0x20,%edi
    c1 e7 04                shl    $0x4,%edi

Keep KVM's tersely named macros as "wrappers" to avoid unnecessary churn in KVM, and because the shorter names yield more readable code overall in KVM.

The new macros cast the vector parameter to "unsigned int". This is required for better code generation in cases where an "int" is passed to these macros in KVM code:

  int v;

  ((v) >> 5) << 4:
    c1 f8 05                sar    $0x5,%eax
    c1 e0 04                shl    $0x4,%eax

  ((v) / 32 * 0x10):
    85 ff                   test   %edi,%edi
    8d 47 1f                lea    0x1f(%rdi),%eax
    0f 49 c7                cmovns %edi,%eax
    c1 f8 05                sar    $0x5,%eax
    c1 e0 04                shl    $0x4,%eax

  ((unsigned int)(v) / 32 * 0x10):
    c1 f8 05                sar    $0x5,%eax
    c1 e0 04                shl    $0x4,%eax

  (v) & (32 - 1):
    89 f8                   mov    %edi,%eax
    83 e0 1f                and    $0x1f,%eax

  (v) % 32:
    89 fa                   mov    %edi,%edx
    c1 fa 1f                sar    $0x1f,%edx
    c1 ea 1b                shr    $0x1b,%edx
    8d 04 17                lea    (%rdi,%rdx,1),%eax
    83 e0 1f                and    $0x1f,%eax
    29 d0                   sub    %edx,%eax

  (unsigned int)(v) % 32:
    89 f8                   mov    %edi,%eax
    83 e0 1f                and    $0x1f,%eax

Overall kvm.ko text size is impacted if "unsigned int" is not used:

  Bin       Orig      New (w/o unsigned int)    New (w/ unsigned int)
  lapic.o   28580     28772                     28580
  kvm.o     670810    671002                    670810
  kvm.ko    708079    708271                    708079

No functional change intended.

[Neeraj: Type cast vec macro param to "unsigned int", provide data in commit log on "unsigned int" requirement]

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250709033242.267892-4-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
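As a concrete illustration, the consolidated math can be exercised in a standalone program (macro bodies reconstructed from the commit message; the kernel's exact definitions may differ):

    #include <assert.h>

    /*
     * Vector => register+bit math as described above. The "unsigned int"
     * cast lets the compiler emit logical shifts/masks even when callers
     * pass a plain (signed) int.
     */
    #define APIC_VECTOR_TO_REG_OFFSET(v)   (((unsigned int)(v) / 32) * 0x10)
    #define APIC_VECTOR_TO_BIT_NUMBER(v)   ((unsigned int)(v) % 32)

    int main(void)
    {
            /* Vector 0x31 (49) lives in the second 32-bit bank, bit 17. */
            assert(APIC_VECTOR_TO_REG_OFFSET(0x31) == 0x10);
            assert(APIC_VECTOR_TO_BIT_NUMBER(0x31) == 0x11);
            return 0;
    }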
2025-07-10  KVM: x86: Remove redundant parentheses around 'bitmap'  (Neeraj Upadhyay)
When doing pointer arithmetic in apic_test_vector() and kvm_lapic_{set|clear}_vector(), remove the unnecessary parentheses surrounding the 'bitmap' parameter. No functional change intended. Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250709033242.267892-3-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10  KVM: x86: Open code setting/clearing of bits in the ISR  (Neeraj Upadhyay)
Remove __apic_test_and_set_vector() and __apic_test_and_clear_vector(), and open code the equivalent operations on the ISR, because the _only_ register that's safe to modify with a non-atomic operation is ISR: KVM isn't running the vCPU, i.e. hardware can't service an IRQ or process an EOI for the relevant (virtual) APIC. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-2-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
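A hedged sketch of what such an open-coded, non-atomic ISR update looks like (the helper name and body are assumptions; 0x100 is the architectural offset of the ISR in the APIC register page):

    #include <stdint.h>

    #define APIC_ISR                       0x100  /* ISR base offset */
    #define APIC_VECTOR_TO_REG_OFFSET(v)   (((unsigned int)(v) / 32) * 0x10)
    #define APIC_VECTOR_TO_BIT_NUMBER(v)   ((unsigned int)(v) % 32)

    /* Safe only while the vCPU isn't running: no hardware IRQ/EOI can race. */
    static inline void apic_set_isr_bit(void *regs, int vec)
    {
            uint32_t *reg = (uint32_t *)((char *)regs + APIC_ISR +
                                         APIC_VECTOR_TO_REG_OFFSET(vec));

            *reg |= 1u << APIC_VECTOR_TO_BIT_NUMBER(vec); /* non-atomic RMW */
    }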
2025-07-10  KVM: SEV: Prefer WBNOINVD over WBINVD for cache maintenance efficiency  (Kevin Loughlin)
AMD CPUs currently execute WBINVD in the host when unregistering SEV guest memory or when deactivating SEV guests. Such cache maintenance is performed to prevent data corruption, wherein the encrypted (C=1) version of a dirty cache line might otherwise only be written back after the memory is written in a different context (e.g. C=0), yielding corruption. However, WBINVD is performance-costly, especially because it invalidates processor caches. Strictly speaking, unless the SEV ASID is being recycled (meaning the SNP firmware requires the use of WBINVD prior to DF_FLUSH), the cache invalidation triggered by WBINVD is unnecessary; only the writeback is needed to prevent data corruption in the remaining scenarios. To improve performance in these scenarios, use WBNOINVD when available instead of WBINVD. WBNOINVD still writes back all dirty lines (preventing host data corruption by SEV guests) but does *not* invalidate processor caches. Note that the implementation of wbnoinvd() ensures a fallback to WBINVD if WBNOINVD is unavailable. In anticipation of forthcoming optimizations to limit the WBNOINVD only to physical CPUs that have executed SEV guests, place the call to wbnoinvd_on_all_cpus() in a wrapper function sev_writeback_caches(). Signed-off-by: Kevin Loughlin <kevinloughlin@google.com> Reviewed-by: Mingwei Zhang <mizhang@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250201000259.3289143-3-kevinloughlin@google.com [sean: tweak comment regarding CLFLUSH] Cc: Francesco Lavra <francescolavra.fl@gmail.com> Link: https://lore.kernel.org/r/20250522233733.3176144-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
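A minimal sketch of the wrapper described above (the name sev_writeback_caches() comes from the commit message; the body is an assumption):

    /* Write back, but don't invalidate, dirty lines on all CPUs. */
    static void sev_writeback_caches(void)
    {
            /*
             * Only the writeback is needed to prevent C-bit aliasing
             * corruption; invalidation is required only when the ASID is
             * recycled, which the SNP firmware handles via DF_FLUSH.
             * wbnoinvd() internally falls back to WBINVD if WBNOINVD is
             * unsupported.
             */
            wbnoinvd_on_all_cpus();
    }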
2025-07-10  KVM: SVM: Remove wbinvd in sev_vm_destroy()  (Zheyun Shen)
Before sev_vm_destroy() is called, kvm_arch_guest_memory_reclaimed() has been called for SEV and SEV-ES and kvm_arch_gmem_invalidate() has been called for SEV-SNP. These functions have already handled flushing the memory. Therefore, this wbinvd_on_all_cpus() can simply be dropped. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250522233733.3176144-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10  KVM: x86: Use wbinvd_on_cpu() instead of an open-coded equivalent  (Sean Christopherson)
Use wbinvd_on_cpu() to target a single CPU instead of open-coding an equivalent, and drop KVM's wbinvd_ipi() now that all users have switched to x86 library versions. No functional change intended. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250522233733.3176144-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)
Pull KVM fixes from Paolo Bonzini:
 "Many patches, pretty much all of them small, that accumulated while I was on vacation.

  ARM:

   - Remove the last leftovers of the ill-fated FPSIMD host state mapping at EL2 stage-1

   - Fix unexpected advertisement to the guest of unimplemented S2 base granule sizes

   - Gracefully fail initialising pKVM if the interrupt controller isn't GICv3

   - Also gracefully fail initialising pKVM if the carveout allocation fails

   - Fix the computing of the minimum MMIO range required for the host on stage-2 fault

   - Fix the generation of the GICv3 Maintenance Interrupt in nested mode

  x86:

   - Reject SEV{-ES} intra-host migration if one or more vCPUs are actively being created, so as not to create a non-SEV{-ES} vCPU in an SEV{-ES} VM

   - Use a pre-allocated, per-vCPU buffer for handling de-sparsification of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too large" issue

   - Allow out-of-range/invalid Xen event channel ports when configuring IRQ routing, to avoid dictating a specific ioctl() ordering to userspace

   - Conditionally reschedule when setting memory attributes to avoid soft lockups when userspace converts huge swaths of memory to/from private

   - Add back MWAIT as a required feature for the MONITOR/MWAIT selftest

   - Add a missing field in struct sev_data_snp_launch_start that resulted in the guest-visible workarounds field being filled at the wrong offset

   - Skip non-canonical address when processing Hyper-V PV TLB flushes to avoid VM-Fail on INVVPID

   - Advertise supported TDX TDVMCALLs to userspace

   - Pass SetupEventNotifyInterrupt arguments to userspace

   - Fix TSC frequency underflow"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: avoid underflow when scaling TSC frequency
  KVM: arm64: Remove kvm_arch_vcpu_run_map_fp()
  KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes
  KVM: arm64: Don't free hyp pages with pKVM on GICv2
  KVM: arm64: Fix error path in init_hyp_mode()
  KVM: arm64: Adjust range correctly during host stage-2 faults
  KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi()
  KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush
  KVM: SVM: Add missing member in SNP_LAUNCH_START command structure
  Documentation: KVM: Fix unexpected unindent warnings
  KVM: selftests: Add back the missing check of MONITOR/MWAIT availability
  KVM: Allow CPU to reschedule while setting per-page memory attributes
  KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table.
  KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks
  KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL
  KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight
  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
  KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
2025-07-10  arm64: dts: st: remove empty line in stm32mp251.dtsi  (Patrick Delaunay)
Remove unnecessary empty line in stm32mp251.dtsi Signed-off-by: Patrick Delaunay <patrick.delaunay@foss.st.com> Link: https://lore.kernel.org/r/20250515151238.2.Ia426b4ef1d1200247a950ef9abd54a94dc520acb@changeid Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-07-10  arm64: dts: st: fix timer used for ticks  (Patrick Delaunay)
Remove the always-on property from the generic ARM timer, as the clock source provided by STGEN is deactivated in low power modes, for example STOP1. Fixes: 5d30d03aaf78 ("arm64: dts: st: introduce stm32mp25 SoCs family") Signed-off-by: Patrick Delaunay <patrick.delaunay@foss.st.com> Link: https://lore.kernel.org/r/20250515151238.1.I85271ddb811a7cf73532fec90de7281cb24ce260@changeid Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-07-10  s390/boot: Introduce jump_to_kernel() function  (Ilya Leoshkevich)
Introduce a global function that jumps from the decompressor to the decompressed kernel. Put its address into svc_old_psw, from where GDB can take it without loading decompressor symbols. It should be available throughout the entire decompressor execution, because it's placed there statically, and nothing in the decompressor uses the SVC instruction. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250625154220.75300-2-iii@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2025-07-10  s390/stp: Remove udelay from stp_sync_clock()  (Sven Schnelle)
When an stp sync check is handled on a system with multiple CPUs, each CPU gets a machine check, but only the first one actually handles the sync operation. All other CPUs spin waiting for the first one to finish, with a short udelay(). But udelay() can't be used here, because the first CPU modifies tod_clock_base before performing the sync op; during this timeframe get_tod_clock_monotonic() might return a non-monotonic time. The time spent waiting should be very short, and udelay() is a busy loop anyway, so simply remove the udelay. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2025-07-10  x86/lib: Add WBINVD and WBNOINVD helpers to target multiple CPUs  (Zheyun Shen)
Extract KVM's open-coded calls that write back caches on multiple CPUs into common library helpers for both WBINVD and WBNOINVD (KVM will use both). Put the onus on the caller to check for a non-empty mask to simplify the SMP=n implementation, e.g. so that it doesn't need to check that the one and only CPU in the system is present in the mask. [sean: move to lib, add SMP=n helpers, clarify usage] Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250128015345.7929-2-szy0127@sjtu.edu.cn Link: https://lore.kernel.org/20250522233733.3176144-5-seanjc@google.com
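A sketch of how the SMP=n simplification might look (the helper names are assumptions based on the commit message):

    #ifndef CONFIG_SMP
    /*
     * With a single possible CPU, a non-empty mask can only contain CPU 0,
     * so the caller's "mask is non-empty" check is all that's needed and
     * the stubs never have to consult the mask.
     */
    static inline void wbinvd_on_cpus_mask(struct cpumask *cpus)
    {
            wbinvd();
    }

    static inline void wbnoinvd_on_cpus_mask(struct cpumask *cpus)
    {
            wbnoinvd();
    }
    #endif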
2025-07-10  x86/lib: Add WBNOINVD helper functions  (Kevin Loughlin)
In line with WBINVD usage, add WBNOINVD helper functions. Explicitly fall back to WBINVD (via alternative()) if WBNOINVD isn't supported even though the instruction itself is backwards compatible (WBNOINVD is WBINVD with an ignored REP prefix), so that disabling X86_FEATURE_WBNOINVD behaves as one would expect, e.g. in case there's a hardware issue that affects WBNOINVD. Opportunistically, add comments explaining the architectural behavior of WBINVD and WBNOINVD, and provide hints and pointers to uarch-specific behavior. Note, alternative() ensures compatibility with early boot code as needed. [ bp: Massage, fix typos, make export _GPL. ] Signed-off-by: Kevin Loughlin <kevinloughlin@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/20250522233733.3176144-4-seanjc@google.com
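A sketch of such a helper (the kernel's actual definition may differ in detail):

    /*
     * WBNOINVD is encoded as WBINVD with an ignored REP (0xf3) prefix, so
     * old CPUs execute it as plain WBINVD. The alternative() rewrite still
     * keys off X86_FEATURE_WBNOINVD so that clearing the feature flag
     * genuinely disables WBNOINVD, e.g. to work around a hardware issue.
     */
    static __always_inline void wbnoinvd(void)
    {
            asm volatile(ALTERNATIVE("wbinvd", "wbnoinvd",
                                     X86_FEATURE_WBNOINVD)
                         : : : "memory");
    }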
2025-07-10  x86/lib: Drop the unused return value from wbinvd_on_all_cpus()  (Sean Christopherson)
Drop wbinvd_on_all_cpus()'s return value; both the "real" version and the stub always return '0', and none of the callers check the return. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250522233733.3176144-3-seanjc@google.com
2025-07-10  arm64: dts: rockchip: Enable HDMI receiver on RK3588 EVB1  (Sebastian Reichel)
Enable HDMI input port of the RK3588 EVB1. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://lore.kernel.org/r/20250704-rk3588-evb1-hdmi-rx-v1-1-248315c36ccd@kernel.org Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-07-10  arm64: dts: rockchip: fix PHY handling for ROCK 4D  (Sebastian Reichel)
Old revisions of the ROCK 4D board have a dedicated crystal to supply the RTL8211F PHY's 25MHz clock input. At least some newer revisions instead use REFCLKO25M_GMAC0_OUT. The DT already has this half-prepared, but there are some issues:

1. The DT relies on auto-selecting the right PHY driver, which requires that the PHY works well enough to read the ID registers. This does not work without the clock, which is handled by the PHY driver. Fix this by updating the compatible to contain the RTL8211F IDs, so that the operating system can choose the right PHY driver without relying on a pre-powered PHY.

2. Despite its name, REFCLKO25M_GMAC0_OUT could also provide a different frequency, so ensure it is explicitly set to 25 MHz as expected by the PHY.

3. While at it, switch from the deprecated "enable-gpio" to the standard "enable-gpios".

Fixes: a0fb7eca9c09 ("arm64: dts: rockchip: Add Radxa ROCK 4D device tree") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://lore.kernel.org/r/20250704-rk3576-rock4d-phy-handling-fixes-v1-1-1d64130c4139@kernel.org Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-07-10  arm64: dts: rockchip: Enable mipi dsi on rk3568-evb1-v10  (Andy Yan)
Enable the w552793baa 1080x1920 dsi panel on rk3568 evb1. Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Link: https://lore.kernel.org/r/20250706113831.330799-1-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-07-10  arm64: dts: rockchip: Add UFS support on the ROCK 4D  (Detlev Casanova)
This device supports removable UFS chips; add support for them. Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com> Link: https://lore.kernel.org/r/20250708155010.401446-1-detlev.casanova@collabora.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-07-09  alpha: replace sprintf()/strcpy() with scnprintf()/strscpy()  (Thorsten Blum)
Replace sprintf() with the safer variant scnprintf() and use its return value instead of calculating the string length again using strlen(). Use strscpy() instead of the deprecated strcpy(). No functional changes intended. Link: https://github.com/KSPP/linux/issues/88 Link: https://lkml.kernel.org/r/20250521121840.5653-1-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: guoweikang <guoweikang.kernel@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
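The conversion pattern, sketched with the kernel string APIs (the function and buffer below are illustrative, not the actual alpha code):

    static char buf[64];

    static int describe_cpu(const char *model, int rev)
    {
            /* scnprintf() returns the bytes written, so no strlen() pass */
            int len = scnprintf(buf, sizeof(buf), "%s rev %d", model, rev);

            /* strscpy() is the bounded replacement for strcpy() */
            strscpy(buf + len, " (alpha)", sizeof(buf) - len);
            return len;
    }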
2025-07-09  mm/ptdump: take the memory hotplug lock inside ptdump_walk_pgd()  (Anshuman Khandual)
Memory hot remove unmaps and tears down various kernel page table regions as required. The ptdump code can race with concurrent modifications of the kernel page tables. When leaf entries are modified concurrently, the dump code may log stale or inconsistent information for a VA range, but this is otherwise not harmful. But when intermediate levels of the kernel page table are freed, the dump code will continue to use memory that has been freed and potentially reallocated for another purpose. In such cases, the ptdump code may dereference bogus addresses, leading to a number of potential problems. To avoid the above mentioned race condition, platforms such as arm64, riscv and s390 take the memory hotplug lock while dumping the kernel page table via the sysfs interface /sys/kernel/debug/kernel_page_tables. A similar race condition exists while checking for pages that might have been marked W+X via /sys/kernel/debug/kernel_page_tables/check_wx_pages, which in turn calls ptdump_check_wx(). Instead of solving this race condition again, let's just move the memory hotplug lock inside generic ptdump_walk_pgd(), which will benefit both scenarios. Drop the get_online_mems() and put_online_mems() combination from all existing platform ptdump code paths. Link: https://lkml.kernel.org/r/20250620052427.2092093-1-anshuman.khandual@arm.com Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
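Schematically, the consolidation moves the locking from each arch caller into the generic walker (the function bodies below are assumptions, not the actual patch):

    /* Before (each arch, schematically): */
    get_online_mems();
    ptdump_walk_pgd(&st->ptdump, &init_mm, NULL);
    put_online_mems();

    /* After (generic mm/ptdump.c, schematically): */
    void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm,
                         pgd_t *pgd)
    {
            get_online_mems();              /* block memory hot-remove */
            /* ... existing page-table walk ... */
            put_online_mems();
    }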
2025-07-09  mm: remove callers of pfn_t functionality  (Alistair Popple)
All PFN_* pfn_t flags have been removed. Therefore there is no longer a need for the pfn_t type and all uses can be replaced with normal pfns. Link: https://lkml.kernel.org/r/bbedfa576c9822f8032494efbe43544628698b1f.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Björn Töpel <bjorn@rivosinc.com> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09  mm: remove devmap related functions and page table bits  (Alistair Popple)
Now that DAX and all other reference counts to ZONE_DEVICE pages are managed normally there is no need for the special devmap PTE/PMD/PUD page table bits. So drop all references to these, freeing up a software defined page table bit on architectures supporting it. Link: https://lkml.kernel.org/r/6389398c32cc9daa3dfcaa9f79c7972525d310ce.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: Will Deacon <will@kernel.org> # arm64 Acked-by: David Hildenbrand <david@redhat.com> Suggested-by: Chunyan Zhang <zhang.lyra@gmail.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09  powerpc: remove checks for devmap pages and PMDs/PUDs  (Alistair Popple)
PFN_DEV no longer exists. This means no devmap PMDs or PUDs will be created, so checking for them is redundant. Instead, mappings of pages that would have previously returned true for pXd_devmap() will return true for pXd_trans_huge(). Link: https://lkml.kernel.org/r/31f63cc8dd518f9e2ec300681fe302eb4adf49b4.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Björn Töpel <bjorn@rivosinc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09  mm/percpu: conditionally define _shared_alloc_tag via CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU  (Hao Ge)
Recently discovered this entry while checking kallsyms on ARM64: ffff800083e509c0 D _shared_alloc_tag. If ARCH_NEEDS_WEAK_PER_CPU is not defined (it is only defined for the s390 and alpha architectures), there's no need to statically define the percpu variable _shared_alloc_tag, so its definition needs to be isolated. When building the core kernel code for s390 or alpha, ARCH_NEEDS_WEAK_PER_CPU remains undefined (as it is gated by #if defined(MODULE)). However, when building modules for these architectures, the macro is explicitly defined. Therefore, remove all instances of ARCH_NEEDS_WEAK_PER_CPU from the code and introduce CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU to replace the relevant logic. The percpu variable _shared_alloc_tag can now be conditionally defined based on CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU. This allows architectures (such as s390/alpha) that require weak definitions for percpu variables in modules to include the definition, while others can omit it via compile-time exclusion. Link: https://lkml.kernel.org/r/20250618015809.1235761-1-hao.ge@linux.dev Signed-off-by: Hao Ge <gehao@kylinos.cn> Suggested-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Chistoph Lameter <cl@linux.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09  mm: update architecture and driver code to use vm_flags_t  (Lorenzo Stoakes)
In future we intend to change the vm_flags_t type, so it isn't correct for architecture and driver code to assume it is unsigned long. Correct this assumption across the board. Overall, this patch does not introduce any functional change. Link: https://lkml.kernel.org/r/b6eb1894abc5555ece80bb08af5c022ef780c8bc.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09  mm: change vm_get_page_prot() to accept vm_flags_t argument  (Lorenzo Stoakes)
Patch series "use vm_flags_t consistently". The VMA flags field vma->vm_flags is of type vm_flags_t. Right now this is exactly equivalent to unsigned long, but it should not be assumed to be. Much code that references vma->vm_flags already correctly uses vm_flags_t, but a fairly large chunk of code simply uses unsigned long and assumes that the two are equivalent. This series corrects that and has us use vm_flags_t consistently. This series is motivated by the desire to, in a future series, adjust vm_flags_t to be a u64 regardless of whether the kernel is 32-bit or 64-bit in order to deal with the VMA flag exhaustion issue and avoid all the various problems that arise from it (being unable to use certain features in 32-bit, being unable to add new flags except for 64-bit, etc.) This is therefore a critical first step towards that goal. At any rate, using the correct type is of value regardless. We additionally take the opportunity to refer to VMA flags as vm_flags where possible to make clear what we're referring to. Overall, this series does not introduce any functional change. This patch (of 3): We abstract the type of the VMA flags to vm_flags_t, however in may places it is simply assumed this is unsigned long, which is simply incorrect. At the moment this is simply an incongruity, however in future we plan to change this type and therefore this change is a critical requirement for doing so. Overall, this patch does not introduce any functional change. [lorenzo.stoakes@oracle.com: add missing vm_get_page_prot() instance, remove include] Link: https://lkml.kernel.org/r/552f88e1-2df8-4e95-92b8-812f7c8db829@lucifer.local Link: https://lkml.kernel.org/r/cover.1750274467.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/a12769720a2743f235643b158c4f4f0a9911daf0.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09  mm/pagewalk: split walk_page_range_novma() into kernel/user parts  (Lorenzo Stoakes)
walk_page_range_novma() is rather confusing - it supports two modes, one used often, the other used only for debugging. The first mode is the common case of traversal of kernel page tables, which is what nearly all callers use this for. Secondly it provides an unusual debugging interface that allows for the traversal of page tables in a userland range of memory even for that memory which is not described by a VMA. It is far from certain that such page tables should even exist, but perhaps this is precisely why it is useful as a debugging mechanism. As a result, this is utilised by ptdump only. Historically, things were reversed - ptdump was the only user, and other parts of the kernel evolved to use the kernel page table walking here. Since we have some complicated and confusing locking rules for the novma case, it makes sense to separate the two usages into their own functions. Doing this also provides self-documentation as to the intent of the caller - are they doing something rather unusual or are they simply doing a standard kernel page table walk? We therefore establish two separate functions - walk_page_range_debug() for this single usage, and walk_kernel_page_table_range() for general kernel page table walking. The walk_page_range_debug() function is currently used to traverse both userland and kernel mappings, so we maintain this and in the case of kernel mappings being traversed, we have walk_page_range_debug() invoke walk_kernel_page_table_range() internally. We additionally make walk_page_range_debug() internal to mm. Link: https://lkml.kernel.org/r/20250605135104.90720-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Barry Song <baohua@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: WANG Xuerui <kernel@xen0n.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09  mm/filemap: allow arch to request folio size for exec memory  (Ryan Roberts)
Change the readahead config so that a request for an executable mapping does a synchronous read into a set of folios with an arch-specified order and in a naturally aligned manner. We no longer center the read on the faulting page but simply align it down to the previous natural boundary. Additionally, we don't bother with an asynchronous part.

On arm64, if memory is physically contiguous and naturally aligned to the "contpte" size, we can use contpte mappings, which improves utilization of the TLB. When paired with the "multi-size THP" feature, this works well to reduce dTLB pressure. However iTLB pressure is still high due to executable mappings having a low likelihood of being in the required folio size and mapping alignment, even when the filesystem supports readahead into large folios (e.g. XFS).

The reason for the low likelihood is that the current readahead algorithm starts with an order-0 folio and increases the folio order by 2 every time the readahead mark is hit. But most executable memory tends to be accessed randomly, so the readahead mark is rarely hit and most executable folios remain order-0.

So let's special-case the read(ahead) logic for executable mappings. The trade-off is performance improvement (due to more efficient storage of the translations in iTLB) vs potential for making reclaim more difficult (due to the folios being larger, so if a part of the folio is hot the whole thing is considered hot). But executable memory is a small portion of the overall system memory, so I doubt this will even register from a reclaim perspective.

I've chosen a 64K folio size for arm64, which benefits both the 4K and 16K base page size configs. Crucially, the same amount of data is still read (usually 128K), so I'm not expecting any read amplification issues. I don't anticipate any write amplification because text is always RO.

Note that the text region of an ELF file could be populated into the page cache for other reasons than taking a fault in a mmapped area. The most common case is due to the loader read()ing the header, which can be shared with the beginning of text. So some text will still remain in small folios, but this simple, best-effort change provides good performance improvements as is.

Confine this special-case approach to the bounds of the VMA. This prevents wasting memory for any padding that might exist in the file between sections. Previously the padding would have been contained in order-0 folios and would be easy to reclaim. But now it would be part of a larger folio, so more difficult to reclaim. Solve this by simply not reading it into memory in the first place.

Benchmarking
============

The below shows pgbench and redis benchmarks on a Graviton3 arm64 system.

First, confirmation that this patch causes more text to be contained in 64K folios:

  +----------------------+---------------+---------------+---------------+
  | File-backed folios by| system boot   | pgbench       | redis         |
  | size as percentage of+-------+-------+-------+-------+-------+-------+
  | all mapped text mem  |before | after |before | after |before | after |
  +======================+=======+=======+=======+=======+=======+=======+
  | base-page-4kB        | 78%   | 30%   | 78%   | 11%   | 73%   | 14%   |
  | thp-aligned-8kB      | 1%    | 0%    | 0%    | 0%    | 1%    | 0%    |
  | thp-aligned-16kB     | 17%   | 4%    | 17%   | 3%    | 20%   | 4%    |
  | thp-aligned-32kB     | 1%    | 1%    | 1%    | 2%    | 1%    | 1%    |
  | thp-aligned-64kB     | 3%    | 63%   | 3%    | 81%   | 4%    | 77%   |
  | thp-aligned-128kB    | 0%    | 1%    | 1%    | 1%    | 1%    | 2%    |
  | thp-unaligned-64kB   | 0%    | 0%    | 0%    | 1%    | 0%    | 1%    |
  | thp-unaligned-128kB  | 0%    | 1%    | 0%    | 0%    | 0%    | 0%    |
  | thp-partial          | 0%    | 0%    | 0%    | 1%    | 0%    | 1%    |
  +----------------------+-------+-------+-------+-------+-------+-------+
  | cont-aligned-64kB    | 4%    | 65%   | 4%    | 83%   | 6%    | 79%   |
  +----------------------+-------+-------+-------+-------+-------+-------+

The above shows that for both workloads (each isolated with cgroups) as well as the general system state after boot, the amount of text backed by 4K and 16K folios reduces and the amount backed by 64K folios increases significantly. And the amount of text that is contpte-mapped significantly increases (see last row).

And this is reflected in performance improvement. "(I)" indicates a statistically significant improvement. Note TPS and Reqs/sec are rates, so bigger is better; ms is time, so smaller is better:

  +-------------+-------------------------------------------+-------------+
  | Benchmark   | Result Class                              | Improvement |
  +=============+===========================================+=============+
  | pts/pgbench | Scale: 1 Clients: 1 RO (TPS)              | (I) 3.47%   |
  |             | Scale: 1 Clients: 1 RO - Latency (ms)     | -2.88%      |
  |             | Scale: 1 Clients: 250 RO (TPS)            | (I) 5.02%   |
  |             | Scale: 1 Clients: 250 RO - Latency (ms)   | (I) -4.79%  |
  |             | Scale: 1 Clients: 1000 RO (TPS)           | (I) 6.16%   |
  |             | Scale: 1 Clients: 1000 RO - Latency (ms)  | (I) -5.82%  |
  |             | Scale: 100 Clients: 1 RO (TPS)            | 2.51%       |
  |             | Scale: 100 Clients: 1 RO - Latency (ms)   | -3.51%      |
  |             | Scale: 100 Clients: 250 RO (TPS)          | (I) 4.75%   |
  |             | Scale: 100 Clients: 250 RO - Latency (ms) | (I) -4.44%  |
  |             | Scale: 100 Clients: 1000 RO (TPS)         | (I) 6.34%   |
  |             | Scale: 100 Clients: 1000 RO - Latency (ms)| (I) -5.95%  |
  +-------------+-------------------------------------------+-------------+
  | pts/redis   | Test: GET Connections: 50 (Reqs/sec)      | (I) 3.20%   |
  |             | Test: GET Connections: 1000 (Reqs/sec)    | (I) 2.55%   |
  |             | Test: LPOP Connections: 50 (Reqs/sec)     | (I) 4.59%   |
  |             | Test: LPOP Connections: 1000 (Reqs/sec)   | (I) 4.81%   |
  |             | Test: LPUSH Connections: 50 (Reqs/sec)    | (I) 5.31%   |
  |             | Test: LPUSH Connections: 1000 (Reqs/sec)  | (I) 4.36%   |
  |             | Test: SADD Connections: 50 (Reqs/sec)     | (I) 2.64%   |
  |             | Test: SADD Connections: 1000 (Reqs/sec)   | (I) 4.15%   |
  |             | Test: SET Connections: 50 (Reqs/sec)      | (I) 3.11%   |
  |             | Test: SET Connections: 1000 (Reqs/sec)    | (I) 3.36%   |
  +-------------+-------------------------------------------+-------------+

[ryan.roberts@arm.com: fix use-after-free] Link: https://lkml.kernel.org/r/ea7f9da7-9a9f-4b85-9d0a-35b320f5ed25@arm.com [ryan.roberts@arm.com: use the vma_pages() helper instead of open-coding] Link: https://lkml.kernel.org/r/0e0f674b-3b7e-494f-ae7a-fc9dbb98dad4@arm.com Link: https://lkml.kernel.org/r/20250609092729.274960-6-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Will Deacon <will@kernel.org> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
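A hedged sketch of the special case in the synchronous readahead path (the arch hook name arch_wants_exec_folio_order() is an assumption; on arm64 it would return the 64K order):

    static void sync_ra_exec(struct readahead_control *ractl)
    {
            /* hypothetical hook; a negative value means no arch preference */
            int order = arch_wants_exec_folio_order();

            if (order < 0)
                    return; /* fall back to the normal readahead algorithm */

            /* align down to the natural folio boundary; no async part */
            ractl->_index &= ~((1UL << order) - 1);
            page_cache_ra_order(ractl, &ractl->file->f_ra, order);
    }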
2025-07-09  drivers/base/node: rename __register_one_node() to register_one_node()  (Donet Tom)
The register_one_node() function was a simple wrapper around __register_one_node(). To simplify the code, register_one_node() has been removed, and __register_one_node() has been renamed to register_one_node(). Link: https://lkml.kernel.org/r/8262cd0f44eeb048a1fcd3ac8382760d7f7dea60.1748452242.git.donettom@linux.ibm.com Signed-off-by: Donet Tom <donettom@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-10  arm64: dts: ti: k3-am69-sk: Add bootph-all property to enable Ethernet boot  (Chintan Vankar)
Ethernet boot requires CPSW nodes to be present starting from R5 SPL stage. Add bootph-all property to required nodes to enable Ethernet boot for SK-AM69. Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Chintan Vankar <c-vankar@ti.com> Link: https://lore.kernel.org/r/20250709105326.232608-5-c-vankar@ti.com Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-07-10  arm64: dts: ti: k3-j722s-evm: Add bootph-all property to enable Ethernet boot  (Chintan Vankar)
Ethernet boot requires CPSW nodes to be present starting from R5 SPL stage. Add bootph-all property to required nodes to enable Ethernet boot for J722S-EVM. Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Chintan Vankar <c-vankar@ti.com> Link: https://lore.kernel.org/r/20250709105326.232608-4-c-vankar@ti.com Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-07-10  arm64: dts: ti: k3-am62p5-sk: Add bootph-all property to enable Ethernet boot  (Chintan Vankar)
Ethernet boot requires CPSW nodes to be present starting from R5 SPL stage. Add bootph-all property to required nodes to enable Ethernet boot for AM62P5-SK. Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Chintan Vankar <c-vankar@ti.com> Link: https://lore.kernel.org/r/20250709105326.232608-3-c-vankar@ti.com Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-07-10  arm64: dts: ti: k3-am68-sk-base-board: Add bootph-all property to enable Ethernet boot  (Chintan Vankar)
Ethernet boot requires CPSW nodes to be present starting from R5 SPL stage. Add bootph-all property to required nodes to enable Ethernet boot on SK-AM68. Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Chintan Vankar <c-vankar@ti.com> Link: https://lore.kernel.org/r/20250709105326.232608-2-c-vankar@ti.com Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-07-10  arm64: dts: ti: Add support for AM62D2-EVM  (Paresh Bhagat)
AM62D-EVM evaluation module (EVM) is a low-cost expandable platform board designed for the AM62D2 SoC from TI. It supports the following interfaces:

  * 4 GB LPDDR4 RAM
  * x2 Gigabit Ethernet expansion connectors
  * x4 3.5mm TRS Audio Jack Line In
  * x4 3.5mm TRS Audio Jack Line Out
  * x2 Audio expansion connectors
  * x1 Type-A USB 2.0, x1 Type-C dual-role device (DRD) USB 2.0
  * x1 UHS-1 capable micro SD card slot
  * 32 GB eMMC Flash
  * 512 Mb OSPI NOR flash
  * x4 UARTs via USB 2.0-B
  * XDS110 for onboard JTAG debug using USB
  * Temperature sensors, user push buttons and LEDs

Although AM62D2 and AM62A7 differ in peripheral capabilities, for example multimedia, VPAC, and display subsystems, the core architecture remains the same. To reduce duplication, AM62D support reuses the AM62A dtsi, and the necessary overrides will be handled in a SoC-specific dtsi file and a board-specific dts. Add basic support for AM62D2-EVM. Schematics Link - https://www.ti.com/lit/zip/sprcal5 Signed-off-by: Paresh Bhagat <p-bhagat@ti.com> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Andrew Davis <afd@ti.com> Reviewed-by: Bryan Brattlof <bb@ti.com> Link: https://lore.kernel.org/r/20250708085839.1498505-5-p-bhagat@ti.com Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-07-10  arm64: dts: ti: Add pinctrl entries for AM62D2 family of SoCs  (Paresh Bhagat)
Update k3-pinctrl file to include pin definitions for AM62D2 family of SoCs. Signed-off-by: Paresh Bhagat <p-bhagat@ti.com> Reviewed-by: Devarsh Thakkar <devarsht@ti.com> Link: https://lore.kernel.org/r/20250708085839.1498505-4-p-bhagat@ti.com Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-07-10  arm64: dts: ti: Add bootph property to nodes at source for am62a  (Paresh Bhagat)
Add the bootph property directly into the original definitions of relevant nodes (e.g. power domains, USB controllers, and other peripherals) within their respective DTSI files (e.g. main, mcu, and wakeup) for am62a. By defining bootph in the nodes' source definitions instead of appending it later in final DTS files, this change ensures that the property is inherently present wherever the nodes are reused across derived device trees. Signed-off-by: Paresh Bhagat <p-bhagat@ti.com> Reviewed-by: Bryan Brattlof <bb@ti.com> Link: https://lore.kernel.org/r/20250708085839.1498505-2-p-bhagat@ti.com Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-07-09  x86/hyperv: Clean up hv_map/unmap_interrupt() return values  (Nuno Das Neves)
Fix the return values of these hypercall helpers so they return a negated errno either directly or via hv_result_to_errno(). Update the callers to check for errno instead of using hv_status_success(), and remove redundant error printing. While at it, rearrange some variable declarations to adhere to style guidelines i.e. "reverse fir tree order". Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1751582677-30930-5-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1751582677-30930-5-git-send-email-nunodasneves@linux.microsoft.com>
2025-07-09  x86/hyperv: Fix usage of cpu_online_mask to get valid cpu  (Nuno Das Neves)
Accessing cpu_online_mask here is problematic because the cpus read lock is not held in this context. However, cpu_online_mask isn't needed here since the effective affinity mask is guaranteed to be valid in this callback. So, just use cpumask_first() to get the cpu instead of ANDing it with cpu_online_mask unnecessarily. Fixes: e39397d1fd68 ("x86/hyperv: implement an MSI domain for root partition") Reported-by: Michael Kelley <mhklinux@outlook.com> Closes: https://lore.kernel.org/linux-hyperv/SN6PR02MB4157639630F8AD2D8FD8F52FD475A@SN6PR02MB4157.namprd02.prod.outlook.com/ Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1751582677-30930-4-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1751582677-30930-4-git-send-email-nunodasneves@linux.microsoft.com>
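The fix reduces to picking the first CPU from the effective affinity mask directly (a sketch; "data" is assumed to be the callback's irq_data argument):

    /*
     * The effective affinity mask is guaranteed valid in this callback,
     * and cpu_online_mask can't be read safely without the cpus read
     * lock, so just take the first CPU from the mask as-is.
     */
    cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));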
2025-07-09  x86/hyperv: Fix warnings for missing export.h header inclusion  (Naman Jain)
Fix the below warning in Hyper-V drivers, which appears when the kernel is compiled with the W=1 option. Include export.h in the driver files to fix it. * warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://lore.kernel.org/r/20250611100459.92900-3-namjain@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250611100459.92900-3-namjain@linux.microsoft.com>
2025-07-09  riscv: mm: Add page fault trace points  (Nam Cao)
Add page fault trace points, which are useful for implementing an RV monitor that watches page faults. Signed-off-by: Nam Cao <namcao@linutronix.de> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-09  KVM: x86: avoid underflow when scaling TSC frequency  (Paolo Bonzini)
In function kvm_guest_time_update(), __scale_tsc() is used to calculate a TSC *frequency* rather than a TSC value. With low-enough ratios, a TSC value that is less than 1 would underflow to 0 and lead to an infinite while loop in kvm_get_time_scale():

  kvm_guest_time_update(struct kvm_vcpu *v)
    if (kvm_caps.has_tsc_control)
      tgt_tsc_khz = kvm_scale_tsc(tgt_tsc_khz,
                                  v->arch.l1_tsc_scaling_ratio);

  __scale_tsc(u64 ratio, u64 tsc)
    ratio = 122380531, tsc = 2299998, N = 48
    ratio * tsc >> N = 0.999... -> 0

Later in the function:

  Call Trace:
   <TASK>
   kvm_get_time_scale arch/x86/kvm/x86.c:2458 [inline]
   kvm_guest_time_update+0x926/0xb00 arch/x86/kvm/x86.c:3268
   vcpu_enter_guest.constprop.0+0x1e70/0x3cf0 arch/x86/kvm/x86.c:10678
   vcpu_run+0x129/0x8d0 arch/x86/kvm/x86.c:11126
   kvm_arch_vcpu_ioctl_run+0x37a/0x13d0 arch/x86/kvm/x86.c:11352
   kvm_vcpu_ioctl+0x56b/0xe60 virt/kvm/kvm_main.c:4188
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:871 [inline]
   __se_sys_ioctl+0x12d/0x190 fs/ioctl.c:857
   do_syscall_x64 arch/x86/entry/common.c:51 [inline]
   do_syscall_64+0x59/0x110 arch/x86/entry/common.c:81
   entry_SYSCALL_64_after_hwframe+0x78/0xe2

This can really happen only when fuzzing, since the TSC frequency would have to be nonsensically low. Fixes: 35181e86df97 ("KVM: x86: Add a common TSC scaling function") Reported-by: Yuntao Liu <liuyuntao12@huawei.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
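The underflow is easy to reproduce in isolation (standalone demo using the exact values from the trace above; the non-zero clamp at the end is an assumption about the fix, not KVM's exact change):

    #include <stdint.h>
    #include <stdio.h>

    /*
     * __scale_tsc()-style math: (ratio * khz) >> 48, done here with
     * 128-bit intermediates. 122380531 * 2299998 is just under 2^48,
     * so the shift truncates 0.999... down to 0 kHz.
     */
    static uint64_t scale_tsc_khz(uint64_t khz, uint64_t ratio)
    {
            uint64_t scaled =
                    (uint64_t)(((unsigned __int128)ratio * khz) >> 48);

            return scaled ? scaled : 1;   /* assumed guard against 0 kHz */
    }

    int main(void)
    {
            printf("%llu\n",
                   (unsigned long long)scale_tsc_khz(2299998, 122380531));
            return 0;
    }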
2025-07-09  KVM: arm64: Populate ESR_ELx.EC for emulated SError injection  (Oliver Upton)
The hardware vSError injection mechanism populates ESR_ELx.EC as part of ESR propagation and the contents of VSESR_EL2 populate the ISS field. Of course, this means our emulated injection needs to set up the EC correctly for an SError too. Fixes: ce66109cec86 ("KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set") Link: https://lore.kernel.org/r/20250708230632.1954240-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-09  KVM: x86: Provide a capability to disable APERF/MPERF read intercepts  (Jim Mattson)
Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs without interception. The IA32_APERF and IA32_MPERF MSRs are not virtualized. Writes are not handled at all. The MSR values are not zeroed on vCPU creation, saved on suspend, or restored on resume. No accommodation is made for processor migration or for sharing a logical processor with other tasks. No adjustments are made for non-unit TSC multipliers. The MSRs do not account for time the same way as the comparable PMU events, whether the PMU is virtualized by the traditional emulation method or the new mediated pass-through approach. Nonetheless, in a properly constrained environment, this capability can be combined with a guest CPUID table that advertises support for CPUID.6:ECX.APERFMPERF[bit 0] to induce a Linux guest to report the effective physical CPU frequency in /proc/cpuinfo. Moreover, there is no performance cost for this capability. Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250530185239.2335185-3-jmattson@google.com Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250626001225.744268-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09  KVM: x86: Replace growing set of *_in_guest bools with a u64  (Jim Mattson)
Store each "disabled exit" boolean in a single bit rather than a byte. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250530185239.2335185-2-jmattson@google.com Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250626001225.744268-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09  KVM: x86: Advertise support for LKGS  (Xin Li)
Advertise support for LKGS (load into IA32_KERNEL_GS_BASE) to userspace if the instruction is supported by the underlying CPU. LKGS is introduced with FRED to completely eliminate the need to swapgs explicitly. It behaves like the MOV to GS instruction except that it loads the base address into the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor cache, which is exactly what the Linux kernel does to load a user-level GS base. Thus there is no need to SWAPGS away from the kernel GS base. LKGS is an independent CPU feature that works correctly in a KVM guest without requiring explicit enablement. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Link: https://lore.kernel.org/r/20250626173521.2301088-1-xin@zytor.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09  KVM: VMX: Add a macro to track which DEBUGCTL bits are host-owned  (Sean Christopherson)
Add VMX_HOST_OWNED_DEBUGCTL_BITS to track which bits are host-owned, i.e. need to be preserved when running the guest, to dedup the logic without having to incur a memory load to get at kvm_x86_ops.HOST_OWNED_DEBUGCTL. No functional change intended. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/all/aF1yni8U6XNkyfRf@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
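Schematically, the dedup looks something like the following (the bit set and the merge expression are assumptions; the commit only names the macro):

    #define VMX_HOST_OWNED_DEBUGCTL_BITS    DEBUGCTLMSR_FREEZE_IN_SMM

    /*
     * Preserve host-owned bits while running the guest; a compile-time
     * constant avoids the memory load that reading
     * kvm_x86_ops.HOST_OWNED_DEBUGCTL would incur.
     */
    debugctl = (host_debugctl & VMX_HOST_OWNED_DEBUGCTL_BITS) |
               (guest_debugctl & ~VMX_HOST_OWNED_DEBUGCTL_BITS);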