summaryrefslogtreecommitdiff
path: root/arch/loongarch
AgeCommit message (Collapse)Author
2024-09-17LoongArch: Enable objtool for ClangTiezhu Yang
For now, it can enable objtool for Clang, just remove !CC_IS_CLANG for HAVE_OBJTOOL in arch/loongarch/Kconfig. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-17Merge tag 'irq-core-2024-09-16' into loongarch-nextHuacai Chen
LoongArch architecture changes for 6.12 depend on the irq core changes about AVEC irqchip to avoid confliction, so merge them to create a base.
2024-09-17mm: z3fold: deprecate CONFIG_Z3FOLDYosry Ahmed
The z3fold compressed pages allocator is rarely used, most users use zsmalloc. The only disadvantage of zsmalloc in comparison is the dependency on MMU, and zbud is a more common option for !MMU as it was the default zswap allocator for a long time. Historically, zsmalloc had worse latency than zbud and z3fold but offered better memory savings. This is no longer the case as shown by a simple recent analysis [1]. That analysis showed that z3fold does not have any advantage over zsmalloc or zbud considering both performance and memory usage. In a kernel build test on tmpfs in a limited cgroup, z3fold took 3% more time and used 1.8% more memory. The latency of zswap_load() was 7% higher, and that of zswap_store() was 10% higher. Zsmalloc is better in all metrics. Moreover, z3fold apparently has latent bugs, which was made noticeable by a recent soft lockup bug report with z3fold [2]. Switching to zsmalloc not only fixed the problem, but also reduced the swap usage from 6~8G to 1~2G. Other users have also reported being bitten by mistakenly enabling z3fold. Other than hurting users, z3fold is repeatedly causing wasted engineering effort. Apart from investigating the above bug, it came up in multiple development discussions (e.g. [3]) as something we need to handle, when there aren't any legit users (at least not intentionally). The natural course of action is to deprecate z3fold, and remove in a few cycles if no objections are raised from active users. Next on the list should be zbud, as it offers marginal latency gains at the cost of huge memory waste when compared to zsmalloc. That one will need to wait until zsmalloc does not depend on MMU. Rename the user-visible config option from CONFIG_Z3FOLD to CONFIG_Z3FOLD_DEPRECATED so that users with CONFIG_Z3FOLD=y get a new prompt with explanation during make oldconfig. Also, remove CONFIG_Z3FOLD=y from defconfigs. [1]https://lore.kernel.org/lkml/CAJD7tkbRF6od-2x_L8-A1QL3=2Ww13sCj4S3i4bNndqF+3+_Vg@mail.gmail.com/ [2]https://lore.kernel.org/lkml/EF0ABD3E-A239-4111-A8AB-5C442E759CF3@gmail.com/ [3]https://lore.kernel.org/lkml/CAJD7tkbnmeVugfunffSovJf9FAgy9rhBVt_tx=nxUveLUfqVsA@mail.gmail.com/ [arnd@arndb.de: deprecate ZSWAP_ZPOOL_DEFAULT_Z3FOLD as well] Link: https://lkml.kernel.org/r/20240909202625.1054880-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20240904233343.933462-1-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vitaly Wool <vitaly.wool@konsulko.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-17Merge tag 'irq-core-2024-09-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Core: - Remove a global lock in the affinity setting code The lock protects a cpumask for intermediate results and the lock causes a bottleneck on simultaneous start of multiple virtual machines. Replace the lock and the static cpumask with a per CPU cpumask which is nicely serialized by raw spinlock held when executing this code. - Provide support for giving a suffix to interrupt domain names. That's required to support devices with subfunctions so that the domain names are distinct even if they originate from the same device node. - The usual set of cleanups and enhancements all over the place Drivers: - Support for longarch AVEC interrupt chip - Refurbishment of the Armada driver so it can be extended for new variants. - The usual set of cleanups and enhancements all over the place" * tag 'irq-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits) genirq: Use cpumask_intersects() genirq/cpuhotplug: Use cpumask_intersects() irqchip/apple-aic: Only access system registers on SoCs which provide them irqchip/apple-aic: Add a new "Global fast IPIs only" feature level irqchip/apple-aic: Skip unnecessary enabling of use_fast_ipi dt-bindings: apple,aic: Document A7-A11 compatibles irqdomain: Use IS_ERR_OR_NULL() in irq_domain_trim_hierarchy() genirq/msi: Use kmemdup_array() instead of kmemdup() genirq/proc: Change the return value for set affinity permission error genirq/proc: Use irq_move_pending() in show_irq_affinity() genirq/proc: Correctly set file permissions for affinity control files genirq: Get rid of global lock in irq_do_set_affinity() genirq: Fix typo in struct comment irqchip/loongarch-avec: Add AVEC irqchip support irqchip/loongson-pch-msi: Prepare get_pch_msi_handle() for AVECINTC irqchip/loongson-eiointc: Rename CPUHP_AP_IRQ_LOONGARCH_STARTING LoongArch: Architectural preparation for AVEC irqchip LoongArch: Move irqchip function prototypes to irq-loongson.h irqchip/loongson-pch-msi: Switch to MSI parent domains softirq: Remove unused 'action' parameter from action callback ...
2024-09-13LoongArch: vDSO: Wire up getrandom() vDSO implementationXi Ruoyao
Hook up the generic vDSO implementation to the LoongArch vDSO data page by providing the required __arch_chacha20_blocks_nostack, __arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also wire up the selftests. Signed-off-by: Xi Ruoyao <xry111@xry111.site> Acked-by: Huacai Chen <chenhuacai@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-12LoongArch: KVM: Implement function kvm_para_has_feature()Bibo Mao
Implement function kvm_para_has_feature() to detect supported paravirt features. It can be used by device driver to detect and enable paravirt features, such as the EIOINTC irqchip driver is able to detect feature KVM_FEATURE_VIRT_EXTIOI and do some optimization. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-12LoongArch: KVM: Enable paravirt feature control from VMMBibo Mao
Export kernel paravirt features to user space, so that VMM can control each single paravirt feature. By default paravirt features will be the same with kvm supported features if VMM does not set it. Also a new feature KVM_FEATURE_VIRT_EXTIOI is added which can be set from user space. This feature indicates that the virt EIOINTC can route interrupts to 256 vCPUs, rather than 4 vCPUs like with real HW. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-12LoongArch: KVM: Add PMU support for guestSong Gao
On LoongArch, the host and guest have their own PMU CSRs registers and they share PMU hardware resources. A set of PMU CSRs consists of a CTRL register and a CNTR register. We can set which PMU CSRs are used by the guest by writing to the GCFG register [24:26] bits. On KVM side: - Save the host PMU CSRs into structure kvm_context. - If the host supports the PMU feature. - When entering guest mode, save the host PMU CSRs and restore the guest PMU CSRs. - When exiting guest mode, save the guest PMU CSRs and restore the host PMU CSRs. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11LoongArch: KVM: Add vm migration support for LBT registersBibo Mao
Every vcpu has separate LBT registers. And there are four scr registers, one flags and ftop register for LBT extension. When VM migrates, VMM needs to get LBT registers for every vcpu. Here macro KVM_REG_LOONGARCH_LBT is added for new vcpu lbt register type, the following macro is added to get/put LBT registers. KVM_REG_LOONGARCH_LBT_SCR0 KVM_REG_LOONGARCH_LBT_SCR1 KVM_REG_LOONGARCH_LBT_SCR2 KVM_REG_LOONGARCH_LBT_SCR3 KVM_REG_LOONGARCH_LBT_EFLAGS KVM_REG_LOONGARCH_LBT_FTOP Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11LoongArch: KVM: Add Binary Translation extension supportBibo Mao
Loongson Binary Translation (LBT) is used to accelerate binary translation, which contains 4 scratch registers (scr0 to scr3), x86/ARM eflags (eflags) and x87 fpu stack pointer (ftop). Like FPU extension, here a lazy enabling method is used for LBT. the LBT context is saved/restored on the vcpu context switch path. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11LoongArch: KVM: Add VM feature detection functionBibo Mao
Loongson SIMD Extension (LSX), Loongson Advanced SIMD Extension (LASX) and Loongson Binary Translation (LBT) features are defined in register CPUCFG2. Two kinds of LSX/LASX/LBT feature detection are added here, one is VCPU feature, and the other is VM feature. VCPU feature dection can only work with VCPU thread itself, and requires VCPU thread is created already. So LSX/LASX/LBT feature detection for VM is added also, it can be done even if VM is not created, and also can be done by any threads besides VCPU threads. Here ioctl command KVM_HAS_DEVICE_ATTR is added for VM, and macro KVM_LOONGARCH_VM_FEAT_CTRL is added to check supported feature. And five sub-features relative with LSX/LASX/LBT are added as following: KVM_LOONGARCH_VM_FEAT_LSX KVM_LOONGARCH_VM_FEAT_LASX KVM_LOONGARCH_VM_FEAT_X86BT KVM_LOONGARCH_VM_FEAT_ARMBT KVM_LOONGARCH_VM_FEAT_MIPSBT Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11LoongArch: Revert qspinlock to test-and-set simple lock on VMBibo Mao
Similar with x86, when VM is detected, revert to a simple test-and-set lock to avoid the horrors of queue preemption. Tested on 3C5000 Dual-way machine with 32 cores and 2 numa nodes, test case is kcbench on kernel mainline 6.10, the detailed command is "kcbench --src /root/src/linux" Performance on host machine kernel compile time performance impact Original 150.29 seconds With patch 150.19 seconds almost no impact Performance on virtual machine: 1. 1 VM with 32 vCPUs and 2 numa node, numa node pinned kernel compile time performance impact Original 170.87 seconds With patch 171.73 seconds almost no impact 2. 2 VMs, each VM with 32 vCPUs and 2 numa node, numa node pinned kernel compile time performance impact Original 2362.04 seconds With patch 354.73 seconds +565% Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-09mm: make arch_get_unmapped_area() take vm_flags by defaultMark Brown
Patch series "mm: Care about shadow stack guard gap when getting an unmapped area", v2. As covered in the commit log for c44357c2e76b ("x86/mm: care about shadow stack guard gap during placement") our current mmap() implementation does not take care to ensure that a new mapping isn't placed with existing mappings inside it's own guard gaps. This is particularly important for shadow stacks since if two shadow stacks end up getting placed adjacent to each other then they can overflow into each other which weakens the protection offered by the feature. On x86 there is a custom arch_get_unmapped_area() which was updated by the above commit to cover this case by specifying a start_gap for allocations with VM_SHADOW_STACK. Both arm64 and RISC-V have equivalent features and use the generic implementation of arch_get_unmapped_area() so let's make the equivalent change there so they also don't get shadow stack pages placed without guard pages. The arm64 and RISC-V shadow stack implementations are currently on the list: https://lore.kernel.org/r/20240829-arm64-gcs-v12-0-42fec94743 https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/ Given the addition of the use of vm_flags in the generic implementation we also simplify the set of possibilities that have to be dealt with in the core code by making arch_get_unmapped_area() take vm_flags as standard. This is a bit invasive since the prototype change touches quite a few architectures but since the parameter is ignored the change is straightforward, the simplification for the generic code seems worth it. This patch (of 3): When we introduced arch_get_unmapped_area_vmflags() in 961148704acd ("mm: introduce arch_get_unmapped_area_vmflags()") we did so as part of properly supporting guard pages for shadow stacks on x86_64, which uses a custom arch_get_unmapped_area(). Equivalent features are also present on both arm64 and RISC-V, both of which use the generic implementation of arch_get_unmapped_area() and will require equivalent modification there. Rather than continue to deal with having two versions of the functions let's bite the bullet and have all implementations of arch_get_unmapped_area() take vm_flags as a parameter. The new parameter is currently ignored by all implementations other than x86. The only caller that doesn't have a vm_flags available is mm_get_unmapped_area(), as for the x86 implementation and the wrapper used on other architectures this is modified to supply no flags. No functional changes. Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-0-a46b8b6dc0ed@kernel.org Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-1-a46b8b6dc0ed@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Helge Deller <deller@gmx.de> [parisc] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-04KVM: Rename arch hooks related to per-CPU virtualization enablingSean Christopherson
Rename the per-CPU hooks used to enable virtualization in hardware to align with the KVM-wide helpers in kvm_main.c, and to better capture that the callbacks are invoked on every online CPU. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-ID: <20240830043600.127750-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-03arch, mm: pull out allocation of NODE_DATA to generic codeMike Rapoport (Microsoft)
Architectures that support NUMA duplicate the code that allocates NODE_DATA on the node-local memory with slight variations in reporting of the addresses where the memory was allocated. Use x86 version as the basis for the generic alloc_node_data() function and call this function in architecture specific numa initialization. Round up node data size to SMP_CACHE_BYTES rather than to PAGE_SIZE like x86 used to do since the bootmem era when allocation granularity was PAGE_SIZE anyway. Link: https://lkml.kernel.org/r/20240807064110.1003856-10-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03arch, mm: move definition of node_data to generic codeMike Rapoport (Microsoft)
Every architecture that supports NUMA defines node_data in the same way: struct pglist_data *node_data[MAX_NUMNODES]; No reason to keep multiple copies of this definition and its forward declarations, especially when such forward declaration is the only thing in include/asm/mmzone.h for many architectures. Add definition and declaration of node_data to generic code and drop architecture-specific versions. Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-08-26LoongArch: KVM: Invalidate guest steal time address on vCPU resetBibo Mao
If ParaVirt steal time feature is enabled, there is a percpu gpa address passed from guest vCPU and host modifies guest memory space with this gpa address. When vCPU is reset normally, it will notify host and invalidate gpa address. However if VM is crashed and VMM reboots VM forcely, the vCPU reboot notification callback will not be called in VM. Host needs invalidate the gpa address, else host will modify guest memory during VM reboots. Here it is invalidated from the vCPU KVM_REG_LOONGARCH_VCPU_RESET ioctl interface. Also funciton kvm_reset_timer() is removed at vCPU reset stage, since SW emulated timer is only used in vCPU block state. When a vCPU is removed from the block waiting queue, kvm_restore_timer() is called and SW timer is cancelled. And the timer register is also cleared at VMM when a vCPU is reset. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-26LoongArch: Add ifdefs to fix LSX and LASX related warningsTiezhu Yang
There exist some warnings when building kernel if CONFIG_CPU_HAS_LBT is set but CONFIG_CPU_HAS_LSX and CONFIG_CPU_HAS_LASX are not set. In this case, there are no definitions of _restore_lsx & _restore_lasx and there are also no definitions of kvm_restore_lsx & kvm_restore_lasx in fpu.S and switch.S respectively, just add some ifdefs to fix these warnings. AS arch/loongarch/kernel/fpu.o arch/loongarch/kernel/fpu.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 arch/loongarch/kernel/fpu.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 AS [M] arch/loongarch/kvm/switch.o arch/loongarch/kvm/switch.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 arch/loongarch/kvm/switch.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 MODPOST Module.symvers ERROR: modpost: "kvm_restore_lsx" [arch/loongarch/kvm/kvm.ko] undefined! ERROR: modpost: "kvm_restore_lasx" [arch/loongarch/kvm/kvm.ko] undefined! Cc: stable@vger.kernel.org # 6.9+ Fixes: cb8a2ef0848c ("LoongArch: Add ORC stack unwinder support") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408120955.qls5oNQY-lkp@intel.com/ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-26LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBEHuacai Chen
Currently we call irq_set_noprobe() in a loop for all IRQs, but indeed it only works for IRQs below NR_IRQS_LEGACY because at init_IRQ() only legacy interrupts have been allocated. Instead, we can define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE in asm/hwirq.h and the core will automatically set the flag for all interrupts. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
2024-08-25LoongArch: Remove the unused dma-direct.hMiao Wang
dma-direct.h is introduced in commit d4b6f1562a3c3284 ("LoongArch: Add Non-Uniform Memory Access (NUMA) support"). In commit c78c43fe7d42524c ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA"), ARCH_HAS_PHYS_TO_DMA was deselected and the coresponding phys_to_dma()/ dma_to_phys() functions were removed. However, the unused dma-direct.h was left behind, which is removed by this patch. Cc: <stable@vger.kernel.org> Fixes: c78c43fe7d42 ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA") Signed-off-by: Miao Wang <shankerwangmiao@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-23irqchip/loongarch-avec: Add AVEC irqchip supportTianyang Zhang
Introduce the advanced extended interrupt controllers (AVECINTC). This feature will allow each core to have 256 independent interrupt vectors and MSI interrupts can be independently routed to any vector on any CPU. The whole topology of irqchips in LoongArch machines looks like this if AVECINTC is supported: +-----+ +-----------------------+ +-------+ | IPI | --> | CPUINTC | <-- | Timer | +-----+ +-----------------------+ +-------+ ^ ^ ^ | | | +---------+ +----------+ +---------+ +-------+ | EIOINTC | | AVECINTC | | LIOINTC | <-- | UARTs | +---------+ +----------+ +---------+ +-------+ ^ ^ | | +---------+ +---------+ | PCH-PIC | | PCH-MSI | +---------+ +---------+ ^ ^ ^ | | | +---------+ +---------+ +---------+ | Devices | | PCH-LPC | | Devices | +---------+ +---------+ +---------+ ^ | +---------+ | Devices | +---------+ Co-developed-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Co-developed-by: Liupu Wang <wangliupu@loongson.cn> Signed-off-by: Liupu Wang <wangliupu@loongson.cn> Co-developed-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240823104337.25577-2-zhangtianyang@loongson.cn
2024-08-23LoongArch: Architectural preparation for AVEC irqchipHuacai Chen
Add architectural preparation for AVEC irqchip, including: 1. CPUCFG feature bits definition for AVEC; 2. Detection of AVEC irqchip in cpu_probe(); 3. New IPI type definition (IPI_CLEAR_VECTOR) for AVEC; 4. Provide arch_probe_nr_irqs() for large NR_IRQS; 5. Other related changes about the number of interrupts. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240823103936.25092-2-zhangtianyang@loongson.cn
2024-08-23LoongArch: Move irqchip function prototypes to irq-loongson.hHuacai Chen
Some irqchip functions are only for internal use by irqchip drivers, so move their prototypes from asm/irq.h to drivers/irqchip/irq-loongson.h. All related driver files include the new irq-loongson.h. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240823103936.25092-1-zhangtianyang@loongson.cn
2024-08-07LoongArch: KVM: Remove undefined a6 argument comment for kvm_hypercall()Dandan Zhang
The kvm_hypercall() set for LoongArch is limited to a1-a5. So the mention of a6 in the comment is undefined that needs to be rectified. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: Dandan Zhang <zhangdandan@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-07LoongArch: KVM: Remove unnecessary definition of KVM_PRIVATE_MEM_SLOTSYuli Wang
1. "KVM_PRIVATE_MEM_SLOTS" is renamed as "KVM_INTERNAL_MEM_SLOTS". 2. "KVM_INTERNAL_MEM_SLOTS" defaults to zero, so it is not necessary to define it in LoongArch's asm/kvm_host.h. Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bdd1c37a315bc50ab14066c4852bc8dcf070451e Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b075450868dbc0950f0942617f222eeb989cad10 Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: Yuli Wang <wangyuli@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-07LoongArch: Use accessors to page table entries instead of direct dereferenceHuacai Chen
As very well explained in commit 20a004e7b017cce282 ("arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables"), an architecture whose page table walker can modify the PTE in parallel must use READ_ONCE()/ WRITE_ONCE() macro to avoid any compiler transformation. So apply that to LoongArch which is such an architecture, in order to avoid potential problems. Similar to commit edf955647269422e ("riscv: Use accessors to page table entries instead of direct dereference"). Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-07LoongArch: Enable general EFI poweroff methodMiao Wang
efi_shutdown_init() can register a general sys_off handler named efi_power_off(). Enable this by providing efi_poweroff_required(), like arm and x86. Since EFI poweroff is also supported on LoongArch, and the enablement makes the poweroff function usable for hardwares which lack ACPI S5. We prefer ACPI poweroff rather than EFI poweroff (like x86), so we only require EFI poweroff if acpi_gbl_reduced_hardware or acpi_no_s5 is true. Cc: stable@vger.kernel.org Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Miao Wang <shankerwangmiao@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-02syscalls: fix syscall macros for newfstat/newfstatatArnd Bergmann
The __NR_newfstat and __NR_newfstatat macros accidentally got renamed in the conversion to the syscall.tbl format, dropping the 'new' portion of the name. In an unrelated change, the two syscalls are no longer architecture specific but are once more defined on all 64-bit architectures, so the 'newstat' ABI keyword can be dropped from the table as a simplification. Fixes: Fixes: 4fe53bf2ba0a ("syscalls: add generic scripts/syscall.tbl") Closes: https://lore.kernel.org/lkml/838053e0-b186-4e9f-9668-9a3384a71f23@app.fastmail.com/T/#t Reported-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-27Merge tag 'riscv-for-linus-6.11-mw2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for NUMA (via SRAT and SLIT), console output (via SPCR), and cache info (via PPTT) on ACPI-based systems. - The trap entry/exit code no longer breaks the return address stack predictor on many systems, which results in an improvement to trap latency. - Support for HAVE_ARCH_STACKLEAK. - The sv39 linear map has been extended to support 128GiB mappings. - The frequency of the mtime CSR is now visible via hwprobe. * tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits) RISC-V: Provide the frequency of time CSR via hwprobe riscv: Extend sv39 linear mapping max size to 128G riscv: enable HAVE_ARCH_STACKLEAK riscv: signal: Remove unlikely() from WARN_ON() condition riscv: Improve exception and system call latency RISC-V: Select ACPI PPTT drivers riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init() RISC-V: ACPI: Enable SPCR table for console output on RISC-V riscv: boot: remove duplicated targets line trace: riscv: Remove deprecated kprobe on ftrace support riscv: cpufeature: Extract common elements from extension checking riscv: Introduce vendor variants of extension helpers riscv: Add vendor extensions to /proc/cpuinfo riscv: Extend cpufeature.c to detect vendor extensions RISC-V: run savedefconfig for defconfig RISC-V: hwprobe: sort EXT_KEY()s in hwprobe_isa_ext0() alphabetically ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_init ACPI: NUMA: change the ACPI_NUMA to a hidden option ACPI: NUMA: Add handler for SRAT RINTC affinity structure ...
2024-07-22Merge tag 'loongarch-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch updates from Huacai Chen: - Define __ARCH_WANT_NEW_STAT in unistd.h - Always enumerate MADT and setup logical-physical CPU mapping - Add irq_work support via self IPIs - Add RANDOMIZE_KSTACK_OFFSET support - Add ARCH_HAS_PTE_DEVMAP support - Add ARCH_HAS_DEBUG_VM_PGTABLE support - Add writecombine support for DMW-based ioremap() - Add architectural preparation for CPUFreq - Add ACPI standard hardware register based S3 support - Add support for relocating the kernel with RELR relocation - Some bug fixes and other small changes * tag 'loongarch-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Make the users of larch_insn_gen_break() constant LoongArch: Check TIF_LOAD_WATCH to enable user space watchpoint LoongArch: Use rustc option -Zdirect-access-external-data LoongArch: Add support for relocating the kernel with RELR relocation LoongArch: Remove a redundant checking in relocator LoongArch: Use correct API to map cmdline in relocate_kernel() LoongArch: Automatically disable KASLR for hibernation LoongArch: Add ACPI standard hardware register based S3 support LoongArch: Add architectural preparation for CPUFreq LoongArch: Add writecombine support for DMW-based ioremap() LoongArch: Add ARCH_HAS_DEBUG_VM_PGTABLE support LoongArch: Add ARCH_HAS_PTE_DEVMAP support LoongArch: Add RANDOMIZE_KSTACK_OFFSET support LoongArch: Add irq_work support via self IPIs LoongArch: Always enumerate MADT and setup logical-physical CPU mapping LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
2024-07-22Merge patch series "Add ACPI NUMA support for RISC-V"Palmer Dabbelt
Haibo Xu <haibo1.xu@intel.com> says: This patch series enable RISC-V ACPI NUMA support which was based on the recently approved ACPI ECR[1]. Patch 1/4 add RISC-V specific acpi_numa.c file to parse NUMA information from SRAT and SLIT ACPI tables. Patch 2/4 add the common SRAT RINTC affinity structure handler. Patch 3/4 change the ACPI_NUMA to a hidden option since it would be selected by default on all supported platform. Patch 4/4 replace pr_info with pr_debug in arch_acpi_numa_init() to avoid potential boot noise on ACPI platforms that are not NUMA. Based-on: https://github.com/linux-riscv/linux-riscv/tree/for-next [1] https://drive.google.com/file/d/1YTdDx2IPm5IeZjAW932EYU-tUtgS08tX/view?usp=sharing Testing: Since the ACPI AIA/PLIC support patch set is still under upstream review, hence it is tested using the poll based HVC SBI console and RAM disk. 1) Build latest Qemu with the following patch backported https://github.com/vlsunil/qemu/commit/42bd4eeefd5d4410a68f02d54fee406d8a1269b0 2) Build latest EDK-II https://github.com/tianocore/edk2/blob/master/OvmfPkg/RiscVVirt/README.md 3) Build Linux with the following configs enabled CONFIG_RISCV_SBI_V01=y CONFIG_SERIAL_EARLYCON_RISCV_SBI=y CONFIG_NONPORTABLE=y CONFIG_HVC_RISCV_SBI=y CONFIG_NUMA=y CONFIG_ACPI_NUMA=y 4) Build buildroot rootfs.cpio 5) Launch the Qemu machine qemu-system-riscv64 -nographic \ -machine virt,pflash0=pflash0,pflash1=pflash1 -smp 4 -m 8G \ -blockdev node-name=pflash0,driver=file,read-only=on,filename=RISCV_VIRT_CODE.fd \ -blockdev node-name=pflash1,driver=file,filename=RISCV_VIRT_VARS.fd \ -object memory-backend-ram,size=4G,id=m0 \ -object memory-backend-ram,size=4G,id=m1 \ -numa node,memdev=m0,cpus=0-1,nodeid=0 \ -numa node,memdev=m1,cpus=2-3,nodeid=1 \ -numa dist,src=0,dst=1,val=30 \ -kernel linux/arch/riscv/boot/Image \ -initrd buildroot/output/images/rootfs.cpio \ -append "root=/dev/ram ro console=hvc0 earlycon=sbi" [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x80000000-0x17fffffff] [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x180000000-0x27fffffff] [ 0.000000] NUMA: NODE_DATA [mem 0x17fe3bc40-0x17fe3cfff] [ 0.000000] NUMA: NODE_DATA [mem 0x27fff4c40-0x27fff5fff] ... [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> HARTID 0x0 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> HARTID 0x1 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> HARTID 0x2 -> Node 1 [ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> HARTID 0x3 -> Node 1 * b4-shazam-merge: ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_init ACPI: NUMA: change the ACPI_NUMA to a hidden option ACPI: NUMA: Add handler for SRAT RINTC affinity structure ACPI: RISCV: Add NUMA support based on SRAT and SLIT Link: https://lore.kernel.org/r/cover.1718268003.git.haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22ACPI: NUMA: change the ACPI_NUMA to a hidden optionHaibo Xu
x86/arm64/loongarch would select ACPI_NUMA by default and riscv would do the same thing, so change it to a hidden option and the select statements except for the X86_64_ACPI_NUMA can also go away. Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Sunil V L <sunilvl@ventanamicro.com> Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/f1f96377b8ecd6e3183f28abf5c9ac21cb9855ea.1718268003.git.haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...
2024-07-20LoongArch: Make the users of larch_insn_gen_break() constantOleg Nesterov
LoongArch defines UPROBE_SWBP_INSN as a function call and this breaks arch_uprobe_trampoline() which uses it to initialize a static variable. Add the new "__builtin_constant_p" helper, __emit_break(), and redefine the current users of larch_insn_gen_break() to use it. Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/all/20240614174822.GA1185149@thelio-3990X/ Suggested-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Check TIF_LOAD_WATCH to enable user space watchpointTiezhu Yang
Currently, there are some places to set CSR.PRMD.PWE, the first one is in hw_breakpoint_thread_switch() to enable user space singlestep via checking TIF_SINGLESTEP, the second one is in hw_breakpoint_control() to enable user space watchpoint. For the latter case, it should also check TIF_LOAD_WATCH to make the logic correct and clear. Fixes: c8e57ab0995c ("LoongArch: Trigger user-space watchpoints correctly") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Use rustc option -Zdirect-access-external-dataWANG Rui
-Zdirect-access-external-data is a new Rust compiler option added in Rust 1.78, which we use to optimize the access of external data in the Linux kernel's Rust code. This patch modifies the Rust code in vmlinux to directly access externa data, using PC-REL instead of GOT. However, Rust code whithin modules is constrained by the PC-REL addressing range and is explicitly set to use an indirect method. Acked-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Add support for relocating the kernel with RELR relocationXi Ruoyao
RELR as a relocation packing format for relative relocations for reducing the size of relative relocation records. In a position independent executable there are often many relative relocation records, and our vmlinux is a PIE. The LLD linker (since 17.0.0) and the BFD linker (since 2.43) supports packing the relocations in the RELR format for LoongArch, with the flag -z pack-relative-relocs. Commits 5cf896fb6be3eff ("arm64: Add support for relocating the kernel with RELR relocations") and ccb2d173b983984bfa ("Makefile: use -z pack-relative-relocs") have already added the framework to use RELR. We just need to wire it up and process the RELR relocation records in relocate_relative() in addition to the RELA relocation records. A ".p2align 3" directive is added to la_abs macro or the BFD linker cannot pack the relocation records against the .la_abs section (the ". = ALIGN(8);" directive in vmlinux.lds.S is too late in the linking process). With defconfig and CONFIG_RELR vmlinux.efi is 2.1 MiB (6%) smaller, and vmlinuz.efi (using gzip compression) is 384 KiB (2.8%) smaller. Link: https://groups.google.com/d/topic/generic-abi/bX460iggiKg Link: https://reviews.llvm.org/D138135#4531389 Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d89ecf33ab6d Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Remove a redundant checking in relocatorXi Ruoyao
With our linker script "relocated_addr >= VMLINUX_LOAD_ADDRESS" should be always true. Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Use correct API to map cmdline in relocate_kernel()Huacai Chen
fw_arg1 is in memory space rather than I/O space, so we should use early_memremap_ro() instead of early_ioremap() to map the cmdline. Moreover, we should unmap it after using. Suggested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Automatically disable KASLR for hibernationHuacai Chen
Hibernation assumes the memory layout after resume be the same as that before sleep, so it expects the kernel is loaded at the same position. To achieve this goal we automatically disable KASLR if user explicitly requests hibernation via the "resume=" command line. Since "nohibernate" and "noresume" have higher priorities than "resume=", we only disable KASLR if there is no "nohibernate" and "noresume". Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Add ACPI standard hardware register based S3 supportJiaxun Yang
Most LoongArch 64 machines are using custom "SADR" ACPI extension to perform ACPI S3 sleep. However the standard ACPI way to perform sleep is to write a value to ACPI PM1/SLEEP_CTL register, and this is never supported properly in kernel. Add standard S3 sleep by providing a default DoSuspend function which calls ACPI's acpi_enter_sleep_state() routine when SADR is not provided by the firmware. Also fix suspend assembly code so that ra is set properly before go into sleep routine. (Previously linked address of jirl was set to a0, some firmware do require return address in a0 but it's already set with la.pcrel before). Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Add architectural preparation for CPUFreqHuacai Chen
Add architectural preparation for CPUFreq driver, including: Kconfig, register definition and platform device registration. Some of LoongArch processors support DVFS, their IOCSR.FEATURES has IOCSRF_FREQSCALE set. And they has a micro-core in the package called SMC (System Management Controller) to scale frequency, voltage, etc. Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Add writecombine support for DMW-based ioremap()Huacai Chen
Currently, only TLB-based ioremap() support writecombine, so add the counterpart for DMW-based ioremap() with help of DMW2. The base address (WRITECOMBINE_BASE) is configured as 0xa000000000000000. DMW3 is unused by kernel now, however firmware may leave garbage in them and interfere kernel's address mapping. So clear it as necessary. BTW, centralize the DMW configuration to macro SETUP_DMWINS. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Add ARCH_HAS_DEBUG_VM_PGTABLE supportHuacai Chen
Add ARCH_HAS_DEBUG_VM_PGTABLE selection in Kconfig, in order to make corresponding vm debug features usable on LoongArch. Also update the corresponding arch-support.txt document. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Add ARCH_HAS_PTE_DEVMAP supportHuacai Chen
In order for things like get_user_pages() to work on ZONE_DEVICE memory, we need a software PTE bit to identify device-backed PFNs. Hook this up along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Add RANDOMIZE_KSTACK_OFFSET supportJinjie Ruan
Add support of kernel stack offset randomization while handling syscall, the offset is defaultly limited by KSTACK_OFFSET_MAX(). In order to avoid triggering stack canaries (due to __builtin_alloca()) and slowing down the entry path, use __no_stack_protector attribute to disable stack protector for do_syscall() at function level. With this patch, the REPORT_STACK test show that: `loongarch64 bits of stack entropy: 7` Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Add irq_work support via self IPIsHuacai Chen
Add irq_work support for LoongArch via self IPIs. This make it possible to run works in hardware interrupt context, which is a prerequisite for NOHZ_FULL. Implement: - arch_irq_work_raise() - arch_irq_work_has_interrupt() Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Always enumerate MADT and setup logical-physical CPU mappingHuacai Chen
Some drivers want to use cpu_logical_map(), early_cpu_to_node() and some other CPU mapping APIs, even if we use "nr_cpus=1" to hard limit the CPU number. This is strongly required for the multi-bridges machines. Currently, we stop parsing the MADT if the nr_cpus limit is reached, but to achieve the above goal we should always enumerate the MADT table and setup logical-physical CPU mapping whether there is a nr_cpus limit. Rework the MADT enumeration: 1. Define a flag "cpu_enumerated" to distinguish the first enumeration (cpu_enumerated=0) and the physical hotplug case (cpu_enumerated=1) for set_processor_mask(). 2. If cpu_enumerated=0, stop parsing only when NR_CPUS limit is reached, so we can setup logical-physical CPU mapping; if cpu_enumerated=1, stop parsing when nr_cpu_ids limit is reached, so we can avoid some runtime bugs. Once logical-physical CPU mapping is setup, we will let cpu_enumerated=1. 3. Use find_first_zero_bit() instead of cpumask_next_zero() to find the next zero bit (free logical CPU id) in the cpu_present_mask, because cpumask_next_zero() will stop at nr_cpu_ids. 4. Only touch cpu_possible_mask if cpu_enumerated=0, this is in order to avoid some potential crashes, because cpu_possible_mask is marked as __ro_after_init. 5. In prefill_possible_map(), clear cpu_present_mask bits greater than nr_cpu_ids, in order to avoid a CPU be "present" but not "possible". Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.hHuacai Chen
Chromium sandbox apparently wants to deny statx [1] so it could properly inspect arguments after the sandboxed process later falls back to fstat. Because there's currently not a "fd-only" version of statx, so that the sandbox has no way to ensure the path argument is empty without being able to peek into the sandboxed process's memory. For architectures able to do newfstatat though, glibc falls back to newfstatat after getting -ENOSYS for statx, then the respective SIGSYS handler [2] takes care of inspecting the path argument, transforming allowed newfstatat's into fstat instead which is allowed and has the same type of return value. But, as LoongArch is the first architecture to not have fstat nor newfstatat, the LoongArch glibc does not attempt falling back at all when it gets -ENOSYS for statx -- and you see the problem there! Actually, back when the LoongArch port was under review, people were aware of the same problem with sandboxing clone3 [3], so clone was eventually kept. Unfortunately it seemed at that time no one had noticed statx, so besides restoring fstat/newfstatat to LoongArch uapi (and postponing the problem further), it seems inevitable that we would need to tackle seccomp deep argument inspection. However, this is obviously a decision that shouldn't be taken lightly, so we just restore fstat/newfstatat by defining __ARCH_WANT_NEW_STAT in unistd.h. This is the simplest solution for now, and so we hope the community will tackle the long-standing problem of seccomp deep argument inspection in the future [4][5]. Also add "newstat" to syscall_abis_64 in Makefile.syscalls due to upstream asm-generic changes. More infomation please reading this thread [6]. [1] https://chromium-review.googlesource.com/c/chromium/src/+/2823150 [2] https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/seccomp-bpf-helpers/sigsys_handlers.cc#355 [3] https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal.cx/ [4] https://lwn.net/Articles/799557/ [5] https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-inspection.pdf [6] https://lore.kernel.org/loongarch/20240226-granit-seilschaft-eccc2433014d@brauner/T/#t Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>