path: root/arch/loongarch/include/asm/pgalloc.h
Age  Commit message  Author
2025-05-11  mm: pass mm down to pagetable_{pte,pmd}_ctor  Kevin Brodsky

Patch series "Always call constructor for kernel page tables", v2.

There has been much confusion around exactly when page table constructors/destructors (pagetable_*_[cd]tor) are supposed to be called. They were initially introduced for user PTEs only (to support split page table locks), then at the PMD level for the same purpose. Accounting was added later on, starting at the PTE level and then moving to higher levels (PMD, PUD). Finally, with my earlier series "Account page tables at all levels" [1], the ctor/dtor is run for all levels, all the way to PGD.

I thought this was the end of the story, and it hopefully is for user pgtables, but I was wrong for what concerns kernel pgtables. The current situation there makes very little sense:

* At the PTE level, the ctor/dtor is not called (at least in the generic implementation). Specific helpers are used for kernel pgtables at this level (pte_{alloc,free}_kernel()) and those have never called the ctor/dtor, most likely because they were initially irrelevant in the kernel case.

* At all other levels, the ctor/dtor is normally called. This is potentially wasteful at the PMD level (more on that later).

This series aims to ensure that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. Besides consistency, the main motivation is to guarantee that ctor/dtor hooks are systematically called; this makes it possible to insert hooks to protect page tables [2], for instance. There is however an extra challenge: split locks are not used for kernel pgtables, and it would therefore be wasteful to initialise them (ptlock_init()).

It is worth clarifying exactly when split locks are used. They clearly are for user pgtables, but as illustrated in commit 61444cde9170 ("ARM: 8591/1: mm: use fully constructed struct pages for EFI pgd allocations"), they also are for special page tables like efi_mm. The one case where split locks are definitely unused is pgtables owned by init_mm; this is consistent with the behaviour of apply_to_pte_range().

The approach chosen in this series is therefore to pass the mm associated to the pgtables being constructed to pagetable_{pte,pmd}_ctor() (patch 1), and skip ptlock_init() if mm == &init_mm (patch 3 and 7). This makes it possible to call the PTE ctor/dtor from pte_{alloc,free}_kernel() without unintended consequences (patch 3). As a result the accounting functions are now called at all levels for kernel pgtables, and split locks are never initialised.

In configurations where ptlocks are dynamically allocated (32-bit, PREEMPT_RT, etc.) and ARCH_ENABLE_SPLIT_PMD_PTLOCK is selected, this series results in the removal of a kmem_cache allocation for every kernel PMD. Additionally, for certain architectures that do not use <asm-generic/pgalloc.h> such as s390, the same optimisation occurs at the PTE level.

===

Things get more complicated when it comes to special pgtable allocators (patch 8-12). All architectures need such allocators to create initial kernel pgtables; we are not concerned with those as the ctor cannot be called so early in the boot sequence. However, those allocators may also be used later in the boot sequence or during normal operations. There are two main use-cases:

1. Mapping EFI memory: efi_mm (arm, arm64, riscv)
2. arch_add_memory(): init_mm

The ctor is already explicitly run (at the PTE/PMD level) in the first case, as required for pgtables that are not associated with init_mm. However the same allocators may also be used for the second use-case (or others), and this is where it gets messy. Patch 1 calls the ctor with NULL as mm in those situations, as the actual mm isn't available. Practically this means that ptlocks will be unconditionally initialised. This is fine on arm - create_mapping_late() is only used for the EFI mapping. On arm64, __create_pgd_mapping() is also used by arch_add_memory(); patch 8/9/11 ensure that ctors are called at all levels with the appropriate mm. The situation is similar on riscv, but propagating the mm down to the ctor would require significant refactoring. Since they are already called unconditionally, this series leaves riscv no worse off - patch 10 adds comments to clarify the situation.

From a cursory look at other architectures implementing arch_add_memory(), s390 and x86 may also need a similar treatment to add constructor calls. This is to be taken care of in a future version or as a follow-up.

===

The complications in those special pgtable allocators beg the question: does it really make sense to treat efi_mm and init_mm differently in e.g. apply_to_pte_range()? Maybe what we really need is a way to tell if an mm corresponds to user memory or not, and never use split locks for non-user mm's. Feedback and suggestions welcome!

This patch (of 12):

In preparation for calling constructors for all kernel page tables while eliding unnecessary ptlock initialisation, let's pass down the associated mm to the PTE/PMD level ctors. (These are the two levels where ptlocks are used.)

In most cases the mm is already around at the point of calling the ctor so we simply pass it down. This is however not the case for special page table allocators:

* arch/arm/mm/mmu.c
* arch/arm64/mm/mmu.c
* arch/riscv/mm/init.c

In those cases, the page tables being allocated are either for standard kernel memory (init_mm) or special page directories, which may not be associated to any mm. For now let's pass NULL as mm; this will be refined where possible in future patches.

No functional change in this patch.

Link: https://lore.kernel.org/linux-mm/20250103184415.2744423-1-kevin.brodsky@arm.com/ [1]
Link: https://lore.kernel.org/linux-hardening/20250203101839.1223008-1-kevin.brodsky@arm.com/ [2]
Link: https://lkml.kernel.org/r/20250408095222.860601-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20250408095222.860601-2-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: <x86@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
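The rule described in the series above (always run the accounting hook, but skip split-lock initialisation when the table belongs to init_mm) can be sketched as a small userspace model. This is only an illustration under stated assumptions: `mm_model`, `init_mm_model`, `ptdesc_ctor_model` and `pte_ctor_model()` are hypothetical stand-ins invented here, not the kernel's definitions, and the real constructors do more than set two flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's real types. */
struct mm_model { int id; };

/* Plays the role of init_mm: the mm that owns ordinary kernel page tables. */
static struct mm_model init_mm_model;

struct ptdesc_ctor_model {
    bool accounted;  /* the accounting hook ran */
    bool ptl_ready;  /* the split page table lock was initialised */
};

/*
 * Models the idea of passing the mm down to the PTE/PMD constructor:
 * accounting always happens, while lock initialisation is skipped only
 * when the table is known to belong to the kernel mm. A NULL mm (the
 * "mm not available" case in the special allocators) still gets a lock.
 */
static bool pte_ctor_model(struct mm_model *mm, struct ptdesc_ctor_model *pt)
{
    if (mm != &init_mm_model)
        pt->ptl_ready = true;
    pt->accounted = true;
    return true;
}
```

Calling `pte_ctor_model(&init_mm_model, &pt)` leaves `pt.ptl_ready` false while still setting `pt.accounted`. Passing a user mm or NULL sets both flags, which matches the commit message's remark that a NULL mm means ptlocks are unconditionally initialised.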
2025-04-01  mm: pgtable: convert some architectures to use tlb_remove_ptdesc()  Qi Zheng

Now, the nine architectures of csky, hexagon, loongarch, m68k, mips, nios2, openrisc, sh and um do not select CONFIG_MMU_GATHER_RCU_TABLE_FREE, and just call pagetable_dtor() + tlb_remove_page_ptdesc() (the wrapper of tlb_remove_page()). This is the same as the implementation of tlb_remove_{ptdesc|table}() under !CONFIG_MMU_GATHER_TABLE_FREE, so convert these architectures to use tlb_remove_ptdesc().

The ultimate goal is to make the architecture only use tlb_remove_ptdesc() or tlb_remove_table() for page table pages.

[zhengqi.arch@bytedance.com: v2]
Link: https://lkml.kernel.org/r/20250303072603.45423-1-zhengqi.arch@bytedance.com
[akpm@linux-foundation.org: remove trailing semi in arch/loongarch/include/asm/pgalloc.h]
Link: https://lkml.kernel.org/r/19db3e8673b67bad2f1df1ab37f1c89d99eacfea.1740454179.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  mm: pgtable: introduce pagetable_dtor()  Qi Zheng

The pagetable_p*_dtor() are exactly the same except for the handling of ptlock. If we make ptlock_free() handle the case where ptdesc->ptl is NULL and remove VM_BUG_ON_PAGE() from pmd_ptlock_free(), we can unify pagetable_p*_dtor() into one function. Let's introduce pagetable_dtor() to do this.

Later, pagetable_dtor() will be moved to tlb_remove_ptdesc(), so that ptlock and page table pages can be freed together (regardless of whether RCU is used). This prevents the use-after-free problem where the ptlock is freed immediately but the page table pages is freed later via RCU.

Link: https://lkml.kernel.org/r/47f44fff9dc68d9d9e9a0d6c036df275f820598a.1736317725.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
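The unification argument above rests on one point: once the lock-freeing helper tolerates a table that never had a lock, a single destructor works for every level. Below is a minimal userspace sketch of that idea. The names `ptdesc_dtor_model`, `ptlock_alloc_model()`, `ptlock_free_model()` and `pagetable_dtor_model()` are hypothetical, and `malloc()`/`free()` merely stand in for however the kernel allocates a dynamically allocated ptlock.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page table descriptor. */
struct ptdesc_dtor_model {
    int *ptl;        /* dynamically allocated lock, or NULL if never set up */
    bool accounted;  /* true while the table is counted in the statistics */
};

/* Gives the table a lock, as a constructor would at the PTE/PMD level. */
static bool ptlock_alloc_model(struct ptdesc_dtor_model *pt)
{
    pt->ptl = malloc(sizeof(*pt->ptl));
    return pt->ptl != NULL;
}

/* Tolerates ptl == NULL, so callers need not know the table's level. */
static void ptlock_free_model(struct ptdesc_dtor_model *pt)
{
    if (pt->ptl) {
        free(pt->ptl);
        pt->ptl = NULL;
    }
}

/* One destructor for every level: drop the lock if any, then unaccount. */
static void pagetable_dtor_model(struct ptdesc_dtor_model *pt)
{
    ptlock_free_model(pt);
    pt->accounted = false;
}
```

With this shape, a table that was constructed without a lock (for example a PUD-level table in this model) and one that has a lock go through the same `pagetable_dtor_model()` call.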
2024-10-21  LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space  Bibo Mao

There are two pages in one TLB entry on LoongArch system. For kernel space, it requires both two pte entries (buddies) with PAGE_GLOBAL bit set, otherwise HW treats it as non-global tlb, there will be potential problems if tlb entry for kernel space is not global. Such as fail to flush kernel tlb with the function local_flush_tlb_kernel_range() which supposed only flush tlb with global bit.

Kernel address space areas include percpu, vmalloc, vmemmap, fixmap and kasan areas. For these areas both two consecutive page table entries should be enabled with PAGE_GLOBAL bit. So with function set_pte() and pte_clear(), pte buddy entry is checked and set besides its own pte entry. However it is not atomic operation to set both two pte entries, there is problem with test_vmalloc test case.

So function kernel_pte_init() is added to init a pte table when it is created for kernel address space, and the default initial pte value is PAGE_GLOBAL rather than zero at beginning. Then only its own pte entry need update with function set_pte() and pte_clear(), nothing to do with the pte buddy entry.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
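The effect of pre-filling a kernel PTE table with the global bit can be illustrated with a small userspace model. Everything here is an assumption made for illustration: the `*_model` names, the table size of 512 entries, the bit position used for the global flag, and the choice of `idx ^ 1` as the buddy of an entry (the even/odd neighbour sharing one TLB entry). The real kernel_pte_init(), set_pte() and pte_clear() are not reproduced.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; not taken from the LoongArch headers. */
#define PTRS_PER_PTE_MODEL 512
#define PAGE_GLOBAL_MODEL  (UINT64_C(1) << 6)

/* Initialise a fresh kernel PTE table: every entry starts as "global". */
static void kernel_pte_init_model(uint64_t *table)
{
    for (size_t i = 0; i < PTRS_PER_PTE_MODEL; i++)
        table[i] = PAGE_GLOBAL_MODEL;
}

/* Install a kernel mapping: only the entry itself is written. */
static void set_kernel_pte_model(uint64_t *table, size_t idx, uint64_t val)
{
    table[idx] = val | PAGE_GLOBAL_MODEL;
}

/* Clear a kernel mapping: fall back to the "empty but global" value. */
static void kernel_pte_clear_model(uint64_t *table, size_t idx)
{
    table[idx] = PAGE_GLOBAL_MODEL;
}

/* Returns nonzero when both halves of the even/odd pair are global. */
static int buddy_pair_global_model(const uint64_t *table, size_t idx)
{
    size_t buddy = idx ^ 1;

    return (table[idx] & PAGE_GLOBAL_MODEL) != 0 &&
           (table[buddy] & PAGE_GLOBAL_MODEL) != 0;
}
```

Because the buddy already carries the global bit from initialisation, setting or clearing one entry never has to touch its neighbour, so the non-atomic two-entry update described in the commit message does not arise in this model.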
2023-10-06  mm: add statistics for PUD level pagetable  Baolin Wang

Recently, we found that cross-die access to pagetable pages on ARM64 machines can cause performance fluctuations in our business. Currently, there are no PMU events available to track this situation on our ARM64 machines, so accurate pagetable accounting can help to analyze this issue, but now the PUD level pagetable accounting is missed.

So introduce pagetable_pud_ctor/dtor() to help to get accurate PUD pagetable accounting, as well as converting the architectures which use generic PUD pagetable allocation to add corresponding PUD pagetable accounting. Moreover this patch will mark the PUD level pagetable with PG_table flag, which will help to do sanity validation in unpoison_memory().

On my testing machine, I can see more pagetables statistics after the patch with page-types tool:

Before patch:
             flags  page-count  MB  symbolic-flags                                long-symbolic-flags
0x0000000004000000       27326  106  __________________________g_________________  pgtable

After patch:
0x0000000004000000       27541  107  __________________________g_________________  pgtable

Link: https://lkml.kernel.org/r/876c71c03a7e69c17722a690e3225a4f7b172fb2.1695017383.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-08  Merge tag 'loongarch-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson  Linus Torvalds

Pull LoongArch updates from Huacai Chen:

- Allow usage of LSX/LASX in the kernel, and use them for SIMD-optimized RAID5/RAID6 routines
- Add Loongson Binary Translation (LBT) extension support
- Add basic KGDB & KDB support
- Add building with kcov coverage
- Add KFENCE (Kernel Electric-Fence) support
- Add KASAN (Kernel Address Sanitizer) support
- Some bug fixes and other small changes
- Update the default config file

* tag 'loongarch-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (25 commits)
  LoongArch: Update Loongson-3 default config file
  LoongArch: Add KASAN (Kernel Address Sanitizer) support
  LoongArch: Simplify the processing of jumping new kernel for KASLR
  kasan: Add (pmd|pud)_init for LoongArch zero_(pud|p4d)_populate process
  kasan: Add __HAVE_ARCH_SHADOW_MAP to support arch specific mapping
  LoongArch: Add KFENCE (Kernel Electric-Fence) support
  LoongArch: Get partial stack information when providing regs parameter
  LoongArch: mm: Add page table mapped mode support for virt_to_page()
  kfence: Defer the assignment of the local variable addr
  LoongArch: Allow building with kcov coverage
  LoongArch: Provide kaslr_offset() to get kernel offset
  LoongArch: Add basic KGDB & KDB support
  LoongArch: Add Loongson Binary Translation (LBT) extension support
  raid6: Add LoongArch SIMD recovery implementation
  raid6: Add LoongArch SIMD syndrome calculation
  LoongArch: Add SIMD-optimized XOR routines
  LoongArch: Allow usage of LSX/LASX in the kernel
  LoongArch: Define symbol 'fault' as a local label in fpu.S
  LoongArch: Adjust {copy, clear}_user exception handler behavior
  LoongArch: Use static defined zero page rather than allocated
  ...
2023-09-06  LoongArch: mm: Introduce unified function populate_kernel_pte()  Bibo Mao

Function pcpu_populate_pte() and fixmap_pte() are similar, they populate one page from kernel address space. And there is confusion between pgd and p4d in the function fixmap_pte(), such as pgd_none() always returns zero. This patch introduces a unified function populate_kernel_pte() and then replaces pcpu_populate_pte() and fixmap_pte().

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-08-21  loongarch: convert various functions to use ptdescs  Vishal Moola (Oracle)

As part of the conversions to replace pgtable constructor/destructors with ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert these to use pagetable_alloc() and ptdesc_address() instead to help standardize page tables further.

Link: https://lkml.kernel.org/r/20230807230513.102486-22-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Guo Ren <guoren@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11  MIPS&LoongArch&NIOS2: adjust prototypes of p?d_init()  Feiyang Chen

Patch series "mm/sparse-vmemmap: Generalise helpers and enable for LoongArch", v14.

This series is in order to enable sparse-vmemmap for LoongArch. But LoongArch cannot use generic helpers directly because MIPS&LoongArch need to call pgd_init()/pud_init()/pmd_init() when populating page tables. So we adjust the prototypes of p?d_init() to make generic helpers can call them, then enable sparse-vmemmap with generic helpers, and to be further, generalise vmemmap_populate_hugepages() for ARM64, X86 and LoongArch.

This patch (of 4):

We are preparing to add sparse vmemmap support to LoongArch. MIPS and LoongArch need to call pgd_init()/pud_init()/pmd_init() when populating page tables, so adjust their prototypes to make generic helpers can call them.

NIOS2 declares pmd_init() but doesn't use, just remove it to avoid build errors.

Link: https://lkml.kernel.org/r/20221027125253.3458989-1-chenhuacai@loongson.cn
Link: https://lkml.kernel.org/r/20221027125253.3458989-2-chenhuacai@loongson.cn
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Min Zhou <zhoumin@loongson.cn>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17  loongarch: drop definition of PUD_ORDER  Mike Rapoport

This is the order of the page table allocation, not the order of a PUD. Since it's always hardwired to 0, simply drop it.

Link: https://lkml.kernel.org/r/20220703141203.147893-12-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Xuerui Wang <kernel@xen0n.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17  loongarch: drop definition of PMD_ORDER  Mike Rapoport

This is the order of the page table allocation, not the order of a PMD. Since it's always hardwired to 0, simply drop it.

Link: https://lkml.kernel.org/r/20220703141203.147893-11-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Xuerui Wang <kernel@xen0n.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-03  LoongArch: Add memory management  Huacai Chen

Add memory management support for LoongArch, including: cache and tlb management, page fault handling and ioremap/mmap support.

Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>