summaryrefslogtreecommitdiff
path: root/arch/parisc/include/asm/pgalloc.h
AgeCommit message (Collapse)Author
2025-05-11mm: pass mm down to pagetable_{pte,pmd}_ctorKevin Brodsky
Patch series "Always call constructor for kernel page tables", v2. There has been much confusion around exactly when page table constructors/destructors (pagetable_*_[cd]tor) are supposed to be called. They were initially introduced for user PTEs only (to support split page table locks), then at the PMD level for the same purpose. Accounting was added later on, starting at the PTE level and then moving to higher levels (PMD, PUD). Finally, with my earlier series "Account page tables at all levels" [1], the ctor/dtor is run for all levels, all the way to PGD. I thought this was the end of the story, and it hopefully is for user pgtables, but I was wrong for what concerns kernel pgtables. The current situation there makes very little sense: * At the PTE level, the ctor/dtor is not called (at least in the generic implementation). Specific helpers are used for kernel pgtables at this level (pte_{alloc,free}_kernel()) and those have never called the ctor/dtor, most likely because they were initially irrelevant in the kernel case. * At all other levels, the ctor/dtor is normally called. This is potentially wasteful at the PMD level (more on that later). This series aims to ensure that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. Besides consistency, the main motivation is to guarantee that ctor/dtor hooks are systematically called; this makes it possible to insert hooks to protect page tables [2], for instance. There is however an extra challenge: split locks are not used for kernel pgtables, and it would therefore be wasteful to initialise them (ptlock_init()). It is worth clarifying exactly when split locks are used. They clearly are for user pgtables, but as illustrated in commit 61444cde9170 ("ARM: 8591/1: mm: use fully constructed struct pages for EFI pgd allocations"), they also are for special page tables like efi_mm. The one case where split locks are definitely unused is pgtables owned by init_mm; this is consistent with the behaviour of apply_to_pte_range(). The approach chosen in this series is therefore to pass the mm associated to the pgtables being constructed to pagetable_{pte,pmd}_ctor() (patch 1), and skip ptlock_init() if mm == &init_mm (patch 3 and 7). This makes it possible to call the PTE ctor/dtor from pte_{alloc,free}_kernel() without unintended consequences (patch 3). As a result the accounting functions are now called at all levels for kernel pgtables, and split locks are never initialised. In configurations where ptlocks are dynamically allocated (32-bit, PREEMPT_RT, etc.) and ARCH_ENABLE_SPLIT_PMD_PTLOCK is selected, this series results in the removal of a kmem_cache allocation for every kernel PMD. Additionally, for certain architectures that do not use <asm-generic/pgalloc.h> such as s390, the same optimisation occurs at the PTE level. === Things get more complicated when it comes to special pgtable allocators (patch 8-12). All architectures need such allocators to create initial kernel pgtables; we are not concerned with those as the ctor cannot be called so early in the boot sequence. However, those allocators may also be used later in the boot sequence or during normal operations. There are two main use-cases: 1. Mapping EFI memory: efi_mm (arm, arm64, riscv) 2. arch_add_memory(): init_mm The ctor is already explicitly run (at the PTE/PMD level) in the first case, as required for pgtables that are not associated with init_mm. However the same allocators may also be used for the second use-case (or others), and this is where it gets messy. Patch 1 calls the ctor with NULL as mm in those situations, as the actual mm isn't available. Practically this means that ptlocks will be unconditionally initialised. This is fine on arm - create_mapping_late() is only used for the EFI mapping. On arm64, __create_pgd_mapping() is also used by arch_add_memory(); patch 8/9/11 ensure that ctors are called at all levels with the appropriate mm. The situation is similar on riscv, but propagating the mm down to the ctor would require significant refactoring. Since they are already called unconditionally, this series leaves riscv no worse off - patch 10 adds comments to clarify the situation. From a cursory look at other architectures implementing arch_add_memory(), s390 and x86 may also need a similar treatment to add constructor calls. This is to be taken care of in a future version or as a follow-up. === The complications in those special pgtable allocators beg the question: does it really make sense to treat efi_mm and init_mm differently in e.g. apply_to_pte_range()? Maybe what we really need is a way to tell if an mm corresponds to user memory or not, and never use split locks for non-user mm's. Feedback and suggestions welcome! This patch (of 12): In preparation for calling constructors for all kernel page tables while eliding unnecessary ptlock initialisation, let's pass down the associated mm to the PTE/PMD level ctors. (These are the two levels where ptlocks are used.) In most cases the mm is already around at the point of calling the ctor so we simply pass it down. This is however not the case for special page table allocators: * arch/arm/mm/mmu.c * arch/arm64/mm/mmu.c * arch/riscv/mm/init.c In those cases, the page tables being allocated are either for standard kernel memory (init_mm) or special page directories, which may not be associated to any mm. For now let's pass NULL as mm; this will be refined where possible in future patches. No functional change in this patch. Link: https://lore.kernel.org/linux-mm/20250103184415.2744423-1-kevin.brodsky@arm.com/ [1] Link: https://lore.kernel.org/linux-hardening/20250203101839.1223008-1-kevin.brodsky@arm.com/ [2] Link: https://lkml.kernel.org/r/20250408095222.860601-1-kevin.brodsky@arm.com Link: https://lkml.kernel.org/r/20250408095222.860601-2-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Linus Waleij <linus.walleij@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: <x86@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25asm-generic: pgalloc: provide generic __pgd_{alloc,free}Kevin Brodsky
We already have a generic implementation of alloc/free up to P4D level, as well as pgd_free(). Let's finish the work and add a generic PGD-level alloc helper as well. Unlike at lower levels, almost all architectures need some specific magic at PGD level (typically initialising PGD entries), so introducing a generic pgd_alloc() isn't worth it. Instead we introduce two new helpers, __pgd_alloc() and __pgd_free(), and make use of them in the arch-specific pgd_alloc() and pgd_free() wherever possible. To accommodate as many arch as possible, __pgd_alloc() takes a page allocation order. Because pagetable_alloc() allocates zeroed pages, explicit zeroing in pgd_alloc() becomes redundant and we can get rid of it. Some trivial implementations of pgd_free() also become unnecessary once __pgd_alloc() is used; remove them. Another small improvement is consistent accounting of PGD pages by using GFP_PGTABLE_{USER,KERNEL} as appropriate. Not all PGD allocations can be handled by the generic helpers. In particular, multiple architectures allocate PGDs from a kmem_cache, and those PGDs may not be page-sized. Link: https://lkml.kernel.org/r/20250103184415.2744423-6-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25parisc: mm: ensure pagetable_pmd_[cd]tor are calledKevin Brodsky
The implementation of pmd_{alloc_one,free} on parisc requires a non-zero allocation order, but is completely standard aside from that. Let's reuse the generic implementation of pmd_alloc_one(). Explicit zeroing is not needed as GFP_PGTABLE_KERNEL includes __GFP_ZERO. The generic pmd_free() can handle higher allocation orders so we don't need to define our own. These changes ensure that pagetable_pmd_[cd]tor are called, improving the accounting of page table pages. Link: https://lkml.kernel.org/r/20250103184415.2744423-3-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17parisc: rename PGD_ORDER to PGD_TABLE_ORDERMike Rapoport
This is the order of the page table allocation, not the order of a PGD. Link: https://lkml.kernel.org/r/20220703141203.147893-14-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Xuerui Wang <kernel@xen0n.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2021-08-30parisc: Rename PMD_ORDER to PMD_TABLE_ORDERMatthew Wilcox (Oracle)
This is the order of the page table allocation, not the order of a PMD. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Helge Deller <deller@gmx.de>
2021-07-01mm/thp: define default pmd_pgtable()Anshuman Khandual
Currently most platforms define pmd_pgtable() as pmd_page() duplicating the same code all over. Instead just define a default value i.e pmd_page() for pmd_pgtable() and let platforms override when required via <asm/pgtable.h>. All the existing platform that override pmd_pgtable() have been moved into their respective <asm/pgtable.h> header in order to precede before the new generic definition. This makes it much cleaner with reduced code. Link: https://lkml.kernel.org/r/1623646133-20306-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-12parisc: Optimize per-pagetable spinlocksHelge Deller
On parisc a spinlock is stored in the next page behind the pgd which protects against parallel accesses to the pgd. That's why one additional page (PGD_ALLOC_ORDER) is allocated for the pgd. Matthew Wilcox suggested that we instead should use a pointer in the struct page table for this spinlock and noted, that the comments for the PGD_ORDER and PMD_ORDER defines were wrong. Both suggestions are addressed with this patch. Instead of having an own spinlock to protect the pgd, we now switch to use the existing page_table_lock. Additionally, beside loading the pgd into cr25 in switch_mm_irqs_off(), the physical address of this lock is loaded into cr28 (tr4), so that we can avoid implementing a complicated lookup in assembly for this lock in the TLB fault handlers. The existing Hybrid L2/L3 page table scheme (where the pmd is adjacent to the pgd) has been dropped with this patch. Remove the locking in set_pte() and the huge-page pte functions too. They trigger a spinlock recursion on 32bit machines and seem unnecessary. Suggested-by: Matthew Wilcox <willy@infradead.org> Fixes: b37d1c1898b2 ("parisc: Use per-pagetable spinlock") Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2020-08-16parisc: fix PMD pages allocation by restoring pmd_alloc_one()Mike Rapoport
Commit 1355c31eeb7e ("asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()") converted parisc to use generic version of pmd_alloc_one() but it missed the fact that parisc uses order-1 pages for PMD. Restore the original version of pmd_alloc_one() for parisc, just use GFP_PGTABLE_KERNEL that implies __GFP_ZERO instead of GFP_KERNEL and memset. Fixes: 1355c31eeb7e ("asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()") Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Meelis Roos <mroos@linux.ee> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lkml.kernel.org/r/9f2b5ebd-e4a4-0fa1-6cd3-4b9f6892d1ad@linux.ee Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07asm-generic: pgalloc: provide generic pgd_free()Mike Rapoport
Most architectures define pgd_free() as a wrapper for free_page(). Provide a generic version in asm-generic/pgalloc.h and enable its use for most architectures. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200627143453.31835-7-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()Mike Rapoport
For most architectures that support >2 levels of page tables, pmd_alloc_one() is a wrapper for __get_free_pages(), sometimes with __GFP_ZERO and sometimes followed by memset(0) instead. More elaborate versions on arm64 and x86 account memory for the user page tables and call to pgtable_pmd_page_ctor() as the part of PMD page initialization. Move the arm64 version to include/asm-generic/pgalloc.h and use the generic version on several architectures. The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is not enabled, so there is no functional change for most architectures except of the addition of __GFP_ACCOUNT for allocation of user page tables. The pmd_free() is a wrapper for free_page() in all the cases, so no functional change here. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/20200627143453.31835-5-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-04parisc: use pgtable-nopXd instead of 4level-fixupMike Rapoport
parisc has two or three levels of page tables and can use appropriate pgtable-nopXd and folding of the upper layers. Replace usage of include/asm-generic/4level-fixup.h and explicit definitions of __PAGETABLE_PxD_FOLDED in parisc with include/asm-generic/pgtable-nopmd.h for two-level configurations and with include/asm-generic/pgtable-nopud.h for three-lelve configurations and adjust page table manipulation macros and functions accordingly. Link: http://lkml.kernel.org/r/1572938135-31886-9-git-send-email-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Cc: Anatoly Pugachev <matorola@gmail.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Peter Rosin <peda@axentia.se> Cc: Richard Weinberger <richard@nod.at> Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Creasey <sammy@sammy.net> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24mm: remove quicklist page table cachesNicholas Piggin
Patch series "mm: remove quicklist page table caches". A while ago Nicholas proposed to remove quicklist page table caches [1]. I've rebased his patch on the curren upstream and switched ia64 and sh to use generic versions of PTE allocation. [1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com This patch (of 3): Remove page table allocator "quicklists". These have been around for a long time, but have not got much traction in the last decade and are only used on ia64 and sh architectures. The numbers in the initial commit look interesting but probably don't apply anymore. If anybody wants to resurrect this it's in the git history, but it's unhelpful to have this code and divergent allocator behaviour for minor archs. Also it might be better to instead make more general improvements to page allocator if this is still so slow. Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12parisc: switch to generic version of pte allocationMike Rapoport
parisc allocates PTE pages with __get_free_page() and uses GFP_KERNEL | __GFP_ZERO for the allocations. Switch it to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The pte_free_kernel() and pte_free() versions on are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/1557296232-15361-12-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> Cc: Helge Deller <deller@gmx.de> Cc: Ley Foon Tan <lftan@altera.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Creasey <sammy@sammy.net> Cc: Vincent Chen <deanbo422@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-03parisc: Use per-pagetable spinlockMikulas Patocka
PA-RISC uses a global spinlock to protect pagetable updates in the TLB fault handlers. When multiple cores are taking TLB faults simultaneously, the cache line containing the spinlock becomes a bottleneck. This patch embeds the spinlock in the top level page directory, so that every process has its own lock. It improves performance by 30% when doing parallel compilations. At least on the N class systems, only one PxTLB inter processor broadcast can be active at any one time on the Merced bus. If a Merced bus is found, this patch serializes the TLB flushes with the pa_tlb_flush_lock spinlock. v1: Initial patch by Mikulas v2: Added Merced detection by Helge v3: Revised TLB serialization by Dave & Helge Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2019-01-04mm: treewide: remove unused address argument from pte_alloc functionsJoel Fernandes (Google)
Patch series "Add support for fast mremap". This series speeds up the mremap(2) syscall by copying page tables at the PMD level even for non-THP systems. There is concern that the extra 'address' argument that mremap passes to pte_alloc may do something subtle architecture related in the future that may make the scheme not work. Also we find that there is no point in passing the 'address' to pte_alloc since its unused. This patch therefore removes this argument tree-wide resulting in a nice negative diff as well. Also ensuring along the way that the enabled architectures do not do anything funky with the 'address' argument that goes unnoticed by the optimization. Build and boot tested on x86-64. Build tested on arm64. The config enablement patch for arm64 will be posted in the future after more testing. The changes were obtained by applying the following Coccinelle script. (thanks Julia for answering all Coccinelle questions!). Following fix ups were done manually: * Removal of address argument from pte_fragment_alloc * Removal of pte_alloc_one_fast definitions from m68k and microblaze. // Options: --include-headers --no-includes // Note: I split the 'identifier fn' line, so if you are manually // running it, please unsplit it so it runs for you. virtual patch @pte_alloc_func_def depends on patch exists@ identifier E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; type T2; @@ fn(... - , T2 E2 ) { ... } @pte_alloc_func_proto_noarg depends on patch exists@ type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1, T2); + T3 fn(T1); | - T3 fn(T1, T2, T4); + T3 fn(T1, T2); ) @pte_alloc_func_proto depends on patch exists@ identifier E1, E2, E4; type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1 E1, T2 E2); + T3 fn(T1 E1); | - T3 fn(T1 E1, T2 E2, T4 E4); + T3 fn(T1 E1, T2 E2); ) @pte_alloc_func_call depends on patch exists@ expression E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ fn(... -, E2 ) @pte_alloc_macro depends on patch exists@ identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; identifier a, b, c; expression e; position p; @@ ( - #define fn(a, b, c) e + #define fn(a, b) e | - #define fn(a, b) e + #define fn(a) e ) Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michal Hocko <mhocko@kernel.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24parisc: get rid of superfluous __GFP_REPEATMichal Hocko
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. pmd_alloc_one allocate PMD_ORDER which is 1. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/1464599699-30131-10-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24tree wide: get rid of __GFP_REPEAT for order-0 allocations part IMichal Hocko
This is the third version of the patchset previously sent [1]. I have basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get rid of superfluous gfp flags" which went through dm tree. I am sending it now because it is tree wide and chances for conflicts are reduced considerably when we want to target rc2. I plan to send the next step and rename the flag and move to a better semantic later during this release cycle so we will have a new semantic ready for 4.8 merge window hopefully. Motivation: While working on something unrelated I've checked the current usage of __GFP_REPEAT in the tree. It seems that a majority of the usage is and always has been bogus because __GFP_REPEAT has always been about costly high order allocations while we are using it for order-0 or very small orders very often. It seems that a big pile of them is just a copy&paste when a code has been adopted from one arch to another. I think it makes some sense to get rid of them because they are just making the semantic more unclear. Please note that GFP_REPEAT is documented as * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt * _might_ fail. This depends upon the particular VM implementation. while !costly requests have basically nofail semantic. So one could reasonably expect that order-0 request with __GFP_REPEAT will not loop for ever. This is not implemented right now though. I would like to move on with __GFP_REPEAT and define a better semantic for it. $ git grep __GFP_REPEAT origin/master | wc -l 111 $ git grep __GFP_REPEAT | wc -l 36 So we are down to the third after this patch series. The remaining places really seem to be relying on __GFP_REPEAT due to large allocation requests. This still needs some double checking which I will do later after all the simple ones are sorted out. I am touching a lot of arch specific code here and I hope I got it right but as a matter of fact I even didn't compile test for some archs as I do not have cross compiler for them. Patches should be quite trivial to review for stupid compile mistakes though. The tricky parts are usually hidden by macro definitions and thats where I would appreciate help from arch maintainers. [1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org This patch (of 19): __GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. Yet we have the full kernel tree with its usage for apparently order-0 allocations. This is really confusing because __GFP_REPEAT is explicitly documented to allow allocation failures which is a weaker semantic than the current order-0 has (basically nofail). Let's simply drop __GFP_REPEAT from those places. This would allow to identify place which really need allocator to retry harder and formulate a more specific semantic for what the flag is supposed to do actually. Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile] Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: John Crispin <blogic@openwrt.org> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20parisc: Fix wrong comment regarding first pmd entry flagsHelge Deller
The first pmd entry is marked with PxD_FLAG_ATTACHED instead of _PAGE_GATEWAY. Signed-off-by: Helge Deller <deller@gmx.de>
2015-07-19parisc: mm: Fix a memory leak related to pmd not attached to the pgdChristophe Jaillet
Commit 0e0da48dee8d ("parisc: mm: don't count preallocated pmds") introduced a memory leak. After this commit, the 'return' statement in pmd_free is executed in all cases. Even for pmd that are not attached to the pgd. So 'free_pages' can never be called anymore, leading to a memory leak. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Helge Deller <deller@gmx.de>
2015-04-21parisc: Replace PT_NLEVELS with CONFIG_PGTABLE_LEVELSGuenter Roeck
The following warning is seen when compiling parisc images ./arch/parisc/include/asm/pgalloc.h: In function 'pgd_alloc': ./arch/parisc/include/asm/pgalloc.h:29:5: warning: "PT_NLEVELS" is not defined Some definitions of PT_NLEVELS were missed with the conversion to CONFIG_PGTABLE_LEVELS. Fixes: f24ffde43237 ("parisc: expose number of page table levels on Kconfig level") Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Helge Deller <deller@gmx.de>
2015-04-14parisc: expose number of page table levels on Kconfig levelKirill A. Shutemov
We would want to use number of page table level to define mm_struct. Let's expose it as CONFIG_PGTABLE_LEVELS. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-23parisc: Fix pmd code to depend on PT_NLEVELS value, not on CONFIG_64BITHelge Deller
Make the code which sets up the pmd depend on PT_NLEVELS == 3, not on CONFIG_64BIT. The reason is, that a 64bit kernel with a page size greater than 4k doesn't need the pmd and thus has PT_NLEVELS = 2. Signed-off-by: Helge Deller <deller@gmx.de>
2015-03-23parisc: mm: don't count preallocated pmdsMikulas Patocka
The patch dc6c9a35b66b520cf67e05d8ca60ebecad3b0479 that counts pmds allocated for a process introduced a bug on 64-bit PA-RISC kernels. The PA-RISC architecture preallocates one pmd with each pgd. This preallocated pmd can never be freed - pmd_free does nothing when it is called with this pmd. When the kernel attempts to free this preallocated pmd, it decreases the count of allocated pmds. The result is that the counter underflows and this error is reported. This patch fixes the bug by artifically incrementing the counter in pmd_free when the kernel tries to free the preallocated pmd. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Helge Deller <deller@gmx.de>
2013-11-15parisc: handle pgtable_page_ctor() failKirill A. Shutemov
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-10parisc: move include/asm-parisc to arch/parisc/include/asmKyle McMartin