summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2023-08-18powerpc/book3s64/mm: enable transparent pud hugepageAneesh Kumar K.V
This is enabled only with radix translation and 1G hugepage size. This will be used with devdax device memory with a namespace alignment of 1G. Anon transparent hugepage is not supported even though we do have helpers checking pud_trans_huge(). We should never find that return true. The only expected pte bit combination is _PAGE_PTE | _PAGE_DEVMAP. Some of the helpers are never expected to get called on hash translation and hence is marked to call BUG() in such a case. Link: https://lkml.kernel.org/r/20230724190759.483013-10-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18powerpc/mm/trace: convert trace event to trace event classAneesh Kumar K.V
A follow-up patch will add a pud variant for this same event. Using event class makes that addition simpler. No functional change in this patch. Link: https://lkml.kernel.org/r/20230724190759.483013-9-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18mm: remove CONFIG_PER_VMA_LOCK ifdefsMatthew Wilcox (Oracle)
Patch series "Handle most file-backed faults under the VMA lock", v3. This patchset adds the ability to handle page faults on parts of files which are already in the page cache without taking the mmap lock. This patch (of 10): Provide lock_vma_under_rcu() when CONFIG_PER_VMA_LOCK is not defined to eliminate ifdefs in the users. Link: https://lkml.kernel.org/r/20230724185410.1124082-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230724185410.1124082-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18mmu_notifiers: rename invalidate_range notifierAlistair Popple
There are two main use cases for mmu notifiers. One is by KVM which uses mmu_notifier_invalidate_range_start()/end() to manage a software TLB. The other is to manage hardware TLBs which need to use the invalidate_range() callback because HW can establish new TLB entries at any time. Hence using start/end() can lead to memory corruption as these callbacks happen too soon/late during page unmap. mmu notifier users should therefore either use the start()/end() callbacks or the invalidate_range() callbacks. To make this usage clearer rename the invalidate_range() callback to arch_invalidate_secondary_tlbs() and update documention. Link: https://lkml.kernel.org/r/6f77248cd25545c8020a54b4e567e8b72be4dca1.1690292440.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Andrew Donnellan <ajd@linux.ibm.com> Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Cc: Frederic Barrat <fbarrat@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nicolin Chen <nicolinc@nvidia.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: SeongJae Park <sj@kernel.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Zhi Wang <zhi.wang.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18mmu_notifiers: call invalidate_range() when invalidating TLBsAlistair Popple
The invalidate_range() is going to become an architecture specific mmu notifier used to keep the TLB of secondary MMUs such as an IOMMU in sync with the CPU page tables. Currently it is called from separate code paths to the main CPU TLB invalidations. This can lead to a secondary TLB not getting invalidated when required and makes it hard to reason about when exactly the secondary TLB is invalidated. To fix this move the notifier call to the architecture specific TLB maintenance functions for architectures that have secondary MMUs requiring explicit software invalidations. This fixes a SMMU bug on ARM64. On ARM64 PTE permission upgrades require a TLB invalidation. This invalidation is done by the architecture specific ptep_set_access_flags() which calls flush_tlb_page() if required. However this doesn't call the notifier resulting in infinite faults being generated by devices using the SMMU if it has previously cached a read-only PTE in it's TLB. Moving the invalidations into the TLB invalidation functions ensures all invalidations happen at the same time as the CPU invalidation. The architecture specific flush_tlb_all() routines do not call the notifier as none of the IOMMUs require this. Link: https://lkml.kernel.org/r/0287ae32d91393a582897d6c4db6f7456b1001f2.1690292440.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Tested-by: SeongJae Park <sj@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Andrew Donnellan <ajd@linux.ibm.com> Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Cc: Frederic Barrat <fbarrat@linux.ibm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nicolin Chen <nicolinc@nvidia.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Zhi Wang <zhi.wang.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18powerpc: mm: convert to GENERIC_IOREMAPChristophe Leroy
By taking GENERIC_IOREMAP method, the generic generic_ioremap_prot(), generic_iounmap(), and their generic wrapper ioremap_prot(), ioremap() and iounmap() are all visible and available to arch. Arch needs to provide wrapper functions to override the generic versions if there's arch specific handling in its ioremap_prot(), ioremap() or iounmap(). This change will simplify implementation by removing duplicated code with generic_ioremap_prot() and generic_iounmap(), and has the equivalent functioality as before. Here, add wrapper functions ioremap_prot() and iounmap() for powerpc's special operation when ioremap() and iounmap(). Link: https://lkml.kernel.org/r/20230706154520.11257-18-bhe@redhat.com Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Rich Felker <dalias@libc.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18mm: move is_ioremap_addr() into new header fileBaoquan He
Now is_ioremap_addr() is only used in kernel/iomem.c and gonna be used in mm/ioremap.c. Move it into its own new header file linux/ioremap.h. Link: https://lkml.kernel.org/r/20230706154520.11257-17-bhe@redhat.com Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Rich Felker <dalias@libc.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18asm-generic/iomap.h: remove ARCH_HAS_IOREMAP_xx macrosBaoquan He
Patch series "mm: ioremap: Convert architectures to take GENERIC_IOREMAP way", v8. Motivation and implementation: ============================== Currently, many architecutres have't taken the standard GENERIC_IOREMAP way to implement ioremap_prot(), iounmap(), and ioremap_xx(), but make these functions specifically under each arch's folder. Those cause many duplicated code of ioremap() and iounmap(). In this patchset, firstly introduce generic_ioremap_prot() and generic_iounmap() to extract the generic code for GENERIC_IOREMAP. By taking GENERIC_IOREMAP method, the generic generic_ioremap_prot(), generic_iounmap(), and their generic wrapper ioremap_prot(), ioremap() and iounmap() are all visible and available to arch. Arch needs to provide wrapper functions to override the generic version if there's arch specific handling in its corresponding ioremap_prot(), ioremap() or iounmap(). With these changes, duplicated ioremap/iounmap() code uder ARCH-es are removed, and the equivalent functioality is kept as before. Background info: ================ 1) The converting more architectures to take GENERIC_IOREMAP way is suggested by Christoph in below discussion: https://lore.kernel.org/all/Yp7h0Jv6vpgt6xdZ@infradead.org/T/#u 2) In the previous v1 to v3, it's basically further action after arm64 has converted to GENERIC_IOREMAP way in below patchset. It's done by adding hook ioremap_allowed() and iounmap_allowed() in ARCH to add ARCH specific handling the middle of ioremap_prot() and iounmap(). [PATCH v5 0/6] arm64: Cleanup ioremap() and support ioremap_prot() https://lore.kernel.org/all/20220607125027.44946-1-wangkefeng.wang@huawei.com/T/#u Later, during v3 reviewing, Christophe Leroy suggested to introduce generic_ioremap_prot() and generic_iounmap() to generic codes, and ARCH can provide wrapper function ioremap_prot(), ioremap() or iounmap() if needed. Christophe made a RFC patchset as below to specially demonstrate his idea. This is what v4 and now v5 is doing. [RFC PATCH 0/8] mm: ioremap: Convert architectures to take GENERIC_IOREMAP way https://lore.kernel.org/all/cover.1665568707.git.christophe.leroy@csgroup.eu/T/#u Testing: ======== In v8, I only applied this patchset onto the latest linus's tree to build and run on arm64 and s390. This patch (of 19): Let's use '#define ioremap_xx' and "#ifdef ioremap_xx" instead. To remove defined ARCH_HAS_IOREMAP_xx macros in <asm/io.h> of each ARCH, the ARCH's own ioremap_wc|wt|np definition need be above "#include <asm-generic/iomap.h>. Otherwise the redefinition error would be seen during compiling. So the relevant adjustments are made to avoid compiling error: loongarch: - doesn't include <asm-generic/iomap.h>, defining ARCH_HAS_IOREMAP_WC is redundant, so simply remove it. m68k: - selected GENERIC_IOMAP, <asm-generic/iomap.h> has been added in <asm-generic/io.h>, and <asm/kmap.h> is included above <asm-generic/iomap.h>, so simply remove ARCH_HAS_IOREMAP_WT defining. mips: - move "#include <asm-generic/iomap.h>" below ioremap_wc definition in <asm/io.h> powerpc: - remove "#include <asm-generic/iomap.h>" in <asm/io.h> because it's duplicated with the one in <asm-generic/io.h>, let's rely on the latter. x86: - selected GENERIC_IOMAP, remove #include <asm-generic/iomap.h> in the middle of <asm/io.h>. Let's rely on <asm-generic/io.h>. Link: https://lkml.kernel.org/r/20230706154520.11257-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Helge Deller <deller@gmx.de> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Rich Felker <dalias@libc.org> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18powerpc: add pte_free_defer() for pgtables sharing pageHugh Dickins
Add powerpc-specific pte_free_defer(), to free table page via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t. This is awkward because the struct page contains only one rcu_head, but that page may be shared between PTE_FRAG_NR pagetables, each wanting to use the rcu_head at the same time. But powerpc never reuses a fragment once it has been freed: so mark the page Active in pte_free_defer(), before calling pte_fragment_free() directly; and there call_rcu() to pte_free_now() when last fragment is freed and the page is PageActive. Link: https://lkml.kernel.org/r/6e3ca5f1-334d-4b14-b92d-fc8e99914fcb@google.com Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@infradead.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: SeongJae Park <sj@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zack Rusin <zackr@vmware.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18powerpc: assert_pte_locked() use pte_offset_map_nolock()Hugh Dickins
Instead of pte_lockptr(), use the recently added pte_offset_map_nolock() in assert_pte_locked(). BUG if pte_offset_map_nolock() fails. This mod might cause new crashes: which either expose my ignorance, or indicate issues to be fixed, or limit the usage of assert_pte_locked(). [hughd@google.com: assert_pte_locked() still needs the pmd_none() check] Link: https://lkml.kernel.org/r/c73d1543-532c-3da2-8cf2-a95363a14116@google.com Link: https://lkml.kernel.org/r/e8d56c95-c132-a82e-5f5f-7bb1b738b057@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@infradead.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: SeongJae Park <sj@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zack Rusin <zackr@vmware.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18mm: remove arguments of show_mem()Kefeng Wang
All callers of show_mem() pass 0 and NULL, so we can remove the two arguments by directly calling __show_mem(0, NULL, MAX_NR_ZONES - 1) in show_mem(). Link: https://lkml.kernel.org/r/20230630062253.189440-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18powerpc/perf: Convert fsl_emb notifier to state machine callbacksChristophe Leroy
CC arch/powerpc/perf/core-fsl-emb.o arch/powerpc/perf/core-fsl-emb.c:675:6: error: no previous prototype for 'hw_perf_event_setup' [-Werror=missing-prototypes] 675 | void hw_perf_event_setup(int cpu) | ^~~~~~~~~~~~~~~~~~~ Looks like fsl_emb was completely missed by commit 3f6da3905398 ("perf: Rework and fix the arch CPU-hotplug hooks") So, apply same changes as commit 3f6da3905398 ("perf: Rework and fix the arch CPU-hotplug hooks") then commit 57ecde42cc74 ("powerpc/perf: Convert book3s notifier to state machine callbacks") While at it, also fix following error: arch/powerpc/perf/core-fsl-emb.c: In function 'perf_event_interrupt': arch/powerpc/perf/core-fsl-emb.c:648:13: error: variable 'found' set but not used [-Werror=unused-but-set-variable] 648 | int found = 0; | ^~~~~ Fixes: 3f6da3905398 ("perf: Rework and fix the arch CPU-hotplug hooks") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/603e1facb32608f88f40b7d7b9094adc50e7b2dc.1692349125.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/rtas: export rtas_error_rc() for reuse.Mahesh Salgaonkar
Also, #define descriptive names for common rtas return codes and use it instead of numeric values. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/169235811556.193557.1023625262204809514.stgit@jupiter
2023-08-18powerpc/idle: Add support for nohltVaibhav Jain
This patch enables config option GENERIC_IDLE_POLL_SETUP for arch powerpc. This adds support for kernel param 'nohlt'. Powerpc kernel also supports another kernel boot-time param called 'powersave' which can also be used to disable all cpu idle-states and forces CPU to an idle-loop similar to what cpu_idle_poll() does. This patch however makes powerpc kernel-parameters better aligned to the generic boot-time parameters. Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230818050739.827851-1-vaibhav@linux.ibm.com
2023-08-18powerpc/fadump: invoke ibm,os-term with rtas_call_unlocked()Hari Bathini
Invoke ibm,os-term call with rtas_call_unlocked(), without using the RTAS spinlock, to avoid deadlock in the unlikely event of a machine crash while making an RTAS call. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230609071404.425529-1-hbathini@linux.ibm.com
2023-08-18powerpc: Move DMA64_PROPNAME define to a headerMichal Suchanek
Avoid redefining the same value in multiple source. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230817162411.429-1-msuchanek@suse.de
2023-08-18Revert "powerpc/xmon: Relax frame size for clang"Nick Desaulniers
This reverts commit 9c87156cce5a63735d1218f0096a65c50a7a32aa. I have not been able to reproduce the reported -Wframe-larger-than= warning (or disassembly) with clang-11 or clang-18. I don't know precisely when this was fixed in llvm, but it may be time to revert this. Closes: https://github.com/ClangBuiltLinux/linux/issues/252 Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230817-ppc_xmon-v1-1-8cc2d51b9995@google.com
2023-08-18powerpc/mm: Cleanup memory block size probingAneesh Kumar K.V
Parse the device tree in early init to find the memory block size to be used by the kernel. Consolidate the memory block size device tree parsing to one helper and use that on both powernv and pseries. We still want to use machine-specific callback because on all machine types other than powernv and pseries we continue to return MIN_MEMORY_BLOCK_SIZE. pseries_memory_block_size used to look for the second memory block (memory@x) to determine the memory_block_size value. This patch changed that to look at all memory blocks and make sure we can map them all correctly using the computed memory block size value. Add workaround to force 256MB memory block size if device driver managed memory such as GPU memory is present. This helps to add GPU memory that is not aligned to 1G. Co-developed-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801044447.11275-1-aneesh.kumar@linux.ibm.com
2023-08-18powerpc/configs: Drop CONFIG_IP_NF_TARGET_CLUSTERIPTrevor Woerner
Drop CONFIG_IP_NF_TARGET_CLUSTERIP as it was removed in commit 9db5d918e2c0 ("netfilter: ip_tables: remove clusterip target"). Signed-off-by: Trevor Woerner <twoerner@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230817115017.35663-5-twoerner@gmail.com
2023-08-18powerpc/fadump: reset dump area size if fadump memory reserve failsSourabh Jain
In case fadump_reserve_mem() fails to reserve memory, the reserve_dump_area_size variable will retain the reserve area size. This will lead to /sys/kernel/fadump/mem_reserved node displaying an incorrect memory reserved by fadump. To fix this problem, reserve dump area size variable is set to 0 if fadump failed to reserve memory. Fixes: 8255da95e545 ("powerpc/fadump: release all the memory above boot memory size") Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230704050715.203581-1-sourabhjain@linux.ibm.com
2023-08-18powerpc/4xx: Add missing includes to fix no previous prototype errorsChristophe Leroy
A W=1 build of ppc40x_defconfig throws the followings errors: CC arch/powerpc/platforms/4xx/uic.o arch/powerpc/platforms/4xx/uic.c:274:13: warning: no previous prototype for 'uic_init_tree' [-Wmissing-prototypes] 274 | void __init uic_init_tree(void) | ^~~~~~~~~~~~~ arch/powerpc/platforms/4xx/uic.c:319:14: warning: no previous prototype for 'uic_get_irq' [-Wmissing-prototypes] 319 | unsigned int uic_get_irq(void) | ^~~~~~~~~~~ CC arch/powerpc/platforms/4xx/machine_check.o CC arch/powerpc/platforms/4xx/soc.o arch/powerpc/platforms/4xx/soc.c:193:6: warning: no previous prototype for 'ppc4xx_reset_system' [-Wmissing-prototypes] 193 | void ppc4xx_reset_system(char *cmd) | ^~~~~~~~~~~~~~~~~~~ Add missing includes to get the missing prototypes. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/c8253017e355638132737ff47936e290df8738d1.1692282432.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/47x: Add prototype for mmu_init_secondary()Christophe Leroy
A W=1 build of 44x/iss476-smp_defconfig gives: arch/powerpc/mm/nohash/44x.c:220:13: error: no previous prototype for 'mmu_init_secondary' [-Werror=missing-prototypes] 220 | void __init mmu_init_secondary(int cpu) | ^~~~~~~~~~~~~~~~~~ That function is called from head_4xx.S Add a prototype in mmu_decl.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/e89d9927c926044e54fd056a849785f526c6414f.1692282340.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/47x: Remove early_init_mmu_47x() to fix no previous prototypeChristophe Leroy
4xx/iss476-smp_defconfig leads to: CC arch/powerpc/mm/nohash/tlb.o arch/powerpc/mm/nohash/tlb.c:322:13: error: no previous prototype for 'early_init_mmu_47x' [-Werror=missing-prototypes] 322 | void __init early_init_mmu_47x(void) | ^~~~~~~~~~~~~~~~~~ early_init_mmu_47x() is used only at one place and only locally. Fold it into its only caller and remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/0a667b7c2e05d3cf41ecd38f33cc334083a61c8d.1692282396.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/4xx: Remove pika_dtm_[un]register_shutdown() to fix no previous ↵Christophe Leroy
prototype ppc4xx_defconfig with W=1 results in: CC arch/powerpc/platforms/44x/warp.o arch/powerpc/platforms/44x/warp.c:369:5: error: no previous prototype for 'pika_dtm_register_shutdown' [-Werror=missing-prototypes] 369 | int pika_dtm_register_shutdown(void (*func)(void *arg), void *arg) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/platforms/44x/warp.c:374:5: error: no previous prototype for 'pika_dtm_unregister_shutdown' [-Werror=missing-prototypes] 374 | int pika_dtm_unregister_shutdown(void (*func)(void *arg), void *arg) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ The functions were added by commit 4ebef31fa6e0 ("[POWERPC] PIKA Warp: Update platform code to support Rev B boards") Those functions are not used localy and allthough their symbols are exported they are not declared in any header file so they can't be used. Remove them, then remove the associated list as it will now remain empty hence becomes useless. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/830923f0e0375a14609204246d302c7476a8f948.1692279855.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/4xx: Remove WatchdogHandler() to fix no previous prototype errorChristophe Leroy
Building ppc40x_defconfig throws the following error: CC arch/powerpc/kernel/traps.o arch/powerpc/kernel/traps.c:2232:29: warning: no previous prototype for 'WatchdogHandler' [-Wmissing-prototypes] 2232 | void __attribute__ ((weak)) WatchdogHandler(struct pt_regs *regs) | ^~~~~~~~~~~~~~~ This function was imported by commit 14cf11af6cf6 ("powerpc: Merge enough to start building in arch/powerpc.") as a weak function but never defined and/or called outside traps.c As it has only one caller fold it inside its caller and remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/38fe1078eb403eef74dc8f29387636fd7ecdf43c.1692276041.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/32s: Cleanup the mess in __set_pte_at()Christophe Leroy
__set_pte_at() handles 3 main cases with #ifdefs plus the 'percpu' subcase which leads to code duplication. Rewrite the function using IS_ENABLED() to minimise the total number of cases and remove duplicated code. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/2322dd08217bccab25456fe8b189edf0e6a8b6dd.1692121353.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/8xx: Remove init_internal_rtc() to fix no previous prototype errorChristophe Leroy
A W=1 build of mpc885_ads_defconfig throws the following error: CC arch/powerpc/platforms/8xx/m8xx_setup.o arch/powerpc/platforms/8xx/m8xx_setup.c:41:1: error: no previous prototype for 'init_internal_rtc' [-Werror=missing-prototypes] 41 | init_internal_rtc(void) | ^~~~~~~~~~~~~~~~~ init_internal_rtc() was introduced by commit df34403dcaac ("[POWERPC] 8xx: Add mpc885ads support and common mpc8xx files") as a weak function but has never been defined and/or used outside m8xx_setup.c As it is called only once there, just fold it into its caller and remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/0aa1141e18a84d926e199093204b37ec993f0c87.1692275185.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/82xx: Remove CONFIG_8260 and CONFIG_8272Christophe Leroy
CONFIG_8272 is never used, remove it. CONFIG_8260 is redundant with CONFIG_PPC_82xx, remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/80930252a5167f3cdaa7eb694074d75521a0bdf9.1692259495.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/82xx: Remove pq2_init_pciChristophe Leroy
Commit 859b21a008eb ("powerpc: drop PowerQUICC II Family ADS platform support") removed last user of pq2_init_pci. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/8b2db7c3c2c346aa8aa49507415c360d441e5bf5.1692259498.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/83xx: Split usb.cChristophe Leroy
usb.c contains three independent parts with no common part. Split it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Drop usb.o from Makefile to fix build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/75712b54bf9cb85ab10e47cd2772cd2a098ca895.1692199324.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/83xx: Fix style problems in usb.c and remove unneccessary includes ↵Christophe Leroy
from mpc83xx.h Replace printk(KERN_WARN with pr_warn( Remove a couple of blank lines Re-align multi-line code. Replace asm/io.h by linux/io.h mpc83xx.h doesn't need linux/device.h or asm/pci-bridge.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/2cb498f637e082a4af8032311fad3cae84d6aa5d.1692199324.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/fsl_pci: Make fsl_add_bridge() staticChristophe Leroy
Since commit 905e75c46dba ("powerpc/fsl-pci: Unify pci/pcie initialization code") fsl_add_bridge() is not used anymore outside of fsl_pci.c Make it static. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/2115e3597d81e72a865820af54f0e290d0fd2b3a.1692199186.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/512x: Make mpc512x_select_reset_compat() staticChristophe Leroy
mpc512x_select_reset_compat() is only used in the file it is defined. Make it static. Move mpc512x_restart_init() after mpc512x_select_reset_compat(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/36a19e13025dbf17e92e832dd24150642b0e9bad.1692341499.git.christophe.leroy@csgroup.eu
2023-08-18powerpc/ps3: refactor strncpy usageJustin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1]. `make_first_field()` should use similar implementation to `make_field()` due to memcpy having more obvious behavior here. The end result yields the same behavior as the previous `strncpy`-based implementation including the NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230816-strncpy-arch-powerpc-platforms-ps3-repository-v1-1-88283b02fb09@google.com
2023-08-17powerpc/rtas_flash: allow user copy to flash block cache objectsNathan Lynch
With hardened usercopy enabled (CONFIG_HARDENED_USERCOPY=y), using the /proc/powerpc/rtas/firmware_update interface to prepare a system firmware update yields a BUG(): kernel BUG at mm/usercopy.c:102! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 2232 Comm: dd Not tainted 6.5.0-rc3+ #2 Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 of:IBM,FW860.50 (SV860_146) hv:phyp pSeries NIP: c0000000005991d0 LR: c0000000005991cc CTR: 0000000000000000 REGS: c0000000148c76a0 TRAP: 0700 Not tainted (6.5.0-rc3+) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 24002242 XER: 0000000c CFAR: c0000000001fbd34 IRQMASK: 0 [ ... GPRs omitted ... ] NIP usercopy_abort+0xa0/0xb0 LR usercopy_abort+0x9c/0xb0 Call Trace: usercopy_abort+0x9c/0xb0 (unreliable) __check_heap_object+0x1b4/0x1d0 __check_object_size+0x2d0/0x380 rtas_flash_write+0xe4/0x250 proc_reg_write+0xfc/0x160 vfs_write+0xfc/0x4e0 ksys_write+0x90/0x160 system_call_exception+0x178/0x320 system_call_common+0x160/0x2c4 The blocks of the firmware image are copied directly from user memory to objects allocated from flash_block_cache, so flash_block_cache must be created using kmem_cache_create_usercopy() to mark it safe for user access. Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> [mpe: Trim and indent oops] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230810-rtas-flash-vs-hardened-usercopy-v2-1-dcf63793a938@linux.ibm.com
2023-08-16powerpc/ptrace: Split gpr32_set_commonChristophe Leroy
objtool reports the following warning: arch/powerpc/kernel/ptrace/ptrace-view.o: warning: objtool: gpr32_set_common+0x23c (.text+0x860): redundant UACCESS disable gpr32_set_common() conditionally opens and closes UACCESS based on whether kbuf pointer is NULL or not. This is wackelig. Split gpr32_set_common() in two fonctions, one for user one for kernel. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Fix oops in gpr32_set_common_user() due to NULL kbuf] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/b8d6ae4483fcfd17524e79d803c969694a85cc02.1687428075.git.christophe.leroy@csgroup.eu
2023-08-16powerpc/watchpoints: Remove ptrace/perf exclusion trackingBenjamin Gray
ptrace and perf watchpoints were considered incompatible in commit 29da4f91c0c1 ("powerpc/watchpoint: Don't allow concurrent perf and ptrace events"), but the logic in that commit doesn't really apply. Ptrace doesn't automatically single step; the ptracer must request this explicitly. And the ptracer can do so regardless of whether a ptrace/perf watchpoint triggered or not: it could single step every instruction if it wanted to. Whatever stopped the ptracee before executing the instruction that would trigger the perf watchpoint is no longer relevant by this point. To get correct behaviour when perf and ptrace are watching the same data we must ignore the perf watchpoint. After all, ptrace has before-execute semantics, and perf is after-execute, so perf doesn't actually care about the watchpoint trigger at this point in time. Pausing before execution does not mean we will actually end up executing the instruction. Importantly though, we don't remove the perf watchpoint yet. This is key. The ptracer is free to do whatever it likes right now. E.g., it can continue the process, single step. or even set the child PC somewhere completely different. If it does try to execute the instruction though, without reinserting the watchpoint (in which case we go back to the start of this example), the perf watchpoint would immediately trigger. This time there is no ptrace watchpoint, so we can safely perform a single step and increment the perf counter. Upon receiving the single step exception, the existing code already handles propagating or consuming it based on whether another subsystem (e.g. ptrace) requested a single step. Again, this is needed with or without perf/ptrace exclusion, because ptrace could be single stepping this instruction regardless of if a watchpoint is involved. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801011744.153973-6-bgray@linux.ibm.com
2023-08-16powerpc/watchpoints: Simplify watchpoint reinsertionBenjamin Gray
We only remove watchpoints when they have the perf_single_step flag set, so we can reinsert them during the first iteration. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801011744.153973-5-bgray@linux.ibm.com
2023-08-16powerpc/watchpoints: Track perf single step directly on the breakpointBenjamin Gray
There is a bug in the current watchpoint tracking logic, where the teardown in arch_unregister_hw_breakpoint() uses bp->ctx->task, which it does not have a reference of and parallel threads may be in the process of destroying. This was partially addressed in commit fb822e6076d9 ("powerpc/hw_breakpoint: Fix oops when destroying hw_breakpoint event"), but the underlying issue of accessing a struct member in an unknown state still remained. Syzkaller managed to trigger a null pointer derefernce due to the race between the task destructor and checking the pointer and dereferencing it in the loop. While this null pointer dereference could be fixed by using READ_ONCE to access the task up front, that just changes the error to manipulating possbily freed memory. Instead, the breakpoint logic needs to be reworked to remove any dependency on a context or task struct during breakpoint removal. The reason we have this currently is to clear thread.last_hit_ubp. This member is used to differentiate the perf DAWR single-step sequence from other causes of single-step, such as userspace just calling ptrace(PTRACE_SINGLESTEP, ...). We need to differentiate them because, when the single step interrupt is received, we need to know whether to re-insert the DAWR breakpoint (perf) or not (ptrace / other). arch_unregister_hw_breakpoint() needs to clear this information to prevent dangling pointers to possibly freed memory. These pointers are dereferenced in single_step_dabr_instruction() without a way to check their validity. This patch moves the tracking of this information to the breakpoint itself. This means we no longer have to do anything special to clean up. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801011744.153973-4-bgray@linux.ibm.com
2023-08-16powerpc/watchpoints: Don't track info persistentlyBenjamin Gray
info is cheap to retrieve, and is likely optimised by the compiler anyway. On the other hand, propagating it across the functions makes it possible to be inconsistent and adds needless complexity. Remove it, and invoke counter_arch_bp() when we need to work with it. As we don't persist it, we just use the local bp array to track whether we are ignoring a breakpoint. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801011744.153973-3-bgray@linux.ibm.com
2023-08-16powerpc/watchpoints: Explain thread_change_pc() moreBenjamin Gray
The behaviour of the thread_change_pc() function is a bit cryptic without being more familiar with how the watchpoint logic handles perf's after-execute semantics. Expand the comment to explain why we can re-insert the breakpoint and unset the perf_single_step flag. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801011744.153973-2-bgray@linux.ibm.com
2023-08-16powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity ↵Kajol Jain
domain via partition information The hcall H_GET_PERF_COUNTER_INFO with counter request value as AFFINITY_DOMAIN_INFORMATION_BY_PARTITION(0XB1), can be used to get the system affinity domain via partition information. To expose the system affinity domain via partition information, patch adds sysfs file called "affinity_domain_via_partition" to the "/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver. Add new entry for AFFINITY_DOMAIN_VIA_PAR in sysinfo_counter_request array, which points to the counter request value "affinity_domain_via_partition" in hv-gpci.c file. Also add a new function called "affinity_domain_via_partition_result_parse" to parse the hcall result and store it in output buffer. The affinity_domain_via_partition sysfs file is only available for power10 and above platforms. Add a macro called INTERFACE_AFFINITY_DOMAIN_VIA_PAR_ATTR, which points to the index of NULL placeholder, for affinity_domain_via_partition attribute in interface_attrs array. Also updated the value of INTERFACE_NULL_ATTR macro in hv-gpci.c file. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230729073455.7918-10-kjain@linux.ibm.com
2023-08-16powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity ↵Kajol Jain
domain via domain information The hcall H_GET_PERF_COUNTER_INFO with counter request value as AFFINITY_DOMAIN_INFORMATION_BY_DOMAIN(0XB0), can be used to get the system affinity domain via domain information. To expose the system affinity domain via domain information, patch adds sysfs file called "affinity_domain_via_domain" to the "/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver. Add new entry for AFFINITY_DOMAIN_VIA_DOM in sysinfo_counter_request array, which points to the counter request value "affinity_domain_via_domain" in hv-gpci.c file. The affinity_domain_via_domain sysfs file is only available for power10 and above platforms. Add a macro called INTERFACE_AFFINITY_DOMAIN_VIA_DOM_ATTR, which points to the index of NULL placeholder, for affinity_domain_via_domain attribute in interface_attrs array. Also updated the value of INTERFACE_NULL_ATTR macro in hv-gpci.c file. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230729073455.7918-8-kjain@linux.ibm.com
2023-08-16powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity ↵Kajol Jain
domain via virtual processor information The hcall H_GET_PERF_COUNTER_INFO with counter request value as AFFINITY_DOMAIN_INFORMATION_BY_VIRTUAL_PROCESSOR(0XA0), can be used to get the system affinity domain via virtual processor information. To expose the system affinity domain via virtual processor information, patch adds sysfs file called "affinity_domain_via_virtual_processor" to the "/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver. The affinity_domain_via_virtual_processor sysfs file is only available for power10 and above platforms. Add a macro called INTERFACE_AFFINITY_DOMAIN_VIA_VP_ATTR, which points to the index of NULL placeholder, for affinity_domain_via_virtual_processor attribute in interface_attrs array. Also updated the value of INTERFACE_NULL_ATTR macro in hv-gpci.c file. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230729073455.7918-6-kjain@linux.ibm.com
2023-08-16powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor ↵Kajol Jain
config information The hcall H_GET_PERF_COUNTER_INFO with counter request value as PROCESSOR_CONFIG(0X90), can be used to get the system processor configuration information. To expose the system processor config information, patch adds sysfs file called "processor_config" to the "/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver. Add enum and sysinfo_counter_request array to get required counter request value in hv-gpci.c file. Also add a new function called "sysinfo_device_attr_create", which will create and return required device attribute to the add_sysinfo_interface_files function. The processor_config sysfs file is only available for power10 and above platforms. Add a new macro called INTERFACE_PROCESSOR_CONFIG_ATTR, which points to the index of NULL placefolder, for processor_config attribute in the interface_attrs array. Also add macro INTERFACE_NULL_ATTR which points to index of NULL attribute in interface_attrs array. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230729073455.7918-4-kjain@linux.ibm.com
2023-08-16powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor bus ↵Kajol Jain
topology information The hcall H_GET_PERF_COUNTER_INFO with counter request value as PROCESSOR_BUS_TOPOLOGY(0XD0), can be used to get the system topology information. To expose the system topology information, patch adds sysfs file called "processor_bus_topology" to the "/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver. Add macro for PROCESSOR_BUS_TOPOLOGY counter request value in hv-gpci.c file. Also add a new function called "systeminfo_gpci_request", to make the H_GET_PERF_COUNTER_INFO hcall with added macro and populates the output buffer. The processor_bus_topology sysfs file is only available for power10 and above platforms. Add a new function called "add_sysinfo_interface_files", which will add processor_bus_topology attribute in the interface_attrs array, only for power10 and above platforms. Also add macro INTERFACE_PROCESSOR_BUS_TOPOLOGY_ATTR in hv-gpci.c file, which points to the index of NULL placefolder, for processor_bus_topology attribute. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230729073455.7918-2-kjain@linux.ibm.com
2023-08-16powerpc: Make virt_to_pfn() a static inlineLinus Walleij
Making virt_to_pfn() a static inline taking a strongly typed (const void *) makes the contract of a passing a pointer of that type to the function explicit and exposes any misuse of the macro virt_to_pfn() acting polymorphic and accepting many types such as (void *), (unitptr_t) or (unsigned long) as arguments without warnings. Move the virt_to_pfn() and related functions below the declaration of __pa() so it compiles. For symmetry do the same with pfn_to_kaddr(). As the file is included right into the linker file, we need to surround the functions with ifndef __ASSEMBLY__ so we don't cause compilation errors. The conversion moreover exposes the fact that pmd_page_vaddr() was returning an unsigned long rather than a const void * as could be expected, so all the sites defining pmd_page_vaddr() had to be augmented as well. Finally the KVM code in book3s_64_mmu_hv.c was passing an unsigned int to virt_to_phys() so fix that up with a cast so the result compiles. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> [mpe: Fixup kfence.h, simplify pfn_to_kaddr() & pmd_page_vaddr()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230809-virt-to-phys-powerpc-v1-1-12e912a7d439@linaro.org
2023-08-16powerpc/powernv/pci: use pci_dev_id() to simplify the codeXiongfeng Wang
PCI core API pci_dev_id() can be used to get the BDF number for a pci device. We don't need to compose it mannually. Use pci_dev_id() to simplify the code a little bit. Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230804080435.191196-1-wangxiongfeng2@huawei.com
2023-08-16powerpc/xics: Remove unnecessary endian conversionGautam Menghani
Remove an unnecessary piece of code that does an endianness conversion but does not use the result. The following warning was reported by Clang's static analyzer: arch/powerpc/sysdev/xics/ics-opal.c:114:2: warning: Value stored to 'server' is never read [deadcode.DeadStores] server = be16_to_cpu(oserver); 'server' was used as a parameter to opal_get_xive() in commit 5c7c1e9444d8 ("powerpc/powernv: Add OPAL ICS backend") when it was introduced. 'server' was also used in an error message for the call to opal_get_xive(). 'server' was always later set by a call to ics_opal_mangle_server() before being used. Commit bf8e0f891a32 ("powerpc/powernv: Fix endian issues in OPAL ICS backend") used a new variable 'oserver' as the parameter to opal_get_xive() instead of 'server' for endian correctness. It also removed 'server' from the error message for the call to opal_get_xive(). Fix the warning by removing the server variable assignment. Fixes: bf8e0f891a32 ("powerpc/powernv: Fix endian issues in OPAL ICS backend") Reviewed-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230731115543.36991-1-gautam@linux.ibm.com
2023-08-16powerpc/pseries: fix possible memory leak in ibmebus_bus_init()ruanjinjie
If device_register() returns error in ibmebus_bus_init(), name of kobject which is allocated in dev_set_name() called in device_add() is leaked. As comment of device_add() says, it should call put_device() to drop the reference count that was set in device_initialize() when it fails, so the name can be freed in kobject_cleanup(). Signed-off-by: ruanjinjie <ruanjinjie@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20221110011929.3709774-1-ruanjinjie@huawei.com