path: root/arch/powerpc/mm
Age  Commit message  Author
2023-06-28  Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  Linus Torvalds
Pull mm updates from Andrew Morton:

- Yosry Ahmed brought back some cgroup v1 stats in OOM logs
- Yosry has also eliminated cgroup's atomic rstat flushing
- Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability
- Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning
- Lorenzo Stoakes has done some maintenance work on the get_user_pages() interface
- Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree
- Johannes Weiner has done some cleanup work on the compaction code
- David Hildenbrand has contributed additional selftests for get_user_pages()
- Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code
- Baolin Wang has provided some compaction cleanups
- SeongJae Park continues maintenance work on the DAMON code
- Huang Ying has done some maintenance on the swap code's usage of device refcounting
- Christoph Hellwig has some cleanups for the filemap/directio code
- Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses
- Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings
- John Hubbard has a series of fixes to the MM selftesting code
- ZhangPeng continues the folio conversion campaign
- Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock
- Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8
- Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management
- Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code
- Vishal Moola also has done some folio conversion work
- Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch

* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
  mm/hugetlb: remove hugetlb_set_page_subpool()
  mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
  hugetlb: revert use of page_cache_next_miss()
  Revert "page cache: fix page_cache_next/prev_miss off by one"
  mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
  mm: memcg: rename and document global_reclaim()
  mm: kill [add|del]_page_to_lru_list()
  mm: compaction: convert to use a folio in isolate_migratepages_block()
  mm: zswap: fix double invalidate with exclusive loads
  mm: remove unnecessary pagevec includes
  mm: remove references to pagevec
  mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
  mm: remove struct pagevec
  net: convert sunrpc from pagevec to folio_batch
  i915: convert i915_gpu_error to use a folio_batch
  pagevec: rename fbatch_count()
  mm: remove check_move_unevictable_pages()
  drm: convert drm_gem_put_pages() to use a folio_batch
  i915: convert shmem_sg_free_table() to use a folio_batch
  scatterlist: add sg_set_folio()
  ...
2023-06-24powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()Linus Torvalds
This is one of the simple cases, except there's no pt_regs pointer. Which is fine, as lock_mm_and_find_vma() is set up to work fine with a NULL pt_regs. Powerpc already enabled LOCK_MM_AND_FIND_VMA for the main CPU faulting, so we can just use the helper without any extra work. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
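A minimal sketch of what such a call site looks like, assuming an mm and effective address from the coprocessor fault context (the variable names here are illustrative, not the actual handler code):

    /* Sketch: take the mmap lock and find the VMA; NULL pt_regs is
     * accepted for callers with no register state, like this path. */
    struct vm_area_struct *vma;

    vma = lock_mm_and_find_vma(mm, ea, NULL);
    if (!vma)
        return -EFAULT;     /* helper already dropped the lock on failure */
    /* ... check vma->vm_flags, call handle_mm_fault(), ... */
    mmap_read_unlock(mm);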
2023-06-24powerpc/mm: Convert to using lock_mm_and_find_vma()Michael Ellerman
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-21  powerpc/mm/dax: Fix the condition when checking if altmap vmemap can cross-boundary  Aneesh Kumar K.V
Without this fix, the last subsection vmemmap can end up in memory even if the namespace is created with -M mem and has sufficient space in the altmap area. Fixes: cf387d9644d8 ("libnvdimm/altmap: Track namespace boundaries in altmap") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616110826.344417-6-aneesh.kumar@linux.ibm.com
2023-06-21powerpc/book3s64/mm: Use PAGE_KERNEL instead of opencodingAneesh Kumar K.V
No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616110826.344417-5-aneesh.kumar@linux.ibm.com
2023-06-21powerpc/book3s64/mm: Fix DirectMap stats in /proc/meminfoAneesh Kumar K.V
On memory unplug reduce DirectMap page count correctly.

  root@ubuntu-guest:# grep Direct /proc/meminfo
  DirectMap4k:            0 kB
  DirectMap64k:           0 kB
  DirectMap2M:    115343360 kB
  DirectMap1G:            0 kB

Before fix:
  root@ubuntu-guest:# ndctl disable-namespace all
  disabled 1 namespace
  root@ubuntu-guest:# grep Direct /proc/meminfo
  DirectMap4k:            0 kB
  DirectMap64k:           0 kB
  DirectMap2M:    115343360 kB
  DirectMap1G:            0 kB

After fix:
  root@ubuntu-guest:# ndctl disable-namespace all
  disabled 1 namespace
  root@ubuntu-guest:# grep Direct /proc/meminfo
  DirectMap4k:            0 kB
  DirectMap64k:           0 kB
  DirectMap2M:    104857600 kB
  DirectMap1G:            0 kB

Fixes: a2dc009afa9a ("powerpc/mm/book3s/radix: Add mapping statistics") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616110826.344417-4-aneesh.kumar@linux.ibm.com
2023-06-20powerpc/mm/book3s64: Use pmdp_ptep helper instead of typecasting.Aneesh Kumar K.V
No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616110826.344417-2-aneesh.kumar@linux.ibm.com
2023-06-19powerpc/hugetlb: pte_alloc_huge()Hugh Dickins
pte_alloc_map() expects to be followed by pte_unmap(), but hugetlb omits that: to keep balance in future, use the recently added pte_alloc_huge() instead. huge_pte_offset() is using __find_linux_pte(), which is using pte_offset_kernel() - don't rename that to _huge, it's more complicated. Link: https://lkml.kernel.org/r/36b4e5d-954b-8569-4fe2-bd1797362441@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: John David Anglin <dave.anglin@bell.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19powerpc: allow pte_offset_map[_lock]() to failHugh Dickins
In rare transient cases, not yet made possible, pte_offset_map() and pte_offset_map_lock() may not find a page table: handle appropriately. Balance successful pte_offset_map() with pte_unmap() where omitted. Link: https://lkml.kernel.org/r/54c8b578-ca9-a0f-bfd2-d72976f8d73a@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: John David Anglin <dave.anglin@bell.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
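A hedged sketch of the pattern this describes: a caller must now tolerate a NULL return from pte_offset_map() and must balance a successful map with pte_unmap() (the surrounding code is illustrative only):

    pte_t *ptep, pte;

    ptep = pte_offset_map(pmdp, addr);
    if (!ptep)
        return;             /* page table not there: treat as no PTE */
    pte = *ptep;
    /* ... inspect or act on pte ... */
    pte_unmap(ptep);        /* balance the successful map */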
2023-06-14powerpc/32s: Fix LLVM SMP buildNicholas Piggin
LLVM assembler does not recognise 3-operand cmpi, use cmpwi. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606131828.315427-1-npiggin@gmail.com
2023-06-09mm/gup: remove vmas parameter from pin_user_pages()Lorenzo Stoakes
We are now in a position where no caller of pin_user_pages() requires the vmas parameter at all, so eliminate this parameter from the function and all callers. This clears the way to removing the vmas parameter from GUP altogether. Link: https://lkml.kernel.org/r/195a99ae949c9f5cb589d2222b736ced96ec199a.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> [qib] Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> [drivers/media] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian König <christian.koenig@amd.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
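After the change, a caller sketch looks like the following (the buffer size and gup_flags are illustrative; user_addr is a placeholder):

    struct page *pages[16];
    unsigned long start = user_addr;    /* placeholder user address */
    long pinned;

    /* The vmas argument is gone: only start, count, flags and pages. */
    pinned = pin_user_pages(start, ARRAY_SIZE(pages),
                            FOLL_WRITE | FOLL_LONGTERM, pages);
    if (pinned > 0)
        unpin_user_pages(pages, pinned);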
2023-06-09powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs enabledNicholas Piggin
Switching mm and tinkering with current->active_mm should be done with irqs disabled. There is a path where exit_lazy_flush_tlb can be called with irqs enabled:

  exit_lazy_flush_tlb
  flush_type_needed
  __flush_all_mm
  tlb_finish_mmu
  exit_mmap

Which results in the switching being done with irqs enabled, which is incorrect. Fixes: a665eec0a22e ("powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm") Cc: stable@vger.kernel.org # v5.10+ Reported-by: Sachin Sant <sachinp@linux.ibm.com> Link: https://lore.kernel.org/linuxppc-dev/A9A5D83D-BA70-47A4-BCB4-30C1AE19BC22@linux.ibm.com/ Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230607005601.583293-1-npiggin@gmail.com
2023-05-11powerpc/64s/radix: Fix soft dirty trackingMichael Ellerman
It was reported that soft dirty tracking doesn't work when using the Radix MMU. The tracking is supposed to work by clearing the soft dirty bit for a mapping and then write protecting the PTE. If/when the page is written to, a page fault occurs and the soft dirty bit is added back via pte_mkdirty(). For example in wp_page_reuse(): entry = maybe_mkwrite(pte_mkdirty(entry), vma); if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1)) update_mmu_cache(vma, vmf->address, vmf->pte); Unfortunately on radix _PAGE_SOFTDIRTY is being dropped by radix__ptep_set_access_flags(), called from ptep_set_access_flags(), meaning the soft dirty bit is not set even though the page has been written to. Fix it by adding _PAGE_SOFTDIRTY to the set of bits that are able to be changed in radix__ptep_set_access_flags(). Fixes: b0b5e9b13047 ("powerpc/mm/radix: Add radix pte #defines") Cc: stable@vger.kernel.org # v4.7+ Reported-by: Dan Horák <dan@danny.cz> Link: https://lore.kernel.org/r/20230511095558.56663a50f86bdc4cd97700b7@danny.cz Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230511114224.977423-1-mpe@ellerman.id.au
2023-04-28  Merge tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  Linus Torvalds
Pull powerpc updates from Michael Ellerman:

- Add support for building the kernel using PC-relative addressing on Power10.
- Allow HV KVM guests on Power10 to use prefixed instructions.
- Unify support for the P2020 CPU (85xx) into a single machine description.
- Always build the 64-bit kernel with 128-bit long double.
- Drop support for several obsolete 2000's era development boards as identified by Paul Gortmaker.
- A series fixing VFIO on Power since some generic changes.
- Various other small features and fixes.

Thanks to Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu, Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel Stanley, Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro Yamada, Michael Neuling, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Nick Desaulniers, Nysal Jan K.A, Pali Rohár, Paul Gortmaker, Paul Mackerras, Petr Vaněk, Randy Dunlap, Rob Herring, Sachin Sant, Sean Christopherson, Segher Boessenkool, and Timothy Pearson.

* tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits)
  powerpc/64s: Disable pcrel code model on Clang
  powerpc: Fix merge conflict between pcrel and copy_thread changes
  powerpc/configs/powernv: Add IGB=y
  powerpc/configs/64s: Drop JFS Filesystem
  powerpc/configs/64s: Use EXT4 to mount EXT2 filesystems
  powerpc/configs: Make pseries_defconfig an alias for ppc64le_guest
  powerpc/configs: Make pseries_le an alias for ppc64le_guest
  powerpc/configs: Incorporate generic kvm_guest.config into guest configs
  powerpc/configs: Add IBMVETH=y and IBMVNIC=y to guest configs
  powerpc/configs/64s: Enable Device Mapper options
  powerpc/configs/64s: Enable PSTORE
  powerpc/configs/64s: Enable VLAN support
  powerpc/configs/64s: Enable BLK_DEV_NVME
  powerpc/configs/64s: Drop REISERFS
  powerpc/configs/64s: Use SHA512 for module signatures
  powerpc/configs/64s: Enable IO_STRICT_DEVMEM
  powerpc/configs/64s: Enable SCHEDSTATS
  powerpc/configs/64s: Enable DEBUG_VM & other options
  powerpc/configs/64s: Enable EMULATED_STATS
  powerpc/configs/64s: Enable KUNIT and most tests
  ...
2023-04-27  Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  Linus Torvalds
Pull MM updates from Andrew Morton:

- Nick Piggin's "shoot lazy tlbs" series, to improve the performance of switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
  - removal of most of the callers of write_one_page()
  - make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultfd, permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning.
- Axel Rasmussen gives userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removed vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more.
- Lorenzo has also converted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics flushing.
- David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related code paths.
- David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd selftests.
- Yosry Ahmed has fixed an issue around memcg's page reclaim accounting.
- Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-05powerc/mm: try VMA lock-based page fault handling firstLaurent Dufour
Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails. Copied from "x86/mm: try VMA lock-based page fault handling first" [ldufour@linux.ibm.com: powerpc/mm: fix mmap_lock bad unlock] Link: https://lkml.kernel.org/r/20230306154244.17560-1-ldufour@linux.ibm.com Link: https://lore.kernel.org/linux-mm/842502FB-F99C-417C-9648-A37D0ECDC9CE@linux.ibm.com Link: https://lkml.kernel.org/r/20230227173632.3292573-32-surenb@google.com Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05mm, treewide: redefine MAX_ORDER sanelyKirill A. Shutemov
MAX_ORDER currently defined as number of orders page allocator supports: user can ask buddy allocator for page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and lead to number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders user can ask from buddy allocator is 0..MAX_ORDER now. [kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
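The practical effect on iteration code can be sketched as follows (do_something() is a hypothetical placeholder, not kernel code):

    /* Before: MAX_ORDER was an exclusive bound; the largest valid
     * order was MAX_ORDER - 1. */
    for (order = 0; order < MAX_ORDER; order++)
        do_something(order);

    /* After: MAX_ORDER is inclusive, i.e. itself a valid order. */
    for (order = 0; order <= MAX_ORDER; order++)
        do_something(order);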
2023-04-04powerpc: Use of_address_to_resource()Rob Herring
Replace open coded reading of "reg" or of_get_address()/ of_translate_address() calls with a single call to of_address_to_resource(). Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230329220337.141295-1-robh@kernel.org
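A hedged sketch of the consolidated pattern, assuming a device_node pointer np and "reg" entry 0 (error handling abbreviated):

    struct resource res;

    /* One call replaces of_get_address() + of_translate_address():
     * it reads the "reg" entry and returns a CPU-physical range. */
    if (of_address_to_resource(np, 0, &res))
        return -EINVAL;

    pr_debug("start %pa, size %llx\n", &res.start,
             (unsigned long long)resource_size(&res));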
2023-04-04powerpc/papr_scm: Update the NUMA distance table for the target nodeAneesh Kumar K.V
Platform device helper routines won't update the NUMA distance table while creating a platform device, even if the device is present on a NUMA node that doesn't have memory or CPU. This is especially true for pmem devices. If the target node of the pmem device is not online, we find the nearest online node to the device and associate the pmem device with that online node. To find the nearest online node, we should have the numa distance table updated correctly. Update the distance information during the device probe. For a papr scm device on NUMA node 3, the distance_lookup_table values for distance_ref_points_depth = 2 before and after the fix are below.

Before fix:
  node 3 distance depth 0 - 0
  node 3 distance depth 1 - 0
  node 4 distance depth 0 - 4
  node 4 distance depth 1 - 2
  node 5 distance depth 0 - 5
  node 5 distance depth 1 - 1

After fix:
  node 3 distance depth 0 - 3
  node 3 distance depth 1 - 1
  node 4 distance depth 0 - 4
  node 4 distance depth 1 - 2
  node 5 distance depth 0 - 5
  node 5 distance depth 1 - 1

Without the fix, the nearest numa node to the pmem device (NUMA node 3) will be picked as 4. After the fix, we get the correct numa node, which is 5. Fixes: da1115fdbd6e ("powerpc/nvdimm: Pick nearby online node if the device node is not online") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230404041433.1781804-1-aneesh.kumar@linux.ibm.com
2023-03-28lazy tlb: introduce lazy tlb mm refcount helper functionsNicholas Piggin
Add explicit _lazy_tlb annotated functions for lazy tlb mm refcounting. This makes the lazy tlb mm references more obvious, and allows the refcounting scheme to be modified in later changes. There is no functional change with this patch. Link: https://lkml.kernel.org/r/20230203071837.1136453-3-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
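A minimal sketch of how the annotated helpers read at a call site, using the names added by this series (at this point they simply wrap the plain mm refcount operations):

    /* Keep the mm alive while it is only borrowed as the lazy TLB mm. */
    mmgrab_lazy_tlb(mm);
    current->active_mm = mm;

    /* ... later, when switching away from the lazy mm ... */
    mmdrop_lazy_tlb(mm);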
2023-03-15powerpc/mm: Fix false detection of read faultsRussell Currey
To support detection of read faults with Radix execute-only memory, the vma_is_accessible() check in access_error() (which checks for PROT_NONE) was replaced with a check to see if VM_READ was missing, and if so, returns true to assert the fault was caused by a bad read. This is incorrect, as it ignores that both VM_WRITE and VM_EXEC imply read on powerpc, as defined in protection_map[]. This causes mappings containing VM_WRITE or VM_EXEC without VM_READ to misreport the cause of page faults, since the MMU is still allowing reads. Correct this by restoring the original vma_is_accessible() check for PROT_NONE mappings, and adding a separate check for Radix PROT_EXEC-only mappings. Fixes: 395cac7752b9 ("powerpc/mm: Support execute-only memory on the Radix MMU") Reported-by: Michal Suchánek <msuchanek@suse.de> Link: https://lore.kernel.org/r/20230308152702.GR19419@kitsune.suse.cz Tested-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230310050834.63105-1-ruscur@russell.cc
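A sketch of the corrected check, using the names mentioned above; the helper name bad_read_fault() is hypothetical and the real access_error() handles more cases:

    static bool bad_read_fault(struct vm_area_struct *vma)
    {
        if (unlikely(!vma_is_accessible(vma)))
            return true;    /* PROT_NONE: no access at all */

        /* Radix execute-only: VM_EXEC without read/write means the
         * MMU forbids loads, so a read fault is a genuine error. */
        if (unlikely(radix_enabled() &&
                     (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)) == VM_EXEC))
            return true;

        return false;
    }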
2023-02-25  Merge tag 'powerpc-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  Linus Torvalds
Pull powerpc updates from Michael Ellerman:

- Support for configuring secure boot with user-defined keys on PowerVM LPARs
- Simplify the replay of soft-masked IRQs by making it non-recursive
- Add support for KCSAN on 64-bit Book3S
- Improvements to the API & code which interacts with RTAS (pseries firmware)
- Change 32-bit powermac to assign PCI bus numbers per domain by default
- Some improvements to the 32-bit BPF JIT
- Various other small features and fixes

Thanks to Anders Roxell, Andrew Donnellan, Andrew Jeffery, Benjamin Gray, Christophe Leroy, Frederic Barrat, Ganesh Goudar, Geoff Levand, Greg Kroah-Hartman, Jan-Benedict Glaw, Josh Poimboeuf, Kajol Jain, Laurent Dufour, Mahesh Salgaonkar, Mathieu Desnoyers, Mimi Zohar, Murphy Zhou, Nathan Chancellor, Nathan Lynch, Nayna Jain, Nicholas Piggin, Pali Rohár, Petr Mladek, Rohan McLure, Russell Currey, Sachin Sant, Sathvika Vasireddy, Sourabh Jain, Stefan Berger, Stephen Rothwell, and Sudhakar Kuppusamy.

* tag 'powerpc-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (114 commits)
  powerpc/pseries: Avoid hcall in plpks_is_available() on non-pseries
  powerpc: dts: turris1x.dts: Set lower priority for CPLD syscon-reboot
  powerpc/e500: Add missing prototype for 'relocate_init'
  powerpc/64: Fix unannotated intra-function call warning
  powerpc/epapr: Don't use wrteei on non booke
  powerpc: Pass correct CPU reference to assembler
  powerpc/mm: Rearrange if-else block to avoid clang warning
  powerpc/nohash: Fix build with llvm-as
  powerpc/nohash: Fix build error with binutils >= 2.38
  powerpc/pseries: Fix endianness issue when parsing PLPKS secvar flags
  macintosh: windfarm: Use unsigned type for 1-bit bitfields
  powerpc/kexec_file: print error string on usable memory property update failure
  powerpc/machdep: warn when machine_is() used too early
  powerpc/64: Replace -mcpu=e500mc64 by -mcpu=e5500
  powerpc/eeh: Set channel state after notifying the drivers
  selftests/powerpc: Fix incorrect kernel headers search path
  powerpc/rtas: arch-wide function token lookup conversions
  powerpc/rtas: introduce rtas_function_token() API
  powerpc/pseries/lpar: convert to papr_sysparm API
  powerpc/pseries/hv-24x7: convert to papr_sysparm API
  ...
2023-02-23  Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  Linus Torvalds
Pull MM updates from Andrew Morton:

- Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in this series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-20powerpc/e500: Add missing prototype for 'relocate_init'Christophe Leroy
Kernel test robot reports:

  arch/powerpc/mm/nohash/e500.c:314:21: warning: no previous prototype for 'relocate_init' [-Wmissing-prototypes]
    314 | notrace void __init relocate_init(u64 dt_ptr, phys_addr_t start)
        |                     ^~~~~~~~~~~~~

Add it in mm/mmu_decl.h, close to the associated is_second_reloc variable declaration. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/oe-kbuild-all/202302181136.wgyCKUcs-lkp@intel.com/ Link: https://lore.kernel.org/r/ac9107acf24135e1a07e8f84d2090572d43e3fe4.1676712510.git.christophe.leroy@csgroup.eu
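The declaration added to mm/mmu_decl.h presumably mirrors the definition quoted in the warning, along the lines of:

    /* arch/powerpc/mm/mmu_decl.h (sketch) */
    void __init relocate_init(u64 dt_ptr, phys_addr_t start);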
2023-02-16powerpc/mm: Rearrange if-else block to avoid clang warningAnders Roxell
Clang warns:

  arch/powerpc/mm/book3s64/radix_tlb.c:1191:23: error: variable 'hstart' is uninitialized when used here
    __tlbiel_va_range(hstart, hend, pid,
                      ^~~~~~
  arch/powerpc/mm/book3s64/radix_tlb.c:1191:31: error: variable 'hend' is uninitialized when used here
    __tlbiel_va_range(hstart, hend, pid,
                              ^~~~

Rework the 'if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))' block so hstart/hend are always initialized, to silence the warnings. That will also simplify the 'else' path. Clang is getting confused with these warnings, but they are false positives. Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220810114318.3220630-1-anders.roxell@linaro.org
2023-02-16powerpc/nohash: Fix build with llvm-asMichael Ellerman
When using the LLVM integrated assembler (llvm-as), the book3e build fails with: arch/powerpc/mm/nohash/tlb_low_64e.S:354:2: error: invalid instruction tlbilxva 0,%r15 ^ tlbilxva is an extended mnemonic for tlbilx, but llvm-as also doesn't support tlbilx, despite it being an e500mc instruction. Fix it by using the existing PPC_TLBILX_VA macro. The resulting binary is identical when building with binutils. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230216112915.1681631-1-mpe@ellerman.id.au
2023-02-16powerpc/nohash: Fix build error with binutils >= 2.38Michael Ellerman
With binutils >= 2.38 the ppc64_book3e_allmodconfig build fails:

  {standard input}: Assembler messages:
  {standard input}:196: Error: unrecognized opcode: `lbarx'
  {standard input}:196: Error: unrecognized opcode: `stbcx.'
  make[5]: *** [scripts/Makefile.build:252: arch/powerpc/mm/nohash/e500_hugetlbpage.o] Error 1

That happens because the default CPU for that config is e5500, set via CONFIG_TARGET_CPU, and so the assembler is building for e5500, which doesn't support those instructions. Fix it by using machine directives to tell the assembler to assemble the relevant code for e6500, which does support lbarx/stbcx. That is safe because the code already has the CPU_FTR_SMT check, which ensures the lbarx sequence doesn't run on e5500, which doesn't support SMT. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230213112322.998003-1-mpe@ellerman.id.au
2023-02-12Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch to bring in some changes that conflict with upcoming next content.
2023-02-10powerpc/64s: Fix stress_hpt memblock alloc alignmentNicholas Piggin
The stress_hpt memblock allocation did not pass in an alignment, which causes a stack dump in early boot (that I missed, oops). Fixes: 6b34a099faa1 ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221216115930.2667772-2-npiggin@gmail.com
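A hedged sketch of the shape of the fix; the variable name, size and alignment here are assumptions for illustration, not the exact stress_hpt code:

    /* memblock_alloc() takes an explicit alignment; an alignment of 0
     * is what triggered the early-boot stack dump. */
    stress_hpt_struct = memblock_alloc(sizeof(*stress_hpt_struct) * NR_CPUS,
                                       L1_CACHE_BYTES);
    if (!stress_hpt_struct)
        panic("%s: failed to allocate stress_hpt area\n", __func__);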
2023-02-09mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
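A short sketch of the conversion pattern (the specific flags are illustrative):

    /* Direct writes like "vma->vm_flags |= VM_DONTDUMP;" become: */
    vm_flags_set(vma, VM_DONTDUMP);

    /* and "vma->vm_flags &= ~VM_MAYWRITE;" becomes: */
    vm_flags_clear(vma, VM_MAYWRITE);

    /* A brand-new vma, not yet visible to other threads, uses: */
    vm_flags_init(vma, VM_READ | VM_MAYREAD);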
2023-02-08powerpc/64s/radix: Remove TLB_FLUSH_ALL test from range flushesNicholas Piggin
This looks like it came across from x86, but x86 uses TLB_FLUSH_ALL as a parameter to internal functions. Powerpc never sets it anywhere. Remove the associated logic and leave a warning for now. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230203111718.1149852-4-npiggin@gmail.com
2023-02-08powerpc/64s/radix: mm->context.id should always be validNicholas Piggin
The MMU_NO_CONTEXT checks are an unnecessary complication. Make these warn to prepare to remove them in future. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230203111718.1149852-3-npiggin@gmail.com
2023-02-08powerpc/64s/radix: Remove need_flush_all test from radix__tlb_flushNicholas Piggin
need_flush_all is only set by arch code to instruct generic tlb_flush to flush all. It is never set by powerpc, so it can be removed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230203111718.1149852-2-npiggin@gmail.com
2023-01-31powerpc/64s/radix: Fix RWX mapping with relocated kernelMichael Ellerman
If a relocatable kernel is loaded at a non-zero address and told not to relocate to zero (kdump or RELOCATABLE_TEST), the mapping of the interrupt code at zero is left with RWX permissions. That is a security weakness, and leads to a warning at boot if CONFIG_DEBUG_WX is enabled: powerpc/mm: Found insecure W+X mapping at address 00000000056435bc/0xc000000000000000 WARNING: CPU: 1 PID: 1 at arch/powerpc/mm/ptdump/ptdump.c:193 note_page+0x484/0x4c0 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc1-00001-g8ae8e98aea82-dirty #175 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-dd0dca hv:linux,kvm pSeries NIP: c0000000004a1c34 LR: c0000000004a1c30 CTR: 0000000000000000 REGS: c000000003503770 TRAP: 0700 Not tainted (6.2.0-rc1-00001-g8ae8e98aea82-dirty) MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 24000220 XER: 00000000 CFAR: c000000000545a58 IRQMASK: 0 ... NIP note_page+0x484/0x4c0 LR note_page+0x480/0x4c0 Call Trace: note_page+0x480/0x4c0 (unreliable) ptdump_pmd_entry+0xc8/0x100 walk_pgd_range+0x618/0xab0 walk_page_range_novma+0x74/0xc0 ptdump_walk_pgd+0x98/0x170 ptdump_check_wx+0x94/0x100 mark_rodata_ro+0x30/0x70 kernel_init+0x78/0x1a0 ret_from_kernel_thread+0x5c/0x64 The fix has two parts. Firstly the pages from zero up to the end of interrupts need to be marked read-only, so that they are left with R-X permissions. Secondly the mapping logic needs to be taught to ensure there is a page boundary at the end of the interrupt region, so that the permission change only applies to the interrupt text, and not the region following it. Fixes: c55d7b5e6426 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE") Reported-by: Sachin Sant <sachinp@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230110124753.1325426-2-mpe@ellerman.id.au
2023-01-31powerpc/64s/radix: Fix crash with unaligned relocated kernelMichael Ellerman
If a relocatable kernel is loaded at an address that is not 2MB aligned and told not to relocate to zero, the kernel can crash due to mark_rodata_ro() incorrectly changing some read-write data to read-only. Scenarios where the misalignment can occur are when the kernel is loaded by kdump or using the RELOCATABLE_TEST config option. Example crash with the kernel loaded at 5MB: Run /sbin/init as init process BUG: Unable to handle kernel data access on write at 0xc000000000452000 Faulting instruction address: 0xc0000000005b6730 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 #166 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries NIP: c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380 REGS: c000000004503250 TRAP: 0300 Not tainted (6.2.0-rc1-00011-g349188be4841) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44288480 XER: 00000000 CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0 ... NIP memset+0x68/0x104 LR zero_user_segments.constprop.0+0xa8/0xf0 Call Trace: ext4_mpage_readpages+0x7f8/0x830 ext4_readahead+0x48/0x60 read_pages+0xb8/0x380 page_cache_ra_unbounded+0x19c/0x250 filemap_fault+0x58c/0xae0 __do_fault+0x60/0x100 __handle_mm_fault+0x1230/0x1a40 handle_mm_fault+0x120/0x300 ___do_page_fault+0x20c/0xa80 do_page_fault+0x30/0xc0 data_access_common_virt+0x210/0x220 This happens because mark_rodata_ro() tries to change permissions on the range _stext..__end_rodata, but _stext sits in the middle of the 2MB page from 4MB to 6MB: radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec) radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec) The logic that changes the permissions assumes the linear mapping was split correctly at boot, so it marks the entire 2MB page read-only. That leads to the write fault above. To fix it, the boot time mapping logic needs to consider that if the kernel is running at a non-zero address then _stext is a boundary where it must split the mapping. That leads to the mapping being split correctly, allowing the rodata permission change to take happen correctly, with no spillover: radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec) radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec) radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec) If the kernel is loaded at a 2MB aligned address, the mapping continues to use 2MB pages as before: radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec) radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec) radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages Fixes: c55d7b5e6426 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au
2023-01-12powerpc/64s/hash: Make stress_hpt_timer_fn() staticYang Yingliang
stress_hpt_timer_fn() is only used in hash_utils.c, make it static. Fixes: 6b34a099faa1 ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221228093603.3166599-1-yangyingliang@huawei.com
2022-12-19  Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  Linus Torvalds
Pull powerpc updates from Michael Ellerman:

- Add powerpc qspinlock implementation optimised for large system scalability and paravirt. See the merge message for more details
- Enable objtool to be built on powerpc to generate mcount locations
- Use a temporary mm for code patching with the Radix MMU, so the writable mapping is restricted to the patching CPU
- Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI
- Sanitise user registers on interrupt entry on 64-bit Book3S
- Many other small features and fixes

Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, and Wolfram Sang.

* tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits)
  powerpc/code-patching: Fix oops with DEBUG_VM enabled
  powerpc/qspinlock: Fix 32-bit build
  powerpc/prom: Fix 32-bit build
  powerpc/rtas: mandate RTAS syscall filtering
  powerpc/rtas: define pr_fmt and convert printk call sites
  powerpc/rtas: clean up includes
  powerpc/rtas: clean up rtas_error_log_max initialization
  powerpc/pseries/eeh: use correct API for error log size
  powerpc/rtas: avoid scheduling in rtas_os_term()
  powerpc/rtas: avoid device tree lookups in rtas_os_term()
  powerpc/rtasd: use correct OF API for event scan rate
  powerpc/rtas: document rtas_call()
  powerpc/pseries: unregister VPA when hot unplugging a CPU
  powerpc/pseries: reset the RCU watchdogs after a LPM
  powerpc: Take in account addition CPU node when building kexec FDT
  powerpc: export the CPU node count
  powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state
  powerpc/dts/fsl: Fix pca954x i2c-mux node names
  cxl: Remove unnecessary cxl_pci_window_alignment()
  selftests/powerpc: Fix resource leaks
  ...
2022-12-02  powerpc/code-patching: Remove protection against patching init addresses after init  Christophe Leroy
Once init section is freed, attempting to patch init code ends up in the weed. Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections") protected patch_instruction() against that, but it is the responsibility of the caller to ensure that the patched memory is valid. All callers have now been verified and fixed so the check can be removed. This improves ftrace activation by about 2% on 8xx. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/504310828f473d424e2ed229eff57bf075f52796.1669969781.git.christophe.leroy@csgroup.eu
2022-12-02powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faultsNicholas Piggin
This option increases the number of hash misses by limiting the number of kernel HPT entries, by keeping a per-CPU record of the last kernel HPTEs installed, and removing that from the hash table on the next hash insertion. A timer round-robins CPUs removing remaining kernel HPTEs and clearing the TLB (in the case of bare metal) to increase and slightly randomise kernel fault activity. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add comment about NR_CPUS usage, fixup whitespace] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221024030150.852517-1-npiggin@gmail.com
2022-11-30powerpc/tlb: Add local flush for page given mm_struct and psizeBenjamin Gray
Adds a local TLB flush operation that works given an mm_struct, VA to flush, and page size representation. Most implementations mirror the surrounding code. The book3s/32/tlbflush.h implementation is left as a BUILD_BUG because it is more complicated and not required for anything as yet. This removes the need to create a vm_area_struct, which the temporary patching mm work does not need. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221109045112.187069-8-bgray@linux.ibm.com
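A hedged sketch of the new interface and a call site; the exact name and signature are taken from this series as I understand it, and the variables in the example (patching_mm, patch_addr) are hypothetical:

    /* Flush one page's translation on the local CPU only, without
     * needing a vm_area_struct. */
    void local_flush_tlb_page_psize(struct mm_struct *mm,
                                    unsigned long vmaddr, int psize);

    /* e.g. from temporary-mm code patching teardown (illustrative): */
    local_flush_tlb_page_psize(patching_mm, patch_addr, mmu_virtual_psize);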
2022-11-30powerpc/book3e: remove #include <generated/utsrelease.h>Thomas Weißschuh
Commit 7ad4bd887d27 ("powerpc/book3e: get rid of #include <generated/compile.h>") removed the usage of the define UTS_RELEASE but forgot to drop the include. utsrelease.h is potentially generated on each build. By removing the unused include we can get rid of some spurious recompilations. Fixes: 7ad4bd887d27 ("powerpc/book3e: get rid of #include <generated/compile.h>") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Fix typo in change log and add more explanation] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221126051002.123199-2-linux@weissschuh.net
2022-11-30Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch to bring in some changes that are prerequisites for work in next.
2022-11-24powerpc: Remove find_current_mm_pte()Christophe Leroy
Last usage of find_current_mm_pte() was removed by commit 15759cb054ef ("powerpc/perf/callchain: Use __get_user_pages_fast in read_user_stack_slow") Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ec79f462a3bfa8365b7df505e574d5d85246bc68.1646818177.git.christophe.leroy@csgroup.eu
2022-11-08hugetlb: simplify hugetlb handling in follow_page_maskMike Kravetz
During discussions of this series [1], it was suggested that hugetlb handling code in follow_page_mask could be simplified. At the beginning of follow_page_mask, there currently is a call to follow_huge_addr which 'may' handle hugetlb pages. ia64 is the only architecture which provides a follow_huge_addr routine that does not return error. Instead, at each level of the page table a check is made for a hugetlb entry. If a hugetlb entry is found, a call to a routine associated with that entry is made. Currently, there are two checks for hugetlb entries at each page table level. The first check is of the form: if (p?d_huge()) page = follow_huge_p?d(); the second check is of the form: if (is_hugepd()) page = follow_huge_pd(). We can replace these checks, as well as the special handling routines such as follow_huge_p?d() and follow_huge_pd() with a single routine to handle hugetlb vmas. A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the beginning of follow_page_mask. hugetlb_follow_page_mask will use the existing routine huge_pte_offset to walk page tables looking for hugetlb entries. huge_pte_offset can be overwritten by architectures, and already handles special cases such as hugepd entries. [1] https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.wang@linux.alibaba.com/ [mike.kravetz@oracle.com: remove vma (pmd sharing) per Peter] Link: https://lkml.kernel.org/r/20221028181108.119432-1-mike.kravetz@oracle.com [mike.kravetz@oracle.com: remove left over hugetlb_vma_unlock_read()] Link: https://lkml.kernel.org/r/20221030225825.40872-1-mike.kravetz@oracle.com Link: https://lkml.kernel.org/r/20220919021348.22151-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
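A rough before/after sketch of the shape of the change; signatures are abbreviated and approximate, not the literal kernel code:

    /* Before (sketch): hugetlb checked twice at each page-table level */
    if (pud_huge(*pud))
        page = follow_huge_pud(mm, address, pud, flags);
    if (is_hugepd(__hugepd(pud_val(*pud))))
        page = follow_huge_pd(vma, address, __hugepd(pud_val(*pud)),
                              flags, PUD_SHIFT);

    /* After (sketch): one entry point for hugetlb vmas at the top of
     * follow_page_mask(), walking via the overridable huge_pte_offset(). */
    if (is_vm_hugetlb_page(vma))
        return hugetlb_follow_page_mask(vma, address, flags);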
2022-10-18powerpc/64s: Fix hash__change_memory_range preemption warningNicholas Piggin
stop_machine_cpuslocked takes a mutex so it must be called in a preemptible context, so it can't simply be fixed by disabling preemption. This is not a bug, because CPU hotplug is locked, so this processor will call in to the stop machine function. So raw_smp_processor_id() could be used. This leaves a small chance that this thread will be migrated to another CPU, so the master work would be done by a CPU from a different context. Better for test coverage to make that a common case by just having the first CPU to call in become the master. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221013151647.1857994-2-npiggin@gmail.com
2022-10-18powerpc/64s: make linear_map_hash_lock a raw spinlockNicholas Piggin
This lock is taken while the raw kfence_freelist_lock is held, so it must also be a raw spinlock; lockdep reports the violation when raw lock nesting checking is enabled. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221013230710.1987253-3-npiggin@gmail.com
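The rule applied here is that a lock acquired while a raw spinlock is held must itself be raw (an ordinary spinlock can sleep under PREEMPT_RT). A hedged sketch of what such a conversion looks like in general; only the lock name comes from the commit, the surrounding code is assumed:

  /* Before: an ordinary spinlock. */
  static DEFINE_SPINLOCK(linear_map_hash_lock);
  spin_lock(&linear_map_hash_lock);

  /* After: a raw spinlock, which is legal to take while another raw
   * spinlock (here, kfence's freelist lock) is already held. */
  static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
  raw_spin_lock(&linear_map_hash_lock);
  /* ... update the linear-map hash slot ... */
  raw_spin_unlock(&linear_map_hash_lock);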
2022-10-18powerpc/64s: make HPTE lock and native_tlbie_lock irq-safeNicholas Piggin
With kfence enabled, there are several cases where the HPTE and TLBIE locks are taken from softirq context, for example:

  WARNING: inconsistent lock state
  6.0.0-11845-g0cbbc95b12ac #1 Tainted: G N
  --------------------------------
  inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
  swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
  c000000002734de8 (native_tlbie_lock){+.?.}-{2:2}, at: .native_hpte_updateboltedpp+0x1a4/0x600
  {IN-SOFTIRQ-W} state was registered at:
    .lock_acquire+0x20c/0x520
    ._raw_spin_lock+0x4c/0x70
    .native_hpte_invalidate+0x62c/0x840
    .hash__kernel_map_pages+0x450/0x640
    .kfence_protect+0x58/0xc0
    .kfence_guarded_free+0x374/0x5a0
    .__slab_free+0x3d0/0x630
    .put_cred_rcu+0xcc/0x120
    .rcu_core+0x3c4/0x14e0
    .__do_softirq+0x1dc/0x7dc
    .do_softirq_own_stack+0x40/0x60

Fix this by consistently disabling irqs while taking either of these locks. Don't just disable bh, because several of the more common cases already disable irqs, so this simply makes the locks always irq-safe. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221013230710.1987253-2-npiggin@gmail.com
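The "consistently disable irqs" part of that fix is the standard irqsave locking pattern; a hedged sketch, where the lock name is taken from the splat above and the body is purely illustrative:

  static DEFINE_RAW_SPINLOCK(native_tlbie_lock);

  static void take_tlbie_lock_example(void)
  {
          unsigned long flags;

          /* Disabling interrupts while holding the lock means a softirq on
           * this CPU can never try to take it while we already hold it. */
          raw_spin_lock_irqsave(&native_tlbie_lock, flags);
          /* ... issue the tlbie sequence here ... */
          raw_spin_unlock_irqrestore(&native_tlbie_lock, flags);
  }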
2022-10-18powerpc/64s: Add lockdep for HPTE lockNicholas Piggin
Add a lockdep annotation for the HPTE bit-spinlock. Modern systems don't take the tlbie lock, so this annotation surfaces some of the same lockdep warnings that were previously being reported on ppc970. The two locks are not taken in exactly the same places, so the annotation is nice to have in its own right. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221013230710.1987253-1-npiggin@gmail.com
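Bit-spinlocks are invisible to lockdep by default, so the annotation has to be supplied by hand. A hedged sketch of the general technique, a static lockdep map acquired and released around the HPTE busy-bit lock; the names below are illustrative and the actual commit may differ in detail:

  #include <linux/lockdep.h>

  /* One lockdep class shared by all HPTE bit-spinlocks (illustrative). */
  static struct lock_class_key hpte_lock_key;
  static struct lockdep_map hpte_lock_map =
          STATIC_LOCKDEP_MAP_INIT("hpte_lock", &hpte_lock_key);

  static inline void acquire_hpte_lock(void)
  {
          /* the real HPTE busy-bit acquire sits alongside this annotation */
          lock_map_acquire(&hpte_lock_map);
  }

  static inline void release_hpte_lock(void)
  {
          lock_map_release(&hpte_lock_map);
          /* the real HPTE busy-bit release sits alongside this annotation */
  }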
2022-10-10Merge tag 'mm-stable-2022-10-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It is apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially converted to maple trees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-uninitialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added the ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added additional userspace visibility into KSM merging activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
2022-10-09Merge tag 'powerpc-6.1-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman:
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
- Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
- Add support for syscall wrappers.
- Add support for KFENCE on 64-bit.
- Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
- Support execute-only memory when using the Radix MMU (P9 or later).
- Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
- Updates to our linker script to move more data into read-only sections.
- Allow the VDSO to be randomised on 32-bit.
- Many other small features and fixes.
Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas, Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool, Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng Yongjun.
* tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
KVM: PPC: Book3S HV: Fix stack frame regs marker
powerpc: Don't add __powerpc_ prefix to syscall entry points
powerpc/64s/interrupt: Fix stack frame regs marker
powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
powerpc/pseries: Add firmware details to the hardware description
powerpc/powernv: Add opal details to the hardware description
powerpc: Add device-tree model to the hardware description
powerpc/64: Add logical PVR to the hardware description
powerpc: Add PVR & CPU name to hardware description
powerpc: Add hardware description string
powerpc/configs: Enable PPC_UV in powernv_defconfig
powerpc/configs: Update config files for removed/renamed symbols
powerpc/mm: Fix UBSAN warning reported on hugetlb
powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
powerpc: Drops STABS_DEBUG from linker scripts
powerpc/64s: Remove lost/old comment
powerpc/64s: Remove old STAB comment
powerpc: remove orphan systbl_chk.sh
...