summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-24crash: add generic infrastructure for crash hotplug supportEric DeVolder
To support crash hotplug, a mechanism is needed to update the crash elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/ onlining). The crash elfcorehdr describes the CPUs and memory to be written into the vmcore. To track CPU changes, callbacks are registered with the cpuhp mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The crash hotplug elfcorehdr update has no explicit ordering requirement (relative to other cpuhp states), so meets the criteria for utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic state and avoids the need to introduce a new state for crash hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE group, just prior to the STARTING group, which is very close to the CPU starting up in a plug/online situation, or stopping in a unplug/ offline situation. This minimizes the window of time during an actual plug/online or unplug/offline situation in which the elfcorehdr would be inaccurate. Note that for a CPU being unplugged or offlined, the CPU will still be present in the list of CPUs generated by crash_prepare_elf64_headers(). However, there is no need to explicitly omit the CPU, see justification in 'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'. To track memory changes, a notifier is registered to capture the memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier(). The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event() which performs needed tasks and then dispatches the event to the architecture specific arch_crash_handle_hotplug_event() to update the elfcorehdr with the current state of CPUs and memory. During the process, the kexec_lock is held. Link: https://lkml.kernel.org/r/20230814214446.6659-3-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24crash: move a few code bits to setup support of crash hotplugEric DeVolder
Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28. Once the kdump service is loaded, if changes to CPUs or memory occur, either by hot un/plug or off/onlining, the crash elfcorehdr must also be updated. The elfcorehdr describes to kdump the CPUs and memory in the system, and any inaccuracies can result in a vmcore with missing CPU context or memory regions. The current solution utilizes udev to initiate an unload-then-reload of the kdump image (eg. kernel, initrd, boot_params, purgatory and elfcorehdr) by the userspace kexec utility. In the original post I outlined the significant performance problems related to offloading this activity to userspace. This patchset introduces a generic crash handler that registers with the CPU and memory notifiers. Upon CPU or memory changes, from either hot un/plug or off/onlining, this generic handler is invoked and performs important housekeeping, for example obtaining the appropriate lock, and then invokes an architecture specific handler to do the appropriate elfcorehdr update. Note the description in patch 'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that enables further optimizations related to CPU plug/unplug/online/offline performance of elfcorehdr updates. In the case of x86_64, the arch specific handler generates a new elfcorehdr, and overwrites the old one in memory; thus no involvement with userspace needed. To realize the benefits/test this patchset, one must make a couple of minor changes to userspace: - Prevent udev from updating kdump crash kernel on hot un/plug changes. Add the following as the first lines to the RHEL udev rule file /usr/lib/udev/rules.d/98-kexec.rules: # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" With this changeset applied, the two rules evaluate to false for CPU and memory change events and thus skip the userspace unload-then-reload of kdump. - Change to the kexec_file_load for loading the kdump kernel: Eg. on RHEL: in /usr/bin/kdumpctl, change to: standard_kexec_args="-p -d -s" which adds the -s to select kexec_file_load() syscall. This kernel patchset also supports kexec_load() with a modified kexec userspace utility. A working changeset to the kexec userspace utility is posted to the kexec-tools mailing list here: http://lists.infradead.org/pipermail/kexec/2023-May/027049.html To use the kexec-tools patch, apply, build and install kexec-tools, then change the kdumpctl's standard_kexec_args to replace the -s with --hotplug. The removal of -s reverts to the kexec_load syscall and the addition of --hotplug invokes the changes put forth in the kexec-tools patch. This patch (of 8): The crash hotplug support leans on the work for the kexec_file_load() syscall. To also support the kexec_load() syscall, a few bits of code need to be move outside of CONFIG_KEXEC_FILE. As such, these bits are moved out of kexec_file.c and into a common location crash_core.c. In addition, struct crash_mem and crash_notes were moved to new locales so that PROC_KCORE, which sets CRASH_CORE alone, builds correctly. No functionality change intended. Link: https://lkml.kernel.org/r/20230814214446.6659-1-eric.devolder@oracle.com Link: https://lkml.kernel.org/r/20230814214446.6659-2-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24maple_tree: shrink struct maple_treeMateusz Guzik
Pack the members of struct maple_tree to avoid holes on 64-bit. The size shrinks from 24 to 16 bytes which will save eight bytes in every structure which embeds it. [willy@infradead.org: changelog alterations] Link: https://lkml.kernel.org/r/20230821225145.2169848-1-mjguzik@gmail.com Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24maple_tree: clean up mas_wr_append()Liam R. Howlett
Avoid setting the variables until necessary, and actually use the variables where applicable. Introducing a variable for the slots array avoids spanning multiple lines. Add the missing argument to the documentation. Use the node type when setting the metadata instead of blindly assuming the type. Finally, add a trace point to the function for successful store. Link: https://lkml.kernel.org/r/20230819004356.1454718-3-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24secretmem: convert page_is_secretmem() to folio_is_secretmem()Matthew Wilcox (Oracle)
The only caller already has a folio, so use it to save calling compound_head() in PageLRU() and remove a use of page->mapping. Link: https://lkml.kernel.org/r/20230822202335.179081-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24nios2: fix flush_dcache_page() for usage from irq contextHelge Deller
Since at least kernel 6.1, flush_dcache_page() is called with IRQs disabled, e.g. from aio_complete(). But the current implementation for flush_dcache_page() on NIOS2 unintentionally re-enables IRQs, which may lead to deadlocks. Fix it by using xa_lock_irqsave() and xa_unlock_irqrestore() for the flush_dcache_mmap_*lock() macros instead. Link: https://lkml.kernel.org/r/ZOTF5WWURQNH9+iw@p100 Signed-off-by: Helge Deller <deller@gmx.de> Cc: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24hugetlb: add documentation for vma_kernel_pagesize()Matthew Wilcox (Oracle)
This is an exported symbol, so it should have kernel-doc. Update it to mention folios, and point out that they might be larger than the supported page size for this VMA. Link: https://lkml.kernel.org/r/20230822172459.4190699-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: add orphaned kernel-doc to the rst files.Matthew Wilcox (Oracle)
There are many files in mm/ that contain kernel-doc which is not currently published on kernel.org. Some of it is easily categorisable, but most of it is going into the miscellaneous documentation section to be organised later. Some files aren't ready to be included; they contain documentation with build errors. Or they're nommu.c which duplicates documentation from "real" MMU systems. Those files are noted with a # mark (although really anything which isn't a recognised directive would do to prevent inclusion) Link: https://lkml.kernel.org/r/20230818200630.2719595-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: fix clean_record_shared_mapping_range kernel-docMatthew Wilcox (Oracle)
Turn the a), b) into an unordered ReST list and remove the unnecessary 'Note:' prefix. Link: https://lkml.kernel.org/r/20230818200630.2719595-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: fix get_mctgt_type() kernel-docMatthew Wilcox (Oracle)
Convert the return values to an ReST list and tidy up the wording while I'm touching it. [akpm@linux-foundation.org: changes suggested by Randy] [willy@infradead.org: another change suggested by Randy] Link: https://lkml.kernel.org/r/ZOUZtZizeQG7PcsM@casper.infradead.org Link: https://lkml.kernel.org/r/20230818200630.2719595-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: fix kernel-doc warning from tlb_flush_rmaps()Matthew Wilcox (Oracle)
Patch series "Improve mm documentation". If you build with W=1, kernel-doc complains about tlb_flush_rmaps(). Then I ran scripts/find-unused-docs.sh against mm/ and found a large number of files which weren't included in the ReST documentation. I fixed up a couple of them, and added all those without erros to the rst files. There's a lot more work to do to organise all of this, but at least now if we have documentation that refers to these functions, we'll get a nice link to them. This patch (of 4): The vma parameter wasn't described. Link: https://lkml.kernel.org/r/20230818200630.2719595-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230818200630.2719595-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: remove enum page_entry_sizeMatthew Wilcox (Oracle)
Remove the unnecessary encoding of page order into an enum and pass the page order directly. That lets us get rid of pe_order(). The switch constructs have to be changed to if/else constructs to prevent GCC from warning on builds with 3-level page tables where PMD_ORDER and PUD_ORDER have the same value. If you are looking at this commit because your driver stopped compiling, look at the previous commit as well and audit your driver to be sure it doesn't depend on mmap_lock being held in its ->huge_fault method. [willy@infradead.org: use "order %u" to match the (non dev_t) style] Link: https://lkml.kernel.org/r/ZOUYekbtTv+n8hYf@casper.infradead.org Link: https://lkml.kernel.org/r/20230818202335.2739663-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: allow ->huge_fault() to be called without the mmap_lock heldMatthew Wilcox (Oracle)
Remove the checks for the VMA lock being held, allowing the page fault path to call into the filesystem instead of retrying with the mmap_lock held. This will improve scalability for DAX page faults. Also update the documentation to match (and fix some other changes that have happened recently). Link: https://lkml.kernel.org/r/20230818202335.2739663-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: move PMD_ORDER to pgtable.hMatthew Wilcox (Oracle)
Patch series "Change calling convention for ->huge_fault", v2. There are two unrelated changes to the calling convention for ->huge_fault. I've bundled them together to help people notice the change. The first is to improve scalability of DAX page faults by allowing them to be handled under the VMA lock. The second is to remove enum page_entry_size since it's really unnecessary. The changelogs and documentation updates hopefully work to that end. This patch (of 3): Allow this to be used in generic code. Also add PUD_ORDER. Link: https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230818202335.2739663-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: remove checks for pte_indexMatthew Wilcox (Oracle)
Since pte_index is always defined, we don't need to check whether it's defined or not. Delete the slow version that doesn't depend on it and remove the #define since nobody needs to test for it. Link: https://lkml.kernel.org/r/20230819031837.3160096-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Christian Dietrich <stettberger@dokucode.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24memcg: remove duplication detection for mem_cgroup_uncharge_swapLu Jialin
__mem_cgroup_uncharge_swap is only called in mem_cgroup_uncharge_swap, if mem cgroup is disabled, __mem_cgroup_uncharge_swap cannot be called. Therefore, there is no need to judge whether mem_cgroup is disabled or not. Link: https://lkml.kernel.org/r/20230819081302.1217098-1-lujialin4@huawei.com Signed-off-by: Lu Jialin <lujialin4@huawei.com> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm/huge_memory: work on folio->swap instead of page->private when splitting ↵David Hildenbrand
folio Let's work on folio->swap instead. While at it, use folio_test_anon() and folio_test_swapcache() -- the original folio remains valid even after splitting (but is then an order-0 folio). We can probably convert a lot more to folios in that code, let's focus on folio->swap handling only for now. Link: https://lkml.kernel.org/r/20230821160849.531668-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Chris Li <chrisl@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm/swap: inline folio_set_swap_entry() and folio_swap_entry()David Hildenbrand
Let's simply work on the folio directly and remove the helpers. Link: https://lkml.kernel.org/r/20230821160849.531668-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Chris Li <chrisl@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm/swap: use dedicated entry for swap in folioMatthew Wilcox
Let's stop working on the private field and use an explicit swap field. We have to move the swp_entry_t typedef. Link: https://lkml.kernel.org/r/20230821160849.531668-3-david@redhat.com Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Chris Li <chrisl@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm/swap: stop using page->private on tail pages for THP_SWAPDavid Hildenbrand
Patch series "mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups". This series stops using page->private on tail pages for THP_SWAP, replaces folio->private by folio->swap for swapcache folios, and starts using "new_folio" for tail pages that we are splitting to remove the usage of page->private for swapcache handling completely. This patch (of 4): Let's stop using page->private on tail pages, making it possible to just unconditionally reuse that field in the tail pages of large folios. The remaining usage of the private field for THP_SWAP is in the THP splitting code (mm/huge_memory.c), that we'll handle separately later. Update the THP_SWAP documentation and sanity checks in mm_types.h and __split_huge_page_tail(). [david@redhat.com: stop using page->private on tail pages for THP_SWAP] Link: https://lkml.kernel.org/r/6f0a82a3-6948-20d9-580b-be1dbf415701@redhat.com Link: https://lkml.kernel.org/r/20230821160849.531668-1-david@redhat.com Link: https://lkml.kernel.org/r/20230821160849.531668-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24selftests/mm: fix WARNING comparing pointer to 0Anh Tuan Phan
Remove comparing pointer to 0 to avoid this warning from coccinelle: ./tools/testing/selftests/mm/map_populate.c:80:16-17: WARNING comparing pointer to 0, suggest !E ./tools/testing/selftests/mm/map_populate.c:80:16-17: WARNING comparing pointer to 0 Link: https://lkml.kernel.org/r/20230817160033.90079-1-tuananhlfc@gmail.com Signed-off-by: Anh Tuan Phan <tuananhlfc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24selftests: cgroup: fix test_kmem_memcg_deletion kernel mem checkLucas Karpinski
Currently, not all kernel memory usage is being accounted for. This commit switches to using the kernel entry within memory.stat which already includes kernel_stack, pagetables, and slab. The kernel entry also includes vmalloc and other additional kernel memory use cases which were missing. Link: https://lkml.kernel.org/r/bvrhe2tpsts2azaroq4ubp2slawmop6orndsswrewuscw3ugvk@kmemmrttsnc7 Signed-off-by: Lucas Karpinski <lkarpins@redhat.com> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: userfaultfd: remove stale comment about core dump lockingJann Horn
Since commit 7f3bfab52cab ("mm/gup: take mmap_lock in get_dump_page()"), which landed in v5.10, core dumping doesn't enter fault handling without holding the mmap_lock anymore. Remove the stale parts of the comments, but leave the behavior as-is - letting core dumping block on userfault handling would be a bad idea and could lead to deadlocks if the dumping process was handling its own userfaults. Link: https://lkml.kernel.org/r/20230815212216.264445-1-jannh@google.com Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24arm64: mm: use ptep_clear() instead of pte_clear() in clear_flush()Qi Zheng
In clear_flush(), the original pte may be a present entry, so we should use ptep_clear() to let page_table_check track the pte clearing operation, otherwise it may cause false positive in subsequent set_pte_at(). Link: https://lkml.kernel.org/r/20230810093241.1181142-1-qi.zheng@linux.dev Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: call update_mmu_cache_range() in more page fault handling pathsMatthew Wilcox (Oracle)
Pass the vm_fault to the architecture to help it make smarter decisions about which PTEs to insert into the TLB. Link: https://lkml.kernel.org/r/20230802151406.3735276-39-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24filemap: batch PTE mappingsYin Fengwei
Call set_pte_range() once per contiguous range of the folio instead of once per page. This batches the updates to mm counters and the rmap. With a will-it-scale.page_fault3 like app (change file write fault testing to read fault testing. Trying to upstream it to will-it-scale at [1]) got 15% performance gain on a 48C/96T Cascade Lake test box with 96 processes running against xfs. Perf data collected before/after the change: 18.73%--page_add_file_rmap | --11.60%--__mod_lruvec_page_state | |--7.40%--__mod_memcg_lruvec_state | | | --5.58%--cgroup_rstat_updated | --2.53%--__mod_lruvec_state | --1.48%--__mod_node_page_state 9.93%--page_add_file_rmap_range | --2.67%--__mod_lruvec_page_state | |--1.95%--__mod_memcg_lruvec_state | | | --1.57%--cgroup_rstat_updated | --0.61%--__mod_lruvec_state | --0.54%--__mod_node_page_state The running time of __mode_lruvec_page_state() is reduced about 9%. [1]: https://github.com/antonblanchard/will-it-scale/pull/37 Link: https://lkml.kernel.org/r/20230802151406.3735276-38-willy@infradead.org Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: convert do_set_pte() to set_pte_range()Yin Fengwei
set_pte_range() allows to setup page table entries for a specific range. It takes advantage of batched rmap update for large folio. It now takes care of calling update_mmu_cache_range(). Link: https://lkml.kernel.org/r/20230802151406.3735276-37-willy@infradead.org Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24rmap: add folio_add_file_rmap_range()Yin Fengwei
folio_add_file_rmap_range() allows to add pte mapping to a specific range of file folio. Comparing to page_add_file_rmap(), it batched updates __lruvec_stat for large folio. Link: https://lkml.kernel.org/r/20230802151406.3735276-36-willy@infradead.org Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24filemap: add filemap_map_folio_range()Yin Fengwei
filemap_map_folio_range() maps partial/full folio. Comparing to original filemap_map_pages(), it updates refcount once per folio instead of per page and gets minor performance improvement for large folio. With a will-it-scale.page_fault3 like app (change file write fault testing to read fault testing. Trying to upstream it to will-it-scale at [1]), got 2% performance gain on a 48C/96T Cascade Lake test box with 96 processes running against xfs. [1]: https://github.com/antonblanchard/will-it-scale/pull/37 Link: https://lkml.kernel.org/r/20230802151406.3735276-35-willy@infradead.org Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: use flush_icache_pages() in do_set_pmd()Matthew Wilcox (Oracle)
Push the iteration over each page down to the architectures (many can flush the entire THP without iteration). Link: https://lkml.kernel.org/r/20230802151406.3735276-34-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: tidy up set_ptes definitionMatthew Wilcox (Oracle)
Now that all architectures are converted, we can remove the PFN_PTE_SHIFT ifdef and we can define set_pte_at() unconditionally. Link: https://lkml.kernel.org/r/20230802151406.3735276-33-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: rationalise flush_icache_pages() and flush_icache_page()Matthew Wilcox (Oracle)
Move the default (no-op) implementation of flush_icache_pages() to <linux/cacheflush.h> from <asm-generic/cacheflush.h>. Remove the flush_icache_page() wrapper from each architecture into <linux/cacheflush.h>. Link: https://lkml.kernel.org/r/20230802151406.3735276-32-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mm: remove page_mapping_file()Matthew Wilcox (Oracle)
This function has no more users. Link: https://lkml.kernel.org/r/20230802151406.3735276-31-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24xtensa: implement the new page table range APIMatthew Wilcox (Oracle)
Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio() and flush_icache_pages(). Link: https://lkml.kernel.org/r/20230802151406.3735276-30-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24x86: implement the new page table range APIMatthew Wilcox (Oracle)
Add PFN_PTE_SHIFT and a noop update_mmu_cache_range(). Link: https://lkml.kernel.org/r/20230802151406.3735276-29-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24um: implement the new page table range APIMatthew Wilcox (Oracle)
Add PFN_PTE_SHIFT and update_mmu_cache_range(). Link: https://lkml.kernel.org/r/20230802151406.3735276-28-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24sparc64: implement the new page table range APIMatthew Wilcox (Oracle)
Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and flush_icache_pages(). Convert the PG_dcache_dirty flag from being per-page to per-folio. Link: https://lkml.kernel.org/r/20230802151406.3735276-27-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24sparc32: implement the new page table range APIMatthew Wilcox (Oracle)
Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio() and flush_icache_pages(). Link: https://lkml.kernel.org/r/20230802151406.3735276-26-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24sh: implement the new page table range APIMatthew Wilcox (Oracle)
Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio() and flush_icache_pages(). Change the PG_dcache_clean flag from being per-page to per-folio. Flush the entire folio containing the pages in flush_icache_pages() for ease of implementation. Link: https://lkml.kernel.org/r/20230802151406.3735276-25-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24s390: implement the new page table range APIMatthew Wilcox (Oracle)
Add set_ptes() and update_mmu_cache_range(). Link: https://lkml.kernel.org/r/20230802151406.3735276-24-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24riscv: implement the new page table range APIMatthew Wilcox (Oracle)
Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio(). Change the PG_dcache_clean flag from being per-page to per-folio. Link: https://lkml.kernel.org/r/20230802151406.3735276-23-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24powerpc: implement the new page table range APIMatthew Wilcox (Oracle)
Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio(). Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to per-folio. [willy@infradead.org: re-export flush_dcache_icache_folio()] Link: https://lkml.kernel.org/r/ZMx1daYwvD9EM7Cv@casper.infradead.org Link: https://lkml.kernel.org/r/20230802151406.3735276-22-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24parisc: implement the new page table range APIMatthew Wilcox (Oracle)
Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and flush_icache_pages(). Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to per-folio. Link: https://lkml.kernel.org/r/20230802151406.3735276-21-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24openrisc: implement the new page table range APIMatthew Wilcox (Oracle)
Add PFN_PTE_SHIFT, update_mmu_cache_range() and flush_dcache_folio(). Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to per-folio. Link: https://lkml.kernel.org/r/20230802151406.3735276-20-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24nios2: implement the new page table range APIMatthew Wilcox (Oracle)
Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio(). Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to per-folio. Link: https://lkml.kernel.org/r/20230802151406.3735276-19-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24mips: implement the new page table range APIMatthew Wilcox (Oracle)
Rename _PFN_SHIFT to PFN_PTE_SHIFT. Convert a few places to call set_pte() instead of set_pte_at(). Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio(). Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to per-folio. Link: https://lkml.kernel.org/r/20230802151406.3735276-18-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24microblaze: implement the new page table range APIMatthew Wilcox (Oracle)
Rename PFN_SHIFT_OFFSET to PTE_PFN_SHIFT. Change the calling convention for set_pte() to be the same as other architectures. Add update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio(). [arnd@arndb.de: mark flush_dcache_folio() inline] Link: https://lkml.kernel.org/r/20230810141947.1236730-9-arnd@kernel.org Link: https://lkml.kernel.org/r/20230802151406.3735276-17-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24m68k: implement the new page table range APIMatthew Wilcox (Oracle)
Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio(). Link: https://lkml.kernel.org/r/20230802151406.3735276-16-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24loongarch: implement the new page table range APIMatthew Wilcox (Oracle)
Add update_mmu_cache_range() and change _PFN_SHIFT to PFN_PTE_SHIFT. It would probably be more efficient to implement __update_tlb() by flushing the entire folio instead of calling __update_tlb() N times, but I'll leave that for someone who understands the architecture better. Link: https://lkml.kernel.org/r/20230802151406.3735276-15-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24ia64: implement the new page table range APIMatthew Wilcox (Oracle)
Add PFN_PTE_SHIFT, update_mmu_cache_range() and flush_dcache_folio(). Change the PG_arch_1 (aka PG_dcache_clean) flag from being per-page to per-folio, which makes arch_dma_mark_clean() and mark_clean() a little more exciting. [willy@infradead.org: fix folio_size() handling] Link: https://lkml.kernel.org/r/ZNPlOCe8F+nrzPxr@casper.infradead.org Link: https://lkml.kernel.org/r/20230802151406.3735276-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>