git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-01-13	seqlock: add raw_seqcount_try_begin	Suren Baghdasaryan
	Add raw_seqcount_try_begin() to opens a read critical section of the given seqcount_t if the counter is even. This enables eliding the critical section entirely if the counter is odd, instead of doing the speculation knowing it will fail. Link: https://lkml.kernel.org/r/20241122174416.1367052-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	mm: add per-order mTHP swap-in fallback/fallback_charge counters	Wenchao Hao
	Currently, large folio swap-in is supported, but we lack a method to analyze their success ratio. Similar to anon_fault_fallback, we introduce per-order mTHP swpin_fallback and swpin_fallback_charge counters for calculating their success ratio. The new counters are located at: /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats/ swpin_fallback swpin_fallback_charge Link: https://lkml.kernel.org/r/20241202124730.2407037-1-haowenchao22@gmail.com Signed-off-by: Wenchao Hao <haowenchao22@gmail.com> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Lance Yang <ioworker0@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	x86: mm: free page table pages by RCU instead of semi RCU	Qi Zheng
	Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, the page table pages will be freed by semi RCU, that is: - batch table freeing: asynchronous free by RCU - single table freeing: IPI + synchronous free In this way, the page table can be lockless traversed by disabling IRQ in paths such as fast GUP. But this is not enough to free the empty PTE page table pages in paths other that munmap and exit_mmap path, because IPI cannot be synchronized with rcu_read_lock() in pte_offset_map{_lock}(). In preparation for supporting empty PTE page table pages reclaimation, let single table also be freed by RCU like batch table freeing. Then we can also use pte_offset_map() etc to prevent PTE page from being freed. Like pte_free_defer(), we can also safely use ptdesc->pt_rcu_head to free the page table pages: - The pt_rcu_head is unioned with pt_list and pmd_huge_pte. - For pt_list, it is used to manage the PGD page in x86. Fortunately tlb_remove_table() will not be used for free PGD pages, so it is safe to use pt_rcu_head. - For pmd_huge_pte, it is used for THPs, so it is safe. After applying this patch, if CONFIG_PT_RECLAIM is enabled, the function call of free_pte() is as follows: free_pte pte_free_tlb __pte_free_tlb ___pte_free_tlb paravirt_tlb_remove_table tlb_remove_table [!CONFIG_PARAVIRT, Xen PV, Hyper-V, KVM] [no-free-memory slowpath:] tlb_table_invalidate tlb_remove_table_one __tlb_remove_table_one [frees via RCU] [fastpath:] tlb_table_flush tlb_remove_table_free [frees via RCU] native_tlb_remove_table [CONFIG_PARAVIRT on native] tlb_remove_table [see above] Link: https://lkml.kernel.org/r/0287d442a973150b0e1019cc406e6322d148277a.1733305182.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)	Qi Zheng
	Now in order to pursue high performance, applications mostly use some high-performance user-mode memory allocators, such as jemalloc or tcmalloc. These memory allocators use madvise(MADV_DONTNEED or MADV_FREE) to release physical memory, but neither MADV_DONTNEED nor MADV_FREE will release page table memory, which may cause huge page table memory usage. The following are a memory usage snapshot of one process which actually happened on our server: VIRT: 55t RES: 590g VmPTE: 110g In this case, most of the page table entries are empty. For such a PTE page where all entries are empty, we can actually free it back to the system for others to use. As a first step, this commit aims to synchronously free the empty PTE pages in madvise(MADV_DONTNEED) case. We will detect and free empty PTE pages in zap_pte_range(), and will add zap_details.reclaim_pt to exclude cases other than madvise(MADV_DONTNEED). Once an empty PTE is detected, we first try to hold the pmd lock within the pte lock. If successful, we clear the pmd entry directly (fast path). Otherwise, we wait until the pte lock is released, then re-hold the pmd and pte locks and loop PTRS_PER_PTE times to check pte_none() to re-detect whether the PTE page is empty and free it (slow path). For other cases such as madvise(MADV_FREE), consider scanning and freeing empty PTE pages asynchronously in the future. The following code snippet can show the effect of optimization: mmap 50G while (1) { for (; i < 1024 * 25; i++) { touch 2M memory madvise MADV_DONTNEED 2M } } As we can see, the memory usage of VmPTE is reduced: before after VIRT 50.0 GB 50.0 GB RES 3.1 MB 3.1 MB VmPTE 102640 KB 240 KB [zhengqi.arch@bytedance.com: fix uninitialized symbol 'ptl'] Link: https://lkml.kernel.org/r/20241206112348.51570-1-zhengqi.arch@bytedance.com Link: https://lore.kernel.org/linux-mm/224e6a4e-43b5-4080-bdd8-b0a6fb2f0853@stanley.mountain/ Link: https://lkml.kernel.org/r/92aba2b319a734913f18ba41e7d86a265f0b84e2.1733305182.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	mm: zap_install_uffd_wp_if_needed: return whether uffd-wp pte has been ↵	Qi Zheng
	re-installed In some cases, we'll replace the none pte with an uffd-wp swap special pte marker when necessary. Let's expose this information to the caller through the return value, so that subsequent commits can use this information to detect whether the PTE page is empty. Link: https://lkml.kernel.org/r/9d4516554724eda87d6576468042a1741c475413.1733305182.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	mm/page_isolation: don't pass gfp flags to start_isolate_page_range()	David Hildenbrand
	The parameter is unused, so let's stop passing it. Link: https://lkml.kernel.org/r/20241203094732.200195-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	mm: abstract get_arg_page() stack expansion and mmap read lock	Lorenzo Stoakes
	Right now fs/exec.c invokes expand_downwards(), an otherwise internal implementation detail of the VMA logic in order to ensure that an arg page can be obtained by get_user_pages_remote(). In order to be able to move the stack expansion logic into mm/vma.c to make it available to userland testing we need to find an alternative approach here. We do so by providing the mmap_read_lock_maybe_expand() function which also helpfully documents what get_arg_page() is doing here and adds an additional check against VM_GROWSDOWN to make explicit that the stack expansion logic is only invoked when the VMA is indeed a downward-growing stack. This allows expand_downwards() to become a static function. Importantly, the VMA referenced by mmap_read_maybe_expand() must NOT be currently user-visible in any way, that is place within an rmap or VMA tree. It must be a newly allocated VMA. This is the case when exec invokes this function. Link: https://lkml.kernel.org/r/5295d1c70c58e6aa63d14be68d4e1de9fa1c8e6d.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	lazy tlb: fix hotplug exit race with MMU_LAZY_TLB_SHOOTDOWN	Nicholas Piggin
	CPU unplug first calls __cpu_disable(), and that's where powerpc calls cleanup_cpu_mmu_context(), which clears this CPU from mm_cpumask() of all mms in the system. However this CPU may still be using a lazy tlb mm, and its mm_cpumask bit will be cleared from it. The CPU does not switch away from the lazy tlb mm until arch_cpu_idle_dead() calls idle_task_exit(). If that user mm exits in this window, it will not be subject to the lazy tlb mm shootdown and may be freed while in use as a lazy mm by the CPU that is being unplugged. cleanup_cpu_mmu_context() could be moved later, but it looks better to move the lazy tlb mm switching earlier. The problem with doing the lazy mm switching in idle_task_exit() is explained in commit bf2c59fce4074 ("sched/core: Fix illegal RCU from offline CPUs"), which added a wart to switch away from the mm but leave it set in active_mm to be cleaned up later. So instead, switch away from the lazy tlb mm at sched_cpu_wait_empty(), which is the last hotplug state before teardown (CPUHP_AP_SCHED_WAIT_EMPTY). This CPU will never switch to a user thread from this point, so it has no chance to pick up a new lazy tlb mm. This removes the lazy tlb mm handling wart in CPU unplug. With this, idle_task_exit() is not needed anymore and can be cleaned up. This leaves the prototype alone, to be cleaned after this change. herton: took the suggestions from https://lore.kernel.org/all/87jzvyprsw.ffs@tglx/ and made adjustments on the initial patch proposed by Nicholas. Link: https://lkml.kernel.org/r/20230524060455.147699-1-npiggin@gmail.com Link: https://lore.kernel.org/all/20230525205253.E2FAEC433EF@smtp.kernel.org/ Link: https://lkml.kernel.org/r/20241104142318.3295663-1-herton@redhat.com Fixes: 2655421ae69f ("lazy tlb: shoot lazies, non-refcounting lazy tlb mm reference handling scheme") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	filemap: remove unused folio_add_wait_queue	Dr. David Alan Gilbert
	folio_add_wait_queue() has been unused since 2021's commit 850cba069c26 ("cachefiles: Delete the cachefiles driver pending rewrite") Remove it. Link: https://lkml.kernel.org/r/20241116151446.95555-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	list_lru: expand list_lru_add() docs with info about sublists	Alice Ryhl
	The documentation for list_lru_add() and list_lru_del() has not been updated since lru lists were originally introduced by commit a38e40824844 ("list: add a new LRU list type"). Back then, list_lru stored all of the items in a single list, but the implementation has since been expanded to use many sublists internally. Thus, update the docs to mention that the requirements about not using the item with several lists at the same time also applies not using different sublists. Also mention that list_lru items are reparented when the memcg is deleted as discussed on the LKML [1]. Also fix incorrect use of 'Return value:' which should be 'Return:'. Link: https://lore.kernel.org/all/Z0eXrllVhRI9Ag5b@dread.disaster.area/ [1] Link: https://lkml.kernel.org/r/20241129-list_lru_memcg_docs-v2-1-e285ff1c481b@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Acked-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	mm: factor out the order calculation into a new helper	Baolin Wang
	Patch series "Support large folios for tmpfs", v3. Traditionally, tmpfs only supported PMD-sized large folios. However nowadays with other file systems supporting any sized large folios, and extending anonymous to support mTHP, we should not restrict tmpfs to allocating only PMD-sized large folios, making it more special. Instead, we should allow tmpfs can allocate any sized large folios. Considering that tmpfs already has the 'huge=' option to control the PMD-sized large folios allocation, we can extend the 'huge=' option to allow any sized large folios. The semantics of the 'huge=' mount option are: huge=never: no any sized large folios huge=always: any sized large folios huge=within_size: like 'always' but respect the i_size huge=advise: like 'always' if requested with madvise() Note: for tmpfs mmap() faults, due to the lack of a write size hint, still allocate the PMD-sized large folios if huge=always/within_size/advise is set. Moreover, the 'deny' and 'force' testing options controlled by '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the same semantics. The 'deny' can disable any sized large folios for tmpfs, while the 'force' can enable PMD sized large folios for tmpfs. This patch (of 6): Factor out the order calculation into a new helper, which can be reused by shmem in the following patch. Link: https://lkml.kernel.org/r/cover.1732779148.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/5505f9ea50942820c1924d1803bfdd3a524e54f6.1732779148.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	kasan: make kasan_record_aux_stack_noalloc() the default behaviour	Peter Zijlstra
	kasan_record_aux_stack_noalloc() was introduced to record a stack trace without allocating memory in the process. It has been added to callers which were invoked while a raw_spinlock_t was held. More and more callers were identified and changed over time. Is it a good thing to have this while functions try their best to do a locklessly setup? The only downside of having kasan_record_aux_stack() not allocate any memory is that we end up without a stacktrace if stackdepot runs out of memory and at the same stacktrace was not recorded before To quote Marco Elver from https://lore.kernel.org/all/CANpmjNPmQYJ7pv1N3cuU8cP18u7PP_uoZD8YxwZd4jtbof9nVQ@mail.gmail.com/ \| I'd be in favor, it simplifies things. And stack depot should be \| able to replenish its pool sufficiently in the "non-aux" cases \| i.e. regular allocations. Worst case we fail to record some \| aux stacks, but I think that's only really bad if there's a bug \| around one of these allocations. In general the probabilities \| of this being a regression are extremely small [...] Make the kasan_record_aux_stack_noalloc() behaviour default as kasan_record_aux_stack(). [bigeasy@linutronix.de: dressed the diff as patch] Link: https://lkml.kernel.org/r/20241122155451.Mb2pmeyJ@linutronix.de Fixes: 7cb3007ce2da ("kasan: generic: introduce kasan_record_aux_stack_noalloc()") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: syzbot+39f85d612b7c20d8db48@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67275485.050a0220.3c8d68.0a37.GAE@google.com Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Ben Segall <bsegall@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: <kasan-dev@googlegroups.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: syzkaller-bugs@googlegroups.com Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	mm: pgtable: make ptep_clear() non-atomic	Qi Zheng
	In the generic ptep_get_and_clear() implementation, it is just a simple combination of ptep_get() and pte_clear(). But for some architectures (such as x86 and arm64, etc), the hardware will modify the A/D bits of the page table entry, so the ptep_get_and_clear() needs to be overwritten and implemented as an atomic operation to avoid contention, which has a performance cost. The commit d283d422c6c4 ("x86: mm: add x86_64 support for page table check") adds the ptep_clear() on the x86, and makes it call ptep_get_and_clear() when CONFIG_PAGE_TABLE_CHECK is enabled. The page table check feature does not actually care about the A/D bits, so only ptep_get() + pte_clear() should be called. But considering that the page table check is a debug option, this should not have much of an impact. But then the commit de8c8e52836d ("mm: page_table_check: add hooks to public helpers") changed ptep_clear() to unconditionally call ptep_get_and_clear(), so that the CONFIG_PAGE_TABLE_CHECK check can be put into the page table check stubs (in include/linux/page_table_check.h). This also cause performance loss to the kernel without CONFIG_PAGE_TABLE_CHECK enabled, which doesn't make sense. Currently ptep_clear() is only used in debug code and in khugepaged collapse paths, which are fairly expensive. So the cost of an extra atomic RMW operation does not matter. But this may be used for other paths in the future. After all, for the present pte entry, we need to call ptep_clear() instead of pte_clear() to ensure that PAGE_TABLE_CHECK works properly. So to be more precise, just calling ptep_get() and pte_clear() in the ptep_clear(). Link: https://lkml.kernel.org/r/20241122073652.54030-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	mm: mmap_lock: optimize mmap_lock tracepoints	Shakeel Butt
	We are starting to deploy mmap_lock tracepoint monitoring across our fleet and the early results showed that these tracepoints are consuming significant amount of CPUs in kernfs_path_from_node when enabled. It seems like the kernel is trying to resolve the cgroup path in the fast path of the locking code path when the tracepoints are enabled. In addition for some application their metrics are regressing when monitoring is enabled. The cgroup path resolution can be slow and should not be done in the fast path. Most userspace tools, like bpftrace, provides functionality to get the cgroup path from cgroup id, so let's just trace the cgroup id and the users can use better tools to get the path in the slow path. Link: https://lkml.kernel.org/r/20241125171617.113892-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	mm: make alloc_pages_mpol() static	Matthew Wilcox (Oracle)
	All callers outside mempolicy.c now use folio_alloc_mpol() thanks to Kefeng's cleanups, so we can remove this as a visible symbol. And also remove the alloc_hooks for alloc_pages_mpol(), since all users in mempolicy.c are using the nonprof version. Link: https://lkml.kernel.org/r/20241125210149.2976098-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	mm: migrate: remove unused argument vma from migrate_misplaced_folio()	Donet Tom
	Commit ee86814b0562 ("mm/migrate: move NUMA hinting fault folio isolation + checks under PTL") removed the code that had used the vma argument in migrate_misplaced_folio. Since the vma argument was no longer used in migrate_misplaced_folio, this patch removes it. Link: https://lkml.kernel.org/r/20241126155655.466186-1-donettom@linux.ibm.com Signed-off-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-14	crypto: hisilicon/qm - support new function communication	Yang Shen
	On the HiSilicon accelerators drivers, the PF/VFs driver can send messages to the VFs/PF by writing hardware registers, and the VFs/PF driver receives messages from the PF/VFs by reading hardware registers. To support this feature, a new version id is added, different communication mechanism are used based on different version id. Signed-off-by: Yang Shen <shenyang39@huawei.com> Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-13	net/mlx5: fs, add HWS packet reformat API function	Moshe Shemesh
	Add packet reformat alloc and dealloc API functions to provide packet reformat actions for steering rules. Add HWS action pools for each of the following packet reformat types: - decapl3: decapsulate l3 tunnel to l2 - encapl2: encapsulate l2 to tunnel l2 - encapl3: encapsulate l2 to tunnel l3 - insert_hdr: insert header In addition cache remove header action for remove vlan header as this is currently the only use case of remove header action in the driver. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250109160546.1733647-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13	net: remove init_dummy_netdev()	Jakub Kicinski
	init_dummy_netdev() can initialize statically declared or embedded net_devices. Such netdevs did not come from alloc_netdev_mqs(). After recent work by Breno, there are the only two cases where we have do that. Switch those cases to alloc_netdev_mqs() and delete init_dummy_netdev(). Dealing with static netdevs is not worth the maintenance burden. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250113003456.3904110-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13	of: Warn when of_property_read_bool() is used on non-boolean properties	Rob Herring (Arm)
	The use of of_property_read_bool() for non-boolean properties is deprecated. The primary use of it was to test property presence, but that has been replaced in favor of of_property_present(). With those uses now fixed, add a warning to discourage new ones. Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20250109-dt-type-warnings-v1-2-0150e32e716c@kernel.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-01-13	device property: Split property reading bool and presence test ops	Rob Herring (Arm)
	The fwnode/device property API currently implement (fwnode\|device)_property_read_bool() with (fwnode\|device)_property_present(). That does not allow having different behavior depending on the backend. Specifically, the usage of (fwnode\|device)_property_read_bool() on non-boolean properties is deprecated on DT. In order to add a warning on this deprecated use, these 2 APIs need separate ops for the backend. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250109-dt-type-warnings-v1-1-0150e32e716c@kernel.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-01-13	SUNRPC: only put task on cl_tasks list after the RPC call slot is reserved.	Dai Ngo
	Under heavy write load, we've seen the cl_tasks list grows to millions of entries. Even though the list is extremely long, the system still runs fine until the user wants to get the information of all active RPC tasks by doing: When this happens, tasks_start acquires the cl_lock to walk the cl_tasks list, returning one entry at a time to the caller. The cl_lock is held until all tasks on this list have been processed. While the cl_lock is held, completed RPC tasks have to spin wait in rpc_task_release_client for the cl_lock. If there are millions of entries in the cl_tasks list it will take a long time before tasks_stop is called and the cl_lock is released. The spin wait tasks can use up all the available CPUs in the system, preventing other jobs to run, this causes the system to temporarily lock up. This patch fixes this problem by delaying inserting the RPC task on the cl_tasks list until the RPC call slot is reserved. This limits the length of the cl_tasks to the number of call slots available in the system. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-13	Merge tag 'mm-hotfixes-stable-2025-01-13-00-03' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "18 hotfixes. 11 are cc:stable. 13 are MM and 5 are non-MM. All patches are singletons - please see the relevant changelogs for details" * tag 'mm-hotfixes-stable-2025-01-13-00-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: fs/proc: fix softlockup in __read_vmcore (part 2) mm: fix assertion in folio_end_read() mm: vmscan : pgdemote vmstat is not getting updated when MGLRU is enabled. vmstat: disable vmstat_work on vmstat_cpu_down_prep() zram: fix potential UAF of zram table selftests/mm: set allocated memory to non-zero content in cow test mm: clear uffd-wp PTE/PMD state on mremap() module: fix writing of livepatch relocations in ROX text mm: zswap: properly synchronize freeing resources during CPU hotunplug Revert "mm: zswap: fix race between [de]compression and CPU hotunplug" hugetlb: fix NULL pointer dereference in trace_hugetlbfs_alloc_inode mm: fix div by zero in bdi_ratio_from_pages x86/execmem: fix ROX cache usage in Xen PV guests filemap: avoid truncating 64-bit offset to 32 bits tools: fix atomic_set() definition to set the value correctly mm/mempolicy: count MPOL_WEIGHTED_INTERLEAVE to "interleave_hit" scripts/decode_stacktrace.sh: fix decoding of lines with an additional info mm/kmemleak: fix percpu memory leak detection failure
2025-01-13	wifi: mac80211: Fix common size calculation for ML element	Ilan Peer
	When the ML type is EPCS the control bitmap is reserved, the length is always 7 and is captured by the 1st octet after the control. Fixes: 0f48b8b88aa9 ("wifi: ieee80211: add definitions for multi-link element") Signed-off-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250102161730.5790376754a7.I381208cbb72b1be2a88239509294099e9337e254@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13	wifi: mac80211: Support dynamic link addition and removal	Ilan Peer
	Add support for adding and removing station links: - Adding links is done asynchronously, i.e., first an ML reconfiguration action frame is sent to the AP requesting to add links, and only when the AP replies, links which were added successfully by the AP are added locally. - Removing links is done synchronously, i.e., the links are removed before sending the ML reconfiguration action frame to the AP (to avoid using this links after the AP MLD removed them but before the station got the ML reconfiguration response). In case the AP replies with a status indicating that a link removal was not successful, disconnect (as this should not happen an is an indication that something might be wrong on the AP MLD). Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250102161730.ec0492a8dd21.I2869686642bbc0f86c40f284ebf7e6f644b551ab@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13	wifi: ieee80211: Add some missing MLO related definitions	Ilan Peer
	As a preparation to support ML reconfiguration request and response, add additional ML reconfiguration definitions required to support the flow. See Section 9.4.2.321.4 in Draft P802.11be_D6.0. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250102161730.4970ca10ebfd.Ibe7f6108cd0e04b8c739a8e35a4f485f664a17e6@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13	wifi: mac80211: skip all known membership selectors	Benjamin Berg
	The GLK and EPD Selectors are also not rates, so add a new macro for the minimum value of a selector and test against that instead of the entire list. Also fix the typo in the EPD selector define. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250101070249.2c19a2dc53db.If187b7d93d8b43a6c70e422c837b7636538fb358@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13	wifi: cfg80211: check extended MLD capa/ops in assoc	Johannes Berg
	Check that additionally extended MLD capa/ops for the MLD is consistent, i.e. the same value is reported by all affiliated APs/links. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250101070249.e29f42c7ae21.Ib2cdce608321ad154e4b13103cc315c3e3cb6b2b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13	Merge tag 'nvme-6.14-2025-01-12' of git://git.infradead.org/nvme into ↵	Jens Axboe
	for-6.14/block Pull NVMe updates from Keith: "nvme updates for Linux 6.14 - Target support for PCI-Endpoint transport (Damien) - TCP IO queue spreading fixes (Sagi, Chaitanya) - Target handling for "limited retry" flags (Guixen) - Poll type fix (Yongsoo) - Xarray storage error handling (Keisuke) - Host memory buffer free size fix on error (Francis)" * tag 'nvme-6.14-2025-01-12' of git://git.infradead.org/nvme: (25 commits) nvme-pci: use correct size to free the hmb buffer nvme: Add error path for xa_store in nvme_init_effects nvme-pci: fix comment typo Documentation: Document the NVMe PCI endpoint target driver nvmet: New NVMe PCI endpoint function target driver nvmet: Implement arbitration feature support nvmet: Implement interrupt config feature support nvmet: Implement interrupt coalescing feature support nvmet: Implement host identifier set feature support nvmet: Introduce get/set_feature controller operations nvmet: Do not require SGL for PCI target controller commands nvmet: Add support for I/O queue management admin commands nvmet: Introduce nvmet_sq_create() and nvmet_cq_create() nvmet: Introduce nvmet_req_transfer_len() nvmet: Improve nvmet_alloc_ctrl() interface and implementation nvme: Add PCI transport type nvmet: Add drvdata field to struct nvmet_ctrl nvmet: Introduce nvmet_get_cmd_effects_admin() nvmet: Export nvmet_update_cc() and nvmet_cc_xxx() helpers nvmet: Add vendor_id and subsys_vendor_id subsystem attributes ...
2025-01-13	rbtree: add rb_find_add_cached() to rbtree.h	Roger L. Beckermeyer III
	Adds rb_find_add_cached() as a helper function for use with red-black trees. Used in btrfs to reduce boilerplate code. And since it's a new helper, the cmp() function will require both parameter to be const rb_node pointers. Suggested-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Roger L. Beckermeyer III <beckerlee3@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13	Merge 6.13-rc7 into tty-next	Greg Kroah-Hartman
	We need the serial driver fixes in here to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-13	Merge 6.13-rc7 into driver-core-next	Greg Kroah-Hartman
	We need the debugfs / driver-core fixes in here as well for testing and to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-13	Merge 6.13-rc4 into char-misc-next	Greg Kroah-Hartman
	We need the IIO fixes in here as well, and it resolves a merge conflict in: drivers/iio/adc/ti-ads1119.c Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-13	usb: phy: Remove API devm_usb_put_phy()	Zijun Hu
	Static devm_usb_phy_match() is only called by API devm_usb_put_phy(), and the API has no caller now. Remove the API and the static function. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20250112-remove_api-v1-1-49cc8f792ac9@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-13	Merge 6.13-rc7 into usb-next	Greg Kroah-Hartman
	We need the USB fixes in here as well for testing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-12	delayacct: add delay min to record delay peak	Wang Yaxin
	Delay accounting can now calculate the average delay of processes, detect the overall system load, and also record the 'delay max' to identify potential abnormal delays. However, 'delay min' can help us identify another useful delay peak. By comparing the difference between 'delay max' and 'delay min', we can understand the optimization space for latency, providing a reference for the optimization of latency performance. Use case ========= bash-4.4# ./getdelays -d -t 242 print delayacct stats ON TGID 242 CPU count real total virtual total delay total delay average delay max delay min 39 156000000 156576579 2111069 0.054ms 0.212296ms 0.031307ms IO count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms SWAP count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms RECLAIM count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms THRASHING count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms COMPACT count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms WPCOPY count delay total delay average delay max delay min 156 11215873 0.072ms 0.207403ms 0.033913ms IRQ count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms Link: https://lkml.kernel.org/r/20241220173105906EOdsPhzjMLYNJJBqgz1ga@zte.com.cn Co-developed-by: Wang Yong <wang.yong12@zte.com.cn> Signed-off-by: Wang Yong <wang.yong12@zte.com.cn> Co-developed-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Co-developed-by: Kun Jiang <jiang.kun2@zte.com.cn> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Peilin He <he.peilin@zte.com.cn> Cc: tuqiang <tu.qiang35@zte.com.cn> Cc: ye xingchen <ye.xingchen@zte.com.cn> Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12	kasan: fix typo in kasan_poison_new_object documentation	Dominik Karol Piątkowski
	Fix presumed copy-paste typo of kasan_poison_new_object documentation referring to kasan_unpoison_new_object. No functional changes. Link: https://lkml.kernel.org/r/20241220181205.9663-1-dominik.karol.piatkowski@protonmail.com Fixes: 1ce9a0523938 ("kasan: rename and document kasan_(un)poison_object_data") ta") Signed-off-by: Dominik Karol Piątkowski <dominik.karol.piatkowski@protonmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12	include: update references to include/asm-<arch>	Geert Uytterhoeven
	"include/asm-<arch>" was replaced by "arch/<arch>/include/asm" a long time ago. Link: https://lkml.kernel.org/r/541258219b0441fa1da890e2f8458a7ac18c2ef9.1733404444.git.geert+renesas@glider.be Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Andy Whitcroft <apw@canonical.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12	delayacct: add delay max to record delay peak	Wang Yaxin
	Introduce the use cases of delay max, which can help quickly detect potential abnormal delays in the system and record the types and specific details of delay spikes. Problem ======== Delay accounting can track the average delay of processes to show system workload. However, when a process experiences a significant delay, maybe a delay spike, which adversely affects performance, getdelays can only display the average system delay over a period of time. Yet, average delay is unhelpful for diagnosing delay peak. It is not even possible to determine which type of delay has spiked, as this information might be masked by the average delay. Solution ========= the 'delay max' can display delay peak since the system's startup, which can record potential abnormal delays over time, including the type of delay and the maximum delay. This is helpful for quickly identifying crash caused by delay. Use case ========= bash# ./getdelays -d -p 244 print delayacct stats ON PID 244 CPU count real total virtual total delay total delay average delay max 68 192000000 213676651 705643 0.010ms 0.306381ms IO count delay total delay average delay max 0 0 0.000ms 0.000000ms SWAP count delay total delay average delay max 0 0 0.000ms 0.000000ms RECLAIM count delay total delay average delay max 0 0 0.000ms 0.000000ms THRASHING count delay total delay average delay max 0 0 0.000ms 0.000000ms COMPACT count delay total delay average delay max 0 0 0.000ms 0.000000ms WPCOPY count delay total delay average delay max 235 15648284 0.067ms 0.263842ms IRQ count delay total delay average delay max 0 0 0.000ms 0.000000ms [wang.yaxin@zte.com.cn: update docs and fix some spelling errors] Link: https://lkml.kernel.org/r/20241213192700771XKZ8H30OtHSeziGqRVMs0@zte.com.cn Link: https://lkml.kernel.org/r/20241203164848805CS62CQPQWG9GLdQj2_BxS@zte.com.cn Co-developed-by: Wang Yong <wang.yong12@zte.com.cn> Signed-off-by: Wang Yong <wang.yong12@zte.com.cn> Co-developed-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Peilin He <he.peilin@zte.com.cn> Cc: tuqiang <tu.qiang35@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: ye xingchen <ye.xingchen@zte.com.cn> Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12	lib min_heap: add brief introduction to Min Heap API	Kuan-Wei Chiu
	A short description of the Min Heap API is added to the min_heap.h, explaining its purpose for managing min-heaps and emphasizing the use of macro wrappers instead of direct function calls. For more details, users are directed to the documentation at Documentation/core-api/min_heap.rst. Link: https://lkml.kernel.org/r/20241129181222.646855-4-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12	lib min_heap: improve type safety in min_heap macros by using container_of	Kuan-Wei Chiu
	Patch series "lib min_heap: Improve min_heap safety, testing, and documentation". Improve the min heap implementation by enhancing type safety with container_of, reducing the attack vector by replacing test function calls with inline variants, and adding a brief API introduction in min_heap.h. It also includes author information in Documentation/core-api/min_heap.rst. This patch (of 4): The current implementation of min_heap macros uses explicit casting to min_heap_char , which prevents the compiler from detecting incorrect pointer types. This can lead to errors if non-min_heap pointers are passed inadvertently. To enhance safety, replace all explicit casts to min_heap_char with the use of container_of(&(_heap)->nr, min_heap_char, nr). This approach ensures that the _heap parameter is indeed a min_heap_char-compatible structure, allowing the compiler to catch improper usages. Link: https://lkml.kernel.org/r/20241129181222.646855-1-visitorckw@gmail.com Link: https://lore.kernel.org/lkml/CAMuHMdVO5DPuD9HYWBFqKDHphx7+0BEhreUxtVC40A=8p6VAhQ@mail.gmail.com Link: https://lkml.kernel.org/r/20241129181222.646855-2-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12	mm: clear uffd-wp PTE/PMD state on mremap()	Ryan Roberts
	When mremap()ing a memory region previously registered with userfaultfd as write-protected but without UFFD_FEATURE_EVENT_REMAP, an inconsistency in flag clearing leads to a mismatch between the vma flags (which have uffd-wp cleared) and the pte/pmd flags (which do not have uffd-wp cleared). This mismatch causes a subsequent mprotect(PROT_WRITE) to trigger a warning in page_table_check_pte_flags() due to setting the pte to writable while uffd-wp is still set. Fix this by always explicitly clearing the uffd-wp pte/pmd flags on any such mremap() so that the values are consistent with the existing clearing of VM_UFFD_WP. Be careful to clear the logical flag regardless of its physical form; a PTE bit, a swap PTE bit, or a PTE marker. Cover PTE, huge PMD and hugetlb paths. Link: https://lkml.kernel.org/r/20250107144755.1871363-2-ryan.roberts@arm.com Co-developed-by: Mikołaj Lenczewski <miko.lenczewski@arm.com> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Closes: https://lore.kernel.org/linux-mm/810b44a8-d2ae-4107-b665-5a42eae2d948@arm.com/ Fixes: 63b2d4174c4a ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl") Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12	module: fix writing of livepatch relocations in ROX text	Petr Pavlu
	A livepatch module can contain a special relocation section .klp.rela.<objname>.<secname> to apply its relocations at the appropriate time and to additionally access local and unexported symbols. When <objname> points to another module, such relocations are processed separately from the regular module relocation process. For instance, only when the target <objname> actually becomes loaded. With CONFIG_STRICT_MODULE_RWX, when the livepatch core decides to apply these relocations, their processing results in the following bug: [ 25.827238] BUG: unable to handle page fault for address: 00000000000012ba [ 25.827819] #PF: supervisor read access in kernel mode [ 25.828153] #PF: error_code(0x0000) - not-present page [ 25.828588] PGD 0 P4D 0 [ 25.829063] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI [ 25.829742] CPU: 2 UID: 0 PID: 452 Comm: insmod Tainted: G O K 6.13.0-rc4-00078-g059dd502b263 #7820 [ 25.830417] Tainted: [O]=OOT_MODULE, [K]=LIVEPATCH [ 25.830768] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 [ 25.831651] RIP: 0010:memcmp+0x24/0x60 [ 25.832190] Code: [...] [ 25.833378] RSP: 0018:ffffa40b403a3ae8 EFLAGS: 00000246 [ 25.833637] RAX: 0000000000000000 RBX: ffff93bc81d8e700 RCX: ffffffffc0202000 [ 25.834072] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 00000000000012ba [ 25.834548] RBP: ffffa40b403a3b68 R08: ffffa40b403a3b30 R09: 0000004a00000002 [ 25.835088] R10: ffffffffffffd222 R11: f000000000000000 R12: 0000000000000000 [ 25.835666] R13: ffffffffc02032ba R14: ffffffffc007d1e0 R15: 0000000000000004 [ 25.836139] FS: 00007fecef8c3080(0000) GS:ffff93bc8f900000(0000) knlGS:0000000000000000 [ 25.836519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 25.836977] CR2: 00000000000012ba CR3: 0000000002f24000 CR4: 00000000000006f0 [ 25.837442] Call Trace: [ 25.838297] <TASK> [ 25.841083] __write_relocate_add.constprop.0+0xc7/0x2b0 [ 25.841701] apply_relocate_add+0x75/0xa0 [ 25.841973] klp_write_section_relocs+0x10e/0x140 [ 25.842304] klp_write_object_relocs+0x70/0xa0 [ 25.842682] klp_init_object_loaded+0x21/0xf0 [ 25.842972] klp_enable_patch+0x43d/0x900 [ 25.843572] do_one_initcall+0x4c/0x220 [ 25.844186] do_init_module+0x6a/0x260 [ 25.844423] init_module_from_file+0x9c/0xe0 [ 25.844702] idempotent_init_module+0x172/0x270 [ 25.845008] __x64_sys_finit_module+0x69/0xc0 [ 25.845253] do_syscall_64+0x9e/0x1a0 [ 25.845498] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 25.846056] RIP: 0033:0x7fecef9eb25d [ 25.846444] Code: [...] [ 25.847563] RSP: 002b:00007ffd0c5d6de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 25.848082] RAX: ffffffffffffffda RBX: 000055b03f05e470 RCX: 00007fecef9eb25d [ 25.848456] RDX: 0000000000000000 RSI: 000055b001e74e52 RDI: 0000000000000003 [ 25.848969] RBP: 00007ffd0c5d6ea0 R08: 0000000000000040 R09: 0000000000004100 [ 25.849411] R10: 00007fecefac7b20 R11: 0000000000000246 R12: 000055b001e74e52 [ 25.849905] R13: 0000000000000000 R14: 000055b03f05e440 R15: 0000000000000000 [ 25.850336] </TASK> [ 25.850553] Modules linked in: deku(OK+) uinput [ 25.851408] CR2: 00000000000012ba [ 25.852085] ---[ end trace 0000000000000000 ]--- The problem is that the .klp.rela.<objname>.<secname> relocations are processed after the module was already formed and mod->rw_copy was reset. However, the code in __write_relocate_add() calls module_writable_address() which translates the target address 'loc' still to 'loc + (mem->rw_copy - mem->base)', with mem->rw_copy now being 0. Fix the problem by returning directly 'loc' in module_writable_address() when the module is already formed. Function __write_relocate_add() knows to use text_poke() in such a case. Link: https://lkml.kernel.org/r/20250107153507.14733-1-petr.pavlu@suse.com Fixes: 0c133b1e78cd ("module: prepare to handle ROX allocations for text") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reported-by: Marek Maslanka <mmaslanka@google.com> Closes: https://lore.kernel.org/linux-modules/CAGcaFA2hdThQV6mjD_1_U+GNHThv84+MQvMWLgEuX+LVbAyDxg@mail.gmail.com/ Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13	Merge tag 'drm-msm-next-2025-01-07' of gitlab.freedesktop.org:drm/msm into ↵	Dave Airlie
	drm-next Updates for v6.14 MDSS: - properly described UBWC registers - added SM6150 (aka QCS615) support MDP4: - several small fixes DPU: - added SM6150 (aka QCS615) support - enabled wide planes if virtual planes are enabled (by using two SSPPs for a single plane) - fixed modes filtering for platforms w/o 3DMux - fixed DSPP DSPP_2 / _3 links on several platforms - corrected DSPP definitions on SDM670 - added CWB hardware blocks support - added VBIF to DPU snapshots - dropped struct dpu_rm_requirements DP: - reworked DP audio support DSI: - added SM6150 (aka QCS615) support GPU: - Print GMU core fw version - GMU bandwidth voting for a740 and a750 - Expose uche trap base via uapi - UAPI error reporting Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGsutUu4ff6OpXNXxqf1xaV0rV6oV23VXNRiF0_OEfe72Q@mail.gmail.com
2025-01-12	bitops: add generic parity calculation for u8	Wolfram Sang
	There are multiple open coded implementations for getting the parity of a byte in the kernel, even using different approaches. Take the pretty efficient version from SPD5118 driver and make it generally available by putting it into the bitops header. As long as there is just one parity calculation helper, the creation of a distinct 'parity.h' header was discarded. Also, the usage of hweight8() for architectures having a popcnt instruction is postponed until a use case within hot paths is desired. The motivation for this patch is the frequent use of odd parity in the I3C specification and to simplify drivers there. Changes compared to the original SPD5118 version are the addition of kernel documentation, switching the return type from bool to int, and renaming the argument of the function. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Tested-by: Kuan-Wei Chiu <visitorckw@gmail.com> Link: https://lore.kernel.org/r/20250107090204.6593-2-wsa+renesas@sang-engineering.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-01-12	i3c: fix kdoc parameter description for module_i3c_i2c_driver()	Wolfram Sang
	A typo mentioned I3C when it should have been I2C. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20241219220338.10315-1-wsa+renesas@sang-engineering.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-01-12	Merge tag 'iio-for-6.14a' of ↵	Greg Kroah-Hartman
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next Pull IIO updaate from Jonathan: IIO: 1st set of new device support, features and cleanup for 6.14 Fairly quiet cycle. Usual mix of new drivers, device support in existing drivers, features and more general rework and cleanup. There are a few late breaking or long standing but complex fixes in here as well. There is one expected merge conflict due to an upstream fix touching neighboring code in ti-ads1119. The trivial resolution is the right one with the result ending up as: struct { s16 sample; aligned_s64 timestamp; } scan; New device support ================== adi,ad4000 - Add support for many Pulsar ADC devices: AD7685, AD7686, AD7687, AD7688, AD7690, AD7691, AD7693, AD7942, AD7946, AD7980, AD7982, AD7983, AD7984, AD7988-1 and AD7988-5 ADCs. Generally similar to the AD4000 series but with lower sampling rates and no configuration registers. Includes addition of timestamp channels. adi,adis16480 - Add support for ADIS16486, ADIS16487 and ADIS16489 IMUs. Required a few tweaks to existing driver and addition of tables. kionix,kx022a - Add support for KX134ACR-LBZ accelerometer that is similar to the KX132ACR-LBZ but with a wider (+-64G) sensor range. - Add support for KX134-1211 accelerometer that is similar to the KX132-1211 but with a wider (+-16G) sensor range. nxp,fxls8962af - Add support for fxls8974cf and fxls8967af accelerometers, Both are compatible with fxls8962af but with different device IDs which are used in presence checks. renesas,rzg2l - Add support for Renesas RZ/GS3 SoC ADCs (various driver refactors precede this to allow for chip differences). rohm,bd79704 - New driver for this 6 channel DAC. st,mpu6050 - Support he IAM20380 which is effectively a cut down IAM20608 IMU with only a gyroscope (no accelerometer). st,stm-timmer-trigger - Add support for ADC trigger use case for the STM32MP25 SOC. Do not support the counter functionality in this driver as that is handled by the counter subsystem. ti,opt4060 - New driver for this RGBW color sensor. Driver drop =========== rohm,bu20008 - Drop as decision was made to not mass produce this light sensor after Matti had done all the work to get a driver upstream. Features ======== adi,ad_sigma_delta library + ad7124 - Allow for GPIO to check interrupt status, enabling this device on more platforms that don't obey prior (non general) assumptions on how the interrupt chips work. - Allow variation in reset sequence length allowing chip specific optimizations rather than always using worst case. adi,ad7124 - Add temperature channel support. adi,ad7173 - Add support calibration modes for this family of ADCs. adi,adxl345 - Binding update to allow specification of which interrupt line is connected (or none). - Support interrupts and FIFO based data capture. bosch,bme680 - Add regulators support. Note this required a new binding doc rather than use of trivial-devices - Runtime PM support. microchip,pac1921 - Add ACPI support including _DSM for shunt value and label. renesas,rzg2l - Enable runtime autosuspend. - Add suspend and resume support. tyhx,hx9023s - Add loading of a firmware file used to set defaults for some configuration registers. vishay,veml6030 - Support triggered buffers allowing efficient data capture at higher speeds. - Add regmap cache to reduce access to device. Cleanup and minor fixes ======================= cross-tree - Another batch of conversions to devm_regulator_get_enable_read_voltage() helper and related conversions to full devm that this enables. - Various patches using guard() to allow early returns and simpler code flow. - Various conversions from s64 timestamp __aligned(8) to aligned_s64 type. Includes a few cleanups where this unsigned and it should have been signed. - Fix up some missing types for drive-open-drain in dt-binding docs. core - Add missing documentation for iio_dmaengine_buffer_setup_ext() - Add check that all buffers passed to iio_read_channel_ext_info() and iio_read_channel_label() are page sized and page aligned. Done this way because the callbacks are almost always only used to fill sysfs attributes. The check covers the tiny percentage of cases where use is made of this data in a consumer driver. - Mark scan_timestamp memory of struct iio_dev private ensuring no drivers change the value which belongs to the IIO core. documentation - Various missing ABI docs added. - ABI docs made to use Y consistently as the wildcard for channel number. - Combine duplicate in_currentY_raw entries in ABI docs. iio-mux - Fix alignment of buffers passed to iio_channel_read_ext_info(). adi,ad_sigma_delta library - Respect keep_cs_asserted flag in read path. - Close a race condition around irq enabling and disabling. - Use explicit unsigned int in place of unsigned. adi,ad6695 - Move dt-binding header under adc sub-directory and fix include path in dt example. adi,ad7124 - Check number of channels in DT doesn't exceed what the driver can handle. - Check input specified in DT are possible. - Improved error reporting during probe. adi,ad7173 - Drop unused structure element. adi,ad7293 - Ensure power is turned on before resetting. adi,adxl345 - Some documentation simplification and parameter renames. - Add a function than unifies handling of power up and power down. - Add defines to have a complete set of registers defined. - Add missing \n to end of error messages. amlogic,meson_saradc - Simplify handling of the REG11 register access. awinic,aw96104 - Constify iio_info structure. bosch,bmp085 - Add to dt-binding to indicate devices support SPI. bosch,bmp280 - Use sizeof() to replace a somewhat magic 2. - Rename sleep related variables so the unit is included and use fsleep() to replace usleep_range() calls. bosch,bno055 - Constify struct bin_attribute capella,cm3232 - Reset device before checking hardware ID inline with suggested flow from datasheet. diolan,dln2 - Simplify zeroing of structure used to gather up data by just clearing the whole thing before writing rather than trying to clear out he padding after write. freescale,vf610 - Use devm_ and dev_error_probe() to simplify code and allow dropping of explicit remove() callback. invensense,timestamp library - Use a cast to remove possibility of integer overflow. kionix,kx022a - Increase reset delay a little. maxim,max1363 - Use a buffer of sufficient size in iio_priv() rather than allocating variable sized buffer at use time. microchip,mcp4725 - Replace of_property_read_bool() with of_property_present() for detecting presence of regulator which is obviously not a bool. nxp,fxls8962af - Add wakeup-source property to the dt binding to allow these sensors to wake the system up from suspend. - Enable finer grained build when not all bus types need to be supported. renesas,rzg2l - Use dev_err_probe(), improving handling of probe errors and simplifying code. - Convert to devm_ based cleanup. - Remove unnecessary runtime PM complexity as clocks are managed through PM domains. - Switch pm_ptr() removing need for __maybe_unused markings. - use read_poll_timeout() to replace open coded equivalent. samsung, ssp_sensors - Simplify code by always providing timestamp whether or not it is enabled. st,lsm6dsx - Avoid need to include linux/i3c/master by using i3cdev_to_dev() to get to the contained struct device. st,stm32-timer-trigger - Check for clk_enable() fails. vishay,veml6030 - Use new gts-helper functions and fix the _scale attribute to take into account changes in gain and integration time. Various other typo fixes in variable names + documentation and help text. A few whitespace cleanup patches. * tag 'iio-for-6.14a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (142 commits) iio: iio-mux: kzalloc instead of devm_kzalloc to ensure page alignment iio: adc: ad7625: Add ending newlines to error messages iio: accel: adxl345: complete the list of defines iio: accel: adxl345: add FIFO with watermark events iio: accel: adxl345: initialize FIFO delay value for SPI iio: accel: adxl345: introduce interrupt handling iio: light: veml3235: fix scale to conform to ABI iio: gts-helper: add helpers to ease searches of gain_sel and new_gain iio: light: veml3235: extend regmap to add cache iio: light: veml3235: fix code style dt-bindings: iio: accel: adxl345: add interrupt-names dt-bindings: iio: accel: adxl345: make interrupts not a required property dt-bindings: iio: imu: bmi323: add boolean type for drive-open-drain dt-bindings: iio: imu: bmi270: add boolean type for drive-open-drain dt-bindings: iio: imu: bmi160: add boolean type for drive-open-drain iio: adc: meson: simplify MESON_SAR_ADC_REG11 register access iio: adc: meson: use tabs instead of spaces for some REG11 bit fields iio: adc: meson: fix voltage reference selection field name typo iio: adc: rockchip: correct alignment of timestamp iio: imu: inv_icm42600: switch timestamp type from int64_t __aligned(8) to aligned_s64 ...
2025-01-12	Merge tag 'coresight-next-v6.14' of ↵	Greg Kroah-Hartman
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next Pull coresight updates from Suzuki: coresight: Updates for Linux v6.14 Coresight self-hosted tracing subsystem updates for v6.14 includes: - Support for static traceid allocation for devices - Support for impdef, static trace filtering in Qualcomm replicators - Miscellaneous fixes Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> * tag 'coresight-next-v6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux: coresight-tpda: Optimize the function of reading element size coresight: Add support for trace filtering by source coresight: Add a helper to check if a device is source dt-bindings: arm: qcom,coresight-static-replicator: Add property for source filtering coresight: Fix dsb_mode_store() unsigned val is never less than zero coresight: dummy: Add static trace id support for dummy source coresight: Add support to get static id for system trace sources dt-bindings: arm: Add arm,static-trace-id for coresight dummy source coresight: Drop atomics in connection refcounts Coresight: Narrow down the matching range of tpdm
2025-01-12	net/mlx5: Add nic_cap_reg and vhca_icm_ctrl registers	Akiva Goldberger
	Add nic_cap_reg and vhca_icm_ctrl registers interfaces for exposing ICM consumption. Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250109204231.1809851-5-tariqt@nvidia.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-12	net/mlx5: SHAMPO: Introduce new SHAMPO specific HCA caps	Saeed Mahameed
	Read and cache SHAMPO specific caps for header data split capabilities. Will be used in downstream patch. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250109204231.1809851-4-tariqt@nvidia.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>