summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-01-05x86/crash: remove the unused image parameter from prepare_elf_headers()Yuntao Wang
Patch series "crash: Some cleanups and fixes", v2. This patchset includes two cleanups and one fix. This patch (of 3): The image parameter is no longer in use, remove it. Also, tidy up the code formatting. Link: https://lkml.kernel.org/r/20240102144905.110047-1-ytcoode@gmail.com Link: https://lkml.kernel.org/r/20240102144905.110047-2-ytcoode@gmail.com Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05selftests/mm: add separate UFFDIO_MOVE test for PMD splittingSuren Baghdasaryan
Add a test for UFFDIO_MOVE ioctl operating on a hugepage which has to be split because destination is marked with MADV_NOHUGEPAGE. With this we cover all 3 cases: normal page move, hugepage move, hugepage splitting before move. Link: https://lkml.kernel.org/r/20231230025636.2477429-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Nicolas Geoffray <ngeoffray@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05selftests/mm: skip test if application doesn't has root privilegesMuhammad Usama Anjum
The test depends on writing to nr_hugepages which isn't possible without root privileges. So skip the test in this case. Link: https://lkml.kernel.org/r/20240101083614.1076768-2-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05selftests/mm: conform test to TAP format outputMuhammad Usama Anjum
Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Link: https://lkml.kernel.org/r/20240101083614.1076768-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05selftests: mm: hugepage-mmap: conform to TAP format outputMuhammad Usama Anjum
Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Link: https://lkml.kernel.org/r/20240102053223.2099572-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05selftests/mm: gup_test: conform test to TAP format outputMuhammad Usama Anjum
Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Link: https://lkml.kernel.org/r/20240102053807.2114200-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm/selftests: hugepage-mremap: conform test to TAP format outputMuhammad Usama Anjum
Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Link: https://lkml.kernel.org/r/20240102081919.2325570-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCINGLi Zhijian
Demotion can work well without CONFIG_NUMA_BALANCING. But the commit 23e9f0138963 ("mm/vmstat: move pgdemote_* to per-node stats") wrongly hid it behind CONFIG_NUMA_BALANCING. Fix it by moving them out of CONFIG_NUMA_BALANCING. Link: https://lkml.kernel.org/r/20231229022651.3229174-1-lizhijian@fujitsu.com Fixes: 23e9f0138963 ("mm/vmstat: move pgdemote_* to per-node stats") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is ↵Barry Song
too large This is the case the "compressed" data is larger than the original data, it is better to return -ENOSPC which can help zswap record a poor compr rather than an invalid request. Then we get more friendly counting for reject_compress_poor in debugfs. bool zswap_store(struct folio *folio) { ... ret = zpool_malloc(zpool, dlen, gfp, &handle); if (ret == -ENOSPC) { zswap_reject_compress_poor++; goto put_dstmem; } if (ret) { zswap_reject_alloc_fail++; goto put_dstmem; } ... } Also, zbud_alloc() and z3fold_alloc() are returning ENOSPC in the same case, eg static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp, unsigned long *handle) { ... if (!size || (gfp & __GFP_HIGHMEM)) return -EINVAL; if (size > PAGE_SIZE) return -ENOSPC; ... } Link: https://lkml.kernel.org/r/20231228061802.25280-1-v-songbaohua@oppo.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Chris Li <chrisl@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm/memcontrol: remove __mod_lruvec_page_state()Matthew Wilcox (Oracle)
There are no more callers of __mod_lruvec_page_state(), so convert the implementation to __lruvec_stat_mod_folio(), removing two calls to compound_head() (one explicit, one hidden inside page_memcg()). Link: https://lkml.kernel.org/r/20231228085748.1083901-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm/khugepaged: use a folio more in collapse_file()Matthew Wilcox (Oracle)
This function is not yet fully converted to the folio API, but this removes a few uses of old APIs. Link: https://lkml.kernel.org/r/20231228085748.1083901-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05slub: use a folio in __kmalloc_large_nodeMatthew Wilcox (Oracle)
Mirror the code in free_large_kmalloc() and alloc_pages_node() and use a folio directly. Avoid the use of folio_alloc() as that will set up an rmappable folio which we do not want here. Link: https://lkml.kernel.org/r/20231228085748.1083901-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05slub: use folio APIs in free_large_kmalloc()Matthew Wilcox (Oracle)
Save a few calls to compound_head() by using the folio APIs directly. Link: https://lkml.kernel.org/r/20231228085748.1083901-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05slub: use alloc_pages_node() in alloc_slab_page()Matthew Wilcox (Oracle)
For no apparent reason, we were open-coding alloc_pages_node() in this function. Link: https://lkml.kernel.org/r/20231228085748.1083901-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm: remove inc/dec lruvec page state functionsMatthew Wilcox (Oracle)
Patch series "Remove some lruvec page accounting functions", v2. Some functions are now unused; remove them. Make __mod_lruvec_page_state() unused and then remove it. This patch (of 6): All callers of these have been converted to their folio equivalents. Link: https://lkml.kernel.org/r/20231228085748.1083901-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231228085748.1083901-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm: ratelimit stat flush from workingset shrinkerShakeel Butt
One of our workloads (Postgres 14 + sysbench OLTP) regressed on newer upstream kernel and on further investigation, it seems like the cause is the always synchronous rstat flush in the count_shadow_nodes() added by the commit f82e6bf9bb9b ("mm: memcg: use rstat for non-hierarchical stats"). On further inspection it seems like we don't really need accurate stats in this function as it was already approximating the amount of appropriate shadow entries to keep for maintaining the refault information. Since there is already 2 sec periodic rstat flush, we don't need exact stats here. Let's ratelimit the rstat flush in this code path. Link: https://lkml.kernel.org/r/20231228073055.4046430-1-shakeelb@google.com Fixes: f82e6bf9bb9b ("mm: memcg: use rstat for non-hierarchical stats") Signed-off-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05kasan: stop leaking stack trace handlesAndrey Konovalov
Commit 773688a6cb24 ("kasan: use stack_depot_put for Generic mode") added support for stack trace eviction for Generic KASAN. However, that commit didn't evict stack traces when the object is not put into quarantine. As a result, some stack traces are never evicted from the stack depot. In addition, with the "kasan: save mempool stack traces" series, the free stack traces for mempool objects are also not properly evicted from the stack depot. Fix both issues by: 1. Evicting all stack traces when an object if freed if it was not put into quarantine; 2. Always evicting an existing free stack trace when a new one is saved. Also do a few related clean-ups: - Do not zero out free track when initializing/invalidating free meta: set a value in shadow memory instead; - Rename KASAN_SLAB_FREETRACK to KASAN_SLAB_FREE_META; - Drop the kasan_init_cache_meta function as it's not used by KASAN; - Add comments for the kasan_alloc_meta and kasan_free_meta structs. [akpm@linux-foundation.org: make release_free_meta() and release_alloc_meta() static] Link: https://lkml.kernel.org/r/20231226225121.235865-1-andrey.konovalov@linux.dev Fixes: 773688a6cb24 ("kasan: use stack_depot_put for Generic mode") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGEKinsey Ho
Improve code readability by removing CONFIG_TRANSPARENT_HUGEPAGE, since the compiler should be able to automatically optimize out the code that promotes THPs during page table walks. No functional changes. Link: https://lkml.kernel.org/r/20231227141205.2200125-6-kinseyho@google.com Signed-off-by: Kinsey Ho <kinseyho@google.com> Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Donet Tom <donettom@linux.vnet.ibm.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm/mglru: add dummy pmd_dirty()Kinsey Ho
Add dummy pmd_dirty() for architectures that don't provide it. This is similar to commit 6617da8fb565 ("mm: add dummy pmd_young() for architectures not having it"). Link: https://lkml.kernel.org/r/20231227141205.2200125-5-kinseyho@google.com Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312210606.1Etqz3M4-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202312210042.xQEiqlEh-lkp@intel.com/ Signed-off-by: Kinsey Ho <kinseyho@google.com> Suggested-by: Yu Zhao <yuzhao@google.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Donet Tom <donettom@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm/mglru: remove CONFIG_MEMCGKinsey Ho
Remove CONFIG_MEMCG in a refactoring to improve code readability at the cost of a few bytes in struct lru_gen_folio per node when CONFIG_MEMCG=n. Link: https://lkml.kernel.org/r/20231227141205.2200125-4-kinseyho@google.com Signed-off-by: Kinsey Ho <kinseyho@google.com> Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Donet Tom <donettom@linux.vnet.ibm.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm/mglru: add CONFIG_LRU_GEN_WALKS_MMUKinsey Ho
Add CONFIG_LRU_GEN_WALKS_MMU such that if disabled, the code that walks page tables to promote pages into the youngest generation will not be built. Also improves code readability by adding two helper functions get_mm_state() and get_next_mm(). Link: https://lkml.kernel.org/r/20231227141205.2200125-3-kinseyho@google.com Signed-off-by: Kinsey Ho <kinseyho@google.com> Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Donet Tom <donettom@linux.vnet.ibm.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm/mglru: add CONFIG_ARCH_HAS_HW_PTE_YOUNGKinsey Ho
Patch series "mm/mglru: Kconfig cleanup", v4. This series is the result of the following discussion: https://lore.kernel.org/47066176-bd93-55dd-c2fa-002299d9e034@linux.ibm.com/ It mainly avoids building the code that walks page tables on CPUs that use it, i.e., those don't support hardware accessed bit. Specifically, it introduces a new Kconfig to guard some of functions added by commit bd74fdaea146 ("mm: multi-gen LRU: support page table walks") on CPUs like POWER9, on which the series was tested. This patch (of 5): Some architectures are able to set the accessed bit in PTEs when PTEs are used as part of linear address translations. Add CONFIG_ARCH_HAS_HW_PTE_YOUNG for such architectures to be able to override arch_has_hw_pte_young(). Link: https://lkml.kernel.org/r/20231227141205.2200125-1-kinseyho@google.com Link: https://lkml.kernel.org/r/20231227141205.2200125-2-kinseyho@google.com Signed-off-by: Kinsey Ho <kinseyho@google.com> Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Donet Tom <donettom@linux.vnet.ibm.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm/rmap: silence VM_WARN_ON_FOLIO() in __folio_rmap_sanity_checks()David Hildenbrand
Unfortunately, vm_insert_page() and friends and up passing driver-allocated folios into folio_add_file_rmap_pte() using insert_page_into_pte_locked(). While these driver-allocated folios can be compound pages (large folios), they are not proper "rmappable" folios. In these VM_MIXEDMAP VMAs, there isn't really the concept of a reverse mapping, so long-term, we should clean that up and not call into rmap code. For the time being, document how we can end up in rmap code with large folios that are not marked rmappable. Link: https://lkml.kernel.org/r/793c5cee-d5fc-4eb1-86a2-39e05686233d@redhat.com Fixes: 68f0320824fa ("mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()") Reported-by: syzbot+50ef73537bbc393a25bb@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000014174060e09316e@google.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05userfaultfd: fix move_pages_pte() splitting folio under RCU read lockSuren Baghdasaryan
While testing the split PMD path with lockdep enabled I've got an "Invalid wait context" error caused by split_huge_page_to_list() trying to lock anon_vma->rwsem while inside RCU read section. The issues is due to move_pages_pte() calling split_folio() under RCU read lock. Fix this by unmapping the PTEs and exiting RCU read section before splitting the folio and then retrying. The same retry pattern is used when locking the folio or anon_vma in this function. After splitting the large folio we unlock and release it because after the split the old folio might not be the one that contains the src_addr. Link: https://lkml.kernel.org/r/20240102233256.1077959-1-surenb@google.com Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Nicolas Geoffray <ngeoffray@google.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05buffer: fix unintended successful returnMatthew Wilcox (Oracle)
If try_to_free_buffers() succeeded and then folio_alloc_buffers() failed, grow_dev_folio() would return success. This would be incorrect; memory allocation failure is supposed to result in a failure. It's a harmless bug; the caller will simply go around the loop one more time and grow_dev_folio() will correctly return a failure that time. But it was an unintended change and looks like a more serious bug than it is. While I'm in here, improve the commentary about why we return success even though we failed. Link: https://lkml.kernel.org/r/20240101093848.2017115-1-willy@infradead.org Fixes: 6d840a18773f ("buffer: return bool from grow_dev_folio()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info()Tetsuo Handa
syzbot is reporting uninit-value at shrinker_alloc(), for commit 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}") which assumed that the ->unit was allocated with __GFP_ZERO forgot to replace kvmalloc_node() in expand_one_shrinker_info() with kvzalloc_node(). Link: https://lkml.kernel.org/r/9226cc0a-10e0-4489-80c5-58c3b5b4359c@I-love.SAKURA.ne.jp Reported-by: syzbot <syzbot+1e0ed05798af62917464@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=1e0ed05798af62917464 Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05Merge tag 'soc-fixes-6.7-3a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "These are two correctness fixes for handing DT input in the Allwinner (sunxi) SMP startup code" * tag 'soc-fixes-6.7-3a' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: sun9i: smp: fix return code check of of_property_match_string ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
2024-01-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fix from Paolo Bonzini: - Fix boolean logic in intel_guest_get_msrs * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
2024-01-05Merge tag 'probes-fixes-v6.7-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull kprobes/x86 fix from Masami Hiramatsu: - Fix to emulate indirect call which size is not 5 byte. Current code expects the indirect call instructions are 5 bytes, but that is incorrect. Usually indirect call based on register is shorter than that, thus the emulation causes a kernel crash by accessing wrong instruction boundary. This uses the instruction size to calculate the return address correctly. * tag 'probes-fixes-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect
2024-01-05Merge tag '6.7-rc8-smb3-mchan-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: "Three important multichannel smb3 client fixes found in recent testing: - fix oops due to incorrect refcounting of interfaces after disabling multichannel - fix possible unrecoverable session state after disabling multichannel with active sessions - fix two places that were missing use of chan_lock" * tag '6.7-rc8-smb3-mchan-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: do not depend on release_iface for maintaining iface_list cifs: cifs_chan_is_iface_active should be called with chan_lock held cifs: after disabling multichannel, mark tcon for reconnect
2024-01-05Merge branch 'net-gro-reduce-extension-header-parsing-overhead'Jakub Kicinski
Richard Gobert says: ==================== net: gro: reduce extension header parsing overhead This series attempts to reduce the parsing overhead of IPv6 extension headers in GRO and GSO, by removing extension header specific code and enabling the frag0 fast path. The following changes were made: - Removed some unnecessary HBH conditionals by adding HBH offload to inet6_offloads - Added a utility function to support frag0 fast path in ipv6_gro_receive - Added selftests for IPv6 packets with extension headers in GRO v2: https://lore.kernel.org/netdev/127b8199-1cd4-42d7-9b2b-875abaad93fe@gmail.com/ v1: https://lore.kernel.org/netdev/f4eff69d-3917-4c42-8c6b-d09597ac4437@gmail.com/ ==================== Link: https://lore.kernel.org/r/ac6fb684-c00e-449c-92c3-99358a927ade@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05selftests/net: fix GRO coalesce test and add ext header coalesce testsRichard Gobert
Currently there is no test which checks that IPv6 extension header packets successfully coalesce. This commit adds a test, which verifies two IPv6 packets with HBH extension headers do coalesce, and another test which checks that packets with different extension header data do not coalesce in GRO. I changed the receive socket filter to accept a packet with one extension header. This change exposed a bug in the fragment test -- the old BPF did not accept the fragment packet. I updated correct_num_packets in the fragment test accordingly. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/69282fed-2415-47e8-b3d3-34939ec3eb56@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net: gro: parse ipv6 ext headers without frag0 invalidationRichard Gobert
The existing code always pulls the IPv6 header and sets the transport offset initially. Then optionally again pulls any extension headers in ipv6_gso_pull_exthdrs and sets the transport offset again on return from that call. skb->data is set at the start of the first extension header before calling ipv6_gso_pull_exthdrs, and must disable the frag0 optimization because that function uses pskb_may_pull/pskb_pull instead of skb_gro_ helpers. It sets the GRO offset to the TCP header with skb_gro_pull and sets the transport header. Then returns skb->data to its position before this block. This commit introduces a new helper function - ipv6_gro_pull_exthdrs - which is used in ipv6_gro_receive to pull ipv6 ext headers instead of ipv6_gso_pull_exthdrs. Thus, there is no modification of skb->data, all operations use skb_gro_* helpers, and the frag0 fast path can be taken for IPv6 packets with ext headers. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/504130f6-b56c-4dcc-882c-97942c59f5b7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net: gso: add HBH extension header offload supportRichard Gobert
This commit adds net_offload to IPv6 Hop-by-Hop extension headers (as it is done for routing and dstopts) since it is supported in GSO and GRO. This allows to remove specific HBH conditionals in GSO and GRO when pulling and parsing an incoming packet. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/d4f8825a-1d55-4b12-9d67-a254dbbfa6ae@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net/sched: act_ct: fix skb leak and crash on ooo fragsTao Liu
act_ct adds skb->users before defragmentation. If frags arrive in order, the last frag's reference is reset in: inet_frag_reasm_prepare skb_morph which is not straightforward. However when frags arrive out of order, nobody unref the last frag, and all frags are leaked. The situation is even worse, as initiating packet capture can lead to a crash[0] when skb has been cloned and shared at the same time. Fix the issue by removing skb_get() before defragmentation. act_ct returns TC_ACT_CONSUMED when defrag failed or in progress. [0]: [ 843.804823] ------------[ cut here ]------------ [ 843.809659] kernel BUG at net/core/skbuff.c:2091! [ 843.814516] invalid opcode: 0000 [#1] PREEMPT SMP [ 843.819296] CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G S 6.7.0-rc3 #2 [ 843.824107] Hardware name: XFUSION 1288H V6/BC13MBSBD, BIOS 1.29 11/25/2022 [ 843.828953] RIP: 0010:pskb_expand_head+0x2ac/0x300 [ 843.833805] Code: 8b 70 28 48 85 f6 74 82 48 83 c6 08 bf 01 00 00 00 e8 38 bd ff ff 8b 83 c0 00 00 00 48 03 83 c8 00 00 00 e9 62 ff ff ff 0f 0b <0f> 0b e8 8d d0 ff ff e9 b3 fd ff ff 81 7c 24 14 40 01 00 00 4c 89 [ 843.843698] RSP: 0018:ffffc9000cce07c0 EFLAGS: 00010202 [ 843.848524] RAX: 0000000000000002 RBX: ffff88811a211d00 RCX: 0000000000000820 [ 843.853299] RDX: 0000000000000640 RSI: 0000000000000000 RDI: ffff88811a211d00 [ 843.857974] RBP: ffff888127d39518 R08: 00000000bee97314 R09: 0000000000000000 [ 843.862584] R10: 0000000000000000 R11: ffff8881109f0000 R12: 0000000000000880 [ 843.867147] R13: ffff888127d39580 R14: 0000000000000640 R15: ffff888170f7b900 [ 843.871680] FS: 0000000000000000(0000) GS:ffff889ffffc0000(0000) knlGS:0000000000000000 [ 843.876242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 843.880778] CR2: 00007fa42affcfb8 CR3: 000000011433a002 CR4: 0000000000770ef0 [ 843.885336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 843.889809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 843.894229] PKRU: 55555554 [ 843.898539] Call Trace: [ 843.902772] <IRQ> [ 843.906922] ? __die_body+0x1e/0x60 [ 843.911032] ? die+0x3c/0x60 [ 843.915037] ? do_trap+0xe2/0x110 [ 843.918911] ? pskb_expand_head+0x2ac/0x300 [ 843.922687] ? do_error_trap+0x65/0x80 [ 843.926342] ? pskb_expand_head+0x2ac/0x300 [ 843.929905] ? exc_invalid_op+0x50/0x60 [ 843.933398] ? pskb_expand_head+0x2ac/0x300 [ 843.936835] ? asm_exc_invalid_op+0x1a/0x20 [ 843.940226] ? pskb_expand_head+0x2ac/0x300 [ 843.943580] inet_frag_reasm_prepare+0xd1/0x240 [ 843.946904] ip_defrag+0x5d4/0x870 [ 843.950132] nf_ct_handle_fragments+0xec/0x130 [nf_conntrack] [ 843.953334] tcf_ct_act+0x252/0xd90 [act_ct] [ 843.956473] ? tcf_mirred_act+0x516/0x5a0 [act_mirred] [ 843.959657] tcf_action_exec+0xa1/0x160 [ 843.962823] fl_classify+0x1db/0x1f0 [cls_flower] [ 843.966010] ? skb_clone+0x53/0xc0 [ 843.969173] tcf_classify+0x24d/0x420 [ 843.972333] tc_run+0x8f/0xf0 [ 843.975465] __netif_receive_skb_core+0x67a/0x1080 [ 843.978634] ? dev_gro_receive+0x249/0x730 [ 843.981759] __netif_receive_skb_list_core+0x12d/0x260 [ 843.984869] netif_receive_skb_list_internal+0x1cb/0x2f0 [ 843.987957] ? mlx5e_handle_rx_cqe_mpwrq_rep+0xfa/0x1a0 [mlx5_core] [ 843.991170] napi_complete_done+0x72/0x1a0 [ 843.994305] mlx5e_napi_poll+0x28c/0x6d0 [mlx5_core] [ 843.997501] __napi_poll+0x25/0x1b0 [ 844.000627] net_rx_action+0x256/0x330 [ 844.003705] __do_softirq+0xb3/0x29b [ 844.006718] irq_exit_rcu+0x9e/0xc0 [ 844.009672] common_interrupt+0x86/0xa0 [ 844.012537] </IRQ> [ 844.015285] <TASK> [ 844.017937] asm_common_interrupt+0x26/0x40 [ 844.020591] RIP: 0010:acpi_safe_halt+0x1b/0x20 [ 844.023247] Code: ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 65 48 8b 04 25 00 18 03 00 48 8b 00 a8 08 75 0c 66 90 0f 00 2d 81 d0 44 00 fb f4 <fa> c3 0f 1f 00 89 fa ec 48 8b 05 ee 88 ed 00 a9 00 00 00 80 75 11 [ 844.028900] RSP: 0018:ffffc90000533e70 EFLAGS: 00000246 [ 844.031725] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0000000000000000 [ 844.034553] RDX: ffff889ffffc0000 RSI: ffffffff828b7f20 RDI: ffff88a090f45c64 [ 844.037368] RBP: ffff88a0901a2800 R08: ffff88a090f45c00 R09: 00000000000317c0 [ 844.040155] R10: 00ec812281150475 R11: ffff889fffff0e04 R12: ffffffff828b7fa0 [ 844.042962] R13: ffffffff828b7f20 R14: 0000000000000001 R15: 0000000000000000 [ 844.045819] acpi_idle_enter+0x7b/0xc0 [ 844.048621] cpuidle_enter_state+0x7f/0x430 [ 844.051451] cpuidle_enter+0x2d/0x40 [ 844.054279] do_idle+0x1d4/0x240 [ 844.057096] cpu_startup_entry+0x2a/0x30 [ 844.059934] start_secondary+0x104/0x130 [ 844.062787] secondary_startup_64_no_verify+0x16b/0x16b [ 844.065674] </TASK> Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Tao Liu <taoliu828@163.com> Link: https://lore.kernel.org/r/20231228081457.936732-1-taoliu828@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net: fill in MODULE_DESCRIPTION()s for CAIFJakub Kicinski
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to all the CAIF sub-modules. Link: https://lore.kernel.org/r/20240104144855.1320993-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net: fill in MODULE_DESCRIPTION() for AF_PACKETJakub Kicinski
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add description to net/packet/af_packet.c Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240104144119.1319055-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net: fill in MODULE_DESCRIPTION()s for DSA tagsJakub Kicinski
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to all the DSA tag modules. The descriptions are copy/pasted Kconfig names, with s/^Tag/DSA tag/. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20240104143759.1318137-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net: fill in MODULE_DESCRIPTION()s for ATMJakub Kicinski
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to all the ATM modules and drivers. Link: https://lore.kernel.org/r/20240104143737.1317945-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05ubifs: Check @c->dirty_[n|p]n_cnt and @c->nroot state under @c->lp_mutexZhihao Cheng
The checking of @c->nroot->flags and @c->dirty_[n|p]n_cnt in function nothing_to_commit() is not atomic, which could be raced with modifying of lpt, for example: P1 P2 P3 run_gc ubifs_garbage_collect do_commit ubifs_return_leb ubifs_lpt_lookup_dirty dirty_cow_nnode do_commit nothing_to_commit if (test_bit(DIRTY_CNODE, &c->nroot->flags) // false test_and_set_bit(DIRTY_CNODE, &nnode->flags) c->dirty_nn_cnt += 1 ubifs_assert(c, c->dirty_nn_cnt == 0) // false ! Fetch a reproducer in Link: UBIFS error (ubi0:0 pid 2747): ubifs_assert_failed UBIFS assert failed: c->dirty_pn_cnt == 0, in fs/ubifs/commit.c Call Trace: ubifs_ro_mode+0x58/0x70 [ubifs] ubifs_assert_failed+0x6a/0x90 [ubifs] do_commit+0x5b7/0x930 [ubifs] ubifs_run_commit+0xc6/0x1a0 [ubifs] ubifs_sync_fs+0xd8/0x110 [ubifs] sync_filesystem+0xb4/0x120 do_syscall_64+0x6f/0x140 Fix it by checking @c->dirty_[n|p]n_cnt and @c->nroot state with @c->lp_mutex locked. Fixes: 944fdef52ca9 ("UBIFS: do not start the commit if there is nothing to commit") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218162 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-01-05Merge branch 'dpll-expose-fractional-frequency-offset-value-to-user'Jakub Kicinski
Jiri Pirko says: ==================== dpll: expose fractional frequency offset value to user Allow to expose pin fractional frequency offset value over new DPLL generic netlink attribute. Add an op to get the value from the driver. Implement this new op in mlx5 driver. ==================== Link: https://lore.kernel.org/r/20240103132838.1501801-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net/mlx5: DPLL, Implement fractional frequency offset get pin opJiri Pirko
Implement ffo_get() pin op filling it up to MSEED.frequency_diff value. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Acked-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://lore.kernel.org/r/20240103132838.1501801-4-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net/mlx5: DPLL, Use struct to get values from mlx5_dpll_synce_status_get()Jiri Pirko
Instead of passing separate args, introduce struct mlx5_dpll_synce_status to hold the values obtained by mlx5_dpll_synce_status_get(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Acked-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://lore.kernel.org/r/20240103132838.1501801-3-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05dpll: expose fractional frequency offset value to userJiri Pirko
Add a new netlink attribute to expose fractional frequency offset value for a pin. Add an op to get the value from the driver. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Acked-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://lore.kernel.org/r/20240103132838.1501801-2-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05ubifs: describe function parametersSascha Hauer
With 16a26b20d2afd ("ubifs: authentication: Add hashes to index nodes") insert_node() and insert_dent() got a new function parameter 'hash'. Add a description for this new parameter. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311051618.D7YUE1Rr-lkp@intel.com/ Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-01-05ubifs: auth.c: fix kernel-doc function prototype warningRandy Dunlap
Use the correct function name in the kernel-doc comment to prevent a kernel-doc warning: auth.c:30: warning: expecting prototype for ubifs_node_calc_hash(). Prototype was for __ubifs_node_calc_hash() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Closes: lore.kernel.org/r/202311052125.gE1Rylox-lkp@intel.com Cc: Richard Weinberger <richard@nod.at> Cc: linux-mtd@lists.infradead.org Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-01-05ubifs: use crypto_shash_tfm_digest() in ubifs_hmac_wkm()Eric Biggers
Simplify ubifs_hmac_wkm() by using crypto_shash_tfm_digest() instead of an alloc+init+update+final sequence. This should also improve performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-01-05um: Mark 32bit syscall helpers as clobbering memoryBenjamin Berg
The 64bit helper are marked to clobber the memory, but the 32bit ones are not. Add the appropriate clobber to the 32bit helper routines so that the compiler cannot do invalid optimizations. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-01-05um: Remove unused register save/restore functionsBenjamin Berg
These functions were only used when calling PTRACE_ARCH_PRCTL, but this code has been removed. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-01-05um: Rely on PTRACE_SETREGSET to set FS/GS base registersBenjamin Berg
These registers are saved/restored together with the other general registers using ptrace. In arch_set_tls we then just need to set the register and it will be synced back normally. Most of this logic was introduced in commit f355559cf7845 ("[PATCH] uml: x86_64 thread fixes"). However, at least today we can rely on ptrace to restore the base registers for us. As such, only the part of the patch that tracks the FS register for use as thread local storage is actually needed. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Signed-off-by: Richard Weinberger <richard@nod.at>