summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2023-08-21merge mm-hotfixes-stable into mm-stable to pick up depended-upon changesAndrew Morton
2023-08-21maple_tree: replace data before marking dead in split and spanning storeLiam R. Howlett
Reorder the operations for split and spanning stores so that new data is placed in the tree prior to marking the old data as dead. This will limit re-walks on dead data to just once instead of a retry loop. The order of operations is as follows: Create the new data, put the new data in place, mark the top node of the old data as dead. Then repair parent links in the reused nodes through all levels of the tree, following the new nodes downwards. Finally walk the top dead node looking for nodes that are no longer used, or subtrees that should be destroyed (marked dead throughout then freed), follow the partially used nodes downwards to discover other dead nodes and subtrees. Link: https://lkml.kernel.org/r/20230804165951.2661157-7-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21maple_tree: change mas_adopt_children() parent usageLiam R. Howlett
All calls to mas_adopt_children() currently pass the parent as the node in the maple state. Allow for the parent pointer that is passed in to be used instead. Link: https://lkml.kernel.org/r/20230804165951.2661157-6-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21maple_tree: introduce mas_tree_parent() definitionLiam R. Howlett
Add a definition to shorten long code lines and clarify what the code is doing. Use the new definition to get the maple tree parent pointer from the maple state where possible. Link: https://lkml.kernel.org/r/20230804165951.2661157-5-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21maple_tree: introduce mas_put_in_tree()Liam R. Howlett
mas_replace() has a single user that takes a flag which is now always true. Replace this function with mas_put_in_tree() to better align with mas_replace_node(). Inline the remaining logic into the only caller; mas_wmb_replace(). Link: https://lkml.kernel.org/r/20230804165951.2661157-4-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21maple_tree: reorder replacement of nodes to avoid live lockLiam R. Howlett
Replacing nodes may cause a live lock-up if CPU resources are saturated by write operations on the tree by continuously retrying on dead nodes. To avoid the continuous retry scenario, ensure the new node is inserted into the tree prior to marking the old data as dead. This will define a window where old and new data is swapped. When reusing lower level nodes, ensure the parent pointer is updated after the parent is marked dead. This ensures that the child is still reachable from the top of the tree, but walking up to a dead node will result in a single retry that will start a fresh walk from the top down through the new node. Link: https://lkml.kernel.org/r/20230804165951.2661157-3-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21maple_tree: add hex output to maple_arange64 dumpLiam R. Howlett
Patch series "maple_tree: Change replacement strategy". The maple tree marks nodes dead as soon as they are going to be replaced. This could be problematic when used in the RCU context since the writer may be starved of CPU time by the readers. This patch set addresses the issue by switching the data replacement strategy to one that will only mark data as dead once the new data is available. This series changes the ordering of the node replacement so that the new data is live before the old data is marked 'dead'. When readers hit 'dead' nodes, they will restart from the top of the tree and end up in the new data. In more complex scenarios, the replacement strategy means a subtree is built and graphed into the tree leaving some nodes to point to the old parent. The view of tasks into the old data will either remain with the old data, or see the new data once the old data is marked 'dead'. Iterators will see the 'dead' node and restart on their own and switch to the new data. There is no risk of the reader seeing old data in these cases. The 'dead' subtree of data is then fully marked dead, but reused nodes will still point to the dead nodes until the parent pointer is updated. Walking up to a 'dead' node will cause a re-walk from the top of the tree and enter the new data area where old data is not reachable. Once the parent pointers are fully up to date in the active data, the 'dead' subtree is iterated to collect entirely 'dead' subtrees, and dead nodes (nodes that partially contained reused data). This patch (of 6): When dumping the tree, honour formatting request to output hex for the maple node type arange64. Link: https://lkml.kernel.org/r/20230804165951.2661157-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20230804165951.2661157-2-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21radix tree: remove unused variableArnd Bergmann
Recent versions of clang warn about an unused variable, though older versions saw the 'slot++' as a use and did not warn: radix-tree.c:1136:50: error: parameter 'slot' set but not used [-Werror,-Wunused-but-set-parameter] It's clearly not needed any more, so just remove it. Link: https://lkml.kernel.org/r/20230811131023.2226509-1-arnd@kernel.org Fixes: 3a08cd52c37c7 ("radix tree: Remove multiorder support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Rong Tao <rongtao@cestc.cn> Cc: Tom Rix <trix@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: reduce resets during store setupLiam R. Howlett
mas_prealloc() may walk partially down the tree before finding that a split or spanning store is needed. When the write occurs, relax the logic on resetting the walk so that partial walks will not restart, but walks that have gone too far (a store that affects beyond the current node) should be restarted. Link: https://lkml.kernel.org/r/20230724183157.3939892-15-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: refine mas_preallocate() node calculationsLiam R. Howlett
Calculate the number of nodes based on the pending write action instead of assuming the worst case. This addresses a performance regression introduced in platforms that have longer allocation timing. Link: https://lkml.kernel.org/r/20230724183157.3939892-14-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: move mas_wr_end_piv() below mas_wr_extend_null()Liam R. Howlett
Relocate it and call mas_wr_extend_null() from within mas_wr_end_piv(). Extending the NULL may affect the end pivot value so call mas_wr_endtend_null() from within mas_wr_end_piv() to keep it all together. Link: https://lkml.kernel.org/r/20230724183157.3939892-12-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: adjust node allocation on mas_rebalance()Liam R. Howlett
mas_rebalance() is called to rebalance an insufficient node into a single node or two sufficient nodes. The preallocation estimate is always too many in this case as the height of the tree will never grow and there is no possibility to have a three way split in this case, so revise the node allocation count. Link: https://lkml.kernel.org/r/20230724183157.3939892-9-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: re-introduce entry to mas_preallocate() argumentsLiam R. Howlett
The current preallocation strategy is to preallocate the absolute worst-case allocation for a tree modification. The entry (or NULL) is needed to know how many nodes are needed to write to the tree. Start by adding the argument to the mas_preallocate() definition. Link: https://lkml.kernel.org/r/20230724183157.3939892-8-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: add benchmarking for mas_prev()Liam R. Howlett
Add some benchmarking functions in testing for mas_prev(). This is useful to ensure there are no regressions added during modifications. Link: https://lkml.kernel.org/r/20230724183157.3939892-3-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: add benchmarking for mas_for_eachLiam R. Howlett
Patch series "Reduce preallocations for maple tree", v3. Initial work on preallocations showed no regression in performance during testing, but recently some users (both on [1] and off [android] list) have reported that preallocating the worst-case number of nodes has caused some slow down. This patch set addresses the number of allocations in a few ways. During munmap() most munmap() operations will remove a single VMA, so leverage the fact that the maple tree can place a single pointer at range 0 - 0 without allocating. This is done by changing the index of the VMAs to be indexed by the count, starting at 0. Re-introduce the entry argument to mas_preallocate() so that a more intelligent guess of the node count can be made. Implement the more intelligent guess of the node count, although there is more work to be done. During development of v2 of this patch set, I also noticed that the number of nodes being allocated for a rebalance was beyond what could possibly be needed. This is addressed in patch 0008. This patch (of 15): Add a way to test the speed of mas_for_each() to the testing code. Link: https://lkml.kernel.org/r/20230724183157.3939892-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20230724183157.3939892-2-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: Be more strict about lockingLiam R. Howlett
Use lockdep to check the write path in the maple tree holds the lock in write mode. Introduce mt_write_lock_is_held() to check if the lock is held for writing. Update the necessary checks for rcu_dereference_protected() to use the new write lock check. Link: https://lkml.kernel.org/r/20230714195551.894800-5-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oliver Sang <oliver.sang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: mtree_insert: fix typo in kernel-doc description of GFP flagsMike Rapoport (IBM)
Replace FGP_FLAGS with GFP_FLAGS Link: https://lkml.kernel.org/r/20230715084038.987955-1-rppt@kernel.org Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: mtree_insert*: fix typo in kernel-doc descriptionMike Rapoport (IBM)
Replace "Insert and entry at a give index" with "Insert an entry at a given index" Link: https://lkml.kernel.org/r/20230715143920.994812-1-rppt@kernel.org Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18lib/test_meminit: allocate pages up to order MAX_ORDERAndrew Donnellan
test_pages() tests the page allocator by calling alloc_pages() with different orders up to order 10. However, different architectures and platforms support different maximum contiguous allocation sizes. The default maximum allocation order (MAX_ORDER) is 10, but architectures can use CONFIG_ARCH_FORCE_MAX_ORDER to override this. On platforms where this is less than 10, test_meminit() will blow up with a WARN(). This is expected, so let's not do that. Replace the hardcoded "10" with the MAX_ORDER macro so that we test allocations up to the expected platform limit. Link: https://lkml.kernel.org/r/20230714015238.47931-1-ajd@linux.ibm.com Fixes: 5015a300a522 ("lib: introduce test_meminit module") Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Xiaoke Wang <xkernel.wang@foxmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: drop mas_first_entry()Peng Zhang
The internal function mas_first_entry() is no longer used, so drop it. Link: https://lkml.kernel.org/r/20230711035444.526-9-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: replace mas_logical_pivot() with mas_safe_pivot()Peng Zhang
Replace mas_logical_pivot() with mas_safe_pivot() and drop mas_logical_pivot() since it won't be used anymore. We can do this since now all nodes will have node limit pivot (if it is not full node). Link: https://lkml.kernel.org/r/20230711035444.526-8-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: update mt_validate()Peng Zhang
Instead of using mas_first_entry() to find the leftmost leaf, use a simple loop instead. Remove an unneeded check for root node. To make the error message more accurate, check pivots first and then slots, because checking slots depend on the node limit pivot to break the loop. Link: https://lkml.kernel.org/r/20230711035444.526-7-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: make mas_validate_limits() check root node and node limitPeng Zhang
Update mas_validate_limits() to check root node, check node limit pivot if there is enough room for it to exist and check data_end. Remove the check for child existence as it is done in mas_validate_child_slot(). Link: https://lkml.kernel.org/r/20230711035444.526-6-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: fix mas_validate_child_slot() to check last missed slotPeng Zhang
Don't break the loop before checking the last slot. Also here check if non-leaf nodes are missing children. Link: https://lkml.kernel.org/r/20230711035444.526-5-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: make mas_validate_gaps() to check metadataPeng Zhang
Make mas_validate_gaps() check whether the offset in the metadata points to the largest gap. By the way, simplify this function. Add the verification that gaps beyond the node limit are zero. Link: https://lkml.kernel.org/r/20230711035444.526-4-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: don't use MAPLE_ARANGE64_META_MAX to indicate no gapPeng Zhang
Patch series "Improve the validation for maple tree and some cleanup", v2. This patch (of 7): Do not use a special offset to indicate that there is no gap. When there is no gap, offset can point to any valid slots because its gap is 0. Link: https://lkml.kernel.org/r/20230711035444.526-1-zhangpeng.00@bytedance.com Link: https://lkml.kernel.org/r/20230711035444.526-3-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: add a fast path case in mas_wr_slot_store()Peng Zhang
When expanding a range in two directions, only partially overwriting the previous and next ranges, the number of entries will not be increased, so we can just update the pivots as a fast path. However, it may introduce potential risks in RCU mode, because it updates two pivots. We only enable it in non-RCU mode. Link: https://lkml.kernel.org/r/20230628073657.75314-5-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: optimize mas_wr_append(), also improve duplicating VMAsPeng Zhang
When the new range can be completely covered by the original last range without touching the boundaries on both sides, two new entries can be appended to the end as a fast path. We update the original last pivot at the end, and the newly appended two entries will not be accessed before this, so it is also safe in RCU mode. This is useful for sequential insertion, which is what we do in dup_mmap(). Enabling BENCH_FORK in test_maple_tree and just running bench_forking() gives the following time-consuming numbers: before: after: 17,874.83 msec 15,738.38 msec It shows about a 12% performance improvement for duplicating VMAs. Link: https://lkml.kernel.org/r/20230628073657.75314-4-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: add test for mas_wr_modify() fast pathPeng Zhang
Patch series "Optimize the fast path of mas_store()", v4. Add fast paths for mas_wr_append() and mas_wr_slot_store() respectively. The newly added fast path of mas_wr_append() is used in fork() and how much it benefits fork() depends on how many VMAs are duplicated. Thanks Liam for the review. This patch (of 4): Add tests for all cases of mas_wr_append() and mas_wr_slot_store(). Link: https://lkml.kernel.org/r/20230628073657.75314-1-zhangpeng.00@bytedance.com Link: https://lkml.kernel.org/r/20230628073657.75314-2-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18maple_tree: fix a few documentation issuesThomas Gleixner
The documentation of mt_next() claims that it starts the search at the provided index. That's incorrect as it starts the search after the provided index. The documentation of mt_find() is slightly confusing. "Handles locking" is not really helpful as it does not explain how the "locking" works. Also the documentation of index talks about a range, while in reality the index is updated on a succesful search to the index of the found entry plus one. Fix similar issues for mt_find_after() and mt_prev(). Reword the confusing "Note: Will not return the zero entry." comment on mt_for_each() and document @__index correctly. Link: https://lkml.kernel.org/r/87ttw2n556.ffs@tglx Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-04crypto, cifs: fix error handling in extract_iter_to_sg()David Howells
Fix error handling in extract_iter_to_sg(). Pages need to be unpinned, not put in extract_user_to_sg() when handling IOVEC/UBUF sources. The bug may result in a warning like the following: WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:27 [inline] WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline] WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 raw_atomic_add include/linux/atomic/atomic-arch-fallback.h:537 [inline] WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 atomic_add include/linux/atomic/atomic-instrumented.h:105 [inline] WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 try_grab_page+0x108/0x160 mm/gup.c:252 ... pc : try_grab_page+0x108/0x160 mm/gup.c:229 lr : follow_page_pte+0x174/0x3e4 mm/gup.c:651 ... Call trace: __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:27 [inline] arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline] raw_atomic_add include/linux/atomic/atomic-arch-fallback.h:537 [inline] atomic_add include/linux/atomic/atomic-instrumented.h:105 [inline] try_grab_page+0x108/0x160 mm/gup.c:252 follow_pmd_mask mm/gup.c:734 [inline] follow_pud_mask mm/gup.c:765 [inline] follow_p4d_mask mm/gup.c:782 [inline] follow_page_mask+0x12c/0x2e4 mm/gup.c:839 __get_user_pages+0x174/0x30c mm/gup.c:1217 __get_user_pages_locked mm/gup.c:1448 [inline] __gup_longterm_locked+0x94/0x8f4 mm/gup.c:2142 internal_get_user_pages_fast+0x970/0xb60 mm/gup.c:3140 pin_user_pages_fast+0x4c/0x60 mm/gup.c:3246 iov_iter_extract_user_pages lib/iov_iter.c:1768 [inline] iov_iter_extract_pages+0xc8/0x54c lib/iov_iter.c:1831 extract_user_to_sg lib/scatterlist.c:1123 [inline] extract_iter_to_sg lib/scatterlist.c:1349 [inline] extract_iter_to_sg+0x26c/0x6fc lib/scatterlist.c:1339 hash_sendmsg+0xc0/0x43c crypto/algif_hash.c:117 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg+0x54/0x60 net/socket.c:748 ____sys_sendmsg+0x270/0x2ac net/socket.c:2494 ___sys_sendmsg+0x80/0xdc net/socket.c:2548 __sys_sendmsg+0x68/0xc4 net/socket.c:2577 __do_sys_sendmsg net/socket.c:2586 [inline] __se_sys_sendmsg net/socket.c:2584 [inline] __arm64_sys_sendmsg+0x24/0x30 net/socket.c:2584 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x48/0x114 arch/arm64/kernel/syscall.c:52 el0_svc_common.constprop.0+0x44/0xe4 arch/arm64/kernel/syscall.c:142 do_el0_svc+0x38/0xa4 arch/arm64/kernel/syscall.c:191 el0_svc+0x2c/0xb0 arch/arm64/kernel/entry-common.c:647 el0t_64_sync_handler+0xc0/0xc4 arch/arm64/kernel/entry-common.c:665 el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:591 Link: https://lkml.kernel.org/r/20571.1690369076@warthog.procyon.org.uk Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist") Reported-by: syzbot+9b82859567f2e50c123e@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-mm/000000000000273d0105ff97bf56@google.com/ Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Steve French <stfrench@microsoft.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jeff Layton <jlayton@kernel.org> Cc: Shyam Prasad N <nspmangalore@gmail.com> Cc: Rohith Surabattula <rohiths.msft@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-30Merge tag 'char-misc-6.5-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char driver and Documentation fixes from Greg KH: "Here is a char driver fix and some documentation updates for 6.5-rc4 that contain the following changes: - sram/genalloc bugfix for reported problem - security-bugs.rst update based on recent discussions - embargoed-hardware-issues minor cleanups and then partial revert for the project/company lists All of these have been in linux-next for a while with no reported problems, and the documentation updates have all been reviewed by the relevant developers" * tag 'char-misc-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: misc/genalloc: Name subpools by of_node_full_name() Documentation: embargoed-hardware-issues.rst: add AMD to the list Documentation: embargoed-hardware-issues.rst: clean out empty and unused entries Documentation: security-bugs.rst: clarify CVE handling Documentation: security-bugs.rst: update preferences when dealing with the linux-distros group
2023-07-26misc/genalloc: Name subpools by of_node_full_name()Linus Walleij
A previous commit tried to come up with more generic subpool names, but this isn't quite working: the node name was used elsewhere to match pools to consumers which regressed the nVidia Tegra 2/3 video decoder. Revert back to an earlier approach using of_node_full_name() instead of just the name to make sure the pool name is more unique, and change both sites using this in the kernel. It is not perfect since two SRAM nodes could have the same subpool name but it makes the situation better than before. Reported-by: Dmitry Osipenko <digetx@gmail.com> Fixes: 21e5a2d10c8f ("misc: sram: Generate unique names for subpools") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Dmitry Osipenko <digetx@gmail.com> Link: https://lore.kernel.org/r/20230622074520.3058027-1-linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-22Merge tag 'block-6.5-2023-07-21' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fix for loop regressions (Mauricio) - Fix a potential stall with batched wakeups in sbitmap (David) - Fix for stall with recursive plug flushes (Ross) - Skip accounting of empty requests for blk-iocost (Chengming) - Remove a dead field in struct blk_mq_hw_ctx (Chengming) * tag 'block-6.5-2023-07-21' of git://git.kernel.dk/linux: loop: do not enforce max_loop hard limit by (new) default loop: deprecate autoloading callback loop_probe() sbitmap: fix batching wakeup blk-iocost: skip empty flush bio in iocost blk-mq: delete dead struct blk_mq_hw_ctx->queued field blk-mq: Fix stall due to recursive flush plug
2023-07-21sbitmap: fix batching wakeupDavid Jeffery
Current code supposes that it is enough to provide forward progress by just waking up one wait queue after one completion batch is done. Unfortunately this way isn't enough, cause waiter can be added to wait queue just after it is woken up. Follows one example(64 depth, wake_batch is 8) 1) all 64 tags are active 2) in each wait queue, there is only one single waiter 3) each time one completion batch(8 completions) wakes up just one waiter in each wait queue, then immediately one new sleeper is added to this wait queue 4) after 64 completions, 8 waiters are wakeup, and there are still 8 waiters in each wait queue 5) after another 8 active tags are completed, only one waiter can be wakeup, and the other 7 can't be waken up anymore. Turns out it isn't easy to fix this problem, so simply wakeup enough waiters for single batch. Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20230721095715.232728-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-18Merge tag 'mm-hotfixes-stable-2023-07-18-12-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "Seven hotfixes, six of which are cc:stable and one of which addresses a post-6.5 issue" * tag 'mm-hotfixes-stable-2023-07-18-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: maple_tree: fix node allocation testing on 32 bit maple_tree: fix 32 bit mas_next testing selftests/mm: mkdirty: fix incorrect position of #endif maple_tree: set the node limit when creating a new root node mm/mlock: fix vma iterator conversion of apply_vma_lock_flags() prctl: move PR_GET_AUXV out of PR_MCE_KILL selftests/mm: give scripts execute permission
2023-07-17maple_tree: fix 32 bit mas_next testingLiam R. Howlett
The test setup of mas_next is dependent on node entry size to create a 2 level tree, but the tests did not account for this in the expected value when shifting beyond the scope of the tree. Fix this by setting up the test to succeed depending on the node entries which is dependent on the 32/64 bit setup. Link: https://lkml.kernel.org/r/20230712173916.168805-1-Liam.Howlett@oracle.com Fixes: 120b116208a0 ("maple_tree: reorganize testing to restore module testing") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/linux-mm/CAMuHMdV4T53fOw7VPoBgPR7fP6RYqf=CBhD_y_vOg53zZX_DnA@mail.gmail.com/ Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-17maple_tree: set the node limit when creating a new root nodePeng Zhang
Set the node limit of the root node so that the last pivot of all nodes is the node limit (if the node is not full). This patch also fixes a bug in mas_rev_awalk(). Effectively, always setting a maximum makes mas_logical_pivot() behave as mas_safe_pivot(). Without this fix, it is possible that very small tasks would fail to find the correct gap. Although this has not been observed with real tasks, it has been reported to happen in m68k nommu running the maple tree tests. Link: https://lkml.kernel.org/r/20230711035444.526-1-zhangpeng.00@bytedance.com Link: https://lore.kernel.org/linux-mm/CAMuHMdV4T53fOw7VPoBgPR7fP6RYqf=CBhD_y_vOg53zZX_DnA@mail.gmail.com/ Link: https://lkml.kernel.org/r/20230711035444.526-2-zhangpeng.00@bytedance.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-10iov_iter: Mark copy_iovec_from_user() noclonePeter Zijlstra
Extend commit 50f9a76ef127 ("iov_iter: Mark copy_compat_iovec_from_user() noinline") to also cover copy_iovec_from_user(). Different compiler versions cause the same problem on different functions. lib/iov_iter.o: warning: objtool: .altinstr_replacement+0x1f: redundant UACCESS disable lib/iov_iter.o: warning: objtool: iovec_from_user+0x84: call to copy_iovec_from_user.part.0() with UACCESS enabled lib/iov_iter.o: warning: objtool: __import_iovec+0x143: call to copy_iovec_from_user.part.0() with UACCESS enabled Fixes: 50f9a76ef127 ("iov_iter: Mark copy_compat_iovec_from_user() noinline") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lkml.kernel.org/r/20230616124354.GD4253@hirez.programming.kicks-ass.net
2023-07-08Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "16 hotfixes. Six are cc:stable and the remainder address post-6.4 issues" The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since it was all hopefully fixed in mainline. * tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: lib: dhry: fix sleeping allocations inside non-preemptable section kasan, slub: fix HW_TAGS zeroing with slub_debug kasan: fix type cast in memory_is_poisoned_n mailmap: add entries for Heiko Stuebner mailmap: update manpage link bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page MAINTAINERS: add linux-next info mailmap: add Markus Schneider-Pargmann writeback: account the number of pages written back mm: call arch_swap_restore() from do_swap_page() squashfs: fix cache race with migration mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison docs: update ocfs2-devel mailing list address MAINTAINERS: update ocfs2-devel mailing list address mm: disable CONFIG_PER_VMA_LOCK until its fixed fork: lock VMAs of the parent process when forking
2023-07-08Merge tag 'hardening-v6.5-rc1-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Check for NULL bdev in LoadPin (Matthias Kaehlcke) - Revert unwanted KUnit FORTIFY build default - Fix 1-element array causing boot warnings with xhci-hub * tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: usb: ch9: Replace bmSublinkSpeedAttr 1-element array with flexible array Revert "fortify: Allow KUnit test to build without FORTIFY" dm: verity-loadpin: Add NULL pointer check for 'bdev' parameter
2023-07-08Merge tag 'bitmap-6.5-rc1' of https://github.com/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: "Fixes for different bitmap pieces: - lib/test_bitmap: increment failure counter properly The tests that don't use expect_eq() macro to determine that a test is failured must increment failed_tests explicitly. - lib/bitmap: drop optimization of bitmap_{from,to}_arr64 bitmap_{from,to}_arr64() optimization is overly optimistic on 32-bit LE architectures when it's wired to bitmap_copy_clear_tail(). - nodemask: Drop duplicate check in for_each_node_mask() As the return value type of first_node() became unsigned, the node >= 0 became unnecessary. - cpumask: fix function description kernel-doc notation - MAINTAINERS: Add bits.h and bitfield.h to the BITMAP API record Add linux/bits.h and linux/bitfield.h for visibility" * tag 'bitmap-6.5-rc1' of https://github.com/norov/linux: MAINTAINERS: Add bitfield.h to the BITMAP API record MAINTAINERS: Add bits.h to the BITMAP API record cpumask: fix function description kernel-doc notation nodemask: Drop duplicate check in for_each_node_mask() lib/bitmap: drop optimization of bitmap_{from,to}_arr64 lib/test_bitmap: increment failure counter properly
2023-07-08lib: dhry: fix sleeping allocations inside non-preemptable sectionGeert Uytterhoeven
The Smatch static checker reports the following warnings: lib/dhry_run.c:38 dhry_benchmark() warn: sleeping in atomic context lib/dhry_run.c:43 dhry_benchmark() warn: sleeping in atomic context Indeed, dhry() does sleeping allocations inside the non-preemptable section delimited by get_cpu()/put_cpu(). Fix this by using atomic allocations instead. Add error handling, as atomic these allocations may fail. Link: https://lkml.kernel.org/r/bac6d517818a7cd8efe217c1ad649fffab9cc371.1688568764.git.geert+renesas@glider.be Fixes: 13684e966d46283e ("lib: dhry: fix unstable smp_processor_id(_) usage") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/0469eb3a-02eb-4b41-b189-de20b931fa56@moroto.mountain Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-06Merge tag 'asm-generic-6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "These are cleanups for architecture specific header files: - the comments in include/linux/syscalls.h have gone out of sync and are really pointless, so these get removed - The asm/bitsperlong.h header no longer needs to be architecture specific on modern compilers, so use a generic version for newer architectures that use new enough userspace compilers - A cleanup for virt_to_pfn/virt_to_bus to have proper type checking, forcing the use of pointers" * tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: syscalls: Remove file path comments from headers tools arch: Remove uapi bitsperlong.h of hexagon and microblaze asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch m68k/mm: Make pfn accessors static inlines arm64: memory: Make virt_to_pfn() a static inline ARM: mm: Make virt_to_pfn() a static inline asm-generic/page.h: Make pfn accessors static inlines xen/netback: Pass (void *) to virt_to_page() netfs: Pass a pointer to virt_to_page() cifs: Pass a pointer to virt_to_page() in cifsglob cifs: Pass a pointer to virt_to_page() riscv: mm: init: Pass a pointer to virt_to_page() ARC: init: Pass a pointer to virt_to_pfn() in init m68k: Pass a pointer to virt_to_pfn() virt_to_page() fs/proc/kcore.c: Pass a pointer to virt_addr_valid()
2023-07-03Revert "fortify: Allow KUnit test to build without FORTIFY"Kees Cook
This reverts commit a9dc8d0442294b426b1ebd4ec6097c82ebe282e0. The standard for KUnit is to not build tests at all when required functionality is missing, rather than doing test "skip". Restore this for the fortify tests, so that architectures without CONFIG_ARCH_HAS_FORTIFY_SOURCE do not emit unsolvable warnings. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/all/CAMuHMdUrxOEroHVUt7-mAnKSBjY=a-D3jr+XiAifuwv06Ob9Pw@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org>
2023-07-03Merge tag 'char-misc-6.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull Char/Misc updates from Greg KH: "Here is the big set of char/misc and other driver subsystem updates for 6.5-rc1. Lots of different, tiny, stuff in here, from a range of smaller driver subsystems, including pulls from some substems directly: - IIO driver updates and additions - W1 driver updates and fixes (and a new maintainer!) - FPGA driver updates and fixes - Counter driver updates - Extcon driver updates - Interconnect driver updates - Coresight driver updates - mfd tree tag merge needed for other updates on top of that, lots of small driver updates as patches, including: - static const updates for class structures - nvmem driver updates - pcmcia driver fix - lots of other small driver updates and fixes All of these have been in linux-next for a while with no reported problems" * tag 'char-misc-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (243 commits) bsr: fix build problem with bsr_class static cleanup comedi: make all 'class' structures const char: xillybus: make xillybus_class a static const structure xilinx_hwicap: make icap_class a static const structure virtio_console: make port class a static const structure ppdev: make ppdev_class a static const structure char: misc: make misc_class a static const structure /dev/mem: make mem_class a static const structure char: lp: make lp_class a static const structure dsp56k: make dsp56k_class a static const structure bsr: make bsr_class a static const structure oradax: make 'cl' a static const structure hwtracing: hisi_ptt: Fix potential sleep in atomic context hwtracing: hisi_ptt: Advertise PERF_PMU_CAP_NO_EXCLUDE for PTT PMU hwtracing: hisi_ptt: Export available filters through sysfs hwtracing: hisi_ptt: Add support for dynamically updating the filter list hwtracing: hisi_ptt: Factor out filter allocation and release operation samples: pfsm: add CC_CAN_LINK dependency misc: fastrpc: check return value of devm_kasprintf() coresight: dummy: Update type of mode parameter in dummy_{sink,source}_enable() ...
2023-06-30Merge tag 'v6.5-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add linear akcipher/sig API - Add tfm cloning (hmac, cmac) - Add statesize to crypto_ahash Algorithms: - Allow only odd e and restrict value in FIPS mode for RSA - Replace LFSR with SHA3-256 in jitter - Add interface for gathering of raw entropy in jitter Drivers: - Fix race on data_avail and actual data in hwrng/virtio - Add hash and HMAC support in starfive - Add RSA algo support in starfive - Add support for PCI device 0x156E in ccp" * tag 'v6.5-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (85 commits) crypto: akcipher - Do not copy dst if it is NULL crypto: sig - Fix verify call crypto: akcipher - Set request tfm on sync path crypto: sm2 - Provide sm2_compute_z_digest when sm2 is disabled hwrng: imx-rngc - switch to DEFINE_SIMPLE_DEV_PM_OPS hwrng: st - keep clock enabled while hwrng is registered hwrng: st - support compile-testing hwrng: imx-rngc - fix the timeout for init and self check KEYS: asymmetric: Use new crypto interface without scatterlists KEYS: asymmetric: Move sm2 code into x509_public_key KEYS: Add forward declaration in asymmetric-parser.h crypto: sig - Add interface for sign/verify crypto: akcipher - Add sync interface without SG lists crypto: cipher - On clone do crypto_mod_get() crypto: api - Add __crypto_alloc_tfmgfp crypto: api - Remove crypto_init_ops() crypto: rsa - allow only odd e and restrict value in FIPS mode crypto: geniv - Split geniv out of AEAD Kconfig option crypto: algboss - Add missing dependency on RNG2 crypto: starfive - Add RSA algo support ...
2023-06-30Merge tag 'probes-v6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: - fprobe: Pass return address to the fprobe entry/exit callbacks so that the callbacks don't need to analyze pt_regs/stack to find the function return address. - kprobe events: cleanup usage of TPARG_FL_FENTRY and TPARG_FL_RETURN flags so that those are not set at once. - fprobe events: - Add a new fprobe events for tracing arbitrary function entry and exit as a trace event. - Add a new tracepoint events for tracing raw tracepoint as a trace event. This allows user to trace non user-exposed tracepoints. - Move eprobe's event parser code into probe event common file. - Introduce BTF (BPF type format) support to kernel probe (kprobe, fprobe and tracepoint probe) events so that user can specify traced function arguments by name. This also applies the type of argument when fetching the argument. - Introduce '$arg*' wildcard support if BTF is available. This expands the '$arg*' meta argument to all function argument automatically. - Check the return value types by BTF. If the function returns 'void', '$retval' is rejected. - Add some selftest script for fprobe events, tracepoint events and BTF support. - Update documentation about the fprobe events. - Some fixes for above features, document and selftests. - selftests for ftrace (in addition to the new fprobe events): - Add a test case for multiple consecutive probes in a function which checks if ftrace based kprobe, optimized kprobe and normal kprobe can be defined in the same target function. - Add a test case for optimized probe, which checks whether kprobe can be optimized or not. * tag 'probes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/probes: Fix tracepoint event with $arg* to fetch correct argument Documentation: Fix typo of reference file name tracing/probes: Fix to return NULL and keep using current argc selftests/ftrace: Add new test case which checks for optimized probes selftests/ftrace: Add new test case which adds multiple consecutive probes in a function Documentation: tracing/probes: Add fprobe event tracing document selftests/ftrace: Add BTF arguments test cases selftests/ftrace: Add tracepoint probe test case tracing/probes: Add BTF retval type support tracing/probes: Add $arg* meta argument for all function args tracing/probes: Support function parameters if BTF is available tracing/probes: Move event parameter fetching code to common parser tracing/probes: Add tracepoint support on fprobe_events selftests/ftrace: Add fprobe related testcases tracing/probes: Add fprobe events for tracing function entry and exit. tracing/probes: Avoid setting TPARG_FL_FENTRY and TPARG_FL_RETURN fprobe: Pass return address to the handlers
2023-06-28Merge tag 'net-next-6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Jakub Kicinski: "WiFi 7 and sendpage changes are the biggest pieces of work for this release. The latter will definitely require fixes but I think that we got it to a reasonable point. Core: - Rework the sendpage & splice implementations Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is Remove the MSG_SENDPAGE_NOTLAST flag completely - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid - Enable socket busy polling with CONFIG_RT - Improve reliability and efficiency of reporting for ref_tracker - Auto-generate a user space C library for various Netlink families Protocols: - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2] - Use per-VMA locking for "page-flipping" TCP receive zerocopy - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO) - Avoid waking up applications using TLS sockets until we have a full record - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch - PPPoE: make number of hash bits configurable - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig) - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge) - Support matching on Connectivity Fault Management (CFM) packets - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug - HSR: don't enable promiscuous mode if device offloads the proto - Support active scanning in IEEE 802.15.4 - Continue work on Multi-Link Operation for WiFi 7 BPF: - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer *should* be, without writing anything - Accept dynptr memory as memory arguments passed to helpers - Add routing table ID to bpf_fib_lookup BPF helper - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only) - Show target_{obj,btf}_id in tracing link fdinfo - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs Netfilter: - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value - Increase ip_vs_conn_tab_bits range for 64BIT builds - Allow updating size of a set - Improve NAT tuple selection when connection is closing Driver API: - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out) - Support configuring rate selection pins of SFP modules - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio) - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message - Split devlink instance and devlink port operations New hardware / drivers: - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers: - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips" * tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits) net: scm: introduce and use scm_recv_unix helper af_unix: Skip SCM_PIDFD if scm->pid is NULL. net: lan743x: Simplify comparison netlink: Add __sock_i_ino() for __netlink_diag_dump(). net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses Revert "af_unix: Call scm_recv() only after scm_set_cred()." phylink: ReST-ify the phylink_pcs_neg_mode() kdoc libceph: Partially revert changes to support MSG_SPLICE_PAGES net: phy: mscc: fix packet loss due to RGMII delays net: mana: use vmalloc_array and vcalloc net: enetc: use vmalloc_array and vcalloc ionic: use vmalloc_array and vcalloc pds_core: use vmalloc_array and vcalloc gve: use vmalloc_array and vcalloc octeon_ep: use vmalloc_array and vcalloc net: usb: qmi_wwan: add u-blox 0x1312 composition perf trace: fix MSG_SPLICE_PAGES build error ipvlan: Fix return value of ipvlan_queue_xmit() netfilter: nf_tables: fix underflow in chain reference counter netfilter: nf_tables: unbind non-anonymous set if rule construction fails ...
2023-06-28Merge tag 'v6.5-rc1-sysctl-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "The changes for sysctl are in line with prior efforts to stop usage of deprecated routines which incur recursion and also make it hard to remove the empty array element in each sysctl array declaration. The most difficult user to modify was parport which required a bit of re-thinking of how to declare shared sysctls there, Joel Granados has stepped up to the plate to do most of this work and eventual removal of register_sysctl_table(). That work ended up saving us about 1465 bytes according to bloat-o-meter. Since we gained a few bloat-o-meter karma points I moved two rather small sysctl arrays from kernel/sysctl.c leaving us only two more sysctl arrays to move left. Most changes have been tested on linux-next for about a month. The last straggler patches are a minor parport fix, changes to the sysctl kernel selftest so to verify correctness and prevent regressions for the future change he made to provide an alternative solution for the special sysctl mount point target which was using the now deprecated sysctl child element. This is all prep work to now finally be able to remove the empty array element in all sysctl declarations / registrations which is expected to save us a bit of bytes all over the kernel. That work will be tested early after v6.5-rc1 is out" * tag 'v6.5-rc1-sysctl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: replace child with an enumeration sysctl: Remove debugging dump_stack test_sysclt: Test for registering a mount point test_sysctl: Add an option to prevent test skip test_sysctl: Add an unregister sysctl test test_sysctl: Group node sysctl test under one func test_sysctl: Fix test metadata getters parport: plug a sysctl register leak sysctl: move security keys sysctl registration to its own file sysctl: move umh sysctl registration to its own file signal: move show_unhandled_signals sysctl to its own file sysctl: remove empty dev table sysctl: Remove register_sysctl_table sysctl: Refactor base paths registrations sysctl: stop exporting register_sysctl_table parport: Removed sysctl related defines parport: Remove register_sysctl_table from parport_default_proc_register parport: Remove register_sysctl_table from parport_device_proc_register parport: Remove register_sysctl_table from parport_proc_register parport: Move magic number "15" to a define