summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2025-02-17Merge commit '71f0dd5a3293d75d26d405ffbaedfdda4836af32' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next into for-6.15/io_uring-rx-zc Merge networking zerocopy receive tree, to get the prep patches for the io_uring rx zc support. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (63 commits) net: add helpers for setting a memory provider on an rx queue net: page_pool: add memory provider helpers net: prepare for non devmem TCP memory providers net: page_pool: add a mp hook to unregister_netdevice* net: page_pool: add callback for mp info printing netdev: add io_uring memory provider info net: page_pool: create hooks for custom memory providers net: generalise net_iov chunk owners net: prefix devmem specific helpers net: page_pool: don't cast mp param to devmem tools: ynl: add all headers to makefile deps eth: fbnic: set IFF_UNICAST_FLT to avoid enabling promiscuous mode when adding unicast addrs eth: fbnic: add MAC address TCAM to debugfs tools: ynl-gen: support limits using definitions tools: ynl-gen: don't output external constants net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled net/mlx5e: Remove unused mlx5e_tc_flow_action struct net/mlx5: Remove stray semicolon in LAG port selection table creation net/mlx5e: Support FEC settings for 200G per lane link modes net/mlx5: Add support for 200Gbps per lane link modes ...
2025-02-13Merge tag 'net-6.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, wireless and bluetooth. Kalle Valo steps down after serving as the WiFi driver maintainer for over a decade. Current release - fix to a fix: - vsock: orphan socket after transport release, avoid null-deref - Bluetooth: L2CAP: fix corrupted list in hci_chan_del Current release - regressions: - eth: - stmmac: correct Rx buffer layout when SPH is enabled - iavf: fix a locking bug in an error path - rxrpc: fix alteration of headers whilst zerocopy pending - s390/qeth: move netif_napi_add_tx() and napi_enable() from under BH - Revert "netfilter: flowtable: teardown flow if cached mtu is stale" Current release - new code bugs: - rxrpc: fix ipv6 path MTU discovery, only ipv4 worked - pse-pd: fix deadlock in current limit functions Previous releases - regressions: - rtnetlink: fix netns refleak with rtnl_setlink() - wifi: brcmfmac: use random seed flag for BCM4355 and BCM4364 firmware Previous releases - always broken: - add missing RCU protection of struct net throughout the stack - can: rockchip: bail out if skb cannot be allocated - eth: ti: am65-cpsw: base XDP support fixes Misc: - ethtool: tsconfig: update the format of hwtstamp flags, changes the uAPI but this uAPI was not in any release yet" * tag 'net-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) net: pse-pd: Fix deadlock in current limit functions rxrpc: Fix ipv6 path MTU discovery Reapply "net: skb: introduce and use a single page frag cache" s390/qeth: move netif_napi_add_tx() and napi_enable() from under BH mlxsw: Add return value check for mlxsw_sp_port_get_stats_raw() ipv6: mcast: add RCU protection to mld_newpack() team: better TEAM_OPTION_TYPE_STRING validation Bluetooth: L2CAP: Fix corrupted list in hci_chan_del Bluetooth: btintel_pcie: Fix a potential race condition Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd net: ethernet: ti: am65_cpsw: fix tx_cleanup for XDP case net: ethernet: ti: am65-cpsw: fix RX & TX statistics for XDP_TX case net: ethernet: ti: am65-cpsw: fix memleak in certain XDP cases vsock/test: Add test for SO_LINGER null ptr deref vsock: Orphan socket after transport release MAINTAINERS: Add sctp headers to the general netdev entry Revert "netfilter: flowtable: teardown flow if cached mtu is stale" iavf: Fix a locking bug in an error path rxrpc: Fix alteration of headers whilst zerocopy pending net: phylink: make configuring clock-stop dependent on MAC support ...
2025-02-13rxrpc: Fix ipv6 path MTU discoveryDavid Howells
rxrpc path MTU discovery currently only makes use of ICMPv4, but not ICMPv6, which means that pmtud for IPv6 doesn't work correctly. Fix it to check for ICMPv6 messages also. Fixes: eeaedc5449d9 ("rxrpc: Implement path-MTU probing using padded PING ACKs (RFC8899)") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/3517283.1739359284@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-13Merge tag 'for-net-2025-02-13' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - btintel_pcie: Fix a potential race condition - L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd - L2CAP: Fix corrupted list in hci_chan_del * tag 'for-net-2025-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Fix corrupted list in hci_chan_del Bluetooth: btintel_pcie: Fix a potential race condition Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd ==================== Link: https://patch.msgid.link/20250213162446.617632-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-13Merge tag 'nf-25-02-13' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains one revert for: 1) Revert flowtable entry teardown cycle when skbuff exceeds mtu to deal with DF flag unset scenarios. This is reverts a patch coming in the previous merge window (available in 6.14-rc releases). * tag 'nf-25-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: Revert "netfilter: flowtable: teardown flow if cached mtu is stale" ==================== Link: https://patch.msgid.link/20250213100502.3983-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-13Reapply "net: skb: introduce and use a single page frag cache"Jakub Kicinski
This reverts commit 011b0335903832facca86cd8ed05d7d8d94c9c76. Sabrina reports that the revert may trigger warnings due to intervening changes, especially the ability to rise MAX_SKB_FRAGS. Let's drop it and revisit once that part is also ironed out. Fixes: 011b03359038 ("Revert "net: skb: introduce and use a single page frag cache"") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/6bf54579233038bc0e76056c5ea459872ce362ab.1739375933.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-13ipv6: mcast: add RCU protection to mld_newpack()Eric Dumazet
mld_newpack() can be called without RTNL or RCU being held. Note that we no longer can use sock_alloc_send_skb() because ipv6.igmp_sk uses GFP_KERNEL allocations which can sleep. Instead use alloc_skb() and charge the net->ipv6.igmp_sk socket under RCU protection. Fixes: b8ad0cbc58f7 ("[NETNS][IPV6] mcast - handle several network namespace") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250212141021.1663666-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-13Bluetooth: L2CAP: Fix corrupted list in hci_chan_delLuiz Augusto von Dentz
This fixes the following trace by reworking the locking of l2cap_conn so instead of only locking when changing the chan_l list this promotes chan_lock to a general lock of l2cap_conn so whenever it is being held it would prevents the likes of l2cap_conn_del to run: list_del corruption, ffff888021297e00->prev is LIST_POISON2 (dead000000000122) ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:61! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 UID: 0 PID: 5896 Comm: syz-executor213 Not tainted 6.14.0-rc1-next-20250204-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024 RIP: 0010:__list_del_entry_valid_or_report+0x12c/0x190 lib/list_debug.c:59 Code: 8c 4c 89 fe 48 89 da e8 32 8c 37 fc 90 0f 0b 48 89 df e8 27 9f 14 fd 48 c7 c7 a0 c0 60 8c 4c 89 fe 48 89 da e8 15 8c 37 fc 90 <0f> 0b 4c 89 e7 e8 0a 9f 14 fd 42 80 3c 2b 00 74 08 4c 89 e7 e8 cb RSP: 0018:ffffc90003f6f998 EFLAGS: 00010246 RAX: 000000000000004e RBX: dead000000000122 RCX: 01454d423f7fbf00 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff819f077c R09: 1ffff920007eded0 R10: dffffc0000000000 R11: fffff520007eded1 R12: dead000000000122 R13: dffffc0000000000 R14: ffff8880352248d8 R15: ffff888021297e00 FS: 00007f7ace6686c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7aceeeb1d0 CR3: 000000003527c000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __list_del_entry_valid include/linux/list.h:124 [inline] __list_del_entry include/linux/list.h:215 [inline] list_del_rcu include/linux/rculist.h:168 [inline] hci_chan_del+0x70/0x1b0 net/bluetooth/hci_conn.c:2858 l2cap_conn_free net/bluetooth/l2cap_core.c:1816 [inline] kref_put include/linux/kref.h:65 [inline] l2cap_conn_put+0x70/0xe0 net/bluetooth/l2cap_core.c:1830 l2cap_sock_shutdown+0xa8a/0x1020 net/bluetooth/l2cap_sock.c:1377 l2cap_sock_release+0x79/0x1d0 net/bluetooth/l2cap_sock.c:1416 __sock_release net/socket.c:642 [inline] sock_close+0xbc/0x240 net/socket.c:1393 __fput+0x3e9/0x9f0 fs/file_table.c:448 task_work_run+0x24f/0x310 kernel/task_work.c:227 ptrace_notify+0x2d2/0x380 kernel/signal.c:2522 ptrace_report_syscall include/linux/ptrace.h:415 [inline] ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline] syscall_exit_work+0xc7/0x1d0 kernel/entry/common.c:173 syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline] syscall_exit_to_user_mode+0x24a/0x340 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f7aceeaf449 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f7ace668218 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: fffffffffffffffc RBX: 00007f7acef39328 RCX: 00007f7aceeaf449 RDX: 000000000000000e RSI: 0000000020000100 RDI: 0000000000000004 RBP: 00007f7acef39320 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 R13: 0000000000000004 R14: 00007f7ace668670 R15: 000000000000000b </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:__list_del_entry_valid_or_report+0x12c/0x190 lib/list_debug.c:59 Code: 8c 4c 89 fe 48 89 da e8 32 8c 37 fc 90 0f 0b 48 89 df e8 27 9f 14 fd 48 c7 c7 a0 c0 60 8c 4c 89 fe 48 89 da e8 15 8c 37 fc 90 <0f> 0b 4c 89 e7 e8 0a 9f 14 fd 42 80 3c 2b 00 74 08 4c 89 e7 e8 cb RSP: 0018:ffffc90003f6f998 EFLAGS: 00010246 RAX: 000000000000004e RBX: dead000000000122 RCX: 01454d423f7fbf00 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff819f077c R09: 1ffff920007eded0 R10: dffffc0000000000 R11: fffff520007eded1 R12: dead000000000122 R13: dffffc0000000000 R14: ffff8880352248d8 R15: ffff888021297e00 FS: 00007f7ace6686c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7acef05b08 CR3: 000000003527c000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Reported-by: syzbot+10bd8fe6741eedd2be2e@syzkaller.appspotmail.com Tested-by: syzbot+10bd8fe6741eedd2be2e@syzkaller.appspotmail.com Fixes: b4f82f9ed43a ("Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
2025-02-13Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmdLuiz Augusto von Dentz
After the hci sync command releases l2cap_conn, the hci receive data work queue references the released l2cap_conn when sending to the upper layer. Add hci dev lock to the hci receive data work queue to synchronize the two. [1] BUG: KASAN: slab-use-after-free in l2cap_send_cmd+0x187/0x8d0 net/bluetooth/l2cap_core.c:954 Read of size 8 at addr ffff8880271a4000 by task kworker/u9:2/5837 CPU: 0 UID: 0 PID: 5837 Comm: kworker/u9:2 Not tainted 6.13.0-rc5-syzkaller-00163-gab75170520d4 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: hci1 hci_rx_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0x169/0x550 mm/kasan/report.c:489 kasan_report+0x143/0x180 mm/kasan/report.c:602 l2cap_build_cmd net/bluetooth/l2cap_core.c:2964 [inline] l2cap_send_cmd+0x187/0x8d0 net/bluetooth/l2cap_core.c:954 l2cap_sig_send_rej net/bluetooth/l2cap_core.c:5502 [inline] l2cap_sig_channel net/bluetooth/l2cap_core.c:5538 [inline] l2cap_recv_frame+0x221f/0x10db0 net/bluetooth/l2cap_core.c:6817 hci_acldata_packet net/bluetooth/hci_core.c:3797 [inline] hci_rx_work+0x508/0xdb0 net/bluetooth/hci_core.c:4040 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Allocated by task 5837: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4329 kmalloc_noprof include/linux/slab.h:901 [inline] kzalloc_noprof include/linux/slab.h:1037 [inline] l2cap_conn_add+0xa9/0x8e0 net/bluetooth/l2cap_core.c:6860 l2cap_connect_cfm+0x115/0x1090 net/bluetooth/l2cap_core.c:7239 hci_connect_cfm include/net/bluetooth/hci_core.h:2057 [inline] hci_remote_features_evt+0x68e/0xac0 net/bluetooth/hci_event.c:3726 hci_event_func net/bluetooth/hci_event.c:7473 [inline] hci_event_packet+0xac2/0x1540 net/bluetooth/hci_event.c:7525 hci_rx_work+0x3f3/0xdb0 net/bluetooth/hci_core.c:4035 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Freed by task 54: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:233 [inline] slab_free_hook mm/slub.c:2353 [inline] slab_free mm/slub.c:4613 [inline] kfree+0x196/0x430 mm/slub.c:4761 l2cap_connect_cfm+0xcc/0x1090 net/bluetooth/l2cap_core.c:7235 hci_connect_cfm include/net/bluetooth/hci_core.h:2057 [inline] hci_conn_failed+0x287/0x400 net/bluetooth/hci_conn.c:1266 hci_abort_conn_sync+0x56c/0x11f0 net/bluetooth/hci_sync.c:5603 hci_cmd_sync_work+0x22b/0x400 net/bluetooth/hci_sync.c:332 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Reported-by: syzbot+31c2f641b850a348a734@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=31c2f641b850a348a734 Tested-by: syzbot+31c2f641b850a348a734@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-02-12vsock: Orphan socket after transport releaseMichal Luczaj
During socket release, sock_orphan() is called without considering that it sets sk->sk_wq to NULL. Later, if SO_LINGER is enabled, this leads to a null pointer dereferenced in virtio_transport_wait_close(). Orphan the socket only after transport release. Partially reverts the 'Fixes:' commit. KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] lock_acquire+0x19e/0x500 _raw_spin_lock_irqsave+0x47/0x70 add_wait_queue+0x46/0x230 virtio_transport_release+0x4e7/0x7f0 __vsock_release+0xfd/0x490 vsock_release+0x90/0x120 __sock_release+0xa3/0x250 sock_close+0x14/0x20 __fput+0x35e/0xa90 __x64_sys_close+0x78/0xd0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Reported-by: syzbot+9d55b199192a4be7d02c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c Fixes: fcdd2242c023 ("vsock: Keep the binding until socket destruction") Tested-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250210-vsock-linger-nullderef-v3-1-ef6244d02b54@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12Revert "netfilter: flowtable: teardown flow if cached mtu is stale"Pablo Neira Ayuso
This reverts commit b8baac3b9c5cc4b261454ff87d75ae8306016ffd. IPv4 packets with no DF flag set on result in frequent flow entry teardown cycles, this is visible in the network topology that is used in the nft_flowtable.sh test. nft_flowtable.sh test ocassionally fails reporting that the dscp_fwd test sees no packets going through the flowtable path. Fixes: b8baac3b9c5c ("netfilter: flowtable: teardown flow if cached mtu is stale") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-02-11rxrpc: Fix alteration of headers whilst zerocopy pendingDavid Howells
rxrpc: Fix alteration of headers whilst zerocopy pending AF_RXRPC now uses MSG_SPLICE_PAGES to do zerocopy of the DATA packets when it transmits them, but to reduce the number of descriptors required in the DMA ring, it allocates a space for the protocol header in the memory immediately before the data content so that it can include both in a single descriptor. This is used for either the main RX header or the smaller jumbo subpacket header as appropriate: +----+------+ | RX | | +-+--+DATA | |JH| | +--+------+ Now, when it stitches a large jumbo packet together from a number of individual DATA packets (each of which is 1412 bytes of data), it uses the full RX header from the first and then the jumbo subpacket header for the rest of the components: +---+--+------+--+------+--+------+--+------+--+------+--+------+ |UDP|RX|DATA |JH|DATA |JH|DATA |JH|DATA |JH|DATA |JH|DATA | +---+--+------+--+------+--+------+--+------+--+------+--+------+ As mentioned, the main RX header and the jumbo header overlay one another in memory and the formats don't match, so switching from one to the other means rearranging the fields and adjusting the flags. However, now that TLP has been included, it wants to retransmit the last subpacket as a new data packet on its own, which means switching between the header formats... and if the transmission is still pending, because of the MSG_SPLICE_PAGES, we end up corrupting the jumbo subheader. This has a variety of effects, with the RX service number overwriting the jumbo checksum/key number field and the RX checksum overwriting the jumbo flags - resulting in, at the very least, a confused connection-level abort from the peer. Fix this by leaving the jumbo header in the allocation with the data, but allocating the RX header from the page frag allocator and concocting it on the fly at the point of transmission as it does for ACK packets. Fixes: 7c482665931b ("rxrpc: Implement RACK/TLP to deal with transmission stalls [RFC8985]") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/2181712.1739131675@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11Merge tag 'batadv-net-pullrequest-20250207' of ↵Paolo Abeni
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - Fix panic during interface removal in BATMAN V, by Andy Strohman - Cleanup BATMAN V/ELP metric handling, by Sven Eckelmann (2 patches) - Fix incorrect offset in batadv_tt_tvlv_ogm_handler_v1(), by Remi Pommarel * tag 'batadv-net-pullrequest-20250207' of git://git.open-mesh.org/linux-merge: batman-adv: Fix incorrect offset in batadv_tt_tvlv_ogm_handler_v1() batman-adv: Drop unmanaged ELP metric worker batman-adv: Ignore neighbor throughput metrics in error case batman-adv: fix panic during interface removal ==================== Link: https://patch.msgid.link/20250207095823.26043-1-sw@simonwunderlich.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-10Merge tag 'linux-can-fixes-for-6.14-20250208' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-02-08 The first patch is by Reyders Morales and fixes a code example in the CAN ISO15765-2 documentation. The next patch is contributed by Alexander Hölzl and fixes sending of J1939 messages with zero data length. Fedor Pchelkin's patch for the ctucanfd driver adds a missing handling for an skb allocation error. Krzysztof Kozlowski contributes a patch for the c_can driver to fix unbalanced runtime PM disable in error path. The next patch is by Vincent Mailhol and fixes a NULL pointer dereference on udev->serial in the etas_es58x driver. The patch is by Robin van der Gracht and fixes the handling for an skb allocation error. * tag 'linux-can-fixes-for-6.14-20250208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: rockchip: rkcanfd_handle_rx_fifo_overflow_int(): bail out if skb cannot be allocated can: etas_es58x: fix potential NULL pointer dereference on udev->serial can: c_can: fix unbalanced runtime PM disable in error path can: ctucanfd: handle skb allocation failure can: j1939: j1939_sk_send_loop(): fix unable to send messages with data length zero Documentation/networking: fix basic node example document ISO 15765-2 ==================== Link: https://patch.msgid.link/20250208115120.237274-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10ipv6: mcast: extend RCU protection in igmp6_send()Eric Dumazet
igmp6_send() can be called without RTNL or RCU being held. Extend RCU protection so that we can safely fetch the net pointer and avoid a potential UAF. Note that we no longer can use sock_alloc_send_skb() because ipv6.igmp_sk uses GFP_KERNEL allocations which can sleep. Instead use alloc_skb() and charge the net->ipv6.igmp_sk socket under RCU protection. Fixes: b8ad0cbc58f7 ("[NETNS][IPV6] mcast - handle several network namespace") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10ndisc: extend RCU protection in ndisc_send_skb()Eric Dumazet
ndisc_send_skb() can be called without RTNL or RCU held. Acquire rcu_read_lock() earlier, so that we can use dev_net_rcu() and avoid a potential UAF. Fixes: 1762f7e88eb3 ("[NETNS][IPV6] ndisc - make socket control per namespace") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10openvswitch: use RCU protection in ovs_vport_cmd_fill_info()Eric Dumazet
ovs_vport_cmd_fill_info() can be called without RTNL or RCU. Use RCU protection and dev_net_rcu() to avoid potential UAF. Fixes: 9354d4520342 ("openvswitch: reliable interface indentification in port dumps") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10arp: use RCU protection in arp_xmit()Eric Dumazet
arp_xmit() can be called without RTNL or RCU protection. Use RCU protection to avoid potential UAF. Fixes: 29a26a568038 ("netfilter: Pass struct net into the netfilter hooks") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10neighbour: use RCU protection in __neigh_notify()Eric Dumazet
__neigh_notify() can be called without RTNL or RCU protection. Use RCU protection to avoid potential UAF. Fixes: 426b5303eb43 ("[NETNS]: Modify the neighbour table code so it handles multiple network namespaces") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10ndisc: use RCU protection in ndisc_alloc_skb()Eric Dumazet
ndisc_alloc_skb() can be called without RTNL or RCU being held. Add RCU protection to avoid possible UAF. Fixes: de09334b9326 ("ndisc: Introduce ndisc_alloc_skb() helper.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10ndisc: ndisc_send_redirect() must use dev_get_by_index_rcu()Eric Dumazet
ndisc_send_redirect() is called under RCU protection, not RTNL. It must use dev_get_by_index_rcu() instead of __dev_get_by_index() Fixes: 2f17becfbea5 ("vrf: check the original netdevice for generating redirect") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10Merge tag 'hid-for-linus-2025021001' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - build/dependency fixes for hid-lenovo and hid-intel-thc (Arnd Bergmann) - functional fixes for hid-corsair-void (Stuart Hayhurst) - workqueue handling and ordering fix for hid-steam (Vicki Pfau) - Gamepad mode vs. Lizard mode fix for hid-steam (Vicki Pfau) - OOB read fix for hid-thrustmaster (Tulio Fernandes) - fix for very long timeout on certain firmware in intel-ish-hid (Zhang Lixu) - other assorted small code fixes and device ID additions * tag 'hid-for-linus-2025021001' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: hid-steam: Don't use cancel_delayed_work_sync in IRQ context HID: hid-steam: Move hidraw input (un)registering to work HID: hid-thrustmaster: fix stack-out-of-bounds read in usb_check_int_endpoints() HID: apple: fix up the F6 key on the Omoton KB066 keyboard HID: hid-apple: Apple Magic Keyboard a3203 USB-C support samples/hid: fix broken vmlinux path for VMLINUX_BTF samples/hid: remove unnecessary -I flags from libbpf EXTRA_CFLAGS HID: topre: Fix n-key rollover on Realforce R3S TKL boards HID: intel-ish-hid: ipc: Add Panther Lake PCI device IDs HID: multitouch: Add NULL check in mt_input_configured HID: winwing: Add NULL check in winwing_init_led() HID: hid-steam: Fix issues with disabling both gamepad mode and lizard mode HID: ignore non-functional sensor in HP 5MP Camera HID: intel-thc: fix CONFIG_HID dependency HID: lenovo: select CONFIG_ACPI_PLATFORM_PROFILE HID: intel-ish-hid: Send clock sync message immediately after reset HID: intel-ish-hid: fix the length of MNG_SYNC_FW_CLOCK in doorbell HID: corsair-void: Initialise memory for psy_cfg HID: corsair-void: Add missing delayed work cancel for headset status
2025-02-08can: j1939: j1939_sk_send_loop(): fix unable to send messages with data ↵Alexander Hölzl
length zero The J1939 standard requires the transmission of messages of length 0. For example proprietary messages are specified with a data length of 0 to 1785. The transmission of such messages is not possible. Sending results in no error being returned but no corresponding can frame being generated. Enable the transmission of zero length J1939 messages. In order to facilitate this two changes are necessary: 1) If the transmission of a new message is requested from user space the message is segmented in j1939_sk_send_loop(). Let the segmentation take into account zero length messages, do not terminate immediately, queue the corresponding skb. 2) j1939_session_skb_get_by_offset() selects the next skb to transmit for a session. Take into account that there might be zero length skbs in the queue. Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250205174651.103238-1-alexander.hoelzl@gmx.net Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Cc: stable@vger.kernel.org [mkl: commit message rephrased] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-02-07Revert "net: skb: introduce and use a single page frag cache"Paolo Abeni
This reverts commit dbae2b062824 ("net: skb: introduce and use a single page frag cache"). The intended goal of such change was to counter a performance regression introduced by commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs"). Unfortunately, the blamed commit introduces another regression for the virtio_net driver. Such a driver calls napi_alloc_skb() with a tiny size, so that the whole head frag could fit a 512-byte block. The single page frag cache uses a 1K fragment for such allocation, and the additional overhead, under small UDP packets flood, makes the page allocator a bottleneck. Thanks to commit bf9f1baa279f ("net: add dedicated kmem_cache for typical/small skb->head"), this revert does not re-introduce the original regression. Actually, in the relevant test on top of this revert, I measure a small but noticeable positive delta, just above noise level. The revert itself required some additional mangling due to the introduction of the SKB_HEAD_ALIGN() helper and local lock infra in the affected code. Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: dbae2b062824 ("net: skb: introduce and use a single page frag cache") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/e649212fde9f0fdee23909ca0d14158d32bb7425.1738877290.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-07net: fib_rules: annotate data-races around rule->[io]ifindexEric Dumazet
rule->iifindex and rule->oifindex can be read without holding RTNL. Add READ_ONCE()/WRITE_ONCE() annotations where needed. Fixes: 32affa5578f0 ("fib: rules: no longer hold RTNL in fib_nl_dumprule()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250206083051.2494877-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-07Merge tag 'vfs-6.14-rc2.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix fsnotify FMODE_NONOTIFY* handling. This also disables fsnotify on all pseudo files by default apart from very select exceptions. This carries a regression risk so we need to watch out and adapt accordingly. However, it is overall a significant improvement over the current status quo where every rando file can get fsnotify enabled. - Cleanup and simplify lockref_init() after recent lockref changes. - Fix vboxfs build with gcc-15. - Add an assert into inode_set_cached_link() to catch corrupt links. - Allow users to also use an empty string check to detect whether a given mount option string was empty or not. - Fix how security options were appended to statmount()'s ->mnt_opt field. - Fix statmount() selftests to always check the returned mask. - Fix uninitialized value in vfs_statx_path(). - Fix pidfs_ioctl() sanity checks to guard against ioctl() overloading and preserve extensibility. * tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: sanity check the length passed to inode_set_cached_link() pidfs: improve ioctl handling fsnotify: disable pre-content and permission events by default selftests: always check mask returned by statmount(2) fsnotify: disable notification by default for all pseudo files fs: fix adding security options to statmount.mnt_opt fsnotify: use accessor to set FMODE_NONOTIFY_* lockref: remove count argument of lockref_init gfs2: switch to lockref_init(..., 1) gfs2: use lockref_init for gl_lockref statmount: let unset strings be empty vboxsf: fix building with GCC 15 fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()
2025-02-07fsnotify: disable notification by default for all pseudo filesAmir Goldstein
Most pseudo files are not applicable for fsnotify events at all, let alone to the new pre-content events. Disable notifications to all files allocated with alloc_file_pseudo() and enable legacy inotify events for the specific cases of pipe and socket, which have known users of inotify events. Pre-content events are also kept disabled for sockets and pipes. Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches") Reported-by: Alex Williamson <alex.williamson@redhat.com> Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wi2pThSVY=zhO=ZKxViBj5QCRX-=AS2+rVknQgJnHXDFg@mail.gmail.com/ Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250203223205.861346-3-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06rtnetlink: fix netns leak with rtnl_setlink()Nicolas Dichtel
A call to rtnl_nets_destroy() is needed to release references taken on netns put in rtnl_nets. CC: stable@vger.kernel.org Fixes: 636af13f213b ("rtnetlink: Register rtnl_dellink() and rtnl_setlink() with RTNL_FLAG_DOIT_PERNET_WIP.") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250205221037.2474426-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06ax25: Fix refcount leak caused by setting SO_BINDTODEVICE sockoptMurad Masimov
If an AX25 device is bound to a socket by setting the SO_BINDTODEVICE socket option, a refcount leak will occur in ax25_release(). Commit 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()") added decrement of device refcounts in ax25_release(). In order for that to work correctly the refcounts must already be incremented when the device is bound to the socket. An AX25 device can be bound to a socket by either calling ax25_bind() or setting SO_BINDTODEVICE socket option. In both cases the refcounts should be incremented, but in fact it is done only in ax25_bind(). This bug leads to the following issue reported by Syzkaller: ================================================================ refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 1 PID: 5932 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31 Modules linked in: CPU: 1 UID: 0 PID: 5932 Comm: syz-executor424 Not tainted 6.13.0-rc4-syzkaller-00110-g4099a71718b0 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31 Call Trace: <TASK> __refcount_dec include/linux/refcount.h:336 [inline] refcount_dec include/linux/refcount.h:351 [inline] ref_tracker_free+0x710/0x820 lib/ref_tracker.c:236 netdev_tracker_free include/linux/netdevice.h:4156 [inline] netdev_put include/linux/netdevice.h:4173 [inline] netdev_put include/linux/netdevice.h:4169 [inline] ax25_release+0x33f/0xa10 net/ax25/af_ax25.c:1069 __sock_release+0xb0/0x270 net/socket.c:640 sock_close+0x1c/0x30 net/socket.c:1408 ... do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f ... </TASK> ================================================================ Fix the implementation of ax25_setsockopt() by adding increment of refcounts for the new device bound, and decrement of refcounts for the old unbound device. Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()") Reported-by: syzbot+33841dc6aa3e1d86b78a@syzkaller.appspotmail.com Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Link: https://patch.msgid.link/20250203091203.1744-1-m.masimov@mt-integration.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06net: ethtool: tsconfig: Fix netlink type of hwtstamp flagsKory Maincent
Fix the netlink type for hardware timestamp flags, which are represented as a bitset of flags. Although only one flag is supported currently, the correct netlink bitset type should be used instead of u32 to keep consistency with other fields. Address this by adding a new named string set description for the hwtstamp flag structure. The code has been introduced in the current release so the uAPI change is still okay. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Fixes: 6e9e2eed4f39 ("net: ethtool: Add support for tsconfig command to get/set hwtstamp config") Link: https://patch.msgid.link/20250205110304.375086-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06Merge branch 'io_uring-zero-copy-rx'Jakub Kicinski
David Wei says: ==================== io_uring zero copy rx This patchset contains net/ patches needed by a new io_uring request implementing zero copy rx into userspace pages, eliminating a kernel to user copy. We configure a page pool that a driver uses to fill a hw rx queue to hand out user pages instead of kernel pages. Any data that ends up hitting this hw rx queue will thus be dma'd into userspace memory directly, without needing to be bounced through kernel memory. 'Reading' data out of a socket instead becomes a _notification_ mechanism, where the kernel tells userspace where the data is. The overall approach is similar to the devmem TCP proposal. This relies on hw header/data split, flow steering and RSS to ensure packet headers remain in kernel memory and only desired flows hit a hw rx queue configured for zero copy. Configuring this is outside of the scope of this patchset. We share netdev core infra with devmem TCP. The main difference is that io_uring is used for the uAPI and the lifetime of all objects are bound to an io_uring instance. Data is 'read' using a new io_uring request type. When done, data is returned via a new shared refill queue. A zero copy page pool refills a hw rx queue from this refill queue directly. Of course, the lifetime of these data buffers are managed by io_uring rather than the networking stack, with different refcounting rules. This patchset is the first step adding basic zero copy support. We will extend this iteratively with new features e.g. dynamically allocated zero copy areas, THP support, dmabuf support, improved copy fallback, general optimisations and more. In terms of netdev support, we're first targeting Broadcom bnxt. Patches aren't included since Taehee Yoo has already sent a more comprehensive patchset adding support in [1]. Google gve should already support this, and Mellanox mlx5 support is WIP pending driver changes. =========== Performance =========== Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet. Test setup: * AMD EPYC 9454 * Broadcom BCM957508 200G * Kernel v6.11 base [2] * liburing fork [3] * kperf fork [4] * 4K MTU * Single TCP flow With application thread + net rx softirq pinned to _different_ cores: +-------------------------------+ | epoll | io_uring | |-----------|-------------------| | 82.2 Gbps | 116.2 Gbps (+41%) | +-------------------------------+ Pinned to _same_ core: +-------------------------------+ | epoll | io_uring | |-----------|-------------------| | 62.6 Gbps | 80.9 Gbps (+29%) | +-------------------------------+ ===== Links ===== Broadcom bnxt support: [1]: https://lore.kernel.org/20241003160620.1521626-8-ap420073@gmail.com Linux kernel branch including io_uring bits: [2]: https://github.com/isilence/linux.git zcrx/v13 liburing for testing: [3]: https://github.com/isilence/liburing.git zcrx/next kperf for testing: [4]: https://git.kernel.dk/kperf.git ==================== Link: https://patch.msgid.link/20250204215622.695511-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06net: add helpers for setting a memory provider on an rx queueDavid Wei
Add helpers that properly prep or remove a memory provider for an rx queue then restart the queue. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-11-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06net: page_pool: add memory provider helpersPavel Begunkov
Add helpers for memory providers to interact with page pools. net_mp_niov_{set,clear}_page_pool() serve to [dis]associate a net_iov with a page pool. If used, the memory provider is responsible to match "set" calls with "clear" once a net_iov is not going to be used by a page pool anymore, changing a page pool, etc. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-10-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06net: prepare for non devmem TCP memory providersPavel Begunkov
There is a good bunch of places in generic paths assuming that the only page pool memory provider is devmem TCP. As we want to reuse the net_iov and provider infrastructure, we need to patch it up and explicitly check the provider type when we branch into devmem TCP code. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-9-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06net: page_pool: add a mp hook to unregister_netdevice*Pavel Begunkov
Devmem TCP needs a hook in unregister_netdevice_many_notify() to upkeep the set tracking queues it's bound to, i.e. ->bound_rxqs. Instead of devmem sticking directly out of the genetic path, add a mp function. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-8-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06net: page_pool: add callback for mp info printingPavel Begunkov
Add a mandatory callback that prints information about the memory provider to netlink. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-7-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06net: page_pool: create hooks for custom memory providersPavel Begunkov
A spin off from the original page pool memory providers patch by Jakub, which allows extending page pools with custom allocators. One of such providers is devmem TCP, and the other is io_uring zerocopy added in following patches. Link: https://lore.kernel.org/netdev/20230707183935.997267-7-kuba@kernel.org/ Co-developed-by: Jakub Kicinski <kuba@kernel.org> # initial mp proposal Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-5-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06net: generalise net_iov chunk ownersPavel Begunkov
Currently net_iov stores a pointer to struct dmabuf_genpool_chunk_owner, which serves as a useful abstraction to share data and provide a context. However, it's too devmem specific, and we want to reuse it for other memory providers, and for that we need to decouple net_iov from devmem. Make net_iov to point to a new base structure called net_iov_area, which dmabuf_genpool_chunk_owner extends. Reviewed-by: Mina Almasry <almasrymina@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-4-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06net: prefix devmem specific helpersPavel Begunkov
Add prefixes to all helpers that are specific to devmem TCP, i.e. net_iov_binding[_id]. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-3-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06net: page_pool: don't cast mp param to devmemPavel Begunkov
page_pool_check_memory_provider() is a generic path and shouldn't assume anything about the actual type of the memory provider argument. It's fine while devmem is the only provider, but cast away the devmem specific binding types to avoid confusion. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-2-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06ipv6: Use RCU in ip6_input()Eric Dumazet
Instead of grabbing rcu_read_lock() from ip6_input_finish(), do it earlier in is caller, so that ip6_input() access to dev_net() can be validated by LOCKDEP. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250205155120.1676781-13-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06ipv6: icmp: convert to dev_net_rcu()Eric Dumazet
icmp6_send() must acquire rcu_read_lock() sooner to ensure the dev_net() call done from a safe context. Other ICMPv6 uses of dev_net() seem safe, change them to dev_net_rcu() to get LOCKDEP support to catch bugs. Fixes: 9a43b709a230 ("[NETNS][IPV6] icmp6 - make icmpv6_socket per namespace") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250205155120.1676781-12-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06ipv6: use RCU protection in ip6_default_advmss()Eric Dumazet
ip6_default_advmss() needs rcu protection to make sure the net structure it reads does not disappear. Fixes: 5578689a4e3c ("[NETNS][IPV6] route6 - make route6 per namespace") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250205155120.1676781-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06flow_dissector: use RCU protection to fetch dev_net()Eric Dumazet
__skb_flow_dissect() can be called from arbitrary contexts. It must extend its RCU protection section to include the call to dev_net(), which can become dev_net_rcu(). This makes sure the net structure can not disappear under us. Fixes: 9b52e3f267a6 ("flow_dissector: handle no-skb use case") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250205155120.1676781-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06ipv4: icmp: convert to dev_net_rcu()Eric Dumazet
__icmp_send() must ensure rcu_read_lock() is held, as spotted by Jakub. Other ICMP uses of dev_net() seem safe, change them to dev_net_rcu() to get LOCKDEP support. Fixes: dde1bc0e6f86 ("[NETNS]: Add namespace for ICMP replying code.") Closes: https://lore.kernel.org/netdev/20250203153633.46ce0337@kernel.org/ Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250205155120.1676781-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06ipv4: use RCU protection in __ip_rt_update_pmtu()Eric Dumazet
__ip_rt_update_pmtu() must use RCU protection to make sure the net structure it reads does not disappear. Fixes: 2fbc6e89b2f1 ("ipv4: Update exception handling for multipath routes via same device") Fixes: 1de6b15a434c ("Namespaceify min_pmtu sysctl") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250205155120.1676781-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06ipv4: use RCU protection in inet_select_addr()Eric Dumazet
inet_select_addr() must use RCU protection to make sure the net structure it reads does not disappear. Fixes: c4544c724322 ("[NETNS]: Process inet_select_addr inside a namespace.") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250205155120.1676781-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06ipv4: use RCU protection in rt_is_expired()Eric Dumazet
rt_is_expired() must use RCU protection to make sure the net structure it reads does not disappear. Fixes: e84f84f27647 ("netns: place rt_genid into struct net") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250205155120.1676781-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06ipv4: use RCU protection in ipv4_default_advmss()Eric Dumazet
ipv4_default_advmss() must use RCU protection to make sure the net structure it reads does not disappear. Fixes: 2e9589ff809e ("ipv4: Namespaceify min_adv_mss sysctl knob") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250205155120.1676781-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc2). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>