summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2025-07-14don't bother with path_get()/path_put() in unix_open_file()Al Viro
Once unix_sock ->path is set, we are guaranteed that its ->path will remain unchanged (and pinned) until the socket is closed. OTOH, dentry_open() does not modify the path passed to it. IOW, there's no need to copy unix_sk(sk)->path in unix_open_file() - we can just pass it to dentry_open() and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/20250712054157.GZ1880847@ZenIV Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-13rpl: Fix use-after-free in rpl_do_srh_inline().Kuniyuki Iwashima
Running lwt_dst_cache_ref_loop.sh in selftest with KASAN triggers the splat below [0]. rpl_do_srh_inline() fetches ipv6_hdr(skb) and accesses it after skb_cow_head(), which is illegal as the header could be freed then. Let's fix it by making oldhdr to a local struct instead of a pointer. [0]: [root@fedora net]# ./lwt_dst_cache_ref_loop.sh ... TEST: rpl (input) [ 57.631529] ================================================================== BUG: KASAN: slab-use-after-free in rpl_do_srh_inline.isra.0 (net/ipv6/rpl_iptunnel.c:174) Read of size 40 at addr ffff888122bf96d8 by task ping6/1543 CPU: 50 UID: 0 PID: 1543 Comm: ping6 Not tainted 6.16.0-rc5-01302-gfadd1e6231b1 #23 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Call Trace: <IRQ> dump_stack_lvl (lib/dump_stack.c:122) print_report (mm/kasan/report.c:409 mm/kasan/report.c:521) kasan_report (mm/kasan/report.c:221 mm/kasan/report.c:636) kasan_check_range (mm/kasan/generic.c:175 (discriminator 1) mm/kasan/generic.c:189 (discriminator 1)) __asan_memmove (mm/kasan/shadow.c:94 (discriminator 2)) rpl_do_srh_inline.isra.0 (net/ipv6/rpl_iptunnel.c:174) rpl_input (net/ipv6/rpl_iptunnel.c:201 net/ipv6/rpl_iptunnel.c:282) lwtunnel_input (net/core/lwtunnel.c:459) ipv6_rcv (./include/net/dst.h:471 (discriminator 1) ./include/net/dst.h:469 (discriminator 1) net/ipv6/ip6_input.c:79 (discriminator 1) ./include/linux/netfilter.h:317 (discriminator 1) ./include/linux/netfilter.h:311 (discriminator 1) net/ipv6/ip6_input.c:311 (discriminator 1)) __netif_receive_skb_one_core (net/core/dev.c:5967) process_backlog (./include/linux/rcupdate.h:869 net/core/dev.c:6440) __napi_poll.constprop.0 (net/core/dev.c:7452) net_rx_action (net/core/dev.c:7518 net/core/dev.c:7643) handle_softirqs (kernel/softirq.c:579) do_softirq (kernel/softirq.c:480 (discriminator 20)) </IRQ> <TASK> __local_bh_enable_ip (kernel/softirq.c:407) __dev_queue_xmit (net/core/dev.c:4740) ip6_finish_output2 (./include/linux/netdevice.h:3358 ./include/net/neighbour.h:526 ./include/net/neighbour.h:540 net/ipv6/ip6_output.c:141) ip6_finish_output (net/ipv6/ip6_output.c:215 net/ipv6/ip6_output.c:226) ip6_output (./include/linux/netfilter.h:306 net/ipv6/ip6_output.c:248) ip6_send_skb (net/ipv6/ip6_output.c:1983) rawv6_sendmsg (net/ipv6/raw.c:588 net/ipv6/raw.c:918) __sys_sendto (net/socket.c:714 (discriminator 1) net/socket.c:729 (discriminator 1) net/socket.c:2228 (discriminator 1)) __x64_sys_sendto (net/socket.c:2231) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) RIP: 0033:0x7f68cffb2a06 Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08 RSP: 002b:00007ffefb7c53d0 EFLAGS: 00000202 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000564cd69f10a0 RCX: 00007f68cffb2a06 RDX: 0000000000000040 RSI: 0000564cd69f10a4 RDI: 0000000000000003 RBP: 00007ffefb7c53f0 R08: 0000564cd6a032ac R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000202 R12: 0000564cd69f10a4 R13: 0000000000000040 R14: 00007ffefb7c66e0 R15: 0000564cd69f10a0 </TASK> Allocated by task 1543: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:60 (discriminator 1) mm/kasan/common.c:69 (discriminator 1)) __kasan_slab_alloc (mm/kasan/common.c:319 mm/kasan/common.c:345) kmem_cache_alloc_node_noprof (./include/linux/kasan.h:250 mm/slub.c:4148 mm/slub.c:4197 mm/slub.c:4249) kmalloc_reserve (net/core/skbuff.c:581 (discriminator 88)) __alloc_skb (net/core/skbuff.c:669) __ip6_append_data (net/ipv6/ip6_output.c:1672 (discriminator 1)) ip6_append_data (net/ipv6/ip6_output.c:1859) rawv6_sendmsg (net/ipv6/raw.c:911) __sys_sendto (net/socket.c:714 (discriminator 1) net/socket.c:729 (discriminator 1) net/socket.c:2228 (discriminator 1)) __x64_sys_sendto (net/socket.c:2231) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Freed by task 1543: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:60 (discriminator 1) mm/kasan/common.c:69 (discriminator 1)) kasan_save_free_info (mm/kasan/generic.c:579 (discriminator 1)) __kasan_slab_free (mm/kasan/common.c:271) kmem_cache_free (mm/slub.c:4643 (discriminator 3) mm/slub.c:4745 (discriminator 3)) pskb_expand_head (net/core/skbuff.c:2274) rpl_do_srh_inline.isra.0 (net/ipv6/rpl_iptunnel.c:158 (discriminator 1)) rpl_input (net/ipv6/rpl_iptunnel.c:201 net/ipv6/rpl_iptunnel.c:282) lwtunnel_input (net/core/lwtunnel.c:459) ipv6_rcv (./include/net/dst.h:471 (discriminator 1) ./include/net/dst.h:469 (discriminator 1) net/ipv6/ip6_input.c:79 (discriminator 1) ./include/linux/netfilter.h:317 (discriminator 1) ./include/linux/netfilter.h:311 (discriminator 1) net/ipv6/ip6_input.c:311 (discriminator 1)) __netif_receive_skb_one_core (net/core/dev.c:5967) process_backlog (./include/linux/rcupdate.h:869 net/core/dev.c:6440) __napi_poll.constprop.0 (net/core/dev.c:7452) net_rx_action (net/core/dev.c:7518 net/core/dev.c:7643) handle_softirqs (kernel/softirq.c:579) do_softirq (kernel/softirq.c:480 (discriminator 20)) __local_bh_enable_ip (kernel/softirq.c:407) __dev_queue_xmit (net/core/dev.c:4740) ip6_finish_output2 (./include/linux/netdevice.h:3358 ./include/net/neighbour.h:526 ./include/net/neighbour.h:540 net/ipv6/ip6_output.c:141) ip6_finish_output (net/ipv6/ip6_output.c:215 net/ipv6/ip6_output.c:226) ip6_output (./include/linux/netfilter.h:306 net/ipv6/ip6_output.c:248) ip6_send_skb (net/ipv6/ip6_output.c:1983) rawv6_sendmsg (net/ipv6/raw.c:588 net/ipv6/raw.c:918) __sys_sendto (net/socket.c:714 (discriminator 1) net/socket.c:729 (discriminator 1) net/socket.c:2228 (discriminator 1)) __x64_sys_sendto (net/socket.c:2231) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) The buggy address belongs to the object at ffff888122bf96c0 which belongs to the cache skbuff_small_head of size 704 The buggy address is located 24 bytes inside of freed 704-byte region [ffff888122bf96c0, ffff888122bf9980) The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x122bf8 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x200000000000040(head|node=0|zone=2) page_type: f5(slab) raw: 0200000000000040 ffff888101fc0a00 ffffea000464dc00 0000000000000002 raw: 0000000000000000 0000000080270027 00000000f5000000 0000000000000000 head: 0200000000000040 ffff888101fc0a00 ffffea000464dc00 0000000000000002 head: 0000000000000000 0000000080270027 00000000f5000000 0000000000000000 head: 0200000000000003 ffffea00048afe01 00000000ffffffff 00000000ffffffff head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888122bf9580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888122bf9600: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff888122bf9680: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ^ ffff888122bf9700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888122bf9780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: a7a29f9c361f8 ("net: ipv6: add rpl sr tunnel") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-07-13af_packet: fix soft lockup issue caused by tpacket_snd()Yun Lu
When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for pending_refcnt to decrement to zero before returning. The pending_refcnt is decremented by 1 when the skb->destructor function is called, indicating that the skb has been successfully sent and needs to be destroyed. If an error occurs during this process, the tpacket_snd() function will exit and return error, but pending_refcnt may not yet have decremented to zero. Assuming the next send operation is executed immediately, but there are no available frames to be sent in tx_ring (i.e., packet_current_frame returns NULL), and skb is also NULL, the function will not execute wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it will enter a do-while loop, waiting for pending_refcnt to be zero. Even if the previous skb has completed transmission, the skb->destructor function can only be invoked in the ksoftirqd thread (assuming NAPI threading is enabled). When both the ksoftirqd thread and the tpacket_snd operation happen to run on the same CPU, and the CPU trapped in the do-while loop without yielding, the ksoftirqd thread will not get scheduled to run. As a result, pending_refcnt will never be reduced to zero, and the do-while loop cannot exit, eventually leading to a CPU soft lockup issue. In fact, skb is true for all but the first iterations of that loop, and as long as pending_refcnt is not zero, even if incremented by a previous call, wait_for_completion_interruptible_timeout() should be executed to yield the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, the execution condition of this function should be modified to check if pending_refcnt is not zero, instead of check skb. - if (need_wait && skb) { + if (need_wait && packet_read_pending(&po->tx_ring)) { As a result, the judgment conditions are duplicated with the end code of the while loop, and packet_read_pending() is a very expensive function. Actually, this loop can only exit when ph is NULL, so the loop condition can be changed to while (1), and in the "ph = NULL" branch, if the subsequent condition of if is not met, the loop can break directly. Now, the loop logic remains the same as origin but is clearer and more obvious. Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET") Cc: stable@kernel.org Suggested-by: LongJun Tang <tanglongjun@kylinos.cn> Signed-off-by: Yun Lu <luyun@kylinos.cn> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-07-13af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd()Yun Lu
Due to the changes in commit 581073f626e3 ("af_packet: do not call packet_read_pending() from tpacket_destruct_skb()"), every time tpacket_destruct_skb() is executed, the skb_completion is marked as completed. When wait_for_completion_interruptible_timeout() returns completed, the pending_refcnt has not yet been reduced to zero. Therefore, when ph is NULL, the wait function may need to be called multiple times until packet_read_pending() finally returns zero. We should call sock_sndtimeo() only once, otherwise the SO_SNDTIMEO constraint could be way off. Fixes: 581073f626e3 ("af_packet: do not call packet_read_pending() from tpacket_destruct_skb()") Cc: stable@kernel.org Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yun Lu <luyun@kylinos.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-07-13net/sched: sch_qfq: Fix race condition on qfq_aggregateXiang Mei
A race condition can occur when 'agg' is modified in qfq_change_agg (called during qfq_enqueue) while other threads access it concurrently. For example, qfq_dump_class may trigger a NULL dereference, and qfq_delete_class may cause a use-after-free. This patch addresses the issue by: 1. Moved qfq_destroy_class into the critical section. 2. Added sch_tree_lock protection to qfq_dump_class and qfq_dump_class_stats. Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-07-11netlink: make sure we allow at least one dump skbJakub Kicinski
Commit under Fixes tightened up the memory accounting for Netlink sockets. Looks like the accounting is too strict for some existing use cases, Marek reported issues with nl80211 / WiFi iw CLI. To reduce number of iterations Netlink dumps try to allocate messages based on the size of the buffer passed to previous recvmsg() calls. If user space uses a larger buffer in recvmsg() than sk_rcvbuf we will allocate an skb we won't be able to queue. Make sure we always allow at least one skb to be queued. Same workaround is already present in netlink_attachskb(). Alternative would be to cap the allocation size to rcvbuf - rmem_alloc but as I said, the workaround is already present in other places. Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/9794af18-4905-46c6-b12c-365ea2f05858@samsung.com Fixes: ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.") Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250711001121.3649033-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-11netlink: Fix rmem check in netlink_broadcast_deliver().Kuniyuki Iwashima
We need to allow queuing at least one skb even when skb is larger than sk->sk_rcvbuf. The cited commit made a mistake while converting a condition in netlink_broadcast_deliver(). Let's correct the rmem check for the allow-one-skb rule. Fixes: ae8f160e7eb24 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250711053208.2965945-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10gre: Fix IPv6 multicast route creation.Guillaume Nault
Use addrconf_add_dev() instead of ipv6_find_idev() in addrconf_gre_config() so that we don't just get the inet6_dev, but also install the default ff00::/8 multicast route. Before commit 3e6a0243ff00 ("gre: Fix again IPv6 link-local address generation."), the multicast route was created at the end of the function by addrconf_add_mroute(). But this code path is now only taken in one particular case (gre devices not bound to a local IP address and in EUI64 mode). For all other cases, the function exits early and addrconf_add_mroute() is not called anymore. Using addrconf_add_dev() instead of ipv6_find_idev() in addrconf_gre_config(), fixes the problem as it will create the default multicast route for all gre devices. This also brings addrconf_gre_config() a bit closer to the normal netdevice IPv6 configuration code (addrconf_dev_config()). Cc: stable@vger.kernel.org Fixes: 3e6a0243ff00 ("gre: Fix again IPv6 link-local address generation.") Reported-by: Aiden Yang <ling@moedove.com> Closes: https://lore.kernel.org/netdev/CANR=AhRM7YHHXVxJ4DmrTNMeuEOY87K2mLmo9KMed1JMr20p6g@mail.gmail.com/ Reviewed-by: Gary Guo <gary@garyguo.net> Tested-by: Gary Guo <gary@garyguo.net> Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/027a923dcb550ad115e6d93ee8bb7d310378bd01.1752070620.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10net: appletalk: Fix device refcount leak in atrtr_create()Kito Xu
When updating an existing route entry in atrtr_create(), the old device reference was not being released before assigning the new device, leading to a device refcount leak. Fix this by calling dev_put() to release the old device reference before holding the new one. Fixes: c7f905f0f6d4 ("[ATALK]: Add missing dev_hold() to atrtr_create().") Signed-off-by: Kito Xu <veritas501@foxmail.com> Link: https://patch.msgid.link/tencent_E1A26771CDAB389A0396D1681A90A49E5D09@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10Merge tag 'wireless-2025-07-10' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Quite a number of fixes still: - mt76 (hadn't sent any fixes so far) - RCU - scanning - decapsulation offload - interface combinations - rt2x00: build fix (bad function pointer prototype) - cfg80211: prevent A-MSDU flipping attacks in mesh - zd1211rw: prevent race ending with NULL ptr deref - cfg80211/mac80211: more S1G fixes - mwifiex: avoid WARN on certain RX frames - mac80211: - avoid stack data leak in WARN cases - fix non-transmitted BSSID search (on certain multi-BSSID APs) - always initialize key list so driver iteration won't crash - fix monitor interface in device restart - fix __free() annotation usage * tag 'wireless-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (26 commits) wifi: mac80211: add the virtual monitor after reconfig complete wifi: mac80211: always initialize sdata::key_list wifi: mac80211: Fix uninitialized variable with __free() in ieee80211_ml_epcs() wifi: mt76: mt792x: Limit the concurrent STA and SoftAP to operate on the same channel wifi: mt76: mt7925: Fix null-ptr-deref in mt7925_thermal_init() wifi: mt76: fix queue assignment for deauth packets wifi: mt76: add a wrapper for wcid access with validation wifi: mt76: mt7921: prevent decap offload config before STA initialization wifi: mt76: mt7925: prevent NULL pointer dereference in mt7925_sta_set_decap_offload() wifi: mt76: mt7925: fix incorrect scan probe IE handling for hw_scan wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan wifi: mt76: mt7925: fix the wrong config for tx interrupt wifi: mt76: Remove RCU section in mt7996_mac_sta_rc_work() wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl() wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl_fixed() wifi: mt76: Move RCU section in mt7996_mcu_set_fixed_field() wifi: mt76: Assume __mt76_connac_mcu_alloc_sta_req runs in atomic context wifi: prevent A-MSDU attacks in mesh networks wifi: rt2x00: fix remove callback type mismatch wifi: mac80211: reject VHT opmode for unsupported channel widths ... ==================== Link: https://patch.msgid.link/20250710122212.24272-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10wifi: mac80211: add the virtual monitor after reconfig completeMiri Korenblit
In reconfig we add the virtual monitor in 2 cases: 1. If we are resuming (it was deleted on suspend) 2. If it was added after an error but before the reconfig (due to the last non-monitor interface removal). In the second case, the removal of the non-monitor interface will succeed but the addition of the virtual monitor will fail, so we add it in the reconfig. The problem is that we mislead the driver to think that this is an existing interface that is getting re-added - while it is actually a completely new interface from the drivers' point of view. Some drivers act differently when a interface is re-added. For example, it might not initialize things because they were already initialized. Such drivers will - in this case - be left with a partialy initialized vif. To fix it, add the virtual monitor after reconfig_complete, so the driver will know that this is a completely new interface. Fixes: 3c3e21e7443b ("mac80211: destroy virtual monitor interface across suspend") Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709233451.648d39b041e8.I2e37b68375278987e303d6c00cc5f3d8334d2f96@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-10wifi: mac80211: always initialize sdata::key_listMiri Korenblit
This is currently not initialized for a virtual monitor, leading to a NULL pointer dereference when - for example - iterating over all the keys of all the vifs. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709233400.8dcefe578497.I4c90a00ae3256520e063199d7f6f2580d5451acf@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-10net/sched: sch_qfq: Fix null-deref in agg_dequeueXiang Mei
To prevent a potential crash in agg_dequeue (net/sched/sch_qfq.c) when cl->qdisc->ops->peek(cl->qdisc) returns NULL, we check the return value before using it, similar to the existing approach in sch_hfsc.c. To avoid code duplication, the following changes are made: 1. Changed qdisc_warn_nonwc(include/net/pkt_sched.h) into a static inline function. 2. Moved qdisc_peek_len from net/sched/sch_hfsc.c to include/net/pkt_sched.h so that sch_qfq can reuse it. 3. Applied qdisc_peek_len in agg_dequeue to avoid crashing. Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250705212143.3982664-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-09rxrpc: Fix oops due to non-existence of prealloc backlog structDavid Howells
If an AF_RXRPC service socket is opened and bound, but calls are preallocated, then rxrpc_alloc_incoming_call() will oops because the rxrpc_backlog struct doesn't get allocated until the first preallocation is made. Fix this by returning NULL from rxrpc_alloc_incoming_call() if there is no backlog struct. This will cause the incoming call to be aborted. Reported-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> Suggested-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Marc Dionne <marc.dionne@auristor.com> cc: Willy Tarreau <w@1wt.eu> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250708211506.2699012-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09rxrpc: Fix bug due to prealloc collisionDavid Howells
When userspace is using AF_RXRPC to provide a server, it has to preallocate incoming calls and assign to them call IDs that will be used to thread related recvmsg() and sendmsg() together. The preallocated call IDs will automatically be attached to calls as they come in until the pool is empty. To the kernel, the call IDs are just arbitrary numbers, but userspace can use the call ID to hold a pointer to prepared structs. In any case, the user isn't permitted to create two calls with the same call ID (call IDs become available again when the call ends) and EBADSLT should result from sendmsg() if an attempt is made to preallocate a call with an in-use call ID. However, the cleanup in the error handling will trigger both assertions in rxrpc_cleanup_call() because the call isn't marked complete and isn't marked as having been released. Fix this by setting the call state in rxrpc_service_prealloc_one() and then marking it as being released before calling the cleanup function. Fixes: 00e907127e6f ("rxrpc: Preallocate peers, conns and calls for incoming service requests") Reported-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250708211506.2699012-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09tcp: refine sk_rcvbuf increase for ooo packetsEric Dumazet
When a passive flow has not been accepted yet, it is not wise to increase sk_rcvbuf when receiving ooo packets. A very busy server might tune down tcp_rmem[1] to better control how much memory can be used by sockets waiting in its listeners accept queues. Fixes: 63ad7dfedfae ("tcp: adjust rcvbuf in presence of reorders") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250707213900.1543248-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net/sched: Abort __tc_modify_qdisc if parent class does not existVictor Nogueira
Lion's patch [1] revealed an ancient bug in the qdisc API. Whenever a user creates/modifies a qdisc specifying as a parent another qdisc, the qdisc API will, during grafting, detect that the user is not trying to attach to a class and reject. However grafting is performed after qdisc_create (and thus the qdiscs' init callback) is executed. In qdiscs that eventually call qdisc_tree_reduce_backlog during init or change (such as fq, hhf, choke, etc), an issue arises. For example, executing the following commands: sudo tc qdisc add dev lo root handle a: htb default 2 sudo tc qdisc add dev lo parent a: handle beef fq Qdiscs such as fq, hhf, choke, etc unconditionally invoke qdisc_tree_reduce_backlog() in their control path init() or change() which then causes a failure to find the child class; however, that does not stop the unconditional invocation of the assumed child qdisc's qlen_notify with a null class. All these qdiscs make the assumption that class is non-null. The solution is ensure that qdisc_leaf() which looks up the parent class, and is invoked prior to qdisc_create(), should return failure on not finding the class. In this patch, we leverage qdisc_leaf to return ERR_PTRs whenever the parentid doesn't correspond to a class, so that we can detect it earlier on and abort before qdisc_create is called. [1] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/ Fixes: 5e50da01d0ce ("[NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs") Reported-by: syzbot+d8b58d7b0ad89a678a16@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68663c93.a70a0220.5d25f.0857.GAE@google.com/ Reported-by: syzbot+5eccb463fa89309d8bdc@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68663c94.a70a0220.5d25f.0858.GAE@google.com/ Reported-by: syzbot+1261670bbdefc5485a06@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0013.GAE@google.com/ Reported-by: syzbot+15b96fc3aac35468fe77@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0014.GAE@google.com/ Reported-by: syzbot+4dadc5aecf80324d5a51@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68679e81.a70a0220.29cf51.0016.GAE@google.com/ Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250707210801.372995-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09atm: clip: Fix NULL pointer dereference in vcc_sendmsg()Yue Haibing
atmarpd_dev_ops does not implement the send method, which may cause crash as bellow. BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: Oops: 0010 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5324 Comm: syz.0.0 Not tainted 6.15.0-rc6-syzkaller-00346-g5723cc3450bc #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0018:ffffc9000d3cf778 EFLAGS: 00010246 RAX: 1ffffffff1910dd1 RBX: 00000000000000c0 RCX: dffffc0000000000 RDX: ffffc9000dc82000 RSI: ffff88803e4c4640 RDI: ffff888052cd0000 RBP: ffffc9000d3cf8d0 R08: ffff888052c9143f R09: 1ffff1100a592287 R10: dffffc0000000000 R11: 0000000000000000 R12: 1ffff92001a79f00 R13: ffff888052cd0000 R14: ffff88803e4c4640 R15: ffffffff8c886e88 FS: 00007fbc762566c0(0000) GS:ffff88808d6c2000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000041f1b000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> vcc_sendmsg+0xa10/0xc50 net/atm/common.c:644 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg+0x219/0x270 net/socket.c:727 ____sys_sendmsg+0x52d/0x830 net/socket.c:2566 ___sys_sendmsg+0x21f/0x2a0 net/socket.c:2620 __sys_sendmmsg+0x227/0x430 net/socket.c:2709 __do_sys_sendmmsg net/socket.c:2736 [inline] __se_sys_sendmmsg net/socket.c:2733 [inline] __x64_sys_sendmmsg+0xa0/0xc0 net/socket.c:2733 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+e34e5e6b5eddb0014def@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/682f82d5.a70a0220.1765ec.0143.GAE@google.com/T Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250705085228.329202-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09atm: clip: Fix infinite recursive call of clip_push().Kuniyuki Iwashima
syzbot reported the splat below. [0] This happens if we call ioctl(ATMARP_MKIP) more than once. During the first call, clip_mkip() sets clip_push() to vcc->push(), and the second call copies it to clip_vcc->old_push(). Later, when the socket is close()d, vcc_destroy_socket() passes NULL skb to clip_push(), which calls clip_vcc->old_push(), triggering the infinite recursion. Let's prevent the second ioctl(ATMARP_MKIP) by checking vcc->user_back, which is allocated by the first call as clip_vcc. Note also that we use lock_sock() to prevent racy calls. [0]: BUG: TASK stack guard page was hit at ffffc9000d66fff8 (stack is ffffc9000d670000..ffffc9000d678000) Oops: stack guard page: 0000 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5322 Comm: syz.0.0 Not tainted 6.16.0-rc4-syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:clip_push+0x5/0x720 net/atm/clip.c:191 Code: e0 8f aa 8c e8 1c ad 5b fa eb ae 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 55 <41> 57 41 56 41 55 41 54 53 48 83 ec 20 48 89 f3 49 89 fd 48 bd 00 RSP: 0018:ffffc9000d670000 EFLAGS: 00010246 RAX: 1ffff1100235a4a5 RBX: ffff888011ad2508 RCX: ffff8880003c0000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888037f01000 RBP: dffffc0000000000 R08: ffffffff8fa104f7 R09: 1ffffffff1f4209e R10: dffffc0000000000 R11: ffffffff8a99b300 R12: ffffffff8a99b300 R13: ffff888037f01000 R14: ffff888011ad2500 R15: ffff888037f01578 FS: 000055557ab6d500(0000) GS:ffff88808d250000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc9000d66fff8 CR3: 0000000043172000 CR4: 0000000000352ef0 Call Trace: <TASK> clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 ... clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 vcc_destroy_socket net/atm/common.c:183 [inline] vcc_release+0x157/0x460 net/atm/common.c:205 __sock_release net/socket.c:647 [inline] sock_close+0xc0/0x240 net/socket.c:1391 __fput+0x449/0xa70 fs/file_table.c:465 task_work_run+0x1d1/0x260 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xec/0x110 kernel/entry/common.c:114 exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline] do_syscall_64+0x2bd/0x3b0 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff31c98e929 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fffb5aa1f78 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 RAX: 0000000000000000 RBX: 0000000000012747 RCX: 00007ff31c98e929 RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003 RBP: 00007ff31cbb7ba0 R08: 0000000000000001 R09: 0000000db5aa226f R10: 00007ff31c7ff030 R11: 0000000000000246 R12: 00007ff31cbb608c R13: 00007ff31cbb6080 R14: ffffffffffffffff R15: 00007fffb5aa2090 </TASK> Modules linked in: Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+0c77cccd6b7cd917b35a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2371d94d248d126c1eb1 Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250704062416.1613927-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09atm: clip: Fix memory leak of struct clip_vcc.Kuniyuki Iwashima
ioctl(ATMARP_MKIP) allocates struct clip_vcc and set it to vcc->user_back. The code assumes that vcc_destroy_socket() passes NULL skb to vcc->push() when the socket is close()d, and then clip_push() frees clip_vcc. However, ioctl(ATMARPD_CTRL) sets NULL to vcc->push() in atm_init_atmarp(), resulting in memory leak. Let's serialise two ioctl() by lock_sock() and check vcc->push() in atm_init_atmarp() to prevent memleak. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250704062416.1613927-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09atm: clip: Fix potential null-ptr-deref in to_atmarpd().Kuniyuki Iwashima
atmarpd is protected by RTNL since commit f3a0592b37b8 ("[ATM]: clip causes unregister hang"). However, it is not enough because to_atmarpd() is called without RTNL, especially clip_neigh_solicit() / neigh_ops->solicit() is unsleepable. Also, there is no RTNL dependency around atmarpd. Let's use a private mutex and RCU to protect access to atmarpd in to_atmarpd(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250704062416.1613927-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09wifi: mac80211: Fix uninitialized variable with __free() in ieee80211_ml_epcs()Pagadala Yesu Anjaneyulu
The cleanup attribute runs kfree() when the variable goes out of scope. There is a possibility that the link_elems variable is uninitialized if the loop ends before an assignment is made to this variable. This leads to uninitialized variable bug. Fix this by assigning link_elems to NULL. Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213231.eeacd3738a7b.I0f876fa1359daeec47ab3aef098255a9c23efd70@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-08rxrpc: Fix over large frame size warningDavid Howells
Under some circumstances, the compiler will emit the following warning for rxrpc_send_response(): net/rxrpc/output.c: In function 'rxrpc_send_response': net/rxrpc/output.c:974:1: warning: the frame size of 1160 bytes is larger than 1024 bytes This occurs because the local variables include a 16-element scatterlist array and a 16-element bio_vec array. It's probably not actually a problem as this function is only called by the rxrpc I/O thread function in a kernel thread and there won't be much on the stack before it. Fix this by overlaying the bio_vec array over the kvec array in the rxrpc_local struct. There is one of these per I/O thread and the kvec array is intended for pointing at bits of a packet to be transmitted, typically a DATA or an ACK packet. As packets for a local endpoint are only transmitted by its specific I/O thread, there can be no race, and so overlaying this bit of memory should be no problem. Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506240423.E942yKJP-lkp@intel.com/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250707102435.2381045-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08vsock: Fix IOCTL_VM_SOCKETS_GET_LOCAL_CID to check also `transport_local`Michal Luczaj
Support returning VMADDR_CID_LOCAL in case no other vsock transport is available. Fixes: 0e12190578d0 ("vsock: add local transport support in the vsock core") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250703-vsock-transports-toctou-v4-3-98f0eb530747@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08vsock: Fix transport_* TOCTOUMichal Luczaj
Transport assignment may race with module unload. Protect new_transport from becoming a stale pointer. This also takes care of an insecure call in vsock_use_local_transport(); add a lockdep assert. BUG: unable to handle page fault for address: fffffbfff8056000 Oops: Oops: 0000 [#1] SMP KASAN RIP: 0010:vsock_assign_transport+0x366/0x600 Call Trace: vsock_connect+0x59c/0xc40 __sys_connect+0xe8/0x100 __x64_sys_connect+0x6e/0xc0 do_syscall_64+0x92/0x1c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250703-vsock-transports-toctou-v4-2-98f0eb530747@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08vsock: Fix transport_{g2h,h2g} TOCTOUMichal Luczaj
vsock_find_cid() and vsock_dev_do_ioctl() may race with module unload. transport_{g2h,h2g} may become NULL after the NULL check. Introduce vsock_transport_local_cid() to protect from a potential null-ptr-deref. KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f] RIP: 0010:vsock_find_cid+0x47/0x90 Call Trace: __vsock_bind+0x4b2/0x720 vsock_bind+0x90/0xe0 __sys_bind+0x14d/0x1e0 __x64_sys_bind+0x6e/0xc0 do_syscall_64+0x92/0x1c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f] RIP: 0010:vsock_dev_do_ioctl.isra.0+0x58/0xf0 Call Trace: __x64_sys_ioctl+0x12d/0x190 do_syscall_64+0x92/0x1c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250703-vsock-transports-toctou-v4-1-98f0eb530747@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08tcp: Correct signedness in skb remaining space calculationJiayuan Chen
Syzkaller reported a bug [1] where sk->sk_forward_alloc can overflow. When we send data, if an skb exists at the tail of the write queue, the kernel will attempt to append the new data to that skb. However, the code that checks for available space in the skb is flawed: ''' copy = size_goal - skb->len ''' The types of the variables involved are: ''' copy: ssize_t (s64 on 64-bit systems) size_goal: int skb->len: unsigned int ''' Due to C's type promotion rules, the signed size_goal is converted to an unsigned int to match skb->len before the subtraction. The result is an unsigned int. When this unsigned int result is then assigned to the s64 copy variable, it is zero-extended, preserving its non-negative value. Consequently, copy is always >= 0. Assume we are sending 2GB of data and size_goal has been adjusted to a value smaller than skb->len. The subtraction will result in copy holding a very large positive integer. In the subsequent logic, this large value is used to update sk->sk_forward_alloc, which can easily cause it to overflow. The syzkaller reproducer uses TCP_REPAIR to reliably create this condition. However, this can also occur in real-world scenarios. The tcp_bound_to_half_wnd() function can also reduce size_goal to a small value. This would cause the subsequent tcp_wmem_schedule() to set sk->sk_forward_alloc to a value close to INT_MAX. Further memory allocation requests would then cause sk_forward_alloc to wrap around and become negative. [1]: https://syzkaller.appspot.com/bug?extid=de6565462ab540f50e47 Reported-by: syzbot+de6565462ab540f50e47@syzkaller.appspotmail.com Fixes: 270a1c3de47e ("tcp: Support MSG_SPLICE_PAGES") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20250707054112.101081-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08Revert "xfrm: destroy xfrm_state synchronously on net exit path"Sabrina Dubroca
This reverts commit f75a2804da391571563c4b6b29e7797787332673. With all states (whether user or kern) removed from the hashtables during deletion, there's no need for synchronous destruction of states. xfrm6_tunnel states still need to have been destroyed (which will be the case when its last user is deleted (not destroyed)) so that xfrm6_tunnel_free_spi removes it from the per-netns hashtable before the netns is destroyed. This has the benefit of skipping one synchronize_rcu per state (in __xfrm_state_destroy(sync=true)) when we exit a netns. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-07-08xfrm: delete x->tunnel as we delete xSabrina Dubroca
The ipcomp fallback tunnels currently get deleted (from the various lists and hashtables) as the last user state that needed that fallback is destroyed (not deleted). If a reference to that user state still exists, the fallback state will remain on the hashtables/lists, triggering the WARN in xfrm_state_fini. Because of those remaining references, the fix in commit f75a2804da39 ("xfrm: destroy xfrm_state synchronously on net exit path") is not complete. We recently fixed one such situation in TCP due to defered freeing of skbs (commit 9b6412e6979f ("tcp: drop secpath at the same time as we currently drop dst")). This can also happen due to IP reassembly: skbs with a secpath remain on the reassembly queue until netns destruction. If we can't guarantee that the queues are flushed by the time xfrm_state_fini runs, there may still be references to a (user) xfrm_state, preventing the timely deletion of the corresponding fallback state. Instead of chasing each instance of skbs holding a secpath one by one, this patch fixes the issue directly within xfrm, by deleting the fallback state as soon as the last user state depending on it has been deleted. Destruction will still happen when the final reference is dropped. A separate lockdep class for the fallback state is required since we're going to lock x->tunnel while x is locked. Fixes: 9d4139c76905 ("netns xfrm: per-netns xfrm_state_all list") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-07-07Merge tag 'for-net-2025-07-03' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_sync: Fix not disabling advertising instance - hci_core: Remove check of BDADDR_ANY in hci_conn_hash_lookup_big_state - hci_sync: Fix attempting to send HCI_Disconnect to BIS handle - hci_event: Fix not marking Broadcast Sink BIS as connected * tag 'for-net-2025-07-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_event: Fix not marking Broadcast Sink BIS as connected Bluetooth: hci_sync: Fix attempting to send HCI_Disconnect to BIS handle Bluetooth: hci_core: Remove check of BDADDR_ANY in hci_conn_hash_lookup_big_state Bluetooth: hci_sync: Fix not disabling advertising instance ==================== Link: https://patch.msgid.link/20250703160409.1791514-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07tipc: Fix use-after-free in tipc_conn_close().Kuniyuki Iwashima
syzbot reported a null-ptr-deref in tipc_conn_close() during netns dismantle. [0] tipc_topsrv_stop() iterates tipc_net(net)->topsrv->conn_idr and calls tipc_conn_close() for each tipc_conn. The problem is that tipc_conn_close() is called after releasing the IDR lock. At the same time, there might be tipc_conn_recv_work() running and it could call tipc_conn_close() for the same tipc_conn and release its last ->kref. Once we release the IDR lock in tipc_topsrv_stop(), there is no guarantee that the tipc_conn is alive. Let's hold the ref before releasing the lock and put the ref after tipc_conn_close() in tipc_topsrv_stop(). [0]: BUG: KASAN: use-after-free in tipc_conn_close+0x122/0x140 net/tipc/topsrv.c:165 Read of size 8 at addr ffff888099305a08 by task kworker/u4:3/435 CPU: 0 PID: 435 Comm: kworker/u4:3 Not tainted 4.19.204-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1fc/0x2ef lib/dump_stack.c:118 print_address_description.cold+0x54/0x219 mm/kasan/report.c:256 kasan_report_error.cold+0x8a/0x1b9 mm/kasan/report.c:354 kasan_report mm/kasan/report.c:412 [inline] __asan_report_load8_noabort+0x88/0x90 mm/kasan/report.c:433 tipc_conn_close+0x122/0x140 net/tipc/topsrv.c:165 tipc_topsrv_stop net/tipc/topsrv.c:701 [inline] tipc_topsrv_exit_net+0x27b/0x5c0 net/tipc/topsrv.c:722 ops_exit_list+0xa5/0x150 net/core/net_namespace.c:153 cleanup_net+0x3b4/0x8b0 net/core/net_namespace.c:553 process_one_work+0x864/0x1570 kernel/workqueue.c:2153 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296 kthread+0x33f/0x460 kernel/kthread.c:259 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Allocated by task 23: kmem_cache_alloc_trace+0x12f/0x380 mm/slab.c:3625 kmalloc include/linux/slab.h:515 [inline] kzalloc include/linux/slab.h:709 [inline] tipc_conn_alloc+0x43/0x4f0 net/tipc/topsrv.c:192 tipc_topsrv_accept+0x1b5/0x280 net/tipc/topsrv.c:470 process_one_work+0x864/0x1570 kernel/workqueue.c:2153 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296 kthread+0x33f/0x460 kernel/kthread.c:259 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Freed by task 23: __cache_free mm/slab.c:3503 [inline] kfree+0xcc/0x210 mm/slab.c:3822 tipc_conn_kref_release net/tipc/topsrv.c:150 [inline] kref_put include/linux/kref.h:70 [inline] conn_put+0x2cd/0x3a0 net/tipc/topsrv.c:155 process_one_work+0x864/0x1570 kernel/workqueue.c:2153 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296 kthread+0x33f/0x460 kernel/kthread.c:259 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 The buggy address belongs to the object at ffff888099305a00 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 8 bytes inside of 512-byte region [ffff888099305a00, ffff888099305c00) The buggy address belongs to the page: page:ffffea000264c140 count:1 mapcount:0 mapping:ffff88813bff0940 index:0x0 flags: 0xfff00000000100(slab) raw: 00fff00000000100 ffffea00028b6b88 ffffea0002cd2b08 ffff88813bff0940 raw: 0000000000000000 ffff888099305000 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888099305900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888099305980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888099305a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888099305a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888099305b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: c5fa7b3cf3cb ("tipc: introduce new TIPC server infrastructure") Reported-by: syzbot+d333febcf8f4bc5f6110@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=27169a847a70550d17be Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Link: https://patch.msgid.link/20250702014350.692213-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07netlink: Fix wraparounds of sk->sk_rmem_alloc.Kuniyuki Iwashima
Netlink has this pattern in some places if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) atomic_add(skb->truesize, &sk->sk_rmem_alloc); , which has the same problem fixed by commit 5a465a0da13e ("udp: Fix multiple wraparounds of sk->sk_rmem_alloc."). For example, if we set INT_MAX to SO_RCVBUFFORCE, the condition is always false as the two operands are of int. Then, a single socket can eat as many skb as possible until OOM happens, and we can see multiple wraparounds of sk->sk_rmem_alloc. Let's fix it by using atomic_add_return() and comparing the two variables as unsigned int. Before: [root@fedora ~]# ss -f netlink Recv-Q Send-Q Local Address:Port Peer Address:Port -1668710080 0 rtnl:nl_wraparound/293 * After: [root@fedora ~]# ss -f netlink Recv-Q Send-Q Local Address:Port Peer Address:Port 2147483072 0 rtnl:nl_wraparound/290 * ^ `--- INT_MAX - 576 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Jason Baron <jbaron@akamai.com> Closes: https://lore.kernel.org/netdev/cover.1750285100.git.jbaron@akamai.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250704054824.1580222-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07wifi: prevent A-MSDU attacks in mesh networksMathy Vanhoef
This patch is a mitigation to prevent the A-MSDU spoofing vulnerability for mesh networks. The initial update to the IEEE 802.11 standard, in response to the FragAttacks, missed this case (CVE-2025-27558). It can be considered a variant of CVE-2020-24588 but for mesh networks. This patch tries to detect if a standard MSDU was turned into an A-MSDU by an adversary. This is done by parsing a received A-MSDU as a standard MSDU, calculating the length of the Mesh Control header, and seeing if the 6 bytes after this header equal the start of an rfc1042 header. If equal, this is a strong indication of an ongoing attack attempt. This defense was tested with mac80211_hwsim against a mesh network that uses an empty Mesh Address Extension field, i.e., when four addresses are used, and when using a 12-byte Mesh Address Extension field, i.e., when six addresses are used. Functionality of normal MSDUs and A-MSDUs was also tested, and confirmed working, when using both an empty and 12-byte Mesh Address Extension field. It was also tested with mac80211_hwsim that A-MSDU attacks in non-mesh networks keep being detected and prevented. Note that the vulnerability being patched, and the defense being implemented, was also discussed in the following paper and in the following IEEE 802.11 presentation: https://papers.mathyvanhoef.com/wisec2025.pdf https://mentor.ieee.org/802.11/dcn/25/11-25-0949-00-000m-a-msdu-mesh-spoof-protection.docx Cc: stable@vger.kernel.org Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://patch.msgid.link/20250616004635.224344-1-Mathy.Vanhoef@kuleuven.be Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-07wifi: mac80211: reject VHT opmode for unsupported channel widthsMoon Hee Lee
VHT operating mode notifications are not defined for channel widths below 20 MHz. In particular, 5 MHz and 10 MHz are not valid under the VHT specification and must be rejected. Without this check, malformed notifications using these widths may reach ieee80211_chan_width_to_rx_bw(), leading to a WARN_ON due to invalid input. This issue was reported by syzbot. Reject these unsupported widths early in sta_link_apply_parameters() when opmode_notif is used. The accepted set includes 20, 40, 80, 160, and 80+80 MHz, which are valid for VHT. While 320 MHz is not defined for VHT, it is allowed to avoid rejecting HE or EHT clients that may still send a VHT opmode notification. Reported-by: syzbot+ededba317ddeca8b3f08@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ededba317ddeca8b3f08 Fixes: 751e7489c1d7 ("wifi: mac80211: expose ieee80211_chan_width_to_rx_bw() to drivers") Tested-by: syzbot+ededba317ddeca8b3f08@syzkaller.appspotmail.com Signed-off-by: Moon Hee Lee <moonhee.lee.ca@gmail.com> Link: https://patch.msgid.link/20250703193756.46622-2-moonhee.lee.ca@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-07wifi: mac80211: fix non-transmitted BSSID profile searchJohannes Berg
When the non-transmitted BSSID profile is found, immediately return from the search to not return the wrong profile_len when the profile is found in a multiple BSSID element that isn't the last one in the frame. Fixes: 5023b14cf4df ("mac80211: support profile split between elements") Reported-by: Michael-CY Lee <michael-cy.lee@mediatek.com> Link: https://patch.msgid.link/20250630154501.f26cd45a0ecd.I28e0525d06e8a99e555707301bca29265cf20dc8@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-07wifi: mac80211: clear frame buffer to never leak stackJohannes Berg
In disconnect paths paths, local frame buffers are used to build deauthentication frames to send them over the air and as notifications to userspace. Some internal error paths (that, given no other bugs, cannot happen) don't always initialize the buffers before sending them to userspace, so in the presence of other bugs they can leak stack content. Initialize the buffers to avoid the possibility of this happening. Suggested-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Link: https://patch.msgid.link/20250701072213.13004-2-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-07wifi: mac80211: correctly identify S1G short beaconLachlan Hodges
mac80211 identifies a short beacon by the presence of the next TBTT field, however the standard actually doesn't explicitly state that the next TBTT can't be in a long beacon or even that it is required in a short beacon - and as a result this validation does not work for all vendor implementations. The standard explicitly states that an S1G long beacon shall contain the S1G beacon compatibility element as the first element in a beacon transmitted at a TBTT that is not a TSBTT (Target Short Beacon Transmission Time) as per IEEE80211-2024 11.1.3.10.1. This is validated by 9.3.4.3 Table 9-76 which states that the S1G beacon compatibility element is only allowed in the full set and is not allowed in the minimum set of elements permitted for use within short beacons. Correctly identify short beacons by the lack of an S1G beacon compatibility element as the first element in an S1G beacon frame. Fixes: 9eaffe5078ca ("cfg80211: convert S1G beacon to scan results") Signed-off-by: Simon Wadsworth <simon@morsemicro.com> Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250701075541.162619-1-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-04libceph: Rename hmac_sha256() to ceph_hmac_sha256()Eric Biggers
Rename hmac_sha256() to ceph_hmac_sha256(), to avoid a naming conflict with the upcoming hmac_sha256() library function. This code will be able to use the HMAC-SHA256 library, but that's left for a later commit. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250630160645.3198-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFDAlexander Mikhalitsyn
Now everything is ready to pass PIDFD_STALE to pidfd_prepare(). This will allow opening pidfd for reaped tasks. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-7-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04af_unix: stash pidfs dentry when neededAlexander Mikhalitsyn
We need to ensure that pidfs dentry is allocated when we meet any struct pid for the first time. This will allows us to open pidfd even after the task it corresponds to is reaped. Basically, we need to identify all places where we fill skb/scm_cookie with struct pid reference for the first time and call pidfs_register_pid(). Tricky thing here is that we have a few places where this happends depending on what userspace is doing: - [__scm_replace_pid()] explicitly sending an SCM_CREDENTIALS message and specified pid in a numeric format - [unix_maybe_add_creds()] enabled SO_PASSCRED/SO_PASSPIDFD but didn't send SCM_CREDENTIALS explicitly - [scm_send()] force_creds is true. Netlink case, we don't need to touch it. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-6-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04af_unix/scm: fix whitespace errorsAlexander Mikhalitsyn
Fix whitespace/formatting errors. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-5-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04af_unix: introduce and use scm_replace_pid() helperAlexander Mikhalitsyn
Existing logic in __scm_send() related to filling an struct scm_cookie with a proper struct pid reference is already pretty tricky. Let's simplify it a bit by introducing a new helper. This helper will be extended in one of the next patches. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-4-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04af_unix: introduce unix_skb_to_scm helperAlexander Mikhalitsyn
Instead of open-coding let's consolidate this logic in a separate helper. This will simplify further changes. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-3-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04af_unix: rework unix_maybe_add_creds() to allow sleepAlexander Mikhalitsyn
As a preparation for the next patches we need to allow sleeping in unix_maybe_add_creds() and also return err. Currently, we can't do that as unix_maybe_add_creds() is being called under unix_state_lock(). There is no need for this, really. So let's move call sites of this helper a bit and do necessary function signature changes. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-2-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04xfrm: interface: fix use-after-free after changing collect_md xfrm interfaceEyal Birger
collect_md property on xfrm interfaces can only be set on device creation, thus xfrmi_changelink() should fail when called on such interfaces. The check to enforce this was done only in the case where the xi was returned from xfrmi_locate() which doesn't look for the collect_md interface, and thus the validation was never reached. Calling changelink would thus errornously place the special interface xi in the xfrmi_net->xfrmi hash, but since it also exists in the xfrmi_net->collect_md_xfrmi pointer it would lead to a double free when the net namespace was taken down [1]. Change the check to use the xi from netdev_priv which is available earlier in the function to prevent changes in xfrm collect_md interfaces. [1] resulting oops: [ 8.516540] kernel BUG at net/core/dev.c:12029! [ 8.516552] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 8.516559] CPU: 0 UID: 0 PID: 12 Comm: kworker/u80:0 Not tainted 6.15.0-virtme #5 PREEMPT(voluntary) [ 8.516565] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 8.516569] Workqueue: netns cleanup_net [ 8.516579] RIP: 0010:unregister_netdevice_many_notify+0x101/0xab0 [ 8.516590] Code: 90 0f 0b 90 48 8b b0 78 01 00 00 48 8b 90 80 01 00 00 48 89 56 08 48 89 32 4c 89 80 78 01 00 00 48 89 b8 80 01 00 00 eb ac 90 <0f> 0b 48 8b 45 00 4c 8d a0 88 fe ff ff 48 39 c5 74 5c 41 80 bc 24 [ 8.516593] RSP: 0018:ffffa93b8006bd30 EFLAGS: 00010206 [ 8.516598] RAX: ffff98fe4226e000 RBX: ffffa93b8006bd58 RCX: ffffa93b8006bc60 [ 8.516601] RDX: 0000000000000004 RSI: 0000000000000000 RDI: dead000000000122 [ 8.516603] RBP: ffffa93b8006bdd8 R08: dead000000000100 R09: ffff98fe4133c100 [ 8.516605] R10: 0000000000000000 R11: 00000000000003d2 R12: ffffa93b8006be00 [ 8.516608] R13: ffffffff96c1a510 R14: ffffffff96c1a510 R15: ffffa93b8006be00 [ 8.516615] FS: 0000000000000000(0000) GS:ffff98fee73b7000(0000) knlGS:0000000000000000 [ 8.516619] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.516622] CR2: 00007fcd2abd0700 CR3: 000000003aa40000 CR4: 0000000000752ef0 [ 8.516625] PKRU: 55555554 [ 8.516627] Call Trace: [ 8.516632] <TASK> [ 8.516635] ? rtnl_is_locked+0x15/0x20 [ 8.516641] ? unregister_netdevice_queue+0x29/0xf0 [ 8.516650] ops_undo_list+0x1f2/0x220 [ 8.516659] cleanup_net+0x1ad/0x2e0 [ 8.516664] process_one_work+0x160/0x380 [ 8.516673] worker_thread+0x2aa/0x3c0 [ 8.516679] ? __pfx_worker_thread+0x10/0x10 [ 8.516686] kthread+0xfb/0x200 [ 8.516690] ? __pfx_kthread+0x10/0x10 [ 8.516693] ? __pfx_kthread+0x10/0x10 [ 8.516697] ret_from_fork+0x82/0xf0 [ 8.516705] ? __pfx_kthread+0x10/0x10 [ 8.516709] ret_from_fork_asm+0x1a/0x30 [ 8.516718] </TASK> Fixes: abc340b38ba2 ("xfrm: interface: support collect metadata mode") Reported-by: Lonial Con <kongln9170@gmail.com> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-07-03Merge tag 'net-6.16-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth. Current release - new code bugs: - eth: - txgbe: fix the issue of TX failure - ngbe: specify IRQ vector when the number of VFs is 7 Previous releases - regressions: - sched: always pass notifications when child class becomes empty - ipv4: fix stat increase when udp early demux drops the packet - bluetooth: prevent unintended pause by checking if advertising is active - virtio: fix error reporting in virtqueue_resize - eth: - virtio-net: - ensure the received length does not exceed allocated size - fix the xsk frame's length check - lan78xx: fix WARN in __netif_napi_del_locked on disconnect Previous releases - always broken: - bluetooth: mesh: check instances prior disabling advertising - eth: - idpf: convert control queue mutex to a spinlock - dpaa2: fix xdp_rxq_info leak - amd-xgbe: align CL37 AN sequence as per databook" * tag 'net-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) vsock/vmci: Clear the vmci transport packet properly when initializing it dt-bindings: net: sophgo,sg2044-dwmac: Drop status from the example net: ngbe: specify IRQ vector when the number of VFs is 7 net: wangxun: revert the adjustment of the IRQ vector sequence net: txgbe: request MISC IRQ in ndo_open virtio_net: Enforce minimum TX ring size for reliability virtio_net: Cleanup '2+MAX_SKB_FRAGS' virtio_ring: Fix error reporting in virtqueue_resize virtio-net: xsk: rx: fix the frame's length check virtio-net: use the check_mergeable_len helper virtio-net: remove redundant truesize check with PAGE_SIZE virtio-net: ensure the received length does not exceed allocated size net: ipv4: fix stat increase when udp early demux drops the packet net: libwx: fix the incorrect display of the queue number amd-xgbe: do not double read link status net/sched: Always pass notifications when child class becomes empty nui: Fix dma_mapping_error() check rose: fix dangling neighbour pointers in rose_rt_device_down() enic: fix incorrect MTU comparison in enic_change_mtu() amd-xgbe: align CL37 AN sequence as per databook ...
2025-07-03Bluetooth: hci_event: Fix not marking Broadcast Sink BIS as connectedLuiz Augusto von Dentz
Upon receiving HCI_EVT_LE_BIG_SYNC_ESTABLISHED with status 0x00 (success) the corresponding BIS hci_conn state shall be set to BT_CONNECTED otherwise they will be left with BT_OPEN which is invalid at that point, also create the debugfs and sysfs entries following the same logic as the likes of Broadcast Source BIS and CIS connections. Fixes: f777d8827817 ("Bluetooth: ISO: Notify user space about failed bis connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-03Bluetooth: hci_sync: Fix attempting to send HCI_Disconnect to BIS handleLuiz Augusto von Dentz
BIS/PA connections do have their own cleanup proceedure which are performed by hci_conn_cleanup/bis_cleanup. Fixes: 23205562ffc8 ("Bluetooth: separate CIS_LINK and BIS_LINK link types") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-03Bluetooth: hci_sync: Fix not disabling advertising instanceLuiz Augusto von Dentz
As the code comments on hci_setup_ext_adv_instance_sync suggests the advertising instance needs to be disabled in order to update its parameters, but it was wrongly checking that !adv->pending. Fixes: cba6b758711c ("Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 2") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-03vsock/vmci: Clear the vmci transport packet properly when initializing itHarshaVardhana S A
In vmci_transport_packet_init memset the vmci_transport_packet before populating the fields to avoid any uninitialised data being left in the structure. Cc: Bryan Tan <bryan-bt.tan@broadcom.com> Cc: Vishnu Dasa <vishnu.dasa@broadcom.com> Cc: Broadcom internal kernel review list Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: virtualization@lists.linux.dev Cc: netdev@vger.kernel.org Cc: stable <stable@kernel.org> Signed-off-by: HarshaVardhana S A <harshavardhana.sa@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250701122254.2397440-1-gregkh@linuxfoundation.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>