summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-07-17net: bridge: Do not offload IGMP/MLD messagesJoseph Huang
Do not offload IGMP/MLD messages as it could lead to IGMP/MLD Reports being unintentionally flooded to Hosts. Instead, let the bridge decide where to send these IGMP/MLD messages. Consider the case where the local host is sending out reports in response to a remote querier like the following: mcast-listener-process (IP_ADD_MEMBERSHIP) \ br0 / \ swp1 swp2 | | QUERIER SOME-OTHER-HOST In the above setup, br0 will want to br_forward() reports for mcast-listener-process's group(s) via swp1 to QUERIER; but since the source hwdom is 0, the report is eligible for tx offloading, and is flooded by hardware to both swp1 and swp2, reaching SOME-OTHER-HOST as well. (Example and illustration provided by Tobias.) Fixes: 472111920f1c ("net: bridge: switchdev: allow the TX data plane forwarding to be offloaded") Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716153551.1830255-1-Joseph.Huang@garmin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge branch ↵Jakub Kicinski
'net-vlan-fix-vlan-0-refcount-imbalance-of-toggling-filtering-during-runtime' Dong Chenchen says: ==================== net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime Fix VLAN 0 refcount imbalance of toggling filtering during runtime. ==================== Link: https://patch.msgid.link/20250716034504.2285203-1-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17selftests: Add test cases for vlan_filter modification during runtimeDong Chenchen
Add test cases for vlan_filter modification during runtime, which may triger null-ptr-ref or memory leak of vlan0. Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Link: https://patch.msgid.link/20250716034504.2285203-3-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtimeDong Chenchen
Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 #61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge tag 'ovpn-net-20250716' of https://github.com/OpenVPN/ovpn-net-nextJakub Kicinski
Antonio Quartulli says: ==================== This bugfix batch includes the following changes: * properly propagate sk mark to skb->mark field * reject unexpected incoming netlink attributes * reset GSO state when moving skb from transport to tunnel layer * tag 'ovpn-net-20250716' of https://github.com/OpenVPN/ovpn-net-next: ovpn: reset GSO metadata after decapsulation ovpn: reject unexpected netlink attributes ovpn: propagate socket mark to skb in UDP ==================== Link: https://patch.msgid.link/20250716115443.16763-1-antonio@openvpn.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17tls: always refresh the queue when reading sockJakub Kicinski
After recent changes in net-next TCP compacts skbs much more aggressively. This unearthed a bug in TLS where we may try to operate on an old skb when checking if all skbs in the queue have matching decrypt state and geometry. BUG: KASAN: slab-use-after-free in tls_strp_check_rcv+0x898/0x9a0 [tls] (net/tls/tls_strp.c:436 net/tls/tls_strp.c:530 net/tls/tls_strp.c:544) Read of size 4 at addr ffff888013085750 by task tls/13529 CPU: 2 UID: 0 PID: 13529 Comm: tls Not tainted 6.16.0-rc5-virtme Call Trace: kasan_report+0xca/0x100 tls_strp_check_rcv+0x898/0x9a0 [tls] tls_rx_rec_wait+0x2c9/0x8d0 [tls] tls_sw_recvmsg+0x40f/0x1aa0 [tls] inet_recvmsg+0x1c3/0x1f0 Always reload the queue, fast path is to have the record in the queue when we wake, anyway (IOW the path going down "if !strp->stm.full_len"). Fixes: 0d87bbd39d7f ("tls: strp: make sure the TCP skbs do not have overlapping data") Link: https://patch.msgid.link/20250716143850.1520292-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17virtio-net: fix recursived rtnl_lock() during probe()Zigit Zo
The deadlock appears in a stack trace like: virtnet_probe() rtnl_lock() virtio_config_changed_work() netdev_notify_peers() rtnl_lock() It happens if the VMM sends a VIRTIO_NET_S_ANNOUNCE request while the virtio-net driver is still probing. The config_work in probe() will get scheduled until virtnet_open() enables the config change notification via virtio_config_driver_enable(). Fixes: df28de7b0050 ("virtio-net: synchronize operstate with admin state on up/down") Signed-off-by: Zigit Zo <zuozhijie@bytedance.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250716115717.1472430-1-zuozhijie@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net/mlx5: Update the list of the PCI supported devicesMaor Gottlieb
Add the upcoming ConnectX-10 device ID to the table of supported PCI device IDs. Cc: stable@vger.kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1752650969-148501-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17hv_netvsc: Set VF priv_flags to IFF_NO_ADDRCONF before open to prevent IPv6 ↵Li Tian
addrconf Set an additional flag IFF_NO_ADDRCONF to prevent ipv6 addrconf. Commit under Fixes added a new flag change that was not made to hv_netvsc resulting in the VF being assinged an IPv6. Fixes: 8a321cf7becc ("net: add IFF_NO_ADDRCONF and use it in bonding to prevent ipv6 addrconf") Suggested-by: Cathy Avery <cavery@redhat.com> Signed-off-by: Li Tian <litian@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20250716002607.4927-1-litian@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17phonet/pep: Move call to pn_skb_get_dst_sockaddr() earlier in pep_sock_accept()Nathan Chancellor
A new warning in clang [1] points out a place in pep_sock_accept() where dst is uninitialized then passed as a const pointer to pep_find_pipe(): net/phonet/pep.c:829:37: error: variable 'dst' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer] 829 | newsk = pep_find_pipe(&pn->hlist, &dst, pipe_handle); | ^~~: Move the call to pn_skb_get_dst_sockaddr(), which initializes dst, to before the call to pep_find_pipe(), so that dst is consistently used initialized throughout the function. Cc: stable@vger.kernel.org Fixes: f7ae8d59f661 ("Phonet: allocate sock from accept syscall rather than soft IRQ") Link: https://github.com/llvm/llvm-project/commit/00dacf8c22f065cb52efb14cd091d441f19b319e [1] Closes: https://github.com/ClangBuiltLinux/linux/issues/2101 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20250715-net-phonet-fix-uninit-const-pointer-v1-1-8efd1bd188b3@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge tag 'wireless-2025-07-17' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple of fixes: - ath12k performance regression from -rc1 - cfg80211 counted_by() removal for scan request as it doesn't match usage and keeps complaining - iwlwifi crash with certain older devices - iwlwifi missing an error path unlock - iwlwifi compatibility with certain BIOS updates * tag 'wireless-2025-07-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: Fix botched indexing conversion wifi: cfg80211: remove scan request n_channels counted_by wifi: ath12k: Fix packets received in WBM error ring with REO LUT enabled wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap wifi: iwlwifi: pcie: fix locking on invalid TOP reset ==================== Link: https://patch.msgid.link/20250717091831.18787-5-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17Merge tag 'nf-25-07-17' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains Netfilter fixes for net: 1) Three patches to enhance conntrack selftests for resize and clash resolution, from Florian Westphal. 2) Expand nft_concat_range.sh selftest to improve coverage from error path, from Florian Westphal. 3) Hide clash bit to userspace from netlink dumps until there is a good reason to expose, from Florian Westphal. 4) Revert notification for device registration/unregistration for nftables basechains and flowtables, we decided to go for a better way to handle this through the nfnetlink_hook infrastructure which will come via nf-next, patch from Phil Sutter. 5) Fix crash in conntrack due to race related to SLAB_TYPESAFE_BY_RCU that results in removing a recycled object that is not yet in the hashes. Move IPS_CONFIRM setting after the object is in the hashes. From Florian Westphal. netfilter pull request 25-07-17 * tag 'nf-25-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_conntrack: fix crash due to removal of uninitialised entry Revert "netfilter: nf_tables: Add notifications for hook changes" netfilter: nf_tables: hide clash bit from userspace selftests: netfilter: nft_concat_range.sh: send packets to empty set selftests: netfilter: conntrack_resize.sh: also use udpclash tool selftests: netfilter: add conntrack clash resolution test case selftests: netfilter: conntrack_resize.sh: extend resize test ==================== Link: https://patch.msgid.link/20250717095808.41725-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17netfilter: nf_conntrack: fix crash due to removal of uninitialised entryFlorian Westphal
A crash in conntrack was reported while trying to unlink the conntrack entry from the hash bucket list: [exception RIP: __nf_ct_delete_from_lists+172] [..] #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack] #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack] #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack] [..] The nf_conn struct is marked as allocated from slab but appears to be in a partially initialised state: ct hlist pointer is garbage; looks like the ct hash value (hence crash). ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected ct->timeout is 30000 (=30s), which is unexpected. Everything else looks like normal udp conntrack entry. If we ignore ct->status and pretend its 0, the entry matches those that are newly allocated but not yet inserted into the hash: - ct hlist pointers are overloaded and store/cache the raw tuple hash - ct->timeout matches the relative time expected for a new udp flow rather than the absolute 'jiffies' value. If it were not for the presence of IPS_CONFIRMED, __nf_conntrack_find_get() would have skipped the entry. Theory is that we did hit following race: cpu x cpu y cpu z found entry E found entry E E is expired <preemption> nf_ct_delete() return E to rcu slab init_conntrack E is re-inited, ct->status set to 0 reply tuplehash hnnode.pprev stores hash value. cpu y found E right before it was deleted on cpu x. E is now re-inited on cpu z. cpu y was preempted before checking for expiry and/or confirm bit. ->refcnt set to 1 E now owned by skb ->timeout set to 30000 If cpu y were to resume now, it would observe E as expired but would skip E due to missing CONFIRMED bit. nf_conntrack_confirm gets called sets: ct->status |= CONFIRMED This is wrong: E is not yet added to hashtable. cpu y resumes, it observes E as expired but CONFIRMED: <resumes> nf_ct_expired() -> yes (ct->timeout is 30s) confirmed bit set. cpu y will try to delete E from the hashtable: nf_ct_delete() -> set DYING bit __nf_ct_delete_from_lists Even this scenario doesn't guarantee a crash: cpu z still holds the table bucket lock(s) so y blocks: wait for spinlock held by z CONFIRMED is set but there is no guarantee ct will be added to hash: "chaintoolong" or "clash resolution" logic both skip the insert step. reply hnnode.pprev still stores the hash value. unlocks spinlock return NF_DROP <unblocks, then crashes on hlist_nulls_del_rcu pprev> In case CPU z does insert the entry into the hashtable, cpu y will unlink E again right away but no crash occurs. Without 'cpu y' race, 'garbage' hlist is of no consequence: ct refcnt remains at 1, eventually skb will be free'd and E gets destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy. To resolve this, move the IPS_CONFIRMED assignment after the table insertion but before the unlock. Pablo points out that the confirm-bit-store could be reordered to happen before hlist add resp. the timeout fixup, so switch to set_bit and before_atomic memory barrier to prevent this. It doesn't matter if other CPUs can observe a newly inserted entry right before the CONFIRMED bit was set: Such event cannot be distinguished from above "E is the old incarnation" case: the entry will be skipped. Also change nf_ct_should_gc() to first check the confirmed bit. The gc sequence is: 1. Check if entry has expired, if not skip to next entry 2. Obtain a reference to the expired entry. 3. Call nf_ct_should_gc() to double-check step 1. nf_ct_should_gc() is thus called only for entries that already failed an expiry check. After this patch, once the confirmed bit check passes ct->timeout has been altered to reflect the absolute 'best before' date instead of a relative time. Step 3 will therefore not remove the entry. Without this change to nf_ct_should_gc() we could still get this sequence: 1. Check if entry has expired. 2. Obtain a reference. 3. Call nf_ct_should_gc() to double-check step 1: 4 - entry is still observed as expired 5 - meanwhile, ct->timeout is corrected to absolute value on other CPU and confirm bit gets set 6 - confirm bit is seen 7 - valid entry is removed again First do check 6), then 4) so the gc expiry check always picks up either confirmed bit unset (entry gets skipped) or expiry re-check failure for re-inited conntrack objects. This change cannot be backported to releases before 5.19. Without commit 8a75a2c17410 ("netfilter: conntrack: remove unconfirmed list") |= IPS_CONFIRMED line cannot be moved without further changes. Cc: Razvan Cojocaru <rzvncj@gmail.com> Link: https://lore.kernel.org/netfilter-devel/20250627142758.25664-1-fw@strlen.de/ Link: https://lore.kernel.org/netfilter-devel/4239da15-83ff-4ca4-939d-faef283471bb@gmail.com/ Fixes: 1397af5bfd7d ("netfilter: conntrack: remove the percpu dying list") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-17net: fix segmentation after TCP/UDP fraglist GROFelix Fietkau
Since "net: gro: use cb instead of skb->network_header", the skb network header is no longer set in the GRO path. This breaks fraglist segmentation, which relies on ip_hdr()/tcp_hdr() to check for address/port changes. Fix this regression by selectively setting the network header for merged segment skbs. Fixes: 186b1ea73ad8 ("net: gro: use cb instead of skb->network_header") Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250705150622.10699-1-nbd@nbd.name Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-16ipv6: mcast: Delay put pmc->idev in mld_del_delrec()Yue Haibing
pmc->idev is still used in ip6_mc_clear_src(), so as mld_clear_delrec() does, the reference should be put after ip6_mc_clear_src() return. Fixes: 63ed8de4be81 ("mld: add mc_lock for protecting per-interface mld data") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20250714141957.3301871-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-07-15 (ixgbe, fm10k, i40e, ice) Arnd Bergmann resolves compile issues with large NR_CPUS for ixgbe, fm10k, and i40e. For ice: Dave adds a NULL check for LAG netdev. Michal corrects a pointer check in debugfs initialization. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: check correct pointer in fwlog debugfs ice: add NULL check in eswitch lag check ethernet: intel: fix building with large NR_CPUS ==================== Link: https://patch.msgid.link/20250715202948.3841437-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16net: airoha: fix potential use-after-free in airoha_npu_get()Alok Tiwari
np->name was being used after calling of_node_put(np), which releases the node and can lead to a use-after-free bug. Previously, of_node_put(np) was called unconditionally after of_find_device_by_node(np), which could result in a use-after-free if pdev is NULL. This patch moves of_node_put(np) after the error check to ensure the node is only released after both the error and success cases are handled appropriately, preventing potential resource issues. Fixes: 23290c7bc190 ("net: airoha: Introduce Airoha NPU support") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250715143102.3458286-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16net/mlx5: Correctly set gso_size when LRO is usedChristoph Paasch
gso_size is expected by the networking stack to be the size of the payload (thus, not including ethernet/IP/TCP-headers). However, cqe_bcnt is the full sized frame (including the headers). Dividing cqe_bcnt by lro_num_seg will then give incorrect results. For example, running a bpftrace higher up in the TCP-stack (tcp_event_data_recv), we commonly have gso_size set to 1450 or 1451 even though in reality the payload was only 1448 bytes. This can have unintended consequences: - In tcp_measure_rcv_mss() len will be for example 1450, but. rcv_mss will be 1448 (because tp->advmss is 1448). Thus, we will always recompute scaling_ratio each time an LRO-packet is received. - In tcp_gro_receive(), it will interfere with the decision whether or not to flush and thus potentially result in less gro'ed packets. So, we need to discount the protocol headers from cqe_bcnt so we can actually divide the payload by lro_num_seg to get the real gso_size. v2: - Use "(unsigned char *)tcp + tcp->doff * 4 - skb->data)" to compute header-len (Tariq Toukan <tariqt@nvidia.com>) - Improve commit-message (Gal Pressman <gal@nvidia.com>) Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Signed-off-by: Christoph Paasch <cpaasch@openai.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250715-cpaasch-pf-925-investigate-incorrect-gso_size-on-cx-7-nic-v2-1-e06c3475f3ac@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16ovpn: reset GSO metadata after decapsulationRalf Lici
The ovpn_netdev_write() function is responsible for injecting decapsulated and decrypted packets back into the local network stack. Prior to this patch, the skb could retain GSO metadata from the outer, encrypted tunnel packet. This original GSO metadata, relevant to the sender's transport context, becomes invalid and misleading for the tunnel/data path once the inner packet is exposed. Leaving this stale metadata intact causes internal GSO validation checks further down the kernel's network stack (validate_xmit_skb()) to fail, leading to packet drops. The reasons for these failures vary by protocol, for example: - for ICMP, no offload handler is registered; - for TCP and UDP, the respective offload handlers return errors when comparing skb->len to the outdated skb_shinfo(skb)->gso_size. By calling skb_gso_reset(skb) we ensure the inner packet is presented to gro_cells_receive() with a clean state, correctly indicating it is an individual packet from the perspective of the local stack. This change eliminates the "Driver has suspect GRO implementation, TCP performance may be compromised" warning and improves overall TCP performance by allowing GSO/GRO to function as intended on the decapsulated traffic. Fixes: 11851cbd60ea ("ovpn: implement TCP transport") Reported-by: Gert Doering <gert@greenie.muc.de> Closes: https://github.com/OpenVPN/ovpn-net-next/issues/4 Tested-by: Gert Doering <gert@greenie.muc.de> Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-07-16ovpn: reject unexpected netlink attributesAntonio Quartulli
Netlink ops do not expect all attributes to be always set, however this condition is not explicitly coded any where, leading the user to believe that all sent attributes are somewhat processed. Fix this behaviour by introducing explicit checks. For CMD_OVPN_PEER_GET and CMD_OVPN_KEY_GET directly open-code the needed condition in the related ops handlers. While for all other ops use attribute subsets in the ovpn.yaml spec file. Fixes: b7a63391aa98 ("ovpn: add basic netlink support") Reported-by: Ralf Lici <ralf@mandelbit.com> Closes: https://github.com/OpenVPN/ovpn-net-next/issues/19 Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-07-16ovpn: propagate socket mark to skb in UDPRalf Lici
OpenVPN allows users to configure a FW mark on sockets used to communicate with other peers. The mark is set by means of the `SO_MARK` Linux socket option. However, in the ovpn UDP code path, the socket's `sk_mark` value is currently ignored and it is not propagated to outgoing `skbs`. This commit ensures proper inheritance of the field by setting `skb->mark` to `sk->sk_mark` before handing the `skb` to the network stack for transmission. Fixes: 08857b5ec5d9 ("ovpn: implement basic TX path (UDP)") Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Ralf Lici <ralf@mandelbit.com> Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31877.html Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-07-15Merge branch 'mptcp-fix-fallback-related-races'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: fix fallback-related races This series contains 3 fixes somewhat related to various races we have while handling fallback. The root cause of the issues addressed here is that the check for "we can fallback to tcp now" and the related action are not atomic. That also applies to fallback due to MP_FAIL -- where the window race is even wider. Address the issue introducing an additional spinlock to bundle together all the relevant events, as per patch 1 and 2. These fixes can be backported up to v5.19 and v5.15. Note that mptcp_disconnect() unconditionally clears the fallback status (zeroing msk->flags) but don't touch the `allows_infinite_fallback` flag. Such issue is addressed in patch 3, and can be backported up to v5.17. ==================== Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-0-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15mptcp: reset fallback status gracefully at disconnect() timePaolo Abeni
mptcp_disconnect() clears the fallback bit unconditionally, without touching the associated flags. The bit clear is safe, as no fallback operation can race with that -- all subflow are already in TCP_CLOSE status thanks to the previous FASTCLOSE -- but we need to consistently reset all the fallback related status. Also acquire the relevant lock, to avoid fouling static analyzers. Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-3-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15mptcp: plug races between subflow fail and subflow creationPaolo Abeni
We have races similar to the one addressed by the previous patch between subflow failing and additional subflow creation. They are just harder to trigger. The solution is similar. Use a separate flag to track the condition 'socket state prevent any additional subflow creation' protected by the fallback lock. The socket fallback makes such flag true, and also receiving or sending an MP_FAIL option. The field 'allow_infinite_fallback' is now always touched under the relevant lock, we can drop the ONCE annotation on write. Fixes: 478d770008b0 ("mptcp: send out MP_FAIL when data checksum fails") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-2-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15mptcp: make fallback action and fallback decision atomicPaolo Abeni
Syzkaller reported the following splat: WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 __mptcp_do_fallback net/mptcp/protocol.h:1223 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_do_fallback net/mptcp/protocol.h:1244 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 check_fully_established net/mptcp/options.c:982 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153 Modules linked in: CPU: 1 UID: 0 PID: 7704 Comm: syz.3.1419 Not tainted 6.16.0-rc3-gbd5ce2324dba #20 PREEMPT(voluntary) Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:__mptcp_do_fallback net/mptcp/protocol.h:1223 [inline] RIP: 0010:mptcp_do_fallback net/mptcp/protocol.h:1244 [inline] RIP: 0010:check_fully_established net/mptcp/options.c:982 [inline] RIP: 0010:mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153 Code: 24 18 e8 bb 2a 00 fd e9 1b df ff ff e8 b1 21 0f 00 e8 ec 5f c4 fc 44 0f b7 ac 24 b0 00 00 00 e9 54 f1 ff ff e8 d9 5f c4 fc 90 <0f> 0b 90 e9 b8 f4 ff ff e8 8b 2a 00 fd e9 8d e6 ff ff e8 81 2a 00 RSP: 0018:ffff8880a3f08448 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8880180a8000 RCX: ffffffff84afcf45 RDX: ffff888090223700 RSI: ffffffff84afdaa7 RDI: 0000000000000001 RBP: ffff888017955780 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8880180a8910 R14: ffff8880a3e9d058 R15: 0000000000000000 FS: 00005555791b8500(0000) GS:ffff88811c495000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000110c2800b7 CR3: 0000000058e44000 CR4: 0000000000350ef0 Call Trace: <IRQ> tcp_reset+0x26f/0x2b0 net/ipv4/tcp_input.c:4432 tcp_validate_incoming+0x1057/0x1b60 net/ipv4/tcp_input.c:5975 tcp_rcv_established+0x5b5/0x21f0 net/ipv4/tcp_input.c:6166 tcp_v4_do_rcv+0x5dc/0xa70 net/ipv4/tcp_ipv4.c:1925 tcp_v4_rcv+0x3473/0x44a0 net/ipv4/tcp_ipv4.c:2363 ip_protocol_deliver_rcu+0xba/0x480 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2f1/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:317 [inline] NF_HOOK include/linux/netfilter.h:311 [inline] ip_local_deliver+0x1be/0x560 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:469 [inline] ip_rcv_finish net/ipv4/ip_input.c:447 [inline] NF_HOOK include/linux/netfilter.h:317 [inline] NF_HOOK include/linux/netfilter.h:311 [inline] ip_rcv+0x514/0x810 net/ipv4/ip_input.c:567 __netif_receive_skb_one_core+0x197/0x1e0 net/core/dev.c:5975 __netif_receive_skb+0x1f/0x120 net/core/dev.c:6088 process_backlog+0x301/0x1360 net/core/dev.c:6440 __napi_poll.constprop.0+0xba/0x550 net/core/dev.c:7453 napi_poll net/core/dev.c:7517 [inline] net_rx_action+0xb44/0x1010 net/core/dev.c:7644 handle_softirqs+0x1d0/0x770 kernel/softirq.c:579 do_softirq+0x3f/0x90 kernel/softirq.c:480 </IRQ> <TASK> __local_bh_enable_ip+0xed/0x110 kernel/softirq.c:407 local_bh_enable include/linux/bottom_half.h:33 [inline] inet_csk_listen_stop+0x2c5/0x1070 net/ipv4/inet_connection_sock.c:1524 mptcp_check_listen_stop.part.0+0x1cc/0x220 net/mptcp/protocol.c:2985 mptcp_check_listen_stop net/mptcp/mib.h:118 [inline] __mptcp_close+0x9b9/0xbd0 net/mptcp/protocol.c:3000 mptcp_close+0x2f/0x140 net/mptcp/protocol.c:3066 inet_release+0xed/0x200 net/ipv4/af_inet.c:435 inet6_release+0x4f/0x70 net/ipv6/af_inet6.c:487 __sock_release+0xb3/0x270 net/socket.c:649 sock_close+0x1c/0x30 net/socket.c:1439 __fput+0x402/0xb70 fs/file_table.c:465 task_work_run+0x150/0x240 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xd4/0xe0 kernel/entry/common.c:114 exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline] do_syscall_64+0x245/0x360 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fc92f8a36ad Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffcf52802d8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 RAX: 0000000000000000 RBX: 00007ffcf52803a8 RCX: 00007fc92f8a36ad RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003 RBP: 00007fc92fae7ba0 R08: 0000000000000001 R09: 0000002800000000 R10: 00007fc92f700000 R11: 0000000000000246 R12: 00007fc92fae5fac R13: 00007fc92fae5fa0 R14: 0000000000026d00 R15: 0000000000026c51 </TASK> irq event stamp: 4068 hardirqs last enabled at (4076): [<ffffffff81544816>] __up_console_sem+0x76/0x80 kernel/printk/printk.c:344 hardirqs last disabled at (4085): [<ffffffff815447fb>] __up_console_sem+0x5b/0x80 kernel/printk/printk.c:342 softirqs last enabled at (3096): [<ffffffff840e1be0>] local_bh_enable include/linux/bottom_half.h:33 [inline] softirqs last enabled at (3096): [<ffffffff840e1be0>] inet_csk_listen_stop+0x2c0/0x1070 net/ipv4/inet_connection_sock.c:1524 softirqs last disabled at (3097): [<ffffffff813b6b9f>] do_softirq+0x3f/0x90 kernel/softirq.c:480 Since we need to track the 'fallback is possible' condition and the fallback status separately, there are a few possible races open between the check and the actual fallback action. Add a spinlock to protect the fallback related information and use it close all the possible related races. While at it also remove the too-early clearing of allow_infinite_fallback in __mptcp_subflow_connect(): the field will be correctly cleared by subflow_finish_connect() if/when the connection will complete successfully. If fallback is not possible, as per RFC, reset the current subflow. Since the fallback operation can now fail and return value should be checked, rename the helper accordingly. Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status") Cc: stable@vger.kernel.org Reported-by: Matthieu Baerts <matttbe@kernel.org> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/570 Reported-by: syzbot+5cf807c20386d699b524@syzkaller.appspotmail.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/555 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-1-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15net: libwx: fix multicast packets received countJiawen Wu
Multicast good packets received by PF rings that pass ethternet MAC address filtering are counted for rtnl_link_stats64.multicast. The counter is not cleared on read. Fix the duplicate counting on updating statistics. Fixes: 46b92e10d631 ("net: libwx: support hardware statistics") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/DA229A4F58B70E51+20250714015656.91772-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15Merge branch 'fix-rx-fatal-errors'Jakub Kicinski
Jiawen Wu says: ==================== Fix Rx fatal errors There are some fatal errors on the Rx NAPI path, which can cause the kernel to crash. Fix known issues and potential risks. The part of the patches has been mentioned before[1]. [1]: https://lore.kernel.org/all/C8A23A11DB646E60+20250630094102.22265-1-jiawenwu@trustnetic.com/ v1: https://lore.kernel.org/20250709064025.19436-1-jiawenwu@trustnetic.com ==================== Link: https://patch.msgid.link/20250714024755.17512-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15net: libwx: properly reset Rx ring descriptorJiawen Wu
When device reset is triggered by feature changes such as toggling Rx VLAN offload, wx->do_reset() is called to reinitialize Rx rings. The hardware descriptor ring may retain stale values from previous sessions. And only set the length to 0 in rx_desc[0] would result in building malformed SKBs. Fix it to ensure a clean slate after device reset. [ 549.186435] [ C16] ------------[ cut here ]------------ [ 549.186457] [ C16] kernel BUG at net/core/skbuff.c:2814! [ 549.186468] [ C16] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 549.186472] [ C16] CPU: 16 UID: 0 PID: 0 Comm: swapper/16 Kdump: loaded Not tainted 6.16.0-rc4+ #23 PREEMPT(voluntary) [ 549.186476] [ C16] Hardware name: Micro-Star International Co., Ltd. MS-7E16/X670E GAMING PLUS WIFI (MS-7E16), BIOS 1.90 12/31/2024 [ 549.186478] [ C16] RIP: 0010:__pskb_pull_tail+0x3ff/0x510 [ 549.186484] [ C16] Code: 06 f0 ff 4f 34 74 7b 4d 8b 8c 24 c8 00 00 00 45 8b 84 24 c0 00 00 00 e9 c8 fd ff ff 48 c7 44 24 08 00 00 00 00 e9 5e fe ff ff <0f> 0b 31 c0 e9 23 90 5b ff 41 f7 c6 ff 0f 00 00 75 bf 49 8b 06 a8 [ 549.186487] [ C16] RSP: 0018:ffffb391c0640d70 EFLAGS: 00010282 [ 549.186490] [ C16] RAX: 00000000fffffff2 RBX: ffff8fe7e4d40200 RCX: 00000000fffffff2 [ 549.186492] [ C16] RDX: ffff8fe7c3a4bf8e RSI: 0000000000000180 RDI: ffff8fe7c3a4bf40 [ 549.186494] [ C16] RBP: ffffb391c0640da8 R08: ffff8fe7c3a4c0c0 R09: 000000000000000e [ 549.186496] [ C16] R10: ffffb391c0640d88 R11: 000000000000000e R12: ffff8fe7e4d40200 [ 549.186497] [ C16] R13: 00000000fffffff2 R14: ffff8fe7fa01a000 R15: 00000000fffffff2 [ 549.186499] [ C16] FS: 0000000000000000(0000) GS:ffff8fef5ae40000(0000) knlGS:0000000000000000 [ 549.186502] [ C16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 549.186503] [ C16] CR2: 00007f77d81d6000 CR3: 000000051a032000 CR4: 0000000000750ef0 [ 549.186505] [ C16] PKRU: 55555554 [ 549.186507] [ C16] Call Trace: [ 549.186510] [ C16] <IRQ> [ 549.186513] [ C16] ? srso_alias_return_thunk+0x5/0xfbef5 [ 549.186517] [ C16] __skb_pad+0xc7/0xf0 [ 549.186523] [ C16] wx_clean_rx_irq+0x355/0x3b0 [libwx] [ 549.186533] [ C16] wx_poll+0x92/0x120 [libwx] [ 549.186540] [ C16] __napi_poll+0x28/0x190 [ 549.186544] [ C16] net_rx_action+0x301/0x3f0 [ 549.186548] [ C16] ? srso_alias_return_thunk+0x5/0xfbef5 [ 549.186551] [ C16] ? __raw_spin_lock_irqsave+0x1e/0x50 [ 549.186554] [ C16] ? srso_alias_return_thunk+0x5/0xfbef5 [ 549.186557] [ C16] ? wake_up_nohz_cpu+0x35/0x160 [ 549.186559] [ C16] ? srso_alias_return_thunk+0x5/0xfbef5 [ 549.186563] [ C16] handle_softirqs+0xf9/0x2c0 [ 549.186568] [ C16] __irq_exit_rcu+0xc7/0x130 [ 549.186572] [ C16] common_interrupt+0xb8/0xd0 [ 549.186576] [ C16] </IRQ> [ 549.186577] [ C16] <TASK> [ 549.186579] [ C16] asm_common_interrupt+0x22/0x40 [ 549.186582] [ C16] RIP: 0010:cpuidle_enter_state+0xc2/0x420 [ 549.186585] [ C16] Code: 00 00 e8 11 0e 5e ff e8 ac f0 ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 0d ed 5c ff 45 84 ff 0f 85 40 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 84 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d [ 549.186587] [ C16] RSP: 0018:ffffb391c0277e78 EFLAGS: 00000246 [ 549.186590] [ C16] RAX: ffff8fef5ae40000 RBX: 0000000000000003 RCX: 0000000000000000 [ 549.186591] [ C16] RDX: 0000007fde0faac5 RSI: ffffffff826e53f6 RDI: ffffffff826fa9b3 [ 549.186593] [ C16] RBP: ffff8fe7c3a20800 R08: 0000000000000002 R09: 0000000000000000 [ 549.186595] [ C16] R10: 0000000000000000 R11: 000000000000ffff R12: ffffffff82ed7a40 [ 549.186596] [ C16] R13: 0000007fde0faac5 R14: 0000000000000003 R15: 0000000000000000 [ 549.186601] [ C16] ? cpuidle_enter_state+0xb3/0x420 [ 549.186605] [ C16] cpuidle_enter+0x29/0x40 [ 549.186609] [ C16] cpuidle_idle_call+0xfd/0x170 [ 549.186613] [ C16] do_idle+0x7a/0xc0 [ 549.186616] [ C16] cpu_startup_entry+0x25/0x30 [ 549.186618] [ C16] start_secondary+0x117/0x140 [ 549.186623] [ C16] common_startup_64+0x13e/0x148 [ 549.186628] [ C16] </TASK> Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250714024755.17512-4-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15net: libwx: fix the using of Rx buffer DMAJiawen Wu
The wx_rx_buffer structure contained two DMA address fields: 'dma' and 'page_dma'. However, only 'page_dma' was actually initialized and used to program the Rx descriptor. But 'dma' was uninitialized and used in some paths. This could lead to undefined behavior, including DMA errors or use-after-free, if the uninitialized 'dma' was used. Althrough such error has not yet occurred, it is worth fixing in the code. Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250714024755.17512-3-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15net: libwx: remove duplicate page_pool_put_full_page()Jiawen Wu
page_pool_put_full_page() should only be invoked when freeing Rx buffers or building a skb if the size is too short. At other times, the pages need to be reused. So remove the redundant page put. In the original code, double free pages cause kernel panic: [ 876.949834] __irq_exit_rcu+0xc7/0x130 [ 876.949836] common_interrupt+0xb8/0xd0 [ 876.949838] </IRQ> [ 876.949838] <TASK> [ 876.949840] asm_common_interrupt+0x22/0x40 [ 876.949841] RIP: 0010:cpuidle_enter_state+0xc2/0x420 [ 876.949843] Code: 00 00 e8 d1 1d 5e ff e8 ac f0 ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 cd fc 5c ff 45 84 ff 0f 85 40 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 84 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d [ 876.949844] RSP: 0018:ffffaa7340267e78 EFLAGS: 00000246 [ 876.949845] RAX: ffff9e3f135be000 RBX: 0000000000000002 RCX: 0000000000000000 [ 876.949846] RDX: 000000cc2dc4cb7c RSI: ffffffff89ee49ae RDI: ffffffff89ef9f9e [ 876.949847] RBP: ffff9e378f940800 R08: 0000000000000002 R09: 00000000000000ed [ 876.949848] R10: 000000000000afc8 R11: ffff9e3e9e5a9b6c R12: ffffffff8a6d8580 [ 876.949849] R13: 000000cc2dc4cb7c R14: 0000000000000002 R15: 0000000000000000 [ 876.949852] ? cpuidle_enter_state+0xb3/0x420 [ 876.949855] cpuidle_enter+0x29/0x40 [ 876.949857] cpuidle_idle_call+0xfd/0x170 [ 876.949859] do_idle+0x7a/0xc0 [ 876.949861] cpu_startup_entry+0x25/0x30 [ 876.949862] start_secondary+0x117/0x140 [ 876.949864] common_startup_64+0x13e/0x148 [ 876.949867] </TASK> [ 876.949868] ---[ end trace 0000000000000000 ]--- [ 876.949869] ------------[ cut here ]------------ [ 876.949870] list_del corruption, ffffead40445a348->next is NULL [ 876.949873] WARNING: CPU: 14 PID: 0 at lib/list_debug.c:52 __list_del_entry_valid_or_report+0x67/0x120 [ 876.949875] Modules linked in: snd_hrtimer(E) bnep(E) binfmt_misc(E) amdgpu(E) squashfs(E) vfat(E) loop(E) fat(E) amd_atl(E) snd_hda_codec_realtek(E) intel_rapl_msr(E) snd_hda_codec_generic(E) intel_rapl_common(E) snd_hda_scodec_component(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) edac_mce_amd(E) snd_intel_dspcfg(E) snd_hda_codec(E) snd_hda_core(E) amdxcp(E) kvm_amd(E) snd_hwdep(E) gpu_sched(E) drm_panel_backlight_quirks(E) cec(E) snd_pcm(E) drm_buddy(E) snd_seq_dummy(E) drm_ttm_helper(E) btusb(E) kvm(E) snd_seq_oss(E) btrtl(E) ttm(E) btintel(E) snd_seq_midi(E) btbcm(E) drm_exec(E) snd_seq_midi_event(E) i2c_algo_bit(E) snd_rawmidi(E) bluetooth(E) drm_suballoc_helper(E) irqbypass(E) snd_seq(E) ghash_clmulni_intel(E) sha512_ssse3(E) drm_display_helper(E) aesni_intel(E) snd_seq_device(E) rfkill(E) snd_timer(E) gf128mul(E) drm_client_lib(E) drm_kms_helper(E) snd(E) i2c_piix4(E) joydev(E) soundcore(E) wmi_bmof(E) ccp(E) k10temp(E) i2c_smbus(E) gpio_amdpt(E) i2c_designware_platform(E) gpio_generic(E) sg(E) [ 876.949914] i2c_designware_core(E) sch_fq_codel(E) parport_pc(E) drm(E) ppdev(E) lp(E) parport(E) fuse(E) nfnetlink(E) ip_tables(E) ext4 crc16 mbcache jbd2 sd_mod sfp mdio_i2c i2c_core txgbe ahci ngbe pcs_xpcs libahci libwx r8169 phylink libata realtek ptp pps_core video wmi [ 876.949933] CPU: 14 UID: 0 PID: 0 Comm: swapper/14 Kdump: loaded Tainted: G W E 6.16.0-rc2+ #20 PREEMPT(voluntary) [ 876.949935] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE [ 876.949936] Hardware name: Micro-Star International Co., Ltd. MS-7E16/X670E GAMING PLUS WIFI (MS-7E16), BIOS 1.90 12/31/2024 [ 876.949936] RIP: 0010:__list_del_entry_valid_or_report+0x67/0x120 [ 876.949938] Code: 00 00 00 48 39 7d 08 0f 85 a6 00 00 00 5b b8 01 00 00 00 5d 41 5c e9 73 0d 93 ff 48 89 fe 48 c7 c7 a0 31 e8 89 e8 59 7c b3 ff <0f> 0b 31 c0 5b 5d 41 5c e9 57 0d 93 ff 48 89 fe 48 c7 c7 c8 31 e8 [ 876.949940] RSP: 0018:ffffaa73405d0c60 EFLAGS: 00010282 [ 876.949941] RAX: 0000000000000000 RBX: ffffead40445a348 RCX: 0000000000000000 [ 876.949942] RDX: 0000000000000105 RSI: 0000000000000001 RDI: 00000000ffffffff [ 876.949943] RBP: 0000000000000000 R08: 000000010006dfde R09: ffffffff8a47d150 [ 876.949944] R10: ffffffff8a47d150 R11: 0000000000000003 R12: dead000000000122 [ 876.949945] R13: ffff9e3e9e5af700 R14: ffffead40445a348 R15: ffff9e3e9e5af720 [ 876.949946] FS: 0000000000000000(0000) GS:ffff9e3f135be000(0000) knlGS:0000000000000000 [ 876.949947] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 876.949948] CR2: 00007fa58b480048 CR3: 0000000156724000 CR4: 0000000000750ef0 [ 876.949949] PKRU: 55555554 [ 876.949950] Call Trace: [ 876.949951] <IRQ> [ 876.949952] __rmqueue_pcplist+0x53/0x2c0 [ 876.949955] alloc_pages_bulk_noprof+0x2e0/0x660 [ 876.949958] __page_pool_alloc_pages_slow+0xa9/0x400 [ 876.949961] page_pool_alloc_pages+0xa/0x20 [ 876.949963] wx_alloc_rx_buffers+0xd7/0x110 [libwx] [ 876.949967] wx_clean_rx_irq+0x262/0x430 [libwx] [ 876.949971] wx_poll+0x92/0x130 [libwx] [ 876.949975] __napi_poll+0x28/0x190 [ 876.949977] net_rx_action+0x301/0x3f0 [ 876.949980] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949981] ? profile_tick+0x30/0x70 [ 876.949983] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949984] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949986] ? timerqueue_add+0xa3/0xc0 [ 876.949988] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949989] ? __raise_softirq_irqoff+0x16/0x70 [ 876.949991] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949993] ? srso_alias_return_thunk+0x5/0xfbef5 [ 876.949994] ? wx_msix_clean_rings+0x41/0x50 [libwx] [ 876.949998] handle_softirqs+0xf9/0x2c0 Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250714024755.17512-2-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15net: stmmac: intel: populate entire system_counterval_t in get_time_fn() ↵Markus Blöchl
callback get_time_fn() callback implementations are expected to fill out the entire system_counterval_t struct as it may be initially uninitialized. This broke with the removal of convert_art_to_tsc() helper functions which left use_nsecs uninitialized. Initially assign the entire struct with default values. Fixes: f5e1d0db3f02 ("stmmac: intel: Remove convert_art_to_tsc()") Cc: stable@vger.kernel.org Signed-off-by: Markus Blöchl <markus@blochl.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250713-stmmac_crossts-v1-1-31bfe051b5cb@blochl.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15Merge tag 'linux-can-fixes-for-6.16-20250715' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-07-15 Brett Werling's patch for the tcan4x5x glue code driver fixes the detection of chips which are held in reset/sleep and must be woken up by GPIO prior to communication. * tag 'linux-can-fixes-for-6.16-20250715' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: tcan4x5x: fix reset gpio usage during probe ==================== Link: https://patch.msgid.link/20250715101625.3202690-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15usb: net: sierra: check for no status endpointOliver Neukum
The driver checks for having three endpoints and having bulk in and out endpoints, but not that the third endpoint is interrupt input. Rectify the omission. Reported-by: syzbot+3f89ec3d1d0842e95d50@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/686d5a9f.050a0220.1ffab7.0017.GAE@google.com/ Tested-by: syzbot+3f89ec3d1d0842e95d50@syzkaller.appspotmail.com Fixes: eb4fd8cd355c8 ("net/usb: add sierra_net.c driver") Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://patch.msgid.link/20250714111326.258378-1-oneukum@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15ice: check correct pointer in fwlog debugfsMichal Swiatkowski
pf->ice_debugfs_pf_fwlog should be checked for an error here. Fixes: 96a9a9341cda ("ice: configure FW logging") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-15ice: add NULL check in eswitch lag checkDave Ertman
The function ice_lag_is_switchdev_running() is being called from outside of the LAG event handler code. This results in the lag->upper_netdev being NULL sometimes. To avoid a NULL-pointer dereference, there needs to be a check before it is dereferenced. Fixes: 776fe19953b0 ("ice: block default rule setting on LAG interface") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-15ethernet: intel: fix building with large NR_CPUSArnd Bergmann
With large values of CONFIG_NR_CPUS, three Intel ethernet drivers fail to compile like: In function ‘i40e_free_q_vector’, inlined from ‘i40e_vsi_alloc_q_vectors’ at drivers/net/ethernet/intel/i40e/i40e_main.c:12112:3: 571 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) include/linux/rcupdate.h:1084:17: note: in expansion of macro ‘BUILD_BUG_ON’ 1084 | BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096); \ drivers/net/ethernet/intel/i40e/i40e_main.c:5113:9: note: in expansion of macro ‘kfree_rcu’ 5113 | kfree_rcu(q_vector, rcu); | ^~~~~~~~~ The problem is that the 'rcu' member in 'q_vector' is too far from the start of the structure. Move this member before the CPU mask instead, in all three drivers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-15selftests: net: increase inter-packet timeout in udpgro.shPaolo Abeni
The mentioned test is not very stable when running on top of debug kernel build. Increase the inter-packet timeout to allow more slack in such environments. Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/b0370c06ddb3235debf642c17de0284b2cd3c652.1752163107.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15Merge tag 'iwlwifi-fixes-2025-07-15' of ↵Johannes Berg
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next Miri Korenblit says: ==================== iwlwifi-fixes - missing unlock in error path - Avoid FW assert on bad command values - fix kernel panic due to incorrect index calculation ==================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-15can: tcan4x5x: fix reset gpio usage during probeBrett Werling
Fixes reset GPIO usage during probe by ensuring we retrieve the GPIO and take the device out of reset (if it defaults to being in reset) before we attempt to communicate with the device. This is achieved by moving the call to tcan4x5x_get_gpios() before tcan4x5x_find_version() and avoiding any device communication while getting the GPIOs. Once we determine the version, we can then take the knowledge of which GPIOs we obtained and use it to decide whether we need to disable the wake or state pin functions within the device. This change is necessary in a situation where the reset GPIO is pulled high externally before the CPU takes control of it, meaning we need to explicitly bring the device out of reset before we can start communicating with it at all. This also has the effect of fixing an issue where a reset of the device would occur after having called tcan4x5x_disable_wake(), making the original behavior not actually disable the wake. This patch should now disable wake or state pin functions well after the reset occurs. Signed-off-by: Brett Werling <brett.werling@garmin.com> Link: https://patch.msgid.link/20250711141728.1826073-1-brett.werling@garmin.com Cc: Markus Schneider-Pargmann <msp@baylibre.com> Fixes: 142c6dc6d9d7 ("can: tcan4x5x: Add support for tcan4552/4553") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-15wifi: iwlwifi: Fix botched indexing conversionVille Syrjälä
The conversion from compiler assisted indexing to manual indexing wasn't done correctly. The array is still made up of __le16 elements so multiplying the outer index by the element size is not what we want. Fix it up. This causes the kernel to oops when trying to transfer any significant amount of data over wifi: BUG: unable to handle page fault for address: ffffc900009f5282 PGD 100000067 P4D 100000067 PUD 1000fb067 PMD 102e82067 PTE 0 Oops: Oops: 0002 [#1] SMP CPU: 1 UID: 0 PID: 99 Comm: kworker/u8:3 Not tainted 6.15.0-rc2-cl-bisect3-00604-g6204d5130a64-dirty #78 PREEMPT Hardware name: Dell Inc. Latitude E5400 /0D695C, BIOS A19 06/13/2013 Workqueue: events_unbound cfg80211_wiphy_work [cfg80211] RIP: 0010:iwl_trans_pcie_tx+0x4dd/0xe60 [iwlwifi] Code: 00 00 66 81 fa ff 0f 0f 87 42 09 00 00 3d ff 00 00 00 0f 8f 37 09 00 00 41 c1 e0 0c 41 09 d0 48 8d 14 b6 48 c1 e2 07 48 01 ca <66> 44 89 04 57 48 8d 0c 12 83 f8 3f 0f 8e 84 01 00 00 41 8b 85 80 RSP: 0018:ffffc900001c3b50 EFLAGS: 00010206 RAX: 00000000000000c1 RBX: ffff88810b180028 RCX: 00000000000000c1 RDX: 0000000000002141 RSI: 000000000000000d RDI: ffffc900009f1000 RBP: 0000000000000002 R08: 0000000000000025 R09: ffffffffa050fa60 R10: 00000000fbdbf4bc R11: 0000000000000082 R12: ffff88810e5ade40 R13: ffff88810af81588 R14: 000000000000001a R15: ffff888100dfe0c8 FS: 0000000000000000(0000) GS:ffff8881998c3000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc900009f5282 CR3: 0000000001e39000 CR4: 00000000000426f0 Call Trace: <TASK> ? rcu_is_watching+0xd/0x40 ? __iwl_dbg+0xb1/0xe0 [iwlwifi] iwlagn_tx_skb+0x8e2/0xcb0 [iwldvm] iwlagn_mac_tx+0x18/0x30 [iwldvm] ieee80211_handle_wake_tx_queue+0x6c/0xc0 [mac80211] ieee80211_agg_start_txq+0x140/0x2e0 [mac80211] ieee80211_agg_tx_operational+0x126/0x210 [mac80211] ieee80211_process_addba_resp+0x27b/0x2a0 [mac80211] ieee80211_iface_work+0x4bd/0x4d0 [mac80211] ? _raw_spin_unlock_irq+0x1f/0x40 cfg80211_wiphy_work+0x117/0x1f0 [cfg80211] process_one_work+0x1ee/0x570 worker_thread+0x1c5/0x3b0 ? bh_worker+0x240/0x240 kthread+0x110/0x220 ? kthread_queue_delayed_work+0x90/0x90 ret_from_fork+0x28/0x40 ? kthread_queue_delayed_work+0x90/0x90 ret_from_fork_asm+0x11/0x20 </TASK> Modules linked in: ctr aes_generic ccm sch_fq_codel bnep xt_tcpudp xt_multiport xt_state iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 ip_tables x_tables btusb btrtl btintel btbcm bluetooth ecdh_generic ecc libaes hid_generic usbhid hid binfmt_misc joydev mousedev snd_hda_codec_hdmi iwldvm snd_hda_codec_idt snd_hda_codec_generic mac80211 coretemp iTCO_wdt watchdog kvm_intel i2c_dev snd_hda_intel libarc4 kvm snd_intel_dspcfg sdhci_pci sdhci_uhs2 snd_hda_codec iwlwifi sdhci irqbypass cqhci snd_hwdep snd_hda_core cfg80211 firewire_ohci mmc_core psmouse snd_pcm i2c_i801 firewire_core pcspkr led_class uhci_hcd i2c_smbus tg3 crc_itu_t iosf_mbi snd_timer rfkill libphy ehci_pci snd ehci_hcd lpc_ich mfd_core usbcore video intel_agp usb_common soundcore intel_gtt evdev agpgart parport_pc wmi parport backlight CR2: ffffc900009f5282 ---[ end trace 0000000000000000 ]--- RIP: 0010:iwl_trans_pcie_tx+0x4dd/0xe60 [iwlwifi] Code: 00 00 66 81 fa ff 0f 0f 87 42 09 00 00 3d ff 00 00 00 0f 8f 37 09 00 00 41 c1 e0 0c 41 09 d0 48 8d 14 b6 48 c1 e2 07 48 01 ca <66> 44 89 04 57 48 8d 0c 12 83 f8 3f 0f 8e 84 01 00 00 41 8b 85 80 RSP: 0018:ffffc900001c3b50 EFLAGS: 00010206 RAX: 00000000000000c1 RBX: ffff88810b180028 RCX: 00000000000000c1 RDX: 0000000000002141 RSI: 000000000000000d RDI: ffffc900009f1000 RBP: 0000000000000002 R08: 0000000000000025 R09: ffffffffa050fa60 R10: 00000000fbdbf4bc R11: 0000000000000082 R12: ffff88810e5ade40 R13: ffff88810af81588 R14: 000000000000001a R15: ffff888100dfe0c8 FS: 0000000000000000(0000) GS:ffff8881998c3000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc900009f5282 CR3: 0000000001e39000 CR4: 00000000000426f0 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Cc: Miri Korenblit <miriam.rachel.korenblit@intel.com> Fixes: 6204d5130a64 ("wifi: iwlwifi: use bc entries instead of bc table also for pre-ax210") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20250711205744.28723-1-ville.syrjala@linux.intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-07-15wifi: cfg80211: remove scan request n_channels counted_byJohannes Berg
This reverts commit e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by"). This really has been a completely failed experiment. There were no actual bugs found, and yet at this point we already have four "fixes" to it, with nothing to show for but code churn, and it never even made the code any safer. In all of the cases that ended up getting "fixed", the structure is also internally inconsistent after the n_channels setting as the channel list isn't actually filled yet. You cannot scan with such a structure, that's just wrong. In mac80211, the struct is also reused multiple times, so initializing it once is no good. Some previous "fixes" (e.g. one in brcm80211) are also just setting n_channels before accessing the array, under the assumption that the code is correct and the array can be accessed, further showing that the whole thing is just pointless when the allocation count and use count are not separate. If we really wanted to fix it, we'd need to separately track the number of channels allocated and the number of channels currently used, but given that no bugs were found despite the numerous syzbot reports, that'd just be a waste of time. Remove the __counted_by() annotation. We really should also remove a number of the n_channels settings that are setting up a structure that's inconsistent, but that can wait. Reported-by: syzbot+e834e757bd9b3d3e1251@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e834e757bd9b3d3e1251 Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by") Link: https://patch.msgid.link/20250714142130.9b0bbb7e1f07.I09112ccde72d445e11348fc2bef68942cb2ffc94@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-15Merge tag 'ath-current-20250714' of ↵Johannes Berg
git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath Jeff Johnson says: ================== ath.git update for v6.16-rc7 Fix an ath12k performance regression introduce in v6.16-rc1 ================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-14net: phy: Don't register LEDs for genphySean Anderson
If a PHY has no driver, the genphy driver is probed/removed directly in phy_attach/detach. If the PHY's ofnode has an "leds" subnode, then the LEDs will be (un)registered when probing/removing the genphy driver. This could occur if the leds are for a non-generic driver that isn't loaded for whatever reason. Synchronously removing the PHY device in phy_detach leads to the following deadlock: rtnl_lock() ndo_close() ... phy_detach() phy_remove() phy_leds_unregister() led_classdev_unregister() led_trigger_set() netdev_trigger_deactivate() unregister_netdevice_notifier() rtnl_lock() There is a corresponding deadlock on the open/register side of things (and that one is reported by lockdep), but it requires a race while this one is deterministic. Generic PHYs do not support LEDs anyway, so don't bother registering them. Fixes: 01e5b728e9e4 ("net: phy: Add a binding for PHY LEDs") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250707195803.666097-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14selftests/tc-testing: Create test cases for adding qdiscs to invalid qdisc ↵Victor Nogueira
parents As described in a previous commit [1], Lion's patch [2] revealed an ancient bug in the qdisc API. Whenever a user tries to add a qdisc to an invalid parent (not a class, root, or ingress qdisc), the qdisc API will detect this after qdisc_create is called. Some qdiscs (like fq_codel, pie, and sfq) call functions (on their init callback) which assume the parent is valid, so qdisc_create itself may have caused a NULL pointer dereference in such cases. This commit creates 3 TDC tests that attempt to add fq_codel, pie and sfq qdiscs to invalid parents - Attempts to add an fq_codel qdisc to an hhf qdisc parent - Attempts to add a pie qdisc to a drr qdisc parent - Attempts to add an sfq qdisc to an inexistent hfsc classid (which would belong to a valid hfsc qdisc) [1] https://lore.kernel.org/all/20250707210801.372995-1-victor@mojatatu.com/ [2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/ Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250712145035.705156-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14smc: Fix various oops due to inet_sock type confusion.Kuniyuki Iwashima
syzbot reported weird splats [0][1] in cipso_v4_sock_setattr() while freeing inet_sk(sk)->inet_opt. The address was freed multiple times even though it was read-only memory. cipso_v4_sock_setattr() did nothing wrong, and the root cause was type confusion. The cited commit made it possible to create smc_sock as an INET socket. The issue is that struct smc_sock does not have struct inet_sock as the first member but hijacks AF_INET and AF_INET6 sk_family, which confuses various places. In this case, inet_sock.inet_opt was actually smc_sock.clcsk_data_ready(), which is an address of a function in the text segment. $ pahole -C inet_sock vmlinux struct inet_sock { ... struct ip_options_rcu * inet_opt; /* 784 8 */ $ pahole -C smc_sock vmlinux struct smc_sock { ... void (*clcsk_data_ready)(struct sock *); /* 784 8 */ The same issue for another field was reported before. [2][3] At that time, an ugly hack was suggested [4], but it makes both INET and SMC code error-prone and hard to change. Also, yet another variant was fixed by a hacky commit 98d4435efcbf3 ("net/smc: prevent NULL pointer dereference in txopt_get"). Instead of papering over the root cause by such hacks, we should not allow non-INET socket to reuse the INET infra. Let's add inet_sock as the first member of smc_sock. [0]: kvfree_call_rcu(): Double-freed call. rcu_head 000000006921da73 WARNING: CPU: 0 PID: 6718 at mm/slab_common.c:1956 kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955 Modules linked in: CPU: 0 UID: 0 PID: 6718 Comm: syz.0.17 Tainted: G W 6.16.0-rc4-syzkaller-g7482bb149b9f #0 PREEMPT Tainted: [W]=WARN Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955 lr : kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955 sp : ffff8000a03a7730 x29: ffff8000a03a7730 x28: 00000000fffffff5 x27: 1fffe000184823d3 x26: dfff800000000000 x25: ffff0000c2411e9e x24: ffff0000dd88da00 x23: ffff8000891ac9a0 x22: 00000000ffffffea x21: ffff8000891ac9a0 x20: ffff8000891ac9a0 x19: ffff80008afc2480 x18: 00000000ffffffff x17: 0000000000000000 x16: ffff80008ae642c8 x15: ffff700011ede14c x14: 1ffff00011ede14c x13: 0000000000000004 x12: ffffffffffffffff x11: ffff700011ede14c x10: 0000000000ff0100 x9 : 5fa3c1ffaf0ff000 x8 : 5fa3c1ffaf0ff000 x7 : 0000000000000001 x6 : 0000000000000001 x5 : ffff8000a03a7078 x4 : ffff80008f766c20 x3 : ffff80008054d360 x2 : 0000000000000000 x1 : 0000000000000201 x0 : 0000000000000000 Call trace: kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955 (P) cipso_v4_sock_setattr+0x2f0/0x3f4 net/ipv4/cipso_ipv4.c:1914 netlbl_sock_setattr+0x240/0x334 net/netlabel/netlabel_kapi.c:1000 smack_netlbl_add+0xa8/0x158 security/smack/smack_lsm.c:2581 smack_inode_setsecurity+0x378/0x430 security/smack/smack_lsm.c:2912 security_inode_setsecurity+0x118/0x3c0 security/security.c:2706 __vfs_setxattr_noperm+0x174/0x5c4 fs/xattr.c:251 __vfs_setxattr_locked+0x1ec/0x218 fs/xattr.c:295 vfs_setxattr+0x158/0x2ac fs/xattr.c:321 do_setxattr fs/xattr.c:636 [inline] file_setxattr+0x1b8/0x294 fs/xattr.c:646 path_setxattrat+0x2ac/0x320 fs/xattr.c:711 __do_sys_fsetxattr fs/xattr.c:761 [inline] __se_sys_fsetxattr fs/xattr.c:758 [inline] __arm64_sys_fsetxattr+0xc0/0xdc fs/xattr.c:758 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 el0_svc+0x58/0x180 arch/arm64/kernel/entry-common.c:879 el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:898 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600 [1]: Unable to handle kernel write to read-only memory at virtual address ffff8000891ac9a8 KASAN: probably user-memory-access in range [0x0000000448d64d40-0x0000000448d64d47] Mem abort info: ESR = 0x000000009600004e EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x0e: level 2 permission fault Data abort info: ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000207144000 [ffff8000891ac9a8] pgd=0000000000000000, p4d=100000020f950003, pud=100000020f951003, pmd=0040000201000781 Internal error: Oops: 000000009600004e [#1] SMP Modules linked in: CPU: 0 UID: 0 PID: 6946 Comm: syz.0.69 Not tainted 6.16.0-rc4-syzkaller-g7482bb149b9f #0 PREEMPT Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : kvfree_call_rcu+0x31c/0x3f0 mm/slab_common.c:1971 lr : add_ptr_to_bulk_krc_lock mm/slab_common.c:1838 [inline] lr : kvfree_call_rcu+0xfc/0x3f0 mm/slab_common.c:1963 sp : ffff8000a28a7730 x29: ffff8000a28a7730 x28: 00000000fffffff5 x27: 1fffe00018b09bb3 x26: 0000000000000001 x25: ffff80008f66e000 x24: ffff00019beaf498 x23: ffff00019beaf4c0 x22: 0000000000000000 x21: ffff8000891ac9a0 x20: ffff8000891ac9a0 x19: 0000000000000000 x18: 00000000ffffffff x17: ffff800093363000 x16: ffff80008052c6e4 x15: ffff700014514ecc x14: 1ffff00014514ecc x13: 0000000000000004 x12: ffffffffffffffff x11: ffff700014514ecc x10: 0000000000000001 x9 : 0000000000000001 x8 : ffff00019beaf7b4 x7 : ffff800080a94154 x6 : 0000000000000000 x5 : ffff8000935efa60 x4 : 0000000000000008 x3 : ffff80008052c7fc x2 : 0000000000000001 x1 : ffff8000891ac9a0 x0 : 0000000000000001 Call trace: kvfree_call_rcu+0x31c/0x3f0 mm/slab_common.c:1967 (P) cipso_v4_sock_setattr+0x2f0/0x3f4 net/ipv4/cipso_ipv4.c:1914 netlbl_sock_setattr+0x240/0x334 net/netlabel/netlabel_kapi.c:1000 smack_netlbl_add+0xa8/0x158 security/smack/smack_lsm.c:2581 smack_inode_setsecurity+0x378/0x430 security/smack/smack_lsm.c:2912 security_inode_setsecurity+0x118/0x3c0 security/security.c:2706 __vfs_setxattr_noperm+0x174/0x5c4 fs/xattr.c:251 __vfs_setxattr_locked+0x1ec/0x218 fs/xattr.c:295 vfs_setxattr+0x158/0x2ac fs/xattr.c:321 do_setxattr fs/xattr.c:636 [inline] file_setxattr+0x1b8/0x294 fs/xattr.c:646 path_setxattrat+0x2ac/0x320 fs/xattr.c:711 __do_sys_fsetxattr fs/xattr.c:761 [inline] __se_sys_fsetxattr fs/xattr.c:758 [inline] __arm64_sys_fsetxattr+0xc0/0xdc fs/xattr.c:758 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 el0_svc+0x58/0x180 arch/arm64/kernel/entry-common.c:879 el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:898 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600 Code: aa1f03e2 52800023 97ee1e8d b4000195 (f90006b4) Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC") Reported-by: syzbot+40bf00346c3fe40f90f2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/686d9b50.050a0220.1ffab7.0020.GAE@google.com/ Tested-by: syzbot+40bf00346c3fe40f90f2@syzkaller.appspotmail.com Reported-by: syzbot+f22031fad6cbe52c70e7@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/686da0f3.050a0220.1ffab7.0022.GAE@google.com/ Reported-by: syzbot+271fed3ed6f24600c364@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=271fed3ed6f24600c364 # [2] Link: https://lore.kernel.org/netdev/99f284be-bf1d-4bc4-a629-77b268522fff@huawei.com/ # [3] Link: https://lore.kernel.org/netdev/20250331081003.1503211-1-wangliang74@huawei.com/ # [4] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wang Liang <wangliang74@huawei.com> Link: https://patch.msgid.link/20250711060808.2977529-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14Revert "netfilter: nf_tables: Add notifications for hook changes"Phil Sutter
This reverts commit 465b9ee0ee7bc268d7f261356afd6c4262e48d82. Such notifications fit better into core or nfnetlink_hook code, following the NFNL_MSG_HOOK_GET message format. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-14netfilter: nf_tables: hide clash bit from userspaceFlorian Westphal
Its a kernel implementation detail, at least at this time: We can later decide to revert this patch if there is a compelling reason, but then we should also remove the ifdef that prevents exposure of ip_conntrack_status enum IPS_NAT_CLASH value in the uapi header. Clash entries are not included in dumps (true for both old /proc and ctnetlink) either. So for now exclude the clash bit when dumping. Fixes: 7e5c6aa67e6f ("netfilter: nf_tables: add packets conntrack state to debug trace info") Link: https://lore.kernel.org/netfilter-devel/aGwf3dCggwBlRKKC@strlen.de/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-14selftests: netfilter: nft_concat_range.sh: send packets to empty setFlorian Westphal
The selftest doesn't cover this error path: scratch = *raw_cpu_ptr(m->scratch); if (unlikely(!scratch)) { // here cover this too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-14selftests: netfilter: conntrack_resize.sh: also use udpclash toolFlorian Westphal
Previous patch added a new clash resolution test case. Also use this during conntrack resize stress test in addition to icmp ping flood. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-14selftests: netfilter: add conntrack clash resolution test caseFlorian Westphal
Add a dedicated test to exercise conntrack clash resolution path. Test program emits 128 identical udp packets in parallel, then reads back replies from socat echo server. Also check (via conntrack -S) that the clash path was hit at least once. Due to the racy nature of the test its possible that despite the threaded program all packets were processed in-order or on same cpu, emit a SKIP warning in this case. Two tests are added: - one to test the simpler, non-nat case - one to exercise clash resolution where packets might have different nat transformations attached to them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>