summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-25net: dsa: microchip: fix DCB apptrust configuration on KSZ88x3Oleksij Rempel
Remove KSZ88x3-specific priority and apptrust configuration logic that was based on incorrect register access assumptions. Also fix the register offset for KSZ8_REG_PORT_1_CTRL_0 to align with get_port_addr() logic. The KSZ88x3 switch family uses a different register layout compared to KSZ9477-compatible variants. Specifically, port control registers need offset adjustment through get_port_addr(), and do not match the datasheet values directly. Commit a1ea57710c9d ("net: dsa: microchip: dcb: add special handling for KSZ88X3 family") introduced quirks based on datasheet offsets, which do not work with the driver's internal addressing model. As a result, these quirks addressed the wrong ports and caused unstable behavior. This patch removes all KSZ88x3-specific DCB quirks and corrects the port control register offset, effectively restoring working and predictable apptrust configuration. Fixes: a1ea57710c9d ("net: dsa: microchip: dcb: add special handling for KSZ88X3 family") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250321141044.2128973-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: fix NULL pointer dereference in l3mdev_l3_rcvWang Liang
When delete l3s ipvlan: ip link del link eth0 ipvlan1 type ipvlan mode l3s This may cause a null pointer dereference: Call trace: ip_rcv_finish+0x48/0xd0 ip_rcv+0x5c/0x100 __netif_receive_skb_one_core+0x64/0xb0 __netif_receive_skb+0x20/0x80 process_backlog+0xb4/0x204 napi_poll+0xe8/0x294 net_rx_action+0xd8/0x22c __do_softirq+0x12c/0x354 This is because l3mdev_l3_rcv() visit dev->l3mdev_ops after ipvlan_l3s_unregister() assign the dev->l3mdev_ops to NULL. The process like this: (CPU1) | (CPU2) l3mdev_l3_rcv() | check dev->priv_flags: | master = skb->dev; | | | ipvlan_l3s_unregister() | set dev->priv_flags | dev->l3mdev_ops = NULL; | visit master->l3mdev_ops | To avoid this by do not set dev->l3mdev_ops when unregister l3s ipvlan. Suggested-by: David Ahern <dsahern@kernel.org> Fixes: c675e06a98a4 ("ipvlan: decouple l3s mode dependencies from other modes") Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250321090353.1170545-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25ibmvnic: Use kernel helpers for hex dumpsNick Child
Previously, when the driver was printing hex dumps, the buffer was cast to an 8 byte long and printed using string formatters. If the buffer size was not a multiple of 8 then a read buffer overflow was possible. Therefore, create a new ibmvnic function that loops over a buffer and calls hex_dump_to_buffer instead. This patch address KASAN reports like the one below: ibmvnic 30000003 env3: Login Buffer: ibmvnic 30000003 env3: 01000000af000000 <...> ibmvnic 30000003 env3: 2e6d62692e736261 ibmvnic 30000003 env3: 65050003006d6f63 ================================================================== BUG: KASAN: slab-out-of-bounds in ibmvnic_login+0xacc/0xffc [ibmvnic] Read of size 8 at addr c0000001331a9aa8 by task ip/17681 <...> Allocated by task 17681: <...> ibmvnic_login+0x2f0/0xffc [ibmvnic] ibmvnic_open+0x148/0x308 [ibmvnic] __dev_open+0x1ac/0x304 <...> The buggy address is located 168 bytes inside of allocated 175-byte region [c0000001331a9a00, c0000001331a9aaf) <...> ================================================================= ibmvnic 30000003 env3: 000000000033766e Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250320212951.11142-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25bonding: check xdp prog when set bond modeWang Liang
Following operations can trigger a warning[1]: ip netns add ns1 ip netns exec ns1 ip link add bond0 type bond mode balance-rr ip netns exec ns1 ip link set dev bond0 xdp obj af_xdp_kern.o sec xdp ip netns exec ns1 ip link set bond0 type bond mode broadcast ip netns del ns1 When delete the namespace, dev_xdp_uninstall() is called to remove xdp program on bond dev, and bond_xdp_set() will check the bond mode. If bond mode is changed after attaching xdp program, the warning may occur. Some bond modes (broadcast, etc.) do not support native xdp. Set bond mode with xdp program attached is not good. Add check for xdp program when set bond mode. [1] ------------[ cut here ]------------ WARNING: CPU: 0 PID: 11 at net/core/dev.c:9912 unregister_netdevice_many_notify+0x8d9/0x930 Modules linked in: CPU: 0 UID: 0 PID: 11 Comm: kworker/u4:0 Not tainted 6.14.0-rc4 #107 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Workqueue: netns cleanup_net RIP: 0010:unregister_netdevice_many_notify+0x8d9/0x930 Code: 00 00 48 c7 c6 6f e3 a2 82 48 c7 c7 d0 b3 96 82 e8 9c 10 3e ... RSP: 0018:ffffc90000063d80 EFLAGS: 00000282 RAX: 00000000ffffffa1 RBX: ffff888004959000 RCX: 00000000ffffdfff RDX: 0000000000000000 RSI: 00000000ffffffea RDI: ffffc90000063b48 RBP: ffffc90000063e28 R08: ffffffff82d39b28 R09: 0000000000009ffb R10: 0000000000000175 R11: ffffffff82d09b40 R12: ffff8880049598e8 R13: 0000000000000001 R14: dead000000000100 R15: ffffc90000045000 FS: 0000000000000000(0000) GS:ffff888007a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000d406b60 CR3: 000000000483e000 CR4: 00000000000006f0 Call Trace: <TASK> ? __warn+0x83/0x130 ? unregister_netdevice_many_notify+0x8d9/0x930 ? report_bug+0x18e/0x1a0 ? handle_bug+0x54/0x90 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? unregister_netdevice_many_notify+0x8d9/0x930 ? bond_net_exit_batch_rtnl+0x5c/0x90 cleanup_net+0x237/0x3d0 process_one_work+0x163/0x390 worker_thread+0x293/0x3b0 ? __pfx_worker_thread+0x10/0x10 kthread+0xec/0x1e0 ? __pfx_kthread+0x10/0x10 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") Signed-off-by: Wang Liang <wangliang74@huawei.com> Acked-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250321044852.1086551-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25vmxnet3: unregister xdp rxq info in the reset pathSankararaman Jayaraman
vmxnet3 does not unregister xdp rxq info in the vmxnet3_reset_work() code path as vmxnet3_rq_destroy() is not invoked in this code path. So, we get below message with a backtrace. Missing unregister, handled but fix driver WARNING: CPU:48 PID: 500 at net/core/xdp.c:182 __xdp_rxq_info_reg+0x93/0xf0 This patch fixes the problem by moving the unregister code of XDP from vmxnet3_rq_destroy() to vmxnet3_rq_cleanup(). Fixes: 54f00cce1178 ("vmxnet3: Add XDP support.") Signed-off-by: Sankararaman Jayaraman <sankararaman.jayaraman@broadcom.com> Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com> Link: https://patch.msgid.link/20250320045522.57892-1-sankararaman.jayaraman@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-03-18 (ice, idpf) For ice: Przemek modifies string declarations to resolve compile issues on gcc 7.5. Karol adds padding to initial programming of GLTSYN_TIME* registers to ensure it will occur in the future to prevent hardware issues. Jesse Brandeburg turns off driver RDMA capability when the corresponding kernel config is not enabled to aid in preventing resource exhaustion. Jan adjusts type declaration to properly catch error conditions and prevent truncation of values. He also adds bounds checking to prevent overflow in ice_vc_cfg_q_quanta(). Lukasz adds checking and error reporting for invalid values in ice_vc_cfg_q_bw(). Mateusz adds check for valid size for ice_vc_fdir_parse_raw(). For idpf: Emil adds check, and handling, on failure to register netdev. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: idpf: check error for register_netdev() on init ice: fix using untrusted value of pkt_len in ice_vc_fdir_parse_raw() ice: fix input validation for virtchnl BW ice: validate queue quanta parameters to prevent OOB access ice: stop truncating queue ids when checking virtchnl: make proto and filter action count unsigned ice: fix reservation of resources for RDMA when disabled ice: ensure periodic output start time is in the future ice: health.c: fix compilation on gcc 7.5 ==================== Link: https://patch.msgid.link/20250318200511.2958251-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge branch 'mlx5-misc-fixes-2025-03-18'Jakub Kicinski
Tariq Toukan says: ==================== mlx5 misc fixes 2025-03-18 This small patchset provides misc bug fixes to the mlx5 core driver. ==================== Link: https://patch.msgid.link/1742331077-102038-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5: Start health poll after enable hcaMoshe Shemesh
The health poll mechanism performs periodic checks to detect firmware errors. One of the checks verifies the function is still enabled on firmware side, but the function is enabled only after enable_hca command completed. Start health poll after enable_hca command to avoid a race between function enabled and first health polling. Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742331077-102038-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5: LAG, reload representors on LAG creation failureMark Bloch
When LAG creation fails, the driver reloads the RDMA devices. If RDMA representors are present, they should also be reloaded. This step was missed in the cited commit. Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742331077-102038-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge branch 'sja1105-driver-fixes'Jakub Kicinski
Vladimir Oltean says: ==================== sja1105 driver fixes This is a collection of 3 fixes for the sja1105 DSA driver: - 1/3: "ethtool -S" shows a bogus counter with no name, and doesn't show a valid counter because of it (either "n_not_reach" or "n_rx_bcast"). - 2/3: RX timestamping filters other than L2 PTP event messages don't work, but are not rejected, either. - 3/3: there is a KASAN out-of-bounds warning in sja1105_table_delete_entry() ==================== Link: https://patch.msgid.link/20250318115716.2124395-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: dsa: sja1105: fix kasan out-of-bounds warning in ↵Vladimir Oltean
sja1105_table_delete_entry() There are actually 2 problems: - deleting the last element doesn't require the memmove of elements [i + 1, end) over it. Actually, element i+1 is out of bounds. - The memmove itself should move size - i - 1 elements, because the last element is out of bounds. The out-of-bounds element still remains out of bounds after being accessed, so the problem is only that we touch it, not that it becomes in active use. But I suppose it can lead to issues if the out-of-bounds element is part of an unmapped page. Fixes: 6666cebc5e30 ("net: dsa: sja1105: Add support for VLAN operations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250318115716.2124395-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: dsa: sja1105: reject other RX filters than HWTSTAMP_FILTER_PTP_V2_L2_EVENTVladimir Oltean
This is all that we can support timestamping, so we shouldn't accept anything else. Also see sja1105_hwtstamp_get(). To avoid erroring out in an inconsistent state, operate on copies of priv->hwts_rx_en and priv->hwts_tx_en, and write them back when nothing else can fail anymore. Fixes: a602afd200f5 ("net: dsa: sja1105: Expose PTP timestamping ioctls to userspace") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250318115716.2124395-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: dsa: sja1105: fix displaced ethtool statistics countersVladimir Oltean
Port counters with no name (aka sja1105_port_counters[__SJA1105_COUNTER_UNUSED]) are skipped when reporting sja1105_get_sset_count(), but are not skipped during sja1105_get_strings() and sja1105_get_ethtool_stats(). As a consequence, the first reported counter has an empty name and a bogus value (reads from area 0, aka MAC, from offset 0, bits start:end 0:0). Also, the last counter (N_NOT_REACH on E/T, N_RX_BCAST on P/Q/R/S) gets pushed out of the statistics counters that get shown. Skip __SJA1105_COUNTER_UNUSED consistently, so that the bogus counter with an empty name disappears, and in its place appears a valid counter. Fixes: 039b167d68a3 ("net: dsa: sja1105: don't use burst SPI reads for port statistics") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250318115716.2124395-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: spectrum_acl_bloom_filter: Workaround for some LLVM versionsWangYuli
This is a workaround to mitigate a compiler anomaly. During LLVM toolchain compilation of this driver on s390x architecture, an unreasonable __write_overflow_field warning occurs. Contextually, chunk_index is restricted to 0, 1 or 2. By expanding these possibilities, the compile warning is suppressed. Fix follow error with clang-19 when -Werror: In file included from drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c:5: In file included from ./include/linux/gfp.h:7: In file included from ./include/linux/mmzone.h:8: In file included from ./include/linux/spinlock.h:63: In file included from ./include/linux/lockdep.h:14: In file included from ./include/linux/smp.h:13: In file included from ./include/linux/cpumask.h:12: In file included from ./include/linux/bitmap.h:13: In file included from ./include/linux/string.h:392: ./include/linux/fortify-string.h:571:4: error: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror,-Wattribute-warning] 571 | __write_overflow_field(p_size_field, size); | ^ 1 error generated. According to the testing, we can be fairly certain that this is a clang compiler bug, impacting only clang-19 and below. Clang versions 20 and 21 do not exhibit this behavior. Link: https://lore.kernel.org/all/484364B641C901CD+20250311141025.1624528-1-wangyuli@uniontech.com/ Fixes: 7585cacdb978 ("mlxsw: spectrum_acl: Add Bloom filter handling") Co-developed-by: Zijian Chen <czj2441@163.com> Signed-off-by: Zijian Chen <czj2441@163.com> Co-developed-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Suggested-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Tested-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Link: https://patch.msgid.link/A1858F1D36E653E0+20250318103654.708077-1-wangyuli@uniontech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge branch 'fixes-for-mv88e6xxx-mainly-6320-family'Jakub Kicinski
Marek Behún says: ==================== Fixes for mv88e6xxx (mainly 6320 family) v1: https://patchwork.kernel.org/project/netdevbpf/cover/20250313134146.27087-1-kabel@kernel.org/ ==================== Link: https://patch.msgid.link/20250317173250.28780-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: dsa: mv88e6xxx: workaround RGMII transmit delay erratum for 6320 familyMarek Behún
Implement the workaround for erratum 3.3 RGMII timing may be out of spec when transmit delay is enabled for the 6320 family, which says: When transmit delay is enabled via Port register 1 bit 14 = 1, duty cycle may be out of spec. Under very rare conditions this may cause the attached device receive CRC errors. Signed-off-by: Marek Behún <kabel@kernel.org> Cc: <stable@vger.kernel.org> # 5.4.x Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250317173250.28780-8-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: dsa: mv88e6xxx: fix internal PHYs for 6320 familyMarek Behún
Fix internal PHYs definition for the 6320 family, which has only 2 internal PHYs (on ports 3 and 4). Fixes: bc3931557d1d ("net: dsa: mv88e6xxx: Add number of internal PHYs") Signed-off-by: Marek Behún <kabel@kernel.org> Cc: <stable@vger.kernel.org> # 6.6.x Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250317173250.28780-7-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: dsa: mv88e6xxx: enable STU methods for 6320 familyMarek Behún
Commit c050f5e91b47 ("net: dsa: mv88e6xxx: Fill in STU support for all supported chips") introduced STU methods, but did not add them to the 6320 family. Fix it. Fixes: c050f5e91b47 ("net: dsa: mv88e6xxx: Fill in STU support for all supported chips") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250317173250.28780-6-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: dsa: mv88e6xxx: enable .port_set_policy() for 6320 familyMarek Behún
Commit f3a2cd326e44 ("net: dsa: mv88e6xxx: introduce .port_set_policy") did not add the .port_set_policy() method for the 6320 family. Fix it. Fixes: f3a2cd326e44 ("net: dsa: mv88e6xxx: introduce .port_set_policy") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250317173250.28780-5-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: dsa: mv88e6xxx: enable PVT for 6321 switchMarek Behún
Commit f36456522168 ("net: dsa: mv88e6xxx: move PVT description in info") did not enable PVT for 6321 switch. Fix it. Fixes: f36456522168 ("net: dsa: mv88e6xxx: move PVT description in info") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250317173250.28780-4-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: dsa: mv88e6xxx: fix atu_move_port_mask for 6341 familyMarek Behún
The atu_move_port_mask for 6341 family (Topaz) is 0xf, not 0x1f. The PortVec field is 8 bits wide, not 11 as in 6390 family. Fix this. Fixes: e606ca36bbf2 ("net: dsa: mv88e6xxx: rework ATU Remove") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250317173250.28780-3-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: dsa: mv88e6xxx: fix VTU methods for 6320 familyMarek Behún
The VTU registers of the 6320 family use the 6352 semantics, not 6185. Fix it. Fixes: b8fee9571063 ("net: dsa: mv88e6xxx: add VLAN Get Next support") Signed-off-by: Marek Behún <kabel@kernel.org> Cc: <stable@vger.kernel.org> # 5.15.x Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250317173250.28780-2-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge branch 'bnxt_en-fix-max_skb_frags-30'Jakub Kicinski
Michael Chan says: ==================== bnxt_en: Fix MAX_SKB_FRAGS > 30 The driver was written with the assumption that MAX_SKB_FRAGS could not exceed what the NIC can support. About 2 years ago, CONFIG_MAX_SKB_FRAGS was added. The value can exceed what the NIC can support and it may cause TX timeout. These 2 patches will fix the issue. ==================== Link: https://patch.msgid.link/20250321211639.3812992-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24bnxt_en: Linearize TX SKB if the fragments exceed the maxMichael Chan
If skb_shinfo(skb)->nr_frags excceds what the chip can support, linearize the SKB and warn once to let the user know. net.core.max_skb_frags can be lowered, for example, to avoid the issue. Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250321211639.3812992-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24bnxt_en: Mask the bd_cnt field in the TX BD properlyMichael Chan
The bd_cnt field in the TX BD specifies the total number of BDs for the TX packet. The bd_cnt field has 5 bits and the maximum number supported is 32 with the value 0. CONFIG_MAX_SKB_FRAGS can be modified and the total number of SKB fragments can approach or exceed the maximum supported by the chip. Add a macro to properly mask the bd_cnt field so that the value 32 will be properly masked and set to 0 in the bd_cnd field. Without this patch, the out-of-range bd_cnt value will corrupt the TX BD and may cause TX timeout. The next patch will check for values exceeding 32. Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250321211639.3812992-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5e: Fix ethtool -N flow-type ip4 to RSS contextMaxim Mikityanskiy
There commands can be used to add an RSS context and steer some traffic into it: # ethtool -X eth0 context new New RSS context is 1 # ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1 Added rule with ID 1023 However, the second command fails with EINVAL on mlx5e: # ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1 rmgr: Cannot insert RX class rule: Invalid argument Cannot insert classification rule It happens when flow_get_tirn calls flow_type_to_traffic_type with flow_type = IP_USER_FLOW or IPV6_USER_FLOW. That function only handles IPV4_FLOW and IPV6_FLOW cases, but unlike all other cases which are common for hash and spec, IPv4 and IPv6 defines different contants for hash and for spec: #define TCP_V4_FLOW 0x01 /* hash or spec (tcp_ip4_spec) */ #define UDP_V4_FLOW 0x02 /* hash or spec (udp_ip4_spec) */ ... #define IPV4_USER_FLOW 0x0d /* spec only (usr_ip4_spec) */ #define IP_USER_FLOW IPV4_USER_FLOW #define IPV6_USER_FLOW 0x0e /* spec only (usr_ip6_spec; nfc only) */ #define IPV4_FLOW 0x10 /* hash only */ #define IPV6_FLOW 0x11 /* hash only */ Extend the switch in flow_type_to_traffic_type to support both, which fixes the failing ethtool -N command with flow-type ip4 or ip6. Fixes: 248d3b4c9a39 ("net/mlx5e: Support flow classification into RSS contexts") Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Tested-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250319124508.3979818-1-maxim@isovalent.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24gve: unlink old napi only if page pool existsHarshitha Ramamurthy
Commit de70981f295e ("gve: unlink old napi when stopping a queue using queue API") unlinks the old napi when stopping a queue. But this breaks QPL mode of the driver which does not use page pool. Fix this by checking that there's a page pool associated with the ring. Cc: stable@vger.kernel.org Fixes: de70981f295e ("gve: unlink old napi when stopping a queue using queue API") Reviewed-by: Joshua Washington <joshwash@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250317214141.286854-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: stmmac: Fix accessing freed irq affinity_hintQingfang Deng
The cpumask should not be a local variable, since its pointer is saved to irq_desc and may be accessed from procfs. To fix it, use the persistent mask cpumask_of(cpu#). Cc: stable@vger.kernel.org Fixes: 8deec94c6040 ("net: stmmac: set IRQ affinity hint for multi MSI vectors") Signed-off-by: Qingfang Deng <dqfext@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250318032424.112067-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24ax25: Remove broken autobindMurad Masimov
Binding AX25 socket by using the autobind feature leads to memory leaks in ax25_connect() and also refcount leaks in ax25_release(). Memory leak was detected with kmemleak: ================================================================ unreferenced object 0xffff8880253cd680 (size 96): backtrace: __kmalloc_node_track_caller_noprof (./include/linux/kmemleak.h:43) kmemdup_noprof (mm/util.c:136) ax25_rt_autobind (net/ax25/ax25_route.c:428) ax25_connect (net/ax25/af_ax25.c:1282) __sys_connect_file (net/socket.c:2045) __sys_connect (net/socket.c:2064) __x64_sys_connect (net/socket.c:2067) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) ================================================================ When socket is bound, refcounts must be incremented the way it is done in ax25_bind() and ax25_setsockopt() (SO_BINDTODEVICE). In case of autobind, the refcounts are not incremented. This bug leads to the following issue reported by Syzkaller: ================================================================ ax25_connect(): syz-executor318 uses autobind, please contact jreuter@yaina.de ------------[ cut here ]------------ refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 0 PID: 5317 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31 Modules linked in: CPU: 0 UID: 0 PID: 5317 Comm: syz-executor318 Not tainted 6.14.0-rc4-syzkaller-00278-gece144f151ac #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31 ... Call Trace: <TASK> __refcount_dec include/linux/refcount.h:336 [inline] refcount_dec include/linux/refcount.h:351 [inline] ref_tracker_free+0x6af/0x7e0 lib/ref_tracker.c:236 netdev_tracker_free include/linux/netdevice.h:4302 [inline] netdev_put include/linux/netdevice.h:4319 [inline] ax25_release+0x368/0x960 net/ax25/af_ax25.c:1080 __sock_release net/socket.c:647 [inline] sock_close+0xbc/0x240 net/socket.c:1398 __fput+0x3e9/0x9f0 fs/file_table.c:464 __do_sys_close fs/open.c:1580 [inline] __se_sys_close fs/open.c:1565 [inline] __x64_sys_close+0x7f/0x110 fs/open.c:1565 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f ... </TASK> ================================================================ Considering the issues above and the comments left in the code that say: "check if we can remove this feature. It is broken."; "autobinding in this may or may not work"; - it is better to completely remove this feature than to fix it because it is broken and leads to various kinds of memory bugs. Now calling connect() without first binding socket will result in an error (-EINVAL). Userspace software that relies on the autobind feature might get broken. However, this feature does not seem widely used with this specific driver as it was not reliable at any point of time, and it is already broken anyway. E.g. ax25-tools and ax25-apps packages for popular distributions do not use the autobind feature for AF_AX25. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+33841dc6aa3e1d86b78a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=33841dc6aa3e1d86b78a Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-21net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.Kuniyuki Iwashima
SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to br_ioctl_call(), which causes unnecessary RTNL dance and the splat below [0] under RTNL pressure. Let's say Thread A is trying to detach a device from a bridge and Thread B is trying to remove the bridge. In dev_ioctl(), Thread A bumps the bridge device's refcnt by netdev_hold() and releases RTNL because the following br_ioctl_call() also re-acquires RTNL. In the race window, Thread B could acquire RTNL and try to remove the bridge device. Then, rtnl_unlock() by Thread B will release RTNL and wait for netdev_put() by Thread A. Thread A, however, must hold RTNL after the unlock in dev_ifsioc(), which may take long under RTNL pressure, resulting in the splat by Thread B. Thread A (SIOCBRDELIF) Thread B (SIOCBRDELBR) ---------------------- ---------------------- sock_ioctl sock_ioctl `- sock_do_ioctl `- br_ioctl_call `- dev_ioctl `- br_ioctl_stub |- rtnl_lock | |- dev_ifsioc ' ' |- dev = __dev_get_by_name(...) |- netdev_hold(dev, ...) . / |- rtnl_unlock ------. | | |- br_ioctl_call `---> |- rtnl_lock Race | | `- br_ioctl_stub |- br_del_bridge Window | | | |- dev = __dev_get_by_name(...) | | | May take long | `- br_dev_delete(dev, ...) | | | under RTNL pressure | `- unregister_netdevice_queue(dev, ...) | | | | `- rtnl_unlock \ | |- rtnl_lock <-' `- netdev_run_todo | |- ... `- netdev_run_todo | `- rtnl_unlock |- __rtnl_unlock | |- netdev_wait_allrefs_any |- netdev_put(dev, ...) <----------------' Wait refcnt decrement and log splat below To avoid blocking SIOCBRDELBR unnecessarily, let's not call dev_ioctl() for SIOCBRADDIF and SIOCBRDELIF. In the dev_ioctl() path, we do the following: 1. Copy struct ifreq by get_user_ifreq in sock_do_ioctl() 2. Check CAP_NET_ADMIN in dev_ioctl() 3. Call dev_load() in dev_ioctl() 4. Fetch the master dev from ifr.ifr_name in dev_ifsioc() 3. can be done by request_module() in br_ioctl_call(), so we move 1., 2., and 4. to br_ioctl_stub(). Note that 2. is also checked later in add_del_if(), but it's better performed before RTNL. SIOCBRADDIF and SIOCBRDELIF have been processed in dev_ioctl() since the pre-git era, and there seems to be no specific reason to process them there. [0]: unregister_netdevice: waiting for wpan3 to become free. Usage count = 2 ref_tracker: wpan3@ffff8880662d8608 has 1/1 users at __netdev_tracker_alloc include/linux/netdevice.h:4282 [inline] netdev_hold include/linux/netdevice.h:4311 [inline] dev_ifsioc+0xc6a/0x1160 net/core/dev_ioctl.c:624 dev_ioctl+0x255/0x10c0 net/core/dev_ioctl.c:826 sock_do_ioctl+0x1ca/0x260 net/socket.c:1213 sock_ioctl+0x23a/0x6c0 net/socket.c:1318 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl fs/ioctl.c:892 [inline] __x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 893b19587534 ("net: bridge: fix ioctl locking") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: yan kang <kangyan91@outlook.com> Reported-by: yue sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/netdev/SY8P300MB0421225D54EB92762AE8F0F2A1D32@SY8P300MB0421.AUSP300.PROD.OUTLOOK.COM/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250316192851.19781-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21eth: bnxt: fix out-of-range access of vnic_info arrayTaehee Yoo
The bnxt_queue_{start | stop}() access vnic_info as much as allocated, which indicates bp->nr_vnics. So, it should not reach bp->vnic_info[bp->nr_vnics]. Fixes: 661958552eda ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250316025837.939527-1-ap420073@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21MAINTAINERS: update bridge entryNikolay Aleksandrov
Roopa has decided to withdraw as a bridge maintainer and Ido has agreed to step up and co-maintain the bridge with me. He has been very helpful in bridge patch reviews and has contributed a lot to the bridge over the years. Add an entry for Roopa to CREDITS and also add bridge's headers to its MAINTAINERS entry. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314100631.40999-1-razor@blackwall.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20Merge tag 'net-6.14-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can, bluetooth and ipsec. This contains a last minute revert of a recent GRE patch, mostly to allow me stating there are no known regressions outstanding. Current release - regressions: - revert "gre: Fix IPv6 link-local address generation." - eth: ti: am65-cpsw: fix NAPI registration sequence Previous releases - regressions: - ipv6: fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw(). - mptcp: fix data stream corruption in the address announcement - bluetooth: fix connection regression between LE and non-LE adapters - can: - flexcan: only change CAN state when link up in system PM - ucan: fix out of bound read in strscpy() source Previous releases - always broken: - lwtunnel: fix reentry loops - ipv6: fix TCP GSO segmentation with NAT - xfrm: force software GSO only in tunnel mode - eth: ti: icssg-prueth: add lock to stats Misc: - add Andrea Mayer as a maintainer of SRv6" * tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits) MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6 Revert "gre: Fix IPv6 link-local address generation." Revert "selftests: Add IPv6 link-local address generation tests for GRE devices." net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES tools headers: Sync uapi/asm-generic/socket.h with the kernel sources mptcp: Fix data stream corruption in the address announcement selftests: net: test for lwtunnel dst ref loops net: ipv6: ioam6: fix lwtunnel_output() loop net: lwtunnel: fix recursion loops net: ti: icssg-prueth: Add lock to stats net: atm: fix use after free in lec_send() xsk: fix an integer overflow in xp_create_and_assign_umem() net: stmmac: dwc-qos-eth: use devm_kzalloc() for AXI data selftests: drv-net: use defer in the ping test phy: fix xa_alloc_cyclic() error handling dpll: fix xa_alloc_cyclic() error handling devlink: fix xa_alloc_cyclic() error handling ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create(). ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw(). net: ipv6: fix TCP GSO segmentation with NAT ...
2025-03-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Collected driver fixes from the last few weeks, I was surprised how significant many of them seemed to be. - Fix rdma-core test failures due to wrong startup ordering in rxe - Don't crash in bnxt_re if the FW supports more than 64k QPs - Fix wrong QP table indexing math in bnxt_re - Calculate the max SRQs for userspace properly in bnxt_re - Don't try to do math on errno for mlx5's rate calculation - Properly allow userspace to control the VLAN in the QP state during INIT->RTR for bnxt_re - 6 bug fixes for HNS: - Soft lockup when processing huge MRs, add a cond_resched() - Fix missed error unwind for doorbell allocation - Prevent bad send queue parameters from userspace - Wrong error unwind in qp creation - Missed xa_destroy during driver shutdown - Fix reporting to userspace of max_sge_rd, hns doesn't have a read/write difference" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/hns: Fix wrong value of max_sge_rd RDMA/hns: Fix missing xa_destroy() RDMA/hns: Fix a missing rollback in error path of hns_roce_create_qp_common() RDMA/hns: Fix invalid sq params not being blocked RDMA/hns: Fix unmatched condition in error path of alloc_user_qp_db() RDMA/hns: Fix soft lockup during bt pages loop RDMA/bnxt_re: Avoid clearing VLAN_ID mask in modify qp path RDMA/mlx5: Handle errors returned from mlx5r_ib_rate() RDMA/bnxt_re: Fix reporting maximum SRQs on P7 chips RDMA/bnxt_re: Add missing paranthesis in map_qp_id_to_tbl_indx RDMA/bnxt_re: Fix allocation of QP table RDMA/rxe: Fix the failure of ibv_query_device() and ibv_query_device_ex() tests
2025-03-20Merge tag 'mmc-v6.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC host fixes from Ulf Hansson: - sdhci-brcmstb: Fix CQE suspend/resume support - atmel-mci: Add a missing clk_disable_unprepare() in ->probe() * tag 'mmc-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-brcmstb: add cqhci suspend/resume to PM ops mmc: atmel-mci: Add missing clk_disable_unprepare()
2025-03-20Merge tag 'efi-fixes-for-v6.14-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: "Here's a final batch of EFI fixes for v6.14. The efivarfs ones are fixes for changes that were made this cycle. James's fix is somewhat of a band-aid, but it was blessed by the VFS folks, who are working with James to come up with something better for the next cycle. - Avoid physical address 0x0 for random page allocations - Add correct lockdep annotation when traversing efivarfs on resume - Avoid NULL mount in kernel_file_open() when traversing efivarfs on resume" * tag 'efi-fixes-for-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efivarfs: fix NULL dereference on resume efivarfs: use I_MUTEX_CHILD nested lock to traverse variables on resume efi/libstub: Avoid physical address 0x0 when doing random allocation
2025-03-20MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6David Ahern
Andrea has made significant contributions to SRv6 support in Linux. Acknowledge the work and on-going interest in Srv6 support with a maintainers entry for these files so hopefully he is included on patches going forward. Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Andrea Mayer <andrea.mayer@uniroma2.it> Link: https://patch.msgid.link/20250312092212.46299-1-dsahern@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20Merge branch 'gre-revert-ipv6-link-local-address-fix'Paolo Abeni
Guillaume Nault says: ==================== gre: Revert IPv6 link-local address fix. Following Paolo's suggestion, let's revert the IPv6 link-local address generation fix for GRE devices. The patch introduced regressions in the upstream CI, which are still under investigation. Start by reverting the kselftest that depend on that fix (patch 1), then revert the kernel code itself (patch 2). ==================== Link: https://patch.msgid.link/cover.1742418408.git.gnault@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20Revert "gre: Fix IPv6 link-local address generation."Guillaume Nault
This reverts commit 183185a18ff96751db52a46ccf93fff3a1f42815. This patch broke net/forwarding/ip6gre_custom_multipath_hash.sh in some circumstances (https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/). Let's revert it while the problem is being investigated. Fixes: 183185a18ff9 ("gre: Fix IPv6 link-local address generation.") Signed-off-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/8b1ce738eb15dd841aab9ef888640cab4f6ccfea.1742418408.git.gnault@redhat.com Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20Revert "selftests: Add IPv6 link-local address generation tests for GRE ↵Guillaume Nault
devices." This reverts commit 6f50175ccad4278ed3a9394c00b797b75441bd6e. Commit 183185a18ff9 ("gre: Fix IPv6 link-local address generation.") is going to be reverted. So let's revert the corresponding kselftest first. Signed-off-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/259a9e98f7f1be7ce02b53d0b4afb7c18a8ff747.1742418408.git.gnault@redhat.com Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20Merge tag 'ipsec-2025-03-19' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2025-03-19 1) Fix tunnel mode TX datapath in packet offload mode by directly putting it to the xmit path. From Alexandre Cassen. 2) Force software GSO only in tunnel mode in favor of potential HW GSO. From Cosmin Ratiu. ipsec-2025-03-19 * tag 'ipsec-2025-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm_output: Force software GSO only in tunnel mode xfrm: fix tunnel mode TX datapath in packet offload mode ==================== Link: https://patch.msgid.link/20250319065513.987135-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20Merge tag 'batadv-net-pullrequest-20250318' of ↵Paolo Abeni
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is batman-adv bugfix: - Ignore own maximum aggregation size during RX, Sven Eckelmann * tag 'batadv-net-pullrequest-20250318' of git://git.open-mesh.org/linux-merge: batman-adv: Ignore own maximum aggregation size during RX ==================== Link: https://patch.msgid.link/20250318150035.35356-1-sw@simonwunderlich.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTESLin Ma
Previous commit 8b5c171bb3dc ("neigh: new unresolved queue limits") introduces new netlink attribute NDTPA_QUEUE_LENBYTES to represent approximative value for deprecated QUEUE_LEN. However, it forgot to add the associated nla_policy in nl_ntbl_parm_policy array. Fix it with one simple NLA_U32 type policy. Fixes: 8b5c171bb3dc ("neigh: new unresolved queue limits") Signed-off-by: Lin Ma <linma@zju.edu.cn> Link: https://patch.msgid.link/20250315165113.37600-1-linma@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20tools headers: Sync uapi/asm-generic/socket.h with the kernel sourcesAlexander Mikhalitsyn
This also fixes a wrong definitions for SCM_TS_OPT_ID & SO_RCVPRIORITY. Accidentally found while working on another patchset. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev> Cc: Willem de Bruijn <willemb@google.com> Cc: Jason Xing <kerneljasonxing@gmail.com> Cc: Anna Emese Nyiri <annaemesenyiri@gmail.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Paolo Abeni <pabeni@redhat.com> Fixes: a89568e9be75 ("selftests: txtimestamp: add SCM_TS_OPT_ID test") Fixes: e45469e594b2 ("sock: Introduce SO_RCVPRIORITY socket option") Link: https://lore.kernel.org/netdev/20250314195257.34854-1-kuniyu@amazon.com/ Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20250314214155.16046-1-aleksandr.mikhalitsyn@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20mptcp: Fix data stream corruption in the address announcementArthur Mongodin
Because of the size restriction in the TCP options space, the MPTCP ADD_ADDR option is exclusive and cannot be sent with other MPTCP ones. For this reason, in the linked mptcp_out_options structure, group of fields linked to different options are part of the same union. There is a case where the mptcp_pm_add_addr_signal() function can modify opts->addr, but not ended up sending an ADD_ADDR. Later on, back in mptcp_established_options, other options will be sent, but with unexpected data written in other fields due to the union, e.g. in opts->ext_copy. This could lead to a data stream corruption in the next packet. Using an intermediate variable, prevents from corrupting previously established DSS option. The assignment of the ADD_ADDR option parameters is now done once we are sure this ADD_ADDR option can be set in the packet, e.g. after having dropped other suboptions. Fixes: 1bff1e43a30e ("mptcp: optimize out option generation") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Arthur Mongodin <amongodin@randorisec.fr> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> [ Matt: the commit message has been updated: long lines splits and some clarifications. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-1-122dbb249db3@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20Merge branch 'net-fix-lwtunnel-reentry-loops'Paolo Abeni
Justin Iurman says: ==================== net: fix lwtunnel reentry loops When the destination is the same after the transformation, we enter a lwtunnel loop. This is true for most of lwt users: ioam6, rpl, seg6, seg6_local, ila_lwt, and lwt_bpf. It can happen in their input() and output() handlers respectively, where either dst_input() or dst_output() is called at the end. It can also happen in xmit() handlers. Here is an example for rpl_input(): dump_stack_lvl+0x60/0x80 rpl_input+0x9d/0x320 lwtunnel_input+0x64/0xa0 lwtunnel_input+0x64/0xa0 lwtunnel_input+0x64/0xa0 lwtunnel_input+0x64/0xa0 lwtunnel_input+0x64/0xa0 [...] lwtunnel_input+0x64/0xa0 lwtunnel_input+0x64/0xa0 lwtunnel_input+0x64/0xa0 lwtunnel_input+0x64/0xa0 lwtunnel_input+0x64/0xa0 ip6_sublist_rcv_finish+0x85/0x90 ip6_sublist_rcv+0x236/0x2f0 ... until rpl_do_srh() fails, which means skb_cow_head() failed. This series provides a fix at the core level of lwtunnel to catch such loops when they're not caught by the respective lwtunnel users, and handle the loop case in ioam6 which is one of the users. This series also comes with a new selftest to detect some dst cache reference loops in lwtunnel users. ==================== Link: https://patch.msgid.link/20250314120048.12569-1-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20selftests: net: test for lwtunnel dst ref loopsJustin Iurman
As recently specified by commit 0ea09cbf8350 ("docs: netdev: add a note on selftest posting") in net-next, the selftest is therefore shipped in this series. However, this selftest does not really test this series. It needs this series to avoid crashing the kernel. What it really tests, thanks to kmemleak, is what was fixed by the following commits: - commit c71a192976de ("net: ipv6: fix dst refleaks in rpl, seg6 and ioam6 lwtunnels") - commit 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels") - commit c64a0727f9b1 ("net: ipv6: fix dst ref loop on input in seg6 lwt") - commit 13e55fbaec17 ("net: ipv6: fix dst ref loop on input in rpl lwt") - commit 0e7633d7b95b ("net: ipv6: fix dst ref loop in ila lwtunnel") - commit 5da15a9c11c1 ("net: ipv6: fix missing dst ref drop in ila lwtunnel") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Link: https://patch.msgid.link/20250314120048.12569-4-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20net: ipv6: ioam6: fix lwtunnel_output() loopJustin Iurman
Fix the lwtunnel_output() reentry loop in ioam6_iptunnel when the destination is the same after transformation. Note that a check on the destination address was already performed, but it was not enough. This is the example of a lwtunnel user taking care of loops without relying only on the last resort detection offered by lwtunnel. Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Link: https://patch.msgid.link/20250314120048.12569-3-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20net: lwtunnel: fix recursion loopsJustin Iurman
This patch acts as a parachute, catch all solution, by detecting recursion loops in lwtunnel users and taking care of them (e.g., a loop between routes, a loop within the same route, etc). In general, such loops are the consequence of pathological configurations. Each lwtunnel user is still free to catch such loops early and do whatever they want with them. It will be the case in a separate patch for, e.g., seg6 and seg6_local, in order to provide drop reasons and update statistics. Another example of a lwtunnel user taking care of loops is ioam6, which has valid use cases that include loops (e.g., inline mode), and which is addressed by the next patch in this series. Overall, this patch acts as a last resort to catch loops and drop packets, since we don't want to leak something unintentionally because of a pathological configuration in lwtunnels. The solution in this patch reuses dev_xmit_recursion(), dev_xmit_recursion_inc(), and dev_xmit_recursion_dec(), which seems fine considering the context. Closes: https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/ Closes: https://lore.kernel.org/netdev/Z7NKYMY7fJT5cYWu@shredder/ Fixes: ffce41962ef6 ("lwtunnel: support dst output redirect function") Fixes: 2536862311d2 ("lwt: Add support to redirect dst.input") Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Link: https://patch.msgid.link/20250314120048.12569-2-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20net: ti: icssg-prueth: Add lock to statsMD Danish Anwar
Currently the API emac_update_hardware_stats() reads different ICSSG stats without any lock protection. This API gets called by .ndo_get_stats64() which is only under RCU protection and nothing else. Add lock to this API so that the reading of statistics happens during lock. Fixes: c1e10d5dc7a1 ("net: ti: icssg-prueth: Add ICSSG Stats") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314102721.1394366-1-danishanwar@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>