Age | Commit message | Author
2025-07-01 | net: mana: Handle Reset Request from MANA NIC | Haiyang Zhang
Upon receiving the Reset Request, pause the connection and clean up queues, wait for the specified period, then resume the NIC. In the cleanup phase, the HWC is no longer responding, so set hwc_timeout to zero to skip waiting on the response. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1751055983-29760-1-git-send-email-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 | phy: micrel: add Signal Quality Indicator (SQI) support for KSZ9477 switch PHYs | Oleksij Rempel
Add support for the Signal Quality Indicator (SQI) feature on KSZ9477 family switches, providing a relative measure of receive signal quality. The hardware exposes separate SQI readings per channel. For 1000BASE-T, all four channels are read. For 100BASE-TX, only one channel is reported, but which receive pair is active depends on Auto MDI-X negotiation, which is not exposed by the hardware. Therefore, it is not possible to reliably map the measured channel to a specific wire pair. This resolves an earlier discussion about how to handle multi-channel SQI. Originally, the plan was to expose all channels individually. However, since pair mapping is sometimes unavailable, this implementation treats SQI as a per-link metric instead. This fallback avoids ambiguity and ensures consistent behavior. The existing get_sqi() UAPI was designed for single-pair Ethernet (SPE), where per-pair and per-link are effectively equivalent. Restricting its use to per-link metrics does not introduce regressions for existing users. The raw 7-bit SQI value (0–127, lower is better) is converted to the standard 0–7 (high is better) scale. Empirical testing showed that the link becomes unstable around a raw value of 8. The SQI raw value remains zero if no data is received, even if noise is present. This confirms that the measurement reflects the "quality" during active data reception rather than the passive line state. User space must ensure that traffic is present on the link to obtain valid SQI readings. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250627112539.895255-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
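The raw-to-standard conversion described above can be sketched as follows. This is only an illustration: the function name `raw_to_sqi()`, the constant `RAW_UNSTABLE`, and the linear scaling are assumptions. The commit message gives only the two ranges and the observation that the link becomes unstable around a raw value of 8, not the driver's actual thresholds.

```c
#include <assert.h>
#include <stdint.h>

#define SQI_MAX      7	/* best value on the standard 0-7 scale */
#define RAW_UNSTABLE 8	/* raw value near which the link became unstable */

/*
 * Hypothetical mapping: raw 0 (best) -> 7, raw >= RAW_UNSTABLE -> 0,
 * linear in between. The real driver may use different thresholds.
 */
int raw_to_sqi(uint8_t raw)
{
	raw &= 0x7f;			/* the raw reading is 7 bits wide */
	if (raw >= RAW_UNSTABLE)
		return 0;
	return SQI_MAX - (raw * SQI_MAX) / RAW_UNSTABLE;
}
```

Because the raw value also reads zero when no data is received, a result of 7 is only meaningful while traffic is present on the link.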
2025-07-01 | selftests: pp-bench: remove page_pool_put_page wrapper | Mina Almasry
Minor cleanup: remove the pointless looking _ wrapper around page_pool_put_page, and just do the call directly. Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20250627200501.1712389-2-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 | selftests: pp-bench: remove unneeded linux/version.h | Mina Almasry
linux/version.h was used by the out-of-tree version, but not needed in the upstream one anymore. While I'm at it, sort the includes. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506271434.Gk0epC9H-lkp@intel.com/ Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20250627200501.1712389-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 | ip6_tunnel: enable to change proto of fb tunnels | Nicolas Dichtel
This is possible via the ioctl API:
> ip -6 tunnel change ip6tnl0 mode any

Let's align the netlink API:
> ip link set ip6tnl0 type ip6tnl mode any

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://patch.msgid.link/20250630145602.1027220-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 | selftests/tc-testing: Enable CONFIG_IP_SET | Sebastian Andrzej Siewior
The config snippet specifies CONFIG_NET_EMATCH_IPSET. This option depends on CONFIG_IP_SET. Enable CONFIG_IP_SET as part of the tc-testing config. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250630153341.Wgh3SzGi@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 | dt-bindings: net: convert nxp,lpc1850-dwmac.txt to yaml format | Frank Li
Convert nxp,lpc1850-dwmac.txt to yaml format.

Additional changes:
- add fallback compatible strings: "nxp,lpc1850-dwmac", "snps,dwmac-3.611", "snps,dwmac".
- add common interrupts, interrupt-names, clocks, clock-names, resets and reset-names properties.
- add ref snps,dwmac.yaml.
- add phy-mode in example to avoid dt_binding_check warning.
- update examples to align with lpc18xx.dtsi.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250630161613.2838039-1-Frank.Li@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 | docs: netdevsim: fixe typo in netdevsim documentation | Dave Marquardt
Fixed a typographical error in "Rate objects" section Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Dave Marquardt <davemarq@linux.ibm.com> Link: https://patch.msgid.link/20250630-netdevsim-typo-fix-v3-1-e1eae3a5f018@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 | net: ethtool: fix leaking netdev ref if ethnl_default_parse() failed | Jakub Kicinski
Ido spotted that I made a mistake in commit under Fixes, ethnl_default_parse() may acquire a dev reference even when it returns an error. This may have been driven by the code structure in dumps (which unconditionally release dev before handling errors), but it's too much of a trap. Functions should undo what they did before returning an error, rather than expecting caller to clean up. Rather than fixing ethnl_default_set_doit() directly make ethnl_default_parse() clean up errors. Reported-by: Ido Schimmel <idosch@idosch.org> Link: https://lore.kernel.org/aGEPszpq9eojNF4Y@shredder Fixes: 963781bdfe20 ("net: ethtool: call .parse_request for SET handlers") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250630154053.1074664-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
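The principle stated above (a function undoes what it did before returning an error) can be sketched in isolation. All names here (`struct dev`, `struct req`, `parse_request()`, the `payload_ok` flag) are hypothetical stand-ins and not the ethtool code; the point is only that the reference taken inside the parser is dropped on the parser's own error path, so the caller has nothing to clean up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct dev { int refcnt; };		/* stand-in for a refcounted netdev */
struct req { struct dev *dev; };	/* stand-in for the parsed request */

static void dev_hold(struct dev *d) { d->refcnt++; }
static void dev_put(struct dev *d)  { d->refcnt--; }

/* Takes a reference on @d, then validates; releases it again on failure. */
int parse_request(struct req *req, struct dev *d, int payload_ok)
{
	assert(req && d);

	dev_hold(d);
	req->dev = d;

	if (!payload_ok) {
		/* Undo our own side effects before reporting the error. */
		dev_put(d);
		req->dev = NULL;
		return -EINVAL;
	}
	return 0;	/* on success the caller owns the reference */
}
```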
2025-07-01 | sfc: siena: eliminate xdp_rxq_info_valid using XDP base API | Fushuai Wang
Commit d48523cb88e0 ("sfc: Copy shared files needed for Siena (part 2)") uses xdp_rxq_info_valid to track failures of xdp_rxq_info_reg(). However, this driver-maintained state is redundant, since the XDP framework already provides xdp_rxq_info_is_reg() for checking registration status. Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20250628051033.51133-1-wangfushuai@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 | sfc: eliminate xdp_rxq_info_valid using XDP base API | Fushuai Wang
Commit eb9a36be7f3e ("sfc: perform XDP processing on received packets") uses xdp_rxq_info_valid to track failures of xdp_rxq_info_reg(). However, this driver-maintained state is redundant, since the XDP framework already provides xdp_rxq_info_is_reg() for checking registration status. Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20250628051016.51022-1-wangfushuai@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 | net: ieee8021q: fix insufficient table-size assertion | RubenKelevra
_Static_assert(ARRAY_SIZE(map) != IEEE8021Q_TT_MAX - 1) rejects only a length of 7 and allows any other mismatch. Replace it with a strict equality test via a helper macro so that every mapping table must have exactly IEEE8021Q_TT_MAX (8) entries. Signed-off-by: RubenKelevra <rubenkelevra@gmail.com> Link: https://patch.msgid.link/20250626205907.1566384-1-rubenkelevra@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
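A static assertion fails only when its condition is false, so the old `!=` check was false (and therefore rejected the table) only for a length of exactly 7. The sketch below contrasts the two conditions. The macro name `TT_MAP_SIZE_CHECK` and the two predicate functions are made up for illustration and are not the helper added by the patch; `IEEE8021Q_TT_MAX` is set to 8 as stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a)    (sizeof(a) / sizeof((a)[0]))
#define IEEE8021Q_TT_MAX 8	/* value given in the commit message */

/* Hypothetical helper: build fails unless the table has exactly 8 entries. */
#define TT_MAP_SIZE_CHECK(tbl) \
	_Static_assert(ARRAY_SIZE(tbl) == IEEE8021Q_TT_MAX, \
		       #tbl " must have exactly IEEE8021Q_TT_MAX entries")

const int good_map[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
TT_MAP_SIZE_CHECK(good_map);

/* The same two conditions as run-time predicates, for comparison. */
int old_check_passes(size_t len) { return len != IEEE8021Q_TT_MAX - 1; }
int new_check_passes(size_t len) { return len == IEEE8021Q_TT_MAX; }
```

With the old condition a 6-entry or 9-entry table would have compiled; with the strict equality only a table of exactly 8 entries does.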
2025-07-01 | docs: fbnic: explain the ring config | Jakub Kicinski
fbnic takes 4 parameters to configure the Rx queues. The semantics are similar to other existing NICs but confusing to newcomers. Document it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250626191554.32343-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-01 | net: usb: lan78xx: fix possible NULL pointer dereference in lan78xx_phy_init() | Oleksij Rempel
If no PHY device is found (e.g., for LAN7801 in fixed-link mode), lan78xx_phy_init() may proceed to dereference a NULL phydev pointer, leading to a crash. Update the logic to perform MAC configuration first, then check for the presence of a PHY. For the fixed-link case, set up the fixed link and return early, bypassing any code that assumes a valid phydev pointer. It is safe to move lan78xx_mac_prepare_for_phy() earlier because this function only uses information from dev->interface, which is configured by lan78xx_get_phy() beforehand. The function does not access phydev or any data set up by later steps. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Fixes: e110bc825897 ("net: usb: lan78xx: Convert to PHYLINK for improved PHY and MAC management") Link: https://patch.msgid.link/20250626103731.3986545-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-01 | Merge branch 'clean-up-usage-of-ffi-types' | Paolo Abeni
Tamir Duberstein says: ==================== Clean up usage of ffi types Remove qualification of ffi types which are included in the prelude and change `as` casts to target the proper ffi type alias rather than the underlying primitive. Signed-off-by: Tamir Duberstein <tamird@gmail.com> ==================== Link: https://patch.msgid.link/20250625-correct-type-cast-v2-0-6f2c29729e69@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-01 | Cast to the proper type | Tamir Duberstein
Use the ffi type rather than the resolved underlying type. Acked-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Signed-off-by: Tamir Duberstein <tamird@gmail.com> Link: https://patch.msgid.link/20250625-correct-type-cast-v2-2-6f2c29729e69@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-01 | Use unqualified references to ffi types | Tamir Duberstein
Remove unnecessary qualifications; `kernel::ffi::*` is included in `kernel::prelude`. Signed-off-by: Tamir Duberstein <tamird@gmail.com> Reviewed-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Link: https://patch.msgid.link/20250625-correct-type-cast-v2-1-6f2c29729e69@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-30 | net: net->nsid_lock does not need BH safety | Eric Dumazet
At the time of commit bc51dddf98c9 ("netns: avoid disabling irq for netns id") peernet2id() was not yet using RCU. Commit 2dce224f469f ("netns: protect netns ID lookups with RCU") changed peernet2id() to no longer acquire net->nsid_lock (potentially from BH context). We do not need to block soft interrupts when acquiring net->nsid_lock anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Guillaume Nault <gnault@redhat.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250627163242.230866-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | Merge branch 'net-enetc-change-some-statistics-to-64-bit' | Jakub Kicinski
Wei Fang says: ==================== net: enetc: change some statistics to 64-bit The port MAC counters of ENETC are 64-bit registers and the statistics of ethtool are also u64 type, so add enetc_port_rd64() helper function to read 64-bit statistics from these registers, and also change the statistics of ring to unsigned long type to be consistent with the statistics type in struct net_device_stats. v1: https://lore.kernel.org/20250620102140.2020008-1-wei.fang@nxp.com v2: https://lore.kernel.org/20250624101548.2669522-1-wei.fang@nxp.com ==================== Link: https://patch.msgid.link/20250627021108.3359642-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | net: enetc: read 64-bit statistics from port MAC counters | Wei Fang
The counters of port MAC are all 64-bit registers, and the statistics of ethtool are u64 type, so replace enetc_port_rd() with enetc_port_rd64() to read 64-bit statistics. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250627021108.3359642-4-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | net: enetc: separate 64-bit counters from enetc_port_counters | Wei Fang
Some counters in enetc_port_counters are 32-bit registers, and some are 64-bit registers. But in the current driver, they are all read through enetc_port_rd(), which can only read a 32-bit value. Therefore, separate 64-bit counters (enetc_pm_counters) from enetc_port_counters and use enetc_port_rd64() to read the 64-bit statistics. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250627021108.3359642-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
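Why a 32-bit accessor is not enough for a 64-bit counter can be shown with a small stand-alone sketch. The functions `port_rd32()` and `port_rd64()` below are hypothetical stand-ins operating on plain memory, not the driver's enetc_port_rd() and enetc_port_rd64(); they only illustrate that the upper half of the counter is lost when it is read as 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit accessor: the upper 32 bits of the counter are lost. */
uint32_t port_rd32(const volatile uint64_t *reg)
{
	return (uint32_t)*reg;
}

/* Hypothetical 64-bit accessor: returns the full counter value. */
uint64_t port_rd64(const volatile uint64_t *reg)
{
	return *reg;
}
```

Once a counter passes 2^32, the 32-bit read wraps around to a small value while the 64-bit read keeps counting.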
2025-06-30 | net: enetc: change the statistics of ring to unsigned long type | Wei Fang
The statistics of the ring are all unsigned int type, so the statistics will overflow quickly under heavy traffic. In addition, the statistics of struct net_device_stats are obtained from struct enetc_ring_stats, but the statistics of net_device_stats are unsigned long type. So it is better to keep the statistics types consistent in these two structures. Considering these two factors, and the fact that both LS1028A and i.MX95 are arm64 architecture, the statistics of enetc_ring_stats are changed to unsigned long type. Note that unsigned int and unsigned long are the same thing on some systems, and on such systems there is no overflow advantage of one over the other. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250627021108.3359642-2-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | net: fec: allow disable coalescing | Jonas Rebmann
In the current implementation, IP coalescing is always enabled and cannot be disabled. As setting maximum frames to 0 or 1, or setting delay to zero implies immediate delivery of single packets/IRQs, disable coalescing in hardware in these cases. This also guarantees that coalescing is never enabled with ICFT or ICTT set to zero, a configuration that could lead to unpredictable behaviour according to i.MX8MP reference manual. Signed-off-by: Jonas Rebmann <jre@pengutronix.de> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250626-fec_deactivate_coalescing-v2-1-0b217f2e80da@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
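The decision rule described above can be written as a single predicate. This is a sketch under assumptions: the function name `coalescing_wanted()` and its parameters are hypothetical, and the mapping from the user-supplied values to the ICFT and ICTT register fields is left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: coalescing is only worth enabling when more than
 * one frame may be batched and the delay is non-zero. In every other
 * case delivery is immediate anyway, so the hardware feature is left off.
 */
bool coalescing_wanted(unsigned int max_frames, unsigned int delay_usecs)
{
	return max_frames > 1 && delay_usecs > 0;
}
```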
2025-06-30 | Merge branch 'add-support-for-externally-validated-neighbor-entries' | Jakub Kicinski
Ido Schimmel says: ==================== Add support for externally validated neighbor entries Patch #1 adds a new neighbor flag ("extern_valid") that prevents the kernel from invalidating or removing a neighbor entry, while allowing the kernel to notify user space when the entry becomes reachable. See motivation and implementation details in the commit message. Patch #2 adds a selftest. v1: https://lore.kernel.org/20250611141551.462569-1-idosch@nvidia.com ==================== Link: https://patch.msgid.link/20250626073111.244534-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | selftests: net: Add a selftest for externally validated neighbor entries | Ido Schimmel
Add test cases for externally validated neighbor entries, testing both IPv4 and IPv6. Name the file "test_neigh.sh" so that it could be possibly extended in the future with more neighbor test cases. Example output: # ./test_neigh.sh TEST: IPv4 "extern_valid" flag: Add entry [ OK ] TEST: IPv4 "extern_valid" flag: Add with an invalid state [ OK ] TEST: IPv4 "extern_valid" flag: Add with "use" flag [ OK ] TEST: IPv4 "extern_valid" flag: Replace entry [ OK ] TEST: IPv4 "extern_valid" flag: Replace entry with "managed" flag [ OK ] TEST: IPv4 "extern_valid" flag: Replace with an invalid state [ OK ] TEST: IPv4 "extern_valid" flag: Interface down [ OK ] TEST: IPv4 "extern_valid" flag: Carrier down [ OK ] TEST: IPv4 "extern_valid" flag: Transition to "reachable" state [ OK ] TEST: IPv4 "extern_valid" flag: Transition back to "stale" state [ OK ] TEST: IPv4 "extern_valid" flag: Forced garbage collection [ OK ] TEST: IPv4 "extern_valid" flag: Periodic garbage collection [ OK ] TEST: IPv6 "extern_valid" flag: Add entry [ OK ] TEST: IPv6 "extern_valid" flag: Add with an invalid state [ OK ] TEST: IPv6 "extern_valid" flag: Add with "use" flag [ OK ] TEST: IPv6 "extern_valid" flag: Replace entry [ OK ] TEST: IPv6 "extern_valid" flag: Replace entry with "managed" flag [ OK ] TEST: IPv6 "extern_valid" flag: Replace with an invalid state [ OK ] TEST: IPv6 "extern_valid" flag: Interface down [ OK ] TEST: IPv6 "extern_valid" flag: Carrier down [ OK ] TEST: IPv6 "extern_valid" flag: Transition to "reachable" state [ OK ] TEST: IPv6 "extern_valid" flag: Transition back to "stale" state [ OK ] TEST: IPv6 "extern_valid" flag: Forced garbage collection [ OK ] TEST: IPv6 "extern_valid" flag: Periodic garbage collection [ OK ] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250626073111.244534-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | neighbor: Add NTF_EXT_VALIDATED flag for externally validated entries | Ido Schimmel
tl;dr ===== Add a new neighbor flag ("extern_valid") that can be used to indicate to the kernel that a neighbor entry was learned and determined to be valid externally. The kernel will not try to remove or invalidate such an entry, leaving these decisions to the user space control plane. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed. Background ========== In a typical EVPN multi-homing setup each host is multi-homed using a set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf switches (VTEPs). VTEPs that are connected to the same ES are called ES peers. When a neighbor entry is learned on a VTEP, it is distributed to both ES peers and remote VTEPs using EVPN MAC/IP advertisement routes. ES peers use the neighbor entry when routing traffic towards the multi-homed host and remote VTEPs use it for ARP/NS suppression. Motivation ========== If the ES link between a host and the VTEP on which the neighbor entry was locally learned goes down, the EVPN MAC/IP advertisement route will be withdrawn and the neighbor entries will be removed from both ES peers and remote VTEPs. Routing towards the multi-homed host and ARP/NS suppression can fail until another ES peer locally learns the neighbor entry and distributes it via an EVPN MAC/IP advertisement route. "draft-rbickhart-evpn-ip-mac-proxy-adv-03" [1] suggests avoiding these intermittent failures by having the ES peers install the neighbor entries as before, but also injecting EVPN MAC/IP advertisement routes with a proxy indication. When the previously mentioned ES link goes down and the original EVPN MAC/IP advertisement route is withdrawn, the ES peers will not withdraw their neighbor entries, but instead start aging timers for the proxy indication. 
If an ES peer locally learns the neighbor entry (i.e., it becomes "reachable"), it will restart its aging timer for the entry and emit an EVPN MAC/IP advertisement route without a proxy indication. An ES peer will stop its aging timer for the proxy indication if it observes the removal of the proxy indication from at least one of the ES peers advertising the entry. In the event that the aging timer for the proxy indication expired, an ES peer will withdraw its EVPN MAC/IP advertisement route. If the timer expired on all ES peers and they all withdrew their proxy advertisements, the neighbor entry will be completely removed from the EVPN fabric. Implementation ============== In the above scheme, when the control plane (e.g., FRR) advertises a neighbor entry with a proxy indication, it expects the corresponding entry in the data plane (i.e., the kernel) to remain valid and not be removed due to garbage collection or loss of carrier. The control plane also expects the kernel to notify it if the entry was learned locally (i.e., became "reachable") so that it will remove the proxy indication from the EVPN MAC/IP advertisement route. That is why these entries cannot be programmed with dummy states such as "permanent" or "noarp". Instead, add a new neighbor flag ("extern_valid") which indicates that the entry was learned and determined to be valid externally and should not be removed or invalidated by the kernel. The kernel can probe the entry and notify user space when it becomes "reachable" (it is initially installed as "stale"). However, if the kernel does not receive a confirmation, have it return the entry to the "stale" state instead of the "failed" state. In other words, an entry marked with the "extern_valid" flag behaves like any other dynamically learned entry other than the fact that the kernel cannot remove or invalidate it. 
One can argue that the "extern_valid" flag should not prevent garbage collection and that instead a neighbor entry should be programmed with both the "extern_valid" and "extern_learn" flags. There are two reasons for not doing that:

1. Unclear why a control plane would like to program an entry that the kernel cannot invalidate but can completely remove.

2. The "extern_learn" flag is used by FRR for neighbor entries learned on remote VTEPs (for ARP/NS suppression) whereas here we are concerned with local entries. This distinction is currently irrelevant for the kernel, but might be relevant in the future.

Given that the flag only makes sense when the neighbor has a valid state, reject attempts to add a neighbor with an invalid state and with this flag set. For example:

 # ip neigh add 192.0.2.1 nud none dev br0.10 extern_valid
 Error: Cannot create externally validated neighbor with an invalid state.
 # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid
 # ip neigh replace 192.0.2.1 nud failed dev br0.10 extern_valid
 Error: Cannot mark neighbor as externally validated with an invalid state.

The above means that a neighbor cannot be created with the "extern_valid" flag and flags such as "use" or "managed" as they result in a neighbor being created with an invalid state ("none") and immediately getting probed:

 # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use
 Error: Cannot create externally validated neighbor with an invalid state.

However, these flags can be used together with "extern_valid" after the neighbor was created with a valid state:

 # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid
 # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use

One consequence of preventing the kernel from invalidating a neighbor entry is that by default it will only try to determine reachability using unicast probes.
This can be changed using the "mcast_resolicit" sysctl:

 # sysctl net.ipv4.neigh.br0/10.mcast_resolicit
 0
 # tcpdump -nn -e -i br0.10 -Q out arp &
 # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use
 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
 # sysctl -wq net.ipv4.neigh.br0/10.mcast_resolicit=3
 # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use
 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28

iproute2 patches can be found here [2].

[1] https://datatracker.ietf.org/doc/html/draft-rbickhart-evpn-ip-mac-proxy-adv-03
[2] https://github.com/idosch/iproute2/tree/submit/extern_valid_v1

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20250626073111.244534-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | ipv6: guard ip6_mr_output() with rcu | Eric Dumazet
syzbot found at least one path that leads to ip6_mr_output() without RCU being held. Add guard(rcu)() to fix this in a concise way.

WARNING: net/ipv6/ip6mr.c:2376 at ip6_mr_output+0xe0b/0x1040 net/ipv6/ip6mr.c:2376, CPU#1: kworker/1:2/121
Call Trace:
 <TASK>
 ip6tunnel_xmit include/net/ip6_tunnel.h:162 [inline]
 udp_tunnel6_xmit_skb+0x640/0xad0 net/ipv6/ip6_udp_tunnel.c:112
 send6+0x5ac/0x8d0 drivers/net/wireguard/socket.c:152
 wg_socket_send_skb_to_peer+0x111/0x1d0 drivers/net/wireguard/socket.c:178
 wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline]
 wg_packet_tx_worker+0x1c8/0x7c0 drivers/net/wireguard/send.c:276
 process_one_work kernel/workqueue.c:3239 [inline]
 process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3322
 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3403
 kthread+0x70e/0x8a0 kernel/kthread.c:464
 ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

Fixes: 96e8f5a9fe2d ("net: ipv6: Add ip6_mr_output()")
Reported-by: syzbot+0141c834e47059395621@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/685e86b3.a00a0220.129264.0003.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Benjamin Poirier <bpoirier@nvidia.com>
Cc: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250627115822.3741390-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
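A scope-based guard such as guard(rcu)() is concise because the unlock runs automatically on every exit from the scope. The user-space sketch below imitates that idea with the compiler's `cleanup` variable attribute (a GCC and Clang extension). Everything in it is a stand-in: `RCU_GUARD()`, `fake_rcu_read_lock()`, `fake_rcu_read_unlock()`, `rcu_depth` and `mr_output()` are hypothetical names, and no real RCU is involved.

```c
#include <assert.h>

static int rcu_depth;	/* stand-in for the read-side nesting level */

static void fake_rcu_read_lock(void)   { rcu_depth++; }
static void fake_rcu_read_unlock(void) { rcu_depth--; }

/* Called automatically when the guard variable goes out of scope. */
static void rcu_guard_exit(int *token)
{
	(void)token;
	fake_rcu_read_unlock();
}

/* Hypothetical guard: lock now, unlock when the enclosing scope ends. */
#define RCU_GUARD() \
	int rcu_guard_token __attribute__((cleanup(rcu_guard_exit), unused)) = \
		(fake_rcu_read_lock(), 0)

int mr_output(int fail_early)
{
	RCU_GUARD();

	if (fail_early)
		return -1;	/* the unlock still runs on this path */
	return rcu_depth;	/* evaluated while the guard is held */
}
```

No explicit unlock appears on either return path, which is what makes a single guard line sufficient to cover a function with several exits.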
2025-06-30 | Merge branch 'net-ethtool-consistently-take-rss_lock-for-all-rxfh-ops' | Jakub Kicinski
Jakub Kicinski says: ==================== net: ethtool: consistently take rss_lock for all rxfh ops I'd like to bring RXFH and RXFHINDIR ioctls under a single set of Netlink ops. It appears that while core takes the ethtool->rss_lock around some of the RXFHINDIR ops, drivers (sfc) take it internally for the RXFH. Consistently take the lock around all ops and accesses to the XArray within the core. This should hopefully make the rss_lock a lot less confusing. ==================== Link: https://patch.msgid.link/20250626202848.104457-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | net: ethtool: move get_rxfh callback under the rss_lock | Jakub Kicinski
We already call get_rxfh under the rss_lock when we read back context state after changes. Let's be consistent and always hold the lock. The existing callers are all under rtnl_lock so this should make no difference in practice, but it makes the locking rules far less confusing IMHO. Any RSS callback and any access to the RSS XArray should hold the lock. Link: https://patch.msgid.link/20250626202848.104457-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | net: ethtool: move rxfh_fields callbacks under the rss_lock | Jakub Kicinski
Netlink code will want to perform the RSS_SET operation atomically under the rss_lock. sfc wants to hold the rss_lock in rxfh_fields_get, which makes that difficult. Let's move the locking up to the core so that for all driver-facing callbacks rss_lock is taken consistently by the core. Link: https://patch.msgid.link/20250626202848.104457-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | net: ethtool: take rss_lock for all rxfh changes | Jakub Kicinski
Always take the rss_lock in ethtool_set_rxfh(). We will want to make a similar change in ethtool_set_rxfh_fields() and some drivers lock that callback regardless of rss context ID being set. Having some callbacks locked unconditionally and some only if context ID is set would be very confusing. ethtool handling is under rtnl_lock, so rss_lock is very unlikely to ever be congested. Link: https://patch.msgid.link/20250626202848.104457-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | net: ethtool: avoid OOB accesses in PAUSE_SET | Jakub Kicinski
We now reuse .parse_request() from GET on SET, so we need to make sure that the policies for both cover the attributes used for .parse_request(). genetlink will only allocate space in info->attrs for ARRAY_SIZE(policy). Reported-by: syzbot+430f9f76633641a62217@syzkaller.appspotmail.com Fixes: 963781bdfe20 ("net: ethtool: call .parse_request for SET handlers") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250626233926.199801-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-30 | net/mlx5e: Fix error handling in RQ memory model registration | Fushuai Wang
Currently when xdp_rxq_info_reg_mem_model() fails in the XSK path, the error handling incorrectly jumps to err_destroy_page_pool. While this may not cause errors, we should make it jump to the correct location. Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Acked-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-29 | octeontx2-af: Fix error code in rvu_mbox_init() | Dan Carpenter
The error code was intended to be -EINVAL here, but it was accidentally changed to returning success. Set the error code. Fixes: e53ee4acb220 ("octeontx2-af: CN20k basic mbox operations and structures") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-28 | net: ipv4: guard ip_mr_output() with rcu | Eric Dumazet
syzbot found at least one path leads to an ip_mr_output() without RCU being held. Add guard(rcu)() to fix this in a concise way. WARNING: CPU: 0 PID: 0 at net/ipv4/ipmr.c:2302 ip_mr_output+0xbb1/0xe70 net/ipv4/ipmr.c:2302 Call Trace: <IRQ> igmp_send_report+0x89e/0xdb0 net/ipv4/igmp.c:799 igmp_timer_expire+0x204/0x510 net/ipv4/igmp.c:-1 call_timer_fn+0x17e/0x5f0 kernel/time/timer.c:1747 expire_timers kernel/time/timer.c:1798 [inline] __run_timers kernel/time/timer.c:2372 [inline] __run_timer_base+0x61a/0x860 kernel/time/timer.c:2384 run_timer_base kernel/time/timer.c:2393 [inline] run_timer_softirq+0xb7/0x180 kernel/time/timer.c:2403 handle_softirqs+0x286/0x870 kernel/softirq.c:579 __do_softirq kernel/softirq.c:613 [inline] invoke_softirq kernel/softirq.c:453 [inline] __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680 irq_exit_rcu+0x9/0x30 kernel/softirq.c:696 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1050 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1050 Fixes: 35bec72a24ac ("net: ipv4: Add ip_mr_output()") Reported-by: syzbot+f02fb9e43bd85c6c66ae@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/685e841a.a00a0220.129264.0002.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Petr Machata <petrm@nvidia.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Benjamin Poirier <bpoirier@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-27Merge branch 'octeontx2-pf-extend-link-modes-support'Jakub Kicinski
Hariprasad Kelam says:

====================
Octeontx2-pf: extend link modes support

This series of patches adds multi advertise mode support along with other improvements in the link mode management code flow.

Patch1: Currently all SGMII modes 10/100/1000baseT are mapped to a single firmware mode. This patch maps these link modes to their corresponding firmware modes.

Patch2: Due to a limitation in the current kernel <-> firmware communication, link modes are divided into multiple groups and identified by their group index.

Patch3: Adds support for multi advertise mode.
====================

Link: https://patch.msgid.link/20250625092107.9746-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27Octeontx2-pf: ethtool: support multi advertise modeHariprasad Kelam
The current implementation considers only the first advertise mode and passes it to firmware to process. This patch extends the code so that the user can advertise multiple modes on the given interface.

Below are the high level changes:

1. Remove unnecessary speed/duplex/autoneg validation, as it's already verified as part of "set_link_ksettings".
2. Since the scratch CSR framework is designed to support a single mode at a time, use "shared firmware data" for multi mode support.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250625092107.9746-4-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27Octeontx2-af: Introduce mode group indexHariprasad Kelam
Kernel and firmware communicate via a scratch register, which is 64 bits in size.

[MODE_ID  PORT   AUTONEG  DUPLEX  SPEED  CMD_ID  OWNERSHIP]
 63-22    21-14  13       12      11-8   7-2     1-0

The existing MODE_ID bitmap can only support up to 42 modes. To resolve the issue, the unused port field is modified as below:

    uint64_t reserved2:6;
    uint64_t mode_group_idx:2;

'mode_group_idx' categorizes the mode ID range to accommodate more modes. To specify the mode ID range 0 - 41, this field is 0. To specify the mode ID range 42 - 83, this field is 1.

The mode ID is still expressed as 1 << (0 - 41), but mode_group_idx decides the actual mode range.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250625092107.9746-3-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
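The group-index scheme from the commit message above can be illustrated with a short sketch. It is not the driver code: the helper names (`encode_mode`, `decode_modes`, `mode_id_field`) are hypothetical, and the limit of four groups is only inferred from `mode_group_idx` being a 2-bit field. The position of `mode_group_idx` inside the former port field is not stated in the message, so the sketch does not pack it.

```python
from typing import List, Tuple

MODES_PER_GROUP = 42   # MODE_ID occupies bits 63-22, i.e. 42 bits
MODE_ID_SHIFT = 22     # lowest bit of the MODE_ID field
MAX_GROUPS = 4         # assumption: a 2-bit mode_group_idx allows 4 groups


def encode_mode(mode_id: int) -> Tuple[int, int]:
    """Split an absolute mode ID into (mode_group_idx, MODE_ID bitmap bit)."""
    if mode_id < 0 or mode_id >= MODES_PER_GROUP * MAX_GROUPS:
        raise ValueError("mode ID out of range: %d" % mode_id)
    group, bit = divmod(mode_id, MODES_PER_GROUP)
    return group, 1 << bit


def decode_modes(group: int, bitmap: int) -> List[int]:
    """Recover the absolute mode IDs selected by a group index and bitmap."""
    return [group * MODES_PER_GROUP + bit
            for bit in range(MODES_PER_GROUP)
            if bitmap & (1 << bit)]


def mode_id_field(bitmap: int) -> int:
    """Place a 42-bit bitmap into bits 63-22 of the 64-bit scratch value."""
    return (bitmap & ((1 << MODES_PER_GROUP) - 1)) << MODE_ID_SHIFT
```

For example, mode ID 42 encodes as group 1 with bitmap bit `1 << 0`, which matches the description that the bit is still written as `1 << (0 - 41)` while the group index selects the actual range.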
2025-06-27Octeontx-pf: Update SGMII mode mappingHariprasad Kelam
The current implementation maps the ethtool link modes 10baseT/100baseT/1000baseT to a single firmware mode, SGMII. This creates a problem for end users who want to advertise only one of these speeds.

This patch addresses the issue by mapping each ethtool link mode to a corresponding firmware mode. It also adds the new modes supported by firmware.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250625092107.9746-2-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27Merge branch 'dpll-add-reference-sync-feature'Jakub Kicinski
Arkadiusz Kubalewski says:

====================
dpll: add Reference SYNC feature

The device may support the Reference SYNC feature, which allows the combination of two inputs into an input pair. In this configuration, clock signals from both inputs are used to synchronize the DPLL device. The higher frequency signal is utilized for the loop bandwidth of the DPLL, while the lower frequency signal is used to syntonize the output signal of the DPLL device. This feature enables the provision of a high-quality loop bandwidth signal from an external source.

A capable input provides a list of inputs that can be bound with it to create Reference SYNC. To control this feature, the user must request a desired state for a target pin: use ``DPLL_PIN_STATE_CONNECTED`` to enable or ``DPLL_PIN_STATE_DISCONNECTED`` to disable the feature. An input pin can be bound to only one other pin at any given time.

Verify pins bind state/capabilities:
$ ./tools/net/ynl/pyynl/cli.py \
    --spec Documentation/netlink/specs/dpll.yaml \
    --do pin-get \
    --json '{"id":0}'
{'board-label': 'CVL-SDP22',
 'id': 0,
 [...]
 'reference-sync': [{'id': 1, 'state': 'disconnected'}],
 [...]}

Bind the pins by setting connected state between them:
$ ./tools/net/ynl/pyynl/cli.py \
    --spec Documentation/netlink/specs/dpll.yaml \
    --do pin-set \
    --json '{"id":0, "reference-sync":{"id":1, "state":"connected"}}'

Verify pins bind state:
$ ./tools/net/ynl/pyynl/cli.py \
    --spec Documentation/netlink/specs/dpll.yaml \
    --do pin-get \
    --json '{"id":0}'
{'board-label': 'CVL-SDP22',
 'id': 0,
 [...]
 'reference-sync': [{'id': 1, 'state': 'connected'}],
 [...]}

Unbind the pins by setting disconnected state between them:
$ ./tools/net/ynl/pyynl/cli.py \
    --spec Documentation/netlink/specs/dpll.yaml \
    --do pin-set \
    --json '{"id":0, "reference-sync":{"id":1, "state":"disconnected"}}'
====================

Link: https://patch.msgid.link/20250626135219.1769350-1-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
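The pin-set requests in the cover letter above differ only in the pin IDs and the requested state, so the `--json` argument can be generated instead of typed by hand. The sketch below builds that argument and the matching command line. The helper names are hypothetical; the script path, spec path, and attribute names are copied from the cover letter, and nothing here talks to netlink itself.

```python
import json
from typing import List

YNL_CLI = "./tools/net/ynl/pyynl/cli.py"
DPLL_SPEC = "Documentation/netlink/specs/dpll.yaml"


def reference_sync_payload(pin_id: int, ref_pin_id: int, connected: bool) -> str:
    """Build the --json payload that binds or unbinds a Reference SYNC pair."""
    state = "connected" if connected else "disconnected"
    return json.dumps({"id": pin_id,
                       "reference-sync": {"id": ref_pin_id, "state": state}})


def pin_set_command(pin_id: int, ref_pin_id: int, connected: bool) -> List[str]:
    """Return the argv list for the pin-set call shown in the cover letter."""
    return [YNL_CLI, "--spec", DPLL_SPEC, "--do", "pin-set",
            "--json", reference_sync_payload(pin_id, ref_pin_id, connected)]
```

The returned list can be passed to `subprocess.run()` from the root of a kernel tree that contains the YNL tooling; `pin_set_command(0, 1, True)` corresponds to the bind example above.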
2025-06-27ice: add ref-sync dpll pinsArkadiusz Kubalewski
Implement reference sync input pin get/set callbacks, allow user space control over dpll pin pairs capable of reference sync support.

Reviewed-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250626135219.1769350-4-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27dpll: add reference sync get/setArkadiusz Kubalewski
Define function for reference sync pin registration and callback ops to set/get current feature state.

Implement netlink handler to fill netlink messages with reference sync pin configuration of capable pins (pin-get).

Implement netlink handler to call proper ops and configure reference sync pin state (pin-set).

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Milena Olech <milena.olech@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250626135219.1769350-3-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27dpll: add reference-sync netlink attributeArkadiusz Kubalewski
Add new netlink attribute to allow user space configuration of reference sync pin pairs, where both pins are used to provide one clock signal consisting of both: base frequency and sync signal.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Milena Olech <milena.olech@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250626135219.1769350-2-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueJakub Kicinski
Tony Nguyen says:

====================
ice: remaining TSPLL cleanups

These are the remaining patches from the "ice: Separate TSPLL from PTP and cleanup" series [1] with control flow macros removed. What remains are cleanups and some minor improvements.

[1] https://lore.kernel.org/netdev/20250618174231.3100231-1-anthony.l.nguyen@intel.com/

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: default to TIME_REF instead of TXCO on E825-C
  ice: move TSPLL init calls to ice_ptp.c
  ice: fall back to TCXO on TSPLL lock fail
  ice: wait before enabling TSPLL
  ice: add multiple TSPLL helpers
  ice: use bitfields instead of unions for CGU regs
  ice: read TSPLL registers again before reporting status
  ice: clear time_sync_en field for E825-C during reprogramming
====================

Link: https://patch.msgid.link/20250626162921.1173068-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27eth: bnxt: take page size into account for page pool recycling ringsJakub Kicinski
The Rx rings are filled with Rx buffers, which are supposed to fit packet headers (or MTU if HW-GRO is disabled). The aggregation buffers are filled with "device pages". Adjust the sizes of the page pool recycling ring appropriately, based on the ratio of the size of the buffer on a given ring vs the system page size. Otherwise, on a system with 64kB pages we end up with >700MB of memory sitting in every single page pool cache.

Correct the size calculation for the head_pool. Since the buffers there are always small I'm pretty sure I meant to cap the size at 1k, rather than make it the lowest possible size. With 64k pages, a 1k cache with a 1k ring is 64x larger than we need.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250626165441.4125047-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
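The "64x larger than we need" figure above follows from how many buffers fit in one system page. The sketch below shows that arithmetic only. The function name is hypothetical, the 1 kB buffer size is an assumption chosen so that the numbers line up with the message, and the real driver may round or clamp differently.

```python
def recycle_ring_pages(ring_entries: int, buf_size: int, page_size: int) -> int:
    """Pages needed to back ring_entries buffers of buf_size bytes each."""
    # A buffer larger than a page still needs at least one page per buffer.
    bufs_per_page = max(page_size // buf_size, 1)
    # Ceiling division, never less than one page.
    return max(-(-ring_entries // bufs_per_page), 1)


# Assumed example: 1k-entry ring, 1 kB buffers, 64 kB system pages.
needed = recycle_ring_pages(1024, 1024, 64 * 1024)
oversize_factor = 1024 // needed  # a 1k-entry cache compared with what is needed
```

Under these assumptions 16 pages cover the whole ring, so a cache sized at 1k pages is 64 times larger than necessary. With 4 kB pages and 4 kB buffers the ratio is 1 and the two sizes coincide.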
2025-06-27Merge branch 'tcp-fix-dsack-bug-with-non-contiguous-ranges'Jakub Kicinski
Eric Dumazet says:

====================
tcp: fix DSACK bug with non contiguous ranges

This series combines a fix from xin.guo and a new packetdrill test.
====================

Link: https://patch.msgid.link/20250626123420.1933835-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27selftests/net: packetdrill: add tcp_dsack_mult.pktEric Dumazet
Test DSACK behavior with non contiguous ranges.

Without prior fix (tcp: fix tcp_ofo_queue() to avoid including too much DUP SACK range) this would fail with:

tcp_dsack_mult.pkt:37: error handling packet: bad value outbound TCP option 5
script packet:  0.100682 . 1:1(0) ack 6001 <nop,nop,sack 1001:3001 7001:8001>
actual packet:  0.100679 . 1:1(0) ack 6001 win 1097 <nop,nop,sack 1001:6001 7001:8001>

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: xin.guo <guoxin0309@gmail.com>
Link: https://patch.msgid.link/20250626123420.1933835-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27tcp: fix tcp_ofo_queue() to avoid including too much DUP SACK rangexin.guo
If a newly arriving segment covers more than one skb in the ofo queue and its seq is equal to rcv_nxt, the duplicated sequence range is sent as a DUP SACK. In the trace below, the {501:2001} range in step 6 clearly includes too much DUP SACK range, in violation of RFC 2883 rules.

1. client > server: Flags [.], seq 501:1001, ack 1325288529, win 20000, length 500
2. server > client: Flags [.], ack 1, [nop,nop,sack 1 {501:1001}], length 0
3. client > server: Flags [.], seq 1501:2001, ack 1325288529, win 20000, length 500
4. server > client: Flags [.], ack 1, [nop,nop,sack 2 {1501:2001} {501:1001}], length 0
5. client > server: Flags [.], seq 1:2001, ack 1325288529, win 20000, length 2000
6. server > client: Flags [.], ack 2001, [nop,nop,sack 1 {501:2001}], length 0

After this fix, the final ACK is as below:

6. server > client: Flags [.], ack 2001, options [nop,nop,sack 1 {501:1001}], length 0

[edumazet] added a new packetdrill test in the following patch.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: xin.guo <guoxin0309@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250626123420.1933835-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
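The six-step trace above can be reproduced with a small model of how the DSACK block is derived from the out-of-order queue. This is a simplified sketch, not the kernel's tcp_ofo_queue(): the function name and parameters are hypothetical, sequence-number wraparound is ignored, and the `fixed=False` branch is only an assumed reconstruction of the old behavior, chosen because it yields the {501:2001} block from step 6.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]


def dsack_block(ofo_segments: List[Range], dsack_high: int,
                fixed: bool = True) -> Optional[Range]:
    """Model the DSACK range reported after in-order data reaches dsack_high.

    ofo_segments holds (seq, end_seq) pairs queued out of order; dsack_high
    is the highest sequence number now received in order.
    """
    dsack = None
    for seq, end_seq in sorted(ofo_segments):
        if seq >= dsack_high:
            break  # this segment is not a duplicate
        if fixed:
            # Report only the part of this segment that was duplicated.
            end = min(end_seq, dsack_high)
        else:
            # Assumed old behavior: the range runs up to dsack_high.
            end = dsack_high
            dsack_high = min(end_seq, dsack_high)
        if dsack is None:
            dsack = (seq, end)
        elif seq <= dsack[1] and dsack[0] <= end:
            # Extend the block only when the ranges touch or overlap.
            dsack = (min(dsack[0], seq), max(dsack[1], end))
    return dsack
```

With the queue from the trace, `[(501, 1001), (1501, 2001)]`, and in-order data up to 2001, the model returns `(501, 1001)` when `fixed=True` and `(501, 2001)` when `fixed=False`, matching the two versions of step 6.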
2025-06-27Merge branch 'tcp-remove-rtx_syn_ack-and-inet_rtx_syn_ack'Jakub Kicinski
Eric Dumazet says:

====================
tcp: remove rtx_syn_ack and inet_rtx_syn_ack()

After DCCP removal, we can clean up SYNACK retransmits a bit.
====================

Link: https://patch.msgid.link/20250626153017.2156274-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27tcp: remove inet_rtx_syn_ack()Eric Dumazet
inet_rtx_syn_ack() becomes a simple wrapper around tcp_rtx_synack() if we move the req->num_retrans update.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250626153017.2156274-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>