Age | Commit message (Collapse) | Author |
|
A net device has a threaded sysctl that can be used to enable threaded
NAPI polling on all of the NAPI contexts under that device. Allow
enabling threaded NAPI polling at individual NAPI level using netlink.
Extend the netlink operation `napi-set` and allow setting the threaded
attribute of a NAPI. This will enable the threaded polling on a NAPI
context.
Add a test in `nl_netdev.py` that verifies various cases of threaded
NAPI being set at NAPI and at device level.
Tested
./tools/testing/selftests/net/nl_netdev.py
TAP version 13
1..7
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
ok 3 nl_netdev.page_pool_check
ok 4 nl_netdev.napi_list_check
ok 5 nl_netdev.dev_set_threaded
ok 6 nl_netdev.napi_set_threaded
ok 7 nl_netdev.nsim_rxq_reset_down
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250710211203.3979655-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a PHY has no driver, the genphy driver is probed/removed directly in
phy_attach/detach. If the PHY's ofnode has an "leds" subnode, then the
LEDs will be (un)registered when probing/removing the genphy driver.
This could occur if the leds are for a non-generic driver that isn't
loaded for whatever reason. Synchronously removing the PHY device in
phy_detach leads to the following deadlock:
rtnl_lock()
ndo_close()
...
phy_detach()
phy_remove()
phy_leds_unregister()
led_classdev_unregister()
led_trigger_set()
netdev_trigger_deactivate()
unregister_netdevice_notifier()
rtnl_lock()
There is a corresponding deadlock on the open/register side of things
(and that one is reported by lockdep), but it requires a race while this
one is deterministic. Regular drivers do not have this problem since
they are probed asynchronously (without RTNL held).
Generic PHYs do not support LEDs anyway, so don't bother registering
them.
[JakubL this is a net-next version of
commit f0f2b992d818 ("net: phy: Don't register LEDs for genphy"),
which uses APIs removed in -next.]
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250710201454.1280277-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add flow control mechanism between paired netdevsim devices to stop the
TX queue during high traffic scenarios. When a receive queue becomes
congested (approaching NSIM_RING_SIZE limit), the corresponding transmit
queue on the peer device is stopped using netif_subqueue_try_stop().
Once the receive queue has sufficient capacity again, the peer's
transmit queue is resumed with netif_tx_wake_queue().
Key changes:
* Add nsim_stop_peer_tx_queue() to pause peer TX when RX queue is full
* Add nsim_start_peer_tx_queue() to resume peer TX when RX queue drains
* Implement queue mapping validation to ensure TX/RX queue count match
* Wake all queues during device unlinking to prevent stuck queues
* Use RCU protection when accessing peer device references
* wake the queues when changing the queue numbers
* Remove IFF_NO_QUEUE given it will enqueue packets now
The flow control only activates when devices have matching TX/RX queue
counts to ensure proper queue mapping.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250711-netdev_flow_control-v3-1-aa1d5a155762@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Added a test for variable PMTU in broadcast routes.
This test uses iputils' ping and attempts to send a ping between
two peers, which should result in a regular echo reply.
This test will fail when the receiving peer does not receive the echo
request due to a lack of packet fragmentation.
Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Link: https://patch.msgid.link/20250710142714.12986-2-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, __mkroute_output overrules the MTU value configured for
broadcast routes.
This buggy behaviour can be reproduced with:
ip link set dev eth1 mtu 9000
ip route del broadcast 192.168.0.255 dev eth1 proto kernel scope link src 192.168.0.2
ip route add broadcast 192.168.0.255 dev eth1 proto kernel scope link src 192.168.0.2 mtu 1500
The maximum packet size should be 1500, but it is actually 8000:
ping -b 192.168.0.255 -s 8000
Fix __mkroute_output to allow MTU values to be configured for
for broadcast routes (to support a mixed-MTU local-area-network).
Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Link: https://patch.msgid.link/20250710142714.12986-1-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2025-07-14
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: IFC updates for disabled host PF
net/mlx5: Expose disciplined_fr_counter through HCA capabilities in mlx5_ifc
RDMA/mlx5: Fix UMR modifying of mkey page size
net/mlx5: Expose HCA capability bits for mkey max page size
====================
Link: https://patch.msgid.link/1752481357-34780-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2025-07-11
The first patch is by Geert Uytterhoeven and converts the rcar_can
driver to DEFINE_SIMPLE_DEV_PM_OPS.
The last patch is by Biju Das and removes unused macros from the
rcar_canfd driver.
* tag 'linux-can-next-for-6.17-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: rcar_canfd: Drop unused macros
can: rcar_can: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
====================
Link: https://patch.msgid.link/20250711101706.2822687-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
x25_terminate_link() has been unused since the last use was removed
in 2020 by:
commit 7eed751b3b2a ("net/x25: handle additional netdev events")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://patch.msgid.link/20250712205759.278777-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I missed adding rss_api.py to the Makefile. The NIPA Makefile
checking script was scanning for shell scripts only, so it
didn't flag it either.
Fixes: 4d13c6c449af ("selftests: drv-net: test RSS Netlink notifications")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250712012005.4010263-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The buffer bgx_sel used in snprintf() was too small to safely hold
the formatted string "BGX%d" for all valid bgx_id values. This caused
a -Wformat-truncation warning with `Werror` enabled during build.
Increase the buffer size from 5 to 7 and use `sizeof(bgx_sel)` in
snprintf() to ensure safety and suppress the warning.
Build warning:
CC drivers/net/ethernet/cavium/thunder/thunder_bgx.o
drivers/net/ethernet/cavium/thunder/thunder_bgx.c: In function
‘bgx_acpi_match_id’:
drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:27: error: ‘%d’
directive output may be truncated writing between 1 and 3 bytes into a
region of size 2 [-Werror=format-truncation=]
snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);
^~
drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:23: note:
directive argument in the range [0, 255]
snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);
^~~~~~~
drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:2: note:
‘snprintf’ output between 5 and 7 bytes into a destination of size 5
snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);
compiler warning due to insufficient snprintf buffer size.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250711140532.2463602-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Wei Fang says:
====================
net: fec: add some optimizations
Add some optimizations to the fec driver, see each patch for details.
v1: https://lore.kernel.org/20250710090902.1171180-1-wei.fang@nxp.com
====================
Link: https://patch.msgid.link/20250711091639.1374411-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the current driver, the MAC address is set in both fec_restart() and
fec_set_mac_address(), so a generic helper function fec_set_hw_mac_addr()
is added to set the hardware MAC address to make the code more compact.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250711091639.1374411-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are also some RCR bits that are not defined but are used by the
driver, so add macro definitions for these bits to improve readability
and maintainability.
In addition, although FEC_RCR_HALFDPX has been defined, it is not used
in the driver. According to the description of FEC_RCR[1] in RM, it is
used to disable receive on transmit. Therefore, it is more appropriate
to redefine FEC_RCR[1] as FEC_RCR_DRT.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250711091639.1374411-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use the generic helper function phy_interface_mode_is_rgmii() to check
RGMII mode.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250711091639.1374411-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a follow-up for commit eb1ac9ff6c4a5 ("ipv6: anycast: Don't
hold RTNL for IPV6_JOIN_ANYCAST.").
We should not add a new device lookup API without netdevice_tracker.
Let's pass netdevice_tracker to dev_get_by_flags_rcu() and rename it
with netdev_ prefix to match other newer APIs.
Note that we always use GFP_ATOMIC for netdev_hold() as it's expected
to be called under RCU.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20250708184053.102109f6@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250711051120.2866855-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Renesas RZ/G3E SMARC EVK uses KSZ9131RNXC phy. On deep power state,
PHY loses the power and on wakeup the rgmii delays are not reconfigured
causing it to fail.
Replace the callback kszphy_resume()->ksz9131_resume() for reconfiguring
the rgmii_delay when it exits from PM suspend state.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250711054029.48536-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We default to raising an exception when unknown attrs are found
to make sure those are noticed during development.
When YNL CLI is "installed" and used by sysadmins erroring out
is not going to be helpful. It's far more likely the user space
is older than the kernel in that case, than that some attr is
misdefined or missing.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'struct regmap_config' are not modified in these drivers. They be
statically defined instead of allocated and populated at run-time.
The main benefits are:
- it saves some memory at runtime
- the structures can be declared as 'const', which is always better for
structures that hold some function pointers
- the code is less verbose
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend the process_unknown handing to enum values and flags.
Tested by removing entries from rt-link.yaml and rt-neigh.yaml:
./tools/net/ynl/pyynl/cli.py --family rt-link --dump getlink \
--process-unknown --output-json | jq '.[0] | ."ifi-flags"'
[
"up",
"Unknown(6)",
"loopback",
"Unknown(16)"
]
./tools/net/ynl/pyynl/cli.py --family rt-neigh --dump getneigh \
--process-unknown --output-json | jq '.[] | ."ndm-type"'
"unicast"
"Unknown(5)"
"Unknown(5)"
"unicast"
"Unknown(5)"
"unicast"
"broadcast"
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The port 2 host PF can be disabled, this bit reflects that setting.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1752064867-16874-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Introduce the `disciplined_fr_counter` capability bit to indicate that
the device’s free-running cycle counter is disciplined to real-time.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1752064867-16874-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When changing the page size on an mkey, the driver needs to set the
appropriate bits in the mkey mask to indicate which fields are being
modified.
The 6th bit of a page size in mlx5 driver is considered an extension,
and this bit has a dedicated capability and mask bits.
Previously, the driver was not setting this mask in the mkey mask when
performing page size changes, regardless of its hardware support,
potentially leading to an incorrect page size updates.
This fixes the issue by setting the relevant bit in the mkey mask when
performing page size changes on an mkey and the 6th bit of this field is
supported by the hardware.
Fixes: cef7dde8836a ("net/mlx5: Expand mkey page size to support 6 bits")
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/9f43a9c73bf2db6085a99dc836f7137e76579f09.1751979184.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Expose the HCA capability for maximal page size that can be configured
for an mkey. Used for enforcing capabilities when working with highly
contiguous memory and using large page sizes.
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/3e4d3fda37934430f65f72601519e22bf396fd05.1751979184.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- batman-adv: store hard_iface as iflink private data,
by Matthias Schiffer
* tag 'batadv-next-pullrequest-20250710' of git://git.open-mesh.org/linux-merge:
batman-adv: store hard_iface as iflink private data
batman-adv: Start new development cycle
====================
Link: https://patch.msgid.link/20250710164501.153872-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: cleanups and preparation for live migration
Jake Keller says:
Various cleanups and preparation to the ice driver code for supporting
SR-IOV live migration.
The logic for unpacking Rx queue context data is added. This is the inverse
of the existing packing logic. Thanks to <linux/packing.h> this is trivial
to add.
Code to enable both reading and writing the Tx queue context for a queue
over a shared hardware register interface is added. Thanks to ice_adapter,
this is locked across all PFs that need to use it, preventing concurrency
issues with multiple PFs.
The RSS hash configuration requested by a VF is cached within the VF
structure. This will be used to track and restore the same configuration
during migration load.
ice_sriov_set_msix_vec_count() is updated to use pci_iov_vf_id() instead of
open-coding a worse equivalent, and checks to avoid rebuilding MSI-X if the
current request is for the existing amount of vectors.
A new ice_get_vf_by_dev() helper function is added to simplify accessing a
VF from its PCI device structure. This will be used more heavily within the
live migration code itself.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: introduce ice_get_vf_by_dev() wrapper
ice: avoid rebuilding if MSI-X vector count is unchanged
ice: use pci_iov_vf_id() to get VF ID
ice: expose VF functions used by live migration
ice: move ice_vsi_update_l2tsel to ice_lib.c
ice: save RSS hash configuration for migration
ice: add functions to get and set Tx queue context
ice: add support for reading and unpacking Rx queue context
====================
Link: https://patch.msgid.link/20250710214518.1824208-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In temac_probe(), the debug message intended to print the resolved
PHY node was mistakenly using the controller node temac_np
instead of the actual PHY node lp->phy_node. This patch corrects
the log to reference the correct device tree node.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250710183737.2385156-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Three tests are cooking GSO packets but do not provide
gso_size information to the kernel, triggering this message:
TCP: tun0: Driver has suspect GRO implementation, TCP performance may be compromised.
Add --mss option to avoid this warning.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250710155641.3028726-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Toke Høiland-Jørgensen says:
====================
netdevsim: support setting a permanent address
Network management daemons that match on the device permanent address
currently have no virtual interface types to test against.
NetworkManager, in particular, has carried an out of tree patch to set
the permanent address on netdevsim devices to use in its CI for this
purpose.
This series adds support to netdevsim to set a permanent address on port
creation, and adds a test script to test setting and getting of the
different L2 address types.
v3: https://lore.kernel.org/20250706-netdevsim-perm_addr-v3-0-88123e2b2027@redhat.com
v2: https://lore.kernel.org/20250702-netdevsim-perm_addr-v2-0-66359a6288f0@redhat.com
v1: https://lore.kernel.org/20250203-netdevsim-perm_addr-v1-1-10084bc93044@redhat.com
====================
Link: https://patch.msgid.link/20250710-netdevsim-perm_addr-v4-0-c9db2fecf3bf@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new test script to the network selftests which tests getting and
setting of layer 2 addresses through netlink, including the newly added
support for setting a permaddr on netdevsim devices.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250710-netdevsim-perm_addr-v4-2-c9db2fecf3bf@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Network management daemons that match on the device permanent address
currently have no virtual interface types to test against.
NetworkManager, in particular, has carried an out of tree patch to set
the permanent address on netdevsim devices to use in its CI for this
purpose.
To support this use case, support setting netdev->perm_addr when
creating a netdevsim port.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250710-netdevsim-perm_addr-v4-1-c9db2fecf3bf@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The iou-zcrx selftest currently runs the server on the remote host
and the client on the local host. This commit flips the endpoints
such that server runs on localhost and client on remote.
This change brings the iou-zcrx selftest in convention with other
selftests.
Drive-by fix for a missing import exception that happens when the
network interface has less than 2 combined channels.
Test plan: ran iou-zcrx.py selftest between 2 physical machines
Signed-off-by: Vishwanath Seshagiri <vishs@fb.com>
Link: https://patch.msgid.link/20250710165337.614159-1-vishs@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The code had some rather odd control flow inherited from when it was
shared with siena and ef10 before this driver was split out.
Simplify that for easier reading.
Also add a comment explaining why we return the values we do, since
some Falcon documents and datasheets confusingly mention the part
supporting 4-tuple UDP hashing.
(I couldn't find any record of exactly what was "broken" about the
original Falcon A hash, I'm just trusting that falcon_init_rx_cfg()
had a good reason for not using it.)
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250710173213.1638397-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
net_sched: act: extend RCU use in dump() methods
We are trying to get away from central RTNL in favor of fine-grained
mutexes. While looking at net/sched, I found that act already uses
RCU in the fast path for the most cases, and could also be used
in dump() methods.
This series is not complete and will be followed by a second one.
v1: https://lore.kernel.org/20250707130110.619822-1-edumazet@google.com
====================
Link: https://patch.msgid.link/20250709090204.797558-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_skbedit_params
makes sure there is no discrepancy in tcf_skbedit_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-12-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_police_params
makes sure there is no discrepancy in tcf_police_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-11-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_pedit_params
makes sure there is no discrepancy in tcf_pedit_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-10-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_nat_params
makes sure there is no discrepancy in tcf_nat_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_mpls_params
makes sure there is no discrepancy in tcf_mpls_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_ctinfo_params
makes sure there is no discrepancy in tcf_ctinfo_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 21c167aa0ba9 ("net/sched: act_ctinfo: use percpu stats")
missed that stats_dscp_set, stats_dscp_error and stats_cpmark_set
might be written (and read) locklessly.
Use atomic64_t for these three fields, I doubt act_ctinfo is used
heavily on big SMP hosts anyway.
Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pedro Tammela <pctammela@mojatatu.com>
Link: https://patch.msgid.link/20250709090204.797558-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_ct_params
makes sure there is no discrepancy in tcf_ct_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_csum_params
makes sure there is no discrepancy in tcf_csum_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_connmark_parms
makes sure there is no discrepancy in tcf_connmark_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tcf_tm_dump() reads fields that can be changed concurrently,
and tcf_lastuse_update() might race against itself.
Add READ_ONCE() and WRITE_ONCE() annotations.
Fetch jiffies once in tcf_tm_dump().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
UBSAN complains that we reach beyond the end of the log entry:
UBSAN: array-index-out-of-bounds in drivers/net/ethernet/meta/fbnic/fbnic_fw_log.c:94:50
index 71 is out of range for type 'char [*]'
Call Trace:
<TASK>
ubsan_epilogue+0x5/0x2b
fbnic_fw_log_write+0x120/0x960
fbnic_fw_parse_logs+0x161/0x210
We're just taking the address of the character after the array,
so this really seems like something that should be legal.
But whatever, easy enough to silence by doing direct pointer math.
Fixes: c2b93d6beca8 ("eth: fbnic: Create ring buffer for firmware logs")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250709205910.3107691-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Consolidate the two nested if conditions for checking tx queue wake
conditions into a single combined condition. This improves code
readability without changing functionality. And move netif_tx_wake_queue
into if condition to reduce unnecessary checks for queue stops.
Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250710023208.846-1-liming.wu@jaguarmicro.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ensure that a duplicating netem cannot exist in a tree with other netems
in both qdisc addition and change. This is meant to prevent the soft
lockup and OOM loop scenario discussed in [1]. Also adjust a HFSC's
re-entrancy test case with netem for this new restriction - KASAN
still triggers upon its failure.
[1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/
Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250708164219.875521-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
netem_enqueue's duplication prevention logic breaks when a netem
resides in a qdisc tree with other netems - this can lead to a
soft lockup and OOM loop in netem_dequeue, as seen in [1].
Ensure that a duplicating netem cannot exist in a tree with other
netems.
Previous approaches suggested in discussions in chronological order:
1) Track duplication status or ttl in the sk_buff struct. Considered
too specific a use case to extend such a struct, though this would
be a resilient fix and address other previous and potential future
DOS bugs like the one described in loopy fun [2].
2) Restrict netem_enqueue recursion depth like in act_mirred with a
per cpu variable. However, netem_dequeue can call enqueue on its
child, and the depth restriction could be bypassed if the child is a
netem.
3) Use the same approach as in 2, but add metadata in netem_skb_cb
to handle the netem_dequeue case and track a packet's involvement
in duplication. This is an overly complex approach, and Jamal
notes that the skb cb can be overwritten to circumvent this
safeguard.
4) Prevent the addition of a netem to a qdisc tree if its ancestral
path contains a netem. However, filters and actions can cause a
packet to change paths when re-enqueued to the root from netem
duplication, leading us to the current solution: prevent a
duplicating netem from inhabiting the same tree as other netems.
[1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/
[2] https://lwn.net/Articles/719297/
Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Signed-off-by: William Liu <will@willsroot.io>
Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250708164141.875402-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc6-2).
No conflicts.
Adjacent changes:
drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
c701574c5412 ("wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan")
b3a431fe2e39 ("wifi: mt76: mt7925: fix off by one in mt7925_mcu_hw_scan()")
drivers/net/wireless/mediatek/mt76/mt7996/mac.c
62da647a2b20 ("wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()")
dc66a129adf1 ("wifi: mt76: add a wrapper for wcid access with validation")
drivers/net/wireless/mediatek/mt76/mt7996/main.c
3dd6f67c669c ("wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl()")
8989d8e90f5f ("wifi: mt76: mt7996: Do not set wcid.sta to 1 in mt7996_mac_sta_event()")
net/mac80211/cfg.c
58fcb1b4287c ("wifi: mac80211: reject VHT opmode for unsupported channel widths")
037dc18ac3fb ("wifi: mac80211: add support for storing station S1G capabilities")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull more networking fixes from Jakub Kicinski
"Big chunk of fixes for WiFi, Johannes says probably the last for the
release.
The Netlink fixes (on top of the tree) restore operation of iw (WiFi
CLI) which uses sillily small recv buffer, and is the reason for this
'emergency PR'.
The GRE multicast fix also stands out among the user-visible
regressions.
Current release - fix to a fix:
- netlink: make sure we always allow at least one skb to be queued,
even if the recvbuf is (mis)configured to be tiny
Previous releases - regressions:
- gre: fix IPv6 multicast route creation
Previous releases - always broken:
- wifi: prevent A-MSDU attacks in mesh networks
- wifi: cfg80211: fix S1G beacon head validation and detection
- wifi: mac80211:
- always clear frame buffer to prevent stack leak in cases which
hit a WARN()
- fix monitor interface in device restart
- wifi: mwifiex: discard erroneous disassoc frames on STA interface
- wifi: mt76:
- prevent null-deref in mt7925_sta_set_decap_offload()
- add missing RCU annotations, and fix sleep in atomic
- fix decapsulation offload
- fixes for scanning
- phy: microchip: improve link establishment and reset handling
- eth: mlx5e: fix race between DIM disable and net_dim()
- bnxt_en: correct DMA unmap len for XDP_REDIRECT"
* tag 'net-6.16-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
netlink: make sure we allow at least one dump skb
netlink: Fix rmem check in netlink_broadcast_deliver().
bnxt_en: Set DMA unmap len correctly for XDP_REDIRECT
bnxt_en: Flush FW trace before copying to the coredump
bnxt_en: Fix DCB ETS validation
net: ll_temac: Fix missing tx_pending check in ethtools_set_ringparam()
net/mlx5e: Add new prio for promiscuous mode
net/mlx5e: Fix race between DIM disable and net_dim()
net/mlx5: Reset bw_share field when changing a node's parent
can: m_can: m_can_handle_lost_msg(): downgrade msg lost in rx message to debug level
selftests: net: lib: fix shift count out of range
selftests: Add IPv6 multicast route generation tests for GRE devices.
gre: Fix IPv6 multicast route creation.
net: phy: microchip: limit 100M workaround to link-down events on LAN88xx
net: phy: microchip: Use genphy_soft_reset() to purge stale LPA bits
ibmvnic: Fix hardcoded NUM_RX_STATS/NUM_TX_STATS with dynamic sizeof
net: appletalk: Fix device refcount leak in atrtr_create()
netfilter: flowtable: account for Ethernet header in nf_flow_pppoe_proto()
wifi: mac80211: add the virtual monitor after reconfig complete
wifi: mac80211: always initialize sdata::key_list
...
|