Age | Commit message (Collapse) | Author |
|
Jakub Kicinski says:
====================
net: skip taking rtnl_lock for queue GET (prep)
Skip taking rtnl_lock for queue GET ops on devices which opt
into running all ops under the instance lock. In preparating
for performing queue ops without rtnl lock clarify the protection
of queue-related fields.
v1: https://lore.kernel.org/20250312223507.805719-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250324224537.248800-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ensure that all accesses to mp_params are under the netdev
instance lock. The only change we need is to move
dev_memory_provider_uninstall() under the lock.
Appropriately swap the asserts.
Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
netdev netlink is the only reader of netdev_{,rx_}queue->napi,
and it already holds netdev->lock. Switch protection of
the writes to netdev->lock to "ops protected".
The expectation will be now that accessing queue->napi
will require netdev->lock for "ops locked" drivers, and
rtnl_lock for all other drivers.
Current "ops locked" drivers don't require any changes.
gve and netdevsim use _locked() helpers right next to
netif_queue_set_napi() so they must be holding the instance
lock. iavf doesn't call it. bnxt is a bit messy but all paths
seem locked.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drivers which opt into instance lock protection of ops should
only call set_real_num_*_queues() under the instance lock.
This means that queue counts are double protected (writes
are under both rtnl_lock and instance lock, readers under
either).
Some readers may still be under the rtnl_lock, however, so for
now we need double protection of writers.
OTOH queue API paths are only under the protection of the instance
lock, so we need to validate that the instance is actually locking
ops, otherwise the input checks we do against queue count are racy.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Try to define some terminology for which fields are protected
by which lock and how. Some fields are protected by both rtnl_lock
and instance lock which is hard to talk about without having
a "key phrase" to refer to a particular protection scheme.
"ops protected" fields are defined later in the series, one by one.
Add ASSERT_RTNL() to netdev_ops_assert_locked() for drivers
not other instance protection of ops. Hopefully it's not too
confusion that netdev_lock_ops() does not match the lock which
netdev_ops_assert_locked() will assert, exactly. The noun "ops"
is in a different place in the name, so I think it's acceptable...
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
lockdep asserts and predicates can operate on const pointers.
In the future this will let us add asserts in functions
which operate on const pointers like dev_get_min_mp_channel_count().
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit a953be53ce40 ("net-sysfs: add support for device-specific
rx queue sysfs attributes"), so for at least a decade now it is safe
to call net_rx_queue_update_kobjects() when SYSFS=n. That function
does its own ifdef-inery and will return 0. Remove the unnecessary
stub for netif_set_real_num_rx_queues().
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
net_devmem_unbind_dmabuf()
A recent commit added taking the netdev instance lock
in netdev_nl_bind_rx_doit(), but didn't remove it in
net_devmem_unbind_dmabuf() which it calls from an error path.
Always expect the callers of net_devmem_unbind_dmabuf() to
hold the lock. This is consistent with net_devmem_bind_dmabuf().
(Not so) coincidentally this also protects mp_param with the instance
lock, which the rest of this series needs.
Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250324224537.248800-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Akihiko Odaki says:
====================
virtio_net: Fixes and improvements
Jason Wang recently proposed an improvement to struct
virtio_net_rss_config:
https://lore.kernel.org/CACGkMEud0Ki8p=z299Q7b4qEDONpYDzbVqhHxCNVk_vo-KdP9A@mail.gmail.com
This patch series implements it and also fixes a few minor bugs I found
when writing patches.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
v1: https://lore.kernel.org/20250318-virtio-v1-0-344caf336ddd@daynix.com
====================
Link: https://patch.msgid.link/20250321-virtio-v2-0-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
virtnet_probe() lacks the code to free rss_hdr in its error path.
Allocate rss_hdr with devres so that it will be automatically freed.
Fixes: 86a48a00efdf ("virtio_net: Support dynamic rss indirection table size")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20250321-virtio-v2-4-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The new RSS configuration structures allow easily constructing data for
VIRTIO_NET_CTRL_MQ_RSS_CONFIG as they strictly follow the order of data
for the command.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20250321-virtio-v2-3-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Mark the fields of struct virtio_net_ctrl_rss as little endian as
they are in struct virtio_net_rss_config, which it follows.
Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20250321-virtio-v2-2-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
struct virtio_net_rss_config was less useful in actual code because of a
flexible array placed in the middle. Add new structures that split it
into two to avoid having a flexible array in the middle.
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20250321-virtio-v2-1-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.
Remove the redundant 'flush_workqueue()' calls.
This was generated with coccinelle:
@@
expression E;
@@
- flush_workqueue(E);
destroy_workqueue(E);
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250324080854.408188-1-nichen@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Revert "udp_tunnel: use static call for GRO hooks when possible"
This reverts commit 311b36574ceaccfa3f91b74054a09cd4bb877702.
Revert "udp_tunnel: create a fastpath GRO lookup."
This reverts commit 8d4880db378350f8ed8969feea13bdc164564fc1.
There are multiple small issues with the series. In the interest
of unblocking the merge window let's opt for a revert.
Link: https://lore.kernel.org/cover.1742557254.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Maxime Chevallier says:
====================
net: phy: sfp: Add single-byte SMBus SFP access
This is V4 for the single-byte SMBus support for SFP cages as well as
embedded PHYs accessed over mdio-i2c.
v3: https://lore.kernel.org/20250314162319.516163-1-maxime.chevallier@bootlin.com
v2: https://lore.kernel.org/20250225112043.419189-1-maxime.chevallier@bootlin.com
v1: https://lore.kernel.org/20250223172848.1098621-1-maxime.chevallier@bootlin.com
====================
Link: https://patch.msgid.link/20250322075745.120831-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
PHYs that are within copper SFP modules have their MDIO bus accessible
through address 0x56 (usually) on the i2c bus. The MDIO-I2C bridge is
desgned for 16 bits accesses, but we can also perform 8bits accesses by
reading/writing the high and low bytes sequentially.
This commit adds support for this type of accesses, thus supporting
smbus controllers such as the one in the VSC8552.
This was only tested on Copper SFP modules that embed a Marvell 88e1111
PHY.
Tested-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322075745.120831-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The SFP module's eeprom and internals are accessible through an i2c bus.
It is possible that the SFP might be connected to an SMBus-only
controller, such as the one found in some PHY devices in the VSC85xx
family.
Introduce a set of sfp read/write ops that are going to be used if the
i2c bus is only capable of doing smbus byte accesses.
As Single-byte SMBus transaction go against SFF-8472 and breaks the
atomicity for diagnostics data access, hwmon is disabled in the case
of SMBus access.
Moreover, as this may cause other instabilities, print a warning at
probe time to indicate that the setup may be unreliable because of the
hardware design.
As hwmon may be disabled for both broken EEPROM and smbus, the warnings
are udpated accordingly.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250322075745.120831-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2025-03-24
1) Prevent setting high order sequence number bits input in
non-ESN mode. From Leon Romanovsky.
2) Support PMTU handling in tunnel mode for packet offload.
From Leon Romanovsky.
3) Make xfrm_state_lookup_byaddr lockless.
From Florian Westphal.
4) Remove unnecessary NULL check in xfrm_lookup_with_ifid().
From Dan Carpenter.
* tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Remove unnecessary NULL check in xfrm_lookup_with_ifid()
xfrm: state: make xfrm_state_lookup_byaddr lockless
xfrm: check for PMTU in tunnel mode for packet offload
xfrm: provide common xdo_dev_offload_ok callback implementation
xfrm: rely on XFRM offload
xfrm: simplify SA initialization routine
xfrm: delay initialization of offload path till its actually requested
xfrm: prevent high SEQ input in non-ESN mode
====================
Link: https://patch.msgid.link/20250324061855.4116819-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
DTS example in the bindings should be indented with 2- or 4-spaces and
aligned with opening '- |', so correct any differences like 3-spaces or
mixtures 2- and 4-spaces in one binding.
No functional changes here, but saves some comments during reviews of
new patches built on existing code.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Alex Elder <elder@riscstar.com>
Link: https://patch.msgid.link/20250324125222.82057-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following batch contains Netfilter updates for net-next:
1) Use kvmalloc in xt_hashlimit, from Denis Kirjanov.
2) Tighten nf_conntrack sysctl accepted values for nf_conntrack_max
and nf_ct_expect_max, from Nicolas Bouchinet.
3) Avoid lookup in nft_fib if socket is available, from Florian Westphal.
4) Initialize struct lsm_context in nfnetlink_queue to avoid
hypothetical ENOMEM errors, Chenyuan Yang.
5) Use strscpy() instead of _pad when initializing xtables table name,
kzalloc is already used to initialized the table memory area.
From Thorsten Blum.
6) Missing socket lookup by conntrack information for IPv6 traffic
in nft_socket, there is a similar chunk in IPv4, this was never
added when IPv6 NAT was introduced. From Maxim Mikityanskiy.
7) Fix clang issues with nf_tables CONFIG_MITIGATION_RETPOLINE,
from WangYuli.
* tag 'nf-next-25-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: Only use nf_skip_indirect_calls() when MITIGATION_RETPOLINE
netfilter: socket: Lookup orig tuple for IPv6 SNAT
netfilter: xtables: Use strscpy() instead of strscpy_pad()
netfilter: nfnetlink_queue: Initialize ctx to avoid memory allocation error
netfilter: fib: avoid lookup if socket is available
netfilter: conntrack: Bound nf_conntrack sysctl writes
netfilter: xt_hashlimit: replace vmalloc calls with kvmalloc
====================
Link: https://patch.msgid.link/20250323100922.59983-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This fixes the following build warning:
```
drivers/net/ethernet/amd/au1000_eth.c:574:6: warning: no previous prototype for 'au1000_ReleaseDB' [-Wmissing-prototypes]
574 | void au1000_ReleaseDB(struct au1000_private *aup, struct db_dest *pDB)
| ^~~~~~~~~~~~~~~~
```
Signed-off-by: Johan Korsnes <johan.korsnes@gmail.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250323190450.111241-1-johan.korsnes@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
RFS is using two kinds of hash tables.
First one is controlled by /proc/sys/net/core/rps_sock_flow_entries = 2^N
and using the N low order bits of the l4 hash is good enough.
Then each RX queue has its own hash table, controlled by
/sys/class/net/eth1/queues/rx-$q/rps_flow_cnt = 2^X
Current hash function, using the X low order bits is suboptimal,
because RSS is usually using Func(hash) = (hash % power_of_two);
For example, with 32 RX queues, 6 low order bits have no entropy
for a given queue.
Switch this hash function to hash_32(hash, log) to increase
chances to use all possible slots and reduce collisions.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250321171309.634100-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
More features for 6.15, major changes:
* cfg80211/mac80211: fix and enable link reconfiguration
* rtw88: support RTL8814AE/RTL8814AU
* mt7996: preparations for MLO
* ath12k: continued work on MLO
* iwlwifi: add new iwlmld sub-driver/op-mode for
some current and future devices
* wfx: wowlan support
* tag 'wireless-next-2025-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (311 commits)
wifi: mt76: mt7996: fix locking in mt7996_mac_sta_rc_work()
wifi: mt76: mt76x2u: add TP-Link TL-WDN6200 ID to device table
wifi: mt76: mt792x: re-register CHANCTX_STA_CSA only for the mt7921 series
wifi: mt76: mt7996: Update mt7996_tx to MLO support
wifi: mt76: mt7996: rework mt7996_ampdu_action to support MLO
wifi: mt76: mt7996: rework set/get_tsf callabcks to support MLO
wifi: mt76: mt7996: set vif default link_id adding/removing vif links
wifi: mt76: mt7996: rework mt7996_mcu_beacon_inband_discov to support MLO
wifi: mt76: mt7996: rework mt7996_mcu_add_obss_spr to support MLO
wifi: mt76: mt7996: rework mt7996_net_fill_forward_path to support MLO
wifi: mt76: mt7996: rework mt7996_update_mu_group to support MLO
wifi: mt76: mt7996: rework mt7996_mac_sta_poll to support MLO
wifi: mt76: mt7996: rework mt7996_mac_sta_rc_work to support MLO
wifi: mt76: mt7996: remove mt7996_mac_enable_rtscts()
wifi: mt76: mt7996: rework mt7996_sta_hw_queue_read to support MLO
wifi: mt76: mt7996: rework mt7996_set_hw_key to support MLO
wifi: mt76: mt7996: Add mt7996_sta_link to mt7996_mcu_add_bss_info signature
wifi: mt76: mt7996: rework mt7996_sta_set_4addr and mt7996_sta_set_decap_offload to support MLO
wifi: mt76: mt7996: rework mt7996_rx_get_wcid to support MLO
wifi: mt76: mt7996: Rely on wcid_to_sta in mt7996_mac_add_txs_skb()
...
====================
Link: https://patch.msgid.link/20250320131106.33266-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jonas Karlman says:
====================
net: stmmac: dwmac-rk: Add GMAC support for RK3528
The Rockchip RK3528 has two Ethernet controllers, one 100/10 MAC to be
used with the integrated PHY and a second 1000/100/10 MAC to be used
with an external Ethernet PHY.
This series add initial support for the Ethernet controllers found
in RK3528 and initial support to power up/down the integrated PHY.
v2: https://lore.kernel.org/20250309232622.1498084-1-jonas@kwiboo.se
v1: https://lore.kernel.org/20250306221402.1704196-1-jonas@kwiboo.se
====================
Link: https://patch.msgid.link/20250319214415.3086027-1-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rockchip RK3528 (and RV1106) has a different integrated PHY compared to
the integrated PHY on RK3228/RK3328. Current powerup/down operation is
not compatible with the integrated PHY found in these newer SoCs.
Add operations to powerup/down the integrated PHY found in RK3528.
Use helpers that can be used by other GMAC variants in the future.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250319214415.3086027-6-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rockchip RK3528 (and RV1106) has a different integrated PHY compared to
the integrated PHY on RK3228/RK3328. Current powerup/down operation is
not compatible with the integrated PHY found in these newer SoCs.
Add a new integrated_phy_powerdown operation and change the call chain
for integrated_phy_powerup to prepare support for the integrated PHY
found in these newer SoCs.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250319214415.3086027-5-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rockchip RK3528 (and RV1106) has a different integrated PHY compared to
the integrated PHY on RK3228/RK3328. Current powerup/down operation is
not compatible with the integrated PHY found in these SoCs.
Move the rk_gmac_integrated_phy_powerup/down functions to top of the
file to prepare for them to be called directly by a GMAC variant
specific powerup/down operation.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250319214415.3086027-4-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rockchip RK3528 has two Ethernet controllers based on Synopsys DWC
Ethernet QoS IP.
Add initial support for the RK3528 GMAC variant.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://patch.msgid.link/20250319214415.3086027-3-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rockchip RK3528 has two Ethernet controllers based on Synopsys DWC
Ethernet QoS IP.
Add compatible string for the RK3528 variant.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250319214415.3086027-2-jonas@kwiboo.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Russell King says:
====================
net: improve stmmac resume rx clocking
stmmac has had a long history of problems with resuming, illustrated by
reset failure due to the receive clock not running.
Several attempts have been attempted over the years to address this
issue, such as moving phylink_start() (now phylink_resume()) super
early in stmmac_resume() in commit 90702dcd19c0 ("net: stmmac: fix MAC
not working when system resume back with WoL a ctive.") However, this
has the downside that stmmac_mac_link_up() can (and demonstrably is)
called before or during the driver initialisation in another thread.
This can cause issues as packets could begin to be queued, and the
transmit/receive enable bits will be set before any initialisation has
been done.
Another attempt is used by dwmac-socfpga.c in commit 2d871aa07136 ("net:
stmmac: add platform init/exit for Altera's ARM socfpga") which
pre-dates the above commit.
Neither of these two approaches consider the effect of EEE with a PHY
that supports receive clock-stop and has that feature enabled (which
the stmmac driver does enable). If the link is up, then there is the
possibility for the receive path to be in low-power mode, and the PHY
may stop its receive clock.
This series addresses these issues by (each is not necessarily a
separate patch):
1) introducing phylink_prepare_resume(), which can be used by MAC
drivers to ensure that the PHY is resumed prior to doing any
re-initialisation work. This call is added to stmmac_resume().
2) moving phylink_resume() after all re-initialisation has completed,
thereby ensuring that the hardware is ready to be enabled for
packet reception/transmission.
3) with (1) and (2) addressed, the need for socfpga to have a private
work-around is no longer necessary, so it is removed.
4) introducing phylink functions to block/unblock the receive clock-
stop at the PHY. As these require PHY access over the MDIO bus,
they can sleep, so are not suitable for atomic access.
5) the stmmac hardware requires the receive clock to be running for
reset to complete. Depending on synthesis options, this requirement
may also extend to writing various registers as well, e.g. setting
the MAC address, writing some of the vlan registers, etc. Full
details are in the databook.
We add blocking/unblocking of the PHY receive clock-stop around
parts of the main stmmac driver where we have a context that we
can sleep. These are wrapped with the new phylink functions.
However, depending on synthesis options, there could be other
places where the net core calls the driver with a BH-disabled
context where we can't sleep, and thus can't block the PHY from
disabling its receive clock. These are documented with FIXME
comments.
Given the last paragraph above, I am wondering whether a better
approach would be to ensure that receive clock-stop is always disabled
at the PHY with stmmac. From what I can see, implementations do not
document to this level of detail, which makes it difficult to tell
which registers require the receive clock to be running to behave
correctly.
This patch series has been tested on the Tegra194 Jetson Xavier NX
board kindly donated by NVidia, with two additional patches that are
pending in patchwork - the first is required to have EEE's LPI mode
passed through to the MAC on this platform to allow testing under
PHY clock-stop scenarios. The second is a bug fix for PHYLIB and
makes "eee off" functional, but should not affect this series.
All patches on top of net-next commit f749448ce9f1 ("Merge branch
'net-mlx5-hw-steering-cleanups'")
https://patchwork.kernel.org/project/netdevbpf/patch/E1ttnHW-00785s-Uq@rmk-PC.armlinux.org.uk/
https://patchwork.kernel.org/project/netdevbpf/patch/E1ttmWN-0077Mb-Q6@rmk-PC.armlinux.org.uk/
====================
Link: https://patch.msgid.link/Z9ySeo61VYTClIJJ@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The DesignWare core requires the receive clock to be running during
certain operations. Ensure that we block PHY RXC clock-stop during
these operations.
This is a best-efforts change - not everywhere can be covered by this
because of net's core locking, which means we can't access the MDIO
bus to configure the PHY to disable RXC clock-stop in certain areas.
These are marked with FIXME comments.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6p-008Vjz-Qy@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some MACs require the PHY receive clock to be running to complete setup
actions. This may fail if the PHY has negotiated EEE, the MAC supports
receive clock stop, and the link has entered LPI state. Provide a pair
of APIs that MAC drivers can use to temporarily block the PHY disabling
the receive clock.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6k-008Vjt-MZ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As the previous commit addressed DWGMAC resuming with a PHY in
suspended state, there is now no need for socfpga to work around
this. Remove this code.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6f-008Vjn-J1@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Synopsys Designware GMAC core databook requires all clocks to be
active in order to complete software reset, which we perform during
resume.
However, IEEE 802.3 allows a PHY to stop its clocks when placed in
low-power mode, which happens when the system is suspended and WoL
is not enabled.
As an attempt to work around this, commit 36d18b5664ef ("net: stmmac:
start phylink instance before stmmac_hw_setup()") started phylink
early, but this has the side effect that the mac_link_up() method may
be called before or during the initialisation of GMAC hardware.
We also have the socfpga glue driver directly calling phy_resume()
also as an attempt to work around this.
In a previous commit, phylink_prepare_resume() has been introduced
to give MAC drivers a way to ensure that the PHY is resumed prior to
their initialisation of their MAC hardware. This commit adds the call,
and moves the phylink_resume() call back to where it should be before
the aforementioned commit.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6a-008Vjh-FG@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the system is suspended, the PHY may be placed in low-power mode
by setting the BMCR 0.11 Power down bit. IEEE 802.3 states that the
behaviour of the PHY in this state is implementation specific, and
the PHY is not required to meet the RX_CLK and TX_CLK requirements.
Essentially, this means that a PHY may stop the clocks that it is
generating while in power down state.
However, MACs exist which require the clocks from the PHY to be running
in order to properly resume. phylink_prepare_resume() provides them
with a way to clear the Power down bit early.
Note, however, that IEEE 802.3 gives PHYs up to 500ms grace before the
transmit and receive clocks meet the requirements after clearing the
power down bit.
Add a resume preparation function, which will ensure that the receive
clock from the PHY is appropriately configured while resuming.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tvO6V-008Vjb-AP@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Edward Cree says:
====================
sfc: devlink flash for X4
Updates to support devlink flash on X4 NICs.
Patch #2 is needed for NVRAM_PARTITION_TYPE_AUTO, and patch #1 is
needed because the latest MCDI headers from firmware no longer
include MDIO read/write commands.
v1: https://lore.kernel.org/cover.1742223233.git.ecree.xilinx@gmail.com
====================
Link: https://patch.msgid.link/cover.1742493016.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Unlike X2 and EF100, we do not attempt to parse the firmware file to
find an image within it; we simply hand the entire file to the MC,
which is responsible for understanding any container formats we might
use and validating that the firmware file is applicable to this NIC.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/9a72a74002a7819c780b0a18ce9294c9d4e1db12.1742493017.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/bcb7597460a5a99d1dca4ef282f4aa2dd46ae545.1742493017.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Unlike Siena, no EF10 board ever had an external PHY, and consequently
MDIO handling isn't even built into the firmware. Since Siena has
been split out into its own driver, the MDIO code can be deleted from
the sfc driver.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/aa689d192ddaef7abe82709316c2be648a7bd66e.1742493017.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 14a196807482 ("net: reorganize IP MIB values") changed
MIB values to group hot fields together.
Since then 5 new fields have been added without caring about
data locality.
This patch moves IPSTATS_MIB_OUTPKTS, IPSTATS_MIB_NOECTPKTS,
IPSTATS_MIB_ECT1PKTS, IPSTATS_MIB_ECT0PKTS, IPSTATS_MIB_CEPKTS
to the hot portion of per-cpu data.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250320101434.3174412-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TCP uses generic skb_set_owner_r() and sock_rfree()
for received packets, with socket lock being owned.
Switch to private versions, avoiding two atomic operations
per packet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250320121604.3342831-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kuniyuki Iwashima says:
====================
nexthop: Convert RTM_{NEW,DEL}NEXTHOP to per-netns RTNL.
Patch 1 - 5 move some validation for RTM_NEWNEXTHOP so that it can be
called without RTNL.
Patch 6 & 7 converts RTM_NEWNEXTHOP and RTM_DELNEXTHOP to per-netns RTNL.
Note that RTM_GETNEXTHOP and RTM_GETNEXTHOPBUCKET are not touched in
this series.
rtm_get_nexthop() can be easily converted to RCU, but rtm_dump_nexthop()
needs more work due to the left-to-right rbtree walk, which looks prone
to node deletion and tree rotation without a retry mechanism.
v1: https://lore.kernel.org/netdev/20250318233240.53946-1-kuniyu@amazon.com/
====================
Link: https://patch.msgid.link/20250319230743.65267-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In rtm_del_nexthop(), only nexthop_find_by_id() and remove_nexthop()
require RTNL as they touch net->nexthop.rb_root.
Let's move RTNL down as rtnl_net_lock() before nexthop_find_by_id().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If we pass false to the rtnl_held param of lwtunnel_valid_encap_type(),
we can move RTNL down before rtm_to_nh_config_rtnl().
Let's use rtnl_net_lock() in rtm_new_nexthop().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The number of NHA_GROUP entries is guaranteed to be non-zero in
nh_check_attr_group().
Let's remove the redundant check in nexthop_create_group().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
nexthop_add() checks if NLM_F_REPLACE is specified without
non-zero NHA_ID, which does not require RTNL.
Let's move the check to rtm_new_nexthop().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
NHA_OIF needs to look up a device by __dev_get_by_index(),
which requires RTNL.
Let's move NHA_OIF validation to rtm_to_nh_config_rtnl().
Note that the proceeding checks made the original !cfg->nh_fdb
check redundant.
NHA_FDB is set -> NHA_OIF cannot be set
NHA_FDB is set but false -> NHA_OIF must be set
NHA_FDB is not set -> NHA_OIF must be set
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We will push RTNL down to rtm_new_nexthop(), and then we
want to move non-RTNL operations out of the scope.
nh_check_attr_group() validates NHA_GROUP attributes, and
nexthop_find_by_id() and some validation requires RTNL.
Let's factorise such parts as nh_check_attr_group_rtnl()
and call it from rtm_to_nh_config_rtnl().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250319230743.65267-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We will split rtm_to_nh_config() into non-RTNL and RTNL parts,
and then the latter also needs tb.
As a prep, let's move nlmsg_parse() to rtm_new_nexthop().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250319230743.65267-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|