summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)Author
2025-01-16net: stmmac: Convert prefetch() to net_prefetch() for received framesFurong Xu
The size of DMA descriptors is 32 bytes at most. net_prefetch() for received frames, and keep prefetch() for descriptors. This patch brings ~4.8% driver performance improvement in a TCP RX throughput test with iPerf tool on a single isolated Cortex-A65 CPU core, 2.92 Gbits/sec increased to 3.06 Gbits/sec. Suggested-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-16net: stmmac: Optimize cache prefetch in RX pathFurong Xu
Current code prefetches cache lines for the received frame first, and then dma_sync_single_for_cpu() against this frame, this is wrong. Cache prefetch should be triggered after dma_sync_single_for_cpu(). This patch brings ~2.8% driver performance improvement in a TCP RX throughput test with iPerf tool on a single isolated Cortex-A65 CPU core, 2.84 Gbits/sec increased to 2.92 Gbits/sec. Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-16net: stmmac: Set page_pool_params.max_len to a precise sizeFurong Xu
DMA engine will always write no more than dma_buf_sz bytes of a received frame into a page buffer, the remaining spaces are unused or used by CPU exclusively. Setting page_pool_params.max_len to almost the full size of page(s) helps nothing more, but wastes more CPU cycles on cache maintenance. For a standard MTU of 1500, then dma_buf_sz is assigned to 1536, and this patch brings ~16.9% driver performance improvement in a TCP RX throughput test with iPerf tool on a single isolated Cortex-A65 CPU core, from 2.43 Gbits/sec increased to 2.84 Gbits/sec. Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-16net: stmmac: Switch to zero-copy in non-XDP RX pathFurong Xu
Avoid memcpy in non-XDP RX path by marking all allocated SKBs to be recycled in the upper network stack. This patch brings ~11.5% driver performance improvement in a TCP RX throughput test with iPerf tool on a single isolated Cortex-A65 CPU core, from 2.18 Gbits/sec increased to 2.43 Gbits/sec. Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-15net/mlx5e: CT: Offload connections with hardware steering rulesCosmin Ratiu
This is modeled similar to how software steering works: - a reference-counted matcher is maintained for each combination of nat/no_nat x ipv4/ipv6 x tcp/udp/gre. - adding a rule involves finding+referencing or creating a corresponding matcher, then actually adding a rule. - updating rules is implemented using the bwc_rule update API, which can change a rule's actions without touching the match value. By using a T-Rex traffic generator to initiate multi-million UDP flows per second, a kernel running with these patches on the RX side was able to offload ~600K flows per second, which is about ~7x larger than what software steering could do on the same hardware (256-thread AMD EPYC, 512 GB RAM, ConnectX-7 b2b). Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250114130646.1937192-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net/mlx5e: CT: Make mlx5_ct_fs_smfs_ct_validate_flow_rule reusableCosmin Ratiu
This function checks whether a flow_rule has the right flow dissector keys and masks used for a connection tracking flow offload. It is currently used locally by the tc_ct smfs module, but is about to be used from another place, so this commit moves it to a better place, renames it to mlx5e_tc_ct_is_valid_flow_rule and drops the unused fs argument. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250114130646.1937192-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net/mlx5e: CT: Add initial support for Hardware SteeringCosmin Ratiu
Connection tracking can offload tuple matches to the NIC either via firmware commands (when the steering mode is dmfs or offload support is disabled due to eswitch being set to legacy) or via software-managed flow steering (smfs). This commit adds stub operations for a third mode, hardware-managed flow steering. This is enabled when both CONFIG_MLX5_TC_CT and CONFIG_MLX5_HW_STEERING are enabled. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250114130646.1937192-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net/mlx5: HWS, rework the check if matcher size can be increasedYevgeny Kliteynik
When checking if the matcher size can be increased, check both match and action RTCs. Also, consider the increasing step - check that it won't cause the new matcher size to become unsupported. Additionally, since we're using '+ 1' for action RTC size yet again, define it as macro and use in all the required places. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250114130646.1937192-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: protect NAPI enablement with netdev_lock()Jakub Kicinski
Wrap napi_enable() / napi_disable() with netdev_lock(). Provide the "already locked" flavor of the API. iavf needs the usual adjustment. A number of drivers call napi_enable() under a spin lock, so they have to be modified to take netdev_lock() first, then spin lock then call napi_enable_locked(). Protecting napi_enable() implies that napi->napi_id is protected by netdev_lock(). Acked-by: Francois Romieu <romieu@fr.zoreil.com> # via-velocity Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: protect netdev->napi_list with netdev_lock()Jakub Kicinski
Hold netdev->lock when NAPIs are getting added or removed. This will allow safe access to NAPI instances of a net_device without rtnl_lock. Create a family of helpers which assume the lock is already taken. Switch iavf to them, as it makes extensive use of netdev->lock, already. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: add netdev_lock() / netdev_unlock() helpersJakub Kicinski
Add helpers for locking the netdev instance, use it in drivers and the shaper code. This will make grepping for the lock usage much easier, as we extend the lock to cover more fields. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20250115035319.559603-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-01-08 (ice) This series contains updates to ice driver only. Przemek reworks implementation so that ice_init_hw() is called before ice_adapter initialization. The motivation is to have ability to act on the number of PFs in ice_adapter initialization. This is not done here but the code is also a bit cleaner. Michal adds priority to be considered when matching recipes for proper differentiation. Konrad adds devlink health reporting for firmware generated events. R Sundar utilizes string helpers over open coded versions. Jake adds implementation to utilize a lower latency interface to program PHY timer when supported. Additional information can be found on the original cover letter: https://lore.kernel.org/intel-wired-lan/20241216145453.333745-1-anton.nadezhdin@intel.com/ Karol adds and allows for different PTP delay values to be used per pin. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Add in/out PTP pin delays ice: implement low latency PHY timer updates ice: check low latency PHY timer update firmware capability ice: add lock to protect low latency interface ice: rename TS_LL_READ* macros to REG_LL_PROXY_H_* ice: use read_poll_timeout_atomic in ice_read_phy_tstamp_ll_e810 ice: use string choice helpers ice: add fw and port health reporters ice: add recipe priority check in search ice: ice_probe: init ice_adapter after HW init ice: minor: rename goto labels from err to unroll ice: split ice_init_hw() out from ice_init_dev() ice: c827: move wait for FW to ice_init_hw() ==================== Link: https://patch.msgid.link/20250115000844.714530-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15inet: ipmr: fix data-racesEric Dumazet
Following fields of 'struct mr_mfc' can be updated concurrently (no lock protection) from ip_mr_forward() and ip6_mr_forward() - bytes - pkt - wrong_if - lastuse They also can be read from other functions. Convert bytes, pkt and wrong_if to atomic_long_t, and use READ_ONCE()/WRITE_ONCE() for lastuse. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250114221049.1190631-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15bnxt_en: add support for hds-thresh ethtool commandTaehee Yoo
The bnxt_en driver has configured the hds_threshold value automatically when TPA is enabled based on the rx-copybreak default value. Now the hds-thresh ethtool command is added, so it adds an implementation of hds-thresh option. Configuration of the hds-thresh is applied only when the tcp-data-split is enabled. The default value of hds-thresh is 256, which is the default value of rx-copybreak, which used to be the hds_thresh value. The maximum hds-thresh is 1023. # Example: # ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256 # ethtool -g enp14s0f0np0 Ring parameters for enp14s0f0np0: Pre-set maximums: ... HDS thresh: 1023 Current hardware settings: ... TCP data split: on HDS thresh: 256 Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-9-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15bnxt_en: add support for tcp-data-split ethtool commandTaehee Yoo
NICs that uses bnxt_en driver supports tcp-data-split feature by the name of HDS(header-data-split). But there is no implementation for the HDS to enable by ethtool. Only getting the current HDS status is implemented and The HDS is just automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled. The hds_threshold follows rx-copybreak value. and it was unchangeable. This implements `ethtool -G <interface name> tcp-data-split <value>` command option. The value can be <on> and <auto>. The value is <auto> and one of LRO/GRO/JUMBO is enabled, HDS is automatically enabled and all LRO/GRO/JUMBO are disabled, HDS is automatically disabled. HDS feature relies on the aggregation ring. So, if HDS is enabled, the bnxt_en driver initializes the aggregation ring. This is the reason why BNXT_FLAG_AGG_RINGS contains HDS condition. Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-8-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15bnxt_en: add support for rx-copybreak ethtool commandTaehee Yoo
The bnxt_en driver supports rx-copybreak, but it couldn't be set by userspace. Only the default value(256) has worked. This patch makes the bnxt_en driver support following command. `ethtool --set-tunable <devname> rx-copybreak <value> ` and `ethtool --get-tunable <devname> rx-copybreak`. By this patch, hds_threshol is set to the rx-copybreak value. But it will be set by `ethtool -G eth0 hds-thresh N` in the next patch. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-7-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15eth: fbnic: Add hardware monitoring support via HWMON interfaceSanman Pradhan
This patch adds support for hardware monitoring to the fbnic driver, allowing for temperature and voltage sensor data to be exposed to userspace via the HWMON interface. The driver registers a HWMON device and provides callbacks for reading sensor data, enabling system admins to monitor the health and operating conditions of fbnic. Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250114000705.2081288-4-sanman.p211993@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15eth: fbnic: hwmon: Add support for reading temperature and voltage sensorsSanman Pradhan
Add support for reading temperature and voltage sensor data from firmware by implementing a new TSENE message type and response parsing. This adds message handler infrastructure to transmit sensor read requests and parse responses. The sensor data will be exposed through the driver's hwmon interface. Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250114000705.2081288-3-sanman.p211993@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15eth: fbnic: hwmon: Add completion infrastructure for firmware requestsSanman Pradhan
Add infrastructure to support firmware request/response handling with completions. Add a completion structure to track message state including message type for matching, completion for waiting for response, and result for error propagation. Use existing spinlock to protect the writes. The data from the various response types will be added to the "union u" by subsequent commits. Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250114000705.2081288-2-sanman.p211993@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: lan969x: add FDMA implementationDaniel Machon
The lan969x switch device supports manual frame injection and extraction to and from the switch core, using a number of injection and extraction queues. This technique is currently supported, but delivers poor performance compared to Frame DMA (FDMA). This lan969x implementation of FDMA, hooks into the existing FDMA for Sparx5, but requires its own RX and TX handling, as lan969x does not support the same native cache coherency that Sparx5 does. Effectively, this means that we are going to use the DMA mapping API for mapping and unmapping TX buffers. The RX loop will utilize the page pool API for efficient RX handling. Other than that, the implementation is largely the same, and utilizes the FDMA library for DCB and DB handling. Some numbers: Manual injection/extraction (before this series): // iperf3 -c 1.0.1.1 [ ID] Interval Transfer Bitrate [ 5] 0.00-10.02 sec 345 MBytes 289 Mbits/sec sender [ 5] 0.00-10.06 sec 345 MBytes 288 Mbits/sec receiver FDMA (after this series): // iperf3 -c 1.0.1.1 [ ID] Interval Transfer Bitrate [ 5] 0.00-10.03 sec 1.10 GBytes 940 Mbits/sec sender [ 5] 0.00-10.07 sec 1.10 GBytes 936 Mbits/sec receiver Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20250113-sparx5-lan969x-switch-driver-5-v2-5-c468f02fd623@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: sparx5: ops out certain FDMA functionsDaniel Machon
We are going to implement the RX and TX paths a bit differently on lan969x and therefore need to introduce new ops for FDMA functions: init, deinit, xmit and poll. Assign the Sparx5 equivalents for these and update the code throughout. Also add a 'struct net_device' argument to the xmit() function, as we will be needing that for lan969x. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20250113-sparx5-lan969x-switch-driver-5-v2-4-c468f02fd623@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: sparx5: activate FDMA tx in start()Daniel Machon
The function sparx5_fdma_tx_activate() is responsible for configuring the TX FDMA instance and activating the channel. TX activation has previously been done in the xmit() function, when the first frame is transmitted. Now that we have separate functions for starting and stopping the FDMA, it seems reasonable to move the TX activation to the start function. This change has no implications on the functionality. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20250113-sparx5-lan969x-switch-driver-5-v2-3-c468f02fd623@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: sparx5: split sparx5_fdma_{start(),stop()}Daniel Machon
The two functions: sparx5_fdma_{start(),stop()} are responsible for a number of things, namely: allocation and initialization of FDMA buffers, activation FDMA channels in hardware and activation of the NAPI instance. This patch splits the buffer allocation and initialization into init and deinit functions, and the channel and NAPI activation into start and stop functions. This serves two purposes: 1) the start() and stop() functions can be reused for lan969x and 2) prepares for future MTU change support, where we must be able to stop and start the FDMA channels and NAPI instance, without free'ing and reallocating the FDMA buffers. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20250113-sparx5-lan969x-switch-driver-5-v2-2-c468f02fd623@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: sparx5: enable FDMA on lan969xDaniel Machon
In a previous series, we made sure that FDMA was not initialized and started on lan969x. Now that we are going to support it, undo that change. In addition, make sure the chip ID check is only applicable on Sparx5, as this is a check that is only relevant on this platform. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20250113-sparx5-lan969x-switch-driver-5-v2-1-c468f02fd623@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: phylink: pass neg_mode into c22 state decoderRussell King (Oracle)
Pass the current neg_mode into phylink_mii_c22_pcs_get_state() and phylink_mii_c22_pcs_decode_state(). Update all users of phylink PCS that use these functions. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/E1tXGeY-000Et9-8g@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: phylink: pass neg_mode into .pcs_get_state() methodRussell King (Oracle)
Pass the current neg_mode into the .pcs_get_state() method. Update all users of phylink PCS. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/E1tXGeT-000Et3-4L@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: bcm: asp2: convert to phylib managed EEERussell King (Oracle)
Convert the Broadcom ASP2 driver to use phylib managed EEE support. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/E1tXk81-000r4x-TS@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: bcm: asp2: remove tx_lpi_enabledRussell King (Oracle)
Phylib maintains a copy of tx_lpi_enabled, which will be used to populate the member when phy_ethtool_get_eee(). Therefore, writing to this member before phy_ethtool_get_eee() will have no effect. Remove it. Also remove setting our copy of info->eee.tx_lpi_enabled which becomes write-only. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/E1tXk7w-000r4r-Pq@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: bcm: asp2: fix LPI timer handlingRussell King (Oracle)
Fix the LPI timer handling in Broadcom ASP2 driver after the phylib managed EEE patches were merged. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/E1tXk7r-000r4l-Li@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: restart LPI timer after cleaning transmit descriptorsRussell King (Oracle)
Fix a bug in the LPI handling, where it is possible to immediately enter LPI mode after cleaning the transmit descriptors when all queues are empty rather than waiting for the LPI timeout to expire. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tXItg-000MBg-TW@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: combine stmmac_enable_eee_mode()Russell King (Oracle)
Combine stmmac_enable_eee_mode() with stmmac_try_to_start_sw_lpi() which makes the code easier to read and the flow more logical. We can now trivially see that if the transmit queues are busy, we (re-)start the eee_ctrl_timer. Otherwise, if the transmit path is not already in LPI mode, we ask the hardware to enter LPI mode. I believe that now we can see better what is going on here, this shows that there is a bug with the software LPI timer implementation. The LPI timer is supposed to define how long after the last transmittion completed before we start signalling LPI. However, this code structure shows that if all transmit queues are empty, and stmmac_try_to_start_sw_lpi() is called immediately after cleaning the transmit queue, we will instruct the hardware to start signalling LPI immediately. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tXItb-000MBa-OU@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: provide function for restarting sw LPI timerRussell King (Oracle)
Provide a function that encapsulates restarting the software LPI timer when we have determined that the transmit path is busy, or whether the EEE parameters have changed. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tXItW-000MBU-KQ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: provide stmmac_eee_tx_busy()Russell King (Oracle)
Extract the code which checks whether there's still work to do on any of the stmmac transmit queues. This will allow us to combine stmmac_enable_eee_mode() with stmmac_try_to_start_sw_lpi() in the next patch. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tXItR-000MBO-GF@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: add stmmac_try_to_start_sw_lpi()Russell King (Oracle)
There are two places which call stmmac_enable_eee_mode() and follow it immediately by modifying the expiry of priv->eee_ctrl_timer. Both code paths are trying to enable LPI mode. Remove this duplication by providing a function for this. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tXItM-000MBI-CX@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: check priv->eee_sw_timer_en in suspend pathRussell King (Oracle)
The suspend path uses priv->eee_enabled when cleaning up the software timed LPI mode. Use priv->eee_sw_timer_en instead so we're consistently using a single control for software-based timer handling. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tXItH-000MBC-8i@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: simplify TX cleanup decision for ending sw LPI modeRussell King (Oracle)
As mentioned in "net: stmmac: correct priv->eee_sw_timer_en setting", we can simplify some fast-path tests. The transmit cleaning path checks whether EEE is enabled, the transmit path is not in LPI mode, and that we're using software timed mode. Since the above mentioned commit, checking whether EEE is enabled is no longer necessary as priv->eee_sw_timer_en will be false when EEE is disabled. Simplify this test. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tXItC-000MB6-54@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: correct priv->eee_sw_timer_en settingRussell King (Oracle)
If we are disabling EEE/LPI, then we should not be enabling software mode. The only time when we should is if EEE is active, and we are wanting to use software-timed EEE mode. Therefore, in the disable path of stmmac_eee_init(), ensure that priv->eee_sw_timer_en is set false as we are going to be calling del_timer_sync() on the timer. This will allow us to simplify some fast-path tests in later patches. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tXIt7-000MB0-0W@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: rename stmmac_disable_sw_eee_mode()Russell King (Oracle)
stmmac_disable_sw_eee_mode() was not a good choice for this functions purpose - which is to stop transmitting LPI because we want to send a packet. Rename it to stmmac_stop_sw_lpi(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tXIt1-000MAu-TE@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: ethernet: sunplus: Switch to ndo_eth_ioctl谢致邦 (XIE Zhibang)
The device ioctl handler no longer calls ndo_do_ioctl, but calls ndo_eth_ioctl to handle mii ioctls since commit a76053707dbf ("dev_ioctl: split out ndo_eth_ioctl"). However, sunplus still used ndo_do_ioctl when it was introduced. So switch to ndo_eth_ioctl. Bad commit fd3040b9394c ("net: ethernet: Add driver for Sunplus SP7021") was the initial driver commit, meaning that PHY IOCTLs where never available on this driver. Therefore don't consider this as a fix. Found by code inspection. Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com> Link: https://patch.msgid.link/tencent_8CF8A72C708E96B9C7DC1AF96FEE19AF3D05@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: stm32: Use syscon_regmap_lookup_by_phandle_argsKrzysztof Kozlowski
Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over syscon_regmap_lookup_by_phandle() combined with getting the syscon argument. Except simpler code this annotates within one line that given phandle has arguments, so grepping for code would be easier. There is also no real benefit in printing errors on missing syscon argument, because this is done just too late: runtime check on static/build-time data. Dtschema and Devicetree bindings offer the static/build-time check for this already. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250112-syscon-phandle-args-net-v1-5-3423889935f7@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: sti: Use syscon_regmap_lookup_by_phandle_argsKrzysztof Kozlowski
Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over syscon_regmap_lookup_by_phandle() combined with getting the syscon argument. Except simpler code this annotates within one line that given phandle has arguments, so grepping for code would be easier. There is also no real benefit in printing errors on missing syscon argument, because this is done just too late: runtime check on static/build-time data. Dtschema and Devicetree bindings offer the static/build-time check for this already. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250112-syscon-phandle-args-net-v1-4-3423889935f7@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: stmmac: imx: Use syscon_regmap_lookup_by_phandle_argsKrzysztof Kozlowski
Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over syscon_regmap_lookup_by_phandle() combined with getting the syscon argument. Except simpler code this annotates within one line that given phandle has arguments, so grepping for code would be easier. There is also no real benefit in printing errors on missing syscon argument, because this is done just too late: runtime check on static/build-time data. Dtschema and Devicetree bindings offer the static/build-time check for this already. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250112-syscon-phandle-args-net-v1-3-3423889935f7@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: ti: am65-cpsw-nuss: Use syscon_regmap_lookup_by_phandle_argsKrzysztof Kozlowski
Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over syscon_regmap_lookup_by_phandle() combined with getting the syscon argument. Except simpler code this annotates within one line that given phandle has arguments, so grepping for code would be easier. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250112-syscon-phandle-args-net-v1-2-3423889935f7@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: ti: icssg-prueth: Do not print physical memory addressesKrzysztof Kozlowski
Debugging messages should not reveal anything about memory addresses. This also solves arm compile test warnings: drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c:1034:49: error: format specifies type 'unsigned long long' but the argument has type 'phys_addr_t' (aka 'unsigned int') [-Werror,-Wformat] Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Link: https://patch.msgid.link/20250112-syscon-phandle-args-net-v1-1-3423889935f7@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: airoha: Enforce ETS Qdisc priomapLorenzo Bianconi
EN7581 SoC supports fixed QoS band priority where WRR queues have lowest priorities with respect to SP ones. E.g: WRR0, WRR1, .., WRRm, SP0, SP1, .., SPn Enforce ETS Qdisc priomap according to the hw capabilities. Suggested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/20250112-airoha_ets_priomap-v1-1-fb616de159ba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: ethernet: ti: am65-cpsw: VLAN-aware CPSW only if !DSAAlexander Sverdlin
Only configure VLAN-aware CPSW mode if no port is used as DSA CPU port. VLAN-aware mode interferes with some DSA tagging schemes and makes stacking DSA switches downstream of CPSW impossible. Previous attempts to address the issue linked below. Link: https://lore.kernel.org/netdev/20240227082815.2073826-1-s-vadapalli@ti.com/ Link: https://lore.kernel.org/linux-arm-kernel/4699400.vD3TdgH1nR@localhost/ Co-developed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Link: https://patch.msgid.link/20250110125737.546184-1-alexander.sverdlin@siemens.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14tsnep: Link queues to NAPIsGerhard Engleder
Use netif_queue_set_napi() to link queues to NAPI instances so that they can be queried with netlink. $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 11}' [{'id': 0, 'ifindex': 11, 'napi-id': 9, 'type': 'rx'}, {'id': 1, 'ifindex': 11, 'napi-id': 10, 'type': 'rx'}, {'id': 0, 'ifindex': 11, 'napi-id': 9, 'type': 'tx'}, {'id': 1, 'ifindex': 11, 'napi-id': 10, 'type': 'tx'}] Additionally use netif_napi_set_irq() to also provide NAPI interrupt number to userspace. $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --do napi-get --json='{"id": 9}' {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 9, 'ifindex': 11, 'irq': 42, 'irq-suspend-timeout': 0} Providing information about queues to userspace makes sense as APIs like XSK provide queue specific access. Also XSK busy polling relies on queues linked to NAPIs. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250110223939.37490-1-gerhard@engleder-embedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14ice: Add in/out PTP pin delaysKarol Kolacinski
HW can have different input/output delays for each of the pins. Currently, only E82X adapters have delay compensation based on TSPLL config and E810 adapters have constant 1 ms compensation, both cases only for output delays and the same one for all pins. E825 adapters have different delays for SDP and other pins. Those delays are also based on direction and input delays are different than output ones. This is the main reason for moving delays to pin description structure. Add a field in ice_ptp_pin_desc structure to reflect that. Delay values are based on approximate calculations of HW delays based on HW spec. Implement external timestamp (input) delay compensation. Remove existing definitions and wrappers for periodic output propagation delays. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-14ice: implement low latency PHY timer updatesJacob Keller
Programming the PHY registers in preparation for an increment value change or a timer adjustment on E810 requires issuing Admin Queue commands for each PHY register. It has been found that the firmware Admin Queue processing occasionally has delays of tens or rarely up to hundreds of milliseconds. This delay cascades to failures in the PTP applications which depend on these updates being low latency. Consider a standard PTP profile with a sync rate of 16 times per second. This means there is ~62 milliseconds between sync messages. A complete cycle of the PTP algorithm 1) Sync message (with Tx timestamp) from source 2) Follow-up message from source 3) Delay request (with Tx timestamp) from sink 4) Delay response (with Rx timestamp of request) from source 5) measure instantaneous clock offset 6) request time adjustment via CLOCK_ADJTIME systemcall The Tx timestamps have a default maximum timeout of 10 milliseconds. If we assume that the maximum possible time is used, this leaves us with ~42 milliseconds of processing time for a complete cycle. The CLOCK_ADJTIME system call is synchronous and will block until the driver completes its timer adjustment or frequency change. If the writes to prepare the PHY timers get hit by a latency spike of 50 milliseconds, then the PTP application will be delayed past the point where the next cycle should start. Packets from the next cycle may have already arrived and are waiting on the socket. In particular, LinuxPTP ptp4l may start complaining about missing an announce message from the source, triggering a fault. In addition, the clockcheck logic it uses may trigger. This clockcheck failure occurs because the timestamp captured by hardware is compared against a reading of CLOCK_MONOTONIC. It is assumed that the time when the Rx timestamp is captured and the read from CLOCK_MONOTONIC are relatively close together. This is not the case if there is a significant delay to processing the Rx packet. Newer firmware supports programming the PHY registers over a low latency interface which bypasses the Admin Queue. Instead, software writes to the REG_LL_PROXY_L and REG_LL_PROXY_H registers. Firmware reads these registers and then programs the PHY timers. Implement functions to use this interface when available to program the PHY timers instead of using the Admin Queue. This avoids the Admin Queue latency and ensures that adjustments happen within acceptable latency bounds. Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-14ice: check low latency PHY timer update firmware capabilityJacob Keller
Newer versions of firmware support programming the PHY timer via the low latency interface exposed over REG_LL_PROXY_L and REG_LL_PROXY_H. Add support for checking the device capabilities for this feature. Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>