Age | Commit message | Author
2025-06-18 | net: ena: Add PHC documentation | David Arinzon
Provide the relevant information and guidelines about the feature support in the ENA driver. Signed-off-by: Amit Bernstein <amitbern@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-10-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | net: ena: View PHC stats using debugfs | David Arinzon
Add an entry named `phc_stats` to view the PHC statistics. If PHC is enabled, the stats are printed, as below:

    phc_cnt: 0
    phc_exp: 0
    phc_skp: 0
    phc_err_dv: 0
    phc_err_ts: 0

If PHC is disabled, no statistics will be displayed. Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-9-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
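Not taken from the ENA sources; the following is a minimal sketch of how a read-only debugfs stats file like `phc_stats` is typically wired up with seq_file. The struct and function names (my_phc_stats, my_phc_stats_show, my_phc_debugfs_init) are hypothetical:

    #include <linux/debugfs.h>
    #include <linux/seq_file.h>

    /* Hypothetical per-adapter PHC counters. */
    struct my_phc_stats {
        u64 cnt, exp, skp, err_dv, err_ts;
    };

    static int my_phc_stats_show(struct seq_file *s, void *unused)
    {
        struct my_phc_stats *st = s->private;

        seq_printf(s, "phc_cnt: %llu\n", st->cnt);
        seq_printf(s, "phc_exp: %llu\n", st->exp);
        seq_printf(s, "phc_skp: %llu\n", st->skp);
        seq_printf(s, "phc_err_dv: %llu\n", st->err_dv);
        seq_printf(s, "phc_err_ts: %llu\n", st->err_ts);
        return 0;
    }
    DEFINE_SHOW_ATTRIBUTE(my_phc_stats);

    /* parent is the driver's per-device debugfs base directory. */
    static void my_phc_debugfs_init(struct dentry *parent, struct my_phc_stats *st)
    {
        debugfs_create_file("phc_stats", 0400, parent, st, &my_phc_stats_fops);
    }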
2025-06-18 | net: ena: Add debugfs support to the ENA driver | David Arinzon
Adding the base directory of debugfs to the driver. In order for the folder to be unique per driver instantiation, the chosen name is the device name. This commit contains the initialization and the base folder. The creation of the base folder may fail, but is considered non-fatal. Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-8-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
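As a rough illustration (not the driver's actual code; my_adapter, adapter and pdev are placeholders), creating the per-device base folder boils down to a single call, and later debugfs calls tolerate a failed dentry, which is what makes the failure non-fatal:

    #include <linux/debugfs.h>
    #include <linux/pci.h>

    struct my_adapter {                 /* hypothetical driver private struct */
        struct dentry *debugfs_base;
    };

    static void my_debugfs_init(struct my_adapter *adapter, struct pci_dev *pdev)
    {
        /* Name the base directory after the device so it is unique per
         * instantiation; subsequent debugfs calls accept an error dentry,
         * so failure here is non-fatal. */
        adapter->debugfs_base = debugfs_create_dir(dev_name(&pdev->dev), NULL);
    }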
2025-06-18 | net: ena: Control PHC enable through devlink | David Arinzon
Add the capability to set parameters through the devlink framework. The details of the parameter used for controlling PHC (enable/disable) are as follows:
- Name: enable_phc
- Type: Boolean (true - enable / false - disable)
- Mode: DEVLINK_PARAM_CMODE_DRIVERINIT
- Effect: Changes take place during driver initialization; any change requires a devlink reload to take effect.
Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-7-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
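A sketch of how a boolean DRIVERINIT generic parameter is typically registered and read. The generic ID name DEVLINK_PARAM_GENERIC_ID_ENABLE_PHC and the my_* helpers are assumptions, not the ENA code:

    #include <linux/kernel.h>
    #include <net/devlink.h>

    static const struct devlink_param my_params[] = {
        /* No get/set ops are needed for a pure driverinit parameter. */
        DEVLINK_PARAM_GENERIC(ENABLE_PHC, BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
                              NULL, NULL, NULL),
    };

    /* Called with the devlink instance lock held (devl_lock). */
    static int my_params_register(struct devlink *devlink)
    {
        union devlink_param_value value = { .vbool = false };
        int err;

        err = devl_params_register(devlink, my_params, ARRAY_SIZE(my_params));
        if (err)
            return err;

        /* Default: PHC stays disabled until set + devlink reload. */
        devl_param_driverinit_value_set(devlink,
                                        DEVLINK_PARAM_GENERIC_ID_ENABLE_PHC,
                                        value);
        return 0;
    }

    /* During (re)initialization, read the value that should now take effect. */
    static bool my_phc_requested(struct devlink *devlink)
    {
        union devlink_param_value value;

        if (devl_param_driverinit_value_get(devlink,
                                            DEVLINK_PARAM_GENERIC_ID_ENABLE_PHC,
                                            &value))
            return false;
        return value.vbool;
    }

From user space this would typically look like `devlink dev param set <dev> name enable_phc value true cmode driverinit` followed by `devlink dev reload <dev>`.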
2025-06-18 | devlink: Add new "enable_phc" generic device param | David Arinzon
Add a new device generic parameter to enable/disable the PHC (PTP Hardware Clock) functionality in the device associated with the devlink instance. Signed-off-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250617110545.5659-6-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | net: ena: Add devlink port support | David Arinzon
Add the basic functionality to support devlink port for devlink model completeness purposes. Current support is for registration/un-registration. Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-5-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
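For reference, a minimal registration/unregistration sketch under the assumption of a single physical port; my_port_register/my_port_unregister and the port index are placeholders, not the ENA implementation:

    #include <net/devlink.h>

    static int my_port_register(struct devlink *devlink,
                                struct devlink_port *dl_port)
    {
        struct devlink_port_attrs attrs = {
            .flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL,
            .phys.port_number = 0,
        };

        devlink_port_attrs_set(dl_port, &attrs);
        return devlink_port_register(devlink, dl_port, 0);
    }

    static void my_port_unregister(struct devlink_port *dl_port)
    {
        devlink_port_unregister(dl_port);
    }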
2025-06-18 | net: ena: Add device reload capability through devlink | David Arinzon
Add basic devlink support for reloading the driver. This capability is required to support driver-init-type devlink params (DEVLINK_PARAM_CMODE_DRIVERINIT). Such params require a reload of the driver (a destroy/restore sequence). The reload is performed by the devlink framework using the hooks provided by the driver. Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-4-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
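A hedged sketch of the reload hooks such a driver provides; my_destroy_device/my_restore_device stand in for the driver's actual teardown/restore paths and are not the ENA functions:

    #include <net/devlink.h>

    int my_destroy_device(void *priv);      /* placeholder teardown */
    int my_restore_device(void *priv);      /* placeholder re-init */

    static int my_reload_down(struct devlink *devlink, bool netns_change,
                              enum devlink_reload_action action,
                              enum devlink_reload_limit limit,
                              struct netlink_ext_ack *extack)
    {
        return my_destroy_device(devlink_priv(devlink));
    }

    static int my_reload_up(struct devlink *devlink,
                            enum devlink_reload_action action,
                            enum devlink_reload_limit limit,
                            u32 *actions_performed,
                            struct netlink_ext_ack *extack)
    {
        *actions_performed = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT);
        /* driverinit param values are re-read during the restore path */
        return my_restore_device(devlink_priv(devlink));
    }

    static const struct devlink_ops my_devlink_ops = {
        .reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT),
        .reload_down    = my_reload_down,
        .reload_up      = my_reload_up,
    };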
2025-06-18 | net: ena: PHC silent reset | David Arinzon
Each PHC device kernel registration receives a unique kernel index, which is associated with a new PHC device file located at "/dev/ptp<index>". This device file serves as an interface for obtaining PHC timestamps. Examples of tools that use "/dev/ptp" include testptp [1] and chrony [2].

A reset flow may occur in the ENA driver while PHC is active. During a reset, the driver will unregister and then re-register the PHC device with the kernel. Under race conditions, particularly during heavy PHC loads, the driver’s reset flow might complete faster than the kernel’s PHC unregister/register process. This can result in the PHC index being different from what it was prior to the reset, as the PHC index is selected using kernel ID allocation [3]. While driver rmmod/insmod are done by the user, a reset may occur at any time without the user's awareness. Consequently, the driver might receive a new PHC index after the reset, potentially compromising the user experience.

To prevent this issue, the PHC flow will detect the reset during PHC destruction and will skip the PHC unregister/register calls to preserve the kernel PHC index. During the reset flow, any attempt to get a PHC timestamp will fail as expected, but the kernel PHC index will remain unchanged.

[1]: https://github.com/torvalds/linux/blob/v6.1/tools/testing/selftests/ptp/testptp.c
[2]: https://github.com/mlichvar/chrony
[3]: https://www.kernel.org/doc/html/latest/core-api/idr.html

Signed-off-by: Amit Bernstein <amitbern@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-3-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
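The mechanism, roughly (the struct, flag and function names here are hypothetical, not the ENA code): PHC teardown checks whether the device is in a reset and, if so, leaves the registered clock alone so the kernel index survives the reset:

    #include <linux/bitops.h>
    #include <linux/ptp_clock_kernel.h>

    struct my_phc_ctx {                 /* hypothetical private struct */
        unsigned long flags;
    #define MY_FLAG_RESETTING   0
        struct ptp_clock *ptp_clock;
    };

    static void my_phc_destroy(struct my_phc_ctx *ctx)
    {
        /* Silent reset: keep the clock registered so /dev/ptp<N> is stable. */
        if (test_bit(MY_FLAG_RESETTING, &ctx->flags))
            return;

        if (ctx->ptp_clock) {
            ptp_clock_unregister(ctx->ptp_clock);
            ctx->ptp_clock = NULL;
        }
    }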
2025-06-18 | net: ena: Add PHC support in the ENA driver | David Arinzon
The ENA driver will be extended to support the new PHC feature using the ptp_clock interface [1]. This will provide a timestamp reference for user space, allowing it to measure the time offset between the PHC and the system clock in order to achieve nanosecond accuracy. [1] - https://www.kernel.org/doc/html/latest/driver-api/ptp.html Signed-off-by: Amit Bernstein <amitbern@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
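A minimal sketch of the ptp_clock plumbing involved, not the actual ENA implementation; my_read_device_phc() is a placeholder for the device-specific timestamp read:

    #include <linux/module.h>
    #include <linux/time64.h>
    #include <linux/ptp_clock_kernel.h>

    u64 my_read_device_phc(void);       /* placeholder: read the HW clock */

    static int my_phc_gettimex64(struct ptp_clock_info *info,
                                 struct timespec64 *ts,
                                 struct ptp_system_timestamp *sts)
    {
        u64 ns;

        ptp_read_system_prets(sts);     /* system time just before HW read */
        ns = my_read_device_phc();
        ptp_read_system_postts(sts);    /* system time just after HW read */

        *ts = ns_to_timespec64(ns);
        return 0;
    }

    static struct ptp_clock_info my_phc_info = {
        .owner      = THIS_MODULE,
        .name       = "my_phc",
        .gettimex64 = my_phc_gettimex64,
    };

    /* Registration creates /dev/ptp<N>, where N is ptp_clock_index(clock). */
    static struct ptp_clock *my_phc_register(struct device *dev)
    {
        return ptp_clock_register(&my_phc_info, dev);
    }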
2025-06-18 | Merge branch 'udp_tunnel-remove-rtnl_lock-dependency' | Jakub Kicinski
Stanislav Fomichev says: ==================== udp_tunnel: remove rtnl_lock dependency Recently bnxt had to grow back a bunch of rtnl dependencies because of udp_tunnel's infra. Add a separate (global) mutex to protect udp_tunnel state. ==================== Link: https://patch.msgid.link/20250616162117.287806-1-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18Revert "bnxt_en: bring back rtnl_lock() in the bnxt_open() path"Stanislav Fomichev
This reverts commit 325eb217e41fa14f307c7cc702bd18d0bb38fe84. udp_tunnel infra doesn't need RTNL, should be safe to get back to only netdev instance lock. Cc: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-7-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | netdevsim: remove udp_ports_sleep | Stanislav Fomichev
Now that there is only one path in udp_tunnel, there is no need for the udp_ports_sleep knob. Remove it and adjust the test. Cc: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-6-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | net: remove redundant ASSERT_RTNL() in queue setup functions | Stanislav Fomichev
The existing netdev_ops_assert_locked() already asserts that either the RTNL lock or the per-device lock is held, making the explicit ASSERT_RTNL() redundant. Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-5-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | udp_tunnel: remove rtnl_lock dependency | Stanislav Fomichev
Drivers that are using ops lock and don't depend on RTNL lock still need to manage it because of udp_tunnel's RTNL dependency. Introduce a new udp_tunnel_nic_lock and use it instead of rtnl_lock. Drop the non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to grab the udp_tunnel_nic_lock mutex and might sleep).

Cover more places in v4:
- netlink
  - udp_tunnel_notify_add_rx_port (ndo_open)
    - triggers udp_tunnel_nic_device_sync_work
  - udp_tunnel_notify_del_rx_port (ndo_stop)
    - triggers udp_tunnel_nic_device_sync_work
  - udp_tunnel_get_rx_info (__netdev_update_features)
    - triggers NETDEV_UDP_TUNNEL_PUSH_INFO
  - udp_tunnel_drop_rx_info (__netdev_update_features)
    - triggers NETDEV_UDP_TUNNEL_DROP_INFO
  - udp_tunnel_nic_reset_ntf (ndo_open)
- notifiers
  - udp_tunnel_nic_netdevice_event, depending on the event:
    - triggers NETDEV_UDP_TUNNEL_PUSH_INFO
    - triggers NETDEV_UDP_TUNNEL_DROP_INFO
- ethnl_tunnel_info_reply_size
- udp_tunnel_nic_set_port_priv (two intel drivers)

Cc: Michael Chan <michael.chan@broadcom.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-4-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
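In rough outline (not the exact upstream code; my_device_sync is a placeholder name), the pattern replaces rtnl_lock()/rtnl_unlock() around udp_tunnel_nic state with a dedicated global mutex, which also means the sync work is always allowed to sleep:

    #include <linux/mutex.h>
    #include <linux/netdevice.h>

    static DEFINE_MUTEX(udp_tunnel_nic_lock);

    static void my_device_sync(struct net_device *dev)
    {
        mutex_lock(&udp_tunnel_nic_lock);
        /* Walk the port table and invoke the driver's set_port/unset_port
         * or sync_table callbacks; sleeping is fine under a mutex, so the
         * non-MAY_SLEEP mode is no longer needed. */
        mutex_unlock(&udp_tunnel_nic_lock);
    }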
2025-06-18 | vxlan: drop sock_lock | Stanislav Fomichev
We won't be able to sleep soon in vxlan_offload_rx_ports and won't be able to grab sock_lock. Instead of having a separate spinlock to manage sockets, rely on the rtnl lock. This is similar to how geneve manages its sockets. Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250616162117.287806-3-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | geneve: rely on rtnl lock in geneve_offload_rx_ports | Stanislav Fomichev
udp_tunnel_push_rx_port will grab a mutex in the next patch, so we can't use rcu. geneve_offload_rx_ports is called from geneve_netdevice_event for NETDEV_UDP_TUNNEL_PUSH_INFO and NETDEV_UDP_TUNNEL_DROP_INFO, which both have ASSERT_RTNL. Entries are added to and removed from the sock_list under rtnl lock as well (when adding or removing a tunneling device). Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250616162117.287806-2-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | tcp: Remove inet_hashinfo2_free_mod() | Yue Haibing
DCCP was removed, inet_hashinfo2_free_mod() is unused now. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250617130613.498659-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | dpaa_eth: don't use fixed_phy_change_carrier | Heiner Kallweit
This effectively reverts 6e8b0ff1ba4c ("dpaa_eth: Add change_carrier() for Fixed PHYs"). Usage of fixed_phy_change_carrier() requires that fixed_phy_register() has been called before, directly or indirectly. And that's not the case in this driver. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/7eb189b3-d5fd-4be6-8517-a66671a4e4e3@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | net/mlx4e: Don't redefine IB_MTU_XXX enum | Mark Zhang
Rely on existing IB_MTU_XXX definitions which exist in ib_verbs.h. Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/382c91ee506e7f1f3c1801957df6b28963484b7d.1750147222.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | nfc: Remove checks for nla_data returning NULL | Simon Horman
The implementation of nla_data is as follows:

    static inline void *nla_data(const struct nlattr *nla)
    {
        return (char *) nla + NLA_HDRLEN;
    }

Excluding the case where nla is exactly -NLA_HDRLEN, it will not return NULL. And it seems misleading to assume that it can, other than in this corner case. So drop checks for this condition. Flagged by Smatch. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250617-nfc-null-data-v1-1-c7525ead2e95@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | Merge branch 'eth-migrate-more-drivers-to-new-rxfh-callbacks' | Jakub Kicinski
Jakub Kicinski says: ==================== eth: migrate more drivers to new RXFH callbacks Migrate a batch of drivers to the recently added dedicated .get_rxfh_fields and .set_rxfh_fields ethtool callbacks. ==================== Link: https://patch.msgid.link/20250617014848.436741-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | eth: sxgbe: migrate to new RXFH callbacks | Jakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). RXFH is all this driver supports in RXNFC so old callbacks are completely removed. Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250617014848.436741-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | eth: dpaa2: migrate to new RXFH callbacks | Jakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250617014848.436741-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | eth: dpaa: migrate to new RXFH callbacks | Jakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). RXFH is all this driver supports in RXNFC so old callbacks are completely removed. Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250617014848.436741-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | eth: mvpp2: migrate to new RXFH callbacks | Jakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250617014848.436741-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | eth: niu: migrate to new RXFH callbacks | Jakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250617014848.436741-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | Merge branch 'eth-migrate-some-drivers-to-new-rxfh-callbacks' | Jakub Kicinski
Jakub Kicinski says: ==================== eth: migrate some drivers to new RXFH callbacks Migrate a batch of drivers to the recently added dedicated .get_rxfh_fields and .set_rxfh_fields ethtool callbacks. ==================== Link: https://patch.msgid.link/20250617014555.434790-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | eth: otx2: migrate to new RXFH callbacks | Jakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Link: https://patch.msgid.link/20250617014555.434790-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | eth: thunder: migrate to new RXFH callbacks | Jakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). The driver has no other RXNFC functionality so the SET callback can now be removed. Link: https://patch.msgid.link/20250617014555.434790-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | eth: ena: migrate to new RXFH callbacks | Jakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). The driver has no other RXNFC functionality so the SET callback can now be removed. Reviewed-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617014555.434790-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | eth: bnxt: migrate to new RXFH callbacks | Jakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250617014555.434790-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | eth: bnx2x: migrate to new RXFH callbacks | Jakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). The driver has no other RXNFC functionality so the SET callback can now be removed. Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/20250617014555.434790-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 | Merge branch 'netconsole-msgid' into main | David S. Miller
Gustavo Luiz Duarte says: ==================== netconsole: Add support for msgid in sysdata This patch series introduces a new feature to netconsole which allows appending a message ID to the userdata dictionary. If the msgid feature is enabled, the message ID is built from a per-target 32 bit counter that is incremented and appended to every message sent to the target. Example::

    echo 1 > "/sys/kernel/config/netconsole/cmdline0/userdata/msgid_enabled"
    echo "This is message #1" > /dev/kmsg
    echo "This is message #2" > /dev/kmsg

    13,434,54928466,-;This is message #1
     msgid=1
    13,435,54934019,-;This is message #2
     msgid=2

This feature can be used by the target to detect if messages were dropped or reordered before reaching the target. This allows system administrators to assess the reliability of their netconsole pipeline and detect loss of messages due to network contention or temporary unavailability.

---
Changes in v3:
- Add kdoc documentation for msgcounter.
- Link to v2: https://lore.kernel.org/r/20250612-netconsole-msgid-v2-0-d4c1abc84bac@gmail.com

Changes in v2:
- Use wrapping_assign_add() to avoid warnings in UBSAN and friends.
- Improve documentation to clarify wrapping and distinguish msgid from sequnum.
- Rebase and fix conflict in prepare_extradata().
- Link to v1: https://lore.kernel.org/r/20250611-netconsole-msgid-v1-0-1784a51feb1e@gmail.com
====================
Suggested-by: Breno Leitao <leitao@debian.org> Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-18 | docs: netconsole: document msgid feature | Gustavo Luiz Duarte
Add documentation explaining the msgid feature in netconsole. This feature appends a unique ID to the userdata dictionary. The message ID is populated from a per-target 32 bit counter which is incremented for each message sent to the target. This allows a target to detect if messages are dropped before reaching the target. Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-18 | selftests: netconsole: Add tests for 'msgid' feature in sysdata | Gustavo Luiz Duarte
Extend the self-tests to cover the 'msgid' feature in sysdata. Verify that msgid is appended to the message when the feature is enabled and that it is not appended when the feature is disabled. Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-18 | netconsole: append msgid to sysdata | Gustavo Luiz Duarte
Add msgcounter to the netconsole_target struct to generate message IDs. If the msgid_enabled attribute is true, increment msgcounter and append msgid=<msgcounter> to sysdata buffer before sending the message. Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
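Conceptually (hypothetical struct and buffer names, not the exact netconsole code), the append step increments the per-target counter with wrapping semantics and prints it in the same key=value form as other userdata entries:

    #include <linux/kernel.h>
    #include <linux/overflow.h>

    struct my_target {                  /* stand-in for netconsole_target */
        u32 msgcounter;
    };

    static int my_append_msgid(struct my_target *nt, char *buf, size_t len)
    {
        /* wrapping_assign_add() avoids UBSAN overflow warnings on wrap. */
        wrapping_assign_add(nt->msgcounter, 1);
        return scnprintf(buf, len, " msgid=%u\n", nt->msgcounter);
    }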
2025-06-18 | netconsole: implement configfs for msgid_enabled | Gustavo Luiz Duarte
Implement the _show and _store functions for the msgid_enabled configfs attribute under userdata. Set the sysdata_fields bit accordingly. Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-18 | netconsole: introduce 'msgid' as a new sysdata field | Gustavo Luiz Duarte
This adds a new sysdata field to enable assigning a per-target unique id to each message sent to that target. This id can later be appended as part of sysdata, allowing targets to detect dropped netconsole messages. Update count_extradata_entries() to take the new field into account. Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-17 | dpll: remove documentation of rclk_dev_name | Simon Horman
Remove documentation of the rclk_dev_name member of dpll_device, which doesn't exist. Flagged by ./scripts/kernel-doc -none. Introduced by commit 9431063ad323 ("dpll: core: Add DPLL framework base functions") Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250616-dpll-member-v1-1-8c9e6b8e1fd4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | Jakub Kicinski
Tony Nguyen says: ==================== libeth: add libeth_xdp helper lib

Alexander Lobakin says:

Time to add XDP helpers infra to libeth to greatly simplify adding XDP to idpf and iavf, as well as improve and extend XDP in ice and i40e. Any vendor is free to reuse helpers. If this happens, I'm fine with moving the folder out of intel/.

The helpers greatly simplify building xdp_buff, running a prog, handling the verdict, and implementing XDP_TX, .ndo_xdp_xmit and XDP buffer completion. The same applies to XSk (with XSk xmit instead of .ndo_xdp_xmit, plus stuff like XSk wakeup). They are entirely generic with no HW definitions or assumptions. HW-specific stuff like parsing Rx desc / filling Tx desc is passed from the driver as inline callbacks.

For now, the key assumptions that optimize performance / avoid code bloat, but might not fit every driver in drivers/net/, are:

* netmem holding the buffers are always order-0;
* the driver has separate XDP Tx queues and doesn't use stack queues for that. For best efficiency, you may want to have nr_cpu_ids XDP queues, but less (queue sharing) is also supported;
* XDP Tx queues are interrupt-less and use "lazy" cleaning only when there are less than 1/4 free Tx descriptors of the queue size;
* the main target platforms are 64-bit, although 32-bit is also fully supported, but the code might be not as optimized for them.

Library code already supports multi-buffer for all kinds of Tx and both header split and no split for Rx and Tx. Frags can come from devmem/io_uring etc.; direct `struct page *` is used only for header buffers, for which it's always true. Drivers are free to pass their own Rx hints and XSK xmit hints ops.

XDP_TX and ndo_xdp_xmit use an onstack bulk for the frames to be sent and send them in batches of 16 buffers. This eats ~280 bytes on the stack, but gives good boosts and allows the main sending function to be greatly optimized, leaving it without any error/exception paths.

XSk xmit fills Tx descriptors in a loop unrolled by 8. This was proven to improve perf on ice and i40e. XDP_TX and ndo_xdp_xmit don't use unrolling, as I wasn't able to get any improvements in those scenarios from this, while +1 Kb for their sending functions for nothing doesn't sound reasonable.

XSk wakeup, instead of the traditionally used "SW interrupts" provided by NICs, uses IPI to schedule NAPI on the CPU corresponding to the given queue pair. It gives better control over CPU distribution and in general performs way better than "SW interrupts", plus allows us to not pass any HW-specific callbacks there.

The code is built in such a way that all callbacks passed from drivers get inlined; in general, most of the hotpath gets inlined. Everything slow/exception lands in .c files in the libeth folder, doesn't create copies in the drivers themselves and doesn't overload the hotpath. Sure, inlining means that the hotpath will be compiled into every driver that uses the lib, but the core code is written in one place, so no copying of bugs happens. Fixed once -- works everywhere.

The last commit might look like sort of a hack, but it gives really good boosts and decreases object code size, plus there are checks that all those wider accesses are fully safe, so I don't feel anything bad about it.

An example of using libeth_xdp can be found either on my GitHub or on the mailing lists here ("XDP for idpf"). Macros for building driver XDP functions mean that some implementations (XDP_TX, ndo_xdp_xmit etc.) consist of really only a few lines.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  libeth: xdp, xsk: access adjacent u32s as u64 where applicable
  libeth: xsk: add XSkFQ refill and XSk wakeup helpers
  libeth: xsk: add XSk Rx processing support
  libeth: xsk: add XSk xmit functions
  libeth: xsk: add XSk XDP_TX sending helpers
  libeth: xdp: add RSS hash hint and XDP features setup helpers
  libeth: xdp: add templates for building driver-side callbacks
  libeth: xdp: add XDP prog run and verdict result handling
  libeth: xdp: add helpers for preparing/processing &libeth_xdp_buff
  libeth: xdp: add XDPSQ cleanup timers
  libeth: xdp: add XDPSQ locking helpers
  libeth: xdp: add XDPSQE completion helpers
  libeth: xdp: add .ndo_xdp_xmit() helpers
  libeth: xdp: add XDP_TX buffers sending
  libeth: support native XDP and register memory model
  libeth: convert to netmem
  libeth, libie: clean symbol exports up a little
====================
Link: https://patch.msgid.link/20250616201639.710420-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | Merge branch 'net-mlx5e-add-support-for-devmem-and-io_uring-tcp-zero-copy' | Jakub Kicinski
Mark Bloch says: ==================== net/mlx5e: Add support for devmem and io_uring TCP zero-copy This series adds support for zerocopy rx TCP with devmem and io_uring for ConnectX7 NICs and above. For performance reasons and simplicity, HW-GRO will also be turned on when header-data split mode is on.

Performance
===========
Test setup:
* CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (single NUMA)
* NIC: ConnectX7
* Benchmarking tool: kperf [0]
* Single TCP flow
* Test duration: 60s

With application thread and interrupts pinned to the *same* core:

|------+-----------+----------|
| MTU  | epoll     | io_uring |
|------+-----------+----------|
| 1500 | 61.6 Gbps | 114 Gbps |
| 4096 | 69.3 Gbps | 151 Gbps |
| 9000 | 67.8 Gbps | 187 Gbps |
|------+-----------+----------|

The CPU usage for io_uring is 95%.

Reproduction steps for io_uring:

    server --no-daemon -a 2001:db8::1 --no-memcmp --iou --iou_sendzc \
           --iou_zcrx --iou_dev_name eth2 --iou_zcrx_queue_id 2
    server --no-daemon -a 2001:db8::2 --no-memcmp --iou --iou_sendzc
    client --src 2001:db8::2 --dst 2001:db8::1 \
           --msg-zerocopy -t 60 --cpu-min=2 --cpu-max=2

Patch overview:
================
First, a netmem API for skb_can_coalesce is added to the core to be able to do skb fragment coalescing on netmems. The next patches introduce some cleanups in the internal SHAMPO code and improvements to hw gro capability checks in FW. A separate page_pool is introduced for headers, to be used only when the rxq has a memory provider. Then the driver is converted to use the netmem API and to allow support for unreadable netmem page pool. The queue management ops are implemented. Finally, the tcp-data-split ring parameter is exposed.

References
==========
[0] kperf: git://git.kernel.dk/kperf.git

v1: https://lore.kernel.org/20250116215530.158886-1-saeed@kernel.org
v2: https://lore.kernel.org/1747950086-1246773-1-git-send-email-tariqt@nvidia.com
v3: https://lore.kernel.org/20250609145833.990793-1-mbloch@nvidia.com
v4: https://lore.kernel.org/20250610150950.1094376-1-mbloch@nvidia.com
v5: https://lore.kernel.org/20250612154648.1161201-1-mbloch@nvidia.com
====================
Link: https://patch.msgid.link/20250616141441.1243044-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: Add TX support for netmems | Dragos Tatulea
Declare netmem TX support in netdev. As required, use the netmem-aware DMA unmapping APIs for unmapping netmems in the tx completion path. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-13-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: Support ethtool tcp-data-split settings | Saeed Mahameed
In mlx5, tcp header-data split requires HW GRO to be on. Enabling it fails when HW GRO is off. mlx5e_fix_features now keeps HW GRO on when tcp data split is enabled. Finally, when tcp data split is disabled, features are updated to maybe remove the forced HW GRO. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-12-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
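The dependency can be pictured with a generic ndo_fix_features-style hook; this is only a sketch of the rule described above (struct my_priv and its flag are hypothetical), not the mlx5e code:

    #include <linux/netdevice.h>

    struct my_priv {                    /* hypothetical driver private data */
        bool tcp_data_split_enabled;
    };

    static netdev_features_t my_fix_features(struct net_device *dev,
                                             netdev_features_t features)
    {
        struct my_priv *priv = netdev_priv(dev);

        /* HW GRO must stay on while tcp-data-split (header/data split)
         * is enabled, so force it back on here. */
        if (priv->tcp_data_split_enabled)
            features |= NETIF_F_GRO_HW;
        return features;
    }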
2025-06-17 | net/mlx5e: Implement queue mgmt ops and single channel swap | Saeed Mahameed
The bulk of the work is done in mlx5e_queue_mem_alloc, where we allocate and create the new channel resources, similar to mlx5e_safe_switch_params, but here we do it for a single channel using existing params, sort of a clone channel. To swap the old channel with the new one, we deactivate and close the old channel then replace it with the new one, since the swap procedure doesn't fail in mlx5, we do it all in one place (mlx5e_queue_start). Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250616141441.1243044-11-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: Add support for UNREADABLE netmem page pools | Saeed Mahameed
On netdev_rx_queue_restart, a special type of page pool may be expected. In this patch declare support for UNREADABLE netmem iov pages in the pool params only when header data split shampo RQ mode is enabled, and also set the queue index in the page pool params struct. Shampo mode requirement: without header split, rx needs to peek at the data, so we can't do UNREADABLE_NETMEM. The patch also enables the use of a separate page pool for headers when a memory provider is installed for the queue; otherwise the same common page pool continues to be used. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-10-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: Convert over to netmem | Saeed Mahameed
mlx5e_page_frag holds the physical page itself. To naturally support zc page pools, remove the physical page reference from mlx5 and replace it with netmem_ref, to avoid internal handling in mlx5 for net_iov-backed pages. SHAMPO can issue packets that are not split into header and data. These packets will be dropped if the data part resides in a net_iov, as the driver can't read into this area. No performance degradation observed. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-9-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: SHAMPO: Separate pool for headers | Saeed Mahameed
Allow allocating a separate page pool for headers when SHAMPO is on. This will be useful for adding support to zc page pool, which has to be different from the headers page pool. For now, the pools are the same. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-8-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: SHAMPO: Improve hw gro capability checking | Saeed Mahameed
Add missing HW capabilities and declare the feature in netdev->vlan_features, similar to other features in mlx5e_build_nic_netdev. No functional change here, as all features that are disabled by default are explicitly disabled at the bottom of the function. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-7-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: SHAMPO: Remove redundant params | Saeed Mahameed
Two SHAMPO params are static and always the same, remove them from the global mlx5e_params struct. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-6-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 | net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc | Saeed Mahameed
Drop redundant SHAMPO structure alloc/free functions. Gather together the function calls pertaining to header split info and pass headers per WQE (hd_per_wqe) as a parameter to those functions to avoid future use-before-initialization mistakes. Allocate HW GRO related info outside of the header related info scope. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>