diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-21 08:28:08 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-21 08:28:08 -0800 |
commit | fcc79e1714e8c2b8e216dc3149812edd37884eef (patch) | |
tree | 17a51d29db810b81412be040aaf380936b3261b4 /drivers/net/ethernet/intel/iavf/iavf_main.c | |
parent | 6e95ef0258ff4ee23ae3b06bf6b00b33dbbd5ef7 (diff) | |
parent | dd7207838d38780b51e4690ee508ab2d5057e099 (diff) |
Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core:
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of
rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infrastructure is guarded by the
CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code:
- Add small file operations for debugfs, to reduce the struct ops
size.
- Refactoring and optimization for the implementation of page_frag
API, This is a preparatory work to consolidate the page_frag
implementation.
Netfilter:
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users the
option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent CI
improvements.
BPF:
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols:
- Introduces 4-tuple hash for connected udp sockets, speeding-up
significantly connected sockets lookup.
- Add a fastpath for some TCP timers that usually expires after
close, the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state
lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per
device neigh lists.
Driver API:
- Introduce a unified interface to configure transmission H/W
shaping, and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling:
- forwarding: introduce deferred commands, to simplify the cleanup
phase
Drivers:
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- add support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Add representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- add support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implement page pool support
- macsec:
- inherit lower device's features and TSO limits when
offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- add dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- add clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature"
* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
mm: page_frag: fix a compile error when kernel is not compiled
Documentation: tipc: fix formatting issue in tipc.rst
selftests: nic_performance: Add selftest for performance of NIC driver
selftests: nic_link_layer: Add selftest case for speed and duplex states
selftests: nic_link_layer: Add link layer selftest for NIC driver
bnxt_en: Add FW trace coredump segments to the coredump
bnxt_en: Add a new ethtool -W dump flag
bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
bnxt_en: Add functions to copy host context memory
bnxt_en: Do not free FW log context memory
bnxt_en: Manage the FW trace context memory
bnxt_en: Allocate backing store memory for FW trace logs
bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
bnxt_en: Refactor bnxt_free_ctx_mem()
bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
bnxt_en: Update firmware interface spec to 1.10.3.85
selftests/bpf: Add some tests with sockmap SK_PASS
bpf: fix recursive lock when verdict program return SK_PASS
wireguard: device: support big tcp GSO
wireguard: selftests: load nf_conntrack if not present
...
Diffstat (limited to 'drivers/net/ethernet/intel/iavf/iavf_main.c')
-rw-r--r-- | drivers/net/ethernet/intel/iavf/iavf_main.c | 161 |
1 files changed, 159 insertions, 2 deletions
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index f782402cd789..12ef160425aa 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -1972,8 +1972,11 @@ static void iavf_finish_config(struct work_struct *work) adapter = container_of(work, struct iavf_adapter, finish_config); - /* Always take RTNL first to prevent circular lock dependency */ + /* Always take RTNL first to prevent circular lock dependency; + * The dev->lock is needed to update the queue number + */ rtnl_lock(); + mutex_lock(&adapter->netdev->lock); mutex_lock(&adapter->crit_lock); if ((adapter->flags & IAVF_FLAG_SETUP_NETDEV_FEATURES) && @@ -2017,6 +2020,7 @@ static void iavf_finish_config(struct work_struct *work) out: mutex_unlock(&adapter->crit_lock); + mutex_unlock(&adapter->netdev->lock); rtnl_unlock(); } @@ -2085,6 +2089,21 @@ static int iavf_process_aq_command(struct iavf_adapter *adapter) return 0; } + if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW) { + iavf_cfg_queues_bw(adapter); + return 0; + } + + if (adapter->aq_required & IAVF_FLAG_AQ_GET_QOS_CAPS) { + iavf_get_qos_caps(adapter); + return 0; + } + + if (adapter->aq_required & IAVF_FLAG_AQ_CFG_QUEUES_QUANTA_SIZE) { + iavf_cfg_queues_quanta_size(adapter); + return 0; + } + if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES) { iavf_configure_queues(adapter); return 0; @@ -2670,6 +2689,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter) /* request initial VLAN offload settings */ iavf_set_vlan_offload_features(adapter, 0, netdev->features); + if (QOS_ALLOWED(adapter)) + adapter->aq_required |= IAVF_FLAG_AQ_GET_QOS_CAPS; + iavf_schedule_finish_config(adapter); return; @@ -2919,6 +2941,30 @@ static void iavf_disable_vf(struct iavf_adapter *adapter) } /** + * iavf_reconfig_qs_bw - Call-back task to handle hardware reset + * @adapter: board private structure + * + * After a reset, the shaper parameters of queues need to be replayed again. + * Since the net_shaper object inside TX rings persists across reset, + * set the update flag for all queues so that the virtchnl message is triggered + * for all queues. + **/ +static void iavf_reconfig_qs_bw(struct iavf_adapter *adapter) +{ + int i, num = 0; + + for (i = 0; i < adapter->num_active_queues; i++) + if (adapter->tx_rings[i].q_shaper.bw_min || + adapter->tx_rings[i].q_shaper.bw_max) { + adapter->tx_rings[i].q_shaper_update = true; + num++; + } + + if (num) + adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW; +} + +/** * iavf_reset_task - Call-back task to handle hardware reset * @work: pointer to work_struct * @@ -2944,10 +2990,12 @@ static void iavf_reset_task(struct work_struct *work) /* When device is being removed it doesn't make sense to run the reset * task, just return in such a case. */ + mutex_lock(&netdev->lock); if (!mutex_trylock(&adapter->crit_lock)) { if (adapter->state != __IAVF_REMOVE) queue_work(adapter->wq, &adapter->reset_task); + mutex_unlock(&netdev->lock); return; } @@ -2995,6 +3043,7 @@ static void iavf_reset_task(struct work_struct *work) reg_val); iavf_disable_vf(adapter); mutex_unlock(&adapter->crit_lock); + mutex_unlock(&netdev->lock); return; /* Do not attempt to reinit. It's dead, Jim. */ } @@ -3124,6 +3173,8 @@ continue_reset: iavf_up_complete(adapter); iavf_irq_enable(adapter, true); + + iavf_reconfig_qs_bw(adapter); } else { iavf_change_state(adapter, __IAVF_DOWN); wake_up(&adapter->down_waitqueue); @@ -3133,6 +3184,7 @@ continue_reset: wake_up(&adapter->reset_waitqueue); mutex_unlock(&adapter->crit_lock); + mutex_unlock(&netdev->lock); return; reset_err: @@ -3143,6 +3195,7 @@ reset_err: iavf_disable_vf(adapter); mutex_unlock(&adapter->crit_lock); + mutex_unlock(&netdev->lock); dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n"); } @@ -3614,8 +3667,10 @@ exit: if (test_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section)) return 0; + mutex_lock(&netdev->lock); netif_set_real_num_rx_queues(netdev, total_qps); netif_set_real_num_tx_queues(netdev, total_qps); + mutex_unlock(&netdev->lock); return ret; } @@ -4893,6 +4948,98 @@ static netdev_features_t iavf_fix_features(struct net_device *netdev, return iavf_fix_strip_features(adapter, features); } +static int +iavf_verify_shaper(struct net_shaper_binding *binding, + const struct net_shaper *shaper, + struct netlink_ext_ack *extack) +{ + struct iavf_adapter *adapter = netdev_priv(binding->netdev); + u64 vf_max; + + if (shaper->handle.scope == NET_SHAPER_SCOPE_QUEUE) { + vf_max = adapter->qos_caps->cap[0].shaper.peak; + if (vf_max && shaper->bw_max > vf_max) { + NL_SET_ERR_MSG_FMT(extack, "Max rate (%llu) of queue %d can't exceed max TX rate of VF (%llu kbps)", + shaper->bw_max, shaper->handle.id, + vf_max); + return -EINVAL; + } + } + return 0; +} + +static int +iavf_shaper_set(struct net_shaper_binding *binding, + const struct net_shaper *shaper, + struct netlink_ext_ack *extack) +{ + struct iavf_adapter *adapter = netdev_priv(binding->netdev); + const struct net_shaper_handle *handle = &shaper->handle; + struct iavf_ring *tx_ring; + int ret = 0; + + mutex_lock(&adapter->crit_lock); + if (handle->id >= adapter->num_active_queues) + goto unlock; + + ret = iavf_verify_shaper(binding, shaper, extack); + if (ret) + goto unlock; + + tx_ring = &adapter->tx_rings[handle->id]; + + tx_ring->q_shaper.bw_min = div_u64(shaper->bw_min, 1000); + tx_ring->q_shaper.bw_max = div_u64(shaper->bw_max, 1000); + tx_ring->q_shaper_update = true; + + adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW; + +unlock: + mutex_unlock(&adapter->crit_lock); + return ret; +} + +static int iavf_shaper_del(struct net_shaper_binding *binding, + const struct net_shaper_handle *handle, + struct netlink_ext_ack *extack) +{ + struct iavf_adapter *adapter = netdev_priv(binding->netdev); + struct iavf_ring *tx_ring; + + mutex_lock(&adapter->crit_lock); + if (handle->id >= adapter->num_active_queues) + goto unlock; + + tx_ring = &adapter->tx_rings[handle->id]; + tx_ring->q_shaper.bw_min = 0; + tx_ring->q_shaper.bw_max = 0; + tx_ring->q_shaper_update = true; + + adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW; + +unlock: + mutex_unlock(&adapter->crit_lock); + return 0; +} + +static void iavf_shaper_cap(struct net_shaper_binding *binding, + enum net_shaper_scope scope, + unsigned long *flags) +{ + if (scope != NET_SHAPER_SCOPE_QUEUE) + return; + + *flags = BIT(NET_SHAPER_A_CAPS_SUPPORT_BW_MIN) | + BIT(NET_SHAPER_A_CAPS_SUPPORT_BW_MAX) | + BIT(NET_SHAPER_A_CAPS_SUPPORT_METRIC_BPS); +} + +static const struct net_shaper_ops iavf_shaper_ops = { + .set = iavf_shaper_set, + .delete = iavf_shaper_del, + .capabilities = iavf_shaper_cap, +}; + static const struct net_device_ops iavf_netdev_ops = { .ndo_open = iavf_open, .ndo_stop = iavf_close, @@ -4908,6 +5055,7 @@ static const struct net_device_ops iavf_netdev_ops = { .ndo_fix_features = iavf_fix_features, .ndo_set_features = iavf_set_features, .ndo_setup_tc = iavf_setup_tc, + .net_shaper_ops = &iavf_shaper_ops, }; /** @@ -5054,7 +5202,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) struct net_device *netdev; struct iavf_adapter *adapter = NULL; struct iavf_hw *hw = NULL; - int err; + int err, len; err = pci_enable_device(pdev); if (err) @@ -5122,6 +5270,13 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) hw->bus.func = PCI_FUNC(pdev->devfn); hw->bus.bus_id = pdev->bus->number; + len = struct_size(adapter->qos_caps, cap, IAVF_MAX_QOS_TC_NUM); + adapter->qos_caps = kzalloc(len, GFP_KERNEL); + if (!adapter->qos_caps) { + err = -ENOMEM; + goto err_alloc_qos_cap; + } + /* set up the locks for the AQ, do this only once in probe * and destroy them only once in remove */ @@ -5160,6 +5315,8 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) /* Initialization goes on in the work. Do not add more of it below. */ return 0; +err_alloc_qos_cap: + iounmap(hw->hw_addr); err_ioremap: destroy_workqueue(adapter->wq); err_alloc_wq: |