path: root/drivers/net/ethernet
Age  Commit message  Author
2024-11-13  ice: only allow Tx promiscuous for multicast  Brett Creeley
Currently when any VF is trusted and true promiscuous mode is enabled on the PF, the VF will receive all unicast traffic directed to the device's internal switch. This includes traffic external to the NIC and also from other VSIs (i.e. VFs). This does not match the expected behavior, as unicast traffic should only be visible from external sources in this case. Disable the Tx promiscuous mode bits for unicast promiscuous mode. Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
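Editor's sketch (not the actual ice code; the mask bits below are hypothetical stand-ins for the driver's own promiscuous-mode definitions) of the described behavior, keeping Rx unicast promiscuous while clearing the Tx side:

    /* Hypothetical promiscuous-mode mask bits, for illustration only. */
    #define PROMISC_UCAST_RX  0x1u
    #define PROMISC_UCAST_TX  0x2u
    #define PROMISC_MCAST_RX  0x4u
    #define PROMISC_MCAST_TX  0x8u

    /* Unicast promiscuous mode for a trusted VF: keep the Rx bits so the VF
     * still sees unicast arriving from external sources, but clear the Tx
     * bit so it does not see unicast sent by other VSIs through the
     * internal switch.
     */
    static unsigned int vf_ucast_promisc_mask(unsigned int requested)
    {
        return requested & ~PROMISC_UCAST_TX;
    }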
2024-11-13  ice: Add support for persistent NAPI config  Joe Damato
Use netif_napi_add_config to assign persistent per-NAPI config when initializing NAPIs. This preserves NAPI config settings when queue counts are adjusted. Tested with an E810-2CQDA2 NIC. Begin by setting the queue count to 4: $ sudo ethtool -L eth4 combined 4 Check the queue settings: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now, set the queue with NAPI ID 8451 to have a gro-flush-timeout of 1111: $ sudo ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/netdev.yaml \ --do napi-set --json='{"id": 8451, "gro-flush-timeout": 1111}' None Check that worked: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now reduce the queue count to 2, which would destroy the queue with NAPI ID 8451: $ sudo ethtool -L eth4 combined 2 Check the queue settings, noting that NAPI ID 8451 is gone: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now, increase the number of queues back to 4: $ sudo ethtool -L eth4 combined 4 Dump the settings, expecting to see the same NAPI IDs as above and for NAPI ID 8451 to have its gro-flush-timeout set to 1111: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
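Editor's sketch of the driver-side change, assuming a hypothetical per-vector structure; netif_napi_add_config() ties a NAPI instance to a stable index so per-NAPI settings stored by the core survive queue-count changes:

    #include <linux/netdevice.h>

    /* Hypothetical per-vector structure, for illustration only. */
    struct my_q_vector {
        struct napi_struct napi;
        int v_idx;
    };

    static int my_poll(struct napi_struct *napi, int budget)
    {
        napi_complete_done(napi, 0);   /* stand-in poll handler */
        return 0;
    }

    static void my_napi_init(struct net_device *netdev, struct my_q_vector *q)
    {
        /* Instead of netif_napi_add(), register with a persistent config
         * slot keyed by the vector index, so settings such as
         * gro-flush-timeout survive "ethtool -L" channel changes.
         */
        netif_napi_add_config(netdev, &q->napi, my_poll, q->v_idx);
    }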
2024-11-13  ice: support optional flags in signature segment header  Przemek Kitszel
An optional flag field has been added to the signature segment header. The field contains two flags, a "valid" bit, and a "last segment" bit that indicates whether the segment is the last segment that will be sent to firmware. If the flag field's valid bit is NOT set, then as was done before, assume that this is the last segment being downloaded. However, if the flag field's valid bit IS set, then use the last segment flag to determine if this segment is the last segment to download. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
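In code form, the decision logic reads roughly as follows (flag names are illustrative, only the semantics above are assumed):

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical flag bits in the signature segment header. */
    #define SEG_FLAGS_VALID     (1u << 0)
    #define SEG_FLAGS_LAST_SEG  (1u << 1)

    static bool segment_is_last(uint32_t flags)
    {
        /* Valid bit not set: fall back to the old behaviour and treat the
         * segment being downloaded as the last one.
         */
        if (!(flags & SEG_FLAGS_VALID))
            return true;

        /* Valid bit set: trust the explicit "last segment" flag. */
        return (flags & SEG_FLAGS_LAST_SEG) != 0;
    }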
2024-11-13  ice: refactor "last" segment of DDP pkg  Przemek Kitszel
Add ice_ddp_send_hunk() that buffers "sent FW hunk" calls to the AQ in order to mark the "last" one in a more elegant way. The next commit will add an even more complicated "sent FW" flow, so it's better to untangle things a bit beforehand. Note that metadata buffers were not skipped for NOT-@indicate_last segments; this is fixed now. Minor: + use ice_is_buffer_metadata() instead of open coding it in ice_dwnld_cfg_bufs(); + ice_dwnld_cfg_bufs_no_lock() + dependencies were moved up a bit to give a better git diff, as this function was rewritten (in terms of git blame) CC: Paul Greenwalt <paul.greenwalt@intel.com> CC: Dan Nowlin <dan.nowlin@intel.com> CC: Ahmed Zaki <ahmed.zaki@intel.com> CC: Simon Horman <horms@kernel.org> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
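A generic sketch of the buffering pattern described here (hold each hunk back until the next one arrives, so the final call can be flagged as last); names are illustrative and hw_send() merely stands in for the firmware download call:

    #include <stdbool.h>
    #include <stddef.h>

    struct hunk_ctx {
        const void *pending;    /* previously queued hunk, not yet sent */
        size_t pending_len;
    };

    static void hw_send(const void *buf, size_t len, bool last)
    {
        /* Stand-in for the actual firmware (AQ) download call. */
        (void)buf; (void)len; (void)last;
    }

    /* Queue one hunk; the previous one can now safely go out as "not last". */
    static void send_hunk(struct hunk_ctx *ctx, const void *buf, size_t len)
    {
        if (ctx->pending)
            hw_send(ctx->pending, ctx->pending_len, false);
        ctx->pending = buf;
        ctx->pending_len = len;
    }

    /* Flush the final buffered hunk and mark it as the last one. */
    static void send_hunk_done(struct hunk_ctx *ctx)
    {
        if (ctx->pending)
            hw_send(ctx->pending, ctx->pending_len, true);
        ctx->pending = NULL;
    }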
2024-11-13  ice: extend dump serdes equalizer values feature  Mateusz Polchlopek
Extend the work done in commit 70838938e89c ("ice: Implement driver functionality to dump serdes equalizer values") by adding a new set of Rx registers that can be read using the command: $ ethtool -d interface_name Rx equalization parameters are E810 PHY registers used by the end user to gather information about configuration and status to debug link and connection issues in the field. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  ice: rework of dump serdes equalizer values feature  Mateusz Polchlopek
Refactor function ice_get_tx_rx_equa() to iterate over a new table of params instead of making multiple calls to ice_aq_get_phy_equalization(). A subsequent commit will extend that function by adding more serdes equalizer values to dump. Shorten the fields of struct ice_serdes_equalization_to_ethtool for readability purposes. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  octeontx2-pf: Adds TC offload support  Geetha sowjanya
Implements tc offload support for rvu representors. Usage example: - Add a tc rule to drop packets with vlan id 3 using the port representor (Rpf1vf0). # tc filter add dev Rpf1vf0 protocol 802.1Q parent ffff: flower vlan_id 3 vlan_ethtype ipv4 skip_sw action drop - Redirect packets with vlan id 5 and vlan ethtype IPv4 to eth1, after stripping the vlan header. # tc filter add dev Rpf1vf0 ingress protocol 802.1Q flower vlan_id 5 vlan_ethtype ipv4 skip_sw action vlan pop action mirred ingress redirect dev eth1 Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13  octeontx2-pf: Implement offload stats ndo for representors  Geetha sowjanya
Implement the offload stats ndo by fetching the HW stats of the rx/tx queues attached to the representor. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13  octeontx2-pf: Add devlink port support  Geetha sowjanya
Register devlink ports for the rvu representors. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13  octeontx2-pf: Add representors for sdp MAC  Geetha sowjanya
Hardware supports different types of MACs, e.g. RPM, SDP and LBK. LBK is the internal Tx->Rx HW loopback path. The RPM and SDP MACs support ingress/egress pkt IO on interfaces with different sets of capabilities, such as interface modes. At the time of netdev driver registration, the PF seeks MAC-related information from the Admin Function driver 'drivers/net/ethernet/marvell/octeontx2/af' and sets up ingress/egress queues etc. such that pkt IO on the channels of these different MACs is possible. This patch adds representors for the SDP MAC. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13  octeontx2-pf: Configure VF mtu via representor  Geetha sowjanya
Adds support to manage the MTU configuration of a VF through its representor. On update of the representor MTU, an mbox notification is sent to the VF to update its MTU. This feature is implemented based on the "Network Function Representors" kernel documentation: "Setting an MTU on the representor should cause that same MTU to be reported to the representee." Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
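A hedged sketch of how such an ndo_change_mtu might propagate the new MTU to the representee; rvu_rep_notify_mtu() is a hypothetical mailbox helper, not the actual Marvell API:

    #include <linux/netdevice.h>

    /* Hypothetical mailbox notification to the represented VF. */
    static int rvu_rep_notify_mtu(struct net_device *rep_dev, int new_mtu)
    {
        return 0;
    }

    static int rvu_rep_change_mtu(struct net_device *dev, int new_mtu)
    {
        int err;

        /* Tell the representee (VF) to adopt the same MTU, per the
         * "Network Function Representors" documentation quoted above.
         */
        err = rvu_rep_notify_mtu(dev, new_mtu);
        if (err)
            return err;

        WRITE_ONCE(dev->mtu, new_mtu);
        return 0;
    }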
2024-11-13  octeontx2-pf: Add support to sync link state between representor and VFs  Geetha sowjanya
Implements the below requirement from the representors documentation: "The representee's link state is controlled through the representor. Setting the representor administratively UP or DOWN should cause carrier ON or OFF at the representee." This patch enables - Reflecting the link state of the representor based on the VF state, and the link state of the VF based on the representor. - On VF interface up/down, a notification is sent via mbox to the representor to update its link state. e.g.: ip link set eth0 up/down will toggle carrier on/off of the corresponding representor (r0p1) interface. - Setting the representor interface up/down will update the link state of the VF. e.g.: ip link set r0p1 up/down will toggle carrier on/off of the corresponding representee (eth0) interface. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13  octeontx2-pf: Get VF stats via representor  Geetha sowjanya
Adds support to export VF port statistics via the representor netdev. Defines a new mbox "NIX_LF_STATS" to fetch VF hw stats. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13  octeontx2-af: Add packet path between representor and VF  Geetha sowjanya
Current HW does not support an in-built switch to forward pkts between representee and representor. When the representor is put under a bridge and pkts need to be sent to the representee, pkts from the representor are sent on a HW internal loopback channel, which is again punted to the ingress pkt parser. This patch installs MCAM filters/rules that match against these pkts and forward them to the representee, providing a basic representor <=> representee path similar to Tun/TAP between a VM and the host. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13  octeontx2-pf: Add basic net_device_ops  Geetha sowjanya
Implements a basic set of net_device_ops. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
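As a rough illustration of what a "basic set of net_device_ops" typically covers for a representor (names are placeholders, not the actual otx2 symbols):

    #include <linux/netdevice.h>

    static int rep_open(struct net_device *dev)
    {
        netif_carrier_on(dev);
        netif_tx_start_all_queues(dev);
        return 0;
    }

    static int rep_stop(struct net_device *dev)
    {
        netif_tx_stop_all_queues(dev);
        netif_carrier_off(dev);
        return 0;
    }

    static netdev_tx_t rep_xmit(struct sk_buff *skb, struct net_device *dev)
    {
        /* A real driver queues the skb to the HW queue backing the
         * representor; the sketch just consumes it.
         */
        dev_kfree_skb_any(skb);
        return NETDEV_TX_OK;
    }

    static const struct net_device_ops rep_netdev_ops = {
        .ndo_open       = rep_open,
        .ndo_stop       = rep_stop,
        .ndo_start_xmit = rep_xmit,
    };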
2024-11-13  octeontx2-pf: Create representor netdev  Geetha sowjanya
Adds initial devlink support to set/get the switchdev mode. Representor netdevs are created for each rvu device when the switch mode is set to 'switchdev'. These netdevs are used to control and configure VFs. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13  octeontx2-pf: RVU representor driver  Geetha sowjanya
Adds a basic driver for the RVU representor. On probe, the driver performs PCI-specific initialization and configures HW resources. Introduces the RVU_ESWITCH kernel config to enable/disable the driver. The representor and NIC share code, but representor netdevs support only a subset of the NIC functionality. Hence the "otx2_rep_dev" API helps skip initialization of the features that are not supported by the representors. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-12  eth: bnxt: use page pool for head frags  Jakub Kicinski
Testing small size RPCs (300B-400B) on a large AMD system suggests that page pool recycling is very useful even for just the head frags. With this patch (and copy break disabled) I see a 30% performance improvement (82Gbps -> 106Gbps). Convert bnxt from normal page frags to page pool frags for head buffers. On systems with small page size we can use the same pool as for TPA pages. On systems with large pages the frag allocation logic of the page pool is already used to split a large page into TPA chunks. TPA chunks are much larger than heads (8k or 64k, AFAICT vs 1kB) and we always allocate the same sized chunks. Mixing allocation of TPA and head pages would lead to sub-optimal memory use. Plus Taehee's work on zero-copy / devmem will need to differentiate between TPA and non-TPA page pool, anyway. Conditionally allocate a new page pool for heads. Link: https://patch.msgid.link/20241109035119.3391864-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
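A hedged sketch of allocating head buffers from a page pool instead of plain page frags; the ring structure and sizes are illustrative, and the page_pool_create()/page_pool_dev_alloc_frag() calls are assumed from the generic page pool API rather than taken from the bnxt patch:

    #include <net/page_pool/helpers.h>
    #include <linux/err.h>

    struct my_rx_ring {
        struct page_pool *head_pool;
    };

    static int my_alloc_head_pool(struct my_rx_ring *rxr, struct device *dev)
    {
        struct page_pool_params pp = {
            .pool_size = 1024,          /* illustrative ring size */
            .nid       = NUMA_NO_NODE,
            .dev       = dev,
        };

        rxr->head_pool = page_pool_create(&pp);
        return IS_ERR(rxr->head_pool) ? PTR_ERR(rxr->head_pool) : 0;
    }

    static void *my_alloc_head_frag(struct my_rx_ring *rxr, unsigned int size)
    {
        unsigned int offset;
        struct page *page;

        /* Sub-page fragment from the pool; recycled on completion. */
        page = page_pool_dev_alloc_frag(rxr->head_pool, &offset, size);
        return page ? page_address(page) + offset : NULL;
    }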
2024-11-12  Revert "igb: Disable threaded IRQ for igb_msix_other"  Wander Lairson Costa
This reverts commit 338c4d3902feb5be49bfda530a72c7ab860e2c9f. Sebastian noticed the ISR indirectly acquires spin_locks, which are sleeping locks under PREEMPT_RT, which leads to kernel splats. Fixes: 338c4d3902feb ("igb: Disable threaded IRQ for igb_msix_other") Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20241106111427.7272-1-wander@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-12  RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design  Bhargava Chenna Marreddy
Refine the RoCE SRIOV resource configuration design, using the INITIALIZE_FW flag as an indication of the new design to the firmware. The RoCE driver does not have to provision resources to VFs when the firmware advertises support for RoCE resource management by the NIC driver. Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> CC: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730882676-24434-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-12  bnxt_en: Add support for RoCE sriov configuration  Vikas Gupta
During driver load, the PF RDMA driver provisions resources to the RDMA VFs. This logic takes into consideration the total number of VFs supported on the PF while allocating resources. Firmware now advertises a capability where the NIC driver can allocate resources for RDMA VFs when the user actually creates a VF, so the resource distribution can be based on the number of active VFs. This patch adds support to check for the firmware capability and follow the new RDMA VF resource allocation strategy. The current logic in the RDMA driver will be removed for newer firmware versions in a subsequent patch in this series. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730882676-24434-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-11  net/mlx5e: SHAMPO, Rework header allocation loop  Dragos Tatulea
The current loop code was based on the assumption that there can be page leftovers from previous function calls. This patch changes the allocation loop to make it clearer how pages get allocated every MLX5E_SHAMPO_WQ_HEADER_PER_PAGE headers. This change has no functional implications. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-13-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5e: SHAMPO, Drop info array  Dragos Tatulea
The info array is used to store a pointer to the dma address of the header and to the frag page. However, this array is not really required: - The frag page can be calculated from the header index: frag page index = header index / headers per page. - The dma address can be calculated through a formula: dma page address + header offset. This series gets rid of the info array and uses the above formulas instead. The current_page_index was used in conjunction with the info array to store page fragment indices. This variable is dropped as well. There was no performance regression observed. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
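In code form, the two formulas above look roughly like this (the constants are illustrative placeholders for the driver's real header-per-page and header-size values):

    #include <linux/types.h>

    #define HEADERS_PER_PAGE   32     /* illustrative value */
    #define HEADER_SIZE        128    /* illustrative value */

    /* frag page index = header index / headers per page */
    static inline u32 shampo_hdr_to_page_index(u32 header_index)
    {
        return header_index / HEADERS_PER_PAGE;
    }

    /* dma address = dma page address + header offset */
    static inline dma_addr_t shampo_hdr_dma(dma_addr_t page_dma, u32 header_index)
    {
        u32 offset_in_page = (header_index % HEADERS_PER_PAGE) * HEADER_SIZE;

        return page_dma + offset_in_page;
    }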
2024-11-11  net/mlx5e: SHAMPO, Change frag page setup order during allocation  Dragos Tatulea
Now that the UMR allocation has been simplified, it is no longer possible to have a leftover page from a previous call to mlx5e_build_shampo_hd_umr(). This patch simplifies the code by switching the order of operations: first take the frag page and then increment the index. This is more straightforward and it also paves the way for dropping the info array. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5e: SHAMPO, Fix page_index calculation inconsistency  Dragos Tatulea
When calculating the index for the next frag page slot, the divisor is incorrect: it should be the number of pages per queue not the number of headers per queue. This is currently harmless because frag pages are not used directly, but they are intermediated through the info array. But it needs to be fixed as an upcoming patch will get rid of the info array. This patch introduces a new pages per queue variable and plugs it in the formula. Now that this variable exists, additional code can be simplified in the SHAMPO initialization code. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-10-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5e: SHAMPO, Simplify UMR allocation for headers  Dragos Tatulea
Allocating page fragments for header data split is currently more complicated than it should be. That's because the number of KSM entries allocated is not aligned to the number of headers per page. This leads to having leftovers in the next allocation which require additional accounting and needlessly complicated code. This patch aligns (down) the number of KSM entries in the UMR WQE to the number of headers per page by: 1) Aligning the max number of entries allocated per UMR WQE (max_ksm_entries) to MLX5E_SHAMPO_WQ_HEADER_PER_PAGE. 2) Aligning the total number of free headers to MLX5E_SHAMPO_WQ_HEADER_PER_PAGE. ... and then it drops the extra accounting code from mlx5e_build_shampo_hd_umr(). Although the number of entries allocated per UMR WQE is slightly smaller due to aligning down, no performance impact was observed. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
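A small sketch of the alignment step using the kernel's ALIGN_DOWN() macro; the entry counts and the HEADERS_PER_PAGE constant are placeholders for the real MLX5E_SHAMPO_WQ_HEADER_PER_PAGE value:

    #include <linux/align.h>
    #include <linux/minmax.h>

    #define HEADERS_PER_PAGE 32   /* stands in for MLX5E_SHAMPO_WQ_HEADER_PER_PAGE */

    static unsigned int ksm_entries_for_umr(unsigned int max_ksm_entries,
                                            unsigned int free_headers)
    {
        /* Align both limits down to a whole number of header pages so a
         * UMR WQE never ends mid-page and no leftover accounting is needed.
         */
        max_ksm_entries = ALIGN_DOWN(max_ksm_entries, HEADERS_PER_PAGE);
        free_headers = ALIGN_DOWN(free_headers, HEADERS_PER_PAGE);

        return min(max_ksm_entries, free_headers);
    }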
2024-11-11  net/mlx5: Make vport QoS enablement more flexible for future extensions  Carolina Jubran
Refactor esw_qos_vport_enable to support more generic configurations, allowing it to be reused for new vport node types in future patches. This refactor includes a new way to change the vport parent node by disabling the current setup and re-enabling it with the new parent. This change sets the foundation for adapting configuration based on the parent type in future patches. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5: Integrate esw_qos_vport_enable logic into rate operations  Carolina Jubran
Fold the esw_qos_vport_enable function into operations for configuring maximum and minimum rates, simplifying QoS logic. This change consolidates enabling and updating the scheduling element configuration, streamlining how vport QoS is initialized and adjusted. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5: Generalize scheduling element operations  Carolina Jubran
Introduce helper functions to create and destroy scheduling elements, allowing flexible configuration for different scheduling element types. The new helper functions streamline the process by centralizing error handling and logging through esw_qos_sched_elem_op_warn, which now accepts the operation type (create, destroy, or modify). The changes also adjust the esw_qos_vport_enable and mlx5_esw_qos_vport_disable functions to leverage the new generalized create/destroy helpers. The destroy functions now log errors with esw_warn without returning them. This prevents unnecessary error handling since the node was already destroyed and no further action is required from callers. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5: Refactor scheduling element configuration bitmasks  Carolina Jubran
Refactor esw_qos_sched_elem_config to set bitmasks only when max_rate or bw_share values change, allowing the function to configure nodes with only one of these parameters. This enables more flexible usage for nodes where only one parameter requires configuration. Remove scattered assignments and checks to centralize them within this function, removing the now redundant esw_qos_set_node_max_rate entirely. With this refactor, also remove the assignment of the vport scheduling node max rate to the parent max rate for unlimited vports (where max rate is set to zero), as firmware already handles this behavior. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5: Generalize max_rate and min_rate setting for nodes  Carolina Jubran
Refactor max_rate and min_rate setting functions to operate on mlx5_esw_sched_node, allowing for generalized handling of both vports and nodes. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5: Simplify QoS normalization by removing error handling  Carolina Jubran
This change updates esw_qos_normalize_min_rate to not return errors, significantly simplifying the code. Normalization failures are software bugs, and it's unnecessary to handle them with rollback mechanisms. Instead, `esw_qos_update_sched_node_bw_share` and `esw_qos_normalize_min_rate` now return void, with any errors logged as warnings to indicate potential software issues. This approach avoids compensating for hidden bugs and removes error handling from all places that perform normalization, streamlining future patches. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5: E-switch, refactor eswitch mode change  Patrisious Haddad
The E-switch mode was previously updated before removing and re-adding the IB device, which could cause a temporary mismatch between the E-switch mode and the IB device configuration. To prevent this discrepancy, the IB device is now removed first, then the E-switch mode is updated, and finally, the IB device is re-added. This sequence ensures consistent alignment between the E-switch mode and the IB device whenever the mode changes, regardless of the new mode value. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107194357.683732-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5e: Disable loopback self-test on multi-PF netdev  Carolina Jubran
In Multi-PF (Socket Direct) configurations, when a loopback packet is sent through one of the secondary devices, it will always be received on the primary device. This causes the loopback layer to fail in identifying the loopback packet as the devices are different. To avoid false test failures, disable the loopback self-test in Multi-PF configurations. Fixes: ed29705e4ed1 ("net/mlx5: Enable SD feature") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5e: CT: Fix null-ptr-deref in add rule err flow  Moshe Shemesh
In the error flow of mlx5_tc_ct_entry_add_rule(), in case the ct_rule_add() callback returns an error, zone_rule->attr is used uninitialized. Fix it to use attr, which has the needed pointer value. Kernel log: BUG: kernel NULL pointer dereference, address: 0000000000000110 RIP: 0010:mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core] … Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x150/0x3e0 ? exc_page_fault+0x74/0x140 ? asm_exc_page_fault+0x22/0x30 ? mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core] ? mlx5_tc_ct_entry_add_rule+0x1d5/0x2f0 [mlx5_core] mlx5_tc_ct_block_flow_offload+0xc6a/0xf90 [mlx5_core] ? nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table] nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table] flow_offload_work_handler+0x142/0x320 [nf_flow_table] ? finish_task_switch.isra.0+0x15b/0x2b0 process_one_work+0x16c/0x320 worker_thread+0x28c/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xb8/0xf0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Fixes: 7fac5c2eced3 ("net/mlx5: CT: Avoid reusing modify header context for natted entries") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5e: clear xdp features on non-uplink representors  William Tu
Non-uplink representor ports do not support XDP. The patch clears the xdp features by checking whether net_device_ops.ndo_bpf is set. Verify using the netlink tool: $ tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml --dump dev-get Representor netdev before the patch: {'ifindex': 8, 'xdp-features': {'basic', 'ndo-xmit', 'ndo-xmit-sg', 'redirect', 'rx-sg', 'xsk-zerocopy'}, 'xdp-rx-metadata-features': set(), 'xdp-zc-max-segs': 1, 'xsk-features': set()}, With the patch: {'ifindex': 8, 'xdp-features': set(), 'xdp-rx-metadata-features': set(), 'xsk-features': set()}, Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag") Signed-off-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
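A hedged sketch of the check; xdp_clear_features_flag() is assumed from the generic netdev XDP-features infrastructure, and the wrapper name is made up for illustration:

    #include <linux/netdevice.h>
    #include <net/xdp.h>

    static void rep_fixup_xdp_features(struct net_device *netdev)
    {
        /* Representors that do not implement ndo_bpf cannot attach XDP
         * programs, so do not advertise any xdp-features for them.
         */
        if (!netdev->netdev_ops->ndo_bpf)
            xdp_clear_features_flag(netdev);
    }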
2024-11-11  net/mlx5e: kTLS, Fix incorrect page refcounting  Dragos Tatulea
The kTLS tx handling code is using a mix of get_page() and page_ref_inc() APIs to increment the page reference. But on the release path (mlx5e_ktls_tx_handle_resync_dump_comp()), only put_page() is used. This is an issue when using pages from large folios: the get_page() references are stored on the folio page while the page_ref_inc() references are stored directly in the given page. On release the folio page will be dereferenced too many times. This was found while doing kTLS testing with sendfile() + ZC when the served file was read from NFS on a kernel with NFS large folios support (commit 49b29a573da8 ("nfs: add support for large folios")). Fixes: 84d1bb2b139e ("net/mlx5e: kTLS, Limit DUMP wqe size") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5: fs, lock FTE when checking if active  Mark Bloch
The referenced commits introduced a two-step process for deleting FTEs: - Lock the FTE, delete it from hardware, set the hardware deletion function to NULL and unlock the FTE. - Lock the parent flow group, delete the software copy of the FTE, and remove it from the xarray. However, this approach encounters a race condition if a rule with the same match value is added simultaneously. In this scenario, fs_core may set the hardware deletion function to NULL prematurely, causing a panic during subsequent rule deletions. To prevent this, ensure the active flag of the FTE is checked under a lock, which will prevent the fs_core layer from attaching a new steering rule to an FTE that is in the process of deletion. [ 438.967589] MOSHE: 2496 mlx5_del_flow_rules del_hw_func [ 438.968205] ------------[ cut here ]------------ [ 438.968654] refcount_t: decrement hit 0; leaking memory. [ 438.969249] WARNING: CPU: 0 PID: 8957 at lib/refcount.c:31 refcount_warn_saturate+0xfb/0x110 [ 438.970054] Modules linked in: act_mirred cls_flower act_gact sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core zram zsmalloc fuse [last unloaded: cls_flower] [ 438.973288] CPU: 0 UID: 0 PID: 8957 Comm: tc Not tainted 6.12.0-rc1+ #8 [ 438.973888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 438.974874] RIP: 0010:refcount_warn_saturate+0xfb/0x110 [ 438.975363] Code: 40 66 3b 82 c6 05 16 e9 4d 01 01 e8 1f 7c a0 ff 0f 0b c3 cc cc cc cc 48 c7 c7 10 66 3b 82 c6 05 fd e8 4d 01 01 e8 05 7c a0 ff <0f> 0b c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 [ 438.976947] RSP: 0018:ffff888124a53610 EFLAGS: 00010286 [ 438.977446] RAX: 0000000000000000 RBX: ffff888119d56de0 RCX: 0000000000000000 [ 438.978090] RDX: ffff88852c828700 RSI: ffff88852c81b3c0 RDI: ffff88852c81b3c0 [ 438.978721] RBP: ffff888120fa0e88 R08: 0000000000000000 R09: ffff888124a534b0 [ 438.979353] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888119d56de0 [ 438.979979] R13: ffff888120fa0ec0 R14: ffff888120fa0ee8 R15: ffff888119d56de0 [ 438.980607] FS: 00007fe6dcc0f800(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000 [ 438.983984] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 438.984544] CR2: 00000000004275e0 CR3: 0000000186982001 CR4: 0000000000372eb0 [ 438.985205] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 438.985842] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 438.986507] Call Trace: [ 438.986799] <TASK> [ 438.987070] ? __warn+0x7d/0x110 [ 438.987426] ? refcount_warn_saturate+0xfb/0x110 [ 438.987877] ? report_bug+0x17d/0x190 [ 438.988261] ? prb_read_valid+0x17/0x20 [ 438.988659] ? handle_bug+0x53/0x90 [ 438.989054] ? exc_invalid_op+0x14/0x70 [ 438.989458] ? asm_exc_invalid_op+0x16/0x20 [ 438.989883] ? refcount_warn_saturate+0xfb/0x110 [ 438.990348] mlx5_del_flow_rules+0x2f7/0x340 [mlx5_core] [ 438.990932] __mlx5_eswitch_del_rule+0x49/0x170 [mlx5_core] [ 438.991519] ? mlx5_lag_is_sriov+0x3c/0x50 [mlx5_core] [ 438.992054] ? 
xas_load+0x9/0xb0 [ 438.992407] mlx5e_tc_rule_unoffload+0x45/0xe0 [mlx5_core] [ 438.993037] mlx5e_tc_del_fdb_flow+0x2a6/0x2e0 [mlx5_core] [ 438.993623] mlx5e_flow_put+0x29/0x60 [mlx5_core] [ 438.994161] mlx5e_delete_flower+0x261/0x390 [mlx5_core] [ 438.994728] tc_setup_cb_destroy+0xb9/0x190 [ 438.995150] fl_hw_destroy_filter+0x94/0xc0 [cls_flower] [ 438.995650] fl_change+0x11a4/0x13c0 [cls_flower] [ 438.996105] tc_new_tfilter+0x347/0xbc0 [ 438.996503] ? ___slab_alloc+0x70/0x8c0 [ 438.996929] rtnetlink_rcv_msg+0xf9/0x3e0 [ 438.997339] ? __netlink_sendskb+0x4c/0x70 [ 438.997751] ? netlink_unicast+0x286/0x2d0 [ 438.998171] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 438.998625] netlink_rcv_skb+0x54/0x100 [ 438.999020] netlink_unicast+0x203/0x2d0 [ 438.999421] netlink_sendmsg+0x1e4/0x420 [ 438.999820] __sock_sendmsg+0xa1/0xb0 [ 439.000203] ____sys_sendmsg+0x207/0x2a0 [ 439.000600] ? copy_msghdr_from_user+0x6d/0xa0 [ 439.001072] ___sys_sendmsg+0x80/0xc0 [ 439.001459] ? ___sys_recvmsg+0x8b/0xc0 [ 439.001848] ? generic_update_time+0x4d/0x60 [ 439.002282] __sys_sendmsg+0x51/0x90 [ 439.002658] do_syscall_64+0x50/0x110 [ 439.003040] entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: 718ce4d601db ("net/mlx5: Consolidate update FTE for all removal changes") Fixes: cefc23554fc2 ("net/mlx5: Fix FTE cleanup") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net/mlx5: Fix msix vectors to respect platform limit  Parav Pandit
The number of PCI vectors allocated by the platform (which may be fewer than requested) is currently not honored when creating the SF pool; only the PCI MSI-X capability is considered. As a result, when a platform allocates fewer vectors (in non-dynamic mode) than requested, the PF and SF pools end up with an invalid vector range. This causes incorrect SF vector accounting, which leads to the following call trace when an invalid IRQ vector is allocated. This issue is resolved by ensuring that the platform's vector limit is respected for both the SF and PF pools. Workqueue: mlx5_vhca_event0 mlx5_sf_dev_add_active_work [mlx5_core] RIP: 0010:pci_irq_vector+0x23/0x80 RSP: 0018:ffffabd5cebd7248 EFLAGS: 00010246 RAX: ffff980880e7f308 RBX: ffff9808932fb880 RCX: 0000000000000001 RDX: 00000000000001ff RSI: 0000000000000200 RDI: ffff980880e7f308 RBP: 0000000000000200 R08: 0000000000000010 R09: ffff97a9116f0860 R10: 0000000000000002 R11: 0000000000000228 R12: ffff980897cd0160 R13: 0000000000000000 R14: ffff97a920fec0c0 R15: ffffabd5cebd72d0 FS: 0000000000000000(0000) GS:ffff97c7ff9c0000(0000) knlGS:0000000000000000 ? rescuer_thread+0x350/0x350 kthread+0x11b/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22 mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22 mlx5_core.sf mlx5_core.sf.6: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced) mlx5_core.sf mlx5_core.sf.7: firmware version: 32.43.356 mlx5_core.sf mlx5_core.sf.6 enpa1s0f0s4: renamed from eth0 mlx5_core.sf mlx5_core.sf.7: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22 mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22 mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22 Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Amir Tzin <amirtz@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
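A generic sketch of the principle: size the IRQ pools from the vector count the platform actually granted (the return value of pci_alloc_irq_vectors()), not from the MSI-X capability maximum; function and variable names are illustrative:

    #include <linux/pci.h>

    static int my_alloc_irq_vectors(struct pci_dev *pdev, int wanted)
    {
        int got;

        /* May return fewer vectors than requested on some platforms. */
        got = pci_alloc_irq_vectors(pdev, 1, wanted, PCI_IRQ_MSIX);
        if (got < 0)
            return got;

        /* Size the PF/SF pools from 'got', never from
         * pci_msix_vec_count(pdev), so the pools stay within the range
         * of vectors that actually exist.
         */
        return got;
    }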
2024-11-11  net/mlx5: E-switch, unload IB representors when unloading ETH representors  Chiara Meiohas
IB representors depend on ETH representors, so the IB representors should not exist without the ETH ones. When unloading the ETH representors, the corresponding IB representors should also be unloaded. The commit 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") introduced the use of the ib_device_set_netdev API in IB representors. ib_device_set_netdev() increments the refcount of the representor's netdev when loading an IB representor and decrements it when unloading. Without the unloading of the IB representor, the refcount of the representor's netdev remains greater than 0, preventing it from being unregistered. The patch uncovered an underlying bug where the eth representor is unloaded without unloading the IB representor. This issue happened when using multiport E-switch and rebooting, causing the shutdown to hang when unloading the ETH representor because the refcount of the representor's netdevice was greater than 0. Call trace: unregister_netdevice: waiting for eth3 to become free. Usage count = 2 ref_tracker: eth%d@00000000661d60f7 has 1/1 users at ib_device_set_netdev+0x160/0x2d0 [ib_core] mlx5_ib_vport_rep_load+0x104/0x3f0 [mlx5_ib] mlx5_eswitch_reload_ib_reps+0xfc/0x110 [mlx5_core] mlx5_mpesw_work+0x236/0x330 [mlx5_core] process_one_work+0x169/0x320 worker_thread+0x288/0x3a0 kthread+0xb8/0xe0 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x11/0x20 Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  bnxt_en: add unlocked version of bnxt_refclk_read  Vadim Fedorenko
Serialization of the PHC read with the FW reset mechanism uses ptp_lock, which also protects timecounter updates. This means we cannot grab it when called from bnxt_cc_read(). Let's move the locking into a different function. Fixes: 6c0828d00f07 ("bnxt_en: replace PTP spinlock with seqlock") Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20241107214917.2980976-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
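The usual pattern for such a change, sketched generically (not the actual bnxt code): an __-prefixed lockless reader plus a locked wrapper, so callers that already hold the lock can use the former:

    #include <linux/spinlock.h>
    #include <linux/types.h>

    struct my_ptp {
        spinlock_t lock;     /* protects the PHC/timecounter state */
        u64 last_ns;
    };

    /* Caller must already hold ptp->lock (e.g. the timecounter read path). */
    static u64 __my_refclk_read(struct my_ptp *ptp)
    {
        return ptp->last_ns;   /* stands in for the real register read */
    }

    /* Standalone readers take the lock themselves. */
    static u64 my_refclk_read(struct my_ptp *ptp)
    {
        unsigned long flags;
        u64 ns;

        spin_lock_irqsave(&ptp->lock, flags);
        ns = __my_refclk_read(ptp);
        spin_unlock_irqrestore(&ptp->lock, flags);
        return ns;
    }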
2024-11-11  r8169: use helper r8169_mod_reg8_cond to simplify rtl_jumbo_config  Heiner Kallweit
Use recently added helper r8169_mod_reg8_cond() to simplify jumbo mode configuration. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/3df1d484-a02e-46e7-8f75-db5b428e422e@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net: stmmac: dwmac4: Receive Watchdog Timeout is not in abnormal interrupt summary  Ley Foon Tan
The Receive Watchdog Timeout (RWT, bit[9]) is not part of the Abnormal Interrupt Summary (AIS). Move the RWT handling out of the AIS condition statement. From the databook, the AIS is the logical OR of the following interrupt bits: - Bit 1: Transmit Process Stopped - Bit 7: Receive Buffer Unavailable - Bit 8: Receive Process Stopped - Bit 10: Early Transmit Interrupt - Bit 12: Fatal Bus Error - Bit 13: Context Descriptor Error Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241107063637.2122726-4-leyfoon.tan@starfivetech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
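Schematically, the fix moves the RWT check out from under the AIS branch; this is only a sketch, with bit positions taken from the commit message and the dwmac4 convention rather than the exact driver code:

    #include <linux/bits.h>
    #include <linux/io.h>
    #include <linux/types.h>

    static void my_dma_interrupt(void __iomem *ioaddr, u32 chan_status_off)
    {
        u32 status = readl(ioaddr + chan_status_off);

        if (status & BIT(9)) {      /* RWT: Receive Watchdog Timeout */
            /* Handle RWT on its own; it is not part of AIS. */
        }

        if (status & BIT(14)) {     /* AIS: Abnormal Interrupt Summary */
            /* AIS = TPS | RBU | RPS | ETI | FBE | CDE
             * (bits 1, 7, 8, 10, 12, 13).
             */
        }
    }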
2024-11-11  net: stmmac: dwmac4: Fix the MTL_OP_MODE_*_MASK operation  Ley Foon Tan
In order to mask off the bits, we need to use the '~' operator to invert all the bits of _MASK and clear them. Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241107063637.2122726-3-leyfoon.tan@starfivetech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
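The general read-modify-write pattern being fixed, sketched with a placeholder field: clearing must use the inverted mask before OR-ing in the new value:

    #include <linux/bits.h>
    #include <linux/io.h>

    #define MY_FIELD_MASK   GENMASK(6, 4)    /* placeholder field */

    static void set_my_field(void __iomem *reg, u32 val)
    {
        u32 v = readl(reg);

        v &= ~MY_FIELD_MASK;     /* clear the field: note the '~' */
        v |= val & MY_FIELD_MASK;
        writel(v, reg);
    }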
2024-11-11  net: stmmac: dwmac4: Fix MTL_OP_MODE_RTC mask and shift macros  Ley Foon Tan
RTC fields are located in bits [1:0]. Correct the _MASK and _SHIFT macros to use the appropriate mask and shift. Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241107063637.2122726-2-leyfoon.tan@starfivetech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
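For a field that lives in bits [1:0], the mask/shift pair and a FIELD_PREP-style use look like this generic sketch (not a claim about the exact stmmac macro values):

    #include <linux/bitfield.h>
    #include <linux/bits.h>
    #include <linux/types.h>

    #define RTC_MASK    GENMASK(1, 0)
    #define RTC_SHIFT   0

    static u32 encode_rtc(u32 reg, u32 rtc)
    {
        reg &= ~RTC_MASK;
        reg |= FIELD_PREP(RTC_MASK, rtc);  /* same as (rtc << RTC_SHIFT) & RTC_MASK */
        return reg;
    }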
2024-11-11  net: ti: icssg-prueth: Add VLAN support for HSR mode  Ravi Gunasekaran
Add support for VLAN addition/deletion in HSR mode. In HSR mode, even if the host port is not a member of the VLAN domain, the slave ports should simply forward the frames. So allow forwarding of all VLAN frames in HSR mode. Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20241106091710.3308519-4-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  net: atlantic: use irq_update_affinity_hint()  Mohammad Heib
irq_set_affinity_hint() is deprecated; use irq_update_affinity_hint() instead. This removes the side effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs; the core code already takes care of that. When the driver applies the affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. atlantic device reopening resets the affinity in aq_ndev_open(). 3. atlantic has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241107120739.415743-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
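A hedged before/after sketch of the API swap (the per-vector structure is hypothetical); irq_update_affinity_hint() only records the hint for irqbalance instead of also forcing the affinity:

    #include <linux/interrupt.h>
    #include <linux/cpumask.h>

    struct my_vector {
        unsigned int irq;
        cpumask_t affinity_mask;    /* must stay valid while the hint is set */
    };

    static void my_set_irq_hint(struct my_vector *v, int cpu)
    {
        cpumask_clear(&v->affinity_mask);
        cpumask_set_cpu(cpu, &v->affinity_mask);

        /* The deprecated irq_set_affinity_hint() would also apply the mask,
         * overriding irqbalance policy; the _update_ variant only records
         * the hint and leaves placement to the core / irqbalance.
         */
        irq_update_affinity_hint(v->irq, &v->affinity_mask);
    }

    static void my_clear_irq_hint(struct my_vector *v)
    {
        irq_update_affinity_hint(v->irq, NULL);
    }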
2024-11-11  nfp: use irq_update_affinity_hint()  Mohammad Heib
irq_set_affinity_hint() is deprecated; use irq_update_affinity_hint() instead. This removes the side effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs; the core code already takes care of that. When the driver applies the affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. nfp device reopening resets the affinity in nfp_net_netdev_open(). 3. nfp has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://patch.msgid.link/20241107115002.413358-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  bnxt_en: use irq_update_affinity_hint()  Mohammad Heib
irq_set_affinity_hint() is deprecated; use irq_update_affinity_hint() instead. This removes the side effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs; the core code already takes care of that. When the driver applies the affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. bnxt_en device reopening resets the affinity in bnxt_open(). 3. bnxt_en has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20241106180811.385175-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11  octeontx2-af: Knobs for NPC default rule counters  Linu Cherian
Add devlink knobs to enable/disable counters on NPC default rule entries. Sample command to enable default rule counters: devlink dev param set <dev> name npc_def_rule_cntr value true cmode runtime Sample command to read the counter: cat /sys/kernel/debug/cn10k/npc/mcam_rules Signed-off-by: Linu Cherian <lcherian@marvell.com> Link: https://patch.msgid.link/20241105125620.2114301-3-lcherian@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>