summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
AgeCommit message (Collapse)Author
7 daysMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in late fixes to prepare for the 6.10 net-next PR. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysnet/mlx5e: Modifying channels number and updating TX queuesCarolina Jubran
It is not appropriate for the mlx5e_num_channels_changed function to be called solely for updating the TX queues, even if the channels number has not been changed. Move the code responsible for updating the TC and TX queues from mlx5e_num_channels_changed and produce a new function called mlx5e_update_tc_and_tx_queues. This new function should only be called when the channels number remains unchanged. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240512124306.740898-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysnet/mlx5e: Fix netif state handlingShay Drory
mlx5e_suspend cleans resources only if netif_device_present() returns true. However, mlx5e_resume changes the state of netif, via mlx5e_nic_enable, only if reg_state == NETREG_REGISTERED. In the below case, the above leads to NULL-ptr Oops[1] and memory leaks: mlx5e_probe _mlx5e_resume mlx5e_attach_netdev mlx5e_nic_enable <-- netdev not reg, not calling netif_device_attach() register_netdev <-- failed for some reason. ERROR_FLOW: _mlx5e_suspend <-- netif_device_present return false, resources aren't freed :( Hence, clean resources in this case as well. [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0010 [#1] SMP CPU: 2 PID: 9345 Comm: test-ovs-ct-gen Not tainted 6.5.0_for_upstream_min_debug_2023_09_05_16_01 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:0x0 Code: Unable to access opcode bytes at0xffffffffffffffd6. RSP: 0018:ffff888178aaf758 EFLAGS: 00010246 Call Trace: <TASK> ? __die+0x20/0x60 ? page_fault_oops+0x14c/0x3c0 ? exc_page_fault+0x75/0x140 ? asm_exc_page_fault+0x22/0x30 notifier_call_chain+0x35/0xb0 blocking_notifier_call_chain+0x3d/0x60 mlx5_blocking_notifier_call_chain+0x22/0x30 [mlx5_core] mlx5_core_uplink_netdev_event_replay+0x3e/0x60 [mlx5_core] mlx5_mdev_netdev_track+0x53/0x60 [mlx5_ib] mlx5_ib_roce_init+0xc3/0x340 [mlx5_ib] __mlx5_ib_add+0x34/0xd0 [mlx5_ib] mlx5r_probe+0xe1/0x210 [mlx5_ib] ? auxiliary_match_id+0x6a/0x90 auxiliary_bus_probe+0x38/0x80 ? driver_sysfs_add+0x51/0x80 really_probe+0xc9/0x3e0 ? driver_probe_device+0x90/0x90 __driver_probe_device+0x80/0x160 driver_probe_device+0x1e/0x90 __device_attach_driver+0x7d/0x100 bus_for_each_drv+0x80/0xd0 __device_attach+0xbc/0x1f0 bus_probe_device+0x86/0xa0 device_add+0x637/0x840 __auxiliary_device_add+0x3b/0xa0 add_adev+0xc9/0x140 [mlx5_core] mlx5_rescan_drivers_locked+0x22a/0x310 [mlx5_core] mlx5_register_device+0x53/0xa0 [mlx5_core] mlx5_init_one_devl_locked+0x5c4/0x9c0 [mlx5_core] mlx5_init_one+0x3b/0x60 [mlx5_core] probe_one+0x44c/0x730 [mlx5_core] local_pci_probe+0x3e/0x90 pci_device_probe+0xbf/0x210 ? kernfs_create_link+0x5d/0xa0 ? sysfs_do_create_link_sd+0x60/0xc0 really_probe+0xc9/0x3e0 ? driver_probe_device+0x90/0x90 __driver_probe_device+0x80/0x160 driver_probe_device+0x1e/0x90 __device_attach_driver+0x7d/0x100 bus_for_each_drv+0x80/0xd0 __device_attach+0xbc/0x1f0 pci_bus_add_device+0x54/0x80 pci_iov_add_virtfn+0x2e6/0x320 sriov_enable+0x208/0x420 mlx5_core_sriov_configure+0x9e/0x200 [mlx5_core] sriov_numvfs_store+0xae/0x1a0 kernfs_fop_write_iter+0x10c/0x1a0 vfs_write+0x291/0x3c0 ksys_write+0x5f/0xe0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- Fixes: 2c3b5beec46a ("net/mlx5e: More generic netdev management API") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240509112951.590184-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 daysnet: annotate writes on dev->mtu from ndo_change_mtu()Eric Dumazet
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net/mlx5e: Support updating coalescing configuration without resetting channelsRahul Rameshbabu
When CQE mode or DIM state is changed, gracefully reconfigure channels to handle new configuration. Previously, would create new channels that would reflect the changes rather than update the original channels. Co-developed-by: Nabil S. Alramli <dev@nalramli.com> Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240419080445.417574-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net/mlx5e: Dynamically allocate DIM structure for SQs/RQsRahul Rameshbabu
Make it possible for the DIM structure to be torn down while an SQ or RQ is still active. Changing the CQ period mode is an example where the previous sampling done with the DIM structure would need to be invalidated. Co-developed-by: Nabil S. Alramli <dev@nalramli.com> Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240419080445.417574-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net/mlx5e: Use DIM constants for CQ period mode parameterRahul Rameshbabu
Use core DIM CQ period mode enum values for the CQ parameter for the period mode. Translate the value to the specific mlx5 device constant for the selected period mode when creating a CQ. Avoid needing to translate mlx5 device constants to DIM constants for core DIM functionality. Co-developed-by: Nabil S. Alramli <dev@nalramli.com> Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240419080445.417574-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net/mlx5e: Move DIM function declarations to en/dim.hRahul Rameshbabu
Create a header specifically for DIM-related declarations. Move existing DIM-specific functionality from en.h. Future DIM-related functionality will be declared in en/dim.h in subsequent patches. Co-developed-by: Nabil S. Alramli <dev@nalramli.com> Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240419080445.417574-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: include/trace/events/rpcgss.h 386f4a737964 ("trace: events: cleanup deprecated strncpy uses") a4833e3abae1 ("SUNRPC: Fix rpcgss_context trace event acceptor field") Adjacent changes: drivers/net/ethernet/intel/ice/ice_tc_lib.c 2cca35f5dd78 ("ice: Fix checking for unsupported keys on non-tunnel device") 784feaa65dfd ("ice: Add support for PFCP hardware offload in switchdev") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-12net/mlx5: SD, Handle possible devcom ERR_PTRTariq Toukan
Check if devcom holds an error pointer and return immediately. This fixes Smatch static checker warning: drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c:221 sd_register() error: 'devcom' dereferencing possible ERR_PTR() Enhance mlx5_devcom_register_component() so it stops returning NULL, making it easier for its callers. Fixes: d3d057666090 ("net/mlx5: SD, Implement devcom communication and primary election") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/all/f09666c8-e604-41f6-958b-4cc55c73faf9@gmail.com/T/ Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://lore.kernel.org/r/20240411115444.374475-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/unix/garbage.c 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c faa12ca24558 ("bnxt_en: Reset PTP tx_avail after possible firmware reset") b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()") drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c 7ac10c7d728d ("bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()") 194fad5b2781 ("bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions") drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 958f56e48385 ("net/mlx5e: Un-expose functions in en.h") 49e6c9387051 ("net/mlx5e: RSS, Block XOR hash with over 128 channels") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10net/mlx5e: Fix mlx5e_priv_init() cleanup flowCarolina Jubran
When mlx5e_priv_init() fails, the cleanup flow calls mlx5e_selq_cleanup which calls mlx5e_selq_apply() that assures that the `priv->state_lock` is held using lockdep_is_held(). Acquire the state_lock in mlx5e_selq_cleanup(). Kernel log: ============================= WARNING: suspicious RCU usage 6.8.0-rc3_net_next_841a9b5 #1 Not tainted ----------------------------- drivers/net/ethernet/mellanox/mlx5/core/en/selq.c:124 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by systemd-modules/293: #0: ffffffffa05067b0 (devices_rwsem){++++}-{3:3}, at: ib_register_client+0x109/0x1b0 [ib_core] #1: ffff8881096c65c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x104/0x1c0 [ib_core] stack backtrace: CPU: 4 PID: 293 Comm: systemd-modules Not tainted 6.8.0-rc3_net_next_841a9b5 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x8a/0xa0 lockdep_rcu_suspicious+0x154/0x1a0 mlx5e_selq_apply+0x94/0xa0 [mlx5_core] mlx5e_selq_cleanup+0x3a/0x60 [mlx5_core] mlx5e_priv_init+0x2be/0x2f0 [mlx5_core] mlx5_rdma_setup_rn+0x7c/0x1a0 [mlx5_core] rdma_init_netdev+0x4e/0x80 [ib_core] ? mlx5_rdma_netdev_free+0x70/0x70 [mlx5_core] ipoib_intf_init+0x64/0x550 [ib_ipoib] ipoib_intf_alloc+0x4e/0xc0 [ib_ipoib] ipoib_add_one+0xb0/0x360 [ib_ipoib] add_client_context+0x112/0x1c0 [ib_core] ib_register_client+0x166/0x1b0 [ib_core] ? 0xffffffffa0573000 ipoib_init_module+0xeb/0x1a0 [ib_ipoib] do_one_initcall+0x61/0x250 do_init_module+0x8a/0x270 init_module_from_file+0x8b/0xd0 idempotent_init_module+0x17d/0x230 __x64_sys_finit_module+0x61/0xb0 do_syscall_64+0x71/0x140 entry_SYSCALL_64_after_hwframe+0x46/0x4e </TASK> Fixes: 8bf30be75069 ("net/mlx5e: Introduce select queue parameters") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05net/mlx5e: Un-expose functions in en.hTariq Toukan
Un-expose functions that are not used outside of their c file. Make them static. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/20240404173357.123307-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29netlink: introduce type-checking attribute iterationJohannes Berg
There are, especially with multi-attr arrays, many cases of needing to iterate all attributes of a specific type in a netlink message or a nested attribute. Add specific macros to support that case. Also convert many instances using this spatch: @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } Although I had to undo one bad change this made, and I also adjusted some other code for whitespace and to use direct variable initialization now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07net/mlx5e: Support per-mdev queue counterTariq Toukan
Each queue counter object counts some events (in hardware) for the RQs that are attached to it, like events of packet drops due to no receive WQE (rx_out_of_buffer). Each RQ can be attached to a queue counter only within the same vhca. To still cover all RQs with these counters, we create multiple instances, one per vhca. The result that's shown to the user is now the sum of all instances. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07net/mlx5e: Support cross-vhca RSSTariq Toukan
Implement driver support for the HW feature that allows RX steering of one device to target other device's RQs. In SD multi-pf netdev mode, we set the secondaries into silent mode, disconnecting them from the network. This feature is then used to steer traffic from the primary to the secondaries. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07net/mlx5e: Let channels be SD-awareTariq Toukan
Distribute the channels between the different SD-devices to acheive local numa node performance on multiple numas. Each channel works against one specific mdev, creating all datapath queues against it. We distribute channels to mdevs in a round-robin policy. Example for 2 mdevs and 6 channels: +-------+---------+ | ch ix | mdev ix | +-------+---------+ | 0 | 0 | | 1 | 1 | | 2 | 0 | | 3 | 1 | | 4 | 0 | | 5 | 1 | +-------+---------+ This round-robin distribution policy is preferred over another suggested intuitive distribution, in which we first distribute one half of the channels to mdev #0 and then the second half to mdev #1. We prefer round-robin for a reason: it is less influenced by changes in the number of channels. The mapping between channel index and mdev is fixed, no matter how many channels the user configures. As the channel stats are persistent to channels closure, changing the mapping every single time would turn the accumulative stats less representing of the channel's history. Per-channel objects should stop using the primary mdev (priv->mdev) directly, and instead move to using their own channel's mdev. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07net/mlx5e: Create EN core HW resources for all secondary devicesTariq Toukan
Traffic queues will be created on all devices, including the secondaries. Create the needed core layer resources for them as well. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07net/mlx5e: Create single netdev per SD groupTariq Toukan
Integrate the SD library calls into the auxiliary_driver ops in preparation for creating a single netdev for the multiple PFs belonging to the same SD group. SD is still disabled at this stage. It is enabled by a downstream patch when all needed parts are implemented. The netdev is created whenever the SD group, with all its participants, are ready. It is later destroyed whenever any of the participating PFs drops. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-12net/mlx5e: link NAPI instances to queues and IRQsJoe Damato
Make mlx5 compatible with the newly added netlink queue GET APIs. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240209202312.30181-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24net/mlx5e: Use the correct lag ports number when creating TISesSaeed Mahameed
The cited commit moved the code of mlx5e_create_tises() and changed the loop to create TISes over MLX5_MAX_PORTS constant value, instead of getting the correct lag ports supported by the device, which can cause FW errors on devices with less than MLX5_MAX_PORTS ports. Change that back to mlx5e_get_num_lag_ports(mdev). Also IPoIB interfaces create there own TISes, they don't use the eth TISes, pass a flag to indicate that. This fixes the following errors that might appear in kernel log: mlx5_cmd_out_err:808:(pid 650): CREATE_TIS(0x912) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x595b5d), err(-22) mlx5e_create_mdev_resources:174:(pid 650): alloc tises failed, -22 Fixes: b25bd37c859f ("net/mlx5: Move TISes from priv to mdev HW resources") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-07Revert "mlx5 updates 2023-12-20"Jakub Kicinski
Revert "net/mlx5: Implement management PF Ethernet profile" This reverts commit 22c4640698a1d47606b5a4264a584e8046641784. Revert "net/mlx5: Enable SD feature" This reverts commit c88c49ac9c18fb7c3fa431126de1d8f8f555e912. Revert "net/mlx5e: Block TLS device offload on combined SD netdev" This reverts commit 83a59ce0057b7753d7fbece194b89622c663b2a6. Revert "net/mlx5e: Support per-mdev queue counter" This reverts commit d72baceb92539a178d2610b0e9ceb75706a75b55. Revert "net/mlx5e: Support cross-vhca RSS" This reverts commit c73a3ab8fa6e93a783bd563938d7cf00d62d5d34. Revert "net/mlx5e: Let channels be SD-aware" This reverts commit e4f9686bdee7b4dd89e0ed63cd03606e4bda4ced. Revert "net/mlx5e: Create EN core HW resources for all secondary devices" This reverts commit c4fb94aa822d6c9d05fc3c5aee35c7e339061dc1. Revert "net/mlx5e: Create single netdev per SD group" This reverts commit e2578b4f983cfcd47837bbe3bcdbf5920e50b2ad. Revert "net/mlx5: SD, Add informative prints in kernel log" This reverts commit c82d360325112ccc512fc11a3b68cdcdf04a1478. Revert "net/mlx5: SD, Implement steering for primary and secondaries" This reverts commit 605fcce33b2d1beb0139b6e5913fa0b2062116b2. Revert "net/mlx5: SD, Implement devcom communication and primary election" This reverts commit a45af9a96740873db9a4b5bb493ce2ad81ccb4d5. Revert "net/mlx5: SD, Implement basic query and instantiation" This reverts commit 63b9ce944c0e26c44c42cdd5095c2e9851c1a8ff. Revert "net/mlx5: SD, Introduce SD lib" This reverts commit 4a04a31f49320d078b8078e1da4b0e2faca5dfa3. Revert "net/mlx5: Fix query of sd_group field" This reverts commit e04984a37398b3f4f5a79c993b94c6b1224184cc. Revert "net/mlx5e: Use the correct lag ports number when creating TISes" This reverts commit a7e7b40c4bc115dbf2a2bb453d7bbb2e0ea99703. There are some unanswered questions on the list, and we don't have any docs. Given the lack of replies so far and the fact that v6.8 merge window has started - let's revert this and revisit for v6.9. Link: https://lore.kernel.org/all/20231221005721.186607-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-20net/mlx5: Implement management PF Ethernet profileArmen Ratner
Add management PF modules, which introduce support for the structures needed to create the resources for the MGMT PF to work. Also, add the necessary calls and functions to establish this functionality. Signed-off-by: Armen Ratner <armeng@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
2023-12-20net/mlx5e: Support per-mdev queue counterTariq Toukan
Each queue counter object counts some events (in hardware) for the RQs that are attached to it, like events of packet drops due to no receive WQE (rx_out_of_buffer). Each RQ can be attached to a queue counter only within the same vhca. To still cover all RQs with these counters, we create multiple instances, one per vhca. The result that's shown to the user is now the sum of all instances. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20net/mlx5e: Support cross-vhca RSSTariq Toukan
Implement driver support for the HW feature that allows RX steering of one device to target other device's RQs. In SD multi-mdev netdev mode, we set the secondaries into silent mode, disconnecting them from the network. This feature is then used to steer traffic from the primary to the secondaries. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20net/mlx5e: Let channels be SD-awareTariq Toukan
Distribute the channels between the different SD-devices to acheive local numa node performance on multiple numas. Each channel works against one specific mdev, creating all datapath queues against it. We distribute channels to mdevs in a round-robin policy. Example for 2 mdevs and 6 channels: +-------+---------+ | ch ix | mdev ix | +-------+---------+ | 0 | 0 | | 1 | 1 | | 2 | 0 | | 3 | 1 | | 4 | 0 | | 5 | 1 | +-------+---------+ This round-robin distribution policy is preferred over another suggested intuitive distribution, in which we first distribute one half of the channels to mdev #0 and then the second half to mdev #1. We prefer round-robin for a reason: it is less influenced by changes in the number of channels. The mapping between channel index and mdev is fixed, no matter how many channels the user configures. As the channel stats are persistent to channels closure, changing the mapping every single time would turn the accumulative stats less representing of the channel's history. Per-channel objects should stop using the primary mdev (priv->mdev) directly, and instead move to using their own channel's mdev. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20net/mlx5e: Create EN core HW resources for all secondary devicesTariq Toukan
Traffic queues will be created on all devices, including the secondaries. Create the needed core layer resources for them as well. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20net/mlx5e: Create single netdev per SD groupTariq Toukan
Integrate the SD library calls into the auxiliary_driver ops in preparation for creating a single netdev for the multiple devices belonging to the same SD group. SD is still disabled at this stage. It is enabled by a downstream patch when all needed parts are implemented. The netdev is created only when the SD group, with all its participants, are ready. It is later destroyed if any of the participating devices drops. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20net/mlx5e: Use the correct lag ports number when creating TISesSaeed Mahameed
The cited commit moved the code of mlx5e_create_tises() and changed the loop to create TISes over MLX5_MAX_PORTS constant value, instead of getting the correct lag ports supported by the device, which can cause FW errors on devices with less than MLX5_MAX_PORTS ports. Change that back to mlx5e_get_num_lag_ports(mdev). Also IPoIB interfaces create there own TISes, they don't use the eth TISes, pass a flag to indicate that. Fixes: b25bd37c859f ("net/mlx5: Move TISes from priv to mdev HW resources") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-15Merge tag 'mlx5-updates-2023-12-13' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-12-13 Preparation for mlx5e socket direct feature. Socket direct will allow multiple PF devices attached to different NUMA nodes but sharing the same physical port. The following series is a small refactoring series in preparation to support socket direct in the following submission. Highlights: - Define required device registers and bits related to socket direct - Flow steering re-arrangements - Generalize TX objects (TISs) and store them in a common object, will be useful in the next series for per function object management. - Decouple raw CQ objects from their parent netdev priv - Prepare devcom for Socket Direct device group discovery. Please see the individual patches for more information. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/intel/iavf/iavf_ethtool.c 3a0b5a2929fd ("iavf: Introduce new state machines for flow director") 95260816b489 ("iavf: use iavf_schedule_aq_request() helper") https://lore.kernel.org/all/84e12519-04dc-bd80-bc34-8cf50d7898ce@intel.com/ drivers/net/ethernet/broadcom/bnxt/bnxt.c c13e268c0768 ("bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logic") c2f8063309da ("bnxt_en: Refactor RX VLAN acceleration logic.") a7445d69809f ("bnxt_en: Add support for new RX and TPA_START completion types for P7") 1c7fd6ee2fe4 ("bnxt_en: Rename some macros for the P5 chips") https://lore.kernel.org/all/20231211110022.27926ad9@canb.auug.org.au/ drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c bd6781c18cb5 ("bnxt_en: Fix wrong return value check in bnxt_close_nic()") 84793a499578 ("bnxt_en: Skip nic close/open when configuring tstamp filters") https://lore.kernel.org/all/20231214113041.3a0c003c@canb.auug.org.au/ drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c 3d7a3f2612d7 ("net/mlx5: Nack sync reset request when HotPlug is enabled") cecf44ea1a1f ("net/mlx5: Allow sync reset flow when BF MGT interface device is present") https://lore.kernel.org/all/20231211110328.76c925af@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13net/mlx5e: Decouple CQ from privTariq Toukan
Make CQ struct and methods independent of "priv", use more basic arguments instead. This will ease the transition to netdev with multiple mdevs. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-13net/mlx5e: Add wrapping for auxiliary_driver ops and remove unused argsTariq Toukan
Turn some of the struct auxiliary_driver ops into wrappers to stop having dummy local vars passed as unused arguments. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-13net/mlx5: Move TISes from priv to mdev HW resourcesTariq Toukan
The transport interface send (TIS) object is responsible for performing all transport related operations of the transmit side. Messages from Send Queues get segmented and transmitted by the TIS including all transport required implications, e.g. in the case of large send offload, the TIS is responsible for the segmentation. These are stateless objects and can be used by multiple netdevs (e.g. representors) who share the same core device. Providing the TISes as a service from the core layer to the netdev layer reduces the number of replecated TIS objects (in case of multiple netdevs), and will ease the transition to netdev with multiple mdevs. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-13net/mlx5e: Remove TLS-specific logic in generic create TIS APITariq Toukan
TLS TISes are created using their own dedicated functions, don't honor their specific logic here. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-04net/mlx5e: Fix possible deadlock on mlx5e_tx_timeout_workMoshe Shemesh
Due to the cited patch, devlink health commands take devlink lock and this may result in deadlock for mlx5e_tx_reporter as it takes local state_lock before calling devlink health report and on the other hand devlink health commands such as diagnose for same reporter take local state_lock after taking devlink lock (see kernel log below). To fix it, remove local state_lock from mlx5e_tx_timeout_work() before calling devlink_health_report() and take care to cancel the work before any call to close channels, which may free the SQs that should be handled by the work. Before cancel_work_sync(), use current_work() to check we are not calling it from within the work, as mlx5e_tx_timeout_work() itself may close the channels and reopen as part of recovery flow. While removing state_lock from mlx5e_tx_timeout_work() keep rtnl_lock to ensure no change in netdev->real_num_tx_queues, but use rtnl_trylock() and a flag to avoid deadlock by calling cancel_work_sync() before closing the channels while holding rtnl_lock too. Kernel log: ====================================================== WARNING: possible circular locking dependency detected 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Not tainted ------------------------------------------------------ kworker/u16:2/65 is trying to acquire lock: ffff888122f6c2f8 (&devlink->lock_key#2){+.+.}-{3:3}, at: devlink_health_report+0x2f1/0x7e0 but task is already holding lock: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&priv->state_lock){+.+.}-{3:3}: __mutex_lock+0x12c/0x14b0 mlx5e_rx_reporter_diagnose+0x71/0x700 [mlx5_core] devlink_nl_cmd_health_reporter_diagnose_doit+0x212/0xa50 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x2e9/0x530 netlink_rcv_skb+0x11d/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x438/0x710 netlink_sendmsg+0x788/0xc40 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x1c1/0x290 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #0 (&devlink->lock_key#2){+.+.}-{3:3}: __lock_acquire+0x2c8a/0x6200 lock_acquire+0x1c1/0x550 __mutex_lock+0x12c/0x14b0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 worker_thread+0x59d/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&priv->state_lock); lock(&devlink->lock_key#2); lock(&priv->state_lock); lock(&devlink->lock_key#2); *** DEADLOCK *** 4 locks held by kworker/u16:2/65: #0: ffff88811a55b138 ((wq_completion)mlx5e#2){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340 #1: ffff888101de7db8 ((work_completion)(&priv->tx_timeout_work)){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340 #2: ffffffff84ce8328 (rtnl_mutex){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x53/0x280 [mlx5_core] #3: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] stack backtrace: CPU: 1 PID: 65 Comm: kworker/u16:2 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x57/0x7d check_noncircular+0x278/0x300 ? print_circular_bug+0x460/0x460 ? find_held_lock+0x2d/0x110 ? __stack_depot_save+0x24c/0x520 ? alloc_chain_hlocks+0x228/0x700 __lock_acquire+0x2c8a/0x6200 ? register_lock_class+0x1860/0x1860 ? kasan_save_stack+0x1e/0x40 ? kasan_set_free_info+0x20/0x30 ? ____kasan_slab_free+0x11d/0x1b0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? devlink_health_report+0x3d5/0x7e0 lock_acquire+0x1c1/0x550 ? devlink_health_report+0x2f1/0x7e0 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? find_held_lock+0x2d/0x110 __mutex_lock+0x12c/0x14b0 ? devlink_health_report+0x2f1/0x7e0 ? devlink_health_report+0x2f1/0x7e0 ? mutex_lock_io_nested+0x1320/0x1320 ? trace_hardirqs_on+0x2d/0x100 ? bit_wait_io_timeout+0x170/0x170 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? mlx5e_reporter_tx_err_cqe+0x1b0/0x1b0 [mlx5_core] ? mlx5e_tx_reporter_timeout_dump+0x70/0x70 [mlx5_core] ? mlx5e_tx_reporter_dump_sq+0x320/0x320 [mlx5_core] ? mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] ? mutex_lock_io_nested+0x1320/0x1320 ? process_one_work+0x70f/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? lock_downgrade+0x6e0/0x6e0 mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x59d/0xec0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: c90005b5f75c ("devlink: Hold the instance lock in health callbacks") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-30Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-11-30 We've added 30 non-merge commits during the last 7 day(s) which contain a total of 58 files changed, 1598 insertions(+), 154 deletions(-). The main changes are: 1) Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload, from Stanislav Fomichev with stmmac implementation from Song Yoong Siang. 2) Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques, from Andrii Nakryiko. 3) Add BPF link_info support for uprobe multi link along with bpftool integration for the latter, from Jiri Olsa. 4) Use pkg-config in BPF selftests to determine ld flags which is in particular needed for linking statically, from Akihiko Odaki. 5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18, from Yonghong Song. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits) bpf/tests: Remove duplicate JSGT tests selftests/bpf: Add TX side to xdp_hw_metadata selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP selftests/bpf: Add TX side to xdp_metadata selftests/bpf: Add csum helpers selftests/xsk: Support tx_metadata_len xsk: Add option to calculate TX checksum in SW xsk: Validate xsk_tx_metadata flags xsk: Document tx_metadata_len layout net: stmmac: Add Tx HWTS support to XDP ZC net/mlx5e: Implement AF_XDP TX timestamp and checksum offload tools: ynl: Print xsk-features from the sample xsk: Add TX timestamp and TX checksum offload support xsk: Support tx_metadata_len selftests/bpf: Use pkg-config for libelf selftests/bpf: Override PKG_CONFIG for static builds selftests/bpf: Choose pkg-config for the target bpftool: Add support to display uprobe_multi links selftests/bpf: Add link_info test for uprobe_multi link selftests/bpf: Use bpf_link__destroy in fill_link_info tests ... ==================== Conflicts: Documentation/netlink/specs/netdev.yaml: 839ff60df3ab ("net: page_pool: add nlspec for basic access to page pools") 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/ While at it also regen, tree is dirty after: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") looks like code wasn't re-rendered after "render-max" was removed. Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29net/mlx5e: Implement AF_XDP TX timestamp and checksum offloadStanislav Fomichev
TX timestamp: - requires passing clock, not sure I'm passing the correct one (from cq->mdev), but the timestamp value looks convincing TX checksum: - looks like device does packet parsing (and doesn't accept custom start/offset), so I'm ignoring user offsets Cc: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-5-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-28eth: link netdev to page_pools in driversJakub Kicinski
Link page pool instances to netdev for the drivers which already link to NAPI. Unless the driver is doing something very weird per-NAPI should imply per-netdev. Add netsec as well, Ilias indicates that it fits the mold. Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-15net/mlx5e: Remove early assignment to netdev->featuresTariq Toukan
The netdev->features is initialized to netdev->hw_features at a later point in the flow. Remove any redundant earlier assignment. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-23page_pool: remove PP_FLAG_PAGE_FRAGYunsheng Lin
PP_FLAG_PAGE_FRAG is not really needed after pp_frag_count handling is unified and page_pool_alloc_frag() is supported in 32-bit arch with 64-bit DMA, so remove it. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> CC: Lorenzo Bianconi <lorenzo@kernel.org> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: Liang Chen <liangchen.linux@gmail.com> CC: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20231020095952.11055-3-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-14net/mlx5e: Preparations for supporting larger number of channelsAdham Faris
Data center server CPUs number keeps getting larger with time. Currently, our driver limits the number of channels to 128. Maximum channels number is enforced and bounded by hardcoded defines (en.h/MLX5E_MAX_NUM_CHANNELS) even though the device and machine (CPUs num) can allow more. Refactor current implementation in order to handle further channels. The maximum supported channels number will be increased in the followup patch. Introduce RQT size calculation/allocation scheme below: 1) Preserve current RQT size of 256 for channels number up to 128 (the old limit). 2) For greater channels number, RQT size is calculated by multiplying the channels number by 2 and rounding up the result to the nearest power of 2. If the calculated RQT size exceeds the maximum supported size by the NIC, fallback to this maximum RQT size (1 << log_max_rqt_size). Since RQT size is no more static, allocate and free the indirection table SW shadow dynamically. Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14net/mlx5e: Refactor rx_res_init() and rx_res_free() APIsAdham Faris
Refactor mlx5e_rx_res_init() and mlx5e_rx_res_free() by wrapping mlx5e_rx_res_alloc() and mlx5e_rx_res_destroy() API's respectively. Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-13Merge branch 'mlx5-next' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== This PR is collected from https://lore.kernel.org/all/cover.1695296682.git.leon@kernel.org This series from Patrisious extends mlx5 to support IPsec packet offload in multiport devices (MPV, see [1] for more details). These devices have single flow steering logic and two netdev interfaces, which require extra logic to manage IPsec configurations as they performed on netdevs. [1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/ * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Handle IPsec steering upon master unbind/bind net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic net/mlx5: Add create alias flow table function to ipsec roce net/mlx5: Implement alias object allow and create functions net/mlx5: Add alias flow table bits net/mlx5: Store devcom pointer inside IPsec RoCE net/mlx5: Register mlx5e priv to devcom in MPV mode RDMA/mlx5: Send events from IB driver about device affiliation state net/mlx5: Introduce ifc bits for migration in a chunk mode ==================== Link: https://lore.kernel.org/r/20231002083832.19746-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10net/mlx5e: Again mutually exclude RX-FCS and RX-port-timestampWill Mortensen
Commit 1e66220948df8 ("net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change") seems to have accidentally inverted the logic added in commit 0bc73ad46a76 ("net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp"). The impact of this is a little unclear since it seems the FCS scattered with RX-FCS is (usually?) correct regardless. Fixes: 1e66220948df8 ("net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change") Tested-by: Charlotte Tan <charlotte@extrahop.com> Reviewed-by: Charlotte Tan <charlotte@extrahop.com> Cc: Adham Faris <afaris@nvidia.com> Cc: Aya Levin <ayal@nvidia.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Moshe Shemesh <moshe@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Will Mortensen <will@extrahop.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20231006053706.514618-1-will@extrahop.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-02net/mlx5: Handle IPsec steering upon master unbind/bindPatrisious Haddad
When the master device is unbinded, make sure to clean up all of the steering rules or flow tables that were created over the master, in order to allow proper unbinding of master, and for ethernet traffic to continue to work independently. Upon bringing master device back up and attaching the slave to it, checks if the slave already has IPsec configured and if so reconfigure the rules needed to support RoCE traffic. Note that while master device is unbound, the user is unable to configure IPsec again, since they are in a kind of illegal state in which they are in MPV mode but the slave has no master. However if IPsec was configured before hand, it will continue to work for ethernet traffic while master is unbound, and would continue to work for all traffic when the master is bound back again. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/8434e88912c588affe51b34669900382a132e873.1695296682.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02net/mlx5: Register mlx5e priv to devcom in MPV modePatrisious Haddad
If the device is in MPV mode, the ethernet driver would now register to events from IB driver about core devices affiliation or de-affiliation. Use the key provided in said event to connect each mlx5e priv instance to it's master counterpart, this way the ethernet driver is now aware of who is his master core device and even more, such as knowing if partner device has IPsec configured or not. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/279adfa0aa3a1957a339086f2c1739a50b8e4b68.1695296682.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c fa165e194997 ("sfc: don't unregister flow_indr if it was never registered") 3bf969e88ada ("sfc: add MAE table machinery for conntrack table") https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16net/mlx5e: XDP, Fix fifo overrun on XDP_REDIRECTDragos Tatulea
Before this fix, running high rate traffic through XDP_REDIRECT with multibuf could overrun the fifo used to release the xdp frames after tx completion. This resulted in corrupted data being consumed on the free side. The culplirt was a miscalculation of the fifo size: the maximum ratio between fifo entries / data segments was incorrect. This ratio serves to calculate the max fifo size for a full sq where each packet uses the worst case number of entries in the fifo. This patch fixes the formula and names the constant. It also makes sure that future values will use a power of 2 number of entries for the fifo mask to work. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Fixes: 3f734b8c594b ("net/mlx5e: XDP, Use multiple single-entry objects in xdpi_fifo") Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/intel/igc/igc_main.c 06b412589eef ("igc: Add lock to safeguard global Qbv variables") d3750076d464 ("igc: Add TransmissionOverrun counter") drivers/net/ethernet/microsoft/mana/mana_en.c a7dfeda6fdec ("net: mana: Fix MANA VF unload when hardware is unresponsive") a9ca9f9ceff3 ("page_pool: split types and declarations from page_pool.h") 92272ec4107e ("eth: add missing xdp.h includes in drivers") net/mptcp/protocol.h 511b90e39250 ("mptcp: fix disconnect vs accept race") b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning") tools/testing/selftests/net/mptcp/mptcp_join.sh c8c101ae390a ("selftests: mptcp: join: fix 'implicit EP' test") 03668c65d153 ("selftests: mptcp: join: rework detailed report") Signed-off-by: Jakub Kicinski <kuba@kernel.org>