|
The procedure of attaching an action template to an existing matcher had
a few issues:
1. Attaching accidentally overran the `at` array in bwc_matcher, which
would result in memory corruption. This bug wasn't triggered, but it
is possible to trigger it by attaching action templates beyond the
initial buffer size of 8. Fix this by converting to a dynamically
sized buffer and reallocating if needed.
2. Similarly, the `at` array inside the native matcher was never
reallocated. Fix this the same as above.
3. The bwc layer treated any error in action template attach as a signal
that the matcher should be rehashed to account for a larger number of
action STEs. In reality, there are other unrelated errors that can
arise and they should be propagated upstack. Fix this by adding a
`need_rehash` output parameter that's orthogonal to error codes.
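A minimal sketch of fixes 1 and 3, assuming illustrative structure and
field names (this is not the actual mlx5 HWS code):

/* Sketch: grow the action template array on demand and report the
 * rehash condition separately from real errors. */
static int bwc_matcher_attach_at(struct bwc_matcher *matcher,
                                 struct action_template *at,
                                 bool *need_rehash)
{
        struct action_template **tmp;

        *need_rehash = false;

        if (matcher->num_at == matcher->size_at) {
                tmp = krealloc(matcher->at,
                               2 * matcher->size_at * sizeof(*tmp),
                               GFP_KERNEL);
                if (!tmp)
                        return -ENOMEM; /* real error, propagate upstack */
                matcher->at = tmp;
                matcher->size_at *= 2;
        }

        matcher->at[matcher->num_at++] = at;

        /* Needing more action STEs is not an error; ask for a rehash. */
        if (at->num_of_action_stes > matcher->num_of_action_stes)
                *need_rehash = true;

        return 0;
}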
Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/1744312662-356571-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since we are about to stash some more information into the pp_magic
field, let's move the magic signature checks into a pair of helper
functions so it can be changed in one place.
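For illustration, the helper pair could look roughly like this; the
exact names and mask are assumptions, not necessarily what the patch
uses:

/* Sketch: centralize the signature check so stashing extra bits into
 * pp_magic later only requires changing the mask here. */
static inline bool page_pool_page_is_pp(struct page *page)
{
        return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
}

static inline bool page_pool_netmem_is_pp(netmem_ref netmem)
{
        return page_pool_page_is_pp(netmem_to_page(netmem));
}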
Reviewed-by: Mina Almasry <almasrymina@google.com>
Tested-by: Yonglong Liu <liuyonglong@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409-page-pool-track-dma-v9-1-6a9ef2e0cba8@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
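The conversion is a mechanical rename, e.g.:

del_timer(&foo->timer);            /* old */
timer_delete(&foo->timer);         /* new */

del_timer_sync(&foo->timer);       /* old */
timer_delete_sync(&foo->timer);    /* new */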
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With commit 8533b14b3d65 ("eth: mlx4: create a page pool for Rx") mlx4
started using functions guarded by PAGE_POOL. This change introduced
build errors when CONFIG_MLX4_EN is set but CONFIG_PAGE_POOL is not:
ld: vmlinux.o: in function `mlx4_en_alloc_frags':
en_rx.c:(.text+0xa5eaf9): undefined reference to `page_pool_alloc_pages'
ld: vmlinux.o: in function `mlx4_en_create_rx_ring':
(.text+0xa5ee91): undefined reference to `page_pool_create'
Make MLX4_EN select PAGE_POOL to fix the mlx4 build errors.
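The fix is a one-line Kconfig change of roughly this shape (the exact
dependency list in the tree may differ):

config MLX4_EN
        tristate "Mellanox Technologies 1/10/40Gbit Ethernet support"
        depends on PCI
        select MLX4_CORE
        select PAGE_POOL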
Fixes: 8533b14b3d65 ("eth: mlx4: create a page pool for Rx")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250401015315.2306092-1-gthelen@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Rather tiny pull request, mostly so that we can get into our trees
your fix to the x86 Makefile.
Current release - regressions:
- Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc", error
queue accounting was missed
Current release - new code bugs:
- 5 fixes for the netdevice instance locking work
Previous releases - regressions:
- usbnet: restore usb%d name exception for local mac addresses
Previous releases - always broken:
- rtnetlink: allocate vfinfo size for VF GUIDs when supported, avoid
spurious GET_LINK failures
- eth: mana: Switch to page pool for jumbo frames
- phy: broadcom: Correct BCM5221 PHY model detection
Misc:
- selftests: drv-net: replace helpers for referring to other files"
* tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits)
Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc"
bnxt_en: bring back rtnl lock in bnxt_shutdown
eth: gve: add missing netdev locks on reset and shutdown paths
selftests: mptcp: ignore mptcp_diag binary
selftests: mptcp: close fd_in before returning in main_loop
selftests: mptcp: fix incorrect fd checks in main_loop
mptcp: fix NULL pointer in can_accept_new_subflow
octeontx2-af: Free NIX_AF_INT_VEC_GEN irq
octeontx2-af: Fix mbox INTR handler when num VFs > 64
net: fix use-after-free in the netdev_nl_sock_priv_destroy()
selftests: net: use Path helpers in ping
selftests: net: use the dummy bpf from net/lib
selftests: drv-net: replace the rpath helper with Path objects
net: lapbether: use netdev_lockdep_set_classes() helper
net: phy: broadcom: Correct BCM5221 PHY model detection
net: usb: usbnet: restore usb%d name exception for local mac addresses
net/mlx5e: SHAMPO, Make reserved size independent of page size
net: mana: Switch to page pool for jumbo frames
MAINTAINERS: Add dedicated entries for phy_link_topology
net: move replay logic to tc_modify_qdisc
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull fwctl subsystem from Jason Gunthorpe:
"fwctl is a new subsystem intended to bring some common rules and order
to the growing pattern of exposing a secure FW interface directly to
userspace.
Unlike existing places like RDMA/DRM/VFIO/uacce that expose a device
for datapath operations, fwctl is focused on debugging,
configuration and provisioning of the device. It will not have the
necessary features like interrupt delivery to support a datapath.
This concept is similar to the long-standing practice in the "HW" RAID
space of having a device-specific misc device to manage the RAID
controller FW. fwctl generalizes this notion of a companion debug and
management interface that goes along with a dataplane implemented in
an appropriate subsystem.
There have been three LWN articles written discussing various aspects
of this:
https://lwn.net/Articles/955001/
https://lwn.net/Articles/969383/
https://lwn.net/Articles/990802/
This includes three drivers to launch the subsystem:
- CXL provides a vendor scheme for executing commands and a way to
learn the 'command effects' (ie the security properties) of such
commands. The fwctl driver allows access to these mechanism within
the fwctl security model
- mlx5 is a family of networking products; the driver supports all
current Mellanox HW still receiving FW feature updates. This
includes RDMA multiprotocol NICs like ConnectX and the Bluefield
family of Smart NICs.
- AMD/Pensando Distributed Services card is a multi-protocol Smart
NIC with a multi-PCI-function design. fwctl works on the management
PCI function following a 'command effects' model similar to CXL"
* tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (30 commits)
pds_fwctl: add Documentation entries
pds_fwctl: add rpc and query support
pds_fwctl: initial driver framework
pds_core: add new fwctl auxiliary_device
pds_core: specify auxiliary_device to be created
pds_core: make pdsc_auxbus_dev_del() void
cxl: Fixup kdoc issues for include/cxl/features.h
fwctl/cxl: Add documentation to FWCTL CXL
cxl/test: Add Set Feature support to cxl_test
cxl/test: Add Get Feature support to cxl_test
cxl: Add support to handle user feature commands for set feature
cxl: Add support to handle user feature commands for get feature
cxl: Add support for fwctl RPC command to enable CXL feature commands
cxl: Move cxl feature command structs to user header
cxl: Add FWCTL support to CXL
mlx5: Create an auxiliary device for fwctl_mlx5
fwctl/mlx5: Support for communicating with mlx5 fw
fwctl: Add documentation
fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
taint: Add TAINT_FWCTL
...
|
|
When hw-gro is enabled, the maximum number of header entries that are
needed per WQE (hd_per_wqe) is calculated based on, among other
parameters, the size of the reservations.
Miscalculation of the size of reservations leads to hd_per_wqe being
incorrectly calculated as 0, particularly with a large page size as on
aarch64. This prevents the SHAMPO header from being correctly
initialized in the device, ultimately causing the following CQE error,
which indicates a PD violation.
mlx5_core 0000:00:08.0 eth2: ERR CQE on RQ: 0x1180
mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x510, ci 0x0, qn 0x1180, opcode 0xe, syndrome 0x4, vendor syndrome 0x32
00000000: 00 00 00 00 04 4a 00 00 00 00 00 00 20 00 93 32
00000010: 55 00 00 00 fb cc 00 00 00 00 00 00 07 18 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4a
00000030: 00 00 00 9a 93 00 32 04 00 00 00 00 00 00 da e1
Use the correct formula for calculating the size of reservations;
precisely, it shouldn't depend on page size, but should instead use the
correct multiple of MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE.
Fixes: e5ca8fb08ab2 ("net/mlx5e: Add control path for SHAMPO feature")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742732906-166564-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Merge in late fixes to prepare for the 6.15 net-next PR.
No conflicts, adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
919f9f497dbc ("eth: bnxt: fix out-of-range access of vnic_info array")
fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2025-03-24
1) Prevent setting high order sequence number bits input in
non-ESN mode. From Leon Romanovsky.
2) Support PMTU handling in tunnel mode for packet offload.
From Leon Romanovsky.
3) Make xfrm_state_lookup_byaddr lockless.
From Florian Westphal.
4) Remove unnecessary NULL check in xfrm_lookup_with_ifid().
From Dan Carpenter.
* tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Remove unnecessary NULL check in xfrm_lookup_with_ifid()
xfrm: state: make xfrm_state_lookup_byaddr lockless
xfrm: check for PMTU in tunnel mode for packet offload
xfrm: provide common xdo_dev_offload_ok callback implementation
xfrm: rely on XFRM offload
xfrm: simplify SA initialization routine
xfrm: delay initialization of offload path till its actually requested
xfrm: prevent high SEQ input in non-ESN mode
====================
Link: https://patch.msgid.link/20250324061855.4116819-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A CT action with the commit argument is usually followed by a forward
action, either to the output netdev or to the next chain. The default
software behavior, if it's the last action, is to drop, by setting the
action attribute to TC_ACT_SHOT instead of TC_ACT_PIPE. But the driver
can't handle this, so block the offload in such a case.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a NIC mode CT setup where we do hairpin between the two
NICs, both NICs register to the same flow table (per zone)
and try to offload all rules on it.
Instead, filter the rules by the NIC they originated from,
so that only one side is offloaded for each NIC.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Align mlx5 driver usage of 'pfnum' with the documentation clarification
introduced in commit bb70b0d48d8e ("devlink: Improve the port attributes
description").
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, the check of whether the PCI bridge is accessible is done
only as part of the bridge hot plug capability check. If the PCI bridge
is not accessible, reset now will fail regardless of bridge hotplug
capability. Move this check to mlx5_is_reset_now_capable(), which, in
such a case, aborts the reset, and does so in the request phase instead
of the reset now phase.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As queue affinity is being deprecated and will no longer be supported
in the future, always check for the presence of the port selection
namespace. When available, leverage it to distribute traffic
across the physical ports via steering, ensuring compatibility with
future NICs.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For simplicity reasons, the driver avoids crossing work queue fragment
boundaries within the same TX WQE (Work-Queue Element). Until today, as
the number of packets in a TX MPWQE (Multi-Packet WQE) descriptor is not
known in advance, the driver pre-prepared contiguous memory for the
largest possible WQE. Because of this, when getting too close to the
fragment edge, with no room left for the largest possible WQE, the
driver filled the fragment remainder with NOP descriptors, aligning the
next descriptor to the beginning of the next fragment.
Generating and handling these NOPs wastes resources, such as CPU
cycles, work-queue entries fetched by the device, and PCI bandwidth.
In this patch, we replace this NOP-filling mechanism in the TX MPWQE
flow. Instead, we utilize the remaining entries of the fragment with a
TX MPWQE. If this room turns out to be too small, we simply open an
additional descriptor starting at the beginning of the next fragment.
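A rough sketch of the new session-open logic, with hypothetical helper
and field names (not the actual mlx5e code):

static void mlx5e_tx_mpwqe_session_open(struct mlx5e_txqsq *sq, u16 pi)
{
        u16 room = wqebbs_to_frag_edge(sq, pi); /* hypothetical helper */

        /* Old: if `room` could not hold the largest WQE, fill it with
         * NOPs and start at the next fragment.
         * New: cap the session at `room`; if a packet doesn't fit, the
         * session is closed and another descriptor is opened at the
         * start of the next fragment. */
        sq->mpwqe.max_bbs = min_t(u16, room, MLX5_SEND_WQE_MAX_WQEBBS);
}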
Performance benchmark:
uperf test, single server against 3 clients.
TCP multi-stream, bidir, traffic profile "2x350B read, 1400B write".
Bottleneck is in inbound PCI bandwidth (device POV).
+---------------+------------+------------+--------+
| | Before | After | |
+---------------+------------+------------+--------+
| BW | 117.4 Gbps | 121.1 Gbps | +3.1% |
+---------------+------------+------------+--------+
| tx_packets | 15 M/sec | 15.5 M/sec | +3.3% |
+---------------+------------+------------+--------+
| tx_nops | 3 M/sec | 0 | -100% |
+---------------+------------+------------+--------+
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1742391746-118647-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The health poll mechanism performs periodic checks to detect firmware
errors. One of the checks verifies that the function is still enabled
on the firmware side, but the function is enabled only after the
enable_hca command completes. Start the health poll after the
enable_hca command to avoid a race between the function being enabled
and the first health poll.
Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742331077-102038-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When LAG creation fails, the driver reloads the RDMA devices. If RDMA
representors are present, they should also be reloaded. This step was
missed in the cited commit.
Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742331077-102038-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a workaround to mitigate a compiler anomaly.
During LLVM toolchain compilation of this driver on the s390x
architecture, an unreasonable __write_overflow_field warning occurs.
Contextually, chunk_index is restricted to 0, 1 or 2. By expanding these
possibilities explicitly, the compile warning is suppressed.
Fix the following error with clang-19 when -Werror is set:
In file included from drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c:5:
In file included from ./include/linux/gfp.h:7:
In file included from ./include/linux/mmzone.h:8:
In file included from ./include/linux/spinlock.h:63:
In file included from ./include/linux/lockdep.h:14:
In file included from ./include/linux/smp.h:13:
In file included from ./include/linux/cpumask.h:12:
In file included from ./include/linux/bitmap.h:13:
In file included from ./include/linux/string.h:392:
./include/linux/fortify-string.h:571:4: error: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror,-Wattribute-warning]
571 | __write_overflow_field(p_size_field, size);
| ^
1 error generated.
According to the testing, we can be fairly certain that this is a clang
compiler bug, impacting only clang-19 and below. Clang versions 20 and
21 do not exhibit this behavior.
Link: https://lore.kernel.org/all/484364B641C901CD+20250311141025.1624528-1-wangyuli@uniontech.com/
Fixes: 7585cacdb978 ("mlxsw: spectrum_acl: Add Bloom filter handling")
Co-developed-by: Zijian Chen <czj2441@163.com>
Signed-off-by: Zijian Chen <czj2441@163.com>
Co-developed-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Link: https://patch.msgid.link/A1858F1D36E653E0+20250318103654.708077-1-wangyuli@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When hardware floods packets to bridge ports, but flooding to a VXLAN
bridge port fails during encapsulation to one of the remote VTEPs, the
packets are trapped to the CPU. In such a case, the packets are marked
with skb->offload_fwd_mark, which means that the packet was L2-forwarded
in hardware. The software data path repeats the flooding, but packets
which are marked with skb->offload_fwd_mark will not be flooded by the
bridge to bridge ports which are in the same hardware domain as the
ingress port.
Currently, mlxsw does not add VXLAN bridge ports to the same hardware
domain as physical bridge ports despite the fact that the device is able
to forward packets to and from VXLAN tunnels in hardware. In some scenarios
(as mentioned above) this can result in remote VTEPs receiving duplicate
packets. The packets are first flooded by hardware and after an
encapsulation failure, they are flooded again to all remote VTEPs by
software.
Solve this by adding VXLAN bridge ports to the same hardware domain as
physical bridge ports, so that nbp_switchdev_allowed_egress() will
return false also for VXLAN, and packets will not be sent twice from
the VXLAN device.
switchdev_bridge_port_offload() should get vxlan_dev not as const, so
some changes are required. Call the switchdev API from
mlxsw_sp_bridge_vxlan_{join,leave}(), which handle offload
configurations.
Reported-by: Vladimir Oltean <olteanv@gmail.com>
Closes: https://lore.kernel.org/all/20250210152246.4ajumdchwhvbarik@skbuf/
Reported-by: Vladyslav Mykhaliuk <vmykhaliuk@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7279056843140fae3a72c2d204c7886b79d03899.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The next patch will call __mlxsw_sp_bridge_vxlan_leave() from
mlxsw_sp_bridge_vxlan_join() as part of the error flow; move the
function so that it can be called from there.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/64750a0965536530482318578bada30fac372b8a.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is asymmetry in how the VXLAN join and leave functions are used.
The join function (mlxsw_sp_bridge_vxlan_join()) is only called in
response to netdev events (e.g., VXLAN device joining a bridge), but the
leave function is also called in response to switchdev events (e.g.,
VLAN configuration on top of the VXLAN device) in order to invalidate
VNI to FID mappings.
This asymmetry will cause problems when the functions will be later
extended to mark VXLAN bridge ports as offloaded or not.
Therefore, create an internal function (__mlxsw_sp_bridge_vxlan_leave())
that is used to invalidate VNI to FID mappings and call it from
mlxsw_sp_bridge_vxlan_leave() which will only be invoked in response to
netdev events, like mlxsw_sp_bridge_vxlan_join().
No functional changes intended.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/f3a32bd2d87a0b7ac4d2bb98a427dc6d95a01cd0.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlxsw_sp_bridge_vxlan_{join,leave}() are not called when a VXLAN device
joins or leaves a VLAN-aware bridge. As mentioned in the comment - when
the bridge is VLAN-aware, the VNI of the VXLAN device needs to be mapped
to a VLAN, but at this point no VLANs are configured on the VXLAN
device. This means that we can call the APIs, but there is no point in
doing so, as they do not configure anything in such cases.
Next patch will extend mlxsw_sp_bridge_vxlan_{join,leave}() to set hardware
domain for VXLAN, this should be done also when a VXLAN device joins or
leaves a VLAN-aware bridge. Call the APIs, which for now do not do anything
in these flows.
Align the call to mlxsw_sp_bridge_vxlan_leave() with
mlxsw_sp_bridge_vxlan_join(), calling it only when the VXLAN device is
up, and move the check to before the calls to
mlxsw_sp_bridge_vxlan_{join,leave}(). This does not change the existing
behavior, as there is a similar check inside
mlxsw_sp_bridge_vxlan_leave().
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/994c1ea93520f9ea55d1011cd47dc2180d526484.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The next patch will set the same hardware domain for all bridge ports,
including VXLAN, to prevent packets from being forwarded by software
when they were already forwarded by hardware.
ARP packets are not flooded by hardware to VXLAN, so software should
handle such flooding. When the hardware domain of the VXLAN device is
changed, ARP packets which are trapped and marked with offload_fwd_mark
will not be flooded to VXLAN in software either, which will break VXLAN
traffic.
To prevent such breakage, trap ARP packets at layer 2 and don't mark
them as L2-forwarded in hardware; then ARP flooding will be done only
in software, and VXLAN will send the ARP packets.
Remove NVE_ENCAP_ARP, which is no longer needed, as ARP packets are now
trapped when they enter the device.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/b2a2cc607a1f4cb96c10bd3b0b0244ba3117fd2e.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Always set PAGE_POOL_STATS in the mlx5 Eth driver.
Clean up the corresponding #ifdefs.
Page pool stats are essential to monitor and analyze RX performance.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/1742412199-159596-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use bitmap_free() to free memory allocated with bitmap_zalloc_node().
This fixes the following memtrack error:
mtl rsc inconsistency: memtrack_free: .../drivers/net/ethernet/mellanox/mlx5/core/en_main.c::466: kfree for unknown address=0xFFFF0000CA3619E8, device=0x0
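In short, memory from bitmap_zalloc_node() must be released with
bitmap_free(), not kfree(); illustrative pairing:

unsigned long *bitmap = bitmap_zalloc_node(nbits, GFP_KERNEL, node);
if (!bitmap)
        return -ENOMEM;
/* ... use the bitmap ... */
bitmap_free(bitmap);    /* was incorrectly kfree(bitmap) */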
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix coccinelle warnings:
WARNING: NULL check before dev_{put, hold} functions is not needed.
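I.e., the flagged pattern and its fix:

/* before: redundant NULL check */
if (netdev)
        dev_put(netdev);

/* after: dev_put() is a no-op for NULL */
dev_put(netdev);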
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These commands can be used to add an RSS context and steer some traffic
into it:
# ethtool -X eth0 context new
New RSS context is 1
# ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1
Added rule with ID 1023
However, the second command fails with EINVAL on mlx5e:
# ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1
rmgr: Cannot insert RX class rule: Invalid argument
Cannot insert classification rule
It happens when flow_get_tirn calls flow_type_to_traffic_type with
flow_type = IP_USER_FLOW or IPV6_USER_FLOW. That function only handles
the IPV4_FLOW and IPV6_FLOW cases, but unlike all other cases, which are
common for hash and spec, IPv4 and IPv6 define different constants for
hash and for spec:
#define TCP_V4_FLOW 0x01 /* hash or spec (tcp_ip4_spec) */
#define UDP_V4_FLOW 0x02 /* hash or spec (udp_ip4_spec) */
...
#define IPV4_USER_FLOW 0x0d /* spec only (usr_ip4_spec) */
#define IP_USER_FLOW IPV4_USER_FLOW
#define IPV6_USER_FLOW 0x0e /* spec only (usr_ip6_spec; nfc only) */
#define IPV4_FLOW 0x10 /* hash only */
#define IPV6_FLOW 0x11 /* hash only */
Extend the switch in flow_type_to_traffic_type to support both, which
fixes the failing ethtool -N command with flow-type ip4 or ip6.
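The extended switch presumably gains the spec-only cases alongside the
hash-only ones, along these lines (surrounding function abbreviated):

switch (flow_type) {
case IPV4_FLOW:
case IP_USER_FLOW:      /* IPV4_USER_FLOW, spec only */
        return MLX5_TT_IPV4;
case IPV6_FLOW:
case IPV6_USER_FLOW:    /* spec only */
        return MLX5_TT_IPV6;
}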
Fixes: 248d3b4c9a39 ("net/mlx5e: Support flow classification into RSS contexts")
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250319124508.3979818-1-maxim@isovalent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Display recovery events from the PPCNT recovery counters group. It
counts (per link) the total number of successful recovery events of any
recovery type during the port reset cycle.
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1742112876-2890-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Retrieve the number of fields supported by each PPCNT counter group
based on the FW capability for this group.
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742112876-2890-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Adjust the way physical layer counters group is accessed to match the
generic method used for accessing other PPCNT counter groups.
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1742112876-2890-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The code was incorrectly relying on the PCAM bit of
ppcnt_statistical_group for accessing per_lane_error_counters.
If the ppcnt_statistical_group PCAM bit was not set, we would not read
per_lane_error_counters, even when its own PCAM bit was set.
Given the existing device capabilities, this seems to cause no harm, so
this change primarily serves as cleanup.
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742112876-2890-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If a user requested to match on an unsupported combination of fields,
print the unsupported combination in the error message.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741780194-137519-4-git-send-email-tariqt@nvidia.com
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Wherever applicable, use the list_move() function instead of
list_del() + list_add().
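For reference, the transformation is:

/* before */
list_del(&entry->list);
list_add(&entry->list, &other_head);

/* after */
list_move(&entry->list, &other_head);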
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741780194-137519-3-git-send-email-tariqt@nvidia.com
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Alias flow tables are not in use by HWS - remove the unused code.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741780194-137519-2-git-send-email-tariqt@nvidia.com
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Introduce `mlx5_esw_devlink_rate_node_parent_set()` to allow assigning
a parent to scheduling nodes.
Implement `mlx5_esw_qos_node_update_parent()` and
`mlx5_esw_qos_node_validate_set_parent()` to enforce constraints on
node reassignment.
Don't allow reassignment of nodes with active rate objects.
Update `esw_qos_node_set_parent()` to handle cases where
the parent is NULL. A NULL parent indicates that the scheduling element
is attached to the root scheduling element, and since only rate nodes
can be connected to the root, this update is now necessary.
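A sketch of the NULL-parent handling described above, with assumed
structure and field names (not the actual mlx5 code):

/* Sketch only: a NULL parent attaches the node to the root element. */
static void esw_qos_node_set_parent(struct mlx5_esw_sched_node *node,
                                    struct mlx5_esw_sched_node *parent)
{
        list_del_init(&node->entry);
        node->parent = parent;
        if (parent)
                list_add_tail(&node->entry, &parent->children);
        else
                /* only rate nodes may connect to the root */
                list_add_tail(&node->entry,
                              &node->esw->qos.domain->nodes);
}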
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-5-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Modify `esw_qos_create_node_sched_elem()` to receive max_rate and
bw_share values while maintaining the previous configuration.
This change is essential for the upcoming patch that will modify rate
nodes and requires the existing settings to be preserved unless
explicitly changed.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-4-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add a `level` field to `mlx5_esw_sched_node` to track the hierarchy
depth of each scheduling node. This allows enforcement of the
scheduling depth constraints based on `log_esw_max_sched_depth`.
Modify `esw_qos_node_set_parent()` and `__esw_qos_alloc_node()` to
correctly assign hierarchy levels. Ensure that nodes inherit their
parent’s level incrementally.
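Conceptually, with assumed field and helper names:

/* A node attached to the root is level 1; children go one deeper. */
node->level = parent ? parent->level + 1 : 1;
/* hypothetical helper derived from log_esw_max_sched_depth */
if (node->level > esw_qos_max_depth(esw->dev))
        return -EOPNOTSUPP;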
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-3-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Rename `mlx5_esw_devlink_rate_parent_set()` to
`mlx5_esw_devlink_rate_leaf_parent_set()` to distinguish setting a
parent for leafs from nodes, which is not yet supported.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-2-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for the HW Steering action of a flow sampler destination.
For each flow sampler created, cache the HWS action, keyed by sampler
ID. Hold a refcount for each rule using the cached action.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1741543663-22123-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for the HW Steering action of a flow meter range. A flow
meter range can use one HWS action for the whole range. Thus, share a
cached HWS action among rules that use the same flow meter object range.
Hold a refcount for each rule using the cached action.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1741543663-22123-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Counter HWS actions are shared using a refcount: an action is created
on demand by a flow steering rule and destroyed only when no rules are
using it. The method is extensible to other HWS action types, such as
the flow meter and sampler actions in the downstream patches.
Add an API to facilitate the reuse of the get/put logic for HWS actions
shared by refcount.
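A hedged sketch of such a get/put pair (names and locking are
illustrative, not the actual mlx5 API):

struct shared_hws_action {
        struct mlx5hws_action *action;
        refcount_t refcount;
        struct mutex lock;      /* serializes create and destroy */
};

static struct mlx5hws_action *
shared_hws_action_get(struct shared_hws_action *shared)
{
        /* Fast path: the action exists, just take a reference. */
        if (refcount_inc_not_zero(&shared->refcount))
                return shared->action;

        mutex_lock(&shared->lock);
        if (refcount_inc_not_zero(&shared->refcount)) {
                mutex_unlock(&shared->lock);
                return shared->action;
        }
        shared->action = create_hws_action();   /* assumed constructor */
        if (shared->action)
                refcount_set(&shared->refcount, 1);
        mutex_unlock(&shared->lock);
        return shared->action;
}

static void shared_hws_action_put(struct shared_hws_action *shared)
{
        if (refcount_dec_and_mutex_lock(&shared->refcount,
                                        &shared->lock)) {
                destroy_hws_action(shared->action);     /* assumed */
                shared->action = NULL;
                mutex_unlock(&shared->lock);
        }
}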
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1741543663-22123-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.14-rc6).
Conflicts:
tools/testing/selftests/drivers/net/ping.py
75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py")
de94e8697405 ("selftests: drv-net: store addresses in dict indexed by ipver")
https://lore.kernel.org/netdev/20250311115758.17a1d414@canb.auug.org.au/
net/core/devmem.c
a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()")
1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
https://lore.kernel.org/netdev/20250313114929.43744df1@canb.auug.org.au/
Adjacent changes:
tools/testing/selftests/net/Makefile
6f50175ccad4 ("selftests: Add IPv6 link-local address generation tests for GRE devices.")
2e5584e0f913 ("selftests/net: expand cmsg_ipv6.sh with ipv4")
drivers/net/ethernet/broadcom/bnxt/bnxt.c
661958552eda ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic")
fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
mlx5_eswitch_get_vepa returns -EPERM if the device lacks
eswitch_manager capability, blocking mlx5e_bridge_getlink from
retrieving VEPA mode. Since mlx5e_bridge_getlink implements
ndo_bridge_getlink, returning -EPERM causes bridge link show to fail
instead of skipping devices without this capability.
To avoid this, return -EOPNOTSUPP from mlx5e_bridge_getlink when
mlx5_eswitch_get_vepa fails, ensuring the command continues processing
other devices while ignoring those without the necessary capability.
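In other words, the error from mlx5_eswitch_get_vepa() is no longer
propagated verbatim; a sketch of the change in mlx5e_bridge_getlink():

err = mlx5_eswitch_get_vepa(esw, &setting);
if (err)
        return -EOPNOTSUPP;     /* was: return err; -EPERM broke the dump */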
Fixes: 4b89251de024 ("net/mlx5: Support ndo bridge_setlink and getlink")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When removing a LAG device from a bridge, a NETDEV_CHANGEUPPER event is
triggered. The driver finds the lower devices (PFs) to flush all the
offloaded entries, and mlx5_lag_is_shared_fdb is checked; it returns
false if one of the PFs is unloaded. In such a case,
mlx5_esw_bridge_lag_rep_get() and its caller return NULL instead of
the alive PF, and the flush is skipped.
Besides, the bridge fdb entry's lastuse is updated in the mlx5 bridge
event handler. But this SWITCHDEV_FDB_ADD_TO_BRIDGE event can be
ignored in this case because the upper interface for the bond is
deleted, and the entry will never be aged because lastuse is never
updated.
To make things worse, as the entry stays alive, the mlx5 bridge
workqueue keeps sending that event, which is then handled by the kernel
bridge notifier. It causes the following crash when accessing the
passed bond netdev, which is already destroyed.
To fix this issue, remove such checks. LAG state is already checked in
commit 15f8f168952f ("net/mlx5: Bridge, verify LAG state when adding
bond to bridge"); the driver still needs to skip offload if the LAG
enters an invalid state after initialization.
Oops: stack segment: 0000 [#1] SMP
CPU: 3 UID: 0 PID: 23695 Comm: kworker/u40:3 Tainted: G OE 6.11.0_mlnx #1
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_bridge_wq mlx5_esw_bridge_update_work [mlx5_core]
RIP: 0010:br_switchdev_event+0x2c/0x110 [bridge]
Code: 44 00 00 48 8b 02 48 f7 00 00 02 00 00 74 69 41 54 55 53 48 83 ec 08 48 8b a8 08 01 00 00 48 85 ed 74 4a 48 83 fe 02 48 89 d3 <4c> 8b 65 00 74 23 76 49 48 83 fe 05 74 7e 48 83 fe 06 75 2f 0f b7
RSP: 0018:ffffc900092cfda0 EFLAGS: 00010297
RAX: ffff888123bfe000 RBX: ffffc900092cfe08 RCX: 00000000ffffffff
RDX: ffffc900092cfe08 RSI: 0000000000000001 RDI: ffffffffa0c585f0
RBP: 6669746f6e690a30 R08: 0000000000000000 R09: ffff888123ae92c8
R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888123ae9c60
R13: 0000000000000001 R14: ffffc900092cfe08 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f15914c8734 CR3: 0000000002830005 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __die_body+0x1a/0x60
? die+0x38/0x60
? do_trap+0x10b/0x120
? do_error_trap+0x64/0xa0
? exc_stack_segment+0x33/0x50
? asm_exc_stack_segment+0x22/0x30
? br_switchdev_event+0x2c/0x110 [bridge]
? sched_balance_newidle.isra.149+0x248/0x390
notifier_call_chain+0x4b/0xa0
atomic_notifier_call_chain+0x16/0x20
mlx5_esw_bridge_update+0xec/0x170 [mlx5_core]
mlx5_esw_bridge_update_work+0x19/0x40 [mlx5_core]
process_scheduled_works+0x81/0x390
worker_thread+0x106/0x250
? bh_worker+0x110/0x110
kthread+0xb7/0xe0
? kthread_park+0x80/0x80
ret_from_fork+0x2d/0x50
? kthread_park+0x80/0x80
ret_from_fork_asm+0x11/0x20
</TASK>
Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, MultiPort E-Switch requests to create a LAG with shared
FDB without checking that the LAG supports shared FDB.
Add the check.
Fixes: a32327a3a02c ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
mlx5_irq_pool_get() is a getter for completion IRQ pool only.
However, after the cited commit, mlx5_irq_pool_get() is called during
ctrl IRQ release flow to retrieve the pool, resulting in the use of an
incorrect IRQ pool.
Hence, use the newly introduced mlx5_irq_get_pool() getter to retrieve
the correct IRQ pool based on the IRQ itself. While at it, rename
mlx5_irq_pool_get() to mlx5_irq_table_get_comp_irq_pool() which
accurately reflects its purpose and improves code readability.
Fixes: 0477d5168bbb ("net/mlx5: Expose SFs IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The bwc layer was clamping the matcher priority from 32 bits to 16 bits.
This didn't show up until a matcher was resized, since the initial
native matcher was created using the correct 32 bit value.
The fix also reorders fields to avoid some padding.
Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741644104-97767-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Some actions in ConnectX-8 (STEv3) have different structure,
and they are handled separately in ste_ctx_v3.
This separate handling was missing two actions: INSERT_HDR
and REMOVE_HDR, which broke SWS for Linux Bridge.
This patch resolves the issue by introducing dedicated
callbacks for the insert and remove header functions,
with version-specific implementations for each STE variant.
Fixes: 4d617b57574f ("net/mlx5: DR, add support for ConnectX-8 steering")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741644104-97767-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2025-03-10
The following pull-request contains common mlx5 updates for your *net-next* tree.
Please pull and let me know of any problem.
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Add IFC bits for PPCNT recovery counters group
net/mlx5: fs, add RDMA TRANSPORT steering domain support
net/mlx5: Query ADV_RDMA capabilities
net/mlx5: Limit non-privileged commands
net/mlx5: Allow the throttle mechanism to be more dynamic
net/mlx5: Add RDMA_CTRL HW capabilities
====================
Link: https://patch.msgid.link/1741608293-41436-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Although it does not seem to have any untoward side effects,
the use of ';' to separate two assignments seems more appropriate
than ','.
Flagged by clang-19 -Wcomma
No functional change intended.
Compile tested only.
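The pattern in question, with illustrative variables:

/* before: ',' joins two assignments (flagged by clang's -Wcomma) */
a = x, b = y;

/* after */
a = x;
b = y;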
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250307-mlx5-comma-v1-1-934deb6927bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|