|
The procedure of attaching an action template to an existing matcher had
a few issues:
1. Attaching accidentally overran the `at` array in bwc_matcher, which
would result in memory corruption. This bug wasn't triggered, but it
is possible to trigger it by attaching action templates beyond the
initial buffer size of 8. Fix this by converting to a dynamically
sized buffer and reallocating if needed.
2. Similarly, the `at` array inside the native matcher was never
reallocated. Fix this the same as above.
3. The bwc layer treated any error in action template attach as a signal
that the matcher should be rehashed to account for a larger number of
action STEs. In reality, there are other unrelated errors that can
arise and they should be propagated upstack. Fix this by adding a
`need_rehash` output parameter that's orthogonal to error codes.
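A minimal sketch of fixes 1 and 3, assuming illustrative structure and
field names (this is not the actual mlx5 HWS code):

/* Sketch: grow the action template array on demand and report the
 * rehash condition separately from real errors. */
static int bwc_matcher_attach_at(struct bwc_matcher *matcher,
                                 struct action_template *at,
                                 bool *need_rehash)
{
        struct action_template **tmp;

        *need_rehash = false;

        if (matcher->num_at == matcher->size_at) {
                tmp = krealloc(matcher->at,
                               2 * matcher->size_at * sizeof(*tmp),
                               GFP_KERNEL);
                if (!tmp)
                        return -ENOMEM; /* real error, propagate upstack */
                matcher->at = tmp;
                matcher->size_at *= 2;
        }

        matcher->at[matcher->num_at++] = at;

        /* Needing more action STEs is not an error; ask for a rehash. */
        if (at->num_of_action_stes > matcher->num_of_action_stes)
                *need_rehash = true;

        return 0;
}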
Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/1744312662-356571-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since we are about to stash some more information into the pp_magic
field, let's move the magic signature checks into a pair of helper
functions so it can be changed in one place.
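For illustration, the helper pair could look roughly like this; the
exact names and mask are assumptions, not necessarily what the patch
uses:

/* Sketch: centralize the signature check so stashing extra bits into
 * pp_magic later only requires changing the mask here. */
static inline bool page_pool_page_is_pp(struct page *page)
{
        return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
}

static inline bool page_pool_netmem_is_pp(netmem_ref netmem)
{
        return page_pool_page_is_pp(netmem_to_page(netmem));
}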
Reviewed-by: Mina Almasry <almasrymina@google.com>
Tested-by: Yonglong Liu <liuyonglong@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409-page-pool-track-dma-v9-1-6a9ef2e0cba8@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
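The conversion is a mechanical rename, e.g.:

del_timer(&foo->timer);            /* old */
timer_delete(&foo->timer);         /* new */

del_timer_sync(&foo->timer);       /* old */
timer_delete_sync(&foo->timer);    /* new */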
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With commit 8533b14b3d65 ("eth: mlx4: create a page pool for Rx") mlx4
started using functions guarded by PAGE_POOL. This change introduced
build errors when CONFIG_MLX4_EN is set but CONFIG_PAGE_POOL is not:
ld: vmlinux.o: in function `mlx4_en_alloc_frags':
en_rx.c:(.text+0xa5eaf9): undefined reference to `page_pool_alloc_pages'
ld: vmlinux.o: in function `mlx4_en_create_rx_ring':
(.text+0xa5ee91): undefined reference to `page_pool_create'
Make MLX4_EN select PAGE_POOL to fix the mlx4 build errors.
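The fix is a one-line Kconfig change of roughly this shape (the exact
dependency list in the tree may differ):

config MLX4_EN
        tristate "Mellanox Technologies 1/10/40Gbit Ethernet support"
        depends on PCI
        select MLX4_CORE
        select PAGE_POOL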
Fixes: 8533b14b3d65 ("eth: mlx4: create a page pool for Rx")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250401015315.2306092-1-gthelen@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Rather tiny pull request, mostly so that we can get into our trees
your fix to the x86 Makefile.
Current release - regressions:
- Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc", error
queue accounting was missed
Current release - new code bugs:
- 5 fixes for the netdevice instance locking work
Previous releases - regressions:
- usbnet: restore usb%d name exception for local mac addresses
Previous releases - always broken:
- rtnetlink: allocate vfinfo size for VF GUIDs when supported, avoid
spurious GET_LINK failures
- eth: mana: Switch to page pool for jumbo frames
- phy: broadcom: Correct BCM5221 PHY model detection
Misc:
- selftests: drv-net: replace helpers for referring to other files"
* tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits)
Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc"
bnxt_en: bring back rtnl lock in bnxt_shutdown
eth: gve: add missing netdev locks on reset and shutdown paths
selftests: mptcp: ignore mptcp_diag binary
selftests: mptcp: close fd_in before returning in main_loop
selftests: mptcp: fix incorrect fd checks in main_loop
mptcp: fix NULL pointer in can_accept_new_subflow
octeontx2-af: Free NIX_AF_INT_VEC_GEN irq
octeontx2-af: Fix mbox INTR handler when num VFs > 64
net: fix use-after-free in the netdev_nl_sock_priv_destroy()
selftests: net: use Path helpers in ping
selftests: net: use the dummy bpf from net/lib
selftests: drv-net: replace the rpath helper with Path objects
net: lapbether: use netdev_lockdep_set_classes() helper
net: phy: broadcom: Correct BCM5221 PHY model detection
net: usb: usbnet: restore usb%d name exception for local mac addresses
net/mlx5e: SHAMPO, Make reserved size independent of page size
net: mana: Switch to page pool for jumbo frames
MAINTAINERS: Add dedicated entries for phy_link_topology
net: move replay logic to tc_modify_qdisc
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull fwctl subsystem from Jason Gunthorpe:
"fwctl is a new subsystem intended to bring some common rules and order
to the growing pattern of exposing a secure FW interface directly to
userspace.
Unlike existing places like RDMA/DRM/VFIO/uacce that expose a device
for datapath operations, fwctl is focused on debugging,
configuration and provisioning of the device. It will not have the
necessary features like interrupt delivery to support a datapath.
This concept is similar to the long-standing practice in the "HW" RAID
space of having a device-specific misc device to manage the RAID
controller FW. fwctl generalizes this notion of a companion debug and
management interface that goes along with a dataplane implemented in
an appropriate subsystem.
There have been three LWN articles written discussing various aspects
of this:
https://lwn.net/Articles/955001/
https://lwn.net/Articles/969383/
https://lwn.net/Articles/990802/
This includes three drivers to launch the subsystem:
- CXL provides a vendor scheme for executing commands and a way to
learn the 'command effects' (ie the security properties) of such
commands. The fwctl driver allows access to these mechanism within
the fwctl security model
- mlx5 is a family of networking products; the driver supports all
current Mellanox HW still receiving FW feature updates. This
includes RDMA multiprotocol NICs like ConnectX and the Bluefield
family of Smart NICs.
- AMD/Pensando Distributed Services card is a multi-protocol Smart
NIC with a multi-PCI-function design. fwctl works on the management
PCI function following a 'command effects' model similar to CXL"
* tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (30 commits)
pds_fwctl: add Documentation entries
pds_fwctl: add rpc and query support
pds_fwctl: initial driver framework
pds_core: add new fwctl auxiliary_device
pds_core: specify auxiliary_device to be created
pds_core: make pdsc_auxbus_dev_del() void
cxl: Fixup kdoc issues for include/cxl/features.h
fwctl/cxl: Add documentation to FWCTL CXL
cxl/test: Add Set Feature support to cxl_test
cxl/test: Add Get Feature support to cxl_test
cxl: Add support to handle user feature commands for set feature
cxl: Add support to handle user feature commands for get feature
cxl: Add support for fwctl RPC command to enable CXL feature commands
cxl: Move cxl feature command structs to user header
cxl: Add FWCTL support to CXL
mlx5: Create an auxiliary device for fwctl_mlx5
fwctl/mlx5: Support for communicating with mlx5 fw
fwctl: Add documentation
fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
taint: Add TAINT_FWCTL
...
|
|
When hw-gro is enabled, the maximum number of header entries that are
needed per WQE (hd_per_wqe) is calculated based on, among other
parameters, the size of the reservations.
Miscalculation of the size of reservations leads to hd_per_wqe being
incorrectly calculated as 0, particularly with a large page size as on
aarch64. This prevents the SHAMPO header from being correctly
initialized in the device, ultimately causing the following CQE error,
which indicates a PD violation.
mlx5_core 0000:00:08.0 eth2: ERR CQE on RQ: 0x1180
mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x510, ci 0x0, qn 0x1180, opcode 0xe, syndrome 0x4, vendor syndrome 0x32
00000000: 00 00 00 00 04 4a 00 00 00 00 00 00 20 00 93 32
00000010: 55 00 00 00 fb cc 00 00 00 00 00 00 07 18 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4a
00000030: 00 00 00 9a 93 00 32 04 00 00 00 00 00 00 da e1
Use the correct formula for calculating the size of reservations;
precisely, it shouldn't depend on page size, but should instead use the
correct multiple of MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE.
Fixes: e5ca8fb08ab2 ("net/mlx5e: Add control path for SHAMPO feature")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742732906-166564-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Merge in late fixes to prepare for the 6.15 net-next PR.
No conflicts, adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
919f9f497dbc ("eth: bnxt: fix out-of-range access of vnic_info array")
fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2025-03-24
1) Prevent setting high order sequence number bits input in
non-ESN mode. From Leon Romanovsky.
2) Support PMTU handling in tunnel mode for packet offload.
From Leon Romanovsky.
3) Make xfrm_state_lookup_byaddr lockless.
From Florian Westphal.
4) Remove unnecessary NULL check in xfrm_lookup_with_ifid().
From Dan Carpenter.
* tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Remove unnecessary NULL check in xfrm_lookup_with_ifid()
xfrm: state: make xfrm_state_lookup_byaddr lockless
xfrm: check for PMTU in tunnel mode for packet offload
xfrm: provide common xdo_dev_offload_ok callback implementation
xfrm: rely on XFRM offload
xfrm: simplify SA initialization routine
xfrm: delay initialization of offload path till its actually requested
xfrm: prevent high SEQ input in non-ESN mode
====================
Link: https://patch.msgid.link/20250324061855.4116819-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A CT action with the commit argument is usually followed by a forward
action, either to the output netdev or to the next chain. The default
software behavior, if it's the last action, is to drop, by setting the
action attribute to TC_ACT_SHOT instead of TC_ACT_PIPE. But the driver
can't handle this, so block the offload in such a case.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a NIC mode CT setup where we do hairpin between the two
NICs, both NICs register to the same flow table (per zone)
and try to offload all rules on it.
Instead, filter the rules by the NIC they originated from,
so that only one side is offloaded for each NIC.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Align mlx5 driver usage of 'pfnum' with the documentation clarification
introduced in commit bb70b0d48d8e ("devlink: Improve the port attributes
description").
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, the check of whether the PCI bridge is accessible is done
only as part of the bridge hot plug capability check. If the PCI bridge
is not accessible, reset now will fail regardless of bridge hotplug
capability. Move this check to mlx5_is_reset_now_capable(), which, in
such a case, aborts the reset, and does so in the request phase instead
of the reset now phase.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As queue affinity is being deprecated and will no longer be supported
in the future, always check for the presence of the port selection
namespace. When available, leverage it to distribute traffic
across the physical ports via steering, ensuring compatibility with
future NICs.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742392983-153050-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For simplicity reasons, the driver avoids crossing work queue fragment
boundaries within the same TX WQE (Work-Queue Element). Until today, as
the number of packets in a TX MPWQE (Multi-Packet WQE) descriptor is not
known in advance, the driver pre-prepared contiguous memory for the
largest possible WQE. Because of this, when getting too close to the
fragment edge, with no room left for the largest possible WQE, the
driver filled the fragment remainder with NOP descriptors, aligning the
next descriptor to the beginning of the next fragment.
Generating and handling these NOPs wastes resources, such as CPU
cycles, work-queue entries fetched by the device, and PCI bandwidth.
In this patch, we replace this NOP-filling mechanism in the TX MPWQE
flow. Instead, we utilize the remaining entries of the fragment with a
TX MPWQE. If this room turns out to be too small, we simply open an
additional descriptor starting at the beginning of the next fragment.
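A rough sketch of the new session-open logic, with hypothetical helper
and field names (not the actual mlx5e code):

static void mlx5e_tx_mpwqe_session_open(struct mlx5e_txqsq *sq, u16 pi)
{
        u16 room = wqebbs_to_frag_edge(sq, pi); /* hypothetical helper */

        /* Old: if `room` could not hold the largest WQE, fill it with
         * NOPs and start at the next fragment.
         * New: cap the session at `room`; if a packet doesn't fit, the
         * session is closed and another descriptor is opened at the
         * start of the next fragment. */
        sq->mpwqe.max_bbs = min_t(u16, room, MLX5_SEND_WQE_MAX_WQEBBS);
}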
Performance benchmark:
uperf test, single server against 3 clients.
TCP multi-stream, bidir, traffic profile "2x350B read, 1400B write".
Bottleneck is in inbound PCI bandwidth (device POV).
+---------------+------------+------------+--------+
| | Before | After | |
+---------------+------------+------------+--------+
| BW | 117.4 Gbps | 121.1 Gbps | +3.1% |
+---------------+------------+------------+--------+
| tx_packets | 15 M/sec | 15.5 M/sec | +3.3% |
+---------------+------------+------------+--------+
| tx_nops | 3 M/sec | 0 | -100% |
+---------------+------------+------------+--------+
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1742391746-118647-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The health poll mechanism performs periodic checks to detect firmware
errors. One of the checks verifies that the function is still enabled
on the firmware side, but the function is enabled only after the
enable_hca command completes. Start the health poll after the
enable_hca command to avoid a race between the function being enabled
and the first health poll.
Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742331077-102038-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When LAG creation fails, the driver reloads the RDMA devices. If RDMA
representors are present, they should also be reloaded. This step was
missed in the cited commit.
Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742331077-102038-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a workaround to mitigate a compiler anomaly.
During LLVM toolchain compilation of this driver on the s390x
architecture, an unreasonable __write_overflow_field warning occurs.
Contextually, chunk_index is restricted to 0, 1 or 2. By expanding these
possibilities explicitly, the compile warning is suppressed.
Fix the following error with clang-19 when -Werror is set:
In file included from drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c:5:
In file included from ./include/linux/gfp.h:7:
In file included from ./include/linux/mmzone.h:8:
In file included from ./include/linux/spinlock.h:63:
In file included from ./include/linux/lockdep.h:14:
In file included from ./include/linux/smp.h:13:
In file included from ./include/linux/cpumask.h:12:
In file included from ./include/linux/bitmap.h:13:
In file included from ./include/linux/string.h:392:
./include/linux/fortify-string.h:571:4: error: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror,-Wattribute-warning]
571 | __write_overflow_field(p_size_field, size);
| ^
1 error generated.
According to the testing, we can be fairly certain that this is a clang
compiler bug, impacting only clang-19 and below. Clang versions 20 and
21 do not exhibit this behavior.
Link: https://lore.kernel.org/all/484364B641C901CD+20250311141025.1624528-1-wangyuli@uniontech.com/
Fixes: 7585cacdb978 ("mlxsw: spectrum_acl: Add Bloom filter handling")
Co-developed-by: Zijian Chen <czj2441@163.com>
Signed-off-by: Zijian Chen <czj2441@163.com>
Co-developed-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Link: https://patch.msgid.link/A1858F1D36E653E0+20250318103654.708077-1-wangyuli@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When hardware floods packets to bridge ports, but flooding to a VXLAN
bridge port fails during encapsulation to one of the remote VTEPs, the
packets are trapped to the CPU. In such a case, the packets are marked
with skb->offload_fwd_mark, which means that the packet was L2-forwarded
in hardware. The software data path repeats the flooding, but packets
which are marked with skb->offload_fwd_mark will not be flooded by the
bridge to bridge ports which are in the same hardware domain as the
ingress port.
Currently, mlxsw does not add VXLAN bridge ports to the same hardware
domain as physical bridge ports despite the fact that the device is able
to forward packets to and from VXLAN tunnels in hardware. In some scenarios
(as mentioned above) this can result in remote VTEPs receiving duplicate
packets. The packets are first flooded by hardware and after an
encapsulation failure, they are flooded again to all remote VTEPs by
software.
Solve this by adding VXLAN bridge ports to the same hardware domain as
physical bridge ports, so that nbp_switchdev_allowed_egress() will
return false also for VXLAN, and packets will not be sent twice from
the VXLAN device.
switchdev_bridge_port_offload() should get vxlan_dev not as const, so
some changes are required. Call the switchdev API from
mlxsw_sp_bridge_vxlan_{join,leave}(), which handle offload
configurations.
Reported-by: Vladimir Oltean <olteanv@gmail.com>
Closes: https://lore.kernel.org/all/20250210152246.4ajumdchwhvbarik@skbuf/
Reported-by: Vladyslav Mykhaliuk <vmykhaliuk@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7279056843140fae3a72c2d204c7886b79d03899.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The next patch will call __mlxsw_sp_bridge_vxlan_leave() from
mlxsw_sp_bridge_vxlan_join() as part of the error flow; move the
function so that it can be called from there.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/64750a0965536530482318578bada30fac372b8a.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is asymmetry in how the VXLAN join and leave functions are used.
The join function (mlxsw_sp_bridge_vxlan_join()) is only called in
response to netdev events (e.g., VXLAN device joining a bridge), but the
leave function is also called in response to switchdev events (e.g.,
VLAN configuration on top of the VXLAN device) in order to invalidate
VNI to FID mappings.
This asymmetry will cause problems when the functions will be later
extended to mark VXLAN bridge ports as offloaded or not.
Therefore, create an internal function (__mlxsw_sp_bridge_vxlan_leave())
that is used to invalidate VNI to FID mappings and call it from
mlxsw_sp_bridge_vxlan_leave() which will only be invoked in response to
netdev events, like mlxsw_sp_bridge_vxlan_join().
No functional changes intended.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/f3a32bd2d87a0b7ac4d2bb98a427dc6d95a01cd0.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlxsw_sp_bridge_vxlan_{join,leave}() are not called when a VXLAN device
joins or leaves a VLAN-aware bridge. As mentioned in the comment - when
the bridge is VLAN-aware, the VNI of the VXLAN device needs to be mapped
to a VLAN, but at this point no VLANs are configured on the VXLAN
device. This means that we can call the APIs, but there is no point in
doing so, as they do not configure anything in such cases.
Next patch will extend mlxsw_sp_bridge_vxlan_{join,leave}() to set hardware
domain for VXLAN, this should be done also when a VXLAN device joins or
leaves a VLAN-aware bridge. Call the APIs, which for now do not do anything
in these flows.
Align the call to mlxsw_sp_bridge_vxlan_leave() with
mlxsw_sp_bridge_vxlan_join(), calling it only when the VXLAN device is
up, and move the check to before the calls to
mlxsw_sp_bridge_vxlan_{join,leave}(). This does not change the existing
behavior, as there is a similar check inside
mlxsw_sp_bridge_vxlan_leave().
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/994c1ea93520f9ea55d1011cd47dc2180d526484.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The next patch will set the same hardware domain for all bridge ports,
including VXLAN, to prevent packets from being forwarded by software
when they were already forwarded by hardware.
ARP packets are not flooded by hardware to VXLAN, so software should
handle such flooding. When the hardware domain of the VXLAN device is
changed, ARP packets which are trapped and marked with offload_fwd_mark
will not be flooded to VXLAN in software either, which will break VXLAN
traffic.
To prevent such breakage, trap ARP packets at layer 2 and don't mark
them as L2-forwarded in hardware; then ARP flooding will be done only
in software, and VXLAN will send the ARP packets.
Remove NVE_ENCAP_ARP, which is no longer needed, as ARP packets are now
trapped when they enter the device.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/b2a2cc607a1f4cb96c10bd3b0b0244ba3117fd2e.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Always set PAGE_POOL_STATS in the mlx5 Eth driver.
Clean up the corresponding #ifdefs.
Page pool stats are essential to monitor and analyze RX performance.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/1742412199-159596-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use bitmap_free() to free memory allocated with bitmap_zalloc_node().
This fixes the following memtrack error:
mtl rsc inconsistency: memtrack_free: .../drivers/net/ethernet/mellanox/mlx5/core/en_main.c::466: kfree for unknown address=0xFFFF0000CA3619E8, device=0x0
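In short, memory from bitmap_zalloc_node() must be released with
bitmap_free(), not kfree(); illustrative pairing:

unsigned long *bitmap = bitmap_zalloc_node(nbits, GFP_KERNEL, node);
if (!bitmap)
        return -ENOMEM;
/* ... use the bitmap ... */
bitmap_free(bitmap);    /* was incorrectly kfree(bitmap) */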
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix coccinelle warnings:
WARNING: NULL check before dev_{put, hold} functions is not needed.
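I.e., the flagged pattern and its fix:

/* before: redundant NULL check */
if (netdev)
        dev_put(netdev);

/* after: dev_put() is a no-op for NULL */
dev_put(netdev);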
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These commands can be used to add an RSS context and steer some traffic
into it:
# ethtool -X eth0 context new
New RSS context is 1
# ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1
Added rule with ID 1023
However, the second command fails with EINVAL on mlx5e:
# ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1
rmgr: Cannot insert RX class rule: Invalid argument
Cannot insert classification rule
It happens when flow_get_tirn calls flow_type_to_traffic_type with
flow_type = IP_USER_FLOW or IPV6_USER_FLOW. That function only handles
the IPV4_FLOW and IPV6_FLOW cases, but unlike all other cases, which are
common for hash and spec, IPv4 and IPv6 define different constants for
hash and for spec:
#define TCP_V4_FLOW 0x01 /* hash or spec (tcp_ip4_spec) */
#define UDP_V4_FLOW 0x02 /* hash or spec (udp_ip4_spec) */
...
#define IPV4_USER_FLOW 0x0d /* spec only (usr_ip4_spec) */
#define IP_USER_FLOW IPV4_USER_FLOW
#define IPV6_USER_FLOW 0x0e /* spec only (usr_ip6_spec; nfc only) */
#define IPV4_FLOW 0x10 /* hash only */
#define IPV6_FLOW 0x11 /* hash only */
Extend the switch in flow_type_to_traffic_type to support both, which
fixes the failing ethtool -N command with flow-type ip4 or ip6.
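The extended switch presumably gains the spec-only cases alongside the
hash-only ones, along these lines (surrounding function abbreviated):

switch (flow_type) {
case IPV4_FLOW:
case IP_USER_FLOW:      /* IPV4_USER_FLOW, spec only */
        return MLX5_TT_IPV4;
case IPV6_FLOW:
case IPV6_USER_FLOW:    /* spec only */
        return MLX5_TT_IPV6;
}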
Fixes: 248d3b4c9a39 ("net/mlx5e: Support flow classification into RSS contexts")
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250319124508.3979818-1-maxim@isovalent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Display recovery events from the PPCNT recovery counters group. It
counts (per link) the total number of successful recovery events of any
recovery type during the port reset cycle.
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1742112876-2890-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Retrieve the number of fields supported by each PPCNT counter group
based on the FW capability for this group.
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742112876-2890-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Adjust the way physical layer counters group is accessed to match the
generic method used for accessing other PPCNT counter groups.
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1742112876-2890-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The code was incorrectly relying on the PCAM bit of
ppcnt_statistical_group for accessing per_lane_error_counters.
If the ppcnt_statistical_group PCAM bit was not set, we would not read
per_lane_error_counters, even when its own PCAM bit was set.
Given the existing device capabilities, this seems to cause no harm, so
this change primarily serves as cleanup.
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742112876-2890-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If a user requested to match on an unsupported combination of fields,
print the unsupported combination in the error message.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741780194-137519-4-git-send-email-tariqt@nvidia.com
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Wherever applicable, use the list_move() function instead of
list_del() + list_add().
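For reference, the transformation is:

/* before */
list_del(&entry->list);
list_add(&entry->list, &other_head);

/* after */
list_move(&entry->list, &other_head);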
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741780194-137519-3-git-send-email-tariqt@nvidia.com
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Alias flow tables are not in use by HWS - remove the unused code.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741780194-137519-2-git-send-email-tariqt@nvidia.com
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Introduce `mlx5_esw_devlink_rate_node_parent_set()` to allow assigning
a parent to scheduling nodes.
Implement `mlx5_esw_qos_node_update_parent()` and
`mlx5_esw_qos_node_validate_set_parent()` to enforce constraints on
node reassignment.
Don't allow reassignment of nodes with active rate objects.
Update `esw_qos_node_set_parent()` to handle cases where
the parent is NULL. A NULL parent indicates that the scheduling element
is attached to the root scheduling element, and since only rate nodes
can be connected to the root, this update is now necessary.
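A sketch of the NULL-parent handling described above, with assumed
structure and field names (not the actual mlx5 code):

/* Sketch only: a NULL parent attaches the node to the root element. */
static void esw_qos_node_set_parent(struct mlx5_esw_sched_node *node,
                                    struct mlx5_esw_sched_node *parent)
{
        list_del_init(&node->entry);
        node->parent = parent;
        if (parent)
                list_add_tail(&node->entry, &parent->children);
        else
                /* only rate nodes may connect to the root */
                list_add_tail(&node->entry,
                              &node->esw->qos.domain->nodes);
}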
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-5-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Modify `esw_qos_create_node_sched_elem()` to receive max_rate and
bw_share values while maintaining the previous configuration.
This change is essential for the upcoming patch that will modify rate
nodes and requires the existing settings to be preserved unless
explicitly changed.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-4-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add a `level` field to `mlx5_esw_sched_node` to track the hierarchy
depth of each scheduling node. This allows enforcement of the
scheduling depth constraints based on `log_esw_max_sched_depth`.
Modify `esw_qos_node_set_parent()` and `__esw_qos_alloc_node()` to
correctly assign hierarchy levels. Ensure that nodes inherit their
parent’s level incrementally.
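Conceptually, with assumed field and helper names:

/* A node attached to the root is level 1; children go one deeper. */
node->level = parent ? parent->level + 1 : 1;
/* hypothetical helper derived from log_esw_max_sched_depth */
if (node->level > esw_qos_max_depth(esw->dev))
        return -EOPNOTSUPP;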
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-3-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Rename `mlx5_esw_devlink_rate_parent_set()` to
`mlx5_esw_devlink_rate_leaf_parent_set()` to distinguish setting a
parent for leafs from nodes, which is not yet supported.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741642016-44918-2-git-send-email-tariqt@nvidia.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for the HW Steering action of a flow sampler destination.
For each flow sampler created, cache the HWS action, keyed by sampler
ID. Hold a refcount for each rule using the cached action.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1741543663-22123-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for the HW Steering action of a flow meter range. A flow
meter range can use one HWS action for the whole range. Thus, share a
cached HWS action among rules that use the same flow meter object range.
Hold a refcount for each rule using the cached action.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1741543663-22123-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Counter HWS actions are shared using a refcount: an action is created
on demand by a flow steering rule and destroyed only when no rules are
using it. The method is extensible to other HWS action types, such as
the flow meter and sampler actions in the downstream patches.
Add an API to facilitate the reuse of the get/put logic for HWS actions
shared by refcount.
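A hedged sketch of such a get/put pair (names and locking are
illustrative, not the actual mlx5 API):

struct shared_hws_action {
        struct mlx5hws_action *action;
        refcount_t refcount;
        struct mutex lock;      /* serializes create and destroy */
};

static struct mlx5hws_action *
shared_hws_action_get(struct shared_hws_action *shared)
{
        /* Fast path: the action exists, just take a reference. */
        if (refcount_inc_not_zero(&shared->refcount))
                return shared->action;

        mutex_lock(&shared->lock);
        if (refcount_inc_not_zero(&shared->refcount)) {
                mutex_unlock(&shared->lock);
                return shared->action;
        }
        shared->action = create_hws_action();   /* assumed constructor */
        if (shared->action)
                refcount_set(&shared->refcount, 1);
        mutex_unlock(&shared->lock);
        return shared->action;
}

static void shared_hws_action_put(struct shared_hws_action *shared)
{
        if (refcount_dec_and_mutex_lock(&shared->refcount,
                                        &shared->lock)) {
                destroy_hws_action(shared->action);     /* assumed */
                shared->action = NULL;
                mutex_unlock(&shared->lock);
        }
}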
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1741543663-22123-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.14-rc6).
Conflicts:
tools/testing/selftests/drivers/net/ping.py
75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py")
de94e8697405 ("selftests: drv-net: store addresses in dict indexed by ipver")
https://lore.kernel.org/netdev/20250311115758.17a1d414@canb.auug.org.au/
net/core/devmem.c
a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()")
1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
https://lore.kernel.org/netdev/20250313114929.43744df1@canb.auug.org.au/
Adjacent changes:
tools/testing/selftests/net/Makefile
6f50175ccad4 ("selftests: Add IPv6 link-local address generation tests for GRE devices.")
2e5584e0f913 ("selftests/net: expand cmsg_ipv6.sh with ipv4")
drivers/net/ethernet/broadcom/bnxt/bnxt.c
661958552eda ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic")
fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
mlx5_eswitch_get_vepa returns -EPERM if the device lacks
eswitch_manager capability, blocking mlx5e_bridge_getlink from
retrieving VEPA mode. Since mlx5e_bridge_getlink implements
ndo_bridge_getlink, returning -EPERM causes bridge link show to fail
instead of skipping devices without this capability.
To avoid this, return -EOPNOTSUPP from mlx5e_bridge_getlink when
mlx5_eswitch_get_vepa fails, ensuring the command continues processing
other devices while ignoring those without the necessary capability.
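In other words, the error from mlx5_eswitch_get_vepa() is no longer
propagated verbatim; a sketch of the change in mlx5e_bridge_getlink():

err = mlx5_eswitch_get_vepa(esw, &setting);
if (err)
        return -EOPNOTSUPP;     /* was: return err; -EPERM broke the dump */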
Fixes: 4b89251de024 ("net/mlx5: Support ndo bridge_setlink and getlink")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When removing a LAG device from a bridge, a NETDEV_CHANGEUPPER event is
triggered. The driver finds the lower devices (PFs) to flush all the
offloaded entries, and mlx5_lag_is_shared_fdb is checked; it returns
false if one of the PFs is unloaded. In such a case,
mlx5_esw_bridge_lag_rep_get() and its caller return NULL instead of
the alive PF, and the flush is skipped.
Besides, the bridge fdb entry's lastuse is updated in the mlx5 bridge
event handler. But this SWITCHDEV_FDB_ADD_TO_BRIDGE event can be
ignored in this case because the upper interface for the bond is
deleted, and the entry will never be aged because lastuse is never
updated.
To make things worse, as the entry stays alive, the mlx5 bridge
workqueue keeps sending that event, which is then handled by the kernel
bridge notifier. It causes the following crash when accessing the
passed bond netdev, which is already destroyed.
To fix this issue, remove such checks. LAG state is already checked in
commit 15f8f168952f ("net/mlx5: Bridge, verify LAG state when adding
bond to bridge"); the driver still needs to skip offload if the LAG
enters an invalid state after initialization.
Oops: stack segment: 0000 [#1] SMP
CPU: 3 UID: 0 PID: 23695 Comm: kworker/u40:3 Tainted: G OE 6.11.0_mlnx #1
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_bridge_wq mlx5_esw_bridge_update_work [mlx5_core]
RIP: 0010:br_switchdev_event+0x2c/0x110 [bridge]
Code: 44 00 00 48 8b 02 48 f7 00 00 02 00 00 74 69 41 54 55 53 48 83 ec 08 48 8b a8 08 01 00 00 48 85 ed 74 4a 48 83 fe 02 48 89 d3 <4c> 8b 65 00 74 23 76 49 48 83 fe 05 74 7e 48 83 fe 06 75 2f 0f b7
RSP: 0018:ffffc900092cfda0 EFLAGS: 00010297
RAX: ffff888123bfe000 RBX: ffffc900092cfe08 RCX: 00000000ffffffff
RDX: ffffc900092cfe08 RSI: 0000000000000001 RDI: ffffffffa0c585f0
RBP: 6669746f6e690a30 R08: 0000000000000000 R09: ffff888123ae92c8
R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888123ae9c60
R13: 0000000000000001 R14: ffffc900092cfe08 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f15914c8734 CR3: 0000000002830005 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __die_body+0x1a/0x60
? die+0x38/0x60
? do_trap+0x10b/0x120
? do_error_trap+0x64/0xa0
? exc_stack_segment+0x33/0x50
? asm_exc_stack_segment+0x22/0x30
? br_switchdev_event+0x2c/0x110 [bridge]
? sched_balance_newidle.isra.149+0x248/0x390
notifier_call_chain+0x4b/0xa0
atomic_notifier_call_chain+0x16/0x20
mlx5_esw_bridge_update+0xec/0x170 [mlx5_core]
mlx5_esw_bridge_update_work+0x19/0x40 [mlx5_core]
process_scheduled_works+0x81/0x390
worker_thread+0x106/0x250
? bh_worker+0x110/0x110
kthread+0xb7/0xe0
? kthread_park+0x80/0x80
ret_from_fork+0x2d/0x50
? kthread_park+0x80/0x80
ret_from_fork_asm+0x11/0x20
</TASK>
Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, MultiPort E-Switch requests to create a LAG with shared
FDB without checking that the LAG supports shared FDB.
Add the check.
Fixes: a32327a3a02c ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
mlx5_irq_pool_get() is a getter for completion IRQ pool only.
However, after the cited commit, mlx5_irq_pool_get() is called during
ctrl IRQ release flow to retrieve the pool, resulting in the use of an
incorrect IRQ pool.
Hence, use the newly introduced mlx5_irq_get_pool() getter to retrieve
the correct IRQ pool based on the IRQ itself. While at it, rename
mlx5_irq_pool_get() to mlx5_irq_table_get_comp_irq_pool() which
accurately reflects its purpose and improves code readability.
Fixes: 0477d5168bbb ("net/mlx5: Expose SFs IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The bwc layer was clamping the matcher priority from 32 bits to 16 bits.
This didn't show up until a matcher was resized, since the initial
native matcher was created using the correct 32 bit value.
The fix also reorders fields to avoid some padding.
Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741644104-97767-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Some actions in ConnectX-8 (STEv3) have different structure,
and they are handled separately in ste_ctx_v3.
This separate handling was missing two actions: INSERT_HDR
and REMOVE_HDR, which broke SWS for Linux Bridge.
This patch resolves the issue by introducing dedicated
callbacks for the insert and remove header functions,
with version-specific implementations for each STE variant.
Fixes: 4d617b57574f ("net/mlx5: DR, add support for ConnectX-8 steering")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741644104-97767-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2025-03-10
The following pull-request contains common mlx5 updates for your *net-next* tree.
Please pull and let me know of any problem.
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Add IFC bits for PPCNT recovery counters group
net/mlx5: fs, add RDMA TRANSPORT steering domain support
net/mlx5: Query ADV_RDMA capabilities
net/mlx5: Limit non-privileged commands
net/mlx5: Allow the throttle mechanism to be more dynamic
net/mlx5: Add RDMA_CTRL HW capabilities
====================
Link: https://patch.msgid.link/1741608293-41436-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Although it does not seem to have any untoward side effects,
the use of ';' to separate two assignments seems more appropriate
than ','.
Flagged by clang-19 -Wcomma
No functional change intended.
Compile tested only.
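The pattern in question, with illustrative variables:

/* before: ',' joins two assignments (flagged by clang's -Wcomma) */
a = x, b = y;

/* after */
a = x;
b = y;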
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250307-mlx5-comma-v1-1-934deb6927bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|