linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-03-10	net/mlx5: handle errors in mlx5_chains_create_table()	Wentao Liang
	In mlx5_chains_create_table(), the return value of mlx5_get_fdb_sub_ns() and mlx5_get_flow_namespace() must be checked to prevent NULL pointer dereferences. If either function fails, the function should log error message with mlx5_core_warn() and return error pointer. Fixes: 39ac237ce009 ("net/mlx5: E-Switch, Refactor chains and priorities") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250307021820.2646-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-08	net/mlx5: fs, add RDMA TRANSPORT steering domain support	Patrisious Haddad
	Add RX and TX RDMA_TRANSPORT flow table namespace, and the ability to create flow tables in those namespaces. The RDMA_TRANSPORT RX and TX are per vport. Packets will traverse through RDMA_TRANSPORT_RX after RDMA_RX and through RDMA_TRANSPORT_TX before RDMA_TX, ensuring proper control and management. RDMA_TRANSPORT domains are managed by the vport group manager. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/a6b550d9859a197eafa804b9a8d76916ca481da9.1740574103.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-08	net/mlx5: Query ADV_RDMA capabilities	Patrisious Haddad
	Query ADV_RDMA capabilities which provide information for advanced RDMA related features. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/e3e6ede03ea31cd201078dcdd4e407608e4a5a87.1740574103.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-08	net/mlx5: Limit non-privileged commands	Chiara Meiohas
	Limit non-privileged UID commands to half of the available command slots when privileged UIDs are present. Privileged throttle commands will not be limited. Use an xarray to store privileged UIDs. Add insert and remove functions for privileged UIDs management. Non-user commands (with uid 0) are not limited. Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/d2f3dd9a0dbad3c9f2b4bb0723837995e4e06de2.1740574103.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-08	net/mlx5: Allow the throttle mechanism to be more dynamic	Chiara Meiohas
	Previously, throttle commands were identified and limited based on opcode. These commands were limited to half the command slots using a semaphore, and callback commands checked the opcode to determine semaphore release. To allow exceptions, we introduce a variable to indicate when the throttle lock is held. This allows scenarios where throttle commands are not limited. Callback functions use this variable to determine if the throttle semaphore needs to be released. This patch contains no functional changes. It's a preparation for the next patch. Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Link: https://patch.msgid.link/055d975edeb816ac4c0fd1e665c6157d11947d26.1740574103.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-07	net/mlx5: Fill out devlink dev info only for PFs	Jiri Pirko
	Firmware version query is supported on the PFs. Due to this following kernel warning log is observed: [ 188.590344] mlx5_core 0000:08:00.2: mlx5_fw_version_query:816:(pid 1453): fw query isn't supported by the FW Fix it by restricting the query and devlink info to the PF. Fixes: 8338d9378895 ("net/mlx5: Added devlink info callback") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Link: https://patch.msgid.link/20250306212529.429329-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5e: Properly match IPsec subnet addresses	Leon Romanovsky
	Existing match criteria didn't allow to match whole subnet and only by specific addresses only. This caused to tunnel mode do not forward such traffic through relevant SA. In tunnel mode, policies look like this: src 192.169.0.0/16 dst 192.169.0.0/16 dir out priority 383615 ptype main tmpl src 192.169.101.2 dst 192.169.101.1 proto esp spi 0xc5141c18 reqid 1 mode tunnel crypto offload parameters: dev eth2 mode packet In this case, the XFRM core code handled all subnet calculations and forwarded network address to the drivers e.g. 192.169.0.0. For mlx5 devices, there is a need to set relevant prefix e.g. 0xFFFF00 to perform flow steering match operation. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250304160620.417580-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5e: Separate address related variables to be in struct	Leon Romanovsky
	Prepare the code to addition of prefix handling logic which is needed to support matching logic based on source and/or destination network prefixes. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250304160620.417580-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5: Lag, Enable Multiport E-Switch offloads on 8 ports LAG	Amir Tzin
	Patch [1] added mlx5 driver support for 8 ports HCAs which are available since ConnectX-8. Now that Multiport E-Switch is tested, we can enable it by removing flag MLX5_LAG_MPESW_OFFLOADS_SUPPORTED_PORTS. [1] commit e0e6adfe8c20 ("net/mlx5: Enable 8 ports LAG") Signed-off-by: Amir Tzin <amirtz@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250304160620.417580-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5e: Enable lanes configuration when auto-negotiation is off	Shahar Shitrit
	Currently, when auto-negotiation is disabled, the driver retrieves the speed and converts it into all link modes that correspond to that speed. With this patch, we add the ability to set the number of lanes, so that the combination of speed and lanes corresponds to exactly one specific link mode for the extended bit map. For the legacy bit map the driver sets all link modes correspond to speed and lanes. This change provides users with the option to set a specific link mode, rather than enabling all link modes associated with a given speed when auto-negotiation is off. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250304160620.417580-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5: Refactor link speed handling with mlx5_link_info struct	Shahar Shitrit
	Introduce struct mlx5_link_info with a speed field and change the types of mlx5e_link_speed and mlx5e_ext_link_speed from arrays of u32 to arrays of struct mlx5_link_info. These arrays are renamed to mlx5e_link_info and mlx5e_ext_link_info, respectively. This change prepares for a future patch that will introduce a lanes field in struct mlx5_link_info and add lanes mapping alongside the speed for each link mode in the two arrays. Additionally, rename function mlx5_port_speed2linkmodes() to mlx5_port_info2linkmodes() and function mlx5_port_ptys2speed() to mlx5_port_ptys2info() and update the speed parameter/return type to struct mlx5_link_info, in preparation for the upcoming patch where these functions will also utilize the lanes field. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250304160620.417580-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5: Relocate function declarations from port.h to mlx5_core.h	Shahar Shitrit
	The port header is a general file under include, yet it contains declarations for functions that are either not exported or exported but not used outside the mlx5_core driver. To enhance code organization, we move these declarations to mlx5_core.h, where they are more appropriately scoped. This refactor removes unnecessary exported symbols and prevents unexported functions from being inadvertently referenced outside of the mlx5_core driver. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250304160620.417580-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	mlx5: Create an auxiliary device for fwctl_mlx5	Saeed Mahameed
	If the device supports User Context then it can support fwctl. Create an auxiliary device to allow fwctl to bind to it. Create a sysfs like: $ ls /sys/devices/pci0000:00/0000:00:0a.0/mlx5_core.fwctl.0/driver -l lrwxrwxrwx 1 root root 0 Apr 25 19:46 /sys/devices/pci0000:00/0000:00:0a.0/mlx5_core.fwctl.0/driver -> ../../../../bus/auxiliary/drivers/mlx5_fwctl.mlx5_fwctl Link: https://patch.msgid.link/r/8-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-04	net: rename netns_local to netns_immutable	Nicolas Dichtel
	The name 'netns_local' is confusing. A following commit will export it via netlink, so let's use a more explicit name. Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-28	net/mlx5: Add trust lockdown error to health syndrome print function	Shahar Shitrit
	Add the new health syndrome value to hsynd_str() function to indicate that the device got a trust lockdown fault. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-28	net/mlx5: Expose crr in health buffer	Shahar Shitrit
	Expose crr bit in struct health buffer. When set, it indicates that the error cannot be recovered without flow involving a cold reset. Add its value to the health buffer info log. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-28	net/mlx5: Log health buffer data on any syndrome	Moshe Shemesh
	Currently health buffer data is logged either when FW fatal error detected or miss counter reached max misses threshold. Log health buffer whenever new health syndrome is detected. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-28	net/mlx5: Avoid report two health errors on same syndrome	Moshe Shemesh
	In case health counter has not increased for few polling intervals, miss counter will reach max misses threshold and health report will be triggered for FW health reporter. In case syndrome found on same health poll another health report will be triggered. Avoid two health reports on same syndrome by marking this syndrome as already known. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-27	net/mlx5: Remove newline at the end of a netlink error message	Gal Pressman
	Netlink error messages should not have a newline at the end of the string. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250226093904.6632-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27	net/mlx5e: Avoid a hundred -Wflex-array-member-not-at-end warnings	Gustavo A. R. Silva
	-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. So, in this particular case, we create a new `struct mlx5e_umr_wqe_hdr` to enclose the header part of flexible structure `struct mlx5e_umr_wqe`. This is, all the members except the flexible arrays `inline_mtts`, `inline_klms` and `inline_ksms` in the anonymous union. We then replace the header part with `struct mlx5e_umr_wqe_hdr hdr;` in `struct mlx5e_umr_wqe`, and change the type of the object currently causing trouble `umr_wqe` from `struct mlx5e_umr_wqe` to `struct mlx5e_umr_wqe_hdr` --this last bit gets rid of the flex-array-in-the-middle part and avoid the warnings. Also, no new members should be added to `struct mlx5e_umr_wqe`, instead any new members must be included in the header structure `struct mlx5e_umr_wqe_hdr`. To enforce this, we use `static_assert()`, ensuring that the memory layout of both the flexible structure and the newly created header struct remain consistent. The next step is to refactor the rest of the related code accordingly, which means adding a bunch of `hdr.` wherever needed. Lastly, we use `container_of()` whenever we need to retrieve a pointer to the flexible structure `struct mlx5e_umr_wqe`. So, with these changes, fix 125 of the following warnings: drivers/net/ethernet/mellanox/mlx5/core/en.h:664:48: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/Z76HzPW1dFTLOSSy@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-6.14-rc5). Conflicts: drivers/net/ethernet/cadence/macb_main.c fa52f15c745c ("net: cadence: macb: Synchronize stats calculations") 75696dd0fd72 ("net: cadence: macb: Convert to get_stats64") https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au Adjacent changes: drivers/net/ethernet/intel/ice/ice_sriov.c 79990cf5e7ad ("ice: Fix deinitializing VF in error path") a203163274a4 ("ice: simplify VF MSI-X managing") net/ipv4/tcp.c 18912c520674 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace") 297d389e9e5b ("net: prefix devmem specific helpers") net/mptcp/subflow.c 8668860b0ad3 ("mptcp: reset when MPTCP opts are dropped after join") c3349a22c200 ("mptcp: consolidate subflow cleanup") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26	net/mlx5: IRQ, Fix null string in debug print	Shay Drory
	irq_pool_alloc() debug print can print a null string. Fix it by providing a default string to print. Fixes: 71e084e26414 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501141055.SwfIphN0-lkp@intel.com/ Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250225072608.526866-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26	net/mlx5: Restore missing trace event when enabling vport QoS	Carolina Jubran
	Restore the `trace_mlx5_esw_vport_qos_create` event when creating the vport scheduling element. This trace event was lost during refactoring. Fixes: be034baba83e ("net/mlx5: Make vport QoS enablement more flexible for future extensions") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250225072608.526866-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26	net/mlx5: Fix vport QoS cleanup on error	Carolina Jubran
	When enabling vport QoS fails, the scheduling node was never freed, causing a leak. Add the missing free and reset the vport scheduling node pointer to NULL. Fixes: be034baba83e ("net/mlx5: Make vport QoS enablement more flexible for future extensions") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250225072608.526866-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25	Merge branch 'mlx5-next' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next updates 2025-02-24 The following pull-request contains common mlx5 updates for your net-next tree. * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Change POOL_NEXT_SIZE define value and make it global net/mlx5: Add new health syndrome error and crr bit offset ==================== Link: https://patch.msgid.link/20250224212446.523259-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25	net/mlx5e: Symmetric OR-XOR RSS hash control	Gal Pressman
	Allow control over the symmetric RSS hash, which was previously set to enabled by default by the driver. Symmetric OR-XOR RSS can now be queried and controlled using the 'ethtool -x/X' command. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250224174416.499070-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24	net/mlx5: Use secs_to_jiffies() instead of msecs_to_jiffies()	Thorsten Blum
	Use secs_to_jiffies() and simplify the code. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20250221085350.198024-3-thorsten.blum@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24	net/mlx5e: Support RX xfrm state selector's UPSPEC for packet offload	Jianbo Liu
	Previously, the upper layer matches are added for the decryption rule when xfrm selector's UPSPEC is specified in the command. However, it's impossible as packets are not decrypted, and there is no way to do match on the upper protocol (TCP/UDP) with specific source/destination port. The result is that packets are not decrypted by hardware because of this mismatch. Instead, they are forwarded to kernel, and decryption is done by software. To resolve this issue, this patch adds new table (sa_sel) after status table and before policy table. When UPSPEC's proto is specified in xfrm state's selector, a rule is added in status table to forward the decrypted packets to sa_sel table, where the corresponding rule for selector's UPSPEC is added, and packet's upper headers are checked there. If matched, they will be forward to policy table to do policy check. Otherwise, they are dropped immediately. Besides, add a global count for this kind of packet drop. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250220213959.504304-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24	net/mlx5e: Add pass flow group for IPSec RX status table	Jianbo Liu
	This flow group is added for the pass rules for both crypto offload and packet offload. It is placed at the end of the table, and right before the miss group. There are two entries, and the default pass rules for both offloads are added in this group. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250220213959.504304-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24	net/mlx5e: Add num_reserved_entries param for ipsec_ft_create()	Jianbo Liu
	Add parameter for ipsec_ft_create() to pass the number of the reserved entries when creating auto-grouped flow table. It's used to create table with pre-defined group(s) which may have more than one rule. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250220213959.504304-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24	net/mlx5e: Skip IPSec RX policy check for crypto offload	Jianbo Liu
	For crypto offload, there is no xfrm policy rule offloaded to hardware, so no need to continue with policy check for it. Previously, for crypto offload, the hardware metadata reg c4 is not used and not changed, but set to ASO_OK(0) before decryption to avoid garbage data. Then a default rule is added to check ipsec_syndrome and this register. Packets are forwarded to policy table if succeed, or drop if fails. According to hardware document, this register value could be 0, 1. So a special value (0xAA), which is not used by hardware, is chosen as an indication for crypto offload. It is set to c4 before decryption. Then a default rule, which matches on 0xAA (and ipsec_syndrome on 0), is added, which means packets are done by crypto offload, and sends them to kernel directly, thus skips the policy check. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250220213959.504304-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24	net/mlx5e: Move IPSec policy check after decryption	Jianbo Liu
	Currently, xfrm policy check is done before decryption in mlx5 driver. If matching any policy, packets are forwarded to xfrm state table for decryption. But this is exact opposite to what software does. For kernel implementation, xfrm decode is unconditionally activated whenever an IPSec packet reaches the input flow if there’s a matching state rule. This patch changes the order, move policy check after decryption. Besides, a miss flow table is added at the end for legacy mode, to make it easier to update the default destination of the steering rules. So ESP packets are firstly forwarded to SA table for decryption, then the result is checked in status table. If the decryption succeeds, packets are forwarded to another table to check xfrm policy rules. When a policy with allow action is matched, if in legacy mode packets are forwarded to miss flow table with one rule to forward them to RoCE tables, if in switchdev mode they are forwarded directly to TC root chain instead. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250220213959.504304-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24	net/mlx5e: Add correct match to check IPSec syndromes for switchdev mode	Jianbo Liu
	In commit dddb49b63d86 ("net/mlx5e: Add IPsec and ASO syndromes check in HW"), IPSec and ASO syndromes checks after decryption for the specified ASO object were added. But they are correct only for eswith in legacy mode. For switchdev mode, metadata register c1 is used to save the mapped id (not ASO object id). So, need to change the match accordingly for the check rules in status table. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250220213959.504304-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24	net/mlx5e: Change the destination of IPSec RX SA miss rule	Jianbo Liu
	For eswitch in legacy mode, the packets decrypted in RX SA table will continue to be processed for RoCE. But this is not necessary for the un-decrypted packets, which don't match any decryption rules but hit the miss rule at the end of the table. So, change the destination of miss rule to TTC default one and skip RoCE. For eswitch in switchdev mode, the destination is unchanged. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250220213959.504304-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24	net/mlx5e: Add helper function to update IPSec default destination	Jianbo Liu
	The default destination of IPSec steering rules for MPV mode will be updated when the master device is brought up or down. Move the common code into the helper function. It’s convenient to update destinations in later patches. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250220213959.504304-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-23	net/mlx5: Change POOL_NEXT_SIZE define value and make it global	Patrisious Haddad
	Change POOL_NEXT_SIZE define value from 0 to BIT(30), since this define is used to request the available maximum sized flow table, and zero doesn't make sense for it, whereas some places in the driver use zero explicitly expecting the smallest table size possible but instead due to this define they end up allocating the biggest table size unawarely. In addition move the definition to "include/linux/mlx5/fs.h" to expose the define to IB driver as well, while appropriately renaming it. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250219085808.349923-3-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-21	xfrm: provide common xdo_dev_offload_ok callback implementation	Leon Romanovsky
	Almost all drivers except bond and nsim had same check if device can perform XFRM offload on that specific packet. The check was that packet doesn't have IPv4 options and IPv6 extensions. In NIC drivers, the IPv4 HELEN comparison was slightly different, but the intent was to check for the same conditions. So let's chose more strict variant as a common base. Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-20	net/mlx5e: Separate extended link modes request from link modes type selection	Shahar Shitrit
	The function ext_requested() serves two distinct purposes: it checks if extended link modes were requested, and it selects whether to use extended or legacy link modes. This change separates these two purposes. Now, ext_link_mode_requested() is used directly for checking if extended link modes are requested, while the selection of extended modes is handled independently based on the autonegotiation status. By making this distinction, the logic for determining whether to select extended or legacy link modes is clearer. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250219114112.403808-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20	net/mlx5e: Change eth_proto parameter naming	Shahar Shitrit
	eth_proto_cap parameter represents the supported link modes, while eth_proto_admin refers to the configured ones. The function get_advertising() retrieves the configured link modes, thus we update its parameter name to eth_proto_admin. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250219114112.403808-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20	net/mlx5e: Introduce ptys2ethtool_process_link()	Shahar Shitrit
	The functions ptys2ethtool_supported_link(), ptys2ethtool_adver_link() share the same code, thus, in order to remove code duplication we introduce a new function ptys2ethtool_process_link() to handle the processing of both supported and advertised link modes. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250219114112.403808-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20	net/mlx5e: Refactor ptys2ethtool_adver_link()	Shahar Shitrit
	The function ptys2ethtool_adver_link() contains duplicated code that is found in mlx5e_ethtool_get_speed_arr(). To eliminate this redundancy, we update mlx5e_ethtool_get_speed_arr() to select the appropriate table based on the ext argument passed by the caller, rather than querying the supported mode locally. This allows us to replace the current logic in ptys2ethtool_adver_link() with a call to mlx5e_ethtool_get_speed_arr(). This adjustment aligns with the ptys2ethtool_supported_link() function and prepares for an upcoming patch that reduces code duplication. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250219114112.403808-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20	net/mlx5: Bridge, correct config option description	Cosmin Ratiu
	The implication of the previous help text was that without this option enabled, representor devices couldn't be added to a bridge device, while in fact that was possible, just that rules didn't get offloaded to hw. This commit clarifies the help text. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250219114112.403808-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18	eth: mlx4: use the page pool for Rx buffers	Jakub Kicinski
	Simple conversion to page pool. Preserve the current fragmentation logic / page splitting. Each page starts with a single frag reference, and then we bump that when attaching to skbs. This can likely be optimized further. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250213010635.1354034-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18	eth: mlx4: remove the local XDP fast-recycling ring	Jakub Kicinski
	It will be replaced with page pool's built-in recycling. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250213010635.1354034-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18	eth: mlx4: don't try to complete XDP frames in netpoll	Jakub Kicinski
	mlx4 doesn't support ndo_xdp_xmit / XDP_REDIRECT and wasn't using page pool until now, so it could run XDP completions in netpoll (NAPI budget == 0) just fine. Page pool has calling context requirements, make sure we don't try to call it from what is potentially HW IRQ context. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250213010635.1354034-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18	eth: mlx4: create a page pool for Rx	Jakub Kicinski
	Create a pool per rx queue. Subsequent patches will make use of it. Move fcs_del to a hole to make space for the pointer. Per common "wisdom" base the page pool size on the ring size. Note that the page pool cache size is in full pages, so just round up the effective buffer size to pages. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250213010635.1354034-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17	net/mlx5: Add sensor name to temperature event message	Shahar Shitrit
	Previously, a temperature event message included a bitmap indicating which sensors detect high temperatures. To enhance clarity, we modify the message format to explicitly list the names of the overheating sensors, alongside the sensors bitmap. If HWMON is not configured, the event message remains unchanged. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250213094641.226501-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17	net/mlx5: Modify LSB bitmask in temperature event to include only the first bit	Shahar Shitrit
	In the sensor_count field of the MTEWE register, bits 1-62 are supported only for unmanaged switches, not for NICs, and bit 63 is reserved for internal use. To prevent confusing output that may include set bits that are not relevant to NIC sensors, we update the bitmask to retain only the first bit, which corresponds to the sensor ASIC. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Link: https://patch.msgid.link/20250213094641.226501-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17	net/mlx5: Prefix temperature event bitmap with '0x' for clarity	Shahar Shitrit
	Prepend '0x' to the sensor bitmap in the warning message to clearly indicate that the bitmap is in hexadecimal format. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Link: https://patch.msgid.link/20250213094641.226501-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17	net/mlx5: Apply rate-limiting to high temperature warning	Shahar Shitrit
	Wrap the high temperature warning in a temperature event with a call to net_ratelimit() to prevent flooding the kernel log with repeated warning messages when temperature exceeds the threshold multiple times within a short duration. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Link: https://patch.msgid.link/20250213094641.226501-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>