summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)Author
2024-07-09mlxsw: pci: Lock configuration space of upstream bridge during resetIdo Schimmel
The driver triggers a "Secondary Bus Reset" (SBR) by calling __pci_reset_function_locked() which asserts the SBR bit in the "Bridge Control Register" in the configuration space of the upstream bridge for 2ms. This is done without locking the configuration space of the upstream bridge port, allowing user space to access it concurrently. Linux 6.11 will start warning about such unlocked resets [1][2]: pcieport 0000:00:01.0: unlocked secondary bus reset via: pci_reset_bus_function+0x51c/0x6a0 Avoid the warning and the concurrent access by locking the configuration space of the upstream bridge prior to the reset and unlocking it afterwards. [1] https://lore.kernel.org/all/171711746953.1628941.4692125082286867825.stgit@dwillia2-xfh.jf.intel.com/ [2] https://lore.kernel.org/all/20240531213150.GA610983@bhelgaas/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/9937b0afdb50f2f2825945393c94c093c04a5897.1720447210.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-09mlxsw: core_thermal: Report valid current state during cooling device ↵Ido Schimmel
registration Commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()") changed the thermal core to read the current state of the cooling device as part of the cooling device's registration. This is incompatible with the current implementation of the cooling device operations in mlxsw, leading to initialization failure with errors such as: mlxsw_spectrum 0000:01:00.0: Failed to register cooling device mlxsw_spectrum 0000:01:00.0: cannot register bus device The reason for the failure is that when the get current state operation is invoked the driver tries to derive the index of the cooling device by walking a per thermal zone array and looking for the matching cooling device pointer. However, the pointer is returned from the registration function and therefore only set in the array after the registration. The issue was later fixed by commit 1af89dedc8a5 ("thermal: core: Do not fail cdev registration because of invalid initial state") by not failing the registration of the cooling device if it cannot report a valid current state during registration, although drivers are responsible for ensuring that this will not happen. Therefore, make sure the driver is able to report a valid current state for the cooling device during registration by passing to the registration function a per cooling device private data that already has the cooling device index populated. While at it, call thermal_cooling_device_unregister() unconditionally since the function returns immediately if the cooling device pointer is NULL. Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/c823c4678b6b7afb902c35b3551c81a053afd110.1720447210.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-09mlxsw: Warn about invalid accesses to array fieldsPetr Machata
A forgotten or buggy variable initialization can cause out-of-bounds access to a register or other item array field. For an overflow, such access would mangle adjacent parts of the register payload. For an underflow, due to all variables being unsigned, the access would likely trample unrelated memory. Since neither is correct, replace these accesses with accesses at the index of 0, and warn about the issue. Suggested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/b988fb265c2f6c1206fe12d5bfdcfa188b7672d1.1720447210.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/phy/aquantia/aquantia.h 219343755eae ("net: phy: aquantia: add missing include guards") 61578f679378 ("net: phy: aquantia: add support for PHY LEDs") drivers/net/ethernet/wangxun/libwx/wx_hw.c bd07a9817846 ("net: txgbe: remove separate irq request for MSI and INTx") b501d261a5b3 ("net: txgbe: add FDIR ATR support") https://lore.kernel.org/all/20240703112936.483c1975@canb.auug.org.au/ include/linux/mlx5/mlx5_ifc.h 048a403648fc ("net/mlx5: IFC updates for changing max EQs") 99be56171fa9 ("net/mlx5e: SHAMPO, Re-enable HW-GRO") https://lore.kernel.org/all/20240701133951.6926b2e3@canb.auug.org.au/ Adjacent changes: drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c 4130c67cd123 ("wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference") 3f3126515fbe ("wifi: iwlwifi: mvm: add mvm-specific guard") include/net/mac80211.h 816c6bec09ed ("wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP") 5a009b42e041 ("wifi: mac80211: track changes in AP's TPE") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-04mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI ↵Aleksandr Mishin
file In case of invalid INI file mlxsw_linecard_types_init() deallocates memory but doesn't reset pointer to NULL and returns 0. In case of any error occurred after mlxsw_linecard_types_init() call, mlxsw_linecards_init() calls mlxsw_linecard_types_fini() which performs memory deallocation again. Add pointer reset to NULL. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: b217127e5e4e ("mlxsw: core_linecards: Add line card objects and implement provisioning") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20240703203251.8871-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-01RDMA/mlx5: Add plane index support when querying PTYS registersMark Zhang
Support the new "plane_ind" field when querying port PTYS registers. This is needed when querying the rate of a plane port. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/1f703c36306aa46917fcd88eadbb23b3e380d526.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01RDMA/mlx5: Add support to multi-plane device and portMark Zhang
When multi-plane is supported, a logical port, which is aggregation of multiple physical plane ports, is exposed for data transmission. Compared with a normal mlx5 IB port, this logical port supports all functionalities except Subnet Management. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/7e37c06c9cb243be9ac79930cd17053903785b95.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-06-28net/mlx5e: Approximate IPsec per-SA payload data bytes countLeon Romanovsky
ConnectX devices lack ability to count payload data byte size which is needed for SA to return to libreswan for rekeying. As a solution let's approximate that by decreasing headers size from total size counted by flow steering. The calculation doesn't take into account any other headers which can be in the packet (e.g. IP extensions). Fixes: 5a6cddb89b51 ("net/mlx5e: Update IPsec per SA packets/bytes count") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28net/mlx5e: Present succeeded IPsec SA bytes and packetLeon Romanovsky
IPsec SA statistics presents successfully decrypted and encrypted packet and bytes, and not total handled by this SA. So update the calculation logic to take into account failures. Fixes: 6fb7f9408779 ("net/mlx5e: Connect mlx5 IPsec statistics with XFRM core") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28net/mlx5e: Add mqprio_rl cleanup and free in mlx5e_priv_cleanup()Jianbo Liu
In the cited commit, mqprio_rl cleanup and free are mistakenly removed in mlx5e_priv_cleanup(), and it causes the leakage of host memory and firmware SCHEDULING_ELEMENT objects while changing eswitch mode. So, add them back. Fixes: 0bb7228f7096 ("net/mlx5e: Fix mqprio_rl handling on devlink reload") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28net/mlx5: E-switch, Create ingress ACL when neededChris Mi
Currently, ingress acl is used for three features. It is created only when vport metadata match and prio tag are enabled. But active-backup lag mode also uses it. It is independent of vport metadata match and prio tag. And vport metadata match can be disabled using the following devlink command: # devlink dev param set pci/0000:08:00.0 name esw_port_metadata \ value false cmode runtime If ingress acl is not created, will hit panic when creating drop rule for active-backup lag mode. If always create it, there will be about 5% performance degradation. Fix it by creating ingress acl when needed. If esw_port_metadata is true, ingress acl exists, then create drop rule using existing ingress acl. If esw_port_metadata is false, create ingress acl and then create drop rule. Fixes: 1749c4c51c16 ("net/mlx5: E-switch, add drop rule support to ingress ACL") Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28net/mlx5: Use max_num_eqs_24b when setting max_io_eqsDaniel Jurgens
Due a bug in the device max_num_eqs doesn't always reflect a written value. As a result, setting max_io_eqs may not work but appear successful. Instead write max_num_eqs_24b, which reflects correct value. Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28net/mlx5: Use max_num_eqs_24b capability if setDaniel Jurgens
A new capability with more bits is added. If it's set use that value as the maximum number of EQs available. This cap is also writable by the vhca_resource_manager to allow limiting the number of EQs available to SFs and VFs. Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28mlxsw: Implement ethtool operation to write to a transceiver module EEPROMIdo Schimmel
Implement the ethtool_ops::set_module_eeprom_by_page operation to allow ethtool to write to a transceiver module EEPROM, in a similar fashion to the ethtool_ops::get_module_eeprom_by_page operation. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: e3f02f32a050 ("ionic: fix kernel panic due to multi-buffer handling") d9c04209990b ("ionic: Mark error paths in the data path as unlikely") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-26mlxsw: pci: Use fragmented buffersAmit Cohen
WQE (Work Queue Element) includes 3 scatter/gather entries for buffers. The buffer can be split into 3 parts, software should set address and byte count of each part. A previous patch-set used page pool to allocate buffers, to simplify the change, we first used one continuous buffer, which was allocated with order > 0. This patch improves page pool usage to allocate the exact number of pages which are required for packet. As part of init, fill WQE.address[x] and WQE.byte_count* with pages which are allocated from the pool. Fill x entries according to number of scatter/gather entries which are required for maximum packet size. When a packet is received, check the actual size and replace only the used pages. Save bytes for software overhead only as part of the first entry. This change also requires using fragmented SKB, till now all the buffer was in the linear part. Note that 'skb->truesize' is decreased for small packets. For now the maximum buffer size is 3 * PAGE_SIZE which is enough, in case that the driver will support larger MTU, we can use 'order' to allocate more than one page per scatter/gather entry. This change significantly reduces memory consumption of mlxsw driver. The footprint is reduced by 26%. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/ee38898c692e7f644a7f3ea4d33aeddb4dd917d2.1719321422.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-26mlxsw: pci: Store number of scatter/gather entries for maximum packet sizeAmit Cohen
A previous patch-set used page pool for Rx buffers allocations. To simplify the change, we first used page pool for one allocation per packet - one continuous buffer is allocated for each packet. This can be improved by using fragmented buffers, then memory consumption will be significantly reduced. WQE (Work Queue Element) includes up to 3 scatter/gather entries for data. As preparation for fragmented buffer usage, calculate number of scatter/gather entries which are required for packet according to maximum MTU and store it for future use. For now use PAGE_SIZE for each entry, which means that maximum buffer size is 3 * PAGE_SIZE. This is enough for the maximum MTU which is supported in the driver now (10K). Warn in an unlikely case of maximum MTU which requires more than 3 pages, for now this warn should not happen with standard page size (>=4K) and maximum MTU (10K). Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/98c3e3adb7e727e571ac538faf67cef262cec4fc.1719321422.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-26net: Drop explicit initialization of struct i2c_device_id::driver_data to 0Uwe Kleine-König
These drivers don't use the driver_data member of struct i2c_device_id, so don't explicitly initialize this member. This prepares putting driver_data in an anonymous union which requires either no initialization or named designators. But it's also a nice cleanup on its own. While add it, also remove commas after the sentinel entries. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Reviewed-by: Petr Machata <petrm@nvidia.com> # For mlxsw Reviewed-by: Kory Maincent <Kory.maincent@bootlin.com> Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au> # for mctp-i2c Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20240625083853.2205977-2-u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-21mlxsw: spectrum_buffers: Fix memory corruptions on Spectrum-4 systemsIdo Schimmel
The following two shared buffer operations make use of the Shared Buffer Status Register (SBSR): # devlink sb occupancy snapshot pci/0000:01:00.0 # devlink sb occupancy clearmax pci/0000:01:00.0 The register has two masks of 256 bits to denote on which ingress / egress ports the register should operate on. Spectrum-4 has more than 256 ports, so the register was extended by cited commit with a new 'port_page' field. However, when filling the register's payload, the driver specifies the ports as absolute numbers and not relative to the first port of the port page, resulting in memory corruptions [1]. Fix by specifying the ports relative to the first port of the port page. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0 Read of size 1 at addr ffff8881068cb00f by task devlink/1566 [...] Call Trace: <TASK> dump_stack_lvl+0xc6/0x120 print_report+0xce/0x670 kasan_report+0xd7/0x110 mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0 mlxsw_devlink_sb_occ_snapshot+0x75/0xb0 devlink_nl_sb_occ_snapshot_doit+0x1f9/0x2a0 genl_family_rcv_msg_doit+0x20c/0x300 genl_rcv_msg+0x567/0x800 netlink_rcv_skb+0x170/0x450 genl_rcv+0x2d/0x40 netlink_unicast+0x547/0x830 netlink_sendmsg+0x8d4/0xdb0 __sys_sendto+0x49b/0x510 __x64_sys_sendto+0xe5/0x1c0 do_syscall_64+0xc1/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f [...] Allocated by task 1: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 copy_verifier_state+0xbc2/0xfb0 do_check_common+0x2c51/0xc7e0 bpf_check+0x5107/0x9960 bpf_prog_load+0xf0e/0x2690 __sys_bpf+0x1a61/0x49d0 __x64_sys_bpf+0x7d/0xc0 do_syscall_64+0xc1/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 1: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 poison_slab_object+0x109/0x170 __kasan_slab_free+0x14/0x30 kfree+0xca/0x2b0 free_verifier_state+0xce/0x270 do_check_common+0x4828/0xc7e0 bpf_check+0x5107/0x9960 bpf_prog_load+0xf0e/0x2690 __sys_bpf+0x1a61/0x49d0 __x64_sys_bpf+0x7d/0xc0 do_syscall_64+0xc1/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: f8538aec88b4 ("mlxsw: Add support for more than 256 ports in SBSR register") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21mlxsw: pci: Fix driver initialization with Spectrum-4Ido Schimmel
Cited commit added support for a new reset flow ("all reset") which is deeper than the existing reset flow ("software reset") and allows the device's PCI firmware to be upgraded. In the new flow the driver first tells the firmware that "all reset" is required by issuing a new reset command (i.e., MRSR.command=6) and then triggers the reset by having the PCI core issue a secondary bus reset (SBR). However, due to a race condition in the device's firmware the device is not always able to recover from this reset, resulting in initialization failures [1]. New firmware versions include a fix for the bug and advertise it using a new capability bit in the Management Capabilities Mask (MCAM) register. Avoid initialization failures by reading the new capability bit and triggering the new reset flow only if the bit is set. If the bit is not set, trigger a normal PCI hot reset by skipping the call to the Management Reset and Shutdown Register (MRSR). Normal PCI hot reset is weaker than "all reset", but it results in a fully operational driver and allows users to flash a new firmware, if they want to. [1] mlxsw_spectrum4 0000:01:00.0: not ready 1023ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 2047ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 4095ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 8191ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 16383ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 32767ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 65535ms after bus reset; giving up mlxsw_spectrum4 0000:01:00.0: PCI function reset failed with -25 mlxsw_spectrum4 0000:01:00.0: cannot register bus device mlxsw_spectrum4: probe of 0000:01:00.0 failed with error -25 Fixes: f257c73e5356 ("mlxsw: pci: Add support for new reset flow") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Maksym Yaremchuk <maksymy@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19mlxsw: pci: Use napi_consume_skb() to free SKB as part of Tx completionAmit Cohen
Currently, as part of Tx completion, the driver calls dev_kfree_skb_any() to free the SKB. For this flow, the correct function is napi_consume_skb(). This function and dev_consume_skb_any() were added to be used for consumed SKBs, which were not dropped, so the skb:kfree_skb tracepoint is not triggered, and we can get better diagnostics about dropped packets. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/a9f9f3dc884c0d1be4bd4c9d72030c88c7ac004f.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19mlxsw: pci: Do not store SKB for RDQ elementsAmit Cohen
The previous patch used page pool to allocate buffers for RDQ. With this change, 'elem_info->u.rdq.skb' is not used anymore, as we do not allocate SKB before getting the packet, we hold page pointer and build the SKB around it once packet is received. Remove the union and store SKB pointer for SDQ only. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/23a531008936dc9a1a298643fb1e4f9a7b8e6eb3.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19mlxsw: pci: Optimize data buffer accessAmit Cohen
Before accessing data buffer, call net_prefetch() to load it into the cache. This change improves driver performance, CPU can handle about 7.1% more packets per second. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1fa07c510890866a6f201163ab7e78890ba28b3b.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19mlxsw: pci: Use page pool for Rx buffers allocationAmit Cohen
As part of driver init, all Rx queues are filled with buffers for hardware usage. Later, when a packet is received, a new buffer should be allocated to be used by hardware instead of the received buffer. Packet's processing time includes allocation time, which can be improved using page pool. Using page pool, DMA mapping is done only for first allocation of buffers. As subsequent buffers allocation avoid DMA mapping, it results in performance improvement. The purpose of page pool is to allocate pages fast from cache without locking. This lockless guarantee naturally comes from running under a NAPI. Use page pool to allocate the data buffer only, so hardware will use it to fill the packet. At completion time, attach the data buffer (now filled with packet payload) to new SKB which is allocated around the received buffer. SKB building at completion time prevents cache miss for each packet, as now the SKB is allocated right before packets will be handled by networking stack. Page pool for each Rx queue enhances Rx side performance by reclaiming buffers back to each queue specific pool. This change significantly improves driver performance, CPU can handle about 345% of the packets per second it previously handled. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1cf788a8f43c70aae6d526018ef77becb27ad6d3.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19mlxsw: pci: Initialize page pool per CQAmit Cohen
Next patch will use page pool to allocate buffers for RDQ. Initialize page pool for each CQ, which is mapped 1:1 to RDQ. Page pool for each Rx queue enhances Rx side performance by reclaiming buffers back to each queue specific pool. When only one NAPI instance is the consumer of pages from page pool, it is recommended to pass it as part of 'page_pool_params', then page pool APIs will be done without special locks. mlxsw driver holds NAPI instance per CQ, so add page pool per CQ and use the existing NAPI instance. For now, pages are not allocated from the pool, next patch will use it. Some notes regarding 'page_pool_params': * Use PP_FLAG_DMA_MAP to allow page pool handles DMA mapping, for now do not use sync flag, as only the device writes to this memory and we read it only when it finishes writing there. This will probably be changed when we will support XDP. * Define 'order' according to maximum MTU and take into account software overhead. Some round up are done, which means that we allocate more pages than we really need. This can be improved later by using fragmented buffers. * Use pool_size = MLXSW_PCI_WQE_COUNT. This will be the size of 'ptr_ring', and should be the maximum amount of packets that page pool will allocate memory for. In our case, this is the queue size, defined as MLXSW_PCI_WQE_COUNT. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/02e5856ae7c572d4293ce6bb92c286ee6cfec800.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19mlxsw: pci: Store CQ pointer as part of RDQ structureAmit Cohen
Next patches will add support for page pool in mlxsw driver. Page pool will be used to allocate buffers for RDQ and will use NAPI instance of the appropriate CQ (RDQ is mapped 1:1 to CQ). To allow pool initialization as part of CQ init, when NAPI is initialized, page_pool structure will be as part of CQ structure. Later, the allocations for RDQ will be done from the pool in the appropriate CQ. To allow access to the appropriate pool, set CQ pointer as part of RDQ initialization. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/d60918ca1e142a554af1df9c1152cdac83854a3b.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19mlxsw: pci: Split NAPI setup/teardown into two stepsAmit Cohen
mlxsw_pci_cq_napi_setup() includes both NAPI initialization and enablement, similar to teardown function. Next patches will add support for page pool in mlxsw driver, then we use NAPI instance for page pool. Page pool initialization should be done before NAPI enablement, same for page pool destruction which should be done after NAPI disablement. As preparation, split NAPI setup/teardown into two steps, then page pool setup will be done between the phases. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/8dbf37e859f07247498fca17109b8858ff2b0498.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18net/mlx4_en: Use ethtool_puts/sprintf to fill stats stringsKamal Heib
Use the ethtool_puts/ethtool_sprintf helper to print the stats strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-4-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18net/mlx4_en: Use ethtool_puts to fill selftest stringsKamal Heib
Use the ethtool_puts helper to print the selftest strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-3-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18net/mlx4_en: Use ethtool_puts to fill priv flags stringsKamal Heib
Use the ethtool_puts helper to print the priv flags strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-2-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17net/mlx5e: Add per queue netdev-genl statsJoe Damato
./cli.py --spec netlink/specs/netdev.yaml \ --dump qstats-get --json '{"scope": "queue"}' ...snip {'ifindex': 7, 'queue-id': 62, 'queue-type': 'rx', 'rx-alloc-fail': 0, 'rx-bytes': 105965251, 'rx-packets': 179790}, {'ifindex': 7, 'queue-id': 0, 'queue-type': 'tx', 'tx-bytes': 9402665, 'tx-packets': 17551}, ...snip Also tested with the script tools/testing/selftests/drivers/net/stats.py in several scenarios to ensure stats tallying was correct: - on boot (default queue counts) - adjusting queue count up or down (ethtool -L eth0 combined ...) The tools/testing/selftests/drivers/net/stats.py brings the device up, so to test with the device down, I did the following: $ ip link show eth4 7: eth4: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN [..snip..] [..snip..] $ cat /proc/net/dev | grep eth4 eth4: 235710489 434811 [..snip rx..] 2878744 21227 [..snip tx..] $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \ --dump qstats-get --json '{"ifindex": 7}' [{'ifindex': 7, 'rx-alloc-fail': 0, 'rx-bytes': 235710489, 'rx-packets': 434811, 'tx-bytes': 2878744, 'tx-packets': 21227}] Compare the values in /proc/net/dev match the output of cli for the same device, even while the device is down. Note that while the device is down, per queue stats output nothing (because the device is down there are no queues): $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \ --dump qstats-get --json '{"scope": "queue", "ifindex": 7}' [] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-17net/mlx5e: Add txq to sq stats mappingJoe Damato
mlx5 currently maps txqs to an sq via priv->txq2sq. It is useful to map txqs to sq_stats, as well, for direct access to stats. Add priv->txq2sq_stats and insert mappings. The mappings will be used next to tabulate stats information. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-17Merge tag 'v6.10-rc4' into driver-core-nextGreg Kroah-Hartman
We need the driver core and sysfs fixes in here to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16net/mlx5: Reimplement write combining testJianbo Liu
The test of write combining was added before in mlx5_ib driver. It opens UD QP and posts NOP WQEs, and uses BlueFlame doorbell. When BlueFlame is used, WQEs get written directly to a PCI BAR of the device (in addition to memory) so that the device handles them without having to access memory. In this test, the WQEs written in memory are different from the ones written to the BlueFlame which request CQE update. By checking the completion reports posted on CQ, we can know if BlueFlame succeeds or not. The write combining must be supported if BlueFlame succeeds as its register is written using write combining. This patch reimplements the test in the same way, but using a pair of SQ and CQ only. It is moved to mlx5_core as a general feature used by both mlx5_core and mlx5_ib. Besides, save write combine test result of the PCI function, so that its thousands of child functions such as SF can query without paying the time and resource penalty by itself. The test function is called only after failing to get the cached result. With this enhancement, all thousands of SFs of the PF attached to same driver no longer need to perform WC check explicitly, which is already done in the system. This saves several commands per SF, thereby speeds up SF creation and also saves completion EQ creation. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/4ff5a8cc4c5b5b0d98397baa45a5019bcdbf096e.1717409369.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-14mlxsw: Use the same maximum MTU value throughout the driverAmit Cohen
Currently, the driver uses two different values for maximum MTU, one is stored in mlxsw_port->dev->max_mtu and the second is stored in mlxsw_port->max_mtu. The second one is set to value which is queried from firmware. This value was never tested, and unfortunately is not really supported. That means that with the existing code, user can set MTU to X, which is not really supported by firmware and which is bigger than buffer size which is allocated in pci. To make the driver consistent, use only mlxsw_port->dev->max_mtu for maximum MTU value, for buffers headroom add Ethernet frame headers, which are not included in mlxsw_port->dev->max_mtu. Remove mlxsw_port->max_mtu. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/89fa6f804386b918d337e736e14ac291bb947483.1718275854.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14mlxsw: spectrum: Set more accurate values for netdevice min/max MTUAmit Cohen
Currently, the driver uses ETH_MAX_MTU as maximum MTU of netdevices, instead, use the accurate value which is supported by the driver. Subtract Ethernet headers which are taken into account by hardware for MTU checking, as described in the previous patch. Set minimum MTU to ETH_MIN_MTU, as zero MTU is not really supported. With this change: a. The stack will do the MTU checking, so we can remove it from the driver. b. User space will be able to query the actual MTU limits. Before this patch: $ ip -j -d link show dev swp1 | jq | grep mtu "mtu": 1500, "min_mtu": 0, "max_mtu": 65535, With this patch: $ ip -j -d link show dev swp1 | jq | grep mtu "mtu": 1500, "min_mtu": 68, "max_mtu": 10218, Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/be8232e38c196ecb607f82c5e000ea427ce22abb.1718275854.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14mlxsw: Adjust MTU value to hardware checkAmit Cohen
Ethernet frame consists of - Ethernet header, payload, FCS. The MTU value which is used by user is the size of the payload, which means that when user sets MTU to X, the total frame size will be larger due to the addition of the Ethernet header and FCS. Spectrum ASICs take into account Ethernet header and FCS as part of packet size for MTU check. Adjust MTU value when user sets MTU, to configure the MTU size which is required by hardware. The Tx header length which was used by the driver is not relevant for such calculation, take into account Ethernet header (with VLAN extension) and FCS. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/f3203c2477bb8ed18b1e79642fa3e3713e1e55bb.1718275854.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14mlxsw: port: Edit maximum MTU valueAmit Cohen
Currently mlxsw driver supports up to 10000 bytes for maximum MTU, this value is not accurate, we can support up to 10K bytes. Change the value to the maximum supported MTU by firmware. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/666f51681234aeef09d771833ccb6e94bd323c88.1718275854.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14net/mlx5e: Support SWP-mode offload L4 csum calculationRahul Rameshbabu
Calculate the pseudo-header checksum for both IPSec transport mode and IPSec tunnel mode for mlx5 devices that do not implement a pure hardware checksum offload for L4 checksum calculation. Introduce a capability bit that identifies such mlx5 devices. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240613210036.1125203-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14net/mlx5e: Use tcp_v[46]_check checksum helpersGal Pressman
Use the tcp specific helpers to calculate the tcp pseudo header checksum instead of the csum_*_magic ones. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240613210036.1125203-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14net/mlx5e: Fix outdated comment in features checkGal Pressman
The code no longer treats only UDP tunnels, adjust the outdated comment. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240613210036.1125203-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14net/mlx5: Replace strcpy with strscpyMoshe Shemesh
Replace deprecated strcpy with strscpy. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240613210036.1125203-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14net/mlx5: CT: Separate CT and CT-NAT tuple entriesChris Mi
Currently a ct entry is stored in both ct and ct-nat tables. ct action is directed to the ct table, while ct nat action is directed to the nat table. ct-nat entries perform the nat header rewrites, if required. The current design assures that a ct action will match in hardware even if the tuple has nat configured, it will just not execute it. However, storing each connection in two tables increases the system's memory consumption while reducing its insertion rate. Offload a connection to either ct or the ct-nat table. Add a miss fall-through rule from ct-nat table to the ct table allowing ct(nat) action on non-natted connections. ct action on natted connections, by default, will be handled by the software miss path. Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240613210036.1125203-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14net/mlx5: Correct TASR typo into TSARCosmin Ratiu
TSAR is the correct spelling (Transmit Scheduling ARbiter). Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240613210036.1125203-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts, no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13auxbus: make to_auxiliary_drv accept and return a constant pointerGreg Kroah-Hartman
In the quest to make struct device constant, start by making to_auxiliary_drv() return a constant pointer so that drivers that call this can be fixed up before the driver core changes. As the return type previously was not constant, also fix up all callers that were assuming that the pointer was not going to be a constant one in order to not break the build. Cc: Dave Ertman <david.m.ertman@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Bingbu Cao <bingbu.cao@intel.com> Cc: Tianshu Qiu <tian.shu.qiu@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Michael Chan <michael.chan@broadcom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Cc: Bard Liao <yung-chuan.liao@linux.intel.com> Cc: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Cc: Daniel Baluta <daniel.baluta@nxp.com> Cc: Kai Vehmanen <kai.vehmanen@linux.intel.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: linux-media@vger.kernel.org Cc: netdev@vger.kernel.org Cc: intel-wired-lan@lists.osuosl.org Cc: linux-rdma@vger.kernel.org Cc: sound-open-firmware@alsa-project.org Cc: linux-sound@vger.kernel.org Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> # drivers/media/pci/intel/ipu6 Acked-by: Mark Brown <broonie@kernel.org> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20240611130103.3262749-7-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12net/mlx5e: flower: validate encapsulation control flagsAsbjørn Sloth Tønnesen
Encapsulation control flags are currently not used anywhere, so all flags are currently unsupported by all drivers. This patch adds validation of this assumption, so that encapsulation flags may be used in the future. In case any encapsulation control flags are masked, flow_rule_match_has_enc_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/20240609173358.193178-4-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12mlxsw: spectrum_router: Apply user-defined multipath hash seedPetr Machata
When Spectrum machines compute hash for the purposes of ECMP routing, they use a seed specified through RECR_v2 (Router ECMP Configuration Register). Up until now mlxsw computed the seed by hashing the machine's base MAC. Now that we can optionally have a user-provided seed, use that if possible. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607151357.421181-4-petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10net/mlx5e: Fix features validation check for tunneled UDP (non-VXLAN) packetsGal Pressman
Move the vxlan_features_check() call to after we verified the packet is a tunneled VXLAN packet. Without this, tunneled UDP non-VXLAN packets (for ex. GENENVE) might wrongly not get offloaded. In some cases, it worked by chance as GENEVE header is the same size as VXLAN, but it is obviously incorrect. Fixes: e3cfc7e6b7bd ("net/mlx5e: TX, Add geneve tunnel stateless offload support") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10mlxsw: spectrum_acl: Fix ACL scale regression and firmware errorsIdo Schimmel
ACLs that reside in the algorithmic TCAM (A-TCAM) in Spectrum-2 and newer ASICs can share the same mask if their masks only differ in up to 8 consecutive bits. For example, consider the following filters: # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 192.0.2.0/24 action drop # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 198.51.100.128/25 action drop The second filter can use the same mask as the first (dst_ip/24) with a delta of 1 bit. However, the above only works because the two filters have different values in the common unmasked part (dst_ip/24). When entries have the same value in the common unmasked part they create undesired collisions in the device since many entries now have the same key. This leads to firmware errors such as [1] and to a reduced scale. Fix by adjusting the hash table key to only include the value in the common unmasked part. That is, without including the delta bits. That way the driver will detect the collision during filter insertion and spill the filter into the circuit TCAM (C-TCAM). Add a test case that fails without the fix and adjust existing cases that check C-TCAM spillage according to the above limitation. [1] mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=3379b18a00003394,reg_id=3027(ptce3),type=write,status=8(resource not available)) Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP") Reported-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>