In the cited commit, if the routing device is an ovs internal port, the
out device is set to the uplink, and packets go out after encapsulation.
If the filter device is the uplink, this can trigger the following syndrome:
mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 3966): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xcdb051), err(-22)
Fix this issue by not offloading the internal port if the filter device
is the out device. In this case, packets are not forwarded to the root
table to be processed; the termination table is used instead to forward
them from uplink to uplink.
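A minimal sketch of the check; the variable names (filter_dev, out_dev)
and the exact predicate are assumptions, not the literal patch:

  /* When the filter device is the uplink and is also the out device,
   * skip the internal port offload; the termination table handles the
   * uplink-to-uplink forwarding.
   */
  if (mlx5e_eswitch_uplink_rep(filter_dev) && filter_dev == out_dev)
          return false; /* do not offload via internal port */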
Fixes: 100ad4e2d758 ("net/mlx5e: Offload internal port as encap route device")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Hold the RTNL lock when calling xdp_set_features() with a registered
netdev, as the call triggers the netdev notifiers. This could happen
when switching from the nic profile to the uplink representor, for
example.
Similar logic fixing a similar scenario was previously introduced in
commit 72cc65497065 ("net/mlx5e: Take RTNL lock when needed before
calling xdp_set_features()").
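A minimal sketch of the idea; the registered-netdev test and the call
placement are assumptions:

  /* The notifiers fired by xdp_set_features() assert RTNL, so take
   * it around the update once the netdev is registered.
   */
  bool take_rtnl = netdev->reg_state == NETREG_REGISTERED;

  if (take_rtnl)
          rtnl_lock();
  mlx5e_set_xdp_feature(netdev);
  if (take_rtnl)
          rtnl_unlock();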
This fixes the following assertion and warning call trace:
RTNL: assertion failed at net/core/dev.c (1961)
WARNING: CPU: 13 PID: 2529 at net/core/dev.c:1961
call_netdevice_notifiers_info+0x7c/0x80
Modules linked in: rpcrdma rdma_ucm ib_iser libiscsi
scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib
ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink
nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5
auth_rpcgss oid_registry overlay mlx5_core zram zsmalloc fuse
CPU: 13 PID: 2529 Comm: devlink Not tainted
6.5.0_for_upstream_min_debug_2023_09_07_20_04 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:call_netdevice_notifiers_info+0x7c/0x80
Code: 8f ff 80 3d 77 0d 16 01 00 75 c5 ba a9 07 00 00 48
c7 c6 c4 bb 0d 82 48 c7 c7 18 c8 06 82 c6 05 5b 0d 16 01 01 e8 44 f6 8c
ff <0f> 0b eb a2 0f 1f 44 00 00 55 48 89 e5 41 54 48 83 e4 f0 48 83 ec
RSP: 0018:ffff88819930f7f0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffff8309f740 RCX: 0000000000000027
RDX: ffff88885fb5b5c8 RSI: 0000000000000001 RDI: ffff88885fb5b5c0
RBP: 0000000000000028 R08: ffff88887ffabaa8 R09: 0000000000000003
R10: ffff88887fecbac0 R11: ffff88887ff7bac0 R12: ffff88819930f810
R13: ffff88810b7fea40 R14: ffff8881154e8fd8 R15: ffff888107e881a0
FS: 00007f3ad248f800(0000) GS:ffff88885fb40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563b85f164e0 CR3: 0000000113b5c006 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __warn+0x79/0x120
? call_netdevice_notifiers_info+0x7c/0x80
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? call_netdevice_notifiers_info+0x7c/0x80
call_netdevice_notifiers+0x2e/0x50
mlx5e_set_xdp_feature+0x21/0x50 [mlx5_core]
mlx5e_build_rep_params+0x97/0x130 [mlx5_core]
mlx5e_init_ul_rep+0x9f/0x100 [mlx5_core]
mlx5e_netdev_init_profile+0x76/0x110 [mlx5_core]
mlx5e_netdev_attach_profile+0x1f/0x90 [mlx5_core]
mlx5e_netdev_change_profile+0x92/0x160 [mlx5_core]
mlx5e_vport_rep_load+0x329/0x4a0 [mlx5_core]
mlx5_esw_offloads_rep_load+0x9e/0xf0 [mlx5_core]
esw_offloads_enable+0x4bc/0xe90 [mlx5_core]
mlx5_eswitch_enable_locked+0x3c8/0x570 [mlx5_core]
? kmalloc_trace+0x25/0x80
mlx5_devlink_eswitch_mode_set+0x224/0x680 [mlx5_core]
? devlink_get_from_attrs_lock+0x9e/0x110
devlink_nl_cmd_eswitch_set_doit+0x60/0xe0
genl_family_rcv_msg_doit+0xd0/0x120
genl_rcv_msg+0x180/0x2b0
? devlink_get_from_attrs_lock+0x110/0x110
? devlink_nl_cmd_eswitch_get_doit+0x290/0x290
? devlink_pernet_pre_exit+0xf0/0xf0
? genl_family_rcv_msg_dumpit+0xf0/0xf0
netlink_rcv_skb+0x54/0x100
genl_rcv+0x24/0x40
netlink_unicast+0x1fc/0x2c0
netlink_sendmsg+0x232/0x4a0
sock_sendmsg+0x38/0x60
? _copy_from_user+0x2a/0x60
__sys_sendto+0x110/0x160
? handle_mm_fault+0x161/0x260
? do_user_addr_fault+0x276/0x620
__x64_sys_sendto+0x20/0x30
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f3ad231340a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3
0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffd70aad4b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000c36b00 RCX: 00007f3ad231340a
RDX: 0000000000000038 RSI: 0000000000c36b00 RDI: 0000000000000003
RBP: 0000000000c36910 R08: 00007f3ad2625200 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
</TASK>
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When mlx5e_xdp_xmit is called without the XDP_XMIT_FLUSH flag set, it is
possible that it leaves an mpwqe session open. That is OK during runtime:
the session will be closed on the next call to mlx5e_xdp_xmit. But
having an mpwqe session still open at XDP sq close time is problematic:
the pc counter is not updated before flushing the contents of the
xdpi_fifo. This results in leaking page fragments.
The fix is to always close the mpwqe session at the end of
mlx5e_xdp_xmit, regardless of whether the XDP_XMIT_FLUSH flag is set.
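A sketch of the resulting tail of mlx5e_xdp_xmit(); the exact code is
an assumption, though both helpers exist in the driver:

  /* Close any open MPWQE session unconditionally, so sq->pc is up to
   * date; only ringing the doorbell stays behind the flag.
   */
  if (sq->mpwqe.wqe)
          mlx5e_xdp_mpwqe_complete(sq);

  if (flags & XDP_XMIT_FLUSH)
          mlx5e_xmit_xdp_doorbell(sq);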
Fixes: 5e0d2eef771e ("net/mlx5e: XDP, Support Enhanced Multi-Packet TX WQE")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When a page allocation fails during refill in mlx5e_refill_rx_wqes, the
page will be released again on the next refill call. This triggers the
page_pool negative page fragment count warning below:
[ 338.326070] WARNING: CPU: 4 PID: 0 at include/net/page_pool/helpers.h:130 mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
...
[ 338.328993] RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 338.329094] Call Trace:
[ 338.329097] <IRQ>
[ 338.329100] ? __warn+0x7d/0x120
[ 338.329105] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 338.329173] ? report_bug+0x155/0x180
[ 338.329179] ? handle_bug+0x3c/0x60
[ 338.329183] ? exc_invalid_op+0x13/0x60
[ 338.329187] ? asm_exc_invalid_op+0x16/0x20
[ 338.329192] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 338.329259] mlx5e_post_rx_wqes+0x210/0x5a0 [mlx5_core]
[ 338.329327] ? mlx5e_poll_rx_cq+0x88/0x6f0 [mlx5_core]
[ 338.329394] mlx5e_napi_poll+0x127/0x6b0 [mlx5_core]
[ 338.329461] __napi_poll+0x25/0x1a0
[ 338.329465] net_rx_action+0x28a/0x300
[ 338.329468] __do_softirq+0xcd/0x279
[ 338.329473] irq_exit_rcu+0x6a/0x90
[ 338.329477] common_interrupt+0x82/0xa0
[ 338.329482] </IRQ>
This patch fixes the legacy rq case by releasing all allocated fragments
and then setting the skip flag on all released fragments. It is
important to note that the number of released fragments will be higher
than the number of allocated fragments when an allocation error occurs.
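A rough sketch of the recovery path; the 'wi' array and the
MLX5E_WQE_FRAG_SKIP_RELEASE flag name are assumptions:

  /* On allocation error: release whatever was allocated, then mark
   * every fragment of the WQE so the next refill skips it.
   */
  for (i = 0; i < rq->wqe.info.num_frags; i++) {
          if (wi[i].frag_page)
                  mlx5e_page_release_fragmented(rq, wi[i].frag_page);
          wi[i].flags |= MLX5E_WQE_FRAG_SKIP_RELEASE;
  }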
Fixes: 3f93f82988bc ("net/mlx5e: RX, Defer page release in legacy rq for better recycling")
Tested-by: Chris Mason <clm@fb.com>
Reported-by: Chris Mason <clm@fb.com>
Closes: https://lore.kernel.org/netdev/117FF31A-7BE0-4050-B2BB-E41F224FF72F@meta.com
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When a page allocation fails during refill in mlx5e_post_rx_mpwqes, the
page will be released again on the next refill call. This triggers the
page_pool negative page fragment count warning below:
[ 2436.447717] WARNING: CPU: 1 PID: 2419 at include/net/page_pool/helpers.h:130 mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
...
[ 2436.447895] RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.447991] Call Trace:
[ 2436.447975] mlx5e_post_rx_mpwqes+0x1d5/0xcf0 [mlx5_core]
[ 2436.447994] <IRQ>
[ 2436.447996] ? __warn+0x7d/0x120
[ 2436.448009] ? mlx5e_handle_rx_cqe_mpwrq+0x109/0x1d0 [mlx5_core]
[ 2436.448002] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.448044] ? mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core]
[ 2436.448061] ? report_bug+0x155/0x180
[ 2436.448065] ? handle_bug+0x36/0x70
[ 2436.448067] ? exc_invalid_op+0x13/0x60
[ 2436.448070] ? asm_exc_invalid_op+0x16/0x20
[ 2436.448079] mlx5e_napi_poll+0x122/0x6b0 [mlx5_core]
[ 2436.448077] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.448113] ? generic_exec_single+0x35/0x100
[ 2436.448117] __napi_poll+0x25/0x1a0
[ 2436.448120] net_rx_action+0x28a/0x300
[ 2436.448122] __do_softirq+0xcd/0x279
[ 2436.448126] irq_exit_rcu+0x6a/0x90
[ 2436.448128] sysvec_apic_timer_interrupt+0x6e/0x90
[ 2436.448130] </IRQ>
This patch fixes the striding rq case by setting the skip flag on all
the wqe pages that were expected to have new pages allocated.
Fixes: 4c2a13236807 ("net/mlx5e: RX, Defer page release in striding rq for better recycling")
Tested-by: Chris Mason <clm@fb.com>
Reported-by: Chris Mason <clm@fb.com>
Closes: https://lore.kernel.org/netdev/117FF31A-7BE0-4050-B2BB-E41F224FF72F@meta.com
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently, whenever fw issues a change ownership event, the PF that owns
the fw tracer drops its ownership directly and the other PFs try to pick
up the ownership via what the MTRC register suggests.
In some cases, the driver releases the ownership of the tracer and
reacquires it later on. Whenever the driver releases ownership of the
tracer, fw issues a change ownership event. This event can be delayed and
arrive after the driver has reacquired ownership of the tracer. Thus the
late event will trigger the tracer owner PF to release the ownership again
and lead to a scenario where no PF owns the tracer.
To prevent the scenario described above, when handling a change
ownership event, do not drop ownership of the tracer directly; instead,
read the fw MTRC register to retrieve the up-to-date owner of the tracer
and set it accordingly at the driver level.
Fixes: f53aaa31cce7 ("net/mlx5: FW tracer, implement tracer logic")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
With the current implementation, in single FDB LAG mode all packets are
processed by eswitch 0 rules. As such, 'peer' FDB entries receive the
packets for rules of other eswitches and are responsible for updating the
main entry by sending a SWITCHDEV_FDB_ADD_TO_BRIDGE notification from their
background update wq task. However, this introduces a race condition when a
non-zero eswitch instance decides to delete an FDB entry and sends a
SWITCHDEV_FDB_DEL_TO_BRIDGE notification, but another eswitch's update task
refreshes the same entry concurrently while its async delete work is still
pending on the workqueue. In such a case, another
SWITCHDEV_FDB_ADD_TO_BRIDGE event may be generated and the entry will
remain stuck in the FDB marked as 'offloaded', since no more
SWITCHDEV_FDB_DEL_TO_BRIDGE notifications are sent for deleting the peer
entries.
Fix the issue by synchronously marking deleted entries with the
MLX5_ESW_BRIDGE_FLAG_DELETED flag and skipping them in the background
update job.
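A sketch of the two sides of the fix; the entry field names are
assumptions:

  /* Delete path: mark the entry synchronously, before the async
   * delete work runs.
   */
  entry->flags |= MLX5_ESW_BRIDGE_FLAG_DELETED;

  /* Background update task: never refresh a dying entry. */
  if (entry->flags & MLX5_ESW_BRIDGE_FLAG_DELETED)
          continue;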
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently, mlx5 registers the event handler for the vport context change
event some time after arming the event. This can lead to missing an
event, which will result in wrong rules in the FDB.
Hence, register the event handler before arming the event.
This solution is valid since FW sends the vport context change event
only for vports which SW armed, and SW arms a vport when enabling it,
which is done after the FDB has been created.
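A sketch of the corrected ordering; the arming helper name is an
assumption:

  /* 1. Register the handler first ... */
  MLX5_NB_INIT(&esw->nb, eswitch_vport_event, NIC_VPORT_CHANGE);
  mlx5_eq_notifier_register(esw->dev, &esw->nb);

  /* 2. ... and only then arm the vport context change event. */
  arm_vport_context_events_cmd(esw->dev, vport_num, events_mask);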
Fixes: 6933a9379559 ("net/mlx5: E-Switch, Use async events chain")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited patch changed the mlx5 driver so that during probe DMA
operations were performed before pci_enable_device(), and during
teardown DMA operations were performed after pci_disable_device().
DMA operations require PCI to be enabled. Hence, the above leads to
the following oops on PPC systems [1].
On s390x systems, as reported by Niklas Schnelle, this is a problem
because mlx5_pci_init() is where the DMA and coherent mask is set but
mlx5_cmd_init() already does a dma_alloc_coherent(). Thus a DMA
allocation is done during probe before the correct mask is set. This
causes probe to fail initialization of the cmdif SW structs on s390x
after that is converted to the common dma-iommu code. This is because on
s390x DMA addresses below 4 GiB are reserved on current machines and
unlike the old s390x specific DMA API implementation common code
enforces DMA masks.
Fix it by performing the DMA operations during probe after
pci_enable_device() and after the dma mask is set,
and during teardown before pci_disable_device().
[1]
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: xt_MASQUERADE nf_conntrack_netlink
nfnetlink xfrm_user iptable_nat xt_addrtype xt_conntrack nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 netconsole rpcsec_gss_krb5
auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser ib_umad
rdma_cm ib_ipoib iw_cm libiscsi scsi_transport_iscsi ib_cm ib_uverbs
ib_core mlx5_core(-) ptp pps_core fuse vmx_crypto crc32c_vpmsum [last
unloaded: mlx5_ib]
CPU: 1 PID: 8937 Comm: modprobe Not tainted 6.5.0-rc3_for_upstream_min_debug_2023_07_31_16_02 #1
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c000000000423388 LR: c0000000001e733c CTR: c0000000001e4720
REGS: c0000000055636d0 TRAP: 0380 Not tainted (6.5.0-rc3_for_upstream_min_debug_2023_07_31_16_02)
MSR: 8000000000009033 CR: 24008884 XER: 20040000
CFAR: c0000000001e7338 IRQMASK: 0
NIP [c000000000423388] __free_pages+0x28/0x160
LR [c0000000001e733c] dma_direct_free+0xac/0x190
Call Trace:
[c000000005563970] [5deadbeef0000100] 0x5deadbeef0000100 (unreliable)
[c0000000055639b0] [c0000000003d46cc] kfree+0x7c/0x150
[c000000005563a40] [c0000000001e47c8] dma_free_attrs+0xa8/0x1a0
[c000000005563aa0] [c008000000d0064c] mlx5_cmd_cleanup+0xa4/0x100 [mlx5_core]
[c000000005563ad0] [c008000000cf629c] mlx5_mdev_uninit+0xf4/0x140 [mlx5_core]
[c000000005563b00] [c008000000cf6448] remove_one+0x160/0x1d0 [mlx5_core]
[c000000005563b40] [c000000000958540] pci_device_remove+0x60/0x110
[c000000005563b80] [c000000000a35e80] device_remove+0x70/0xd0
[c000000005563bb0] [c000000000a37a38] device_release_driver_internal+0x2a8/0x330
[c000000005563c00] [c000000000a37b8c] driver_detach+0x8c/0x160
[c000000005563c40] [c000000000a35350] bus_remove_driver+0x90/0x110
[c000000005563c80] [c000000000a38948] driver_unregister+0x48/0x90
[c000000005563cf0] [c000000000957e38] pci_unregister_driver+0x38/0x150
[c000000005563d40] [c008000000eb6140] mlx5_cleanup+0x38/0x90 [mlx5_core]
Fixes: 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and reload routines")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Now that napi_schedule returns a bool, we can drop napi_reschedule,
which does exactly the same thing. The function comes from the very old
commit bfe13f54f502 ("ibm_emac: Convert to use napi_struct independent
of struct net_device") and its purpose has effectively been deprecated
in favour of different logic.
Convert every user of napi_reschedule to napi_schedule.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> # ath10k
Acked-by: Nick Child <nnac123@linux.ibm.com> # ibm
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for can/dev/rx-offload.c
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20231009133754.9834-3-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 1e66220948df8 ("net/mlx5e: Update rx ring hw mtu upon each rx-fcs
flag change") seems to have accidentally inverted the logic added in
commit 0bc73ad46a76 ("net/mlx5e: Mutually exclude RX-FCS and
RX-port-timestamp").
The impact of this is a little unclear since it seems the FCS scattered
with RX-FCS is (usually?) correct regardless.
Fixes: 1e66220948df8 ("net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change")
Tested-by: Charlotte Tan <charlotte@extrahop.com>
Reviewed-by: Charlotte Tan <charlotte@extrahop.com>
Cc: Adham Faris <afaris@nvidia.com>
Cc: Aya Levin <ayal@nvidia.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Moshe Shemesh <moshe@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Will Mortensen <will@extrahop.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20231006053706.514618-1-will@extrahop.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Ethtool stats strings cannot be longer than 32 characters
('ETH_GSTRING_LEN'), including the terminating null byte. The format
string '%.29s_%.1d' can exceed this limitation if the per-TC counter
name exceeds 28 characters. Together with the underscore, the two digits
of the TC (bounded at 16) and the terminating null byte, more than 32
characters will be used.
Fix this by bounding the counter name at 28 characters, which suppresses
the following build warning [1]. This does not affect the ethtool output
since the longest counter name does not exceed this limitation.
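For illustration, a bounded format along these lines stays within the
limit (28 name bytes + '_' + up to 2 TC digits + '\0' = 32 at most):

  snprintf(*p, ETH_GSTRING_LEN, "%.28s_%d",
           mlxsw_sp_port_hw_tc_stats[i].str, tc);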
[1]
drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c: In function ‘mlxsw_sp_port_get_strings’:
drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c:622:58: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
622 | snprintf(*p, ETH_GSTRING_LEN, "%.29s_%.1d",
| ^
In function ‘mlxsw_sp_port_get_tc_strings’,
inlined from ‘mlxsw_sp_port_get_strings’ at drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c:677:4:
drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c:622:17: note: ‘snprintf’ output between 3 and 33 bytes into a destination of size 32
622 | snprintf(*p, ETH_GSTRING_LEN, "%.29s_%.1d",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
623 | mlxsw_sp_port_hw_tc_stats[i].str, tc);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c: In function ‘mlxsw_sp_port_get_strings’:
drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c:622:58: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
622 | snprintf(*p, ETH_GSTRING_LEN, "%.29s_%.1d",
| ^
In function ‘mlxsw_sp_port_get_tc_strings’,
inlined from ‘mlxsw_sp_port_get_strings’ at drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c:677:4:
drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c:622:17: note: ‘snprintf’ output between 3 and 33 bytes into a destination of size 32
622 | snprintf(*p, ETH_GSTRING_LEN, "%.29s_%.1d",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
623 | mlxsw_sp_port_hw_tc_stats[i].str, tc);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The name of a thermal zone device cannot be longer than 19 characters
('THERMAL_NAME_LENGTH - 1'). The format string 'mlxsw-lc%d-gearbox%d'
can exceed this limitation if the maximum number of line cards and the
maximum number of gearboxes on each line card cannot be represented
using a single digit.
This is not the case with current systems, nor with future ones.
Therefore, increase the size of the result buffer beyond
'THERMAL_NAME_LENGTH' and suppress the following build warning [1].
If this limitation is ever exceeded, we will know about it since the
thermal core validates the thermal device's name during registration.
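One plausible shape of the change, for illustration only:

  /* Deliberately larger than THERMAL_NAME_LENGTH; the thermal core
   * still rejects over-long names at registration time.
   */
  char tz_name[2 * THERMAL_NAME_LENGTH];

  snprintf(tz_name, sizeof(tz_name), "mlxsw-lc%d-gearbox%d",
           gearbox_tz->slot_index, gearbox_tz->module + 1);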
[1]
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c: In function ‘mlxsw_thermal_gearboxes_init.constprop’:
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c:543:71: error: ‘%d’ directive output may be truncated writing between 1 and 3 bytes into a region of size between 1 and 3
[-Werror=format-truncation=]
543 | snprintf(tz_name, sizeof(tz_name), "mlxsw-lc%d-gearbox%d",
| ^~
In function ‘mlxsw_thermal_gearbox_tz_init’,
inlined from ‘mlxsw_thermal_gearboxes_init.constprop’ at drivers/net/ethernet/mellanox/mlxsw/core_thermal.c:611:9:
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c:543:52: note: directive argument in the range [1, 255]
543 | snprintf(tz_name, sizeof(tz_name), "mlxsw-lc%d-gearbox%d",
| ^~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c:543:17: note: ‘snprintf’ output between 19 and 23 bytes into a destination of size 20
543 | snprintf(tz_name, sizeof(tz_name), "mlxsw-lc%d-gearbox%d",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
544 | gearbox_tz->slot_index, gearbox_tz->module + 1);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When updating the SA, use the new update_pn flag instead of comparing the
new PN with the initial one.
Comparing the initial PN value with the new value allowed the user to
update the SA using the initial PN value as a parameter, like this:
$ ip macsec add macsec0 tx sa 0 pn 1 on key 00 \
ead3664f508eb06c40ac7104cdae4ce5
$ ip macsec set macsec0 tx sa 0 pn 1 off
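A sketch of the driver-side check using the flag; the error message
wording is an assumption:

  if (ctx->sa.update_pn) {
          netdev_err(netdev,
                     "MACsec offload: update of PN is not supported\n");
          return -EINVAL;
  }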
Fixes: 8ff0ac5be144 ("net/mlx5: Add MACsec offload Tx command support")
Fixes: aae3454e4d4c ("net/mlx5e: Add MACsec offload Rx command support")
Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The mlxsw_sp2_nve_vxlan_learning_set() function is supposed to return
zero on success or negative error codes. So it needs to be type int
instead of bool.
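A generic illustration (not the driver code) of why the bool return
type hides errors:

  static bool buggy(void) { return -EOPNOTSUPP; } /* collapses to true */
  static int  fixed(void) { return -EOPNOTSUPP; } /* caller sees errno */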
Fixes: 4ee70efab68d ("mlxsw: spectrum_nve: Add support for VXLAN on Spectrum-2")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous patches prepared the code to allow separating between
choosing blocks and filling blocks.
Do not add blocks as part of the loop that chooses them. When all the
required blocks are set in the bitmap 'chosen_blocks_bm', start filling
blocks. Iterate over the bitmap twice - first add only blocks that are
marked with the 'high_entropy' flag. Then, fill the rest of the blocks.
The idea is to place key blocks with high entropy in blocks 0 to 5. See
more details in previous patches.
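A sketch of the two-pass fill, assuming hypothetical helper and field
names:

  /* Pass 1: high-entropy blocks are filled first, landing in block
   * indexes 0 to 5.
   */
  for_each_set_bit(i, chosen_blocks_bm, max_blocks)
          if (blocks[i].high_entropy)
                  fill_block(key_info, &blocks[i]);

  /* Pass 2: fill the remaining chosen blocks. */
  for_each_set_bit(i, chosen_blocks_bm, max_blocks)
          if (!blocks[i].high_entropy)
                  fill_block(key_info, &blocks[i]);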
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, mlxsw_afk_picker() chooses which blocks will be used for a
given list of elements, and fills the blocks during the search - when a
key block is found with the most hits, it adds it and removes the elements
from the count of hits. This should be changed, as we want to be able to
choose which blocks will be placed in blocks 0 to 5.
To separate between choosing blocks and filling blocks, several pre-changes
are required. Currently, the indication of whether all elements were
found in the chosen blocks is provided by the structure
'key_info->elusage'. This structure is updated when a block is filled as
part of mlxsw_afk_picker_key_info_add(). A following patch will call this
function only after choosing all the blocks. Add a bitmap called
'elusage_chosen' to store which elements were chosen in the chosen blocks.
Change the condition in the loop to check elements that were chosen, not
elements that were already filled in the blocks.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, mlxsw_afk_picker() chooses which blocks will be used for a
given list of elements, and fills the blocks during the search - when a
key block is found with the most hits, it adds it and removes the elements
from the count of hits. This should be changed, as we want to be able to
choose which blocks will be placed in blocks 0 to 5.
To separate between choosing blocks and filling blocks, several pre-changes
are required. During the search, the structure 'mlxsw_afk_picker' is
used per block; it contains how many elements from the required list appear
in the block. When a block is chosen and filled, this bitmap of elements is
cleaned. To be able to fill the blocks at the end, add a bitmap called
'chosen_element' as part of the picker. When a block is chosen, copy the
'element' bitmap to it. Use the new bitmap as part of
mlxsw_afk_picker_key_info_add(). Later, when filling the blocks is done at
the end of the search, we will use the copied bitmap, which contains the
elements that should be used in the block.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, mlxsw_afk_picker() chooses which blocks will be used for a
given list of elements, and fills the blocks during the search - when a
key block is found with the most hits, it adds it and removes the elements
from the count of hits. This should be changed, as we want to be able to
choose which blocks will be placed in blocks 0 to 5.
To separate between choosing blocks and filling blocks, several pre-changes
are required. The indexes of the chosen blocks should be saved, so that
the relevant blocks can be filled at the end of the search.
Allocate a bitmap for chosen blocks; when a block is found with the most
hits, set the relevant bit in the bitmap. This bitmap will be used in a
following patch.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For 12 key blocks in the A-TCAM, rules are split into two records, which
constitute two lookups. The two records are linked using a
"large entry key ID".
Due to a Spectrum-4 hardware issue, KVD entries that correspond to key
blocks 0 to 5 of 12 key blocks A-TCAM entries will be placed in the same
KVD pipe if they only differ in their "large entry key ID", as it is
ignored. This results in a reduced scale. To reduce the probability of this
issue, we can place key blocks with high entropy in blocks 0 to 5. The idea
is to place blocks that are changed often in blocks 0 to 5, for
example, key blocks that match on IPv4 addresses or the LSBs of IPv6
addresses. Such placement will reduce the probability of these blocks
being the same.
Mark several blocks with the 'high_entropy' flag, so that later we can
take this flag into account and place them in blocks 0 to 5.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When building with -Wincompatible-function-pointer-types-strict, a
warning designed to catch potential kCFI failures at build time rather
than run time due to incorrect function pointer types, there is a
warning due to a mismatch between the type of the mode parameter in
mlx5_dpll_device_mode_get() vs. what the function pointer prototype for
->mode_get() in 'struct dpll_device_ops' expects.
drivers/net/ethernet/mellanox/mlx5/core/dpll.c:141:14: error: incompatible function pointer types initializing 'int (*)(const struct dpll_device *, void *, enum dpll_mode *, struct netlink_ext_ack *)' with an expression of type 'int (const struct dpll_device *, void *, u32 *, struct netlink_ext_ack *)' (aka 'int (const struct dpll_device *, void *, unsigned int *, struct netlink_ext_ack *)') [-Werror,-Wincompatible-function-pointer-types-strict]
141 | .mode_get = mlx5_dpll_device_mode_get,
| ^~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Change the type of the mode parameter in mlx5_dpll_device_mode_get() to
clear up the warning and avoid kCFI failures at run time.
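The fixed prototype then matches ->mode_get() exactly; the body shown
here is a plausible minimal implementation:

  static int mlx5_dpll_device_mode_get(const struct dpll_device *dpll,
                                       void *priv, enum dpll_mode *mode,
                                       struct netlink_ext_ack *extack)
  {
          *mode = DPLL_MODE_MANUAL;
          return 0;
  }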
Fixes: 496fd0a26bbf ("mlx5: Implement SyncE support using DPLL infrastructure")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231002-net-wifpts-dpll_mode_get-v1-2-a356a16413cf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This series from Patrisious extends mlx5 to support IPsec packet offload
in multiport devices (MPV, see [1] for more details).
These devices have single flow steering logic and two netdev interfaces,
which require extra logic to manage IPsec configurations, as they are
performed on netdevs.
Thanks
[1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/
Link: https://lore.kernel.org/all/20231002083832.19746-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
* mlx5-next: (576 commits)
net/mlx5: Handle IPsec steering upon master unbind/bind
net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic
net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic
net/mlx5: Add create alias flow table function to ipsec roce
net/mlx5: Implement alias object allow and create functions
net/mlx5: Add alias flow table bits
net/mlx5: Store devcom pointer inside IPsec RoCE
net/mlx5: Register mlx5e priv to devcom in MPV mode
RDMA/mlx5: Send events from IB driver about device affiliation state
net/mlx5: Introduce ifc bits for migration in a chunk mode
Linux 6.6-rc3
...
|
|
xdp_do_flush_map() is deprecated and new code should use xdp_do_flush()
instead.
Replace xdp_do_flush_map() with xdp_do_flush().
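The conversion is purely mechanical:

  xdp_do_flush(); /* was: xdp_do_flush_map(); */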
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Clark Wang <xiaoning.wang@nxp.com>
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: David Arinzon <darinzon@amazon.com>
Cc: Edward Cree <ecree.xilinx@gmail.com>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Jassi Brar <jaswinder.singh@linaro.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: John Crispin <john@phrozen.org>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Cc: Louis Peens <louis.peens@corigine.com>
Cc: Marcin Wojtas <mw@semihalf.com>
Cc: Mark Lee <Mark-MC.Lee@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Noam Dagan <ndagan@amazon.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Saeed Bishara <saeedb@amazon.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Shay Agroskin <shayagr@amazon.com>
Cc: Shenwei Wang <shenwei.wang@nxp.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230908143215.869913-2-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mlxsw_sp_span.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
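Abbreviated illustration of the annotation (other members omitted, and
the member names are taken as assumptions):

  struct mlxsw_sp_span {
          /* ... */
          int entries_count; /* runtime bound for the array below */
          struct mlxsw_sp_span_entry entries[] __counted_by(entries_count);
  };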
Cc: Petr Machata <petrm@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230929180746.3005922-5-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mlxsw_sp_nexthop_group_info.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Petr Machata <petrm@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230929180746.3005922-4-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mlxsw_sp_counter_pool.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Petr Machata <petrm@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230929180746.3005922-3-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mlxsw_env.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Petr Machata <petrm@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230929180746.3005922-2-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mlxsw_linecards.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Petr Machata <petrm@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230929180746.3005922-1-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the master device is unbound, make sure to clean up all of the
steering rules or flow tables that were created over the master, in
order to allow proper unbinding of the master, and for ethernet traffic
to continue to work independently.
Upon bringing the master device back up and attaching the slave to it,
check if the slave already has IPsec configured and, if so, reconfigure
the rules needed to support RoCE traffic.
Note that while the master device is unbound, the user is unable to
configure IPsec again, since the devices are in a kind of illegal state:
they are in MPV mode but the slave has no master.
However, if IPsec was configured beforehand, it will continue to work
for ethernet traffic while the master is unbound, and it will work for
all traffic when the master is bound back again.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/8434e88912c588affe51b34669900382a132e873.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add an empty flow table in the RDMA_RX master domain and forward all
received traffic to it, in order to continue through the FW RoCE steering.
To achieve that, however, we first check if the decrypted traffic is
RoCEv2; if so, it is forwarded to the RDMA_RX domain.
In case the traffic is coming from the slave, it first has to be sent to
an alias table in order to switch gvmi, and from there it can go to the
appropriate gvmi flow table in the RDMA_RX master domain.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/d2200b53158b1e7ef30996812107dd7207485c28.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add steering tables/rules in the RDMA_TX master domain to forward all
traffic to the IPsec crypto table in the NIC domain.
In case the traffic is coming from the slave, it first has to be sent to
an alias table in order to switch gvmi, and from there it can go to the
appropriate gvmi crypto table in the NIC domain.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/7ca5cf1ac5c6979359b8726e97510574e2b3d44d.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Implement functions which create an alias flow table and check whether
alias flow table creation is even supported; on success, the created
alias flow table object id is returned.
These functions will be used in later patches to allow jumping from
one vhca to another, in order to add support for MPV mode.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/36e15ef41586f2a9aacc65b935de18391eef5607.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add functions which allow one vhca to access another vhca object,
and functions that create an alias object or destroy it.
Together they can be used to create a cross-vhca flow table that is
able to jump from the steering domain that is managed by one vport
to the steering domain on a different vport.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/f45a9c85319fa783186b8988abcd64955b5f2a0c.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Store the mlx5e priv devcom component within IPsec RoCE to enable
the IPsec RoCE code to access the other device's private information.
This includes retrieving the necessary device information and
the IPsec database, which helps determine if IPsec is configured or not.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/5bb3160ceeb07523542302886da54c78eef0d2af.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
If the device is in MPV mode, the ethernet driver will now register
for events from the IB driver about core device affiliation or
de-affiliation.
Use the key provided in said event to connect each mlx5e priv
instance to its master counterpart. This way the ethernet driver
knows which core device is its master, and even more, such as whether
the partner device has IPsec configured or not.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/279adfa0aa3a1957a339086f2c1739a50b8e4b68.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Send blocking events from the IB driver whenever the device is done
being affiliated or when it is removed from an affiliation.
This is useful since the EN driver can now register for those events
and know when a device is affiliated or not.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/a7491c3e483cfd8d962f5f75b9a25f253043384a.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Use the standard macro DIV_ROUND_UP() to determine the number of chunks
required for a given buffer.
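For example:

  /* DIV_ROUND_UP(n, d) == (n + d - 1) / d, so a 100-byte buffer split
   * into 48-byte chunks needs DIV_ROUND_UP(100, 48) == 3 chunks.
   */
  num_chunks = DIV_ROUND_UP(buf_size, chunk_size);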
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend the list of allowed external cooling devices for thermal zone
binding to include devices of type "emc2305".
The motivation is to provide support for the system SN2201, which is
equipped with the Spectrum-1 ASIC.
The system's airflow control is managed by the EMC2305 RPM-based PWM
Fan Speed Controller as the cooling device.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MTBR register is used to read temperatures from multiple sensors in
one transaction, but the driver only reads from a single sensor in each
transaction.
Restrict the payload size of the MTBR register to prevent the
transmission of redundant data to the firmware.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-09-19
Misc updates for mlx5 driver
1) From Erez, Add support for multicast forwarding to multi destination
in bridge offloads with software steering mode (SMFS).
2) From Jianbo, Utilize the maximum aggregated link speed for police
action rate.
3) From Moshe, Add a health error syndrome for pci data poisoned
4) From Shay, Enable 4 ports multiport E-switch
5) From Jiri, Trivial SF code cleanup
====================
Link: https://lore.kernel.org/r/20230920063552.296978-1-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Rename the 400G_8X speed to comply with the naming convention.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/ac98447cac8379a43fbdb36d56e5fb2b741a97ff.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add a check for 800G_8X speed when querying PTYS and report it back
correctly when needed.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/26fd0b6e1fac071c3eb779657bb3d8ba47f47c4f.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Two ACL regions that are configured by the driver during initialization are
the ones used for IPv4 and IPv6 multicast forwarding. Entries residing
in these two regions match on the {SIP, DIP, VRID} key elements.
Currently, for the IPv6 region, 9 key blocks are used:
* 4 for SIP - 'ipv4_1', 'ipv6_{3,4,5}'
* 4 for DIP - 'ipv4_0', 'ipv6_{0,1,2/2b}'
* 1 for VRID - 'ipv4_4b'
This can be improved by reducing the number of key blocks needed for
the IPv6 region to 8. It is possible to use key blocks that mix subsets of
the VRID element with subsets of the DIP element.
The following key blocks can be used:
* 4 for SIP - 'ipv4_1', 'ipv6_{3,4,5}'
* 1 for subset of DIP - 'ipv4_0'
* 3 for the rest of DIP and subsets of VRID - 'ipv6_{0,1,2/2b}'
To make this happen, add VRID sub-elements as part of existing keys -
'ipv6_{0,1,2/2b}'. Note that one of the sub-elements is called
VRID_ROUTER_MSB and does not contain bit numbers like the rest, as for
Spectrum < 4 this element represents bits 8-10 and for Spectrum-4 it
represents bits 8-11.
Breaking VRID into 3 sub-elements makes the driver use one less block in
IPv6 region for multicast forwarding. The sub-elements can be filled in
blocks that are used for destination IP.
The algorithm in the driver that chooses which key blocks will be used is
lazy and not the optimal one. It searches the block that contains the most
elements that are required, chooses it, removes the elements that appear
in the chosen block and starts again searching the block that contains the
most elements.
When key block 'ipv4_4' is defined, the algorithm might choose it, as it
contains 2 sub-elements of VRID; then 8 blocks must be chosen for SIP and
DIP, and we get 9 blocks to match on {SIP, DIP, VRID}. That is why we had to
remove key block 'ipv4_4' in a previous patch and use key block that
contains one field for VRID.
This improvement was tested and indeed 8 blocks are used instead of 9.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous patch replaced the key block 'ipv4_4' with 'ipv4_5'. The
corresponding block for Spectrum-4 is 'ipv4_4b'. To be consistent, replace
key block 'ipv4_4b' with 'ipv4_5b'.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the virtual router ID element is broken into two sub-elements -
'VIRT_ROUTER_LSB' and 'VIRT_ROUTER_MSB'. It was broken up this way because
the field is split in the 'ipv4_4' flex key, which is used for IPv4 in
Spectrum < 4.
For Spectrum-4, we use the 'ipv4_4b' flex key, which contains one field for
the virtual router; this key is not supported in older ASICs.
Add the 'ipv4_5' flex key, which is supported in all ASICs and contains one
field for the virtual router. Then there is no reason to use
'VIRT_ROUTER_LSB' and 'VIRT_ROUTER_MSB'; remove them and add one element,
'VIRT_ROUTER', for this field.
The motivation is to get rid of the 'ipv4_4' flex key, as it might be
chosen for the IPv6 multicast forwarding region, which would prevent the
improvement in a following patch. See more details in the cover letter and
in a following patch.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here, there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() is renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
enable_mpesw() assumed only 2 ports are available. Fix this by removing
that assumption and looping through the existing lag ports to enable the
multi-port E-switch for cards with more than 2 ports.
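A sketch of the loop, with a hypothetical per-port helper:

  for (i = 0; i < ldev->ports; i++) {
          err = mlx5_mpesw_port_enable(ldev, i); /* hypothetical */
          if (err)
                  goto err_disable;
  }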
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add a new health error syndrome to indicate that a PCI data poisoned
error has been received while fetching device ICM data.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Whenever we have multiple destinations of flow-table type, we need to make
the one that goes to the wire the last one.
We are using the FW in order to get an iterator; the FW uses RX for the
first destinations and TX for the last destination. If we want the packet
to be directed to the wire, it should be done in the TX path and not in
the RX path.
The code now checks if the FT is directed to the wire and if so puts it
as the last destination.
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The driver should not allow a rule that forwards to more than one FT in
the TX flow unless there is specific support from the FW.
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|