git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-03-07	net/mlx5: SD, Add debugfs	Tariq Toukan
	Add debugfs entries that describe the Socket-Direct group. Example: $ grep -H . /sys/kernel/debug/mlx5/0000\:08\:00.0/multi-pf/* /sys/kernel/debug/mlx5/0000:08:00.0/multi-pf/group_id:0x00000101 /sys/kernel/debug/mlx5/0000:08:00.0/multi-pf/primary:0000:08:00.0 vhca 0x0 /sys/kernel/debug/mlx5/0000:08:00.0/multi-pf/secondary_0:0000:09:00.0 vhca 0x2 Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07	net/mlx5: SD, Add informative prints in kernel log	Tariq Toukan
	Print to kernel log when an SD group moves from/to ready state. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07	net/mlx5: SD, Implement steering for primary and secondaries	Tariq Toukan
	Implement the needed SD steering adjustments for the primary and secondaries. While the SD multiple PFs are used to avoid cross-numa memory, when it comes to chip level all traffic goes only through the primary device. The secondaries are forced to silent mode, to guarantee they are not involved in any unexpected ingress/egress traffic. In RX, secondary devices will not have steering objects. Traffic will be steered from the primary device to the RQs of a secondary device using advanced cross-vhca RX steering capabilities. In TX, the primary creates a new TX flow table, which is aliased by the secondaries. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07	net/mlx5: SD, Implement devcom communication and primary election	Tariq Toukan
	Use devcom to communicate between the different devices. Add a new devcom component type for this. Each device registers itself to the devcom component <SD, group ID>. Once all devices of a component are registered, the component becomes ready, and a primary device is elected. In principle, any of the devices can act as a primary, they are all capable, and a random election would've worked. However, we aim to achieve predictability and consistency, hence each group always choses the same device, with the lowest PCI BUS number, as primary. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07	net/mlx5: SD, Implement basic query and instantiation	Tariq Toukan
	Add implementation for querying the MPIR register for Socket-Direct attributes, and instantiating a SD struct accordingly. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07	net/mlx5: SD, Introduce SD lib	Tariq Toukan
	Add Socket-Direct API with empty/minimal implementation. We fill-in the implementation gradually in downstream patches. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-06	mlxbf_gige: add support to display pause frame counters	David Thompson
	This patch updates the mlxbf_gige driver to support the "get_pause_stats()" callback, which enables display of pause frame counters via "ethtool -I -a oob_net0". The pause frame counters are only enabled if the "counters_en" bit is asserted in the LLU general config register. The driver will only report stats, and thus overwrite the default stats state of ETHTOOL_STAT_NOT_SET, if "counters_en" is asserted. Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/20240305212137.3525-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05	dpll: move all dpll<>netdev helpers to dpll code	Jakub Kicinski
	Older versions of GCC really want to know the full definition of the type involved in rcu_assign_pointer(). struct dpll_pin is defined in a local header, net/core can't reach it. Move all the netdev <> dpll code into dpll, where the type is known. Otherwise we'd need multiple function calls to jump between the compilation units. This is the same problem the commit under fixes was trying to address, but with rcu_assign_pointer() not rcu_dereference(). Some of the exports are not needed, networking core can't be a module, we only need exports for the helpers used by drivers. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/all/35a869c8-52e8-177-1d4d-e57578b99b6@linux-m68k.org/ Fixes: 640f41ed33b5 ("dpll: fix build failure due to rcu_dereference_check() on unknown type") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240305013532.694866-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-01	net/mlx5e: Switch to using _bh variant of of spinlock API in port ↵	Rahul Rameshbabu
	timestamping NAPI poll context The NAPI poll context is a softirq context. Do not use normal spinlock API in this context to prevent concurrency issues. Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> CC: Vadim Fedorenko <vadfed@meta.com>
2024-03-01	net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking ↵	Rahul Rameshbabu
	occurs after populating the metadata_map Just simply reordering the functions mlx5e_ptp_metadata_map_put and mlx5e_ptpsq_track_metadata in the mlx5e_txwqe_complete context is not good enough since both the compiler and CPU are free to reorder these two functions. If reordering does occur, the issue that was supposedly fixed by 7e3f3ba97e6c ("net/mlx5e: Track xmit submission to PTP WQ after populating metadata map") will be seen. This will lead to NULL pointer dereferences in mlx5e_ptpsq_mark_ts_cqes_undelivered in the NAPI polling context due to the tracking list being populated before the metadata map. Fixes: 7e3f3ba97e6c ("net/mlx5e: Track xmit submission to PTP WQ after populating metadata map") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> CC: Vadim Fedorenko <vadfed@meta.com>
2024-03-01	net/mlx5e: Fix MACsec state loss upon state update in offload path	Emeel Hakim
	The packet number attribute of the SA is incremented by the device rather than the software stack when enabling hardware offload. Because the packet number attribute is managed by the hardware, the software has no insight into the value of the packet number attribute actually written by the device. Previously when MACsec offload was enabled, the hardware object for handling the offload was destroyed when the SA was disabled. Re-enabling the SA would lead to a new hardware object being instantiated. This new hardware object would not have any recollection of the correct packet number for the SA. Instead, destroy the flow steering rule when deactivating the SA and recreate it upon reactivation, preserving the original hardware object. Fixes: 8ff0ac5be144 ("net/mlx5: Add MACsec offload Tx command support") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01	net/mlx5e: Change the warning when ignore_flow_level is not supported	Jianbo Liu
	Downgrade the print from mlx5_core_warn() to mlx5_core_dbg(), as it is just a statement of fact that firmware doesn't support ignore flow level. And change the wording to "firmware flow level support is missing", to make it more accurate. Fixes: ae2ee3be99a8 ("net/mlx5: CT: Remove warning of ignore_flow_level support for VFs") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Suggested-by: Elliott, Robert (Servers) <elliott@hpe.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01	net/mlx5: Check capability for fw_reset	Moshe Shemesh
	Functions which can't access MFRL (Management Firmware Reset Level) register, have no use of fw_reset structures or events. Remove fw_reset structures allocation and registration for fw reset events notifications for these functions. Having the devlink param enable_remote_dev_reset on functions that don't have this capability is misleading as these functions are not allowed to influence the reset flow. Hence, this patch removes this parameter for such functions. In addition, return not supported on devlink reload action fw_activate for these functions. Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01	net/mlx5: Fix fw reporter diagnose output	Aya Levin
	Restore fw reporter diagnose to print the syndrome even if it is zero. Following the cited commit, in this case (syndrome == 0) command returns no output at all. This fix restores command output in case syndrome is cleared: $ devlink health diagnose pci/0000:82:00.0 reporter fw Syndrome: 0 Fixes: d17f98bf7cc9 ("net/mlx5: devlink health: use retained error fmsg API") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01	net/mlx5: E-switch, Change flow rule destination checking	Jianbo Liu
	The checking in the cited commit is not accurate. In the common case, VF destination is internal, and uplink destination is external. However, uplink destination with packet reformat is considered as internal because firmware uses LB+hairpin to support it. Update the checking so header rewrite rules with both internal and external destinations are not allowed. Fixes: e0e22d59b47a ("net/mlx5: E-switch, Add checking for flow rule destinations") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01	Revert "net/mlx5e: Check the number of elements before walk TC rhashtable"	Saeed Mahameed
	This reverts commit 4e25b661f484df54b6751b65f9ea2434a3b67539. This Commit was mistakenly applied by pulling the wrong tag, remove it. Fixes: 4e25b661f484 ("net/mlx5e: Check the number of elements before walk TC rhashtable") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01	Revert "net/mlx5: Block entering switchdev mode with ns inconsistency"	Gavin Li
	This reverts commit 662404b24a4c4d839839ed25e3097571f5938b9b. The revert is required due to the suspicion it is not good for anything and cause crash. Fixes: 662404b24a4c ("net/mlx5e: Block entering switchdev mode with ns inconsistency") Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-27	thermal: core: Eliminate writable trip points masks	Rafael J. Wysocki
	All of the thermal_zone_device_register_with_trips() callers pass zero writable trip points masks to it, so drop the mask argument from that function and update all of its callers accordingly. This also removes the artificial trip points per zone limit of 32, related to using writable trip points masks. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-02-27	mlxsw: core_thermal: Set THERMAL_TRIP_FLAG_RW_TEMP directly	Rafael J. Wysocki
	It is now possible to flag trip points with THERMAL_TRIP_FLAG_RW_TEMP to allow their temperature to be set from user space via sysfs instead of using a nonzero writable trips mask during thermal zone registration, so make the mlxsw code do that. No intentional functional impact. Note that this change is requisite for dropping the mask argument from thermal_zone_device_register_with_trips() going forward. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-02-15	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/dev.c 9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()") 723de3ebef03 ("net: free altname using an RCU callback") net/unix/garbage.c 11498715f266 ("af_unix: Remove io_uring code for GC.") 25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.") drivers/net/ethernet/renesas/ravb_main.c ed4adc07207d ("net: ravb: Count packets instead of descriptors in GbEth RX path" ) c2da9408579d ("ravb: Add Rx checksum offload support for GbEth") net/mptcp/protocol.c bdd70eb68913 ("mptcp: drop the push_pending field") 28e5c1380506 ("mptcp: annotate lockless accesses around read-mostly fields") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-12	net/mlx5e: link NAPI instances to queues and IRQs	Joe Damato
	Make mlx5 compatible with the newly added netlink queue GET APIs. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240209202312.30181-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08	net/mlx5: DPLL, Fix possible use after free after delayed work timer triggers	Jiri Pirko
	I managed to hit following use after free warning recently: [ 2169.711665] ================================================================== [ 2169.714009] BUG: KASAN: slab-use-after-free in __run_timers.part.0+0x179/0x4c0 [ 2169.716293] Write of size 8 at addr ffff88812b326a70 by task swapper/4/0 [ 2169.719022] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 6.8.0-rc2jiri+ #2 [ 2169.720974] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 2169.722457] Call Trace: [ 2169.722756] <IRQ> [ 2169.723024] dump_stack_lvl+0x58/0xb0 [ 2169.723417] print_report+0xc5/0x630 [ 2169.723807] ? __virt_addr_valid+0x126/0x2b0 [ 2169.724268] kasan_report+0xbe/0xf0 [ 2169.724667] ? __run_timers.part.0+0x179/0x4c0 [ 2169.725116] ? __run_timers.part.0+0x179/0x4c0 [ 2169.725570] __run_timers.part.0+0x179/0x4c0 [ 2169.726003] ? call_timer_fn+0x320/0x320 [ 2169.726404] ? lock_downgrade+0x3a0/0x3a0 [ 2169.726820] ? kvm_clock_get_cycles+0x14/0x20 [ 2169.727257] ? ktime_get+0x92/0x150 [ 2169.727630] ? lapic_next_deadline+0x35/0x60 [ 2169.728069] run_timer_softirq+0x40/0x80 [ 2169.728475] __do_softirq+0x1a1/0x509 [ 2169.728866] irq_exit_rcu+0x95/0xc0 [ 2169.729241] sysvec_apic_timer_interrupt+0x6b/0x80 [ 2169.729718] </IRQ> [ 2169.729993] <TASK> [ 2169.730259] asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 2169.730755] RIP: 0010:default_idle+0x13/0x20 [ 2169.731190] Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 9a 7f 1f 02 85 c0 7e 07 0f 00 2d cf 69 43 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 c0 93 04 00 [ 2169.732759] RSP: 0018:ffff888100dbfe10 EFLAGS: 00000242 [ 2169.733264] RAX: 0000000000000001 RBX: ffff888100d9c200 RCX: ffffffff8241bd62 [ 2169.733925] RDX: ffffed109a848b15 RSI: 0000000000000004 RDI: ffffffff8127ac55 [ 2169.734566] RBP: 0000000000000004 R08: 0000000000000000 R09: ffffed109a848b14 [ 2169.735200] R10: ffff8884d42458a3 R11: 000000000000ba7e R12: ffffffff83d7d3a0 [ 2169.735835] R13: 1ffff110201b7fc6 R14: 0000000000000000 R15: ffff888100d9c200 [ 2169.736478] ? ct_kernel_exit.constprop.0+0xa2/0xc0 [ 2169.736954] ? do_idle+0x285/0x290 [ 2169.737323] default_idle_call+0x63/0x90 [ 2169.737730] do_idle+0x285/0x290 [ 2169.738089] ? arch_cpu_idle_exit+0x30/0x30 [ 2169.738511] ? mark_held_locks+0x1a/0x80 [ 2169.738917] ? lockdep_hardirqs_on_prepare+0x12e/0x200 [ 2169.739417] cpu_startup_entry+0x30/0x40 [ 2169.739825] start_secondary+0x19a/0x1c0 [ 2169.740229] ? set_cpu_sibling_map+0xbd0/0xbd0 [ 2169.740673] secondary_startup_64_no_verify+0x15d/0x16b [ 2169.741179] </TASK> [ 2169.741686] Allocated by task 1098: [ 2169.742058] kasan_save_stack+0x1c/0x40 [ 2169.742456] kasan_save_track+0x10/0x30 [ 2169.742852] __kasan_kmalloc+0x83/0x90 [ 2169.743246] mlx5_dpll_probe+0xf5/0x3c0 [mlx5_dpll] [ 2169.743730] auxiliary_bus_probe+0x62/0xb0 [ 2169.744148] really_probe+0x127/0x590 [ 2169.744534] __driver_probe_device+0xd2/0x200 [ 2169.744973] device_driver_attach+0x6b/0xf0 [ 2169.745402] bind_store+0x90/0xe0 [ 2169.745761] kernfs_fop_write_iter+0x1df/0x2a0 [ 2169.746210] vfs_write+0x41f/0x790 [ 2169.746579] ksys_write+0xc7/0x160 [ 2169.746947] do_syscall_64+0x6f/0x140 [ 2169.747333] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 2169.748049] Freed by task 1220: [ 2169.748393] kasan_save_stack+0x1c/0x40 [ 2169.748789] kasan_save_track+0x10/0x30 [ 2169.749188] kasan_save_free_info+0x3b/0x50 [ 2169.749621] poison_slab_object+0x106/0x180 [ 2169.750044] __kasan_slab_free+0x14/0x50 [ 2169.750451] kfree+0x118/0x330 [ 2169.750792] mlx5_dpll_remove+0xf5/0x110 [mlx5_dpll] [ 2169.751271] auxiliary_bus_remove+0x2e/0x40 [ 2169.751694] device_release_driver_internal+0x24b/0x2e0 [ 2169.752191] unbind_store+0xa6/0xb0 [ 2169.752563] kernfs_fop_write_iter+0x1df/0x2a0 [ 2169.753004] vfs_write+0x41f/0x790 [ 2169.753381] ksys_write+0xc7/0x160 [ 2169.753750] do_syscall_64+0x6f/0x140 [ 2169.754132] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 2169.754847] Last potentially related work creation: [ 2169.755315] kasan_save_stack+0x1c/0x40 [ 2169.755709] __kasan_record_aux_stack+0x9b/0xf0 [ 2169.756165] __queue_work+0x382/0x8f0 [ 2169.756552] call_timer_fn+0x126/0x320 [ 2169.756941] __run_timers.part.0+0x2ea/0x4c0 [ 2169.757376] run_timer_softirq+0x40/0x80 [ 2169.757782] __do_softirq+0x1a1/0x509 [ 2169.758387] Second to last potentially related work creation: [ 2169.758924] kasan_save_stack+0x1c/0x40 [ 2169.759322] __kasan_record_aux_stack+0x9b/0xf0 [ 2169.759773] __queue_work+0x382/0x8f0 [ 2169.760156] call_timer_fn+0x126/0x320 [ 2169.760550] __run_timers.part.0+0x2ea/0x4c0 [ 2169.760978] run_timer_softirq+0x40/0x80 [ 2169.761381] __do_softirq+0x1a1/0x509 [ 2169.761998] The buggy address belongs to the object at ffff88812b326a00 which belongs to the cache kmalloc-256 of size 256 [ 2169.763061] The buggy address is located 112 bytes inside of freed 256-byte region [ffff88812b326a00, ffff88812b326b00) [ 2169.764346] The buggy address belongs to the physical page: [ 2169.764866] page:000000000f2b1e89 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12b324 [ 2169.765731] head:000000000f2b1e89 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 2169.766484] anon flags: 0x200000000000840(slab\|head\|node=0\|zone=2) [ 2169.767048] page_type: 0xffffffff() [ 2169.767422] raw: 0200000000000840 ffff888100042b40 0000000000000000 dead000000000001 [ 2169.768183] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ 2169.768899] page dumped because: kasan: bad access detected [ 2169.769649] Memory state around the buggy address: [ 2169.770116] ffff88812b326900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 2169.770805] ffff88812b326980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 2169.771485] >ffff88812b326a00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 2169.772173] ^ [ 2169.772787] ffff88812b326a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 2169.773477] ffff88812b326b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 2169.774160] ================================================================== [ 2169.774845] ================================================================== I didn't manage to reproduce it. Though the issue seems to be obvious. There is a chance that the mlx5_dpll_remove() calls cancel_delayed_work() when the work runs and manages to re-arm itself. In that case, after delay timer triggers next attempt to queue it, it works with freed memory. Fix this by using cancel_delayed_work_sync() instead which makes sure that work is done when it returns. Fixes: 496fd0a26bbf ("mlx5: Implement SyncE support using DPLL infrastructure") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240206164328.360313-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07	Merge tag 'mlx5-updates-2024-02-01' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2024-02-01 1) IPSec global stats for xfrm and mlx5 2) XSK memory improvements for non-linear SKBs 3) Software steering debug dump to use seq_file ops 4) Various code clean-ups * tag 'mlx5-updates-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: XDP, Exclude headroom and tailroom from memory calculations net/mlx5e: XSK, Exclude tailroom from non-linear SKBs memory calculations net/mlx5: DR, Change SWS usage to debug fs seq_file interface net/mlx5: Change missing SyncE capability print to debug net/mlx5: Remove initial segmentation duplicate definitions net/mlx5: Return specific error code for timeout on wait_fw_init net/mlx5: SF, Stop waiting for FW as teardown was called net/mlx5: remove fw reporter dump option for non PF net/mlx5: remove fw_fatal reporter dump option for non PF net/mlx5: Rename mlx5_sf_dev_remove Documentation: Fix counter name of mlx5 vnic reporter net/mlx5e: Delete obsolete IPsec code net/mlx5e: Connect mlx5 IPsec statistics with XFRM core xfrm: get global statistics from the offloaded device xfrm: generalize xdo_dev_state_update_curlft to allow statistics update ==================== Link: https://lore.kernel.org/r/20240206005527.1353368-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-06	mlx4: Address spelling errors	Simon Horman
	Address spelling errors flagged by codespell. This patch follows-up on an earlier patch by Colin Ian King, which addressed a spelling error in a user-visible log message [1]. This patch includes that change. [1] https://lore.kernel.org/netdev/20231209225135.4055334-1-colin.i.king@gmail.com/ This patch is intended to cover all files under drivers/net/ethernet/mellanox/mlx4 Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240205-mlx5-codespell-v1-1-63b86dffbb61@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-05	net/mlx5e: XDP, Exclude headroom and tailroom from memory calculations	Carolina Jubran
	In the case of XDP Multi-Buffer with Striding RQ, an extra page is allocated for the linear part of non-linear SKBs. Including headroom and tailroom in the calculation may result in an unnecessary increase in the amount of memory allocated. This could be critical, particularly for large MTUs (e.g. 7975B) and large RQ sizes (e.g. 8192). In this case, the requested page pool size is 64K, but 32K would be sufficient. This causes a failure due to exceeding the page pool size limit of 32K. Exclude headroom and tailroom from SKB size calculations to reduce page pool size. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	net/mlx5e: XSK, Exclude tailroom from non-linear SKBs memory calculations	Carolina Jubran
	Packet data buffers lack reserved headroom or tailroom, and SKBs are allocated on a side memory when needed. Exclude the tailroom from the SKB size calculations. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	net/mlx5: DR, Change SWS usage to debug fs seq_file interface	Hamdan Igbaria
	In current SWS debug dump mechanism we implement the seq_file interface, but we only implement the 'show' callback to dump the whole steering DB with a single call to this callback. However, for large data size the seq_printf function will fail to allocate a buffer with the adequate capacity to hold such data. This patch solves this problem by utilizing the seq_file interface mechanism in the following way: - when the user triggers a dump procedure, we will allocate a list of buffers that hold the whole data dump (in the start callback) - using the start, next, show and stop callbacks of the seq_file API we iterate through the list and dump the whole data Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	net/mlx5: Change missing SyncE capability print to debug	Gal Pressman
	Lack of SyncE capability should not emit a warning, change the print to debug level. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	net/mlx5: Remove initial segmentation duplicate definitions	Gal Pressman
	Device definitions belong in mlx5_ifc, remove the duplicates in mlx5_core.h. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	net/mlx5: Return specific error code for timeout on wait_fw_init	Moshe Shemesh
	The function wait_fw_init() returns same error code either if it breaks waiting due to timeout or other reason. Thus, the function callers print error message on timeout without checking error type. Return different error code for different failure reason and print error message accordingly on wait_fw_init(). Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	net/mlx5: SF, Stop waiting for FW as teardown was called	Moshe Shemesh
	When PF/VF teardown is called the driver sets the flag MLX5_BREAK_FW_WAIT to stop waiting for FW loading and initializing. Same should be applied to SF driver teardown to cut waiting time. On mlx5_sf_dev_remove() set the flag before draining health WQ as recovery flow may also wait for FW reloading while it is not relevant anymore. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	net/mlx5: remove fw reporter dump option for non PF	Moshe Shemesh
	In case function is not a Physical Function it is not allowed to get FW core dump, so if tried it will fail the fw health reporter dump option. Instead of failing, remove the option of fw_fatal health reporter dump for such function. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	net/mlx5: remove fw_fatal reporter dump option for non PF	Moshe Shemesh
	In case function is not a Physical Function it is not allowed to collect crdump, so if tried it will fail the fw_fatal health reporter dump option. Instead of failing on permission, remove the option of fw_fatal health reporter dump for such function. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	net/mlx5: Rename mlx5_sf_dev_remove	Moshe Shemesh
	Mlx5 has two functions with the same name mlx5_sf_dev_remove. Both are static, in different files, so no compilation or logical issue, but it makes it hard to follow the code and some traces even can get both as one leads to the other [1]. Rename one to mlx5_sf_dev_remove_aux() as it actually removes the auxiliary device of the SF. [1] mlx5_sf_dev_remove+0x2a/0x70 [mlx5_core] auxiliary_bus_remove+0x18/0x30 device_release_driver_internal+0x199/0x200 bus_remove_device+0xd7/0x140 device_del+0x153/0x3d0 ? process_one_work+0x16a/0x4b0 mlx5_sf_dev_remove+0x2e/0x90 [mlx5_core] mlx5_sf_dev_table_destroy+0xa0/0x100 [mlx5_core] Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	net/mlx5e: Delete obsolete IPsec code	Leon Romanovsky
	After addition of HW managed counters and implementation drop in flow steering logic, the code in driver which checks syndrome is not reachable anymore. Let's delete it. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	net/mlx5e: Connect mlx5 IPsec statistics with XFRM core	Leon Romanovsky
	Fill integrity, replay and bad trailer counters. As an example, after simulating replay window attack with 5 packets: [leonro@c ~]$ grep XfrmInStateSeqError /proc/net/xfrm_stat XfrmInStateSeqError 5 [leonro@c ~]$ sudo ip -s x s <...> stats: replay-window 0 replay 5 failed 0 Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	xfrm: get global statistics from the offloaded device	Leon Romanovsky
	Iterate over all SAs in order to fill global IPsec statistics. Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05	xfrm: generalize xdo_dev_state_update_curlft to allow statistics update	Leon Romanovsky
	In order to allow drivers to fill all statistics, change the name of xdo_dev_state_update_curlft to be xdo_dev_state_update_stats. Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-01	net/mlx5: DPLL, Implement lock status error value	Jiri Pirko
	Fill-up the lock status error value properly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01	dpll: extend lock_status_get() op by status error and expose to user	Jiri Pirko
	Pass additional argunent status_error over lock_status_get() so drivers can fill it up. In case they do, expose the value over previously introduced attribute to user. Do it only in case the current lock_status is either "unlocked" or "holdover". Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30	mlxsw: remove I2C_CLASS_HWMON from drivers w/o detect and address_list	Heiner Kallweit
	Class-based I2C probing requires detect() and address_list to be set in the I2C client driver, see checks in i2c_detect(). It's misleading to declare I2C_CLASS_HWMON support if this precondition isn't met. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/77b5ab8e-20f2-4310-bd89-57db99e2f53b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-30	mlxsw: Use refcount_t for reference counting	Amit Cohen
	mlxsw driver uses 'unsigned int' for reference counters in several structures. Instead, use refcount_t type which allows us to catch overflow and underflow issues. Change the type of the counters and use the appropriate API. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30	mlxsw: spectrum: Refactor LAG create and destroy code	Amit Cohen
	mlxsw_sp stores an array of LAGs. When a port joins a LAG, in case that this LAG is already in use, we only have to increase the reference counter. Otherwise, we have to search for an unused LAG ID and configure it in hardware. When a port leaves a LAG, we have to destroy it only for the last user. This code can be simplified, for such requirements we usually add get() and put() functions which create and destroy the object. Add mlxsw_sp_lag_{get,put}() and use them. These functions take care of the reference counter and hardware configuration if needed. Change the reference counter to refcount_t type which catches overflow and underflow issues. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30	mlxsw: spectrum: Search for free LAD ID once	Amit Cohen
	Currently, the function mlxsw_sp_lag_index_get() is called twice - first as part of NETDEV_PRECHANGEUPPER event and later as part of NETDEV_CHANGEUPPER. This function will be changed in the next patch. To simplify the code, call it only once as part of NETDEV_CHANGEUPPER event and set an error message using 'extack' in case of failure. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30	mlxsw: spectrum: Query max_lag once	Amit Cohen
	The maximum number of LAGs is queried from core several times. It is used to allocate LAG array, and then to iterate over it. In addition, it is used for PGT initialization. To simplify the code, instead of querying it several times, store the value as part of 'mlxsw_sp' and use it. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30	mlxsw: spectrum: Remove mlxsw_sp_lag_get()	Amit Cohen
	A next patch will add mlxsw_sp_lag_{get,put}() functions to handle LAG reference counting and create/destroy it only for first user/last user. Remove mlxsw_sp_lag_get() function and access LAG array directly. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30	mlxsw: spectrum: Change mlxsw_sp_upper to LAG structure	Amit Cohen
	The structure mlxsw_sp_upper is used only as LAG. Rename it to mlxsw_sp_lag and move it to spectrum.c file, as it is used only there. Move the function mlxsw_sp_lag_get() with the structure. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-24	net/mlx5e: fix a potential double-free in fs_any_create_groups	Dinghao Liu
	When kcalloc() for ft->g succeeds but kvzalloc() for in fails, fs_any_create_groups() will free ft->g. However, its caller fs_any_create_table() will free ft->g again through calling mlx5e_destroy_flow_table(), which will lead to a double-free. Fix this by setting ft->g to NULL in fs_any_create_groups(). Fixes: 0f575c20bf06 ("net/mlx5e: Introduce Flow Steering ANY API") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24	net/mlx5e: fix a double-free in arfs_create_groups	Zhipeng Lu
	When `in` allocated by kvzalloc fails, arfs_create_groups will free ft->g and return an error. However, arfs_create_table, the only caller of arfs_create_groups, will hold this error and call to mlx5e_destroy_flow_table, in which the ft->g will be freed again. Fixes: 1cabe6b0965e ("net/mlx5e: Create aRFS flow tables") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24	net/mlx5e: Ignore IPsec replay window values on sender side	Leon Romanovsky
	XFRM stack doesn't prevent from users to configure replay window in TX side and strongswan sets replay_window to be 1. It causes to failures in validation logic when trying to offload the SA. Replay window is not relevant in TX side and should be ignored. Fixes: cded6d80129b ("net/mlx5e: Store replay window in XFRM attributes") Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>