Age | Commit message | Author |
|
Add debugfs entries that describe the Socket-Direct group.
Example:
$ grep -H . /sys/kernel/debug/mlx5/0000\:08\:00.0/multi-pf/*
/sys/kernel/debug/mlx5/0000:08:00.0/multi-pf/group_id:0x00000101
/sys/kernel/debug/mlx5/0000:08:00.0/multi-pf/primary:0000:08:00.0 vhca 0x0
/sys/kernel/debug/mlx5/0000:08:00.0/multi-pf/secondary_0:0000:09:00.0 vhca 0x2
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Print to kernel log when an SD group moves from/to ready state.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Implement the needed SD steering adjustments for the primary and
secondaries.
While the SD multiple PFs are used to avoid cross-numa memory, when it
comes to chip level all traffic goes only through the primary device.
The secondaries are forced to silent mode, to guarantee they are not
involved in any unexpected ingress/egress traffic.
In RX, secondary devices will not have steering objects. Traffic will be
steered from the primary device to the RQs of a secondary device using
advanced cross-vhca RX steering capabilities.
In TX, the primary creates a new TX flow table, which is aliased by the
secondaries.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Use devcom to communicate between the different devices. Add a new
devcom component type for this.
Each device registers itself to the devcom component <SD, group ID>.
Once all devices of a component are registered, the component becomes
ready, and a primary device is elected.
In principle, any of the devices can act as primary: they are all
capable, and a random election would have worked. However, we aim for
predictability and consistency, hence each group always chooses the
same device, the one with the lowest PCI bus number, as primary.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
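
The election rule above can be illustrated with a small userspace sketch in Python. This is not driver code: it assumes devices are identified by PCI addresses of the form domain:bus:device.function, as in the debugfs example, and the names pci_bus() and elect_primary() are hypothetical.

```python
def pci_bus(addr):
    """Return the bus number of a PCI address such as '0000:08:00.0'."""
    _domain, bus, _devfn = addr.split(":")
    return int(bus, 16)


def elect_primary(group):
    """Pick the device with the lowest PCI bus number as primary.

    The result depends only on the addresses, so every device in the
    group reaches the same answer regardless of registration order.
    """
    primary = min(group, key=pci_bus)
    secondaries = sorted((d for d in group if d != primary), key=pci_bus)
    return primary, secondaries


primary, secondaries = elect_primary(["0000:09:00.0", "0000:08:00.0"])
print(primary, secondaries)
```

Because the key is deterministic, no extra negotiation between the devices is needed once the group is complete.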
|
|
Add implementation for querying the MPIR register for Socket-Direct
attributes, and instantiating a SD struct accordingly.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add Socket-Direct API with empty/minimal implementation.
We fill-in the implementation gradually in downstream patches.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
This patch updates the mlxbf_gige driver to support the
"get_pause_stats()" callback, which enables display of
pause frame counters via "ethtool -I -a oob_net0".
The pause frame counters are only enabled if the "counters_en"
bit is asserted in the LLU general config register. The driver
will only report stats, and thus overwrite the default stats
state of ETHTOOL_STAT_NOT_SET, if "counters_en" is asserted.
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20240305212137.3525-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
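
The reporting rule above can be sketched in Python as follows. This is a userspace model, not the driver: the read_counter callable and the dictionary layout are assumptions for illustration, while ETHTOOL_STAT_NOT_SET mirrors the kernel sentinel of all ones.

```python
ETHTOOL_STAT_NOT_SET = 2**64 - 1  # models the kernel's (~0ULL) sentinel


def get_pause_stats(counters_en, read_counter):
    """Return pause stats, leaving the defaults untouched when disabled.

    counters_en  -- models the "counters_en" bit of the LLU config register
    read_counter -- hypothetical callable returning a hardware counter value
    """
    stats = {
        "tx_pause_frames": ETHTOOL_STAT_NOT_SET,
        "rx_pause_frames": ETHTOOL_STAT_NOT_SET,
    }
    if counters_en:
        # Only overwrite the sentinel when the hardware is actually counting.
        stats["tx_pause_frames"] = read_counter("tx")
        stats["rx_pause_frames"] = read_counter("rx")
    return stats


print(get_pause_stats(False, lambda name: 0))
```

Leaving the sentinel in place lets the caller distinguish "counter not available" from a genuine count of zero.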
|
|
Older versions of GCC really want to know the full definition
of the type involved in rcu_assign_pointer().
struct dpll_pin is defined in a local header, net/core can't
reach it. Move all the netdev <> dpll code into dpll, where
the type is known. Otherwise we'd need multiple function calls
to jump between the compilation units.
This is the same problem the commit under fixes was trying to address,
but with rcu_assign_pointer() not rcu_dereference().
Some of the exports are not needed, networking core can't
be a module, we only need exports for the helpers used by
drivers.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/all/35a869c8-52e8-177-1d4d-e57578b99b6@linux-m68k.org/
Fixes: 640f41ed33b5 ("dpll: fix build failure due to rcu_dereference_check() on unknown type")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240305013532.694866-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
timestamping NAPI poll context
The NAPI poll context is a softirq context. Do not use normal spinlock API
in this context to prevent concurrency issues.
Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
CC: Vadim Fedorenko <vadfed@meta.com>
|
|
occurs after populating the metadata_map
Simply reordering the functions mlx5e_ptp_metadata_map_put and
mlx5e_ptpsq_track_metadata in the mlx5e_txwqe_complete context is not
enough, since both the compiler and the CPU are free to reorder these two
functions. If reordering does occur, the issue that was supposedly fixed by
7e3f3ba97e6c ("net/mlx5e: Track xmit submission to PTP WQ after populating
metadata map") will be seen. This will lead to NULL pointer dereferences in
mlx5e_ptpsq_mark_ts_cqes_undelivered in the NAPI polling context due to the
tracking list being populated before the metadata map.
Fixes: 7e3f3ba97e6c ("net/mlx5e: Track xmit submission to PTP WQ after populating metadata map")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
CC: Vadim Fedorenko <vadfed@meta.com>
|
|
The packet number attribute of the SA is incremented by the device rather
than the software stack when enabling hardware offload. Because the packet
number attribute is managed by the hardware, the software has no insight
into the value of the packet number attribute actually written by the
device.
Previously when MACsec offload was enabled, the hardware object for
handling the offload was destroyed when the SA was disabled. Re-enabling
the SA would lead to a new hardware object being instantiated. This new
hardware object would not have any recollection of the correct packet
number for the SA. Instead, destroy the flow steering rule when
deactivating the SA and recreate it upon reactivation, preserving the
original hardware object.
Fixes: 8ff0ac5be144 ("net/mlx5: Add MACsec offload Tx command support")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Downgrade the print from mlx5_core_warn() to mlx5_core_dbg(), as it
is just a statement of fact that firmware doesn't support ignore flow
level.
And change the wording to "firmware flow level support is missing", to
make it more accurate.
Fixes: ae2ee3be99a8 ("net/mlx5: CT: Remove warning of ignore_flow_level support for VFs")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Suggested-by: Elliott, Robert (Servers) <elliott@hpe.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Functions that cannot access the MFRL (Management Firmware Reset Level)
register have no use for fw_reset structures or events. Remove the fw_reset
structure allocation and the registration for fw reset event notifications
for these functions.
Having the devlink param enable_remote_dev_reset on functions that don't
have this capability is misleading as these functions are not allowed to
influence the reset flow. Hence, this patch removes this parameter for
such functions.
In addition, return not supported on devlink reload action fw_activate
for these functions.
Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Restore fw reporter diagnose to print the syndrome even if it is zero.
Following the cited commit, in this case (syndrome == 0) the command
returns no output at all.
This fix restores command output in case syndrome is cleared:
$ devlink health diagnose pci/0000:82:00.0 reporter fw
Syndrome: 0
Fixes: d17f98bf7cc9 ("net/mlx5: devlink health: use retained error fmsg API")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The checking in the cited commit is not accurate. In the common case,
VF destination is internal, and uplink destination is external.
However, uplink destination with packet reformat is considered as
internal because firmware uses LB+hairpin to support it. Update the
checking so header rewrite rules with both internal and external
destinations are not allowed.
Fixes: e0e22d59b47a ("net/mlx5: E-switch, Add checking for flow rule destinations")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
This reverts commit 4e25b661f484df54b6751b65f9ea2434a3b67539.
This commit was mistakenly applied by pulling the wrong tag; remove it.
Fixes: 4e25b661f484 ("net/mlx5e: Check the number of elements before walk TC rhashtable")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
This reverts commit 662404b24a4c4d839839ed25e3097571f5938b9b.
The revert is required due to the suspicion that the commit serves no
purpose and causes a crash.
Fixes: 662404b24a4c ("net/mlx5e: Block entering switchdev mode with ns inconsistency")
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
All of the thermal_zone_device_register_with_trips() callers pass zero
writable trip points masks to it, so drop the mask argument from that
function and update all of its callers accordingly.
This also removes the artificial trip points per zone limit of 32,
related to using writable trip points masks.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
It is now possible to flag trip points with THERMAL_TRIP_FLAG_RW_TEMP
to allow their temperature to be set from user space via sysfs instead
of using a nonzero writable trips mask during thermal zone registration,
so make the mlxsw code do that.
No intentional functional impact.
Note that this change is requisite for dropping the mask argument from
thermal_zone_device_register_with_trips() going forward.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
net/core/dev.c
9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()")
723de3ebef03 ("net: free altname using an RCU callback")
net/unix/garbage.c
11498715f266 ("af_unix: Remove io_uring code for GC.")
25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.")
drivers/net/ethernet/renesas/ravb_main.c
ed4adc07207d ("net: ravb: Count packets instead of descriptors in GbEth RX path")
c2da9408579d ("ravb: Add Rx checksum offload support for GbEth")
net/mptcp/protocol.c
bdd70eb68913 ("mptcp: drop the push_pending field")
28e5c1380506 ("mptcp: annotate lockless accesses around read-mostly fields")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make mlx5 compatible with the newly added netlink queue GET APIs.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240209202312.30181-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I recently managed to hit the following use-after-free warning:
[ 2169.711665] ==================================================================
[ 2169.714009] BUG: KASAN: slab-use-after-free in __run_timers.part.0+0x179/0x4c0
[ 2169.716293] Write of size 8 at addr ffff88812b326a70 by task swapper/4/0
[ 2169.719022] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 6.8.0-rc2jiri+ #2
[ 2169.720974] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 2169.722457] Call Trace:
[ 2169.722756] <IRQ>
[ 2169.723024] dump_stack_lvl+0x58/0xb0
[ 2169.723417] print_report+0xc5/0x630
[ 2169.723807] ? __virt_addr_valid+0x126/0x2b0
[ 2169.724268] kasan_report+0xbe/0xf0
[ 2169.724667] ? __run_timers.part.0+0x179/0x4c0
[ 2169.725116] ? __run_timers.part.0+0x179/0x4c0
[ 2169.725570] __run_timers.part.0+0x179/0x4c0
[ 2169.726003] ? call_timer_fn+0x320/0x320
[ 2169.726404] ? lock_downgrade+0x3a0/0x3a0
[ 2169.726820] ? kvm_clock_get_cycles+0x14/0x20
[ 2169.727257] ? ktime_get+0x92/0x150
[ 2169.727630] ? lapic_next_deadline+0x35/0x60
[ 2169.728069] run_timer_softirq+0x40/0x80
[ 2169.728475] __do_softirq+0x1a1/0x509
[ 2169.728866] irq_exit_rcu+0x95/0xc0
[ 2169.729241] sysvec_apic_timer_interrupt+0x6b/0x80
[ 2169.729718] </IRQ>
[ 2169.729993] <TASK>
[ 2169.730259] asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 2169.730755] RIP: 0010:default_idle+0x13/0x20
[ 2169.731190] Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 9a 7f 1f 02 85 c0 7e 07 0f 00 2d cf 69 43 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 c0 93 04 00
[ 2169.732759] RSP: 0018:ffff888100dbfe10 EFLAGS: 00000242
[ 2169.733264] RAX: 0000000000000001 RBX: ffff888100d9c200 RCX: ffffffff8241bd62
[ 2169.733925] RDX: ffffed109a848b15 RSI: 0000000000000004 RDI: ffffffff8127ac55
[ 2169.734566] RBP: 0000000000000004 R08: 0000000000000000 R09: ffffed109a848b14
[ 2169.735200] R10: ffff8884d42458a3 R11: 000000000000ba7e R12: ffffffff83d7d3a0
[ 2169.735835] R13: 1ffff110201b7fc6 R14: 0000000000000000 R15: ffff888100d9c200
[ 2169.736478] ? ct_kernel_exit.constprop.0+0xa2/0xc0
[ 2169.736954] ? do_idle+0x285/0x290
[ 2169.737323] default_idle_call+0x63/0x90
[ 2169.737730] do_idle+0x285/0x290
[ 2169.738089] ? arch_cpu_idle_exit+0x30/0x30
[ 2169.738511] ? mark_held_locks+0x1a/0x80
[ 2169.738917] ? lockdep_hardirqs_on_prepare+0x12e/0x200
[ 2169.739417] cpu_startup_entry+0x30/0x40
[ 2169.739825] start_secondary+0x19a/0x1c0
[ 2169.740229] ? set_cpu_sibling_map+0xbd0/0xbd0
[ 2169.740673] secondary_startup_64_no_verify+0x15d/0x16b
[ 2169.741179] </TASK>
[ 2169.741686] Allocated by task 1098:
[ 2169.742058] kasan_save_stack+0x1c/0x40
[ 2169.742456] kasan_save_track+0x10/0x30
[ 2169.742852] __kasan_kmalloc+0x83/0x90
[ 2169.743246] mlx5_dpll_probe+0xf5/0x3c0 [mlx5_dpll]
[ 2169.743730] auxiliary_bus_probe+0x62/0xb0
[ 2169.744148] really_probe+0x127/0x590
[ 2169.744534] __driver_probe_device+0xd2/0x200
[ 2169.744973] device_driver_attach+0x6b/0xf0
[ 2169.745402] bind_store+0x90/0xe0
[ 2169.745761] kernfs_fop_write_iter+0x1df/0x2a0
[ 2169.746210] vfs_write+0x41f/0x790
[ 2169.746579] ksys_write+0xc7/0x160
[ 2169.746947] do_syscall_64+0x6f/0x140
[ 2169.747333] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 2169.748049] Freed by task 1220:
[ 2169.748393] kasan_save_stack+0x1c/0x40
[ 2169.748789] kasan_save_track+0x10/0x30
[ 2169.749188] kasan_save_free_info+0x3b/0x50
[ 2169.749621] poison_slab_object+0x106/0x180
[ 2169.750044] __kasan_slab_free+0x14/0x50
[ 2169.750451] kfree+0x118/0x330
[ 2169.750792] mlx5_dpll_remove+0xf5/0x110 [mlx5_dpll]
[ 2169.751271] auxiliary_bus_remove+0x2e/0x40
[ 2169.751694] device_release_driver_internal+0x24b/0x2e0
[ 2169.752191] unbind_store+0xa6/0xb0
[ 2169.752563] kernfs_fop_write_iter+0x1df/0x2a0
[ 2169.753004] vfs_write+0x41f/0x790
[ 2169.753381] ksys_write+0xc7/0x160
[ 2169.753750] do_syscall_64+0x6f/0x140
[ 2169.754132] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 2169.754847] Last potentially related work creation:
[ 2169.755315] kasan_save_stack+0x1c/0x40
[ 2169.755709] __kasan_record_aux_stack+0x9b/0xf0
[ 2169.756165] __queue_work+0x382/0x8f0
[ 2169.756552] call_timer_fn+0x126/0x320
[ 2169.756941] __run_timers.part.0+0x2ea/0x4c0
[ 2169.757376] run_timer_softirq+0x40/0x80
[ 2169.757782] __do_softirq+0x1a1/0x509
[ 2169.758387] Second to last potentially related work creation:
[ 2169.758924] kasan_save_stack+0x1c/0x40
[ 2169.759322] __kasan_record_aux_stack+0x9b/0xf0
[ 2169.759773] __queue_work+0x382/0x8f0
[ 2169.760156] call_timer_fn+0x126/0x320
[ 2169.760550] __run_timers.part.0+0x2ea/0x4c0
[ 2169.760978] run_timer_softirq+0x40/0x80
[ 2169.761381] __do_softirq+0x1a1/0x509
[ 2169.761998] The buggy address belongs to the object at ffff88812b326a00
which belongs to the cache kmalloc-256 of size 256
[ 2169.763061] The buggy address is located 112 bytes inside of
freed 256-byte region [ffff88812b326a00, ffff88812b326b00)
[ 2169.764346] The buggy address belongs to the physical page:
[ 2169.764866] page:000000000f2b1e89 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12b324
[ 2169.765731] head:000000000f2b1e89 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 2169.766484] anon flags: 0x200000000000840(slab|head|node=0|zone=2)
[ 2169.767048] page_type: 0xffffffff()
[ 2169.767422] raw: 0200000000000840 ffff888100042b40 0000000000000000 dead000000000001
[ 2169.768183] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[ 2169.768899] page dumped because: kasan: bad access detected
[ 2169.769649] Memory state around the buggy address:
[ 2169.770116] ffff88812b326900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2169.770805] ffff88812b326980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2169.771485] >ffff88812b326a00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2169.772173] ^
[ 2169.772787] ffff88812b326a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2169.773477] ffff88812b326b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2169.774160] ==================================================================
[ 2169.774845] ==================================================================
I didn't manage to reproduce it, though the issue seems to be obvious.
There is a chance that mlx5_dpll_remove() calls cancel_delayed_work()
while the work is running, and the work then manages to re-arm itself.
In that case, when the delay timer later fires and tries to queue the
work, it operates on freed memory.
Fix this by using cancel_delayed_work_sync() instead, which makes sure
that the work is done when it returns.
Fixes: 496fd0a26bbf ("mlx5: Implement SyncE support using DPLL infrastructure")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240206164328.360313-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2024-02-01
1) IPSec global stats for xfrm and mlx5
2) XSK memory improvements for non-linear SKBs
3) Software steering debug dump to use seq_file ops
4) Various code clean-ups
* tag 'mlx5-updates-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: XDP, Exclude headroom and tailroom from memory calculations
net/mlx5e: XSK, Exclude tailroom from non-linear SKBs memory calculations
net/mlx5: DR, Change SWS usage to debug fs seq_file interface
net/mlx5: Change missing SyncE capability print to debug
net/mlx5: Remove initial segmentation duplicate definitions
net/mlx5: Return specific error code for timeout on wait_fw_init
net/mlx5: SF, Stop waiting for FW as teardown was called
net/mlx5: remove fw reporter dump option for non PF
net/mlx5: remove fw_fatal reporter dump option for non PF
net/mlx5: Rename mlx5_sf_dev_remove
Documentation: Fix counter name of mlx5 vnic reporter
net/mlx5e: Delete obsolete IPsec code
net/mlx5e: Connect mlx5 IPsec statistics with XFRM core
xfrm: get global statistics from the offloaded device
xfrm: generalize xdo_dev_state_update_curlft to allow statistics update
====================
Link: https://lore.kernel.org/r/20240206005527.1353368-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Address spelling errors flagged by codespell.
This patch follows up on an earlier patch by Colin Ian King,
which addressed a spelling error in a user-visible log message [1].
This patch includes that change.
[1] https://lore.kernel.org/netdev/20231209225135.4055334-1-colin.i.king@gmail.com/
This patch is intended to cover all files under
drivers/net/ethernet/mellanox/mlx4
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20240205-mlx5-codespell-v1-1-63b86dffbb61@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the case of XDP Multi-Buffer with Striding RQ, an extra
page is allocated for the linear part of non-linear SKBs.
Including headroom and tailroom in the calculation may
result in an unnecessary increase in the amount of memory
allocated. This could be critical, particularly for large
MTUs (e.g. 7975B) and large RQ sizes (e.g. 8192).
In this case, the requested page pool size is 64K, but
32K would be sufficient. This causes a failure due to
exceeding the page pool size limit of 32K.
Exclude headroom and tailroom from SKB size calculations
to reduce page pool size.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Packet data buffers lack reserved headroom or tailroom,
and SKBs are allocated on a side memory when needed.
Exclude the tailroom from the SKB size calculations.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
In the current SWS debug dump mechanism we implement the seq_file interface,
but we only implement the 'show' callback, which dumps the whole steering DB
with a single call.
However, for large data sizes the seq_printf function will fail to
allocate a buffer with adequate capacity to hold such data.
This patch solves this problem by utilizing the seq_file interface
mechanism in the following way:
- when the user triggers a dump procedure, we will allocate a list of
buffers that hold the whole data dump (in the start callback)
- using the start, next, show and stop callbacks of the seq_file
API we iterate through the list and dump the whole data
Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
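
The callback flow described above can be modelled with a short Python sketch. It is a userspace analogy, not the kernel seq_file API: the class BufferListDump, the chunk_size parameter and the read_all() driver loop are hypothetical names introduced for illustration.

```python
class BufferListDump:
    """Holds a dump as a list of bounded buffers instead of one big one."""

    def __init__(self, records, chunk_size):
        # In the patch the buffers are allocated in the start callback;
        # here the records are simply pre-chunked for brevity.
        self.buffers = [
            "".join(records[i:i + chunk_size])
            for i in range(0, len(records), chunk_size)
        ]

    def start(self, pos):
        return pos if pos < len(self.buffers) else None

    def next(self, pos):
        pos += 1
        return pos if pos < len(self.buffers) else None

    def show(self, pos):
        return self.buffers[pos]

    def stop(self):
        pass


def read_all(ops):
    """Mimic how a seq_file-style reader drives the four callbacks."""
    out = []
    pos = ops.start(0)
    while pos is not None:
        out.append(ops.show(pos))  # each call emits one bounded buffer
        pos = ops.next(pos)
    ops.stop()
    return "".join(out)


dump = BufferListDump(["rule 0\n", "rule 1\n", "rule 2\n"], chunk_size=2)
print(read_all(dump), end="")
```

Each show call only ever handles one buffer, so no single allocation has to hold the entire dump.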
|
|
Lack of SyncE capability should not emit a warning, change the print to
debug level.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Device definitions belong in mlx5_ifc, remove the duplicates in
mlx5_core.h.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The function wait_fw_init() returns the same error code whether it stops
waiting due to a timeout or for another reason. Thus, its callers print
an error message about a timeout without checking the error type.
Return a different error code for each failure reason and print the error
message accordingly in wait_fw_init().
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
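
A minimal Python sketch of the idea follows. The specific error codes (-ENODEV for an aborted wait, -ETIMEDOUT for a timeout), the polling callables and the log list are assumptions for illustration; the commit message itself does not name them.

```python
import errno


def wait_fw_init(fw_ready, break_requested, max_polls, log):
    """Poll for firmware readiness, reporting why the wait ended.

    fw_ready, break_requested -- hypothetical callables returning bool
    max_polls                 -- stands in for the timeout
    log                       -- list collecting the printed messages
    """
    for _ in range(max_polls):
        if fw_ready():
            return 0
        if break_requested():
            log.append("wait for FW init aborted")
            return -errno.ENODEV
    log.append("timed out waiting for FW init")
    return -errno.ETIMEDOUT


messages = []
print(wait_fw_init(lambda: False, lambda: False, 3, messages), messages)
```

With the message emitted where the reason is known, callers no longer need to guess that every failure was a timeout.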
|
|
When PF/VF teardown is called, the driver sets the flag
MLX5_BREAK_FW_WAIT to stop waiting for FW loading and initializing. The
same should be applied to SF driver teardown to cut waiting time. In
mlx5_sf_dev_remove(), set the flag before draining the health WQ, as the
recovery flow may also wait for FW reloading while it is not relevant anymore.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
If a function is not a Physical Function, it is not allowed to get the FW
core dump, so an attempt to do so fails the fw health reporter dump option.
Instead of failing, remove the option of fw_fatal health reporter dump
for such a function.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
If a function is not a Physical Function, it is not allowed to collect
crdump, so an attempt to do so fails the fw_fatal health reporter dump
option. Instead of failing on permission, remove the option of fw_fatal
health reporter dump for such a function.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5 has two functions with the same name, mlx5_sf_dev_remove. Both are
static and in different files, so there is no compilation or logical issue,
but it makes the code hard to follow, and some traces can even show both,
as one leads to the other [1]. Rename one to mlx5_sf_dev_remove_aux() as it
actually removes the auxiliary device of the SF.
[1]
mlx5_sf_dev_remove+0x2a/0x70 [mlx5_core]
auxiliary_bus_remove+0x18/0x30
device_release_driver_internal+0x199/0x200
bus_remove_device+0xd7/0x140
device_del+0x153/0x3d0
? process_one_work+0x16a/0x4b0
mlx5_sf_dev_remove+0x2e/0x90 [mlx5_core]
mlx5_sf_dev_table_destroy+0xa0/0x100 [mlx5_core]
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
After the addition of HW-managed counters and the implementation of drop
in flow steering logic, the driver code that checks the syndrome
is no longer reachable.
Let's delete it.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Fill integrity, replay and bad trailer counters.
As an example, after simulating replay window attack with 5 packets:
[leonro@c ~]$ grep XfrmInStateSeqError /proc/net/xfrm_stat
XfrmInStateSeqError 5
[leonro@c ~]$ sudo ip -s x s
<...>
stats:
replay-window 0 replay 5 failed 0
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Iterate over all SAs in order to fill global IPsec statistics.
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
In order to allow drivers to fill all statistics, change the name
of xdo_dev_state_update_curlft to be xdo_dev_state_update_stats.
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Fill-up the lock status error value properly.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Pass an additional argument status_error over lock_status_get()
so drivers can fill it up. In case they do, expose the value over
previously introduced attribute to user. Do it only in case the
current lock_status is either "unlocked" or "holdover".
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Class-based I2C probing requires detect() and address_list to be
set in the I2C client driver, see checks in i2c_detect().
It's misleading to declare I2C_CLASS_HWMON support if this
precondition isn't met.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/77b5ab8e-20f2-4310-bd89-57db99e2f53b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlxsw driver uses 'unsigned int' for reference counters in several
structures. Instead, use refcount_t type which allows us to catch overflow
and underflow issues. Change the type of the counters and use the
appropriate API.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
mlxsw_sp stores an array of LAGs. When a port joins a LAG, in case that
this LAG is already in use, we only have to increase the reference counter.
Otherwise, we have to search for an unused LAG ID and configure it in
hardware. When a port leaves a LAG, we have to destroy it only for the last
user. This code can be simplified: for such requirements we usually add
get() and put() functions which create and destroy the object.
Add mlxsw_sp_lag_{get,put}() and use them. These functions take care of
the reference counter and hardware configuration if needed. Change the
reference counter to refcount_t type which catches overflow and underflow
issues.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
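
The get()/put() pattern described above can be sketched in Python as follows. This is a userspace model only: the LagTable class, its hw_log list (standing in for hardware configuration) and the explicit underflow check (standing in for what refcount_t catches) are hypothetical.

```python
class LagTable:
    """Array of LAG slots with create-on-first-get, destroy-on-last-put."""

    def __init__(self, max_lag):
        self.slots = [None] * max_lag
        self.hw_log = []  # records simulated hardware operations

    def get(self, lag_dev):
        # Reuse an existing LAG: only the reference count changes.
        for lag in self.slots:
            if lag is not None and lag["dev"] == lag_dev:
                lag["ref"] += 1
                return lag
        # First user: find an unused LAG ID and configure it.
        for lag_id, slot in enumerate(self.slots):
            if slot is None:
                self.hw_log.append(("create", lag_id))
                lag = {"id": lag_id, "dev": lag_dev, "ref": 1}
                self.slots[lag_id] = lag
                return lag
        raise RuntimeError("no free LAG ID")

    def put(self, lag):
        if lag["ref"] <= 0:
            raise RuntimeError("refcount underflow")
        lag["ref"] -= 1
        if lag["ref"] == 0:
            # Last user: tear the LAG down and release its ID.
            self.hw_log.append(("destroy", lag["id"]))
            self.slots[lag["id"]] = None


table = LagTable(max_lag=2)
lag = table.get("bond0")
table.get("bond0")
table.put(lag)
table.put(lag)
print(table.hw_log)
```

Callers never touch the counter or the hardware directly, which keeps the join and leave paths symmetric.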
|
|
Currently, the function mlxsw_sp_lag_index_get() is called twice - first
as part of NETDEV_PRECHANGEUPPER event and later as part of
NETDEV_CHANGEUPPER. This function will be changed in the next patch. To
simplify the code, call it only once as part of NETDEV_CHANGEUPPER
event and set an error message using 'extack' in case of failure.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The maximum number of LAGs is queried from core several times. It is
used to allocate LAG array, and then to iterate over it. In addition, it
is used for PGT initialization. To simplify the code, instead of
querying it several times, store the value as part of 'mlxsw_sp' and use
it.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
A next patch will add mlxsw_sp_lag_{get,put}() functions to handle LAG
reference counting and create/destroy it only for first user/last user.
Remove mlxsw_sp_lag_get() function and access LAG array directly.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The structure mlxsw_sp_upper is used only as LAG. Rename it to
mlxsw_sp_lag and move it to spectrum.c file, as it is used only there.
Move the function mlxsw_sp_lag_get() with the structure.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When kcalloc() for ft->g succeeds but kvzalloc() for in fails,
fs_any_create_groups() will free ft->g. However, its caller
fs_any_create_table() will free ft->g again through calling
mlx5e_destroy_flow_table(), which will lead to a double-free.
Fix this by setting ft->g to NULL in fs_any_create_groups().
Fixes: 0f575c20bf06 ("net/mlx5e: Introduce Flow Steering ANY API")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
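
The double-free and its fix can be modelled with a small Python sketch. It is an analogy, not the driver code: FakeHeap, create_groups() and create_table() are hypothetical stand-ins, and the fixed flag toggles the "set ft->g to NULL" change.

```python
import errno


class FakeHeap:
    """Tracks live allocations so a double free is detected."""

    def __init__(self):
        self.live = set()
        self.next_ptr = 1

    def alloc(self):
        ptr = self.next_ptr
        self.next_ptr += 1
        self.live.add(ptr)
        return ptr

    def kfree(self, ptr):
        if ptr is None:
            return  # like kfree(NULL), freeing nothing is a no-op
        if ptr not in self.live:
            raise RuntimeError("double free")
        self.live.remove(ptr)


def create_groups(heap, ft, fixed):
    ft["g"] = heap.alloc()
    # Simulate the second allocation (the "in" buffer) failing.
    heap.kfree(ft["g"])
    if fixed:
        ft["g"] = None  # the fix: do not leave a dangling pointer behind
    return -errno.ENOMEM


def create_table(heap, fixed):
    ft = {"g": None}
    err = create_groups(heap, ft, fixed)
    if err:
        heap.kfree(ft["g"])  # stands in for the caller's table teardown
    return err


print(create_table(FakeHeap(), fixed=True))
```

Clearing the pointer keeps the caller's generic cleanup path safe without changing its structure.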
|
|
When the kvzalloc() allocation of `in` fails, arfs_create_groups will free
ft->g and return an error. However, arfs_create_table, the only caller of
arfs_create_groups, will propagate this error and call
mlx5e_destroy_flow_table, in which ft->g will be freed again.
Fixes: 1cabe6b0965e ("net/mlx5e: Create aRFS flow tables")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The XFRM stack doesn't prevent users from configuring a replay window
on the TX side, and strongswan sets replay_window to 1. This causes
failures in the validation logic when trying to offload the SA.
The replay window is not relevant on the TX side and should be ignored.
Fixes: cded6d80129b ("net/mlx5e: Store replay window in XFRM attributes")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|