Age | Commit message (Collapse) | Author |
|
To support multiple users referencing the same fragment,
'pp_frag_count' is renamed to 'pp_ref_count', transitioning pp pages
from fragment management to reference count management after draining
based on the suggestion from [1].
The idea is that the concept of fragmenting exists before the page is
drained, and all related functions retain their current names.
However, once the page is drained, its management shifts to being
governed by 'pp_ref_count'. Therefore, all functions associated with
that lifecycle stage of a pp page are renamed.
[1]
http://lore.kernel.org/netdev/f71d9448-70c8-8793-dc9a-0eb48a570300@huawei.com
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://lore.kernel.org/r/20231212044614.42733-2-liangchen.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Swap is a function interface that provides exchange function. To avoid
code duplication, we can use swap function.
./drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1254:50-51: WARNING opportunity for swap().
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7580
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add a getter for the number of participants in a devcom
component (those who share the same component id and key).
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Make CQ struct and methods independent of "priv", use more basic
arguments instead.
This will ease the transition to netdev with multiple mdevs.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Turn some of the struct auxiliary_driver ops into wrappers to stop
having dummy local vars passed as unused arguments.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Function usage is limited to the monitor_stats.c file, do not expose it.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The transport interface send (TIS) object is responsible for performing
all transport related operations of the transmit side. Messages from
Send Queues get segmented and transmitted by the TIS including all
transport required implications, e.g. in the case of large send offload,
the TIS is responsible for the segmentation.
These are stateless objects and can be used by multiple netdevs (e.g.
representors) who share the same core device.
Providing the TISes as a service from the core layer to the netdev layer
reduces the number of replecated TIS objects (in case of multiple
netdevs), and will ease the transition to netdev with multiple mdevs.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
TLS TISes are created using their own dedicated functions,
don't honor their specific logic here.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Introduce an API to set/unset the TX flow table root for a device.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Introduce an API to set/unset the L2TABLE entry silent mode for a
device. If silent, no north/south traffic is allowed, the device won't
be able to communicate with the port directly to send/receive traffic by
its own.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
MPIR register allows to query the PCIe indexes
and Socket-Direct related parameters.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
by representors
snprintf returns the length of the formatted string, excluding the trailing
null, without accounting for truncation. This means that is the return
value is greater than or equal to the size parameter, the fw_version string
was truncated.
Link: https://docs.kernel.org/core-api/kernel-api.html#c.snprintf
Fixes: 1b2bd0c0264f ("net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
snprintf returns the length of the formatted string, excluding the trailing
null, without accounting for truncation. This means that is the return
value is greater than or equal to the size parameter, the fw_version string
was truncated.
Reported-by: David Laight <David.Laight@ACULAB.COM>
Closes: https://lore.kernel.org/netdev/81cae734ee1b4cde9b380a9a31006c1a@AcuMS.aculab.com/
Link: https://docs.kernel.org/core-api/kernel-api.html#c.snprintf
Fixes: 41e63c2baa11 ("net/mlx5e: Check return value of snprintf writing to fw_version buffer")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Set the error code if set_branch_dest_ft() fails.
Fixes: ccbe33003b10 ("net/mlx5e: TC, Don't offload post action rule if not supported")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Preserve the error code if esw_add_restore_rule() fails. Don't return
success.
Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently the destination rep pointer is only used for comparisons or to
obtain vport number from it. Since it is used both during flow creation and
deletion it may point to representor of another eswitch instance which can
be deallocated during driver unload even when there are rules pointing to
it[0]. Refactor the code to store vport number and 'valid' flag instead of
the representor pointer.
[0]:
[176805.886303] ==================================================================
[176805.889433] BUG: KASAN: slab-use-after-free in esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.892981] Read of size 2 at addr ffff888155090aa0 by task modprobe/27280
[176805.895462] CPU: 3 PID: 27280 Comm: modprobe Tainted: G B 6.6.0-rc3+ #1
[176805.896771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[176805.898514] Call Trace:
[176805.899026] <TASK>
[176805.899519] dump_stack_lvl+0x33/0x50
[176805.900221] print_report+0xc2/0x610
[176805.900893] ? mlx5_chains_put_table+0x33d/0x8d0 [mlx5_core]
[176805.901897] ? esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.902852] kasan_report+0xac/0xe0
[176805.903509] ? esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.904461] esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.905223] __mlx5_eswitch_del_rule+0x1ae/0x460 [mlx5_core]
[176805.906044] ? esw_cleanup_dests+0x440/0x440 [mlx5_core]
[176805.906822] ? xas_find_conflict+0x420/0x420
[176805.907496] ? down_read+0x11e/0x200
[176805.908046] mlx5e_tc_rule_unoffload+0xc4/0x2a0 [mlx5_core]
[176805.908844] mlx5e_tc_del_fdb_flow+0x7da/0xb10 [mlx5_core]
[176805.909597] mlx5e_flow_put+0x4b/0x80 [mlx5_core]
[176805.910275] mlx5e_delete_flower+0x5b4/0xb70 [mlx5_core]
[176805.911010] tc_setup_cb_reoffload+0x27/0xb0
[176805.911648] fl_reoffload+0x62d/0x900 [cls_flower]
[176805.912313] ? mlx5e_rep_indr_block_unbind+0xd0/0xd0 [mlx5_core]
[176805.913151] ? __fl_put+0x230/0x230 [cls_flower]
[176805.913768] ? filter_irq_stacks+0x90/0x90
[176805.914335] ? kasan_save_stack+0x1e/0x40
[176805.914893] ? kasan_set_track+0x21/0x30
[176805.915484] ? kasan_save_free_info+0x27/0x40
[176805.916105] tcf_block_playback_offloads+0x79/0x1f0
[176805.916773] ? mlx5e_rep_indr_block_unbind+0xd0/0xd0 [mlx5_core]
[176805.917647] tcf_block_unbind+0x12d/0x330
[176805.918239] tcf_block_offload_cmd.isra.0+0x24e/0x320
[176805.918953] ? tcf_block_bind+0x770/0x770
[176805.919551] ? _raw_read_unlock_irqrestore+0x30/0x30
[176805.920236] ? mutex_lock+0x7d/0xd0
[176805.920735] ? mutex_unlock+0x80/0xd0
[176805.921255] tcf_block_offload_unbind+0xa5/0x120
[176805.921909] __tcf_block_put+0xc2/0x2d0
[176805.922467] ingress_destroy+0xf4/0x3d0 [sch_ingress]
[176805.923178] __qdisc_destroy+0x9d/0x280
[176805.923741] dev_shutdown+0x1c6/0x330
[176805.924295] unregister_netdevice_many_notify+0x6ef/0x1500
[176805.925034] ? netdev_freemem+0x50/0x50
[176805.925610] ? _raw_spin_lock_irq+0x7b/0xd0
[176805.926235] ? _raw_spin_lock_bh+0xe0/0xe0
[176805.926849] unregister_netdevice_queue+0x1e0/0x280
[176805.927592] ? unregister_netdevice_many+0x10/0x10
[176805.928275] unregister_netdev+0x18/0x20
[176805.928835] mlx5e_vport_rep_unload+0xc0/0x200 [mlx5_core]
[176805.929608] mlx5_esw_offloads_unload_rep+0x9d/0xc0 [mlx5_core]
[176805.930492] mlx5_eswitch_unload_vf_vports+0x108/0x1a0 [mlx5_core]
[176805.931422] ? mlx5_eswitch_unload_sf_vport+0x50/0x50 [mlx5_core]
[176805.932304] ? rwsem_down_write_slowpath+0x11f0/0x11f0
[176805.932987] mlx5_eswitch_disable_sriov+0x6f9/0xa60 [mlx5_core]
[176805.933807] ? mlx5_core_disable_hca+0xe1/0x130 [mlx5_core]
[176805.934576] ? mlx5_eswitch_disable_locked+0x580/0x580 [mlx5_core]
[176805.935463] mlx5_device_disable_sriov+0x138/0x490 [mlx5_core]
[176805.936308] mlx5_sriov_disable+0x8c/0xb0 [mlx5_core]
[176805.937063] remove_one+0x7f/0x210 [mlx5_core]
[176805.937711] pci_device_remove+0x96/0x1c0
[176805.938289] device_release_driver_internal+0x361/0x520
[176805.938981] ? kobject_put+0x5c/0x330
[176805.939553] driver_detach+0xd7/0x1d0
[176805.940101] bus_remove_driver+0x11f/0x290
[176805.943847] pci_unregister_driver+0x23/0x1f0
[176805.944505] mlx5_cleanup+0xc/0x20 [mlx5_core]
[176805.945189] __x64_sys_delete_module+0x2b3/0x450
[176805.945837] ? module_flags+0x300/0x300
[176805.946377] ? dput+0xc2/0x830
[176805.946848] ? __kasan_record_aux_stack+0x9c/0xb0
[176805.947555] ? __call_rcu_common.constprop.0+0x46c/0xb50
[176805.948338] ? fpregs_assert_state_consistent+0x1d/0xa0
[176805.949055] ? exit_to_user_mode_prepare+0x30/0x120
[176805.949713] do_syscall_64+0x3d/0x90
[176805.950226] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[176805.950904] RIP: 0033:0x7f7f42c3f5ab
[176805.951462] Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48
[176805.953710] RSP: 002b:00007fff07dc9d08 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[176805.954691] RAX: ffffffffffffffda RBX: 000055b6e91c01e0 RCX: 00007f7f42c3f5ab
[176805.955691] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055b6e91c0248
[176805.956662] RBP: 000055b6e91c01e0 R08: 0000000000000000 R09: 0000000000000000
[176805.957601] R10: 00007f7f42d9eac0 R11: 0000000000000206 R12: 000055b6e91c0248
[176805.958593] R13: 0000000000000000 R14: 000055b6e91bfb38 R15: 0000000000000000
[176805.959599] </TASK>
[176805.960324] Allocated by task 20490:
[176805.960893] kasan_save_stack+0x1e/0x40
[176805.961463] kasan_set_track+0x21/0x30
[176805.962019] __kasan_kmalloc+0x77/0x90
[176805.962554] esw_offloads_init+0x1bb/0x480 [mlx5_core]
[176805.963318] mlx5_eswitch_init+0xc70/0x15c0 [mlx5_core]
[176805.964092] mlx5_init_one_devl_locked+0x366/0x1230 [mlx5_core]
[176805.964902] probe_one+0x6f7/0xc90 [mlx5_core]
[176805.965541] local_pci_probe+0xd7/0x180
[176805.966075] pci_device_probe+0x231/0x6f0
[176805.966631] really_probe+0x1d4/0xb50
[176805.967179] __driver_probe_device+0x18d/0x450
[176805.967810] driver_probe_device+0x49/0x120
[176805.968431] __driver_attach+0x1fb/0x490
[176805.968976] bus_for_each_dev+0xed/0x170
[176805.969560] bus_add_driver+0x21a/0x570
[176805.970124] driver_register+0x133/0x460
[176805.970684] 0xffffffffa0678065
[176805.971180] do_one_initcall+0x92/0x2b0
[176805.971744] do_init_module+0x22d/0x720
[176805.972318] load_module+0x58c3/0x63b0
[176805.972847] init_module_from_file+0xd2/0x130
[176805.973441] __x64_sys_finit_module+0x389/0x7c0
[176805.974045] do_syscall_64+0x3d/0x90
[176805.974556] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[176805.975566] Freed by task 27280:
[176805.976077] kasan_save_stack+0x1e/0x40
[176805.976655] kasan_set_track+0x21/0x30
[176805.977221] kasan_save_free_info+0x27/0x40
[176805.977834] ____kasan_slab_free+0x11a/0x1b0
[176805.978505] __kmem_cache_free+0x163/0x2d0
[176805.979113] esw_offloads_cleanup_reps+0xb8/0x120 [mlx5_core]
[176805.979963] mlx5_eswitch_cleanup+0x182/0x270 [mlx5_core]
[176805.980763] mlx5_cleanup_once+0x9a/0x1e0 [mlx5_core]
[176805.981477] mlx5_uninit_one+0xa9/0x180 [mlx5_core]
[176805.982196] remove_one+0x8f/0x210 [mlx5_core]
[176805.982868] pci_device_remove+0x96/0x1c0
[176805.983461] device_release_driver_internal+0x361/0x520
[176805.984169] driver_detach+0xd7/0x1d0
[176805.984702] bus_remove_driver+0x11f/0x290
[176805.985261] pci_unregister_driver+0x23/0x1f0
[176805.985847] mlx5_cleanup+0xc/0x20 [mlx5_core]
[176805.986483] __x64_sys_delete_module+0x2b3/0x450
[176805.987126] do_syscall_64+0x3d/0x90
[176805.987665] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[176805.988667] Last potentially related work creation:
[176805.989305] kasan_save_stack+0x1e/0x40
[176805.989839] __kasan_record_aux_stack+0x9c/0xb0
[176805.990443] kvfree_call_rcu+0x84/0xa30
[176805.990973] clean_xps_maps+0x265/0x6e0
[176805.991547] netif_reset_xps_queues.part.0+0x3f/0x80
[176805.992226] unregister_netdevice_many_notify+0xfcf/0x1500
[176805.992966] unregister_netdevice_queue+0x1e0/0x280
[176805.993638] unregister_netdev+0x18/0x20
[176805.994205] mlx5e_remove+0xba/0x1e0 [mlx5_core]
[176805.994872] auxiliary_bus_remove+0x52/0x70
[176805.995490] device_release_driver_internal+0x361/0x520
[176805.996196] bus_remove_device+0x1e1/0x3d0
[176805.996767] device_del+0x390/0x980
[176805.997270] mlx5_rescan_drivers_locked.part.0+0x130/0x540 [mlx5_core]
[176805.998195] mlx5_unregister_device+0x77/0xc0 [mlx5_core]
[176805.998989] mlx5_uninit_one+0x41/0x180 [mlx5_core]
[176805.999719] remove_one+0x8f/0x210 [mlx5_core]
[176806.000387] pci_device_remove+0x96/0x1c0
[176806.000938] device_release_driver_internal+0x361/0x520
[176806.001612] unbind_store+0xd8/0xf0
[176806.002108] kernfs_fop_write_iter+0x2c0/0x440
[176806.002748] vfs_write+0x725/0xba0
[176806.003294] ksys_write+0xed/0x1c0
[176806.003823] do_syscall_64+0x3d/0x90
[176806.004357] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[176806.005317] The buggy address belongs to the object at ffff888155090a80
which belongs to the cache kmalloc-64 of size 64
[176806.006774] The buggy address is located 32 bytes inside of
freed 64-byte region [ffff888155090a80, ffff888155090ac0)
[176806.008773] The buggy address belongs to the physical page:
[176806.009480] page:00000000a407e0e6 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x155090
[176806.010633] flags: 0x200000000000800(slab|node=0|zone=2)
[176806.011352] page_type: 0xffffffff()
[176806.011905] raw: 0200000000000800 ffff888100042640 ffffea000422b1c0 dead000000000004
[176806.012949] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[176806.013933] page dumped because: kasan: bad access detected
[176806.014935] Memory state around the buggy address:
[176806.015601] ffff888155090980: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.016568] ffff888155090a00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.017497] >ffff888155090a80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.018438] ^
[176806.019007] ffff888155090b00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.020001] ffff888155090b80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.020996] ==================================================================
Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
While handling new traces, to verify it is not the first block being
written, last_timestamp is checked. But instead of checking it is non
zero it is verified to be zero. Fix to verify last_timestamp is not
zero.
Fixes: c71ad41ccb0c ("net/mlx5: FW tracer, events handling")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Feras Daoud <ferasda@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
XDP transmits fragmented packets that are larger than MTU size instead of
dropping those packets. The drop check that checks whether a packet is larger
than MTU is comparing MTU size against the linear part length only.
Adjust the drop check to compare MTU size against both linear and non-linear
part lengths to avoid transmitting fragmented packets larger than MTU size.
Fixes: 39a1665d16a2 ("net/mlx5e: Implement sending multi buffer XDP frames")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commit increases num_block_tc when unblock tc offload.
Actually should decrease it.
Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Coverity Scan reports the following issue. But it's impossible that
mlx5_get_dev_index returns 7 for PF, even if the index is calculated
from PCI FUNC ID. So add the checking to make coverity slience.
CID 610894 (#2 of 2): Out-of-bounds write (OVERRUN)
Overrunning array esw->fdb_table.offloads.peer_miss_rules of 4 8-byte
elements at element index 7 (byte offset 63) using index
mlx5_get_dev_index(peer_dev) (which evaluates to 7).
Fixes: 9bee385a6e39 ("net/mlx5: E-switch, refactor FDB miss rule add/remove")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When kcalloc() for ft->g succeeds but kvzalloc() for in fails,
fs_udp_create_groups() will free ft->g. However, its caller
fs_udp_create_table() will free ft->g again through calling
mlx5e_destroy_flow_table(), which will lead to a double-free.
Fix this by setting ft->g to NULL in fs_udp_create_groups().
Fixes: 1c80bd684388 ("net/mlx5e: Introduce Flow Steering UDP API")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Fix a cmd->ent use after free due to a race on command entry.
Such race occurs when one of the commands releases its last refcount and
frees its index and entry while another process running command flush
flow takes refcount to this command entry. The process which handles
commands flush may see this command as needed to be flushed if the other
process allocated a ent->idx but didn't set ent to cmd->ent_arr in
cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
the spin lock.
[70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
[70013.081968]
[70013.082028] Workqueue: events aer_isr
[70013.082053] Call Trace:
[70013.082067] dump_stack+0x8b/0xbb
[70013.082086] print_address_description+0x6a/0x270
[70013.082102] kasan_report+0x179/0x2c0
[70013.082173] mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.082267] mlx5_cmd_flush+0x80/0x180 [mlx5_core]
[70013.082304] mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
[70013.082338] mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
[70013.082377] remove_one+0x200/0x2b0 [mlx5_core]
[70013.082409] pci_device_remove+0xf3/0x280
[70013.082439] device_release_driver_internal+0x1c3/0x470
[70013.082453] pci_stop_bus_device+0x109/0x160
[70013.082468] pci_stop_and_remove_bus_device+0xe/0x20
[70013.082485] pcie_do_fatal_recovery+0x167/0x550
[70013.082493] aer_isr+0x7d2/0x960
[70013.082543] process_one_work+0x65f/0x12d0
[70013.082556] worker_thread+0x87/0xb50
[70013.082571] kthread+0x2e9/0x3a0
[70013.082592] ret_from_fork+0x1f/0x40
The logical relationship of this error is as follows:
aer_recover_work | ent->work
-------------------------------------------+------------------------------
aer_recover_work_func |
|- pcie_do_recovery |
|- report_error_detected |
|- mlx5_pci_err_detected |cmd_work_handler
|- mlx5_enter_error_state | |- cmd_alloc_index
|- enter_error_state | |- lock cmd->alloc_lock
|- mlx5_cmd_flush | |- clear_bit
|- mlx5_cmd_trigger_completions| |- unlock cmd->alloc_lock
|- lock cmd->alloc_lock |
|- vector = ~dev->cmd.vars.bitmask
|- for_each_set_bit |
|- cmd_ent_get(cmd->ent_arr[i]) (UAF)
|- unlock cmd->alloc_lock | |- cmd->ent_arr[ent->idx]=ent
The cmd->ent_arr[ent->idx] assignment and the bit clearing are not
protected by the cmd->alloc_lock in cmd_work_handler().
Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Out_sz that the size of out buffer is calculated using query_nic_vport
_context_in structure when driver query the MAC list. However query_nic
_vport_context_in structure is smaller than query_nic_vport_context_out.
When allowed_list_size is greater than 96, calling ether_addr_copy() will
trigger an slab-out-of-bounds.
[ 1170.055866] BUG: KASAN: slab-out-of-bounds in mlx5_query_nic_vport_mac_list+0x481/0x4d0 [mlx5_core]
[ 1170.055869] Read of size 4 at addr ffff88bdbc57d912 by task kworker/u128:1/461
[ 1170.055870]
[ 1170.055932] Workqueue: mlx5_esw_wq esw_vport_change_handler [mlx5_core]
[ 1170.055936] Call Trace:
[ 1170.055949] dump_stack+0x8b/0xbb
[ 1170.055958] print_address_description+0x6a/0x270
[ 1170.055961] kasan_report+0x179/0x2c0
[ 1170.056061] mlx5_query_nic_vport_mac_list+0x481/0x4d0 [mlx5_core]
[ 1170.056162] esw_update_vport_addr_list+0x2c5/0xcd0 [mlx5_core]
[ 1170.056257] esw_vport_change_handle_locked+0xd08/0x1a20 [mlx5_core]
[ 1170.056377] esw_vport_change_handler+0x6b/0x90 [mlx5_core]
[ 1170.056381] process_one_work+0x65f/0x12d0
[ 1170.056383] worker_thread+0x87/0xb50
[ 1170.056390] kthread+0x2e9/0x3a0
[ 1170.056394] ret_from_fork+0x1f/0x40
Fixes: e16aea2744ab ("net/mlx5: Introduce access functions to modify/query vport mac lists")
Cc: Ding Hui <dinghui@sangfor.com.cn>
Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Cited commit introduced potential double free since encap_header can be
destroyed twice in some cases - once by error cleanup sequence in
mlx5e_tc_tun_{create|update}_header_ipv{4|6}(), once by generic
mlx5e_encap_put() that user calls as a result of getting an error from
tunnel create|update. At the same time the point where e->encap_header is
assigned can't be delayed because the function can still return non-error
code 0 as a result of checking for NUD_VALID flag, which will cause
neighbor update to dereference NULL encap_header.
Fix the issue by:
- Nulling local encap_header variables in
mlx5e_tc_tun_{create|update}_header_ipv{4|6}() to make kfree(encap_header)
call in error cleanup sequence noop after that point.
- Assigning reformat_params.data from e->encap_header instead of local
variable encap_header that was set to NULL pointer by previous step. Also
assign reformat_params.size from e->encap_size for uniformity and in order
to make the code less error-prone in the future.
Fixes: d589e785baf5 ("net/mlx5e: Allow concurrent creation of encap entries")
Reported-by: Dust Li <dust.li@linux.alibaba.com>
Reported-by: Cruz Zhao <cruzzhao@linux.alibaba.com>
Reported-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
This reverts commit 6f9b1a0731662648949a1c0587f6acb3b7f8acf1.
This patch is causing a null ptr issue, the proper fix is in the next
patch.
Fixes: 6f9b1a073166 ("net/mlx5e: fix double free of encap_header")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
This reverts commit 3a4aa3cb83563df942be49d145ee3b7ddf17d6bb.
This patch is causing a null ptr issue, the proper fix is in the next
patch.
Fixes: 3a4aa3cb8356 ("net/mlx5e: fix double free of encap_header in update funcs")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Implement the newly added .xmo_rx_vlan_tag() hint function.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20231205210847.28460-15-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Mode supported is currently reported to the user exactly the same, as
the current mode. That's because mode changing is not implemented.
Remove the leftover mode_supported() op and use mode_get() to fill up
the supported mode exposed to user.
One, if even, mode changing is going to be introduced, this could be
very easily taken back. In the meantime, prevent drivers form
implementing this in wrong way (as for example recent netdevsim
implementation attempt intended to do).
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Expose the ability the query the eswitch manager vport number.
Next patch will utilize this capability to reveal the correct
register C0 value to the users.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/614fb0e216250e2ce3340471ec141b83ec45c7f4.1701871118.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Support allocate/deallocate the new SW encap ICM type memory.
The new ICM type is used for encap context allocation managed by SW,
instead FW. It can increase encap context maximum number and allocation
speed
Signed-off-by: Shun Hao <shunh@nvidia.com>
Link: https://lore.kernel.org/r/bed5121255918eb132a1334141c76a0594df8143.1701871118.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The mlx5_esw_offloads_devlink_port() function returns error pointers, not
NULL.
Fixes: 7bef147a6ab6 ("net/mlx5: Don't skip vport check")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Previously, when comparing the net namespaces, the case where the netdev
doesn't exist wasn't taken into account, and therefore can cause a crash.
In such a case, the comparing function should return false, as there is no
netdev->net to compare the devlink->net to.
Furthermore, this will result in an attempt to enter switchdev mode
without a netdev to fail, and which is the desired result as there is no
meaning in switchdev mode without a net device.
Fixes: 662404b24a4c ("net/mlx5e: Block entering switchdev mode with ns inconsistency")
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Current sync reset flow is not supported when PCIe bridge connected
directly to mlx5 device has HotPlug interrupt enabled and can be
triggered on link state change event. Return nack on reset request in
such case.
Fixes: 92501fa6e421 ("net/mlx5: Ack on sync_reset_request only if PF can do reset_now")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
If post action is not supported, eg. ignore_flow_level is not
supported, don't offload post action rule. Otherwise, will hit
panic [1].
Fix it by checking if post action table is valid or not.
[1]
[445537.863880] BUG: unable to handle page fault for address: ffffffffffffffb1
[445537.864617] #PF: supervisor read access in kernel mode
[445537.865244] #PF: error_code(0x0000) - not-present page
[445537.865860] PGD 70683a067 P4D 70683a067 PUD 70683c067 PMD 0
[445537.866497] Oops: 0000 [#1] PREEMPT SMP NOPTI
[445537.867077] CPU: 19 PID: 248742 Comm: tc Kdump: loaded Tainted: G O 6.5.0+ #1
[445537.867888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[445537.868834] RIP: 0010:mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core]
[445537.869635] Code: c0 0d 00 00 e8 20 96 c6 d3 48 85 c0 0f 84 e5 00 00 00 c7 83 b0 01 00 00 00 00 00 00 49 89 c5 31 c0 31 d2 66 89 83 b4 01 00 00 <49> 8b 44 24 10 83 23 df 83 8b d8 01 00 00 04 48 89 83 c0 01 00 00
[445537.871318] RSP: 0018:ffffb98741cef428 EFLAGS: 00010246
[445537.871962] RAX: 0000000000000000 RBX: ffff8df341167000 RCX: 0000000000000001
[445537.872704] RDX: 0000000000000000 RSI: ffffffff954844e1 RDI: ffffffff9546e9cb
[445537.873430] RBP: ffffb98741cef448 R08: 0000000000000020 R09: 0000000000000246
[445537.874160] R10: 0000000000000000 R11: ffffffff943f73ff R12: ffffffffffffffa1
[445537.874893] R13: ffff8df36d336c20 R14: ffffffffffffffa1 R15: ffff8df341167000
[445537.875628] FS: 00007fcd6564f800(0000) GS:ffff8dfa9ea00000(0000) knlGS:0000000000000000
[445537.876425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[445537.877090] CR2: ffffffffffffffb1 CR3: 00000003b5884001 CR4: 0000000000770ee0
[445537.877832] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[445537.878564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[445537.879300] PKRU: 55555554
[445537.879797] Call Trace:
[445537.880263] <TASK>
[445537.880713] ? show_regs+0x6e/0x80
[445537.881232] ? __die+0x29/0x70
[445537.881731] ? page_fault_oops+0x85/0x160
[445537.882276] ? search_exception_tables+0x65/0x70
[445537.882852] ? kernelmode_fixup_or_oops+0xa2/0x120
[445537.883432] ? __bad_area_nosemaphore+0x18b/0x250
[445537.884019] ? bad_area_nosemaphore+0x16/0x20
[445537.884566] ? do_kern_addr_fault+0x8b/0xa0
[445537.885105] ? exc_page_fault+0xf5/0x1c0
[445537.885623] ? asm_exc_page_fault+0x2b/0x30
[445537.886149] ? __kmem_cache_alloc_node+0x1df/0x2a0
[445537.886717] ? mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core]
[445537.887431] ? mlx5e_tc_post_act_add+0x30/0x130 [mlx5_core]
[445537.888172] alloc_flow_post_acts+0xfb/0x1c0 [mlx5_core]
[445537.888849] parse_tc_actions+0x582/0x5c0 [mlx5_core]
[445537.889505] parse_tc_fdb_actions+0xd7/0x1f0 [mlx5_core]
[445537.890175] __mlx5e_add_fdb_flow+0x1ab/0x2b0 [mlx5_core]
[445537.890843] mlx5e_add_fdb_flow+0x56/0x120 [mlx5_core]
[445537.891491] ? debug_smp_processor_id+0x1b/0x30
[445537.892037] mlx5e_tc_add_flow+0x79/0x90 [mlx5_core]
[445537.892676] mlx5e_configure_flower+0x305/0x450 [mlx5_core]
[445537.893341] mlx5e_rep_setup_tc_cls_flower+0x3d/0x80 [mlx5_core]
[445537.894037] mlx5e_rep_setup_tc_cb+0x5c/0xa0 [mlx5_core]
[445537.894693] tc_setup_cb_add+0xdc/0x220
[445537.895177] fl_hw_replace_filter+0x15f/0x220 [cls_flower]
[445537.895767] fl_change+0xe87/0x1190 [cls_flower]
[445537.896302] tc_new_tfilter+0x484/0xa50
Fixes: f0da4daa3413 ("net/mlx5e: Refactor ct to use post action infrastructure")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Automatic Verification <verifier@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shachar Kagan <skagan@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
|
|
Due to the cited patch, devlink health commands take devlink lock and
this may result in deadlock for mlx5e_tx_reporter as it takes local
state_lock before calling devlink health report and on the other hand
devlink health commands such as diagnose for same reporter take local
state_lock after taking devlink lock (see kernel log below).
To fix it, remove local state_lock from mlx5e_tx_timeout_work() before
calling devlink_health_report() and take care to cancel the work before
any call to close channels, which may free the SQs that should be
handled by the work. Before cancel_work_sync(), use current_work() to
check we are not calling it from within the work, as
mlx5e_tx_timeout_work() itself may close the channels and reopen as part
of recovery flow.
While removing state_lock from mlx5e_tx_timeout_work() keep rtnl_lock to
ensure no change in netdev->real_num_tx_queues, but use rtnl_trylock()
and a flag to avoid deadlock by calling cancel_work_sync() before
closing the channels while holding rtnl_lock too.
Kernel log:
======================================================
WARNING: possible circular locking dependency detected
6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Not tainted
------------------------------------------------------
kworker/u16:2/65 is trying to acquire lock:
ffff888122f6c2f8 (&devlink->lock_key#2){+.+.}-{3:3}, at: devlink_health_report+0x2f1/0x7e0
but task is already holding lock:
ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&priv->state_lock){+.+.}-{3:3}:
__mutex_lock+0x12c/0x14b0
mlx5e_rx_reporter_diagnose+0x71/0x700 [mlx5_core]
devlink_nl_cmd_health_reporter_diagnose_doit+0x212/0xa50
genl_family_rcv_msg_doit+0x1e9/0x2f0
genl_rcv_msg+0x2e9/0x530
netlink_rcv_skb+0x11d/0x340
genl_rcv+0x24/0x40
netlink_unicast+0x438/0x710
netlink_sendmsg+0x788/0xc40
sock_sendmsg+0xb0/0xe0
__sys_sendto+0x1c1/0x290
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
-> #0 (&devlink->lock_key#2){+.+.}-{3:3}:
__lock_acquire+0x2c8a/0x6200
lock_acquire+0x1c1/0x550
__mutex_lock+0x12c/0x14b0
devlink_health_report+0x2f1/0x7e0
mlx5e_health_report+0xc9/0xd7 [mlx5_core]
mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core]
mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core]
process_one_work+0x7c2/0x1340
worker_thread+0x59d/0xec0
kthread+0x28f/0x330
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&priv->state_lock);
lock(&devlink->lock_key#2);
lock(&priv->state_lock);
lock(&devlink->lock_key#2);
*** DEADLOCK ***
4 locks held by kworker/u16:2/65:
#0: ffff88811a55b138 ((wq_completion)mlx5e#2){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
#1: ffff888101de7db8 ((work_completion)(&priv->tx_timeout_work)){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
#2: ffffffff84ce8328 (rtnl_mutex){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x53/0x280 [mlx5_core]
#3: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core]
stack backtrace:
CPU: 1 PID: 65 Comm: kworker/u16:2 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x7d
check_noncircular+0x278/0x300
? print_circular_bug+0x460/0x460
? find_held_lock+0x2d/0x110
? __stack_depot_save+0x24c/0x520
? alloc_chain_hlocks+0x228/0x700
__lock_acquire+0x2c8a/0x6200
? register_lock_class+0x1860/0x1860
? kasan_save_stack+0x1e/0x40
? kasan_set_free_info+0x20/0x30
? ____kasan_slab_free+0x11d/0x1b0
? kfree+0x1ba/0x520
? devlink_health_do_dump.part.0+0x171/0x3a0
? devlink_health_report+0x3d5/0x7e0
lock_acquire+0x1c1/0x550
? devlink_health_report+0x2f1/0x7e0
? lockdep_hardirqs_on_prepare+0x400/0x400
? find_held_lock+0x2d/0x110
__mutex_lock+0x12c/0x14b0
? devlink_health_report+0x2f1/0x7e0
? devlink_health_report+0x2f1/0x7e0
? mutex_lock_io_nested+0x1320/0x1320
? trace_hardirqs_on+0x2d/0x100
? bit_wait_io_timeout+0x170/0x170
? devlink_health_do_dump.part.0+0x171/0x3a0
? kfree+0x1ba/0x520
? devlink_health_do_dump.part.0+0x171/0x3a0
devlink_health_report+0x2f1/0x7e0
mlx5e_health_report+0xc9/0xd7 [mlx5_core]
mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core]
? lockdep_hardirqs_on_prepare+0x400/0x400
? mlx5e_reporter_tx_err_cqe+0x1b0/0x1b0 [mlx5_core]
? mlx5e_tx_reporter_timeout_dump+0x70/0x70 [mlx5_core]
? mlx5e_tx_reporter_dump_sq+0x320/0x320 [mlx5_core]
? mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core]
? mutex_lock_io_nested+0x1320/0x1320
? process_one_work+0x70f/0x1340
? lockdep_hardirqs_on_prepare+0x400/0x400
? lock_downgrade+0x6e0/0x6e0
mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core]
process_one_work+0x7c2/0x1340
? lockdep_hardirqs_on_prepare+0x400/0x400
? pwq_dec_nr_in_flight+0x230/0x230
? rwlock_bug.part.0+0x90/0x90
worker_thread+0x59d/0xec0
? process_one_work+0x1340/0x1340
kthread+0x28f/0x330
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Fixes: c90005b5f75c ("devlink: Hold the instance lock in health callbacks")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
IPsec FDB offload can only work with FW steering as of now,
disable the cap upon non FW steering.
And since the IPSec cap is dynamic now based on steering mode.
Cleanup the resources if they exist instead of checking the
IPsec cap again.
Fixes: edd8b295f9e2 ("Merge branch 'mlx5-ipsec-packet-offload-support-in-eswitch-mode'")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
After IPSec TX tables are destroyed, the flow rules in TC rhashtable,
which have the destination to IPSec, are restored to the original
one, the uplink.
However, when the device is in switchdev mode and unload driver with
IPSec rules configured, TC rhashtable cleanup is done before IPSec
cleanup, which means tc_ht->tbl is already freed when walking TC
rhashtable, in order to restore the destination. So add the checking
before walking to avoid unexpected behavior.
Fixes: d1569537a837 ("net/mlx5e: Modify and restore TC rules for IPSec TX rules")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Currently eswitch mode_lock is so heavy, for example, it's locked
during the whole process of the mode change, which may need to hold
other locks. As the mode_lock is also used by IPSec to block mode and
encap change now, it is easy to cause lock dependency.
Since some of protections are also done by devlink lock, the eswitch
mode_lock is not needed at those places, and thus the possibility of
lockdep issue is reduced.
Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
IPsec NAT-T packets are UDP encapsulated packets over ESP normal ones.
In case they arrive to RX, the SPI and ESP are located in inner header,
while the check was performed on outer header instead.
That wrong check caused to the situation where received rekeying request
was missed and caused to rekey timeout, which "compensated" this failure
by completing rekeying.
Fixes: d65954934937 ("net/mlx5e: Support IPsec NAT-T functionality")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
After IPsec decryption it isn't enough to only check the IPsec syndrome
but need to also check the ASO syndrome in order to verify that the
operation was actually successful.
Verify that both syndromes are actually zero and in case not drop the
packet and increment the appropriate flow counter for the drop reason.
Fixes: 6b5c45e16e43 ("net/mlx5e: Configure IPsec packet offload flow steering")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
After previous commit, which unified various IPsec creation modes,
there is no need to have struct mlx5e_ipsec_rx exposed in global
IPsec header. Move it to ipsec_fs.c to be placed together with
already existing struct mlx5e_ipsec_tx.
Fixes: 1762f132d542 ("net/mlx5e: Support IPsec packet offload for RX in switchdev mode")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Change normal IPsec flow to use the same creation/destruction functions
for status flow table as that of ESW, which first of all refines the
code to have less code duplication.
And more importantly, the ESW status table handles IPsec syndrome
checks at steering by HW, which is more efficient than the previous
behaviour we had where it was copied to WQE meta data and checked
by the driver.
Fixes: 1762f132d542 ("net/mlx5e: Support IPsec packet offload for RX in switchdev mode")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
According to RFC4303, section "3.3.3. Sequence Number Generation",
the first packet sent using a given SA will contain a sequence
number of 1.
However if user didn't set seq/oseq, the HW used zero as first sequence
packet number. Such misconfiguration causes to drop of first packet
if replay window protection was enabled in SA.
To fix it, set sequence number to be at least 1.
Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Users can configure IPsec replay window size, but mlx5 driver didn't
honor their choice and set always 32bits. Fix assignment logic to
configure right size from the beginning.
Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers")
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-11-30
We've added 30 non-merge commits during the last 7 day(s) which contain
a total of 58 files changed, 1598 insertions(+), 154 deletions(-).
The main changes are:
1) Add initial TX metadata implementation for AF_XDP with support in mlx5
and stmmac drivers. Two types of offloads are supported right now, that
is, TX timestamp and TX checksum offload, from Stanislav Fomichev with
stmmac implementation from Song Yoong Siang.
2) Change BPF verifier logic to validate global subprograms lazily instead
of unconditionally before the main program, so they can be guarded using
BPF CO-RE techniques, from Andrii Nakryiko.
3) Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter, from Jiri Olsa.
4) Use pkg-config in BPF selftests to determine ld flags which is
in particular needed for linking statically, from Akihiko Odaki.
5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits)
bpf/tests: Remove duplicate JSGT tests
selftests/bpf: Add TX side to xdp_hw_metadata
selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP
selftests/bpf: Add TX side to xdp_metadata
selftests/bpf: Add csum helpers
selftests/xsk: Support tx_metadata_len
xsk: Add option to calculate TX checksum in SW
xsk: Validate xsk_tx_metadata flags
xsk: Document tx_metadata_len layout
net: stmmac: Add Tx HWTS support to XDP ZC
net/mlx5e: Implement AF_XDP TX timestamp and checksum offload
tools: ynl: Print xsk-features from the sample
xsk: Add TX timestamp and TX checksum offload support
xsk: Support tx_metadata_len
selftests/bpf: Use pkg-config for libelf
selftests/bpf: Override PKG_CONFIG for static builds
selftests/bpf: Choose pkg-config for the target
bpftool: Add support to display uprobe_multi links
selftests/bpf: Add link_info test for uprobe_multi link
selftests/bpf: Use bpf_link__destroy in fill_link_info tests
...
====================
Conflicts:
Documentation/netlink/specs/netdev.yaml:
839ff60df3ab ("net: page_pool: add nlspec for basic access to page pools")
48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/
While at it also regen, tree is dirty after:
48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
looks like code wasn't re-rendered after "render-max" was removed.
Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Mark all Spectrum>2 systems as preferring CFF flood mode if supported by
the firmware.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/8a3d2ad96b943f7e3f53f998bd333a14e19cd641.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In this patch, add the artifacts for the rFID family that works in CFF
flood mode.
The same that was said about PGT organization and lookup in bridge FID
families applies for the rFID family as well. The main difference lies in
the fact that in the controlled flood mode, the FW was taking care of
maintaining the PGT tables for rFIDs. In CFF mode, the responsibility
shifts to the driver.
All rFIDs are based off either a front panel port, or a LAG port. For those
based off ports, we need to maintain at worst one PGT block for each port,
for those based off LAGs, one PGT block per LAG. This reflects in the
pgt_size callback, which determines the PGT footprint based on number of
ports and the LAG capacity.
A number of FIDs may end up using the same PGT base. Unlike with bridges,
where membership of a port in a given FID is highly dynamic, an rFID based
of a port will just always need to flood to that port.
Both the port and the LAG subtables need to be actively maintained. To that
end, the CFF rFID family implements fid_port_init and fid_port_fini
callbacks, which toggle the necessary bits.
Both FID-MID translation and SFMR packing then point into either the port
or the LAG subtable, to the block that corresponds to a given port or a
given LAG, depending on what port the RIF bound to the rFID uses.
As in the previous patch, the way CFF flood mode organizes PGT accesses
allows for much more smarts and dynamism. As in the previous patch, we
rather aim to keep things simple and static.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/962deb4367585d38250e80c685a34735c0c7f3ad.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In this patch, add the artifacts for 802.1d and 802.1q FID families that
work in CFF flood mode.
In CFF flood mode, the way flood vectors are looked up changes: there's a
per-FID PGT base, to which a small offset is added depending on type of
traffic. Thus each FID occupies a small contiguous block of PGT memory,
whereas in the controlled flood mode, flood vectors for a given FID were
spread across the PGT.
The term "flood table" as used by the spectrum_fid module, borrows from
controlled flood mode way of organizing the PGT table. There flood tables
were actual tables, contiguous in the PGT. In the CFF flood mode, they are
more abstract: a flood table becomes a collection of e.g. all first rows of
the per-FID PGT blocks. Nonetheless we retain the nomenclature.
FIDs are still configured through the SFMR register, but there are
different fields to set under CFF mode: PGT base and profile. Thus register
packing gets a dedicated op overload as well.
The new organization of PGT makes it possible to treat the PGT as a block
of an ordinary memory, allocate and deallocate on demand, and achieve
better flexibility. Here instead, we aim to keep the code as close as
possible to the previous controlled flood mode, support for which we need
to retain for Spectrum-1 and older FW versions anyway. Thus the PGT
footprint of the individual families is the same as before, just the
internal organization of the per-family PGT region differs. Hence the
pgt_size callback is reused between the controlled and CFF flood modes.
Since the dummy family has no flood tables in either the CTL mode or in
CFF mode, the existing one can be reused for the CFF family array.
Users should not notice any changes between the controlled and CFF flood
modes.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/ca40b8163e6d6a21f63ef299619acee953cf9519.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In CFF flood mode, the way flood vectors are looked up changes: there's a
per-FID PGT base, to which a small offset is added depending on type of
traffic. Thus each FID occupies a small contiguous block of PGT memory,
whereas in the controlled flood mode, flood vectors for a given FID were
spread across the PGT.
Each FID is associated with one of a handful of profiles. The profile and
the traffic type are then used as keys to look up the PGT offset. This
offset is then added to the per-FID PGT base. The profile / type / offset
mapping needs to be configured by the driver, and is only relevant in CFF
flood mode.
In this patch, add the SFFP initialization code. Only initialize the one
profile currently explicitly used. As follow-up patch add more profiles,
this code will pick them up and initialize as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/2c4733ed72d439444218969c032acad22cd4ed88.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the CFF mode, flood profiles are identified by a unique numerical
identifier. This is used for configuration of FIDs and for configuration of
traffic-type to PGT offset rules. In both cases, the numerical identifier
serves as a handle for the flood profile. Add the identifier to the flood
profile structure.
There is currently only one flood profile in use explicitly, the one used
for all bridging. Eventually three will be necessary in total: one for
bridges, one for rFIDs, one for NVE underlay. A total of four profiles
are supported by the HW. Start allocating at 1, because 0 is currently
used for underlay NVE flood.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/19ea9c35ba8b522fa5f7eb6fd7bc1b68f0f66b41.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|