Age | Commit message (Collapse) | Author |
|
Pull rdma updates from Jason Gunthorpe:
"The majority of the patches are cleanups, refactorings and clarity
improvements.
This cycle saw some more activity from Syzkaller, I think we are now
clean on all but one of those bugs, including the long standing and
obnoxious rdma_cm locking design defect. Continue to see many drivers
getting cleanups, with a few new user visible features.
Summary:
- Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1
- Lots of cleanup patches for hns
- Convert more places to use refcount
- Aggressively lock the RDMA CM code that syzkaller says isn't
working
- Work to clarify ib_cm
- Use the new ib_device lifecycle model in bnxt_re
- Fix mlx5's MR cache which seems to be failing more often with the
new ODP code
- mlx5 'dynamic uar' and 'tx steering' user interfaces"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (144 commits)
RDMA/bnxt_re: make bnxt_re_ib_init static
IB/qib: Delete struct qib_ivdev.qp_rnd
RDMA/hns: Fix uninitialized variable bug
RDMA/hns: Modify the mask of QP number for CQE of hip08
RDMA/hns: Reduce the maximum number of extend SGE per WQE
RDMA/hns: Reduce PFC frames in congestion scenarios
RDMA/mlx5: Add support for RDMA TX flow table
net/mlx5: Add support for RDMA TX steering
IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
IB/hfi1: Fix memory leaks in sysfs registration and unregistration
IB/mlx5: Move to fully dynamic UAR mode once user space supports it
IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
IB/mlx5: Extend QP creation to get uar page index from user space
IB/mlx5: Extend CQ creation to get uar page index from user space
IB/mlx5: Expose UAR object and its alloc/destroy commands
IB/hfi1: Get rid of a warning
RDMA/hns: Remove redundant judgment of qp_type
RDMA/hns: Remove redundant assignment of wc->smac when polling cq
RDMA/hns: Remove redundant qpc setup operations
RDMA/hns: Remove meaningless prints
...
|
|
Add new RDMA TX flow steering namespace. Flow steering rules in
this namespace are used to filter transmitted RDMA traffic.
Link: https://lore.kernel.org/r/20200324061425.1570190-2-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Group version is used when modifying a rule is allowed
(FLOW_ACT_NO_APPEND is clear) to detect a case where the rule was found
but while the groups where unlocked a new FTE was added. In this case,
the added FTE could be one with identical match value so we need to
attempt again with group lock held.
Change the code so version is retrieved only when FLOW_ACT_NO_APPEND is
cleared. As result, later compare can also be avoided if FLOW_ACT_NO_APPEND
is cleared.
Also improve comments text.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
FTE version is not used anywhere in the code so avoid incrementing it.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
When adding a rule to a flow group we need increment the version of the
group. Current code fails to do that and as a result, when trying to add
a rule, we will fail to discover a case where an FTE with the same match
value was added while we scanned the groups of the same match criteria,
thus we may try to add an FTE that was already added.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Instead of using two different structs for searching groups with the
same match, use a single struct and thus simplify the code, make it more
readable and smaller size which means less code cache misses.
text data bss dec hex
before: 35524 2744 0 38268 957c
after: 35038 2744 0 37782 9396
When testing add 70000 rules, delete all the rules, and repeat three
times taking the average, we get (time in seconds):
Before the change: insert 16.80, delete 11.02
After the change: insert 16.55, delete 10.95
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Allow passing NULL spec when creating a flow rule. Such rules will act
as "catch all" flow rules.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
|
|
Uplink representor traffic will be redirected to an empty root ft rather
than directly to a direct tir or ttc table, this root ft will be empty and
will be used as a link for auto-chaining with ttc table or ethtool tables
in downstream patches.
On load, fs core will connect uplink rep root_ft with ttc table. In case
ethtool steering will be used, fs core will auto connect root_ft with
the ethtool bypass tables, which will be connected with the ttc table.
vport_rx_rule[uplink_rep]->root_ft->ethtool->ttc.
For non-uplink representors, for simplicity root_ft will always point at
ttc table, hence the replace vport_rx rule logic is removed.
vport_rx_rule[non_uplink_rep]->root_ft(ttc).
For now ethtool steering support can only be available on uplink rep.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
|
|
When using port mirroring, we forward the traffic to another table and
use that table to forward to the mirrored vport. Since the hardware
loses the values of reg c, and in particular reg c0, we fail the match
on the input vport which previously existed in reg c0. To overcome this
situation, we use a set of per vport tables, positioned at the lowest
priority, and forward traffic to those tables. Since these tables are
per vport, we can avoid matching on reg c0.
Fixes: c01cfd0f1115 ("net/mlx5: E-Switch, Add match on vport metadata for rule in fast path")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
On RX side create a restore table in OFFLOADS namespace.
This table will match on all values for reg_c0 we will use,
and set it to the flow_tag. This flow tag can then be read on the CQE.
As there is no copy action from reg c0 to flow tag, instead we have to
set the flow tag explictily. We add an API so callers can add all the used
reg_c0 values (tags) and for each of those we add a restore rule.
This will be used in a following patch to save the miss chain mapping
tag on reg_c0 and from it restore the tc chain on the skb.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
free_match_list could be called when the flow table is already
locked. We need to pass this notation to tree_put_node.
It fixes the following lockdep warnning:
[ 1797.268537] ============================================
[ 1797.276837] WARNING: possible recursive locking detected
[ 1797.285101] 5.5.0-rc5+ #10 Not tainted
[ 1797.291641] --------------------------------------------
[ 1797.299917] handler10/9296 is trying to acquire lock:
[ 1797.307885] ffff889ad399a0a0 (&node->lock){++++}, at:
tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.319694]
[ 1797.319694] but task is already holding lock:
[ 1797.330904] ffff889ad399a0a0 (&node->lock){++++}, at:
nested_down_write_ref_node.part.33+0x1a/0x60 [mlx5_core]
[ 1797.344707]
[ 1797.344707] other info that might help us debug this:
[ 1797.356952] Possible unsafe locking scenario:
[ 1797.356952]
[ 1797.368333] CPU0
[ 1797.373357] ----
[ 1797.378364] lock(&node->lock);
[ 1797.384222] lock(&node->lock);
[ 1797.390031]
[ 1797.390031] *** DEADLOCK ***
[ 1797.390031]
[ 1797.403003] May be due to missing lock nesting notation
[ 1797.403003]
[ 1797.414691] 3 locks held by handler10/9296:
[ 1797.421465] #0: ffff889cf2c5a110 (&block->cb_lock){++++}, at:
tc_setup_cb_add+0x70/0x250
[ 1797.432810] #1: ffff88a030081490 (&comp->sem){++++}, at:
mlx5_devcom_get_peer_data+0x4c/0xb0 [mlx5_core]
[ 1797.445829] #2: ffff889ad399a0a0 (&node->lock){++++}, at:
nested_down_write_ref_node.part.33+0x1a/0x60 [mlx5_core]
[ 1797.459913]
[ 1797.459913] stack backtrace:
[ 1797.469436] CPU: 1 PID: 9296 Comm: handler10 Kdump: loaded Not
tainted 5.5.0-rc5+ #10
[ 1797.480643] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS
2.4.3 01/17/2017
[ 1797.491480] Call Trace:
[ 1797.496701] dump_stack+0x96/0xe0
[ 1797.502864] __lock_acquire.cold.63+0xf8/0x212
[ 1797.510301] ? lockdep_hardirqs_on+0x250/0x250
[ 1797.517701] ? mark_held_locks+0x55/0xa0
[ 1797.524547] ? quarantine_put+0xb7/0x160
[ 1797.531422] ? lockdep_hardirqs_on+0x17d/0x250
[ 1797.538913] lock_acquire+0xd6/0x1f0
[ 1797.545529] ? tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.553701] down_write+0x94/0x140
[ 1797.560206] ? tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.568464] ? down_write_killable_nested+0x170/0x170
[ 1797.576925] ? del_hw_flow_group+0xde/0x1f0 [mlx5_core]
[ 1797.585629] tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.593891] ? free_match_list.part.25+0x147/0x170 [mlx5_core]
[ 1797.603389] free_match_list.part.25+0xe0/0x170 [mlx5_core]
[ 1797.612654] _mlx5_add_flow_rules+0x17e2/0x20b0 [mlx5_core]
[ 1797.621838] ? lock_acquire+0xd6/0x1f0
[ 1797.629028] ? esw_get_prio_table+0xb0/0x3e0 [mlx5_core]
[ 1797.637981] ? alloc_insert_flow_group+0x420/0x420 [mlx5_core]
[ 1797.647459] ? try_to_wake_up+0x4c7/0xc70
[ 1797.654881] ? lock_downgrade+0x350/0x350
[ 1797.662271] ? __mutex_unlock_slowpath+0xb1/0x3f0
[ 1797.670396] ? find_held_lock+0xac/0xd0
[ 1797.677540] ? mlx5_add_flow_rules+0xdc/0x360 [mlx5_core]
[ 1797.686467] mlx5_add_flow_rules+0xdc/0x360 [mlx5_core]
[ 1797.695134] ? _mlx5_add_flow_rules+0x20b0/0x20b0 [mlx5_core]
[ 1797.704270] ? irq_exit+0xa5/0x170
[ 1797.710764] ? retint_kernel+0x10/0x10
[ 1797.717698] ? mlx5_eswitch_set_rule_source_port.isra.9+0x122/0x230
[mlx5_core]
[ 1797.728708] mlx5_eswitch_add_offloaded_rule+0x465/0x6d0 [mlx5_core]
[ 1797.738713] ? mlx5_eswitch_get_prio_range+0x30/0x30 [mlx5_core]
[ 1797.748384] ? mlx5_fc_stats_work+0x670/0x670 [mlx5_core]
[ 1797.757400] mlx5e_tc_offload_fdb_rules.isra.27+0x24/0x90 [mlx5_core]
[ 1797.767665] mlx5e_tc_add_fdb_flow+0xaf8/0xd40 [mlx5_core]
[ 1797.776886] ? mlx5e_encap_put+0xd0/0xd0 [mlx5_core]
[ 1797.785562] ? mlx5e_alloc_flow.isra.43+0x18c/0x1c0 [mlx5_core]
[ 1797.795353] __mlx5e_add_fdb_flow+0x2e2/0x440 [mlx5_core]
[ 1797.804558] ? mlx5e_tc_update_neigh_used_value+0x8c0/0x8c0
[mlx5_core]
[ 1797.815093] ? wait_for_completion+0x260/0x260
[ 1797.823272] mlx5e_configure_flower+0xe94/0x1620 [mlx5_core]
[ 1797.832792] ? __mlx5e_add_fdb_flow+0x440/0x440 [mlx5_core]
[ 1797.842096] ? down_read+0x11a/0x2e0
[ 1797.849090] ? down_write+0x140/0x140
[ 1797.856142] ? mlx5e_rep_indr_setup_block_cb+0xc0/0xc0 [mlx5_core]
[ 1797.866027] tc_setup_cb_add+0x11a/0x250
[ 1797.873339] fl_hw_replace_filter+0x25e/0x320 [cls_flower]
[ 1797.882385] ? fl_hw_destroy_filter+0x1c0/0x1c0 [cls_flower]
[ 1797.891607] fl_change+0x1d54/0x1fb6 [cls_flower]
[ 1797.899772] ? __rhashtable_insert_fast.constprop.50+0x9f0/0x9f0
[cls_flower]
[ 1797.910728] ? lock_downgrade+0x350/0x350
[ 1797.918187] ? __radix_tree_lookup+0xa5/0x130
[ 1797.926046] ? fl_set_key+0x1590/0x1590 [cls_flower]
[ 1797.934611] ? __rhashtable_insert_fast.constprop.50+0x9f0/0x9f0
[cls_flower]
[ 1797.945673] tc_new_tfilter+0xcd1/0x1240
[ 1797.953138] ? tc_del_tfilter+0xb10/0xb10
[ 1797.960688] ? avc_has_perm_noaudit+0x92/0x320
[ 1797.968721] ? avc_has_perm_noaudit+0x1df/0x320
[ 1797.976816] ? avc_has_extended_perms+0x990/0x990
[ 1797.985090] ? mark_lock+0xaa/0x9e0
[ 1797.991988] ? match_held_lock+0x1b/0x240
[ 1797.999457] ? match_held_lock+0x1b/0x240
[ 1798.006859] ? find_held_lock+0xac/0xd0
[ 1798.014045] ? symbol_put_addr+0x40/0x40
[ 1798.021317] ? rcu_read_lock_sched_held+0xd0/0xd0
[ 1798.029460] ? tc_del_tfilter+0xb10/0xb10
[ 1798.036810] rtnetlink_rcv_msg+0x4d5/0x620
[ 1798.044236] ? rtnl_bridge_getlink+0x460/0x460
[ 1798.052034] ? lockdep_hardirqs_on+0x250/0x250
[ 1798.059837] ? match_held_lock+0x1b/0x240
[ 1798.067146] ? find_held_lock+0xac/0xd0
[ 1798.074246] netlink_rcv_skb+0xc6/0x1f0
[ 1798.081339] ? rtnl_bridge_getlink+0x460/0x460
[ 1798.089104] ? netlink_ack+0x440/0x440
[ 1798.096061] netlink_unicast+0x2d4/0x3b0
[ 1798.103189] ? netlink_attachskb+0x3f0/0x3f0
[ 1798.110724] ? _copy_from_iter_full+0xda/0x370
[ 1798.118415] netlink_sendmsg+0x3ba/0x6a0
[ 1798.125478] ? netlink_unicast+0x3b0/0x3b0
[ 1798.132705] ? netlink_unicast+0x3b0/0x3b0
[ 1798.139880] sock_sendmsg+0x94/0xa0
[ 1798.146332] ____sys_sendmsg+0x36c/0x3f0
[ 1798.153251] ? copy_msghdr_from_user+0x165/0x230
[ 1798.160941] ? kernel_sendmsg+0x30/0x30
[ 1798.167738] ___sys_sendmsg+0xeb/0x150
[ 1798.174411] ? sendmsg_copy_msghdr+0x30/0x30
[ 1798.181649] ? lock_downgrade+0x350/0x350
[ 1798.188559] ? rcu_read_lock_sched_held+0xd0/0xd0
[ 1798.196239] ? __fget+0x21d/0x320
[ 1798.202335] ? do_dup2+0x2a0/0x2a0
[ 1798.208499] ? lock_downgrade+0x350/0x350
[ 1798.215366] ? __fget_light+0xd6/0xf0
[ 1798.221808] ? syscall_trace_enter+0x369/0x5d0
[ 1798.229112] __sys_sendmsg+0xd3/0x160
[ 1798.235511] ? __sys_sendmsg_sock+0x60/0x60
[ 1798.242478] ? syscall_trace_enter+0x233/0x5d0
[ 1798.249721] ? syscall_slow_exit_work+0x280/0x280
[ 1798.257211] ? do_syscall_64+0x1e/0x2e0
[ 1798.263680] do_syscall_64+0x72/0x2e0
[ 1798.269950] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Exclude the last n entries for an autogrouped flow table.
Reserving entries at the end of the FT will ensure that this FG will be
the last to be evaluated. This will be used in the next patch to create
a miss group enabling custom actions on FT miss.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
If user sets ignore flow level flag on a rule, that rule can point to
a flow table of any level, including those with levels equal or less
than the level of the flow table it is added on.
This with unamanged tables will be used to create a FDB chain/prio
hierarchy much larger than currently supported level range.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently, Most of the steering tree is statically declared ahead of time,
with steering prios instances allocated for each fdb chain to assign max
number of levels for each of them. This allows fs_core to manage the
connections and levels of the flow tables hierarcy to prevent loops, but
restricts us with the number of supported chains and priorities.
Introduce unmananged flow tables, allowing the user to manage the flow
table connections. A unamanged table is detached from the fs_core flow
table hierarcy, and is only connected back to the hierarchy by explicit
FTEs forward actions.
This will be used together with firmware that supports ignoring the flow
table levels to increase the number of supported chains and prios.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
This merge syncs with mlx5-next latest HW bits and layout updates for next
features, in addition one patch that improves
mlx5_create_auto_grouped_flow_table() API across all mlx5 users.
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
net/mlx5e: Add discard counters per priority
net/mlx5e: Expose FEC feilds and related capability bit
net/mlx5: Add mlx5_ifc definitions for connection tracking support
net/mlx5: Add copy header action struct layout
net/mlx5: Expose resource dump register mapping
net/mlx5: Add structures and defines for MIRC register
net/mlx5: Read MCAM register groups 1 and 2
net/mlx5: Add structures layout for new MCAM access reg groups
net/mlx5: Expose vDPA emulation device capabilities
net/mlx5: Add Virtio Emulation related device capabilities
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param
which already carries the max_fte, prio and flags memebers, and is
used the same in similar mlx5_create_flow_table() function.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
This reverts commit 7dee607ed0e04500459db53001d8e02f8831f084.
During cleanup path, FTE's parent node group is removed which is
referenced by the FTE while freeing the FTE.
Hence FTE's lockless read lookup optimization done in cited commit is
not possible at the moment.
Hence, revert the commit.
This avoid below KAZAN call trace.
[ 110.390896] BUG: KASAN: use-after-free in find_root.isra.14+0x56/0x60
[mlx5_core]
[ 110.391048] Read of size 4 at addr ffff888c19e6d220 by task
swapper/12/0
[ 110.391219] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 5.5.0-rc1+
[ 110.391222] Hardware name: HP ProLiant DL380p Gen8, BIOS P70
08/02/2014
[ 110.391225] Call Trace:
[ 110.391229] <IRQ>
[ 110.391246] dump_stack+0x95/0xd5
[ 110.391307] ? find_root.isra.14+0x56/0x60 [mlx5_core]
[ 110.391320] print_address_description.constprop.5+0x20/0x320
[ 110.391379] ? find_root.isra.14+0x56/0x60 [mlx5_core]
[ 110.391435] ? find_root.isra.14+0x56/0x60 [mlx5_core]
[ 110.391441] __kasan_report+0x149/0x18c
[ 110.391499] ? find_root.isra.14+0x56/0x60 [mlx5_core]
[ 110.391504] kasan_report+0x12/0x20
[ 110.391511] __asan_report_load4_noabort+0x14/0x20
[ 110.391567] find_root.isra.14+0x56/0x60 [mlx5_core]
[ 110.391625] del_sw_fte_rcu+0x4a/0x100 [mlx5_core]
[ 110.391633] rcu_core+0x404/0x1950
[ 110.391640] ? rcu_accelerate_cbs_unlocked+0x100/0x100
[ 110.391649] ? run_rebalance_domains+0x201/0x280
[ 110.391654] rcu_core_si+0xe/0x10
[ 110.391661] __do_softirq+0x181/0x66c
[ 110.391670] irq_exit+0x12c/0x150
[ 110.391675] smp_apic_timer_interrupt+0xf0/0x370
[ 110.391681] apic_timer_interrupt+0xf/0x20
[ 110.391684] </IRQ>
[ 110.391695] RIP: 0010:cpuidle_enter_state+0xfa/0xba0
[ 110.391703] Code: 3d c3 9b b5 50 e8 56 75 6e fe 48 89 45 c8 0f 1f 44
00 00 31 ff e8 a6 94 6e fe 45 84 ff 0f 85 f6 02 00 00 fb 66 0f 1f 44 00
00 <45> 85 f6 0f 88 db 06 00 00 4d 63 fe 4b 8d 04 7f 49 8d 04 87 49 8d
[ 110.391706] RSP: 0018:ffff888c23a6fce8 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff13
[ 110.391712] RAX: dffffc0000000000 RBX: ffffe8ffff7002f8 RCX:
000000000000001f
[ 110.391715] RDX: 1ffff11184ee6cb5 RSI: 0000000040277d83 RDI:
ffff888c277365a8
[ 110.391718] RBP: ffff888c23a6fd40 R08: 0000000000000002 R09:
0000000000035280
[ 110.391721] R10: ffff888c23a6fc80 R11: ffffed11847485d0 R12:
ffffffffb1017740
[ 110.391723] R13: 0000000000000003 R14: 0000000000000003 R15:
0000000000000000
[ 110.391732] ? cpuidle_enter_state+0xea/0xba0
[ 110.391738] cpuidle_enter+0x4f/0xa0
[ 110.391747] call_cpuidle+0x6d/0xc0
[ 110.391752] do_idle+0x360/0x430
[ 110.391758] ? arch_cpu_idle_exit+0x40/0x40
[ 110.391765] ? complete+0x67/0x80
[ 110.391771] cpu_startup_entry+0x1d/0x20
[ 110.391779] start_secondary+0x2f3/0x3c0
[ 110.391784] ? set_cpu_sibling_map+0x2500/0x2500
[ 110.391795] secondary_startup_64+0xa4/0xb0
[ 110.391841] Allocated by task 290:
[ 110.391917] save_stack+0x21/0x90
[ 110.391921] __kasan_kmalloc.constprop.8+0xa7/0xd0
[ 110.391925] kasan_kmalloc+0x9/0x10
[ 110.391929] kmem_cache_alloc_trace+0xf6/0x270
[ 110.391987] create_root_ns.isra.36+0x58/0x260 [mlx5_core]
[ 110.392044] mlx5_init_fs+0x5fd/0x1ee0 [mlx5_core]
[ 110.392092] mlx5_load_one+0xc7a/0x3860 [mlx5_core]
[ 110.392139] init_one+0x6ff/0xf90 [mlx5_core]
[ 110.392145] local_pci_probe+0xde/0x190
[ 110.392150] work_for_cpu_fn+0x56/0xa0
[ 110.392153] process_one_work+0x678/0x1140
[ 110.392157] worker_thread+0x573/0xba0
[ 110.392162] kthread+0x341/0x400
[ 110.392166] ret_from_fork+0x1f/0x40
[ 110.392218] Freed by task 2742:
[ 110.392288] save_stack+0x21/0x90
[ 110.392292] __kasan_slab_free+0x137/0x190
[ 110.392296] kasan_slab_free+0xe/0x10
[ 110.392299] kfree+0x94/0x250
[ 110.392357] tree_put_node+0x257/0x360 [mlx5_core]
[ 110.392413] tree_remove_node+0x63/0xb0 [mlx5_core]
[ 110.392469] clean_tree+0x199/0x240 [mlx5_core]
[ 110.392525] mlx5_cleanup_fs+0x76/0x580 [mlx5_core]
[ 110.392572] mlx5_unload+0x22/0xc0 [mlx5_core]
[ 110.392619] mlx5_unload_one+0x99/0x260 [mlx5_core]
[ 110.392666] remove_one+0x61/0x160 [mlx5_core]
[ 110.392671] pci_device_remove+0x10b/0x2c0
[ 110.392677] device_release_driver_internal+0x1e4/0x490
[ 110.392681] device_driver_detach+0x36/0x40
[ 110.392685] unbind_store+0x147/0x200
[ 110.392688] drv_attr_store+0x6f/0xb0
[ 110.392693] sysfs_kf_write+0x127/0x1d0
[ 110.392697] kernfs_fop_write+0x296/0x420
[ 110.392702] __vfs_write+0x66/0x110
[ 110.392707] vfs_write+0x1a0/0x500
[ 110.392711] ksys_write+0x164/0x250
[ 110.392715] __x64_sys_write+0x73/0xb0
[ 110.392720] do_syscall_64+0x9f/0x3a0
[ 110.392725] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 7dee607ed0e0 ("net/mlx5: Support lockless FTE read lookups")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().
This patch is generated using following script:
EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"
git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do
if [[ "$file" =~ $EXCLUDE_FILES ]]; then
continue
fi
sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done
Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
|
|
Minor conflict in drivers/s390/net/qeth_l2_main.c, kept the lock
from commit c8183f548902 ("s390/qeth: fix potential deadlock on
workqueue flush"), removed the code which was removed by commit
9897d583b015 ("s390/qeth: consolidate some duplicated HW cmd code").
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Once all the large flow groups (defined by the user when the flow table
is created - max_num_groups) were created, then all the following new
flow groups will have only one flow table entry, even though the flow table
has place to larger groups.
Fix the condition to prefer large flow group.
Fixes: f0d22d187473 ("net/mlx5_core: Introduce flow steering autogrouped flow table")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
1) New generic devlink param "enable_roce", for downstream devlink
reload support
2) Do vport ACL configuration on per vport basis when
enabling/disabling a vport. This enables to have vports enabled/disabled
outside of eswitch config for future
3) Split the code for legacy vs offloads mode and make it clear
4) Tide up vport locking and workqueue usage
5) Fix metadata enablement for ECPF
6) Make explicit use of VF property to publish IB_DEVICE_VIRTUAL_FUNCTION
7) E-Switch and flow steering core low level support and refactoring for
netfilter flowtables offload
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Netfilter tables (nftables) implements a software datapath that
comes after tc ingress datapath. The datapath supports offloading
such rules via the flow table offload API.
This API is currently only used by NFT and it doesn't provide the
global priority in regards to tc offload, so we assume offloading such
rules must come after tc. It does provide a flow table priority
parameter, so we need to provide some supported priority range.
For that, split fastpath prio to two, flow table offload and tc offload,
with one dedicated priority chain for flow table offload.
Next patch will re-use the multi chain API to access this chain by
allowing access to this chain by the fdb_sub_namespace.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Next patch will re-use this to add a new chain but in a
different prio.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Tc chains are implemented by creating a chained prio steering type, and
inside it there is a namespace for each chain (FDB_TC_MAX_CHAINS). Each
of those has a list of priorities.
Currently, all namespaces in a prio start at the parent prio level.
But since we can jump from chain (namespace) to another chain in the
same prio, we need the levels for higher chains to be higher as well.
So we created unused prios to account for levels in previous namespaces.
Fix that by accumulating the namespaces levels if we are inside a chained
type prio, and removing the unused prios.
Fixes: 328edb499f99 ('net/mlx5: Split FDB fast path prio to multiple namespaces')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Define FDB_TC_LEVELS_PER_PRIO instead of magic number 2.
This is the number of levels used by each tc prio table in the fdb.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Rename it to prepare for next patch that will add a
different type of offload to the FDB.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
During connection tracking offloads with high number of connections,
(40K connections per second), flow table group lock contention is
observed.
To improve the performance by reducing lock contention, lockless
FTE read lookup is performed as described below.
Each flow table entry is refcounted.
Flow table entry is removed when refcount drops to zero.
rhash table allows rcu protected lookup.
Each hash table entry insertion and removal is write lock protected.
Hence, it is possible to perform lockless lookup in rhash table using
following scheme.
(a) Guard FTE entry lookup per group using rcu read lock.
(b) Before freeing the FTE entry, wait for all readers to finish
accessing the FTE.
Below example of one reader and write in parallel racing, shows
protection in effect with rcu lock.
lookup_fte_locked()
rcu_read_lock();
search_hash_table()
existing_flow_group_write_lock();
tree_put_node(fte)
drop_ref_cnt(fte)
del_sw_fte(fte)
del_hash_table_entry();
call_rcu();
existing_flow_group_write_unlock();
get_ref_cnt(fte) fails
rcu_read_unlock();
rcu grace period();
[..]
kmem_cache_free(fte);
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
FTE memory allocation using alloc_fte() doesn't have any dependency
on the flow group.
Hence, do not hold flow group lock while performing alloc_fte().
This helps to reduce contention of flow group lock.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add API to set the flow steering root namesapce mode.
Setting new mode should be called before any steering operation
is executed on the namespace.
This API is going to be used by steering users such switchdev.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add support to create flow steering objects
via direct rule API (SW steering).
New layer is added - fs_dr, this layer translates the command that
fs_core sends to the FW into direct rule API. In case that direct
rule is not supported in some feature then -EOPNOTSUPP is
returned.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add flow steering actions: modify header and packet reformat
to the fs_cmd shim layer. This allows each namespace to define
possibly different functionality for alloc/dealloc action commands.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Use different namespaces for bypass and switchdev loopback because they
have different priorities and default table miss action requirement:
1. bypass: with multiple priorities support, and
MLX5_FLOW_TABLE_MISS_ACTION_DEF as the default table miss action;
2. switchdev loopback: with single priority support, and
MLX5_FLOW_TABLE_MISS_ACTION_SWITCH_DOMAIN as the default table miss
action.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
Currently all the namespaces under the same steering domain share the same
default table miss action, however in some situations (e.g., RDMA RX)
different actions are required. This patch adds a per-namespace default
table miss action instead of using the miss action of the steering domain.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Misc updates from mlx5-next branch:
1) Add the required HW definitions and structures for upcoming TLS
support.
2) Add support for MCQI and MCQS hardware registers for fw version query.
3) Added hardware bits and structures definitions for sub-functions
4) Small code cleanup and improvement for PF pci driver.
5) Bluefield (ECPF) updates and refactoring for better E-Switch
management on ECPF embedded CPU NIC:
5.1) Consolidate querying eswitch number of VFs
5.2) Register event handler at the correct E-Switch init stage
5.3) Setup PF's inline mode and vlan pop when the ECPF is the
E-Swtich manager ( the host PF is basically a VF ).
5.4) Handle Vport UC address changes in switchdev mode.
6) Cleanup the rep and netdev reference when unloading IB rep.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
i# All conflicts fixed but you are still merging.
|
|
Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF
vports as well.
Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to
more generic vport.h header file. Such exposure is not desired.
Hence a mlx5_eswitch_get_total_vports() is introduced.
Given that mlx5_eswitch_get_total_vports() API wants to work on const
mlx5_core_dev*, change its helper functions also to accept const *dev.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Putting an empty 'mlx5_flow_spec' structure on the stack is a bit
wasteful and causes a warning on 32-bit architectures when building
with clang -fsanitize-coverage:
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c: In function 'mlx5_eswitch_termtbl_create':
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c:90:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Since the structure is never written to, we can statically allocate
it to avoid the stack usage. To be on the safe side, mark all
subsequent function arguments that we pass it into as 'const'
as well.
Fixes: 10caabdaad5a ("net/mlx5e: Use termination table for VLAN push actions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Misc updates from mlx5-next branch:
1) E-Switch vport metadata support for source vport matching
2) Convert mkey_table to XArray
3) Shared IRQs and to use single IRQ for all async EQs
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Refactor the flow data structures, add new flow_context and move
flow_tag into it, as flow_tag doesn't belong to the rule action.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
root ns is yet another fs core node which is freed using kfree() by
tree_put_node().
Rest of the other fs core objects are also allocated using kmalloc
variants.
However, root ns memory is allocated using kvzalloc().
Hence allocate root ns memory using kzalloc().
Fixes: 2530236303d9e ("net/mlx5_core: Flow steering tree initialization")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
In below code flow, for ingress acl table root ns memory leads
to double free.
mlx5_init_fs
init_ingress_acls_root_ns()
init_ingress_acl_root_ns
kfree(steering->esw_ingress_root_ns);
/* steering->esw_ingress_root_ns is not marked NULL */
mlx5_cleanup_fs
cleanup_ingress_acls_root_ns
steering->esw_ingress_root_ns non NULL check passes.
kfree(steering->esw_ingress_root_ns);
/* double free */
Similar issue exist for other tables.
Hence zero out the pointers to not process the table again.
Fixes: 9b93ab981e3bf ("net/mlx5: Separate ingress/egress namespaces for each vport")
Fixes: 40c3eebb49e51 ("net/mlx5: Add support in RDMA RX steering")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
When root ns setup for rdma, sniffer tx and sniffer rx fails,
such root ns cleanup is done by the error unwinding path of
mlx5_cleanup_fs().
Below call graph shows an example for sniffer_rx_root_ns.
mlx5_init_fs()
init_sniffer_rx_root_ns()
cleanup_root_ns(steering->sniffer_rx_root_ns);
mlx5_cleanup_fs()
cleanup_root_ns(steering->sniffer_rx_root_ns);
/* double free of sniffer_rx_root_ns */
Hence, use the existing cleanup_fs to cleanup.
Fixes: d83eb50e29de3 ("net/mlx5: Add support in RDMA RX steering")
Fixes: 87d22483ce68e ("net/mlx5: Add sniffer namespaces")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Flow destination comparison has an inaccuracy: code see no
difference between same vf ports, which belong to different pfs.
Example: If start ping from VF0 (PF1) to VF1 (PF1) and mirror
all traffic to VF0 (PF2), icmp reply to VF0 (PF1) and mirrored
flow to VF0 (PF2) would be determined as same destination. It lead
to creating flow handler with rule nodes, which not added to node
tree. When later driver try to delete this flow rules we got
kernel crash.
Add comparison of vhca_id field to avoid this.
Fixes: 1228e912c934 ("net/mlx5: Consider encapsulation properties when comparing destinations")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Flow table supports three types of miss action:
1. Default miss action - go to default miss table according to table.
2. Go to specific table.
3. Switch domain - go to the root table of an alternative steering
table domain.
New table miss action was added - switch_domain.
The next domain for RDMA_RX namespace is the NIC RX domain.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add new flow steering namespace - MLX5_FLOW_NAMESPACE_RDMA_RX.
Flow steering rules in this namespace are used to filter
RDMA traffic.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Pass the flow steering objects instead of their attributes
to fs_cmd in order to decrease number of arguments and in
addition it will be used to update object fields.
Pass the flow steering root namespace instead of the device
so will have context to the namespace in the fs_cmd layer.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mlx5-next
Linux 5.1-rc1
We forgot to reset the branch last merge window thus mlx5-next is outdated
and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next
branch with 5.1-rc1.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Create a new prio in the FDB, it will be used when inserting steering rules
into the FDB from the RDMA side. We create a new PRIO so rules from the
net side and rules from the RDMA side won't be inserted to the same PRIO,
each side has it's own sandbox to play in.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
When creating the FDB prios, use the enum values already defined and not
the hardcoded values.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
Fix the following warning:
drivers/net/ethernet/mellanox/mlx5/core//fs_core.c:845:5:
warning: 'err' may be used uninitialized in this function
[-Wmaybe-uninitialized]
No real issue here. This is only a false compiler warning.
The 'err' variable is guaranteed to be init by time of usage.
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|