summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx5
AgeCommit message (Collapse)Author
2019-11-20net/mlx5: Update the list of the PCI supported devicesShani Shapp
Add the upcoming ConnectX-6 LX device ID. Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Shani Shapp <shanish@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5: Fix auto group size calculationMaor Gottlieb
Once all the large flow groups (defined by the user when the flow table is created - max_num_groups) were created, then all the following new flow groups will have only one flow table entry, even though the flow table has place to larger groups. Fix the condition to prefer large flow group. Fixes: f0d22d187473 ("net/mlx5_core: Introduce flow steering autogrouped flow table") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Add missing capability bit check for IP-in-IPMarina Varshaver
Device that doesn't support IP-in-IP offloads has to filter csum and gso offload support, otherwise kernel will conclude that device is capable of offloading csum and gso for IP-in-IP tunnels and that might result in IP-in-IP tunnel not functioning. Fixes: 25948b87dda2 ("net/mlx5e: Support TSO and TX checksum offloads for IP-in-IP") Signed-off-by: Marina Varshaver <marinav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Do not use non-EXT link modes in EXT modeEran Ben Elisha
On some old Firmwares, connector type value was not supported, and value read from FW was 0. For those, driver used link mode in order to set connector type in link_ksetting. After FW exposed the connector type, driver translated the value to ethtool definitions. However, as 0 is a valid value, before returning PORT_OTHER, driver run the check of link mode in order to maintain backward compatibility. Cited patch added support to EXT mode. With both features (connector type and EXT link modes) ,if connector_type read from FW is 0 and EXT mode is set, driver mistakenly compare EXT link modes to non-EXT link mode. Fixed that by skipping this comparison if we are in EXT mode, as connector type value is valid in this scenario. Fixes: 6a897372417e ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Fix set vf link state error flowRoi Dayan
Before this commit the ndo always returned success. Fix that. Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5: DR, Limit STE hash table enlarge based on bytemaskAlex Vesker
When an ste hash table has too many collision we enlarge it to a bigger hash table (rehash). Rehashing collision improvement depends on the bytemask value. The more 1 bits we have in bytemask means better spreading in the table. Without this fix tables can grow in size without providing any improvement which can lead to memory depletion and failures. This patch will limit table rehash to reduce memory and improve the performance. Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality") Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5: DR, Skip rehash for tables with byte mask zeroAlex Vesker
The byte mask fields affect on the hash index distribution, when the byte mask is zero, the hash calculation will always be equal to the same index. To avoid unneeded rehash of hash tables mark the table to skip rehash. This is needed by the next patch which will limit table rehash to reduce memory consumption. Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality") Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5: DR, Fix invalid EQ vector number on CQ creationAlex Vesker
When creating a CQ, the CPU id is used for the vector value. This would fail in-case the CPU id was higher than the maximum vector value. Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Reorder mirrer action parsing to check for encap firstVlad Buslov
Mirred action parsing code in parse_tc_fdb_actions() first checks if out_dev has same parent id, and only verifies that there is a pending encap action that was parsed before. Recent change in vxlan module made function netdev_port_same_parent_id() to return true when called for mlx5 eswitch representor and vxlan device created explicitly on mlx5 representor device (vxlan devices created with "external" flag without explicitly specifying parent interface are not affected). With call to netdev_port_same_parent_id() returning true, incorrect code path is chosen and encap rules fail to offload because vxlan dev is not a valid eswitch forwarding dev. Dmesg log of error: [ 1784.389797] devices ens1f0_0 vxlan1 not on same switch HW, can't offload forwarding In order to fix the issue, rearrange conditional in parse_tc_fdb_actions() to check for pending encap action before checking if out_dev has the same parent id. Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Fix ingress rate configuration for representorsEli Cohen
Current code uses the old method of prio encoding in flow_cls_common_offload. Fix to follow the changes introduced in commit ef01adae0e43 ("net: sched: use major priority number as hardware priority"). Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Fix error flow cleanup in mlx5e_tc_tun_create_header_ipv4/6Eli Cohen
Be sure to release the neighbour in case of failures after successful route lookup. Fixes: 101f4de9dd52 ("net/mlx5e: Move TC tunnel offloading code to separate source file") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-15mlx5: Reject requests to enable time stamping on both edges.Richard Cochran
This driver enables rising edge or falling edge, but not both, and so this patch validates that the request contains only one of the two edges. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15ptp: Introduce strict checking of external time stamp options.Richard Cochran
User space may request time stamps on rising edges, falling edges, or both. However, the particular mode may or may not be supported in the hardware or in the driver. This patch adds a "strict" flag that tells drivers to ensure that the requested mode will be honored. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15mlx5: reject unsupported external timestamp flagsJacob Keller
Fix the mlx5 core PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. [ RC: I'm not 100% sure what this driver does, but if I'm not wrong it follows the dp83640: flags Meaning ---------------------------------------------------- -------------------------- PTP_ENABLE_FEATURE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp falling edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp falling edge ] Cc: Feras Daoud <ferasda@mellanox.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: reject PTP periodic output requests with unsupported flagsJacob Keller
Commit 823eb2a3c4c7 ("PTP: add support for one-shot output") introduced a new flag for the PTP periodic output request ioctl. This flag is not currently supported by any driver. Fix all drivers which implement the periodic output request ioctl to explicitly reject any request with flags they do not understand. This ensures that the driver does not accidentally misinterpret the PTP_PEROUT_ONE_SHOT flag, or any new flag introduced in the future. This is important for forward compatibility: if a new flag is introduced, the driver should reject requests to enable the flag until the driver has actually been modified to support the flag in question. Cc: Felipe Balbi <felipe.balbi@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Christopher Hall <christopher.s.hall@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06net/mlx5e: Use correct enum to determine uplink portDmytro Linkin
For vlan push action, if eswitch flow source capability is enabled, flow source value compared with MLX5_VPORT_UPLINK enum, to determine uplink port. This lead to syndrome in dmesg if try to add vlan push action. For example: $ tc filter add dev vxlan0 ingress protocol ip prio 1 flower \ enc_dst_port 4789 \ action tunnel_key unset pipe \ action vlan push id 20 pipe \ action mirred egress redirect dev ens1f0_0 $ dmesg ... [ 2456.883693] mlx5_core 0000:82:00.0: mlx5_cmd_check:756:(pid 5273): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xa9c090) Use the correct enum value MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK. Fixes: bb204dcf39fe ("net/mlx5e: Determine source port properly for vlan push action") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-06net/mlx5: DR, Fix memory leak during rule creationAlex Vesker
During rule creation hw_ste_arr was not freed. Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-06net/mlx5: DR, Fix memory leak in modify action destroyAlex Vesker
The rewrite data was no freed. Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-06net/mlx5e: Fix eswitch debug print of max fdb flowRoi Dayan
The value is already the calculation so remove the log prefix. Fixes: e52c28024008 ("net/mlx5: E-Switch, Add chains and priorities") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29net/mlx5e: Initialize on stack link modes bitmapAya Levin
Initialize link modes bitmap on stack before using it, otherwise the outcome of ethtool set link ksettings might have unexpected values. Fixes: 4b95840a6ced ("net/mlx5e: Fix matching of speed to PRM link modes") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29net/mlx5e: Fix ethtool self test: link speedAya Levin
Ethtool self test contains a test for link speed. This test reads the PTYS register and determines whether the current speed is valid or not. Change current implementation to use the function mlx5e_port_linkspeed() that does the same check and fails when speed is invalid. This code redundancy lead to a bug when mlx5e_port_linkspeed() was updated with expended speeds and the self test was not. Fixes: 2c81bfd5ae56 ("net/mlx5e: Move port speed code from en_ethtool.c to en/port.c") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29net/mlx5e: Fix handling of compressed CQEs in case of low NAPI budgetMaxim Mikityanskiy
When CQE compression is enabled, compressed CQEs use the following structure: a title is followed by one or many blocks, each containing 8 mini CQEs (except the last, which may contain fewer mini CQEs). Due to NAPI budget restriction, a complete structure is not always parsed in one NAPI run, and some blocks with mini CQEs may be deferred to the next NAPI poll call - we have the mlx5e_decompress_cqes_cont call in the beginning of mlx5e_poll_rx_cq. However, if the budget is extremely low, some blocks may be left even after that, but the code that follows the mlx5e_decompress_cqes_cont call doesn't check it and assumes that a new CQE begins, which may not be the case. In such cases, random memory corruptions occur. An extremely low NAPI budget of 8 is used when busy_poll or busy_read is active. This commit adds a check to make sure that the previous compressed CQE has been completely parsed after mlx5e_decompress_cqes_cont, otherwise it prevents a new CQE from being fetched in the middle of a compressed CQE. This commit fixes random crashes in __build_skb, __page_pool_put_page and other not-related-directly places, that used to happen when both CQE compression and busy_poll/busy_read were enabled. Fixes: 7219ab34f184 ("net/mlx5e: CQE compression") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29net/mlx5e: Don't store direct pointer to action's tunnel infoVlad Buslov
Geneve implementation changed mlx5 tc to user direct pointer to tunnel_key action's internal struct ip_tunnel_info instance. However, this leads to use-after-free error when initial filter that caused creation of new encap entry is deleted or when tunnel_key action is manually overwritten through action API. Moreover, with recent TC offloads API unlocking change struct flow_action_entry->tunnel point to temporal copy of tunnel info that is deallocated after filter is offloaded to hardware which causes bug to reproduce every time new filter is attached to existing encap entry with following KASAN bug: [ 314.885555] ================================================================== [ 314.886641] BUG: KASAN: use-after-free in memcmp+0x2c/0x60 [ 314.886864] Read of size 1 at addr ffff88886c746280 by task tc/2682 [ 314.887179] CPU: 22 PID: 2682 Comm: tc Not tainted 5.3.0-rc7+ #703 [ 314.887188] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 314.887195] Call Trace: [ 314.887215] dump_stack+0x9a/0xf0 [ 314.887236] print_address_description+0x67/0x323 [ 314.887248] ? memcmp+0x2c/0x60 [ 314.887257] ? memcmp+0x2c/0x60 [ 314.887272] __kasan_report.cold+0x1a/0x3d [ 314.887474] ? __mlx5e_tc_del_fdb_peer_flow+0x100/0x1b0 [mlx5_core] [ 314.887484] ? memcmp+0x2c/0x60 [ 314.887509] kasan_report+0xe/0x12 [ 314.887521] memcmp+0x2c/0x60 [ 314.887662] mlx5e_tc_add_fdb_flow+0x51b/0xbe0 [mlx5_core] [ 314.887838] ? mlx5e_encap_take+0x110/0x110 [mlx5_core] [ 314.887902] ? lockdep_init_map+0x87/0x2c0 [ 314.887924] ? __init_waitqueue_head+0x4f/0x60 [ 314.888062] ? mlx5e_alloc_flow.isra.0+0x18c/0x1c0 [mlx5_core] [ 314.888207] __mlx5e_add_fdb_flow+0x2d7/0x440 [mlx5_core] [ 314.888359] ? mlx5e_tc_update_neigh_used_value+0x6f0/0x6f0 [mlx5_core] [ 314.888374] ? match_held_lock+0x2e/0x240 [ 314.888537] mlx5e_configure_flower+0x830/0x16a0 [mlx5_core] [ 314.888702] ? __mlx5e_add_fdb_flow+0x440/0x440 [mlx5_core] [ 314.888713] ? down_read+0x118/0x2c0 [ 314.888728] ? down_read_killable+0x300/0x300 [ 314.888882] ? mlx5e_rep_get_ethtool_stats+0x180/0x180 [mlx5_core] [ 314.888899] tc_setup_cb_add+0x127/0x270 [ 314.888937] fl_hw_replace_filter+0x2ac/0x380 [cls_flower] [ 314.888976] ? fl_hw_destroy_filter+0x1b0/0x1b0 [cls_flower] [ 314.888990] ? fl_change+0xbcf/0x27ef [cls_flower] [ 314.889030] ? fl_change+0xa57/0x27ef [cls_flower] [ 314.889069] fl_change+0x16bd/0x27ef [cls_flower] [ 314.889135] ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower] [ 314.889167] ? __radix_tree_lookup+0xa4/0x130 [ 314.889200] ? fl_get+0x169/0x240 [cls_flower] [ 314.889218] ? fl_walk+0x230/0x230 [cls_flower] [ 314.889249] tc_new_tfilter+0x5e1/0xd40 [ 314.889281] ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower] [ 314.889309] ? tc_del_tfilter+0xa30/0xa30 [ 314.889335] ? __lock_acquire+0x5b5/0x2460 [ 314.889378] ? find_held_lock+0x85/0xa0 [ 314.889442] ? tc_del_tfilter+0xa30/0xa30 [ 314.889465] rtnetlink_rcv_msg+0x4ab/0x5f0 [ 314.889488] ? rtnl_dellink+0x490/0x490 [ 314.889518] ? lockdep_hardirqs_on+0x260/0x260 [ 314.889538] ? netlink_deliver_tap+0xab/0x5a0 [ 314.889550] ? match_held_lock+0x1b/0x240 [ 314.889575] netlink_rcv_skb+0xd0/0x200 [ 314.889588] ? rtnl_dellink+0x490/0x490 [ 314.889605] ? netlink_ack+0x440/0x440 [ 314.889635] ? netlink_deliver_tap+0x161/0x5a0 [ 314.889648] ? lock_downgrade+0x360/0x360 [ 314.889657] ? lock_acquire+0xe5/0x210 [ 314.889686] netlink_unicast+0x296/0x350 [ 314.889707] ? netlink_attachskb+0x390/0x390 [ 314.889726] ? _copy_from_iter_full+0xe0/0x3a0 [ 314.889738] ? __virt_addr_valid+0xbb/0x130 [ 314.889771] netlink_sendmsg+0x394/0x600 [ 314.889800] ? netlink_unicast+0x350/0x350 [ 314.889817] ? move_addr_to_kernel.part.0+0x90/0x90 [ 314.889852] ? netlink_unicast+0x350/0x350 [ 314.889872] sock_sendmsg+0x96/0xa0 [ 314.889891] ___sys_sendmsg+0x482/0x520 [ 314.889919] ? copy_msghdr_from_user+0x250/0x250 [ 314.889930] ? __fput+0x1fa/0x390 [ 314.889941] ? task_work_run+0xb7/0xf0 [ 314.889957] ? exit_to_usermode_loop+0x117/0x120 [ 314.889972] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 314.889982] ? do_syscall_64+0x74/0xe0 [ 314.889992] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 314.890012] ? mark_lock+0xac/0x9a0 [ 314.890028] ? __lock_acquire+0x5b5/0x2460 [ 314.890053] ? mark_lock+0xac/0x9a0 [ 314.890083] ? __lock_acquire+0x5b5/0x2460 [ 314.890112] ? match_held_lock+0x1b/0x240 [ 314.890144] ? __fget_light+0xa1/0xf0 [ 314.890166] ? sockfd_lookup_light+0x91/0xb0 [ 314.890187] __sys_sendmsg+0xba/0x130 [ 314.890201] ? __sys_sendmsg_sock+0xb0/0xb0 [ 314.890225] ? __blkcg_punt_bio_submit+0xd0/0xd0 [ 314.890264] ? lockdep_hardirqs_off+0xbe/0x100 [ 314.890274] ? mark_held_locks+0x24/0x90 [ 314.890286] ? do_syscall_64+0x1e/0xe0 [ 314.890308] do_syscall_64+0x74/0xe0 [ 314.890325] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 314.890336] RIP: 0033:0x7f00ca33d7b8 [ 314.890348] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 5 4 [ 314.890356] RSP: 002b:00007ffea2983928 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 314.890369] RAX: ffffffffffffffda RBX: 000000005d777d5b RCX: 00007f00ca33d7b8 [ 314.890377] RDX: 0000000000000000 RSI: 00007ffea2983990 RDI: 0000000000000003 [ 314.890384] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006 [ 314.890392] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001 [ 314.890400] R13: 000000000047f640 R14: 00007ffea2987b58 R15: 0000000000000021 [ 314.890529] Allocated by task 2687: [ 314.890684] save_stack+0x1b/0x80 [ 314.890694] __kasan_kmalloc.constprop.0+0xc2/0xd0 [ 314.890705] __kmalloc_track_caller+0x102/0x340 [ 314.890721] kmemdup+0x1d/0x40 [ 314.890730] tc_setup_flow_action+0x731/0x2c27 [ 314.890743] fl_hw_replace_filter+0x23b/0x380 [cls_flower] [ 314.890756] fl_change+0x16bd/0x27ef [cls_flower] [ 314.890765] tc_new_tfilter+0x5e1/0xd40 [ 314.890776] rtnetlink_rcv_msg+0x4ab/0x5f0 [ 314.890786] netlink_rcv_skb+0xd0/0x200 [ 314.890796] netlink_unicast+0x296/0x350 [ 314.890805] netlink_sendmsg+0x394/0x600 [ 314.890815] sock_sendmsg+0x96/0xa0 [ 314.890825] ___sys_sendmsg+0x482/0x520 [ 314.890834] __sys_sendmsg+0xba/0x130 [ 314.890844] do_syscall_64+0x74/0xe0 [ 314.890854] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 314.890937] Freed by task 2687: [ 314.891076] save_stack+0x1b/0x80 [ 314.891086] __kasan_slab_free+0x12c/0x170 [ 314.891095] kfree+0xeb/0x2f0 [ 314.891106] tc_cleanup_flow_action+0x69/0xa0 [ 314.891119] fl_hw_replace_filter+0x2c5/0x380 [cls_flower] [ 314.891132] fl_change+0x16bd/0x27ef [cls_flower] [ 314.891140] tc_new_tfilter+0x5e1/0xd40 [ 314.891151] rtnetlink_rcv_msg+0x4ab/0x5f0 [ 314.891161] netlink_rcv_skb+0xd0/0x200 [ 314.891170] netlink_unicast+0x296/0x350 [ 314.891180] netlink_sendmsg+0x394/0x600 [ 314.891190] sock_sendmsg+0x96/0xa0 [ 314.891200] ___sys_sendmsg+0x482/0x520 [ 314.891208] __sys_sendmsg+0xba/0x130 [ 314.891218] do_syscall_64+0x74/0xe0 [ 314.891228] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 314.891315] The buggy address belongs to the object at ffff88886c746280 which belongs to the cache kmalloc-96 of size 96 [ 314.891762] The buggy address is located 0 bytes inside of 96-byte region [ffff88886c746280, ffff88886c7462e0) [ 314.892196] The buggy address belongs to the page: [ 314.892387] page:ffffea0021b1d180 refcount:1 mapcount:0 mapping:ffff88835d00ef80 index:0x0 [ 314.892398] flags: 0x57ffffc0000200(slab) [ 314.892413] raw: 0057ffffc0000200 ffffea00219e0340 0000000800000008 ffff88835d00ef80 [ 314.892423] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 [ 314.892430] page dumped because: kasan: bad access detected [ 314.892515] Memory state around the buggy address: [ 314.892707] ffff88886c746180: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 314.892976] ffff88886c746200: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 314.893251] >ffff88886c746280: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 314.893522] ^ [ 314.893657] ffff88886c746300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 314.893924] ffff88886c746380: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc [ 314.894189] ================================================================== Fix the issue by duplicating tunnel info into per-encap copy that is deallocated with encap structure. Also, duplicate tunnel info in flow parse attribute to support cases when flow might be attached asynchronously. Fixes: 1f6da30697d0 ("net/mlx5e: Geneve, Keep tunnel info as pointer to the original struct") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29net/mlx5: Fix NULL pointer dereference in extended destinationEli Britstein
The cited commit refactored the encap id into a struct pointed from the destination. Bug fix for the case there is no encap for one of the destinations. Fixes: 2b688ea5efde ("net/mlx5: Add flow steering actions to fs_cmd shim layer") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29net/mlx5: Fix rtable reference leakParav Pandit
If the rt entry gateway family is not AF_INET for multipath device, rtable reference is leaked. Hence, fix it by releasing the reference. Fixes: 5fb091e8130b ("net/mlx5e: Use hint to resolve route when in HW multipath mode") Fixes: e32ee6c78efa ("net/mlx5e: Support tunnel encap over tagged Ethernet") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29net/mlx5e: Only skip encap flows update when encap init failedVlad Buslov
When encap entry initialization completes successfully e->compl_result is set to positive value and not zero, like mlx5e_rep_update_flows() assumes at the moment. Fix the conditional to only skip encap flows update when e->compl_result < 0. Fixes: 2a1f1768fa17 ("net/mlx5e: Refactor neigh update for concurrent execution") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29net/mlx5e: Replace kfree with kvfree when free vhca statsMaor Gottlieb
Memory allocated by kvzalloc should be freed by kvfree. Fixes: cef35af34d6d ("net/mlx5e: Add mlx5e HV VHCA stats agent") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29net/mlx5e: Remove incorrect match criteria assignment lineDmytro Linkin
Driver have function, which enable match criteria for misc parameters in dependence of eswitch capabilities. Fixes: 4f5d1beadc10 ("Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29net/mlx5e: Determine source port properly for vlan push actionDmytro Linkin
Termination tables are used for vlan push actions on uplink ports. To support RoCE dual port the source port value was placed in a register. Fix the code to use an API method returning the source port according to the FW capabilities. Fixes: 10caabdaad5a ("net/mlx5e: Use termination table for VLAN push actions") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-24net: remove unnecessary variables and callbackTaehee Yoo
This patch removes variables and callback these are related to the nested device structure. devices that can be nested have their own nest_level variable that represents the depth of nested devices. In the previous patch, new {lower/upper}_level variables are added and they replace old private nest_level variable. So, this patch removes all 'nest_level' variables. In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added to get lockdep subclass value, which is actually lower nested depth value. But now, they use the dynamic lockdep key to avoid lockdep warning instead of the subclass. So, this patch removes ->ndo_get_lock_subclass() callback. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21Merge tag 'mlx5-fixes-2019-10-18' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 kTLS fixes 18-10-2019 This series introduces kTLS related fixes to mlx5 driver from Tariq, and two misc memory leak fixes form Navid Emamdoost. Please pull and let me know if there is any problem. I would appreciate it if you queue up kTLS fixes from the list below to stable kernel v5.3 ! For -stable v4.13: nett/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: "I was battling a cold after some recent trips, so quite a bit piled up meanwhile, sorry about that. Highlights: 1) Fix fd leak in various bpf selftests, from Brian Vazquez. 2) Fix crash in xsk when device doesn't support some methods, from Magnus Karlsson. 3) Fix various leaks and use-after-free in rxrpc, from David Howells. 4) Fix several SKB leaks due to confusion of who owns an SKB and who should release it in the llc code. From Eric Biggers. 5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet. 6) Jumbo packets don't work after resume on r8169, as the BIOS resets the chip into non-jumbo mode during suspend. From Heiner Kallweit. 7) Corrupt L2 header during MPLS push, from Davide Caratti. 8) Prevent possible infinite loop in tc_ctl_action, from Eric Dumazet. 9) Get register bits right in bcmgenet driver, based upon chip version. From Florian Fainelli. 10) Fix mutex problems in microchip DSA driver, from Marek Vasut. 11) Cure race between route lookup and invalidation in ipv4, from Wei Wang. 12) Fix performance regression due to false sharing in 'net' structure, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits) net: reorder 'struct net' fields to avoid false sharing net: dsa: fix switch tree list net: ethernet: dwmac-sun8i: show message only when switching to promisc net: aquantia: add an error handling in aq_nic_set_multicast_list net: netem: correct the parent's backlog when corrupted packet was dropped net: netem: fix error path for corrupted GSO frames macb: propagate errors when getting optional clocks xen/netback: fix error path of xenvif_connect_data() net: hns3: fix mis-counting IRQ vector numbers issue net: usb: lan78xx: Connect PHY before registering MAC vsock/virtio: discard packets if credit is not respected vsock/virtio: send a credit update when buffer size is changed mlxsw: spectrum_trap: Push Ethernet header before reporting trap net: ensure correct skb->tstamp in various fragmenters net: bcmgenet: reset 40nm EPHY on energy detect net: bcmgenet: soft reset 40nm EPHYs before MAC init net: phy: bcm7xxx: define soft_reset for 40nm EPHY net: bcmgenet: don't set phydev->link from MAC net: Update address for MediaTek ethernet driver in MAINTAINERS ipv4: fix race condition between route lookup and invalidation ...
2019-10-18net/mlx5: fix memory leak in mlx5_fw_fatal_reporter_dumpNavid Emamdoost
In mlx5_fw_fatal_reporter_dump if mlx5_crdump_collect fails the allocated memory for cr_data must be released otherwise there will be memory leak. To fix this, this commit changes the return instruction into goto error handling. Fixes: 9b1f29823605 ("net/mlx5: Add support for FW fatal reporter dump") Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5: prevent memory leak in mlx5_fpga_conn_create_cqNavid Emamdoost
In mlx5_fpga_conn_create_cq if mlx5_vector2eqn fails the allocated memory should be released. Fixes: 537a50574175 ("net/mlx5: FPGA, Add high-speed connection routines") Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: TX, Fix consumer index of error cqe dumpTariq Toukan
The completion queue consumer index increments upon a call to mlx5_cqwq_pop(). When dumping an error CQE, the index is already incremented. Decrease one for the print command. Fixes: 16cc14d81733 ("net/mlx5e: Dump xmit error completions") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: kTLS, Enhance TX resync flowTariq Toukan
Once the kTLS TX resync function is called, it used to return a binary value, for success or failure. However, in case the TLS SKB is a retransmission of the connection handshake, it initiates the resync flow (as the tcp seq check holds), while regular packet handle is expected. In this patch, we identify this case and skip the resync operation accordingly. Counters: - Add a counter (tls_skip_no_sync_data) to monitor this. - Bump the dump counters up as they are used more frequently. - Add a missing counter descriptor declaration for tls_resync_bytes in sq_stats_desc. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: kTLS, Save a copy of the crypto infoTariq Toukan
Do not assume the crypto info is accessible during the connection lifetime. Save a copy of it in the private TX context. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: kTLS, Remove unneeded cipher type checksTariq Toukan
Cipher type is checked upon connection addition. No need to recheck it per every TX resync invocation. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: kTLS, Limit DUMP wqe sizeTariq Toukan
HW expects the data size in DUMP WQEs to be up to MTU. Make sure they are in range. We elevate the frag page refcount by 'n-1', in addition to the one obtained in tx_sync_info_get(), having an overall of 'n' references. We bulk increments by using a single page_ref_add() command, to optimize perfermance. The refcounts are released one by one, by the corresponding completions. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: kTLS, Fix missing SQ edge fillTariq Toukan
Before posting the context params WQEs, make sure there is enough contiguous room for them, and fill frag edge if needed. When posting only a nop, no need for room check, as it needs a single WQEBB, meaning no contiguity issue. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: kTLS, Fix page refcnt leak in TX resync error flowTariq Toukan
All references for frag pages that are obtained in tx_sync_info_get() should be released. Release usually occurs in the corresponding CQE of the WQE. In error flows, not all fragments have a WQE posted for them, hence no matching CQE will be generated. For these pages, release the reference in the error flow. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: kTLS, Save by-value copy of the record fragsTariq Toukan
Access the record fragments only under the TLS ctx lock. In the resync flow, save a copy of them to be used when preparing and posting the required DUMP WQEs. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: kTLS, Save only the frag page to release at completionTariq Toukan
In TX resync flow where DUMP WQEs are posted, keep a pointer to the fragment page to unref it upon completion, instead of saving the whole fragment. In addition, move it the end of the arguments list in tx_fill_wi(). Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: kTLS, Size of a Dump WQE is fixedTariq Toukan
No Eth segment, so no dynamic inline headers. The size of a Dump WQE is fixed, use constants and remove unnecessary checks. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: kTLS, Release reference on DUMPed fragments in shutdown flowTariq Toukan
A call to kTLS completion handler was missing in the TXQSQ release flow. Add it. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: Tx, Zero-memset WQE info struct upon updateTariq Toukan
Not all fields of WQE info are being written in the function, having some leftovers from previous rounds. Zero-memset it upon update. Particularly, not nullifying the wi->resync_dump_frag field will cause double free of the kTLS DUMPed frags. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: Tx, Fix assumption of single WQEBB of NOP in cleanup flowTariq Toukan
Cited patch removed the assumption only in datapath. Here we remove it also form control/cleanup flow. Fixes: 9ab0233728ca ("net/mlx5e: Tx, Don't implicitly assume SKB-less wqe has one WQEBB") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "The usual collection of driver bug fixes, and a few regressions from the merge window. Nothing particularly worrisome. - Various missed memory frees and error unwind bugs - Fix regressions in a few iwarp drivers from 5.4 patches - A few regressions added in past kernels - Squash a number of races in mlx5 ODP code" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/mlx5: Add missing synchronize_srcu() for MW cases RDMA/mlx5: Put live in the correct place for ODP MRs RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages() RDMA/mlx5: Fix a race with mlx5_ib_update_xlt on an implicit MR RDMA/mlx5: Do not allow rereg of a ODP MR IB/core: Fix wrong iterating on ports RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path RDMA/cxgb4: Do not dma memory off of the stack RDMA/cm: Fix memory leak in cm_add/remove_one RDMA/core: Fix an error handling path in 'res_get_common_doit()' RDMA/i40iw: Associate ibdev to netdev before IB device registration RDMA/iwcm: Fix a lock inversion issue RDMA/iw_cxgb4: fix SRQ access from dump_qp() RDMA/hfi1: Prevent memory leak in sdma_init RDMA/core: Fix use after free and refcnt leak on ndev in_device in iwarp_query_port RDMA/siw: Fix serialization issue in write_space() RDMA/vmw_pvrdma: Free SRQ only once
2019-10-08net/mlx5: DR, Allow insertion of duplicate rulesAlex Vesker
Duplicate rules were not allowed to be configured with SW steering. This restriction caused failures with the replace rule logic done by upper layers. This fix allows for multiple rules with the same match values, in such case the first inserted rules will match. Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-04RDMA/mlx5: Add missing synchronize_srcu() for MW casesJason Gunthorpe
While MR uses live as the SRCU 'update', the MW case uses the xarray directly, xa_erase() causes the MW to become inaccessible to the pagefault thread. Thus whenever a MW is removed from the xarray we must synchronize_srcu() before freeing it. This must be done before freeing the mkey as re-use of the mkey while the pagefault thread is using the stale mkey is undesirable. Add the missing synchronizes to MW and DEVX indirect mkey and delete the bogus protection against double destroy in mlx5_core_destroy_mkey() Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>