summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx5
AgeCommit message (Collapse)Author
2020-01-06net/mlx5: DR, Init lists that are used in rule's memberErez Shitrit
Whenever adding new member of rule object we attach it to 2 lists, These 2 lists should be initialized first. Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06net/mlx5e: Fix hairpin RSS table sizeEli Cohen
Set hairpin table size to the corret size, based on the groups that would be created in it. Groups are laid out on the table such that a group occupies a range of entries in the table. This implies that the group ranges should have correspondence to the table they are laid upon. The patch cited below made group 1's size to grow hence causing overflow of group range laid on the table. Fixes: a795d8db2a6d ("net/mlx5e: Support RSS for IP-in-IP and IPv6 tunneled packets") Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06net/mlx5: DR, No need for atomic refcount for internal SW steering resourcesYevgeny Kliteynik
No need for an atomic refcounter for the STE and hashtables. These are internal SW steering resources and they are always under domain mutex. This also fixes the following refcount error: refcount_t: addition on 0; use-after-free. WARNING: CPU: 9 PID: 3527 at lib/refcount.c:25 refcount_warn_saturate+0x81/0xe0 Call Trace: dr_table_init_nic+0x10d/0x110 [mlx5_core] mlx5dr_table_create+0xb4/0x230 [mlx5_core] mlx5_cmd_dr_create_flow_table+0x39/0x120 [mlx5_core] __mlx5_create_flow_table+0x221/0x5f0 [mlx5_core] esw_create_offloads_fdb_tables+0x180/0x5a0 [mlx5_core] ... Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities") Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06Revert "net/mlx5: Support lockless FTE read lookups"Parav Pandit
This reverts commit 7dee607ed0e04500459db53001d8e02f8831f084. During cleanup path, FTE's parent node group is removed which is referenced by the FTE while freeing the FTE. Hence FTE's lockless read lookup optimization done in cited commit is not possible at the moment. Hence, revert the commit. This avoid below KAZAN call trace. [ 110.390896] BUG: KASAN: use-after-free in find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391048] Read of size 4 at addr ffff888c19e6d220 by task swapper/12/0 [ 110.391219] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 5.5.0-rc1+ [ 110.391222] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014 [ 110.391225] Call Trace: [ 110.391229] <IRQ> [ 110.391246] dump_stack+0x95/0xd5 [ 110.391307] ? find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391320] print_address_description.constprop.5+0x20/0x320 [ 110.391379] ? find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391435] ? find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391441] __kasan_report+0x149/0x18c [ 110.391499] ? find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391504] kasan_report+0x12/0x20 [ 110.391511] __asan_report_load4_noabort+0x14/0x20 [ 110.391567] find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391625] del_sw_fte_rcu+0x4a/0x100 [mlx5_core] [ 110.391633] rcu_core+0x404/0x1950 [ 110.391640] ? rcu_accelerate_cbs_unlocked+0x100/0x100 [ 110.391649] ? run_rebalance_domains+0x201/0x280 [ 110.391654] rcu_core_si+0xe/0x10 [ 110.391661] __do_softirq+0x181/0x66c [ 110.391670] irq_exit+0x12c/0x150 [ 110.391675] smp_apic_timer_interrupt+0xf0/0x370 [ 110.391681] apic_timer_interrupt+0xf/0x20 [ 110.391684] </IRQ> [ 110.391695] RIP: 0010:cpuidle_enter_state+0xfa/0xba0 [ 110.391703] Code: 3d c3 9b b5 50 e8 56 75 6e fe 48 89 45 c8 0f 1f 44 00 00 31 ff e8 a6 94 6e fe 45 84 ff 0f 85 f6 02 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 db 06 00 00 4d 63 fe 4b 8d 04 7f 49 8d 04 87 49 8d [ 110.391706] RSP: 0018:ffff888c23a6fce8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 [ 110.391712] RAX: dffffc0000000000 RBX: ffffe8ffff7002f8 RCX: 000000000000001f [ 110.391715] RDX: 1ffff11184ee6cb5 RSI: 0000000040277d83 RDI: ffff888c277365a8 [ 110.391718] RBP: ffff888c23a6fd40 R08: 0000000000000002 R09: 0000000000035280 [ 110.391721] R10: ffff888c23a6fc80 R11: ffffed11847485d0 R12: ffffffffb1017740 [ 110.391723] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000000 [ 110.391732] ? cpuidle_enter_state+0xea/0xba0 [ 110.391738] cpuidle_enter+0x4f/0xa0 [ 110.391747] call_cpuidle+0x6d/0xc0 [ 110.391752] do_idle+0x360/0x430 [ 110.391758] ? arch_cpu_idle_exit+0x40/0x40 [ 110.391765] ? complete+0x67/0x80 [ 110.391771] cpu_startup_entry+0x1d/0x20 [ 110.391779] start_secondary+0x2f3/0x3c0 [ 110.391784] ? set_cpu_sibling_map+0x2500/0x2500 [ 110.391795] secondary_startup_64+0xa4/0xb0 [ 110.391841] Allocated by task 290: [ 110.391917] save_stack+0x21/0x90 [ 110.391921] __kasan_kmalloc.constprop.8+0xa7/0xd0 [ 110.391925] kasan_kmalloc+0x9/0x10 [ 110.391929] kmem_cache_alloc_trace+0xf6/0x270 [ 110.391987] create_root_ns.isra.36+0x58/0x260 [mlx5_core] [ 110.392044] mlx5_init_fs+0x5fd/0x1ee0 [mlx5_core] [ 110.392092] mlx5_load_one+0xc7a/0x3860 [mlx5_core] [ 110.392139] init_one+0x6ff/0xf90 [mlx5_core] [ 110.392145] local_pci_probe+0xde/0x190 [ 110.392150] work_for_cpu_fn+0x56/0xa0 [ 110.392153] process_one_work+0x678/0x1140 [ 110.392157] worker_thread+0x573/0xba0 [ 110.392162] kthread+0x341/0x400 [ 110.392166] ret_from_fork+0x1f/0x40 [ 110.392218] Freed by task 2742: [ 110.392288] save_stack+0x21/0x90 [ 110.392292] __kasan_slab_free+0x137/0x190 [ 110.392296] kasan_slab_free+0xe/0x10 [ 110.392299] kfree+0x94/0x250 [ 110.392357] tree_put_node+0x257/0x360 [mlx5_core] [ 110.392413] tree_remove_node+0x63/0xb0 [mlx5_core] [ 110.392469] clean_tree+0x199/0x240 [mlx5_core] [ 110.392525] mlx5_cleanup_fs+0x76/0x580 [mlx5_core] [ 110.392572] mlx5_unload+0x22/0xc0 [mlx5_core] [ 110.392619] mlx5_unload_one+0x99/0x260 [mlx5_core] [ 110.392666] remove_one+0x61/0x160 [mlx5_core] [ 110.392671] pci_device_remove+0x10b/0x2c0 [ 110.392677] device_release_driver_internal+0x1e4/0x490 [ 110.392681] device_driver_detach+0x36/0x40 [ 110.392685] unbind_store+0x147/0x200 [ 110.392688] drv_attr_store+0x6f/0xb0 [ 110.392693] sysfs_kf_write+0x127/0x1d0 [ 110.392697] kernfs_fop_write+0x296/0x420 [ 110.392702] __vfs_write+0x66/0x110 [ 110.392707] vfs_write+0x1a0/0x500 [ 110.392711] ksys_write+0x164/0x250 [ 110.392715] __x64_sys_write+0x73/0xb0 [ 110.392720] do_syscall_64+0x9f/0x3a0 [ 110.392725] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 7dee607ed0e0 ("net/mlx5: Support lockless FTE read lookups") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06net/mlx5: Move devlink registration before interfaces loadMichael Guralnik
Register devlink before interfaces are added. This will allow interfaces to use devlink while initalizing. For example, call mlx5_is_roce_enabled. Fixes: aba25279c100 ("net/mlx5e: Add TX reporter support") Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06net/mlx5e: Always print health reporter message to dmesgEran Ben Elisha
In case a reporter exists, error message is logged only to the devlink tracer. The devlink tracer is a visibility utility only, which user can choose not to monitor. After cited patch, 3rd party monitoring tools that tracks these error message will no longer find them in dmesg, causing a regression. With this patch, error messages are also logged into the dmesg. Fixes: c50de4af1d63 ("net/mlx5e: Generalize tx reporter's functionality") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06net/mlx5e: Avoid duplicating rule destinationsDmytro Linkin
Following scenario easily break driver logic and crash the kernel: 1. Add rule with mirred actions to same device. 2. Delete this rule. In described scenario rule is not added to database and on deletion driver access invalid entry. Example: $ tc filter add dev ens1f0_0 ingress protocol ip prio 1 \ flower skip_sw \ action mirred egress mirror dev ens1f0_1 pipe \ action mirred egress redirect dev ens1f0_1 $ tc filter del dev ens1f0_0 ingress protocol ip prio 1 Dmesg output: [ 376.634396] mlx5_core 0000:82:00.0: mlx5_cmd_check:756:(pid 3439): DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x563e2f) [ 376.654983] mlx5_core 0000:82:00.0: del_hw_flow_group:567:(pid 3439): flow steering can't destroy fg 89 of ft 3145728 [ 376.673433] kasan: CONFIG_KASAN_INLINE enabled [ 376.683769] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 376.695229] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI [ 376.705069] CPU: 7 PID: 3439 Comm: tc Not tainted 5.4.0-rc5+ #76 [ 376.714959] Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0a 08/12/2016 [ 376.726371] RIP: 0010:mlx5_del_flow_rules+0x105/0x960 [mlx5_core] [ 376.735817] Code: 01 00 00 00 48 83 eb 08 e8 28 d9 ff ff 4c 39 e3 75 d8 4c 8d bd c0 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 84 04 00 00 48 8d 7d 28 8b 9 d [ 376.761261] RSP: 0018:ffff888847c56db8 EFLAGS: 00010202 [ 376.770054] RAX: dffffc0000000000 RBX: ffff8888582a6da0 RCX: ffff888847c56d60 [ 376.780743] RDX: 0000000000000058 RSI: 0000000000000008 RDI: 0000000000000282 [ 376.791328] RBP: 0000000000000000 R08: fffffbfff0c60ea6 R09: fffffbfff0c60ea6 [ 376.802050] R10: fffffbfff0c60ea5 R11: ffffffff8630752f R12: ffff8888582a6da0 [ 376.812798] R13: dffffc0000000000 R14: ffff8888582a6da0 R15: 00000000000002c0 [ 376.823445] FS: 00007f675f9a8840(0000) GS:ffff88886d200000(0000) knlGS:0000000000000000 [ 376.834971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 376.844179] CR2: 00000000007d9640 CR3: 00000007d3f26003 CR4: 00000000001606e0 [ 376.854843] Call Trace: [ 376.868542] __mlx5_eswitch_del_rule+0x49/0x300 [mlx5_core] [ 376.877735] mlx5e_tc_del_fdb_flow+0x6ec/0x9e0 [mlx5_core] [ 376.921549] mlx5e_flow_put+0x2b/0x50 [mlx5_core] [ 376.929813] mlx5e_delete_flower+0x5b6/0xbd0 [mlx5_core] [ 376.973030] tc_setup_cb_reoffload+0x29/0xc0 [ 376.980619] fl_reoffload+0x50a/0x770 [cls_flower] [ 377.015087] tcf_block_playback_offloads+0xbd/0x250 [ 377.033400] tcf_block_setup+0x1b2/0xc60 [ 377.057247] tcf_block_offload_cmd+0x195/0x240 [ 377.098826] tcf_block_offload_unbind+0xe7/0x180 [ 377.107056] __tcf_block_put+0xe5/0x400 [ 377.114528] ingress_destroy+0x3d/0x60 [sch_ingress] [ 377.122894] qdisc_destroy+0xf1/0x5a0 [ 377.129993] qdisc_graft+0xa3d/0xe50 [ 377.151227] tc_get_qdisc+0x48e/0xa20 [ 377.165167] rtnetlink_rcv_msg+0x35d/0x8d0 [ 377.199528] netlink_rcv_skb+0x11e/0x340 [ 377.219638] netlink_unicast+0x408/0x5b0 [ 377.239913] netlink_sendmsg+0x71b/0xb30 [ 377.267505] sock_sendmsg+0xb1/0xf0 [ 377.273801] ___sys_sendmsg+0x635/0x900 [ 377.312784] __sys_sendmsg+0xd3/0x170 [ 377.338693] do_syscall_64+0x95/0x460 [ 377.344833] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 377.352321] RIP: 0033:0x7f675e58e090 To avoid this, for every mirred action check if output device was already processed. If so - drop rule with EOPNOTSUPP error. Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso, including adding a missing ipv6 match description. 2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi Bhat. 3) Fix uninit value in bond_neigh_init(), from Eric Dumazet. 4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold. 5) Fix use after free in tipc_disc_rcv(), from Tuong Lien. 6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul Chaignon. 7) Multicast MAC limit test is off by one in qede, from Manish Chopra. 8) Fix established socket lookup race when socket goes from TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening RCU grace period. From Eric Dumazet. 9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet. 10) Fix active backup transition after link failure in bonding, from Mahesh Bandewar. 11) Avoid zero sized hash table in gtp driver, from Taehee Yoo. 12) Fix wrong interface passed to ->mac_link_up(), from Russell King. 13) Fix DSA egress flooding settings in b53, from Florian Fainelli. 14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost. 15) Fix double free in dpaa2-ptp code, from Ioana Ciornei. 16) Reject invalid MTU values in stmmac, from Jose Abreu. 17) Fix refcount leak in error path of u32 classifier, from Davide Caratti. 18) Fix regression causing iwlwifi firmware crashes on boot, from Anders Kaseorg. 19) Fix inverted return value logic in llc2 code, from Chan Shu Tak. 20) Disable hardware GRO when XDP is attached to qede, frm Manish Chopra. 21) Since we encode state in the low pointer bits, dst metrics must be at least 4 byte aligned, which is not necessarily true on m68k. Add annotations to fix this, from Geert Uytterhoeven. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits) sfc: Include XDP packet headroom in buffer step size. sfc: fix channel allocation with brute force net: dst: Force 4-byte alignment of dst_metrics selftests: pmtu: fix init mtu value in description hv_netvsc: Fix unwanted rx_table reset net: phy: ensure that phy IDs are correctly typed mod_devicetable: fix PHY module format qede: Disable hardware gro when xdp prog is installed net: ena: fix issues in setting interrupt moderation params in ethtool net: ena: fix default tx interrupt moderation interval net/smc: unregister ib devices in reboot_event net: stmmac: platform: Fix MDIO init for platforms without PHY llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c) net: hisilicon: Fix a BUG trigered by wrong bytes_compl net: dsa: ksz: use common define for tag len s390/qeth: don't return -ENOTSUPP to userspace s390/qeth: fix promiscuous mode after reset s390/qeth: handle error due to unsupported transport mode cxgb4: fix refcount init for TC-MQPRIO offload tc-testing: initial tdc selftests for cls_u32 ...
2019-12-19net/mlx5e: Fix concurrency issues between config flow and XSKMaxim Mikityanskiy
After disabling resources necessary for XSK (the XDP program, channels, XSK queues), use synchronize_rcu to wait until the XSK wakeup function finishes, before freeing the resources. Suspend XSK wakeups during switching channels. If the XDP program is being removed, synchronize_rcu before closing the old channels to allow XSK wakeup to complete. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-3-maximmi@mellanox.com
2019-12-09treewide: Use sizeof_field() macroPankaj Bharadiya
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) More jumbo frame fixes in r8169, from Heiner Kallweit. 2) Fix bpf build in minimal configuration, from Alexei Starovoitov. 3) Use after free in slcan driver, from Jouni Hogander. 4) Flower classifier port ranges don't work properly in the HW offload case, from Yoshiki Komachi. 5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin. 6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk. 7) Fix flow dissection in dsa TX path, from Alexander Lobakin. 8) Stale syncookie timestampe fixes from Guillaume Nault. [ Did an evil merge to silence a warning introduced by this pull - Linus ] * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits) r8169: fix rtl_hw_jumbo_disable for RTL8168evl net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() r8169: add missing RX enabling for WoL on RTL8125 vhost/vsock: accept only packets with the right dst_cid net: phy: dp83867: fix hfs boot in rgmii mode net: ethernet: ti: cpsw: fix extra rx interrupt inet: protect against too small mtu values. gre: refetch erspan header from skb->data after pskb_may_pull() pppoe: remove redundant BUG_ON() check in pppoe_pernet tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() tcp: tighten acceptance of ACKs not matching a child socket tcp: fix rejected syncookies due to stale timestamps lpc_eth: kernel BUG on remove tcp: md5: fix potential overestimation of TCP option space net: sched: allow indirect blocks to bind to clsact in TC net: core: rename indirect block ingress cb function net-sysfs: Call dev_hold always in netdev_queue_add_kobject net: dsa: fix flow dissection on Tx path net/tls: Fix return values to avoid ENOTSUPP net: avoid an indirect call in ____sys_recvmsg() ...
2019-12-05net/mlx5e: E-switch, Fix Ingress ACL groups in switchdev mode for prio tagParav Pandit
In cited commit, when prio tag mode is enabled, FTE creation fails due to missing group with valid match criteria. Hence, (a) create prio tag group metadata_prio_tag_grp when prio tag is enabled with match criteria for vlan push FTE. (b) Rename metadata_grp to metadata_allmatch_grp to reflect its purpose. Also when priority tag is enabled, delete metadata settings after deleting ingress rules, which are using it. Tide up rest of the ingress config code for unnecessary labels. Fixes: 10652f39943e ("net/mlx5: Refactor ingress acl configuration") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: ethtool, Fix analysis of speed settingAya Levin
When setting speed to 100G via ethtool (AN is set to off), only 25G*4 is configured while the user, who has an advanced HW which supports extended PTYS, expects also 50G*2 to be configured. With this patch, when extended PTYS mode is available, configure PTYS via extended fields. Fixes: 4b95840a6ced ("net/mlx5e: Fix matching of speed to PRM link modes") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix translation of link mode into speedAya Levin
Add a missing value in translation of PTYS ext_eth_proto_oper to its corresponding speed. When ext_eth_proto_oper bit 10 is set, ethtool shows unknown speed. With this fix, ethtool shows speed is 100G as expected. Fixes: a08b4ed1373d ("net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix free peer_flow when refcount is 0Roi Dayan
It could be neigh update flow took a refcount on peer flow so sometimes we cannot release peer flow even if parent flow is being freed now. Fixes: 5a7e5bcb663d ("net/mlx5e: Extend tc flow struct with reference counter") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix freeing flow with kfree() and not kvfree()Roi Dayan
Flows are allocated with kzalloc() so free with kfree(). Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix SFF 8472 eeprom lengthEran Ben Elisha
SFF 8472 eeprom length is 512 bytes. Fix module info return value to support 512 bytes read. Fixes: ace329f4ab3b ("net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Query global pause state before setting prio2bufferHuy Nguyen
When the user changes prio2buffer mapping while global pause is enabled, mlx5 driver incorrectly sets all active buffers (buffer that has at least one priority mapped) to lossy. Solution: If global pause is enabled, set all the active buffers to lossless in prio2buffer command. Also, add error message when buffer size is not enough to meet xoff threshold. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix TXQ indices to be sequentialEran Ben Elisha
Cited patch changed (channel index, tc) => (TXQ index) mapping to be a static one, in order to keep indices consistent when changing number of channels or TCs. For 32 channels (OOB) and 8 TCs, real num of TXQs is 256. When reducing the amount of channels to 8, the real num of TXQs will be changed to 64. This indices method is buggy: - Channel #0, TC 3, the TXQ index is 96. - Index 8 is not valid, as there is no such TXQ from driver perspective (As it represents channel #8, TC 0, which is not valid with the above configuration). As part of driver's select queue, it calls netdev_pick_tx which returns an index in the range of real number of TXQs. Depends on the return value, with the examples above, driver could have returned index larger than the real number of tx queues, or crash the kernel as it tries to read invalid address of SQ which was not allocated. Fix that by allocating sequential TXQ indices, and hold a new mapping between (channel index, tc) => (real TXQ index). This mapping will be updated as part of priv channels activation, and is used in mlx5e_select_queue to find the selected queue index. The existing indices mapping (channel_tc2txq) is no longer needed, as it is used only for statistics structures and can be calculated on run time. Delete its definintion and updates. Fixes: 8bfaf07f7806 ("net/mlx5e: Present SW stats when state is not opened") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-04net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookupSabrina Dubroca
ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-30net/mlx5e: Fix build error without IPV6YueHaibing
If IPV6 is not set and CONFIG_MLX5_ESWITCH is y, building fails: drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c:322:5: error: redefinition of mlx5e_tc_tun_create_header_ipv6 int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c:7:0: drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h:67:1: note: previous definition of mlx5e_tc_tun_create_header_ipv6 was here mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use #ifdef to guard this, also move mlx5e_route_lookup_ipv6 to cleanup unused warning. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: e689e998e102 ("net/mlx5e: TC, Stub out ipv6 tun create header function") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Again another fairly quiet cycle with few notable core code changes and the usual variety of driver bug fixes and small improvements. - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr, iw_cxgb4, vmw_pvrdma, mlx5 - Improvements in SRPT from working with iWarp - SRIOV VF support for bnxt_re - Skeleton kernel-doc files for drivers/infiniband - User visible counters for events related to ODP - Common code for tracking of mmap lifetimes so that drivers can link HW object liftime to a VMA - ODP bug fixes and rework - RDMA READ support for efa - Removal of the very old cxgb3 driver" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (168 commits) RDMA/hns: Delete unnecessary callback functions for cq RDMA/hns: Rename the functions used inside creating cq RDMA/hns: Redefine the member of hns_roce_cq struct RDMA/hns: Redefine interfaces used in creating cq RDMA/efa: Expose RDMA read related attributes RDMA/efa: Support remote read access in MR registration RDMA/efa: Store network attributes in device attributes IB/hfi1: remove redundant assignment to variable ret RDMA/bnxt_re: Fix missing le16_to_cpu RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series RDMA/bnxt_re: Fix Kconfig indentation IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset RDMA/cm: Use refcount_t type for refcount variable IB/mlx5: Support extended number of strides for Striding RQ IB/mlx4: Update HW GID table while adding vlan GID ...
2019-11-25Merge branch 'ib-guids' into rdma.git for-nextJason Gunthorpe
Danit Goldberg says: ==================== This series extends RTNETLINK to provide IB port and node GUIDs, which were configured for Infiniband VFs. The functionality to set VF GUIDs already existed for a long time, and here we are adding the missing "get" so that netlink will be symmetric and various cloud orchestration tools will be able to manage such VFs more naturally. The iproute2 was extended too to present those GUIDs. - ip link show <device> For example: - ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33 - ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10 - ip link show ib4 ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'ib-guids': (35 commits) IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs net/mlx5: Add new chain for netfilter flow table offload net/mlx5: Refactor creating fast path prio chains net/mlx5: Accumulate levels for chains prio namespaces net/mlx5: Define fdb tc levels per prio net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines net/mlx5: Simplify fdb chain and prio eswitch defines IB/mlx5: Load profile according to RoCE enablement state IB/mlx5: Rename profile and init methods net/mlx5: Handle "enable_roce" devlink param net/mlx5: Document flow_steering_mode devlink param devlink: Add new "enable_roce" generic device param net/mlx5: fix spelling mistake "metdata" -> "metadata" net/mlx5: fix kvfree of uninitialized pointer spec IB/mlx5: Introduce and use mlx5_core_is_vf() net/mlx5: E-switch, Enable metadata on own vport net/mlx5: Refactor ingress acl configuration ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23Merge tag 'mlx5-updates-2019-11-22' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-11-22 1) Misc Cleanups 2) Software steering support for Geneve ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-23net: use rhashtable_lookup() instead of rhashtable_lookup_fast()Taehee Yoo
rhashtable_lookup_fast() internally calls rcu_read_lock() then, calls rhashtable_lookup(). So if rcu_read_lock() is already held, rhashtable_lookup() is enough. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Minor conflict in drivers/s390/net/qeth_l2_main.c, kept the lock from commit c8183f548902 ("s390/qeth: fix potential deadlock on workqueue flush"), removed the code which was removed by commit 9897d583b015 ("s390/qeth: consolidate some duplicated HW cmd code"). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-22net/mlx5e: Remove redundant pointer checkEli Cohen
When code reaches the "out" label, n is guaranteed to be valid so we can unconditionally call neigh_release. Also change the label to release_neigh to better reflect the fact that we unconditionally free the neighbour and also match other labels convention. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-22net/mlx5e: TC, Stub out ipv6 tun create header functionSaeed Mahameed
Improve mlx5e_route_lookup_ipv6 function structure by avoiding #ifdef then return -EOPNOTSUPP in the middle of the function code. To do so, we stub out mlx5e_tc_tun_create_header_ipv6 which is the only caller of this helper function to avoid calling it altogether when ipv6 is compiled out, which should also cleanup some compiler warnings of unused variables. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-22net/mlx5: DR, Add support for Geneve packets SW steeringYevgeny Kliteynik
Add support for SW steering matching on Geneve header fields: - VNI - OAM - protocol type - options length Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-22net/mlx5: DR, Add HW bits and definitions for Geneve flex parserYevgeny Kliteynik
Add definition for flex parser tunneling header for Geneve. Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-22net/mlx5: DR, Refactor VXLAN GPE flex parser tunnel code for SW steeringYevgeny Kliteynik
Refactor flex parser tunnel code: - Add definition for flex parser tunneling header for VXLAN-GPE - Use macros for VXLAN-GPE SW steering when building STE - Refactor the code to reflect that this is a VXLAN GPE only code and not a general flex parser code. This also significantly simplifies addition of more flex parser protocols, such as Geneve. Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-22net/mlx5: Don't write read-only fields in MODIFY_HCA_VPORT_CONTEXT commandLeon Romanovsky
The MODIFY_HCA_VPORT_CONTEXT uses field_selector to mask fields needed to be written, other fields are required to be zero according to the HW specification. The supported fields are controlled by bitfield and limited to vport state, node and port GUIDs. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-11-20 The following pull-request contains BPF updates for your *net-next* tree. We've added 81 non-merge commits during the last 17 day(s) which contain a total of 120 files changed, 4958 insertions(+), 1081 deletions(-). There are 3 trivial conflicts, resolve it by always taking the chunk from 196e8ca74886c433: <<<<<<< HEAD ======= void *bpf_map_area_mmapable_alloc(u64 size, int numa_node); >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 <<<<<<< HEAD void *bpf_map_area_alloc(u64 size, int numa_node) ======= static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable) >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 <<<<<<< HEAD if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { ======= /* kmalloc()'ed memory can't be mmap()'ed */ if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 The main changes are: 1) Addition of BPF trampoline which works as a bridge between kernel functions, BPF programs and other BPF programs along with two new use cases: i) fentry/fexit BPF programs for tracing with practically zero overhead to call into BPF (as opposed to k[ret]probes) and ii) attachment of the former to networking related programs to see input/output of networking programs (covering xdpdump use case), from Alexei Starovoitov. 2) BPF array map mmap support and use in libbpf for global data maps; also a big batch of libbpf improvements, among others, support for reading bitfields in a relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko. 3) Extend s390x JIT with usage of relative long jumps and loads in order to lift the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich. 4) Add BPF audit support and emit messages upon successful prog load and unload in order to have a timeline of events, from Daniel Borkmann and Jiri Olsa. 5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode (XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson. 6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API call named bpf_get_link_xdp_info() for retrieving the full set of prog IDs attached to XDP, from Toke Høiland-Jørgensen. 7) Add BTF support for array of int, array of struct and multidimensional arrays and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau. 8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo. 9) Two fixes for BPF selftest to get rid of a hang in test_tc_tunnel and to avoid xdping to be run as standalone, from Jiri Benc. 10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song. 11) Fix a memory leak in BPF fentry test run data, from Colin Ian King. 12) Various smaller misc cleanups and improvements mostly all over BPF selftests and samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20net/mlx5: Update the list of the PCI supported devicesShani Shapp
Add the upcoming ConnectX-6 LX device ID. Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Shani Shapp <shanish@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5: Fix auto group size calculationMaor Gottlieb
Once all the large flow groups (defined by the user when the flow table is created - max_num_groups) were created, then all the following new flow groups will have only one flow table entry, even though the flow table has place to larger groups. Fix the condition to prefer large flow group. Fixes: f0d22d187473 ("net/mlx5_core: Introduce flow steering autogrouped flow table") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Add missing capability bit check for IP-in-IPMarina Varshaver
Device that doesn't support IP-in-IP offloads has to filter csum and gso offload support, otherwise kernel will conclude that device is capable of offloading csum and gso for IP-in-IP tunnels and that might result in IP-in-IP tunnel not functioning. Fixes: 25948b87dda2 ("net/mlx5e: Support TSO and TX checksum offloads for IP-in-IP") Signed-off-by: Marina Varshaver <marinav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Do not use non-EXT link modes in EXT modeEran Ben Elisha
On some old Firmwares, connector type value was not supported, and value read from FW was 0. For those, driver used link mode in order to set connector type in link_ksetting. After FW exposed the connector type, driver translated the value to ethtool definitions. However, as 0 is a valid value, before returning PORT_OTHER, driver run the check of link mode in order to maintain backward compatibility. Cited patch added support to EXT mode. With both features (connector type and EXT link modes) ,if connector_type read from FW is 0 and EXT mode is set, driver mistakenly compare EXT link modes to non-EXT link mode. Fixed that by skipping this comparison if we are in EXT mode, as connector type value is valid in this scenario. Fixes: 6a897372417e ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Fix set vf link state error flowRoi Dayan
Before this commit the ndo always returned success. Fix that. Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5: DR, Limit STE hash table enlarge based on bytemaskAlex Vesker
When an ste hash table has too many collision we enlarge it to a bigger hash table (rehash). Rehashing collision improvement depends on the bytemask value. The more 1 bits we have in bytemask means better spreading in the table. Without this fix tables can grow in size without providing any improvement which can lead to memory depletion and failures. This patch will limit table rehash to reduce memory and improve the performance. Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality") Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5: DR, Skip rehash for tables with byte mask zeroAlex Vesker
The byte mask fields affect on the hash index distribution, when the byte mask is zero, the hash calculation will always be equal to the same index. To avoid unneeded rehash of hash tables mark the table to skip rehash. This is needed by the next patch which will limit table rehash to reduce memory consumption. Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality") Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5: DR, Fix invalid EQ vector number on CQ creationAlex Vesker
When creating a CQ, the CPU id is used for the vector value. This would fail in-case the CPU id was higher than the maximum vector value. Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Reorder mirrer action parsing to check for encap firstVlad Buslov
Mirred action parsing code in parse_tc_fdb_actions() first checks if out_dev has same parent id, and only verifies that there is a pending encap action that was parsed before. Recent change in vxlan module made function netdev_port_same_parent_id() to return true when called for mlx5 eswitch representor and vxlan device created explicitly on mlx5 representor device (vxlan devices created with "external" flag without explicitly specifying parent interface are not affected). With call to netdev_port_same_parent_id() returning true, incorrect code path is chosen and encap rules fail to offload because vxlan dev is not a valid eswitch forwarding dev. Dmesg log of error: [ 1784.389797] devices ens1f0_0 vxlan1 not on same switch HW, can't offload forwarding In order to fix the issue, rearrange conditional in parse_tc_fdb_actions() to check for pending encap action before checking if out_dev has the same parent id. Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Fix ingress rate configuration for representorsEli Cohen
Current code uses the old method of prio encoding in flow_cls_common_offload. Fix to follow the changes introduced in commit ef01adae0e43 ("net: sched: use major priority number as hardware priority"). Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Fix error flow cleanup in mlx5e_tc_tun_create_header_ipv4/6Eli Cohen
Be sure to release the neighbour in case of failures after successful route lookup. Fixes: 101f4de9dd52 ("net/mlx5e: Move TC tunnel offloading code to separate source file") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20net/mlx5e: Rx, Update page pool numa node when changedSaeed Mahameed
Once every napi poll cycle, check if numa node is different than the page pool's numa id, and update it using page_pool_update_nid(). Alternatively, we could have registered an irq affinity change handler, but page_pool_update_nid() must be called from napi context anyways, so the handler won't actually help. Performance testing: XDP drop/tx rate and TCP single/multi stream, on mlx5 driver while migrating rx ring irq from close to far numa: mlx5 internal page cache was locally disabled to get pure page pool results. CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G) XDP Drop/TX single core: NUMA | XDP | Before | After --------------------------------------- Close | Drop | 11 Mpps | 10.9 Mpps Far | Drop | 4.4 Mpps | 5.8 Mpps Close | TX | 6.5 Mpps | 6.5 Mpps Far | TX | 3.5 Mpps | 4 Mpps Improvement is about 30% drop packet rate, 15% tx packet rate for numa far test. No degradation for numa close tests. TCP single/multi cpu/stream: NUMA | #cpu | Before | After -------------------------------------- Close | 1 | 18 Gbps | 18 Gbps Far | 1 | 15 Gbps | 18 Gbps Close | 12 | 80 Gbps | 80 Gbps Far | 12 | 68 Gbps | 80 Gbps In all test cases we see improvement for the far numa case, and no impact on the close numa case. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-18bpf: Convert bpf_prog refcnt to atomic64_tAndrii Nakryiko
Similarly to bpf_map's refcnt/usercnt, convert bpf_prog's refcnt to atomic64 and remove artificial 32k limit. This allows to make bpf_prog's refcounting non-failing, simplifying logic of users of bpf_prog_add/bpf_prog_inc. Validated compilation by running allyesconfig kernel build. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191117172806.2195367-3-andriin@fb.com
2019-11-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Lots of overlapping changes and parallel additions, stuff like that. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15mlx5: Reject requests to enable time stamping on both edges.Richard Cochran
This driver enables rising edge or falling edge, but not both, and so this patch validates that the request contains only one of the two edges. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15ptp: Introduce strict checking of external time stamp options.Richard Cochran
User space may request time stamps on rising edges, falling edges, or both. However, the particular mode may or may not be supported in the hardware or in the driver. This patch adds a "strict" flag that tells drivers to ensure that the requested mode will be honored. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15mlx5: reject unsupported external timestamp flagsJacob Keller
Fix the mlx5 core PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. [ RC: I'm not 100% sure what this driver does, but if I'm not wrong it follows the dp83640: flags Meaning ---------------------------------------------------- -------------------------- PTP_ENABLE_FEATURE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp falling edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp falling edge ] Cc: Feras Daoud <ferasda@mellanox.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>