summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx4
AgeCommit message (Collapse)Author
2019-11-09devlink: disallow reload operation during device cleanupJiri Pirko
There is a race between driver code that does setup/cleanup of device and devlink reload operation that in some drivers works with the same code. Use after free could we easily obtained by running: while true; do echo "0000:00:10.0" >/sys/bus/pci/drivers/mlxsw_spectrum2/bind devlink dev reload pci/0000:00:10.0 & echo "0000:00:10.0" >/sys/bus/pci/drivers/mlxsw_spectrum2/unbind done Fix this by enabling reload only after setup of device is complete and disabling it at the beginning of the cleanup process. Reported-by: Ido Schimmel <idosch@mellanox.com> Fixes: 2d8dc5bbf4e7 ("devlink: Add support for reload") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05mlx4_core: fix wrong comment about the reason of subtract one from the max_cqesDotan Barak
The reason for the pre-allocation of one CQE is to enable resizing of the CQ. Fix comment accordingly. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29net/mlx4_core: Dynamically set guaranteed amount of counters per VFEran Ben Elisha
Prior to this patch, the amount of counters guaranteed per VF in the resource tracker was MLX4_VF_COUNTERS_PER_PORT * MLX4_MAX_PORTS. It was set regardless if the VF was single or dual port. This caused several VFs to have no guaranteed counters although the system could satisfy their request. The fix is to dynamically guarantee counters, based on each VF specification. Fixes: 9de92c60beaa ("net/mlx4_core: Adjust counter grant policy in the resource tracker") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Minor overlapping changes in the btusb and ixgbe drivers. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13net: devlink: split reload op into twoJiri Pirko
In order to properly implement failure indication during reload, split the reload op into two ops, one for down phase and one for up phase. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13mlx4: Split restart_one into two functionsJiri Pirko
Split the function restart_one into two functions and separate teardown and buildup. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11mlx4: fix spelling mistake "veify" -> "verify"Colin Ian King
There is a spelling mistake in a mlx4_err error message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10net/mlx4_en: ethtool: make array modes static const, makes object smallerColin Ian King
Don't populate the array modes on the stack but instead make it static const. Makes the object code smaller by 303 bytes. Before: text data bss dec hex filename 51240 5008 1312 57560 e0d8 mellanox/mlx4/en_ethtool.o After: text data bss dec hex filename 50937 5008 1312 57257 dfa9 mellanox/mlx4/en_ethtool.o (gcc version 9.2.1, amd64) Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Merge conflict of mlx5 resolved using instructions in merge commit 9566e650bf7fdf58384bb06df634f7531ca3a97e. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-13net/mlx4_en: fix a memory leak bugWenwen Wang
In mlx4_en_config_rss_steer(), 'rss_map->indir_qp' is allocated through kzalloc(). After that, mlx4_qp_alloc() is invoked to configure RSS indirection. However, if mlx4_qp_alloc() fails, the allocated 'rss_map->indir_qp' is not deallocated, leading to a memory leak bug. To fix the above issue, add the 'qp_alloc_err' label to free 'rss_map->indir_qp'. Fixes: 4931c6ef04b4 ("net/mlx4_en: Optimized single ring steering") Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-08-09devlink: remove pointless data_len arg from region snapshot createJiri Pirko
The size of the snapshot has to be the same as the size of the region, therefore no need to pass it again during snapshot creation. Remove the arg and use region->size instead. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-27mlx4/en_netdev: allow offloading VXLAN over VLANDavide Caratti
ConnectX-3 Pro can offload transmission of VLAN packets with VXLAN inside: enable tunnel offloads in dev->vlan_features, like it's done with other NIC drivers (e.g. be2net and ixgbe). It's no more necessary to change dev->hw_enc_features when VXLAN are added or removed, since .ndo_features_check() already checks for VXLAN packet where the UDP destination port matches the configured value. Just set dev->hw_enc_features when the NIC is initialized, so that overlying VLAN can correctly inherit the tunnel offload capabilities. Changes since v1: - avoid flipping hw_enc_features, instead of calling netdev notifiers, thanks to Saeed Mahameed - squash two patches into a single one CC: Paolo Abeni <pabeni@redhat.com> CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-24mlx4: avoid large stack usage in mlx4_init_hca()Arnd Bergmann
The mlx4_dev_cap and mlx4_init_hca_param are really too large to be put on the kernel stack, as shown by this clang warning: drivers/net/ethernet/mellanox/mlx4/main.c:3304:12: error: stack frame size of 1088 bytes in function 'mlx4_load_one' [-Werror,-Wframe-larger-than=] With gcc, the problem is the same, but it does not warn because it does not inline this function, and therefore stays just below the warning limit, while clang is just above it. Use kzalloc for dynamic allocation instead of putting them on stack. This gets the combined stack frame down to 424 bytes. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-22net: Use skb accessors in network driversMatthew Wilcox (Oracle)
In preparation for unifying the skb_frag and bio_vec, use the fine accessors which already exist and use skb_frag_t instead of struct skb_frag_struct. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-15ethernet: remove redundant memsetFuqian Huang
kvzalloc already zeroes the memory during the allocation. pci_alloc_consistent calls dma_alloc_coherent directly. In commit 518a2f1925c3 ("dma-mapping: zero memory returned from dma_alloc_*"), dma_alloc_coherent has already zeroed the memory. So the memset after these function is not needed. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Clear up some recent tipc regressions because of registration ordering. Fix from Junwei Hu. 2) tipc's TLV_SET() can read past the end of the supplied buffer during the copy. From Chris Packham. 3) ptp example program doesn't match the kernel, from Richard Cochran. 4) Outgoing message type fix in qrtr, from Bjorn Andersson. 5) Flow control regression in stmmac, from Tan Tee Min. 6) Fix inband autonegotiation in phylink, from Russell King. 7) Fix sk_bound_dev_if handling in rawv6_bind(), from Mike Manning. 8) Fix usbnet crash after disconnect, from Kloetzke Jan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits) usbnet: fix kernel crash after disconnect selftests: fib_rule_tests: use pre-defined DEV_ADDR net-next: net: Fix typos in ip-sysctl.txt ipv6: Consider sk_bound_dev_if when binding a raw socket to an address net: phylink: ensure inband AN works correctly usbnet: ipheth: fix racing condition net: stmmac: dma channel control register need to be init first net: stmmac: fix ethtool flow control not able to get/set net: qrtr: Fix message type of outgoing packets networking: : fix typos in code comments ptp: Fix example program to match kernel. fddi: fix typos in code comments selftests: fib_rule_tests: enable forwarding before ipv4 from/iif test selftests: fib_rule_tests: fix local IPv4 address typo tipc: Avoid copying bytes beyond the supplied data 2/2] net: xilinx_emaclite: use readx_poll_timeout() in mdio wait function 1/2] net: axienet: use readx_poll_timeout() in mdio wait function vlan: Mark expected switch fall-through macvlan: Mark expected switch fall-through net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query ...
2019-05-21Merge tag 'spdx-5.2-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull SPDX update from Greg KH: "Here is a series of patches that add SPDX tags to different kernel files, based on two different things: - SPDX entries are added to a bunch of files that we missed a year ago that do not have any license information at all. These were either missed because the tool saw the MODULE_LICENSE() tag, or some EXPORT_SYMBOL tags, and got confused and thought the file had a real license, or the files have been added since the last big sweep, or they were Makefile/Kconfig files, which we didn't touch last time. - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan tools can determine the license text in the file itself. Where this happens, the license text is removed, in order to cut down on the 700+ different ways we have in the kernel today, in a quest to get rid of all of these. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers. The reason for these "large" patches is if we were to continue to progress at the current rate of change in the kernel, adding license tags to individual files in different subsystems, we would be finished in about 10 years at the earliest. There will be more series of these types of patches coming over the next few weeks as the tools and reviewers crunch through the more "odd" variants of how to say "GPLv2" that developers have come up with over the years, combined with other fun oddities (GPL + a BSD disclaimer?) that are being unearthed, with the goal for the whole kernel to be cleaned up. These diffstats are not small, 3840 files are touched, over 10k lines removed in just 24 patches" * tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3 ...
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-20net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages queryErez Alfasi
Querying EEPROM high pages data for SFP module is currently not supported by our driver but is still tried, resulting in invalid FW queries. Set the EEPROM ethtool data length to 256 for SFP module to limit the reading for page 0 only and prevent invalid FW queries. Fixes: 7202da8b7f71 ("ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support") Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14net/mlx4_core: Change the error print to info printYunjian Wang
The error print within mlx4_flow_steer_promisc_add() should be a info print. Fixes: 592e49dda812 ('net/mlx4: Implement promiscuous mode with device managed flow-steering') Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support AES128-CCM ciphers in kTLS, from Vakul Garg. 2) Add fib_sync_mem to control the amount of dirty memory we allow to queue up between synchronize RCU calls, from David Ahern. 3) Make flow classifier more lockless, from Vlad Buslov. 4) Add PHY downshift support to aquantia driver, from Heiner Kallweit. 5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces contention on SLAB spinlocks in heavy RPC workloads. 6) Partial GSO offload support in XFRM, from Boris Pismenny. 7) Add fast link down support to ethtool, from Heiner Kallweit. 8) Use siphash for IP ID generator, from Eric Dumazet. 9) Pull nexthops even further out from ipv4/ipv6 routes and FIB entries, from David Ahern. 10) Move skb->xmit_more into a per-cpu variable, from Florian Westphal. 11) Improve eBPF verifier speed and increase maximum program size, from Alexei Starovoitov. 12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit spinlocks. From Neil Brown. 13) Allow tunneling with GUE encap in ipvs, from Jacky Hu. 14) Improve link partner cap detection in generic PHY code, from Heiner Kallweit. 15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan Maguire. 16) Remove SKB list implementation assumptions in SCTP, your's truly. 17) Various cleanups, optimizations, and simplifications in r8169 driver. From Heiner Kallweit. 18) Add memory accounting on TX and RX path of SCTP, from Xin Long. 19) Switch PHY drivers over to use dynamic featue detection, from Heiner Kallweit. 20) Support flow steering without masking in dpaa2-eth, from Ioana Ciocoi. 21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri Pirko. 22) Increase the strict parsing of current and future netlink attributes, also export such policies to userspace. From Johannes Berg. 23) Allow DSA tag drivers to be modular, from Andrew Lunn. 24) Remove legacy DSA probing support, also from Andrew Lunn. 25) Allow ll_temac driver to be used on non-x86 platforms, from Esben Haabendal. 26) Add a generic tracepoint for TX queue timeouts to ease debugging, from Cong Wang. 27) More indirect call optimizations, from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits) cxgb4: Fix error path in cxgb4_init_module net: phy: improve pause mode reporting in phy_print_status dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings net: macb: Change interrupt and napi enable order in open net: ll_temac: Improve error message on error IRQ net/sched: remove block pointer from common offload structure net: ethernet: support of_get_mac_address new ERR_PTR error net: usb: smsc: fix warning reported by kbuild test robot staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check net: dsa: support of_get_mac_address new ERR_PTR error net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats vrf: sit mtu should not be updated when vrf netdev is the link net: dsa: Fix error cleanup path in dsa_init_module l2tp: Fix possible NULL pointer dereference taprio: add null check on sched_nest to avoid potential null pointer dereference net: mvpp2: cls: fix less than zero check on a u32 variable net_sched: sch_fq: handle non connected flows net_sched: sch_fq: do not assume EDT packets are ordered net: hns3: use devm_kcalloc when allocating desc_cb net: hns3: some cleanup for struct hns3_enet_ring ...
2019-04-08drivers: Remove explicit invocations of mmiowb()Will Deacon
mmiowb() is now implied by spin_unlock() on architectures that require it, so there is no reason to call it from driver code. This patch was generated using coccinelle: @mmiowb@ @@ - mmiowb(); and invoked as: $ for d in drivers include/linux/qed sound; do \ spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done NOTE: mmiowb() has only ever guaranteed ordering in conjunction with spin_unlock(). However, pairing each mmiowb() removal in this patch with the corresponding call to spin_unlock() is not at all trivial, so there is a small chance that this change may regress any drivers incorrectly relying on mmiowb() to order MMIO writes between CPUs using lock-free synchronisation. If you've ended up bisecting to this commit, you can reintroduce the mmiowb() calls using wmb() instead, which should restore the old behaviour on all architectures other than some esoteric ia64 systems. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-01drivers: mellanox: use netdev_xmit_more() helperFlorian Westphal
skb->xmit_more hint is now always 0. This switches the mellanox drivers to the netdev_xmit_more() helper. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Boris Pismenny <borisp@mellanox.com> Cc: Ilya Lesokhin <ilyal@mellanox.com> Cc: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-24net: devlink: select NET_DEVLINK from driversJiri Pirko
Some drivers are becoming more dependent on NET_DEVLINK being selected in configuration. With upcoming compat functions, the behavior would be wrong in case devlink was not compiled in. So make the drivers select NET_DEVLINK and rely on the functions being there, not just stubs. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20net: remove 'fallback' argument from dev->ndo_select_queue()Paolo Abeni
After the previous patch, all the callers of ndo_select_queue() provide as a 'fallback' argument netdev_pick_tx. The only exceptions are nested calls to ndo_select_queue(), which pass down the 'fallback' available in the current scope - still netdev_pick_tx. We can drop such argument and replace fallback() invocation with netdev_pick_tx(). This avoids an indirect call per xmit packet in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen) with device drivers implementing such ndo. It also clean the code a bit. Tested with ixgbe and CONFIG_FCOE=m With pktgen using queue xmit: threads vanilla patched (kpps) (kpps) 1 2334 2428 2 4166 4278 4 7895 8100 v1 -> v2: - rebased after helper's name change Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-12net/mlx4_core: Fix qp mtt size calculationJack Morgenstein
Calculation of qp mtt size (in function mlx4_RST2INIT_wrapper) ultimately depends on function roundup_pow_of_two. If the amount of memory required by the QP is less than one page, roundup_pow_of_two is called with argument zero. In this case, the roundup_pow_of_two result is undefined. Calling roundup_pow_of_two with a zero argument resulted in the following stack trace: UBSAN: Undefined behaviour in ./include/linux/log2.h:61:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 4 PID: 26939 Comm: rping Tainted: G OE 4.19.0-rc1 Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015 Call Trace: dump_stack+0x9a/0xeb ubsan_epilogue+0x9/0x7c __ubsan_handle_shift_out_of_bounds+0x254/0x29d ? __ubsan_handle_load_invalid_value+0x180/0x180 ? debug_show_all_locks+0x310/0x310 ? sched_clock+0x5/0x10 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x260 ? find_held_lock+0x35/0x1e0 ? mlx4_RST2INIT_QP_wrapper+0xfb1/0x1440 [mlx4_core] mlx4_RST2INIT_QP_wrapper+0xfb1/0x1440 [mlx4_core] Fix this by explicitly testing for zero, and returning one if the argument is zero (assuming that the next higher power of 2 in this case should be one). Fixes: c82e9aa0a8bc ("mlx4_core: resource tracking for HCA resources used by guests") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-12net/mlx4_core: Fix locking in SRIOV mode when switching between events and ↵Jack Morgenstein
polling In procedures mlx4_cmd_use_events() and mlx4_cmd_use_polling(), we need to guarantee that there are no FW commands in progress on the comm channel (for VFs) or wrapped FW commands (on the PF) when SRIOV is active. We do this by also taking the slave_cmd_mutex when SRIOV is active. This is especially important when switching from event to polling, since we free the command-context array during the switch. If there are FW commands in progress (e.g., waiting for a completion event), the completion event handler will access freed memory. Since the decision to use comm_wait or comm_poll is taken before grabbing the event_sem/poll_sem in mlx4_comm_cmd_wait/poll, we must take the slave_cmd_mutex as well (to guarantee that the decision to use events or polling and the call to the appropriate cmd function are atomic). Fixes: a7e1f04905e5 ("net/mlx4_core: Fix deadlock when switching between polling and event fw commands") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-12net/mlx4_core: Fix reset flow when in command polling modeJack Morgenstein
As part of unloading a device, the driver switches from FW command event mode to FW command polling mode. Part of switching over to polling mode is freeing the command context array memory (unfortunately, currently, without NULLing the command context array pointer). The reset flow calls "complete" to complete all outstanding fw commands (if we are in event mode). The check for event vs. polling mode here is to test if the command context array pointer is NULL. If the reset flow is activated after the switch to polling mode, it will attempt (incorrectly) to complete all the commands in the context array -- because the pointer was not NULLed when the driver switched over to polling mode. As a result, we have a use-after-free situation, which results in a kernel crash. For example: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90 PGD 0 Oops: 0000 [#1] SMP Modules linked in: netconsole nfsv3 nfs_acl nfs lockd grace ... CPU: 2 PID: 940 Comm: kworker/2:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 04/28/2016 Workqueue: events hv_eject_device_work [pci_hyperv] task: ffff8d1734ca0fd0 ti: ffff8d17354bc000 task.ti: ffff8d17354bc000 RIP: 0010:[<ffffffff876c4a8e>] [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90 RSP: 0018:ffff8d17354bfa38 EFLAGS: 00010082 RAX: 0000000000000000 RBX: ffff8d17362d42c8 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8d17362d42c8 RBP: ffff8d17354bfa70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000298 R11: ffff8d173610e000 R12: ffff8d17362d42d0 R13: 0000000000000246 R14: 0000000000000000 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff8d1802680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000f16d8000 CR4: 00000000001406e0 Call Trace: [<ffffffff876c7adc>] complete+0x3c/0x50 [<ffffffffc04242f0>] mlx4_cmd_wake_completions+0x70/0x90 [mlx4_core] [<ffffffffc041e7b1>] mlx4_enter_error_state+0xe1/0x380 [mlx4_core] [<ffffffffc041fa4b>] mlx4_comm_cmd+0x29b/0x360 [mlx4_core] [<ffffffffc041ff51>] __mlx4_cmd+0x441/0x920 [mlx4_core] [<ffffffff877f62b1>] ? __slab_free+0x81/0x2f0 [<ffffffff87951384>] ? __radix_tree_lookup+0x84/0xf0 [<ffffffffc043a8eb>] mlx4_free_mtt_range+0x5b/0xb0 [mlx4_core] [<ffffffffc043a957>] mlx4_mtt_cleanup+0x17/0x20 [mlx4_core] [<ffffffffc04272c7>] mlx4_free_eq+0xa7/0x1c0 [mlx4_core] [<ffffffffc042803e>] mlx4_cleanup_eq_table+0xde/0x130 [mlx4_core] [<ffffffffc0433e08>] mlx4_unload_one+0x118/0x300 [mlx4_core] [<ffffffffc0434191>] mlx4_remove_one+0x91/0x1f0 [mlx4_core] The fix is to set the command context array pointer to NULL after freeing the array. Fixes: f5aef5aa3506 ("net/mlx4_core: Activate reset flow upon fatal command cases") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04mellanox: Switch to bitmap_zalloc()Andy Shevchenko
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26net: devlink: turn devlink into a built-inJakub Kicinski
Being able to build devlink as a module causes growing pains. First all drivers had to add a meta dependency to make sure they are not built in when devlink is built as a module. Now we are struggling to invoke ethtool compat code reliably. Make devlink code built-in, users can still not build it at all but the dynamically loadable module option is removed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two easily resolvable overlapping change conflicts, one in TCP and one in the eBPF verifier. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18net/mlx4_en: fix spelling mistake: "quiting" -> "quitting"Colin Ian King
There is a spelling mistake in a en_err error message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net/mlx4_en: Force CHECKSUM_NONE for short ethernet framesSaeed Mahameed
When an ethernet frame is padded to meet the minimum ethernet frame size, the padding octets are not covered by the hardware checksum. Fortunately the padding octets are usually zero's, which don't affect checksum. However, it is not guaranteed. For example, switches might choose to make other use of these octets. This repeatedly causes kernel hardware checksum fault. Prior to the cited commit below, skb checksum was forced to be CHECKSUM_NONE when padding is detected. After it, we need to keep skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP headers, it does not worth the effort as the packets are so small that CHECKSUM_COMPLETE has no significant advantage. Future work: when reporting checksum complete is not an option for IP non-TCP/UDP packets, we can actually fallback to report checksum unnecessary, by looking at cqe IPOK bit. Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net/mlx4_en: Force CHECKSUM_NONE for short ethernet framesSaeed Mahameed
When an ethernet frame is padded to meet the minimum ethernet frame size, the padding octets are not covered by the hardware checksum. Fortunately the padding octets are usually zero's, which don't affect checksum. However, it is not guaranteed. For example, switches might choose to make other use of these octets. This repeatedly causes kernel hardware checksum fault. Prior to the cited commit below, skb checksum was forced to be CHECKSUM_NONE when padding is detected. After it, we need to keep skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP headers, it does not worth the effort as the packets are so small that CHECKSUM_COMPLETE has no significant advantage. Future work: when reporting checksum complete is not an option for IP non-TCP/UDP packets, we can actually fallback to report checksum unnecessary, by looking at cqe IPOK bit. Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08devlink: publish params only after driver init is doneJiri Pirko
Currently, user can do dump or get of param values right after the devlink params are registered. However the driver may not be initialized which is an issue. The same problem happens during notification upon param registration. Allow driver to publish devlink params whenever it is ready to handle get() ops. Note that this cannot be resolved by init reordering, as the "driverinit" params have to be available before the driver is initialized (it needs the param values there). Signed-off-by: Jiri Pirko <jiri@mellanox.com> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2019-01-24net/mlx4_core: A write memory barrier is sufficient in EQ ci updateTariq Toukan
Soften the memory barrier call of mb() by a sufficient wmb() in the consumer index update of the event queues. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24net/mlx4_core: Fix error handling when initializing CQ bufs in the driverJack Morgenstein
Procedure mlx4_init_user_cqes() handles returns by copy_to_user incorrectly. copy_to_user() returns the number of bytes not copied. Thus, a non-zero return should be treated as a -EFAULT error (as is done elsewhere in the kernel). However, mlx4_init_user_cqes() error handling simply returns the number of bytes not copied (instead of -EFAULT). Note, though, that this is a harmless bug: procedure mlx4_alloc_cq() (which is the only caller of mlx4_init_user_cqes()) treats any non-zero return as an error, but that returned error value is processed internally, and not passed further up the call stack. In addition, fixes the following sparse warning: warning: incorrect type in argument 1 (different address spaces) expected void [noderef] <asn:1>*to got void *buf Fixes: e45678973dcb ("{net, IB}/mlx4: Initialize CQ buffers in the driver when possible") Reported by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24net/mlx4_core: Add masking for a few queries on HCA capsAya Levin
Driver reads the query HCA capabilities without the corresponding masks. Without the correct masks, the base addresses of the queues are unaligned. In addition some reserved bits were wrongly read. Using the correct masks, ensures alignment of the base addresses and allows future firmware versions safe use of the reserved bits. Fixes: ab9c17a009ee ("mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet") Fixes: 0ff1fb654bec ("{NET, IB}/mlx4: Add device managed flow steering firmware API") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-23net/mlx4: Mark expected switch fall-throughGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warning: drivers/net/ethernet/mellanox/mlx4/eq.c: In function ‘mlx4_eq_int’: drivers/net/ethernet/mellanox/mlx4/mlx4.h:219:5: warning: this statement may fall through [-Wimplicit-fallthrough=] if (mlx4_debug_level) \ ^ drivers/net/ethernet/mellanox/mlx4/eq.c:558:4: note: in expansion of macro ‘mlx4_dbg’ mlx4_dbg(dev, "%s: MLX4_EVENT_TYPE_SRQ_LIMIT. srq_no=0x%x, eq 0x%x\n", ^~~~~~~~ drivers/net/ethernet/mellanox/mlx4/eq.c:561:3: note: here case MLX4_EVENT_TYPE_SRQ_CATAS_ERROR: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17net/mlx4: remove unneeded semicolonYueHaibing
Remove unneeded semicolon. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix regression in multi-SKB responses to RTM_GETADDR, from Arthur Gautier. 2) Fix ipv6 frag parsing in openvswitch, from Yi-Hung Wei. 3) Unbounded recursion in ipv4 and ipv6 GUE tunnels, from Stefano Brivio. 4) Use after free in hns driver, from Yonglong Liu. 5) icmp6_send() needs to handle the case of NULL skb, from Eric Dumazet. 6) Missing rcu read lock in __inet6_bind() when operating on mapped addresses, from David Ahern. 7) Memory leak in tipc-nl_compat_publ_dump(), from Gustavo A. R. Silva. 8) Fix PHY vs r8169 module loading ordering issues, from Heiner Kallweit. 9) Fix bridge vlan memory leak, from Ido Schimmel. 10) Dev refcount leak in AF_PACKET, from Jason Gunthorpe. 11) Infoleak in ipv6_local_error(), flow label isn't completely initialized. From Eric Dumazet. 12) Handle mv88e6390 errata, from Andrew Lunn. 13) Making vhost/vsock CID hashing consistent, from Zha Bin. 14) Fix lack of UMH cleanup when it unexpectedly exits, from Taehee Yoo. 15) Bridge forwarding must clear skb->tstamp, from Paolo Abeni. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits) bnxt_en: Fix context memory allocation. bnxt_en: Fix ring checking logic on 57500 chips. mISDN: hfcsusb: Use struct_size() in kzalloc() net: clear skb->tstamp in bridge forwarding path net: bpfilter: disallow to remove bpfilter module while being used net: bpfilter: restart bpfilter_umh when error occurred net: bpfilter: use cleanup callback to release umh_info umh: add exit routine for UMH process isdn: i4l: isdn_tty: Fix some concurrency double-free bugs vhost/vsock: fix vhost vsock cid hashing inconsistent net: stmmac: Prevent RX starvation in stmmac_napi_poll() net: stmmac: Fix the logic of checking if RX Watchdog must be enabled net: stmmac: Check if CBS is supported before configuring net: stmmac: dwxgmac2: Only clear interrupts that are active net: stmmac: Fix PCI module removal leak tools/bpf: fix bpftool map dump with bitfields tools/bpf: test btf bitfield with >=256 struct member offset bpf: fix bpffs bitfield pretty print net: ethernet: mediatek: fix warning in phy_start_aneg tcp: change txhash on SYN-data timeout ...
2019-01-08cross-tree: phase out dma_zalloc_coherent()Luis Chamberlain
We already need to zero out memory for dma_alloc_coherent(), as such using dma_zalloc_coherent() is superflous. Phase it out. This change was generated with the following Coccinelle SmPL patch: @ replace_dma_zalloc_coherent @ expression dev, size, data, handle, flags; @@ -dma_zalloc_coherent(dev, size, handle, flags) +dma_alloc_coherent(dev, size, handle, flags) Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: re-ran the script on the latest tree] Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-07net/mlx4: replace pci_{,un}map_sg with dma_{,un}map_sgStephen Warren
pci_{,un}map_sg are deprecated and replaced by dma_{,un}map_sg. This is especially relevant since the rest of the driver uses the DMA API. Fix the driver to use the replacement APIs. Signed-off-by: Stephen Warren <swarren@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-07net/mlx4: Get rid of page operation after dma_alloc_coherentStephen Warren
This patch solves a crash at the time of mlx4 driver unload or system shutdown. The crash occurs because dma_alloc_coherent() returns one value in mlx4_alloc_icm_coherent(), but a different value is passed to dma_free_coherent() in mlx4_free_icm_coherent(). In turn this is because when allocated, that pointer is passed to sg_set_buf() to record it, then when freed it is re-calculated by calling lowmem_page_address(sg_page()) which returns a different value. Solve this by recording the value that dma_alloc_coherent() returns, and passing this to dma_free_coherent(). This patch is roughly equivalent to commit 378efe798ecf ("RDMA/hns: Get rid of page operation after dma_alloc_coherent"). Based-on-code-from: Christoph Hellwig <hch@lst.de> Signed-off-by: Stephen Warren <swarren@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24net/mlx4_core: drop useless LIST_HEADJulia Lawall
Drop LIST_HEAD where the variable it declares has never been used. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ - LIST_HEAD(x); ... when != x // </smpl> Fixes: c82e9aa0a8bc ("mlx4_core: resource tracking for HCA resources used by guests") Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17net/mlx4_en: remove fallback after kzalloc_node()Eric Dumazet
kzalloc_node(..., GFP_KERNEL, node) will attempt to allocate memory as close as possible to the node. There is no need to fallback to kzalloc() if this has failed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several conflicts, seemingly all over the place. I used Stephen Rothwell's sample resolutions for many of these, if not just to double check my own work, so definitely the credit largely goes to him. The NFP conflict consisted of a bug fix (moving operations past the rhashtable operation) while chaning the initial argument in the function call in the moved code. The net/dsa/master.c conflict had to do with a bug fix intermixing of making dsa_master_set_mtu() static with the fixing of the tagging attribute location. cls_flower had a conflict because the dup reject fix from Or overlapped with the addition of port range classifiction. __set_phy_supported()'s conflict was relatively easy to resolve because Andrew fixed it in both trees, so it was just a matter of taking the net-next copy. Or at least I think it was :-) Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup() intermixed with changes on how the sdif and caller_net are calculated in these code paths in net-next. The remaining BPF conflicts were largely about the addition of the __bpf_md_ptr stuff in 'net' overlapping with adjustments and additions to the relevant data structure where the MD pointer macros are used. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-08net/mlx4_core: Correctly set PFC param if global pause is turned off.Tarick Bedeir
rx_ppp and tx_ppp can be set between 0 and 255, so don't clamp to 1. Fixes: 6e8814ceb7e8 ("net/mlx4_en: Fix mixed PFC and Global pause user control requests") Signed-off-by: Tarick Bedeir <tarick@google.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net/mlx4_core: Fix several coding style errorsErez Alfasi
Fix 3 checkpatch errors within mlx4/main.c: - Unnecessary mlx4_debug_level global variable initialization to 0. - Prohibited space before comma. - Whitespaces instead of tab. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>