summaryrefslogtreecommitdiff
path: root/include/linux/mlx5/driver.h
AgeCommit message (Collapse)Author
2018-02-05mlx5: fix mlx5_get_vector_affinity to start from completion vector 0Sagi Grimberg
The consumers of this routine expects the affinity map of of vector index relative to the first completion vector. The upper layers are not aware of internal/private completion vectors that mlx5 allocates for its own usage. Hence, return the affinity map of vector index relative to the first completion vector. Fixes: 05e0cc84e00c ("net/mlx5: Fix get vector affinity helper function") Reported-by: Logan Gunthorpe <logang@deltatee.com> Tested-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Cc: <stable@vger.kernel.org> # v4.15 Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-30Merge tag v4.15 of ↵Jason Gunthorpe
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git To resolve conflicts in: drivers/infiniband/hw/mlx5/main.c drivers/infiniband/hw/mlx5/qp.c From patches merged into the -rc cycle. The conflict resolution matches what linux-next has been carrying. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-18net/mlx5e: Add clock info page to mlx5 core devicesFeras Daoud
Adds a new page to mlx5 core containing clock info data that allows user level applications to translate between cqe timestamp to nanoseconds. The information stored into this page is represented through mlx5_ib_clock_info. In order to synchronize between kernel and user space a sequence number is incremented at the beginning and end of each update. An odd number means the data is being updated while an even means the access was already done. To guarantee that the data structure was accessed atomically user will: repeat: seq1 = <read sequence> goto <repeate> while odd <read data structure> seq2 = <read sequence> if seq1 != seq2 goto repeat Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Eitan Rabin <rabin@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-12net/mlx5: Fix get vector affinity helper functionSaeed Mahameed
mlx5_get_vector_affinity used to call pci_irq_get_affinity and after reverting the patch that sets the device affinity via PCI_IRQ_AFFINITY API, calling pci_irq_get_affinity becomes useless and it breaks RDMA mlx5 users. To fix this, this patch provides an alternative way to retrieve IRQ vector affinity using legacy IRQ API, following smp_affinity read procfs implementation. Fixes: 231243c82793 ("Revert mlx5: move affinity hints assignments to generic code") Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code") Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-08{net, IB}/mlx5: Change set_roce_gid to take a port numberDaniel Jurgens
When in dual port mode setting a RoCE GID for any port flows through the master ports mlx5_core_dev. Provide an interface to set the port when sending this command. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08{net, IB}/mlx5: Manage port association for multiport RoCEDaniel Jurgens
When mlx5_ib_add is called determine if the mlx5 core device being added is capable of dual port RoCE operation. If it is, determine whether it is a master device or a slave device using the num_vhca_ports and affiliate_nic_vport_criteria capabilities. If the device is a slave, attempt to find a master device to affiliate it with. Devices that can be affiliated will share a system image guid. If none are found place it on a list of unaffiliated ports. If a master is found bind the port to it by configuring the port affiliation in the NIC vport context. Similarly when mlx5_ib_remove is called determine the port type. If it's a slave port, unaffiliate it from the master device, otherwise just remove it from the unaffiliated port list. The IB device is registered as a multiport device, even if a 2nd port is not available for affiliation. When the 2nd port is affiliated later the GID cache must be refreshed in order to get the default GIDs for the 2nd port in the cache. Export roce_rescan_device to provide a mechanism to refresh the cache after a new port is bound. In a multiport configuration all IB object (QP, MR, PD, etc) related commands should flow through the master mlx5_core_dev, other commands must be sent to the slave port mlx5_core_mdev, an interface is provide to get the correct mdev for non IB object commands. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08IB/mlx5: Make netdev notifications multiport capableDaniel Jurgens
When multiple RoCE ports are supported registration for events on multiple netdevs is required. Refactor the event registration and handling to support multiple ports. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08net/mlx5: Fix race for multiple RoCE enableDaniel Jurgens
There are two potential problems with the existing implementation. 1. Enable and disable can race after the atomic operations. 2. If a command fails the refcount is left in an inconsistent state. Introduce a lock and perform error checking. Fixes: a6f7d2aff623 ("net/mlx5: Add support for multiple RoCE enable") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08net/mlx5: Add DCT command interfaceMoni Shoua
Add a missing command interface to work with a DCT. It includes: creating, destroying and get events for. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "This is the next batch of for-rc patches from RDMA. It includes the fix for the ipoib regression I mentioned last time, and the result of a fairly major debugging effort to get iser working reliably on cxgb4 hardware - it turns out the cxgb4 driver was not handling QP error flushing properly causing iser to fail. - cxgb4 fix for an iser testing failure as debugged by Steve and Sagi. The problem was a driver bug in the handling of shutting down a QP. - Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on error unwind and a use after free bug - Improper congestion counter values on mlx5 when link aggregation is enabled - ipoib lockdep regression introduced in this merge window - hfi1 regression supporting the device in a VM introduced in a recent patch - Typo that breaks future uAPI compatibility in the verbs core - More SELinux related oops fixing - Fix an oops during error unwind in mlx5" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/mlx5: Fix mlx5_ib_alloc_mr error flow IB/core: Verify that QP is security enabled in create and destroy IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp() IB/mlx5: Serialize access to the VMA list IB/hfi: Only read capability registers if the capability exists IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush IB/mlx5: Fix congestion counters in LAG mode RDMA/vmw_pvrdma: Avoid use after free due to QP/CQ/SRQ destroy RDMA/vmw_pvrdma: Use refcount_dec_and_test to avoid warning RDMA/vmw_pvrdma: Call ib_umem_release on destroy QP path iw_cxgb4: when flushing, complete all wrs in a chain iw_cxgb4: reflect the original WR opcode in drain cqes iw_cxgb4: Only validate the MSN for successful completions
2017-12-28IB/mlx5: Extend UAR stuff to support dynamic allocationYishai Hadas
This patch extends the alloc context flow to be prepared for working with dynamic UAR allocations. Currently upon alloc context there is some fix size of UARs that are allocated (named 'static allocation') and there is no option to user application to ask for more or control which UAR will be used by which QP. In this patch the driver prepares its data structures to manage both the static and the dynamic allocations and let the user driver knows about the max value of dynamic blue-flame registers that are allowed. Downstream patches from this series will enable the dynamic allocation and the association as part of QP creation. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-21IB/mlx5: Fix congestion counters in LAG modeMajd Dibbiny
Congestion counters are counted and queried per physical function. When working in LAG mode, CNP packets can be sent or received on both of the functions, thus congestion counters should be aggregated from the two physical functions. Fixes: e1f24a79f424 ("IB/mlx5: Support congestion related counters") Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-19net/mlx5: Cleanup IRQs in case of unload failureMoshe Shemesh
When mlx5_stop_eqs fails to destroy any of the eqs it returns with an error. In such failure flow the function will return without releasing all EQs irqs and then pci_free_irq_vectors will fail. Fix by only warn on destroy EQ failure and continue to release other EQs and their irqs. It fixes the following kernel trace: kernel: kernel BUG at drivers/pci/msi.c:352! ... ... kernel: Call Trace: kernel: pci_disable_msix+0xd3/0x100 kernel: pci_free_irq_vectors+0xe/0x20 kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core] Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19Revert "mlx5: move affinity hints assignments to generic code"Saeed Mahameed
Before the offending commit, mlx5 core did the IRQ affinity itself, and it seems that the new generic code have some drawbacks and one of them is the lack for user ability to modify irq affinity after the initial affinity values got assigned. The issue is still being discussed and a solution in the new generic code is required, until then we need to revert this patch. This fixes the following issue: echo <new affinity> > /proc/irq/<x>/smp_affinity fails with -EIO This reverts commit a435393acafbf0ecff4deb3e3cb554b34f0d0664. Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since it is used in mlx5_ib driver. Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code") Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jes Sorensen <jsorensen@fb.com> Reported-by: Jes Sorensen <jsorensen@fb.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5: QPTS and QPDPM register firmware command supportHuy Nguyen
The QPTS register allows changing the priority trust state between pcp and dscp. Add support to get/set trust state from device. When the port is in pcp/dscp trust state, packet is routed by hardware to matching priority based on its pcp/dscp value respectively. The QPDPM register allow channing the dscp to priority mapping. Add support to get/set dscp to priority mapping from device. Note that to change a dscp mapping, the "e" bit of this dscp structure must be set in the QPDPM firmware command. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5: QCAM register firmware command supportHuy Nguyen
The QCAM register provides capability bit for all the QoS registers using ACCESS_REG command. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-14net/mlx5: PTP code migration to driver core sectionFeras Daoud
PTP code is moved to core section of mlx5 driver in order to share it between ethernet and infiniband. This movement involves the following changes: - Change mlx5e_ prefix to be mlx5_ - Add clock structs to Core - Add clock object to mlx5_core_dev - Call Init/Uninit clock from core init/cleanup - Rename mlx5e_tstamp to be mlx5_clock Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Eitan Rabin <rabin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28net/mlx5: Fix FPGA capability locationInbar Karmy
Currently, FPGA capability is located in (mdev)->caps.hca_cur, change the location to be (mdev)->caps.fpga, since hca_cur is reserved for HCA device capabilities. Fixes: e29341fb3a5b ("net/mlx5: FPGA, Add basic support for Innova") Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support ipv6 checksum offload in sunvnet driver, from Shannon Nelson. 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric Dumazet. 3) Allow generic XDP to work on virtual devices, from John Fastabend. 4) Add bpf device maps and XDP_REDIRECT, which can be used to build arbitrary switching frameworks using XDP. From John Fastabend. 5) Remove UFO offloads from the tree, gave us little other than bugs. 6) Remove the IPSEC flow cache, from Florian Westphal. 7) Support ipv6 route offload in mlxsw driver. 8) Support VF representors in bnxt_en, from Sathya Perla. 9) Add support for forward error correction modes to ethtool, from Vidya Sagar Ravipati. 10) Add time filter for packet scheduler action dumping, from Jamal Hadi Salim. 11) Extend the zerocopy sendmsg() used by virtio and tap to regular sockets via MSG_ZEROCOPY. From Willem de Bruijn. 12) Significantly rework value tracking in the BPF verifier, from Edward Cree. 13) Add new jump instructions to eBPF, from Daniel Borkmann. 14) Rework rtnetlink plumbing so that operations can be run without taking the RTNL semaphore. From Florian Westphal. 15) Support XDP in tap driver, from Jason Wang. 16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal. 17) Add Huawei hinic ethernet driver. 18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan Delalande. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits) i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq i40e: avoid NVM acquire deadlock during NVM update drivers: net: xgene: Remove return statement from void function drivers: net: xgene: Configure tx/rx delay for ACPI drivers: net: xgene: Read tx/rx delay for ACPI rocker: fix kcalloc parameter order rds: Fix non-atomic operation on shared flag variable net: sched: don't use GFP_KERNEL under spin lock vhost_net: correctly check tx avail during rx busy polling net: mdio-mux: add mdio_mux parameter to mdio_mux_init() rxrpc: Make service connection lookup always check for retry net: stmmac: Delete dead code for MDIO registration gianfar: Fix Tx flow control deactivation cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 cxgb4: Fix pause frame count in t4_get_port_stats cxgb4: fix memory leak tun: rename generic_xdp to skb_xdp tun: reserve extra headroom only when XDP is set net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping net: dsa: bcm_sf2: Advertise number of egress queues ...
2017-09-03Merge tag 'for-linus-ioctl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is a big pull request. Of note is that I'm sending you the new ioctl API for the rdma subsystem. We put it up on linux-api@, but didn't get much response. The API is complex, but it solves two different problems in one go: 1) The bi-directional nature of the RDMA file write calls, which created the security hole we had to handle (and for which the fix is now causing problems for systems in production, we were a bit over zealous in the fix and the ability to open a device, then fork, then create new queue pairs on the device and use them is broken). 2) The bloat caused by different vendors implementing extensions to the base verbs API. Each vendor's hardware is slightly different, and the hardware might be suitable for one extension but not another. By the time we add generic extensions for all the different ways that the different hardware can offload things, the API becomes bloated. Things like our completion structs have started to exceed a cache line in size because of all the elements needed to support this. That in turn shows up heavily in the performance graphs with a noticable drop in performance on 100Gigabit links as our completion structs go from occupying one cache line to 1+. This API makes things like the completion structs modular in a very similar way to netlink so that your structs can only include the items needed for the offloads/features you are actually using on a given queue pair. In that way we support everything, but only use what we need, and our structs stay smaller. The ioctl API is better explained by the posting on linux-api@ than I can explain it here, so I'll just leave it at that. The rest of the pull request is typical stuff. Updates for 4.14 kernel merge window - Lots of hfi1 driver updates (mixed with a few qib and core updates as well) - rxe updates - various mlx updates - Set default roce type to RoCEv2 - Several larger fixes for bnxt_re that were too big for -rc - Several larger fixes for qedr that, likewise, were too big for -rc - Misc core changes - Make the hns_roce driver compilable on arches other than aarch64 so we can more easily debug build issues related to it - Add rdma-netlink infrastructure updates - Add automatic IRQ affinity infrastructure - Add 32bit lid support - Lots of misc fixes across the subsystem from random people - Autoloading of RDMA netlink modules - PCI pool cleanups from Romain Perier - mlx5 driver feature additions and fixes - Hardware tag matchine feature - Fix sleeping in atomic when resolving roce ah - Add experimental ioctl interface as posted to linux-api@" * tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits) IB/core: Expose ioctl interface through experimental Kconfig IB/core: Assign root to all drivers IB/core: Add completion queue (cq) object actions IB/core: Add legacy driver's user-data IB/core: Export ioctl enum types to user-space IB/core: Explicitly destroy an object while keeping uobject IB/core: Add macros for declaring methods and attributes IB/core: Add uverbs merge trees functionality IB/core: Add DEVICE object and root tree structure IB/core: Declare an object instead of declaring only type attributes IB/core: Add new ioctl interface RDMA/vmw_pvrdma: Fix a signedness RDMA/vmw_pvrdma: Report network header type in WC IB/core: Add might_sleep() annotation to ib_init_ah_from_wc() IB/cm: Fix sleeping in atomic when RoCE is used IB/core: Add support to finalize objects in one transaction IB/core: Add a generic way to execute an operation on a uobject Documentation: Hardware tag matching IB/mlx5: Support IB_SRQT_TM net/mlx5: Add XRQ support ...
2017-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Three cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30net/mlx5: Remove the flag MLX5_INTERFACE_STATE_SHUTDOWNHuy Nguyen
MLX5_INTERFACE_STATE_SHUTDOWN is not used in the code. Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5: Skip mlx5_unload_one if mlx5_load_one failsHuy Nguyen
There is an issue where the firmware fails during mlx5_load_one, the health_care timer detects the issue and schedules a health_care call. Then the mlx5_load_one detects the issue, cleans up and quits. Then the health_care starts and calls mlx5_unload_one to clean up the resources that no longer exist and causes kernel panic. The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set after mlx5_load_one fails. The solution is removing the bit MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN is redundant and we can use MLX5_INTERFACE_STATE_UP instead. Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-29net/mlx5: Add XRQ supportArtemy Kovalyov
Add support to new XRQ(eXtended shared Receive Queue) hardware object. It supports SRQ semantics with addition of extended receive buffers topologies and offloads. Currently supports tag matching topology and rendezvouz offload. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24IB/mlx5: Enable UMR for MRs created with reg_createIlya Lesokhin
This patch is the first step in decoupling UMR usage and allocation from the MR cache. The only functional change in this patch is to enables UMR for MRs created with reg_create. This change fixes a bug where ODP memory regions that were not allocated from the MR cache did not have UMR enabled. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24net/mlx5: Remove a leftover unused variableGal Pressman
mlx5_core_wq is no longer being used and should be removed from the code. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-22mlx5: Replace PCI pool old APIRomain Perier
The PCI pool API is deprecated. This commit replaces the PCI pool old API by the appropriate function with the DMA pool API. Signed-off-by: Romain Perier <romain.perier@collabora.com> Reviewed-by: Peter Senna Tschudin <peter.senna@collabora.com> Acked-by: Doug Ledford <dledford@redhat.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10Merge branches '32bit_lid' and 'irq_affinity' into k.o/merge-testDoug Ledford
Conflicts: drivers/infiniband/hw/mlx5/main.c - Both add new code include/rdma/ib_verbs.h - Both add new code Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08mlx5: move affinity hints assignments to generic codeSagi Grimberg
generic api takes care of spreading affinity similar to what mlx5 open coded (and even handles better asymmetric configurations). Ask the generic API to spread affinity for us, and feed him pre_vectors that do not participate in affinity settings (which is an improvement to what we had before). The affinity assignments should match what mlx5 tried to do earlier but now we do not set affinity to async, cmd and pages dedicated vectors. Also, remove mlx5e_get_cpu and introduce mlx5e_get_node (used for allocation purposes) and mlx5_get_vector_affinity (for indirection table construction) as they provide the needed information. Luckily, we have generic helpers to get cpumask and node given a irq vector. mlx5_get_vector_affinity will be used by mlx5_ib in a subsequent patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08mlx5: convert to generic pci_alloc_irq_vectorsSagi Grimberg
Now that we have a generic code to allocate an array of irq vectors and even correctly spread their affinity, correctly handle cpu hotplug events and more, were much better off using it. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-07net/mlx5: Delay events till ib registration endsErez Shitrit
When mlx5_ib registers itself to mlx5_core as an interface, it will call mlx5_add_device which will call mlx5_ib interface add callback, in case the latter successfully returns, only then mlx5_core will add it to the interface list and async events will be forwarded to mlx5_ib. Between mlx5_ib interface add callback and mlx5_core adding the mlx5_ib interface to its devices list, arriving mlx5_core events can be missed by the new mlx5_ib registering interface. In other words: thread 1: mlx5_ib: mlx5_register_interface(dev) thread 1: mlx5_core: mlx5_add_device(dev) thread 1: mlx5_core: ctx = dev->add => (mlx5_ib)->mlx5_ib_add thread 2: mlx5_core_event: **new event arrives, forward to dev_list thread 1: mlx5_core: add_ctx_to_dev_list(ctx) /* previous event was missed by the new interface.*/ It is ok to miss events before dev->add (mlx5_ib)->mlx5_ib_add_device but not after. We fix this race by accumulating the events that come between the ib_register_device (inside mlx5_add_device->(dev->add)) till the adding to the list completes and fire them to the new registering interface after that. Fixes: f1ee87fe55c8 ("net/mlx5: Organize device list API in one place") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-07net/mlx5: Separate between E-Switch and MPFSSaeed Mahameed
Multi-Physical Function Switch (MPFs) is required for when multi-PF configuration is enabled to allow passing user configured unicast MAC addresses to the requesting PF. Before this patch eswitch.c used to manage the HW MPFS l2 table, E-Switch always (regardless of sriov) enabled vport(0) (NIC PF) vport's contexts update on unicast mac address list changes, to populate the PF's MPFS L2 table accordingly. In downstream patch we would like to allow compiling the driver without E-Switch functionalities, for that we move MPFS l2 table logic out of eswitch.c into its own file, and provide Kconfig flag (MLX5_MPFS) to allow compiling out MPFS for those who don't want Multi-PF support. NIC PF netdevice will now directly update MPFS l2 table via the new MPFS API. VF netdevice has no access to MPFS L2 table, so E-Switch will remain responsible of updating its MPFS l2 table on behalf of its VFs. Due to this change we also don't require enabling vport(0) (PF vport) unicast mac changes events anymore, for when SRIOV is not enabled. Which means E-Switch is now activated only on SRIOV activation, and not required otherwise. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Jes Sorensen <jsorensen@fb.com> Cc: kernel-team@fb.com
2017-07-24net/mlx5: Introduce general notification eventMaor Gottlieb
When delay drop timeout is expired, the firmware raises general notification event of DELAY_DROP_TIMEOUT subtype. In addition the feature is disable so the driver have to reactivate the timeout. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24IB/mlx5: Restore IB guid/policy for virtual functionsBodong Wang
When a user sets port_guid, node_guid or policy of an IB virtual function, save this information in "struct mlx5_vf_context". This information will be restored later when pci_resume is called. To make sure this works, one can use aer-inject to generate PCI errors on mlx5 devices and verify if relevant fields are restored after PCI resume. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-03Merge https://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Some overlapping changes in the mlx5 driver. A merge conflict resolution posted by Stephen Rothwell was used as a guide. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-27net/mlx5: FPGA, Add SBU infrastructureIlan Tayari
Add interface to initialize and interact with Innova FPGA SBU connections. A client driver may use these functions to set up a high-speed DMA connection with its SBU hardware logic, and send/receive messages over this connection. A later patch in this patchset will make use of these functions for Innova IPSec offload in mlx5 Ethernet driver. Add commands to retrieve Innova FPGA SBU capabilities, and to read/write Innova FPGA configuration space registers and memory, over internal I2C. At high level, the FPGA configuration space is divided such: 0x00000000 - 0x007fffff is reserved for the SBU 0x00800000 - 0xffffffff is reserved for the Shell 0x400000000 - ... is DDR memory A later patchset will add support for accessing FPGA CrSpace and memory over a high-speed connection. This is the reason for the ACCESS_TYPE enumeration, which currently only supports I2C. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27net/mlx5: Add support for multiple RoCE enableIlan Tayari
Previously, only mlx5_ib enabled RoCE on the port, but FPGA needs it as well. Add support for counting number of enables, so that FPGA and IB can work in parallel and independently. Program the HW to enable RoCE on the first enable call, and program to disable RoCE on the last disable call. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27net/mlx5: Add reserved-gids supportIlan Tayari
Reserved GIDs are entries in the GID table in use by the mlx5_core and its submodules (e.g. FPGA, SRIOV, E-Swtich, netdev). The entries are reserved at the high indexes of the GID table. A mlx5 submodule may reserve a certain amount of GIDs for its own use during the load sequence by calling mlx5_core_reserve_gids, and must also take care to un-reserve these GIDs when it closes. Reservation is only allowed during the load sequence and before any interfaces (e.g. mlx5_ib or mlx5_en) are up. After reservation, a submodule may call mlx5_core_reserved_gid_alloc/ free to allocate entries from the reserved GIDs pool. Reserve a GID table entry for every supported FPGA QP. A later patch in the patchset will remove them from being reported to IB core. Another such patch will make use of these for FPGA QPs in Innova NIC. Added lib/mlx5.h to serve as a library for mlx5 submodlues, and to expose only public mlx5 API, more mlx5 library files will be added in future submissions. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27net/mlx5: Cancel delayed recovery work when unloading the driverMohamad Haj Yahia
Draining the health workqueue will ignore future health works including the one that report hardware failure and thus we can't enter error state Instead cancel the recovery flow and make sure only recovery flow won't be scheduled. Fixes: 5e44fca50470 ('net/mlx5: Only cancel recovery work when cleaning up device') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22net/mlx5: Add MCC (Management Component Control) register definitionsOr Gerlitz
MCC (Management Component Control) allows to control a firmware component update. MCDA (Management Component Data Access) allows to read and write a firmware component. MCQI (Management Component Query Information) allows to query information about firmware components. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16net/mlx5: Expose command polling interfaceMajd Dibbiny
Add a new interface for commands execution that allows the caller to wait for the command's completion in a busy-wait loop (polling mode). This is useful if we want to execute a command in a polling mode while the driver is working in events mode for the rest of the commands. This interface will be used in the downstream patches. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Overlapping changes in drivers/net/phy/marvell.c, bug fix in 'net' restricting a HW workaround alongside cleanups in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-23net/mlx5: Avoid using pending command interface slotsMohamad Haj Yahia
Currently when firmware command gets stuck or it takes long time to complete, the driver command will get timeout and the command slot is freed and can be used for new commands, and if the firmware receive new command on the old busy slot its behavior is unexpected and this could be harmful. To fix this when the driver command gets timeout we return failure, but we don't free the command slot and we wait for the firmware to explicitly respond to that command. Once all the entries are busy we will stop processing new firmware commands. Fixes: 9cba4ebcf374 ('net/mlx5: Fix potential deadlock in command mode change') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14net/mlx5: FPGA, Add basic support for InnovaIlan Tayari
Mellanox Innova is a NIC with ConnectX and an FPGA on the same board. The FPGA is a bump-on-the-wire and thus affects operation of the mlx5_core driver on the ConnectX ASIC. Add basic support for Innova in mlx5_core. This allows using the Innova card as a regular NIC, by detecting the FPGA capability bit, and verifying its load state before initializing ConnectX interfaces. Also detect FPGA fatal runtime failures and enter error state if they ever happen. All new FPGA-related logic is placed in its own subdirectory 'fpga', which may be built by selecting CONFIG_MLX5_FPGA. This prepares for further support of various Innova features in later patchsets. Additional details about hardware architecture will be provided as more features get submitted. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14net/mlx5: Introduce trigger_health_work functionIlan Tayari
Introduce new function for entering bad-health state. This function will be called from FPGA-related logic in a later patch from asynchronous event (IRQ) context, for that we change the spin lock to an IRQ-safe one. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14{net, IB}/mlx5: Replace mlx5_vzalloc with kvzallocLeon Romanovsky
Commit a7c3e901a46f ("mm: introduce kv[mz]alloc helpers") added proper implementation of mlx5_vzalloc function to the MM core. This made the mlx5_vzalloc function useless, so let's remove it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-08Merge tags 'for-linus' and 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "As mentioned in my first pull request, this is the subsequent pull requests I had. This is all I have, and in fact this cleans out the RDMA subsystem's entire patchworks queue of kernel changes that are ready to go (well, it did for the weekend anyway, a few new patches are in, but they'll be coming during the -rc cycle). The first tag contains a single patch that would have conflicted if taken from my tree or DaveM's tree as it needed our trees merged to come cleanly. The second tag contains the patch series from Intel plus three other stragllers that came in late last week. I took them because it allowed me to legitimately claim that the RDMA patchworks queue was, for a short time, 100% cleared of all waiting kernel patches, woohoo! :-). I have it under my for-next tag, so it did get 0day and linux- next over the end of last week, and linux-next did show one minor conflict. Summary: 'for-linus' tag: - mlx5/IPoIB fixup patch 'for-next' tag: - the hfi1 15 patch set that landed late - IPoIB get_link_ksettings which landed late because I asked for a respin - one late rxe change - one -rc worthy fix that's in early" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/mlx5: Enable IPoIB acceleration * tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: rxe: expose num_possible_cpus() cnum_comp_vectors IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type IB/hfi1: Clean up on context initialization failure IB/hfi1: Fix an assign/ordering issue with shared context IDs IB/hfi1: Clean up context initialization IB/hfi1: Correctly clear the pkey IB/hfi1: Search shared contexts on the opened device, not all devices IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit IB/hfi1: Use filedata rather than filepointer IB/hfi1: Name function prototype parameters IB/hfi1: Fix a subcontext memory leak IB/hfi1: Return an error on memory allocation failure IB/hfi1: Adjust default eager_buffer_size to 8MB IB/hfi1: Get rid of divide when setting the tx request header IB/hfi1: Fix yield logic in send engine IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line IB/hfi1: Fix checks for Offline transient state IB/ipoib: add get_link_ksettings in ethtool
2017-05-08treewide: use kv[mz]alloc* rather than opencoded variantsMichal Hocko
There are many code paths opencoding kvmalloc. Let's use the helper instead. The main difference to kvmalloc is that those users are usually not considering all the aspects of the memory allocator. E.g. allocation requests <= 32kB (with 4kB pages) are basically never failing and invoke OOM killer to satisfy the allocation. This sounds too disruptive for something that has a reasonable fallback - the vmalloc. On the other hand those requests might fallback to vmalloc even when the memory allocator would succeed after several more reclaim/compaction attempts previously. There is no guarantee something like that happens though. This patch converts many of those places to kv[mz]alloc* helpers because they are more conservative. Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390 Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim Acked-by: David Sterba <dsterba@suse.com> # btrfs Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4 Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5 Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Santosh Raspatur <santosh@chelsio.com> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Yishai Hadas <yishaih@mellanox.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: "Yan, Zheng" <zyan@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-04IB/mlx5: Enable IPoIB accelerationErez Shitrit
Enable mlx5 IPoIB acceleration by declaring mlx5_ib_{alloc,free}_rdma_netdev and assigning the mlx5 IPoIB rdma_netdev callbacks. In addition, this patch brings in sync mlx5's IPoIB parts for net and IB trees. As a precaution, we disabled IPoIB acceleration by default (in the mlx5_core Kconfig file). Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-30net/mlx5e: Update neighbour 'used' state using HW flow rules countersHadar Hen Zion
When IP tunnel encapsulation rules are offloaded, the kernel can't see the traffic of the offloaded flow. The neighbour for the IP tunnel destination of the offloaded flow can mistakenly become STALE and deleted by the kernel since its 'used' value wasn't changed. To make sure that a neighbour which is used by the HW won't become STALE, we proactively update the neighbour 'used' value every DELAY_PROBE_TIME period, when packets were matched and counted by the HW for one of the tunnel encap flows related to this neighbour. The periodic task that updates the used neighbours is scheduled when a tunnel encap rule is successfully offloaded into HW and keeps re-scheduling itself as long as the representor's neighbours list isn't empty. Add, remove, lookup and status change operations done over the representor's neighbours list or the neighbour hash entry encaps list are all serialized by RTNL lock. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>