summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx5/core/main.c
AgeCommit message (Collapse)Author
2019-02-14net/mlx5: Update enable HCA dependencyBodong Wang
With the introduction of ECPF, we require that the ECPF driver will aways call enable/disable HCA for that PF in the same way a PF does this for its VFs. The PF is still responsible for calling enable and disable HCA for its VFs. To distinguish between the ECPF executing enable/disable HCA for itself or for the PF, it sets the embedded CPU function bit in the input params struct of these commands. When the bit is cleared and function ID is zero, it refers to the peer PF. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14net/mlx5: Introduce Mellanox SmartNIC and modify page management logicBodong Wang
Mellanox's SmartNIC combines embedded CPU(e.g, ARM) processing power with advanced network offloads to accelerate a multitude of security, networking and storage applications. With the introduction of the SmartNIC, there is a new PCI function called Embedded CPU Physical Function(ECPF). And it's possible for a PF to get its ICM pages from the ECPF PCI function. Driver shall identify if it is running on such a function by reading a bit in the initialization segment. When firmware asks for pages, it would issue a page request event specifying how many pages it requests and for which function. That driver responds with a manage_pages command providing the requested pages along with an indication for which function it is providing these pages. The encoding before this patch was as follows: function_id == 0: pages are requested for the function receiving the EQE. function_id != 0: pages are requested for VF identified by the function_id value A new one bit field in the EQE identifies that pages are requested for the ECPF. The notion of page_supplier can be introduced here and to support that, manage pages and query pages were modified so firmware can distinguish the following cases: 1. Function provides pages for itself 2. PF provides pages for its VF 3. ECPF provides pages to itself 4. ECPF provides pages for another function This distinction is possible through the introduction of the bit "embedded_cpu_function" in query_pages, manage_pages and page request EQE. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13net/mlx5: Align ODP capability function with netdev coding styleLeon Romanovsky
Update newly introduced function to be aligned to netdev coding style. Fixes: 46861e3e88be ("net/mlx5: Set ODP SRQ support in firmware") Reported-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-02-03net/mlx5: Set ODP SRQ support in firmwareMoni Shoua
To avoid compatibility issue with older kernels the firmware doesn't allow SRQ to work with ODP unless kernel asks for it. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-01-24net/mlx5: Add pci AtomicOps requestMichael Guralnik
Calling pci_enable_atomic_ops_to_root enables AtomicOp requests to pci root port. AtomicOp requests will be enabled only if the completer and all intermediate pci bridges support PCI atomic operations. This, together with appropriate settings in the NVCONFIG should enable PCI atomic operations on the device. PCI atomic operations were first introduced in PCI Express Base Specification 2.1. The Supported operations are Swap (Unconditional Swap), CAS (Compare and Swap) and FetchAdd (Fetch and Add). Unlike other atomic operation modes PCI atomic operations gives the user the option to do atomic operations on local memory, without involving verbs api, without it compromising the operation's atomicity. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This has been a fairly typical cycle, with the usual sorts of driver updates. Several series continue to come through which improve and modernize various parts of the core code, and we finally are starting to get the uAPI command interface cleaned up. - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5, qib, rxe, usnic - Rework the entire syscall flow for uverbs to be able to run over ioctl(). Finally getting past the historic bad choice to use write() for command execution - More functional coverage with the mlx5 'devx' user API - Start of the HFI1 series for 'TID RDMA' - SRQ support in the hns driver - Support for new IBTA defined 2x lane widths - A big series to consolidate all the driver function pointers into a big struct and have drivers provide a 'static const' version of the struct instead of open coding initialization - New 'advise_mr' uAPI to control device caching/loading of page tables - Support for inline data in SRPT - Modernize how umad uses the driver core and creates cdev's and sysfs files - First steps toward removing 'uobject' from the view of the drivers" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits) RDMA/srpt: Use kmem_cache_free() instead of kfree() RDMA/mlx5: Signedness bug in UVERBS_HANDLER() IB/uverbs: Signedness bug in UVERBS_HANDLER() IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported IB/umad: Start using dev_groups of class IB/umad: Use class_groups and let core create class file IB/umad: Refactor code to use cdev_device_add() IB/umad: Avoid destroying device while it is accessed IB/umad: Simplify and avoid dynamic allocation of class IB/mlx5: Fix wrong error unwind IB/mlx4: Remove set but not used variable 'pd' RDMA/iwcm: Don't copy past the end of dev_name() string IB/mlx5: Fix long EEH recover time with NVMe offloads IB/mlx5: Simplify netdev unbinding IB/core: Move query port to ioctl RDMA/nldev: Expose port_cap_flags2 IB/core: uverbs copy to struct or zero helper IB/rxe: Reuse code which sets port state IB/rxe: Make counters thread safe IB/mlx5: Use the correct commands for UMEM and UCTX allocation ...
2018-12-20mlx5: extend PTP gettime function to read system clockMiroslav Lichvar
Read the system time right before and immediately after reading the low register of the internal timer. This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl. Cc: Richard Cochran <richardcochran@gmail.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-17net/mlx5: Continue driver initialization despite debugfs failureLeon Romanovsky
The failure to create debugfs entry is unpleasant event, but not enough to abort drier initialization. Align the mlx5_core code to debugfs design and continue execution whenever debugfs_create_dir() successes or not. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14net/mlx5: Introduce inter-device communication mechanismAviv Heller
This introduces devcom, a generic mechanism for performing operations on both physical functions of the same Connect-X card. The first user of this API is merged eswitch, which will be introduced in subsequent patches. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-04RDMA/mlx5: Initialize SRQ tables on mlx5_ibLeon Romanovsky
Transfer initialization and cleanup from mlx5_priv struct of mlx5_core_dev to be part of mlx5_ib_dev. This completes removal of SRQ from mlx5_core. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-29net/mlx5: Remove unused events callback and logicSaeed Mahameed
The mlx5_interface->event callback is not used by mlx5e/mlx5_ib anymore. We totally remove the delayed events logic work around, since with the dynamic notifier registration API it is not needed anymore, mlx5_ib can register its notifier and start receiving events exactly at the moment it is ready to handle them. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: Device events, Use async events chainSaeed Mahameed
Move all the generic async events handling into new specific events handling file events.c to keep eq.c file clean from concrete event logic handling. Use new API to register for NOTIFY_ANY to handle generic events and dispatch allowed events to mlx5_core consumers (mlx5_ib and mlx5e) Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: FWPage, Use async events chainSaeed Mahameed
Remove the explicit call to mlx5_core_req_pages_handler on MLX5_EVENT_TYPE_PAGE_REQUEST and let FW page logic to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20{net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMASaeed Mahameed
Use the new generic EQ API to move all ODP RDMA data structures and logic form mlx5 core driver into mlx5_ib driver. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Different EQ typesSaeed Mahameed
In mlx5 we have three types of usages for EQs, 1. Asynchronous EQs, used internally by mlx5 core for a. FW command completions b. FW page requests c. one EQ for all other Asynchronous events 2. Completion EQs, used for CQ completion (we create one per core) 3. *Special type of EQ (page fault) used for RDMA on demand paging (ODP). *The 3rd type shouldn't be special at least in mlx5 core, it is yet another async events EQ with specific use case, it will be removed in the next two patches, and will completely move its logic to mlx5_ib, as it is rdma specific. In this patch we remove use case (eq type) specific fields from struct mlx5_eq into a new eq type specific structures. struct mlx5_eq_async; truct mlx5_eq_comp; struct mlx5_eq_pagefault; Separate between their type specific flows. In the future we will allow users to create there own generic EQs. for now we will allow only one for ODP in next patches. We will introduce event listeners registration API for those who want to receive mlx5 async events. After that mlx5 eq handling will be clean from feature/user specific handling. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Privatize eq_table and friendsSaeed Mahameed
Move unnecessary EQ table structures and declaration from the public include/linux/mlx5/driver.h into the private area of mlx5_core and into eq.c/eq.h. Introduce new mlx5 EQ APIs: mlx5_comp_vectors_count(dev); mlx5_comp_irq_get_affinity_mask(dev, vector); And use them from mlx5_ib or mlx5e netdevice instead of direct access to mlx5_core internal structures. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Create all EQs in one placeSaeed Mahameed
Instead of creating the EQ table in three steps at driver load, - allocate irq vectors - allocate async EQs - allocate completion EQs Gather all of the procedures into one function in eq.c and call it from driver load. This will help us reduce the EQ and EQ table private structures visibility to eq.c in downstream refactoring. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Move all EQ logic to eq.cSaeed Mahameed
Move completion EQs flows from main.c to eq.c, reasons: 1) It is where this logic belongs. 2) It will help centralize the EQ logic in one file for downstream refactoring, and future extensions/updates. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Remove redundant completion EQ list lockSaeed Mahameed
Completion EQs list is only modified on driver load/unload, locking is not required, remove it. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, No need to store eq index as a fieldSaeed Mahameed
eq->index is used only for completion EQs and is assigned to be the completion eq index, it is used only when traversing the completion eqs list, and it can be calculated dynamically, thus remove the eq->index field. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Use the right place to store/read IRQ affinity hintSaeed Mahameed
Currently the cpu affinity hint mask for completion EQs is stored and read from the wrong place, since reading and storing is done from the same index, there is no actual issue with that, but internal irq_info for completion EQs stars at MLX5_EQ_VEC_COMP_BASE offset in irq_info array, this patch changes the code to use the correct offset to store and read the IRQ affinity hint. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-10-03net/mlx5: Add Fast teardown supportFeras Daoud
Today mlx5 devices support two teardown modes: 1- Regular teardown 2- Force teardown This change introduces the enhanced version of the "Force teardown" that allows SW to perform teardown in a faster way without the need to reclaim all the pages. Fast teardown provides the following advantages: 1- Fix a FW race condition that could cause command timeout 2- Avoid moving to polling mode 3- Close the vport to prevent PCI ACK to be sent without been scatter to memory Signed-off-by: Feras Daoud <ferasda@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Fix debugfs cleanup in the device init/remove flowJack Morgenstein
When initializing the device (procedure init_one), the driver calls mlx5_pci_init to perform pci initialization. As part of this initialization, mlx5_pci_init creates a debugfs directory. If this creation fails, init_one aborts, returning failure to the caller (which is the probe method caller). The main reason for such a failure to occur is if the debugfs directory already exists. This can happen if the last time mlx5_pci_close was called, debugfs_remove (silently) failed due to the debugfs directory not being empty. Guarantee that such a debugfs_remove failure will not occur by instead calling debugfs_remove_recursive in procedure mlx5_pci_close. Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware and software flows") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Fix use-after-free in self-healing flowJack Morgenstein
When the mlx5 health mechanism detects a problem while the driver is in the middle of init_one or remove_one, the driver needs to prevent the health mechanism from scheduling future work; if future work is scheduled, there is a problem with use-after-free: the system WQ tries to run the work item (which has been freed) at the scheduled future time. Prevent this by disabling work item scheduling in the health mechanism when the driver is in the middle of init_one() or remove_one(). Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-08-08net/mlx5: Use max_num_eqs for calculation of required MSIX vectorsDenis Drozdov
New firmware has defined new HCA capability field called "max_num_eqs", that is the number of available EQs after subtracting reserved FW EQs. Before this capability the FW reported the EQ number in "log_max_eqs", the reported value also contained FW reserved EQs, but the driver might be failing to load on 320 cpus systems due to the fact that FW reserved EQs were not available to the driver. Now the driver has to obtain max_num_eqs value from new FW to get real number of EQs available. Signed-off-by: Denis Drozdov <denisd@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net/mlx5e: Vxlan, move vxlan logic to core driverSaeed Mahameed
Move vxlan logic and objects to mlx5 core dirver. Since it going to be used from different mlx5 interfaces. e.g. mlx5e PF NIC netdev and mlx5e E-Switch representors. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2018-07-23net/mlx5: FW tracer, Enable tracingFeras Daoud
Add the tracer file to the makefile and add the init function to the load one flow. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18net/mlx5: Fix tristate and description for MLX5 moduleEran Ben Elisha
Current description did not include new devices. Fix that by proving the correct generic description. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The bpf syscall and selftests conflicts were trivial overlapping changes. The r8169 change involved moving the added mdelay from 'net' into a different function. A TLS close bug fix overlapped with the splitting of the TLS state into separate TX and RX parts. I just expanded the tests in the bug fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf == X". Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-10net/mlx5: Free IRQs in shutdown pathDaniel Jurgens
Some platforms require IRQs to be free'd in the shutdown path. Otherwise they will fail to be reallocated after a kexec. Fixes: 8812c24d28f4 ("net/mlx5: Add fast unload support in shutdown flow") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-01net/mlx5: Accel, Add TLS tx offload interfaceIlya Lesokhin
Add routines for manipulating TLS TX offload contexts. In Innova TLS, TLS contexts are added or deleted via a command message over the SBU connection. The HW then sends a response message over the same connection. Add implementation for Innova TLS (FPGA-based) hardware. These routines will be used by the TLS offload support in a later patch mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs to work directly with mlx5_core rather than Innova FPGA or other mlx5 acceleration providers. In the future, when IPSec/TLS or any other acceleration gets integrated into ConnectX chip, mlx5/accel layer will provide the integrated acceleration, rather than the Innova one. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-06Merge tag 'pci-v4.17-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - move pci_uevent_ers() out of pci.h (Michael Ellerman) - skip ASPM common clock warning if BIOS already configured it (Sinan Kaya) - fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva) - remove last user of pci_get_bus_and_slot() and the function itself (Sinan Kaya) - add decoding for 16 GT/s link speed (Jay Fang) - add interfaces to get max link speed and width (Tal Gilboa) - add pcie_bandwidth_capable() to compute max supported link bandwidth (Tal Gilboa) - add pcie_bandwidth_available() to compute bandwidth available to device (Tal Gilboa) - add pcie_print_link_status() to log link speed and whether it's limited (Tal Gilboa) - use PCI core interfaces to report when device performance may be limited by its slot instead of doing it in each driver (Tal Gilboa) - fix possible cpqphp NULL pointer dereference (Shawn Lin) - rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI hotplug (Mika Westerberg) - add support for PCI I/O port space that's neither directly accessible via CPU in/out instructions nor directly mapped into CPU physical memory space. This is fairly intrusive and includes minor changes to interfaces used for I/O space on most platforms (Zhichang Yuan, John Garry) - add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan, John Garry) - use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas) - remove possible NULL pointer dereference in of_pci_bus_find_domain_nr() (Shawn Lin) - report quirk timings with dev_info (Bjorn Helgaas) - report quirks that take longer than 10ms (Bjorn Helgaas) - add and use Altera Vendor ID (Johannes Thumshirn) - tidy Makefiles and comments (Bjorn Helgaas) - don't set up INTx if MSI or MSI-X is enabled to align cris, frv, ia64, and mn10300 with x86 (Bjorn Helgaas) - move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick Lawler) - merge pcieport_if.h into portdrv.h (Bjorn Helgaas) - move workaround for BIOS PME issue from portdrv to PCI core (Bjorn Helgaas) - completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas) - remove portdrv link order dependency (Bjorn Helgaas) - remove support for unused VC portdrv service (Bjorn Helgaas) - simplify portdrv feature permission checking (Bjorn Helgaas) - remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn Helgaas) - remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas) - use cached AER capability offset (Frederick Lawler) - don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg) - rename pcie-dpc.c to dpc.c (Bjorn Helgaas) - use generic pci_mmap_resource_range() instead of powerpc and xtensa arch-specific versions (David Woodhouse) - support arbitrary PCI host bridge offsets on sparc (Yinghai Lu) - remove System and Video ROM reservations on sparc (Bjorn Helgaas) - probe for device reset support during enumeration instead of runtime (Bjorn Helgaas) - add ACS quirk for Ampere (née APM) root ports (Feng Kan) - add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas Vincent-Cross) - protect device restore with device lock (Sinan Kaya) - handle failure of FLR gracefully (Sinan Kaya) - handle CRS (config retry status) after device resets (Sinan Kaya) - skip various config reads for SR-IOV VFs as an optimization (KarimAllah Ahmed) - consolidate VPD code in vpd.c (Bjorn Helgaas) - add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann) - add DT support for R-Car r8a7743 (Biju Das) - fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host bridge driver that causes a general protection fault (Dexuan Cui) - fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV (Dexuan Cui) - fix Hyper-V host bridge hang when ejecting a VF before setting up MSI (Dexuan Cui) - make several structures static (Fengguang Wu) - increase number of MSI IRQs supported by Synopsys DesignWare bridges from 32 to 256 (Gustavo Pimentel) - implemented multiplexed IRQ domain API and remove obsolete MSI IRQ API from DesignWare drivers (Gustavo Pimentel) - add Tegra power management support (Manikanta Maddireddy) - add Tegra loadable module support (Manikanta Maddireddy) - handle 64-bit BARs correctly in endpoint support (Niklas Cassel) - support optional regulator for HiSilicon STB (Shawn Guo) - use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla) - support power supplies for Qualcomm msm8996 (Srinivas Kandagatla) * tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits) MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver HISI LPC: Add ACPI support ACPI / scan: Do not enumerate Indirect IO host children ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings of: Add missing I/O range exception for indirect-IO devices PCI: Apply the new generic I/O management on PCI IO hosts PCI: Add fwnode handler as input param of pci_register_io_range() PCI: Remove __weak tag from pci_register_io_range() MAINTAINERS: Add missing /drivers/pci/cadence directory entry fm10k: Report PCIe link properties with pcie_print_link_status() net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth net/mlx5: Report PCIe link properties with pcie_print_link_status() net/mlx4_core: Report PCIe link properties with pcie_print_link_status() PCI: Add pcie_print_link_status() to log link speed and whether it's limited PCI: Add pcie_bandwidth_available() to compute bandwidth available to device misc: pci_endpoint_test: Handle 64-bit BARs properly PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar ...
2018-04-03net/mlx5: Report PCIe link properties with pcie_print_link_status()Tal Gilboa
Use pcie_print_link_status() to report PCIe link speed and possible limitations. Signed-off-by: Tal Gilboa <talgi@mellanox.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2018-03-08Merge tag 'mlx5-updates-2018-02-28-2' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-updates-2018-02-28-2 (IPSec-2) This series follows our previous one to lay out the foundations for IPSec in user-space and extend current kernel netdev IPSec support. As noted in our previous pull request cover letter "mlx5-updates-2018-02-28-1 (IPSec-1)", the IPSec mechanism will be supported through our flow steering mechanism. Therefore, we need to change the initialization order. Furthermore, IPsec is also supported in both egress and ingress. Since our current flow steering is egress only, we add an empty (only implemented through FPGA steering ops) egress namespace to handle that case. We also implement the required flow steering callbacks and logic in our FPGA driver. We extend the FPGA support for ESN and modifying a xfrm too. Therefore, we add support for some new FPGA command interface that supports them. The other required bits are added too. The new features and requirements are advertised via cap bits. Last but not least, we revise our driver's accel_esp API. This API will be shared between our netdev and IB driver, so we need to have all the required functionality from both worlds. Regards, Aviad and Matan ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07net/mlx5: Add flow-steering commands for FPGA IPSec implementationAviad Yehezkel
In order to add a context to the FPGA, we need to get both the software transform context (which includes the keys, etc) and the source/destination IPs (which are included in the steering rule). Therefore, we register new set of firmware like commands for the FPGA. Each time a rule is added, the steering core infrastructure calls the FPGA command layer. If the rule is intended for the FPGA, it combines the IPs information with the software transformation context and creates the respective hardware transform. Afterwards, it calls the standard steering command layer. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07Merge tag 'mlx5-updates-2018-02-28-1' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-updates-2018-02-28-1 (IPSec-1) This series consists of some fixes and refactors for the mlx5 drivers, especially around the FPGA and flow steering. Most of them are trivial fixes and are the foundation of allowing IPSec acceleration from user-space. We use flow steering abstraction in order to accelerate IPSec packets. When a user creates a steering rule, [s]he states that we'll carry an encrypt/decrypt flow action (using a specific configuration) for every packet which conforms to a certain match. Since currently offloading these packets is done via FPGA, we'll add another set of flow steering ops. These ops will execute the required FPGA commands and then call the standard steering ops. In order to achieve this, we need that the commands will get all the required information. Therefore, we pass the fte object and embed the flow_action struct inside the fte. In addition, we add the shim layer that will later be used for alternating between the standard and the FPGA steering commands. Some fixes, like " net/mlx5e: Wait for FPGA command responses with a timeout" are very relevant for user-space applications, as these applications could be killed, but we still want to wait for the FPGA and update the kernel's database. Regards, Aviad and Matan ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-06net/mlx5: FPGA and IPSec initialization to be before flow steeringMatan Barak
Some flow steering namespace initialization (i.e. egress namespace) might depend on FPGA capabilities. Changing the initialization order such that the FPGA will be initialized before flow steering. Flow steering fs cmds initialization might depend on IPSec capabilities. Changing the initialization order such that the IPSec will be initialized before flow steering as well. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-02-20net/mlx5: Use 128B cacheline size for 128B or larger cachelinesDaniel Jurgens
The adapter uses the cache_line_128byte setting to set the bounds for end padding. On systems where the cacheline size is greater than 128B use 128B instead of the default of 64B. This results in fewer partial cacheline writes. There's a 50% chance it will pad to the end of a 256B cache line vs only 25% when using 64B. Fixes: f32f5bd2eb7e ("net/mlx5: Configure cache line size for start and end padding") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-15net/mlx5: CQ Database per EQSaeed Mahameed
Before this patch the driver had one CQ database protected via one spinlock, this spinlock is meant to synchronize between CQ adding/removing and CQ IRQ interrupt handling. On a system with large number of CPUs and on a work load that requires lots of interrupts, this global spinlock becomes a very nasty hotspot and introduces a contention between the active cores, which will significantly hurt performance and becomes a bottleneck that prevents seamless cpu scaling. To solve this we simply move the CQ database and its spinlock to be per EQ (IRQ), thus per core. Tested with: system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores netperf command: ./super_netperf 200 -P 0 -t TCP_RR -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M WITHOUT THIS PATCH: Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle Average: all 4.32 0.00 36.15 0.09 0.00 34.02 0.00 0.00 0.00 25.41 Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271 Overhead Command Shared Object Symbol + 14.28% swapper [kernel.vmlinux] [k] intel_idle + 12.25% swapper [kernel.vmlinux] [k] queued_spin_lock_slowpath + 10.29% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath + 1.32% netserver [kernel.vmlinux] [k] mlx5e_xmit WITH THIS PATCH: Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle Average: all 4.27 0.00 34.31 0.01 0.00 18.71 0.00 0.00 0.00 42.69 Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483 Overhead Command Shared Object Symbol + 23.33% swapper [kernel.vmlinux] [k] intel_idle + 1.69% netserver [kernel.vmlinux] [k] mlx5e_xmit Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-01-30Merge tag v4.15 of ↵Jason Gunthorpe
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git To resolve conflicts in: drivers/infiniband/hw/mlx5/main.c drivers/infiniband/hw/mlx5/qp.c From patches merged into the -rc cycle. The conflict resolution matches what linux-next has been carrying. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-12net/mlx5: Fix error handling in load oneMaor Gottlieb
We didn't store the result of mlx5_init_once, due to that mlx5_load_one returned success on error. Fix that. Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware and software flows") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12net/mlx5: Fix mlx5_get_uars_page to return error codeEran Ben Elisha
Change mlx5_get_uars_page to return ERR_PTR in case of allocation failure. Change all callers accordingly to check the IS_ERR(ptr) instead of NULL. Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware and software flows") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12net/mlx5: Fix memory leak in bad flow of mlx5_alloc_irq_vectorsAlaa Hleihel
Fix a memory leak where in case that pci_alloc_irq_vectors failed, priv->irq_info was not released. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12{net,ib}/mlx5: Don't disable local loopback multicast traffic when neededEran Ben Elisha
There are systems platform information management interfaces (such as HOST2BMC) for which we cannot disable local loopback multicast traffic. Separate disable_local_lb_mc and disable_local_lb_uc capability bits so driver will not disable multicast loopback traffic if not supported. (It is expected that Firmware will not set disable_local_lb_mc if HOST2BMC is running for example.) Function mlx5_nic_vport_update_local_lb will do best effort to disable/enable UC/MC loopback traffic and return success only in case it succeeded to changed all allowed by Firmware. Adapt mlx5_ib and mlx5e to support the new cap bits. Fixes: 2c43c5a036be ("net/mlx5e: Enable local loopback in loopback selftest") Fixes: c85023e153e3 ("IB/mlx5: Add raw ethernet local loopback support") Fixes: bded747bb432 ("net/mlx5: Add raw ethernet local loopback firmware command") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-08net/mlx5: Set num_vhca_ports capabilityDaniel Jurgens
Set the current capability to the max capability. Doing so enables dual port RoCE functionality if supported by the firmware. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08net/mlx5: Set software owner ID during init HCADaniel Jurgens
Generate a unique 128bit identifier for each host and pass that value to firmware in the INIT_HCA command if it reports the sw_owner_id capability. Each device bound to the mlx5_core driver will have the same software owner ID. In subsequent patches mlx5_core devices will be bound via a new VPort command so that they can operate together under a single InfiniBand device. Only devices that have the same software owner ID can be bound, to prevent traffic intended for one host arriving at another. The INIT_HCA command length was expanded by 128 bits. The command length is provided as an input FW commands. Older FW does not have a problem receiving this command in the new longer form. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08net/mlx5: Enable DC transportMoni Shoua
Enable DC transport in the firmware to provide its functionality. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-19Revert "mlx5: move affinity hints assignments to generic code"Saeed Mahameed
Before the offending commit, mlx5 core did the IRQ affinity itself, and it seems that the new generic code have some drawbacks and one of them is the lack for user ability to modify irq affinity after the initial affinity values got assigned. The issue is still being discussed and a solution in the new generic code is required, until then we need to revert this patch. This fixes the following issue: echo <new affinity> > /proc/irq/<x>/smp_affinity fails with -EIO This reverts commit a435393acafbf0ecff4deb3e3cb554b34f0d0664. Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since it is used in mlx5_ib driver. Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code") Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jes Sorensen <jsorensen@fb.com> Reported-by: Jes Sorensen <jsorensen@fb.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller