2019-02-15 RDMA/nes: Remove useless usecnt variable and redundant memset (Leon Romanovsky)
The internal design of RDMA/core ensures that dealloc_ucontext will be called only if alloc_ucontext succeeded, hence there is no need for an internal variable to mark the validity of the ucontext. As part of this change, remove the redundant memset too. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15 IB/hfi1: Fix a build warning for TID RDMA READ (Kaike Wan)
The following build warning was produced for the TID RDMA READ patch ("IB/hfi1: Enable TID RDMA READ protocol"): drivers/infiniband/hw/hfi1/qp.c: In function 'hfi1_setup_wqe': drivers/infiniband/hw/hfi1/qp.c:328:3: warning: this statement may fall through [-Wimplicit-fallthrough=] hfi1_setup_tid_rdma_wqe(qp, wqe); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/hfi1/qp.c:329:2: note: here case IB_QPT_UC: ^~~~ This patch will fix the issue by adding the "fall through" comment. Fixes: f1ab4efa6d32 ("IB/hfi1: Enable TID RDMA READ protocol") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
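For context, the warning points at an unannotated fall-through between the two cases named in the compiler output; a minimal sketch of the fixed shape (surrounding cases abridged, function names taken from the warning text):

    switch (qp->ibqp.qp_type) {
    case IB_QPT_RC:
            hfi1_setup_tid_rdma_wqe(qp, wqe);
            /* fall through */
    case IB_QPT_UC:
            /* ... common RC/UC handling ... */
            break;
    }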
2019-02-15 RDMA/uverbs: Fix an error flow in ib_uverbs_poll_cq (Jason Gunthorpe)
The new output_written block was wrongly placed before the ret=0, causing the error code to be lost. uverbs_output_written is not expected to fail, and even if it does fail it has no significant impact on the userspace flow. Reported-by: Bart Van Assche <bvanassche@acm.org> Fixes: d6f4a21f309d ("RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
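Sketched generically (mark_output_written() is a stand-in for uverbs_output_written; the exact uverbs code differs), the fix keeps success-path bookkeeping after the last point an error code can be set, so error exits are never overwritten:

    if (copy_to_user(response, &resp, sizeof(resp))) {
            ret = -EFAULT;
            goto out_put;   /* the error code must survive to the return */
    }
    ret = 0;
    mark_output_written(attrs);     /* success-only bookkeeping */
out_put:
    return ret;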
2019-02-15 IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs (Shamir Rabinovitch)
Now that we have the udata passed to all the ib_xxx object creation APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from the ib_udata stored in uverbs_attr_bundle, we can finally start to remove the drivers' dependency on ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15 IB/verbs: Add helper function rdma_udata_to_drv_context (Shamir Rabinovitch)
Add a helper function to get the driver's context out of the ib_udata wrapped in uverbs_attr_bundle for user objects, or NULL for kernel objects. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
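A typical call site looks like the following sketch (the mlx5 ucontext type is used purely for illustration):

    struct mlx5_ib_ucontext *context = rdma_udata_to_drv_context(
            udata, struct mlx5_ib_ucontext, ibucontext);
    /* context is the driver ucontext for user objects, NULL for kernel objects */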
2019-02-15 IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows (Shamir Rabinovitch)
Add ib_ucontext to the uverbs_attr_bundle sent down the ioctl and cmd flows as soon as the flow has an ib_uobject. In addition, remove the rdma_get_ucontext helper function, which is only used by ib_umem_get. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-14 IB/ipoib: Use __func__ instead of function's name (Erez Alfasi)
Changed debug statements to use %s and __func__ instead of hard-coded function names. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
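For example (the message text here is illustrative, not from the patch):

    /* before: breaks silently if the function is ever renamed */
    ipoib_dbg(priv, "ipoib_mcast_join: joining multicast group\n");
    /* after: always prints the current function name */
    ipoib_dbg(priv, "%s: joining multicast group\n", __func__);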
2019-02-14 RDMA/iwpm: Remove set but not used variable 'msg_seq' (YueHaibing)
Fixes gcc '-Wunused-but-set-variable' warning: drivers/infiniband/core/iwpm_util.c: In function 'iwpm_send_hello': drivers/infiniband/core/iwpm_util.c:811:6: warning: variable 'msg_seq' set but not used [-Wunused-but-set-variable] It has never been used since its introduction in commit b0bad9ad514f ("RDMA/IWPM: Support no port mapping requirements") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-14 RDMA/hns: Configure capacity of hns device (Lijun Ou)
This patch adds the new IB_DEVICE_MEM_MGT_EXTENSIONS device capability to indicate device support for the following features: 1. Fast register memory region. 2. Send with remote invalidate by FRMR. 3. Local invalidate memory region. It also adds the max depth of the FRMR page list. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
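In terms of the standard query_device attributes, the change amounts to something like the following sketch (fields are from struct ib_device_attr; the depth constant is device-specific and shown as a placeholder):

    attr->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
    /* placeholder for the hns-specific FRMR page list depth */
    attr->max_fast_reg_page_list_len = HNS_ROCE_FRMR_MAX_PA;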
2019-02-14 RDMA/hns: Delete useless prints for aeq subtype event (Yixian Liu)
Currently, all messages printed for the aeq subtype event are wrong. Thus, delete them and print only the value of the subtype event. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-14 RDMA/hns: Set allocated memory to zero for wrid (Yixian Liu)
The memory allocated for wrid should be initialized to zero. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
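A minimal sketch of the intent, assuming the wrid buffer is an array of u64 work-request IDs (field names illustrative):

    /* zeroing allocation instead of leaving stale heap contents */
    hr_qp->sq.wrid = kcalloc(hr_qp->sq.wqe_cnt, sizeof(u64), GFP_KERNEL);
    if (!hr_qp->sq.wrid)
            return -ENOMEM;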
2019-02-14 RDMA/hns: Fix the state of rereg mr (Yixian Liu)
The state of the mr after the reregister operation should be set to the valid state. Otherwise, it will keep the same state as before it was reregistered. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-14 RDMA/hns: Limit minimum RoCE CQ depth to 64 (chenglang)
This patch modifies the minimum CQ depth specification of hip08 to be consistent with the handling of hip06. Signed-off-by: chenglang <chenglang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-14 IB/mlx5: Add support for 50Gbps per lane link modes (Aya Levin)
The driver now supports new link modes: 50Gbps per lane, for 50G/100G/200G. This patch reads the correct field (legacy vs. extended) based on a FW indication bit, and adds a translation function (link modes to IB width and speed) for the new link modes. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
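The legacy-versus-extended selection described above can be sketched as follows (the capability bit and ext_* field names follow the naming this series introduces; treat them as an assumption):

    if (MLX5_CAP_PCAM_FEATURE(mdev, ptys_extended_ethernet))
            eth_proto_oper = MLX5_GET(ptys_reg, out, ext_eth_proto_oper);
    else
            eth_proto_oper = MLX5_GET(ptys_reg, out, eth_proto_oper);
    /* then translate eth_proto_oper into IB width and speed */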
2019-02-14 net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register (Aya Levin)
This patch exposes new link modes (including 50Gbps per lane), and the ext_* fields which describe the new link modes in the Port Type and Speed register (PTYS). Access functions, translation functions (speed <-> HW bits) and the link max speed function were modified. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 net/mlx5: Add new fields to Port Type and Speed register (Aya Levin)
The Port Type and Speed register (PTYS) introduces three new fields extending the speeds/protocols that can be reported and configured. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 net/mlx5: Refactor queries to speed fields in Port Type and Speed register (Aya Levin)
This patch consolidates queries to speed-related fields in the Port Type and Speed register (PTYS) into a single API. In addition, this patch refactors functions which serve only the Ethernet driver: remove the protocol type as an input parameter, move code from the 'core' directory into the 'en' directory and add an 'eth' prefix to the function names. The patch also encapsulates functions that are not used outside the Ethernet driver and removes redundant include files. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 net/mlx5: E-Switch, Avoid magic numbers when initializing offloads mode (Bodong Wang)
When dealing with the offloads mode initialization, the driver refers to the number of VFs and adds the magic number one (1) to account for the uplink. This is not clear and will make the code less readable after adding other vports (e.g. host PF). As these are special vports compared to VF vports, add a helper macro to denote such special vports and eliminate the use of the magic number. Moreover, when creating the offloads flow table and groups, the driver reserves two more slots for the UC and MC miss rules. Replace this magic number with a helper macro as well. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
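A hedged sketch of the idea (all macro and variable names here are hypothetical, not the patch's):

    /* special, non-VF vports the offloads mode must account for */
    #define NUM_SPECIAL_VPORTS      1       /* the uplink */
    /* reserved flow-table slots for the UC and MC miss rules */
    #define NUM_MISS_RULES          2

    total_vports = nvfs + NUM_SPECIAL_VPORTS;
    table_size   = rules_per_vport * total_vports + NUM_MISS_RULES;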
2019-02-14 net/mlx5: Relocate vport macros to the vport header file (Bodong Wang)
These are two macros in the driver's general header which deal with the total number of vports and whether a vport is the vport manager. Such macros are vport entities, so it is better to place them in the vport header file. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 net/mlx5: E-Switch, Normalize the name of uplink vport number (Bodong Wang)
The driver used to name the uplink vport FDB_UPLINK_VPORT, which is hard to reconcile with a consistent naming convention once other vports are introduced. Use MLX5_VPORT as the prefix for such vports and relocate the uplink vport definition to a public header file for the benefit of both the net and IB drivers. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
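Illustratively (the numeric value is the one conventionally used for the uplink vport; treat it as an assumption):

    /* before: private to the E-Switch code */
    #define FDB_UPLINK_VPORT        0xffff
    /* after: shared header, consistent MLX5_VPORT_* prefix */
    #define MLX5_VPORT_UPLINK       0xffff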
2019-02-14 net/mlx5: Provide an alternative VF upper bound for ECPF (Bodong Wang)
The ECPF doesn't support SR-IOV, but an ECPF E-Switch manager shall know the max VFs supported by its peer host PF in order to control those VF vports. The current driver implementation uses the total vfs quantity as provided by the pci sub-system as an upper bound of the VF vports the e-switch code needs to deal with. This obviously can't work as-is on an ECPF e-switch manager. For now, we use a hard-coded value of 128 on such systems. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 net/mlx5: Add host params change event (Bodong Wang)
In Embedded CPU (EC) configurations, the EC driver needs to know when the number of virtual functions changes on the corresponding PF at the host side. This is required so the EC driver can create or destroy representor net devices that represent the VFs' ports. Whenever a change in the number of VFs occurs, firmware will generate an event towards the EC which will trigger a work to complete the rest of the handling. The specifics of the handling will be introduced in a downstream patch. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 net/mlx5: Add query host params command (Bodong Wang)
The QUERY_HOST_PARAMS command is used by an Embedded CPU Physical Function (ECPF) driver to identify and retrieve information about the PF on the host side, e.g., the number of virtual functions and the PCI BDF. The number of VFs can be changed on the fly; a function is added to query the current number of VFs and will be used in downstream patches. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 net/mlx5: Update enable HCA dependency (Bodong Wang)
With the introduction of the ECPF, we require that the ECPF driver will always call enable/disable HCA for that PF in the same way a PF does this for its VFs. The PF is still responsible for calling enable and disable HCA for its VFs. To distinguish between the ECPF executing enable/disable HCA for itself or for the PF, it sets the embedded CPU function bit in the input params struct of these commands. When the bit is cleared and the function ID is zero, it refers to the peer PF. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
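With the MLX5_SET helpers, the distinction described above looks roughly like this sketch (the embedded_cpu_function field name follows the commit text; the exact ifc layout is an assumption):

    MLX5_SET(enable_hca_in, in, opcode, MLX5_CMD_OP_ENABLE_HCA);
    MLX5_SET(enable_hca_in, in, function_id, func_id);
    /* set when the ECPF acts on itself; cleared with function_id == 0
     * when the ECPF enables/disables the peer host PF */
    MLX5_SET(enable_hca_in, in, embedded_cpu_function, ec_function);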
2019-02-14 net/mlx5: Introduce Mellanox SmartNIC and modify page management logic (Bodong Wang)
Mellanox's SmartNIC combines embedded CPU (e.g., ARM) processing power with advanced network offloads to accelerate a multitude of security, networking and storage applications. With the introduction of the SmartNIC, there is a new PCI function called the Embedded CPU Physical Function (ECPF), and it is possible for a PF to get its ICM pages from the ECPF PCI function. The driver shall identify if it is running on such a function by reading a bit in the initialization segment. When firmware asks for pages, it issues a page request event specifying how many pages it requests and for which function. The driver responds with a manage_pages command providing the requested pages along with an indication of which function it is providing these pages for. The encoding before this patch was as follows: function_id == 0: pages are requested for the function receiving the EQE. function_id != 0: pages are requested for the VF identified by the function_id value. A new one-bit field in the EQE identifies that pages are requested for the ECPF. The notion of a page supplier can be introduced here; to support it, manage_pages and query_pages were modified so firmware can distinguish the following cases: 1. Function provides pages for itself. 2. PF provides pages for its VF. 3. ECPF provides pages to itself. 4. ECPF provides pages for another function. This distinction is possible through the introduction of the bit "embedded_cpu_function" in query_pages, manage_pages and the page request EQE. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
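The resulting ownership decision can be summarized in a small helper (a sketch of the encoding described above; names hypothetical, not driver code):

    enum page_owner { OWNER_SELF, OWNER_VF, OWNER_ECPF };

    /* mirrors the encoding in the page request EQE described above */
    static enum page_owner page_request_owner(u16 func_id, bool ec_function)
    {
            if (ec_function)
                    return OWNER_ECPF;      /* new bit: pages for the ECPF */
            if (func_id == 0)
                    return OWNER_SELF;      /* function receiving the EQE */
            return OWNER_VF;                /* VF identified by func_id */
    }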
2019-02-14 IB/mlx5: Use unified register/load function for uplink and VF vports (Bodong Wang)
The IB driver maintains different registration and load function calls for uplink and VF vports. This is not necessary, as they differ only in their profiles. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 net/mlx5: Use consistent vport num argument type (Bodong Wang)
Use u16 for vport number, which matches how hardware refers to this argument throughout commands. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 net/mlx5: Use void pointer as the type in address_of macro (Bodong Wang)
Better to use void * and avoid unnecessary casts. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13 net/mlx5: Align ODP capability function with netdev coding style (Leon Romanovsky)
Update newly introduced function to be aligned to netdev coding style. Fixes: 46861e3e88be ("net/mlx5: Set ODP SRQ support in firmware") Reported-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-02-13 RDMA/nes: Use for_each_sg_dma_page iterator for umem SGL (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
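The conversion pattern, shared by this and the similar per-driver commits below, as a sketch against the ib_umem layout of this era (the pbl destination array is illustrative):

    struct sg_dma_page_iter sg_iter;
    u64 *pbl = mr->pbl;     /* driver's page address list; illustrative */

    /* walks PAGE_SIZE chunks of the DMA-mapped SGL directly */
    for_each_sg_dma_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0)
            *pbl++ = sg_page_iter_dma_address(&sg_iter);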
2019-02-13 RDMA/rdmavt: Adapt to handle non-uniform sizes on umem SGEs (Shiraz, Saleem)
rdmavt expects a uniform size on all umem SGEs, which is currently PAGE_SIZE. Adapt to a umem API change which could return non-uniform sized SGEs due to combining contiguous PAGE_SIZE regions into an SGE. Use the for_each_sg_page variant to unfold the larger SGEs into a list of PAGE_SIZE elements. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-13 RDMA: Fix allocation failure on pointer pd (Colin Ian King)
The null check on an allocation failure on pd is currently checking if pd is non-null rather than null. Fix this by adding the missing ! operator. Fixes: 21a428a019c9 ("RDMA: Handle PD allocations by IB/core") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
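That is, the check becomes the following (a generic sketch of the pattern, not the exact core code):

    pd = kzalloc(sizeof(*pd), GFP_KERNEL);
    if (!pd)        /* was: if (pd), inverting the intent */
            return ERR_PTR(-ENOMEM);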
2019-02-13 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma into for-next (Doug Ledford)
I had merged the hfi1-tid code into my local copy of for-next, but was waiting on 0day testing before pushing it (I pushed it to my wip branch). Having waited several days for 0day testing to show up, I'm finally just going to push it out. In the meantime, though, Jason pushed other stuff to for-next, so I needed to merge up the branches before pushing. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-11 mlx5: use RCU lock in mlx5_eq_cq_get() (Cong Wang)
mlx5_eq_cq_get() is called in the IRQ handler, and the spinlock inside gets a lot of contention when we test some heavy workload with 60 RX queues and 80 CPUs; this is clearly shown in the flame graph. In fact, radix_tree_lookup() is perfectly fine with the RCU read lock; we don't have to take a spinlock on this hot path. This is pretty much similar to commit 291c566a2891 ("net/mlx4_core: Fix racy CQ (Completion Queue) free"). Slow paths are still serialized with the spinlock, and with synchronize_irq() it should be safe to just move the fast path to the RCU read lock. This patch itself reduces the latency by about 50% for our memcached workload on a 4.14 kernel we tested. In upstream, as pointed out by Saeed, this spinlock got some rework in commit 02d92f790364 ("net/mlx5: CQ Database per EQ"), so the difference could be smaller. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
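The fast path then becomes approximately the following (a sketch; the hold helper keeping the CQ alive past the unlock is shown for illustration):

    struct mlx5_core_cq *cq;

    rcu_read_lock();        /* radix_tree_lookup is RCU-safe */
    cq = radix_tree_lookup(&table->tree, cqn);
    if (likely(cq))
            mlx5_cq_hold(cq);       /* pin before dropping the read lock */
    rcu_read_unlock();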
2019-02-11 RDMA/bnxt_re: fix or'ing of data into an uninitialized struct member (Colin Ian King)
The struct member comp_mask has not been initialized; however, a bit pattern is being bitwise or'd into the member, and hence other bit fields in comp_mask may contain garbage from the stack. Fix this by making the bitwise or into an assignment. Fixes: 95b86d1c91ad ("RDMA/bnxt_re: Update kernel user abi to pass chip context") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
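The one-character nature of the fix, sketched (the flag name follows the abi commit being fixed and is an assumption here):

    /* resp lives on the stack and is never zeroed, so or'ing leaks stack bits */
    resp.comp_mask = BNXT_RE_UCNTX_CMASK_HAVE_CCTX;     /* was: |= */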
2019-02-11 RDMA/mlx5: Fix memory leak in case we fail to add an IB device (Mark Bloch)
Make sure the IB device is freed on failure. Fixes: b5ca15ad7e61 ("IB/mlx5: Add proper representors support") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 IB/mlx5: Fix bad flow upon DEVX mkey creation (Yishai Hadas)
Fix bad flow upon DEVX mkey creation to prevent deleting the indirect mkey from the radix tree in case there was a previous failure to insert it. Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 RDMA/rxe: Use for_each_sg_page iterator on umem SGL (Shiraz, Saleem)
The driver walks the umem SGL assuming a 1:1 mapping between SGE and system page. Update to use the for_each_sg_page iterator to get individual pages contained in the SGEs. This is a prerequisite before adding page combining into SGEs while building the scatter table in IB core. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 RDMA/ocrdma: Use for_each_sg_dma_page iterator on umem SGL (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 RDMA/qedr: Use for_each_sg_dma_page iterator on umem SGL (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 RDMA/vmw_pvrdma: Use for_each_sg_dma_page iterator on umem SGL (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 RDMA/cxgb3: Use for_each_sg_dma_page iterator on umem SGL (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 RDMA/cxgb4: Use for_each_sg_dma_page iterator on umem SGL (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 RDMA/hns: Use for_each_sg_dma_page iterator on umem SGL (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 RDMA/i40iw: Use for_each_sg_dma_page iterator on umem SGL (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 RDMA/mthca: Use for_each_sg_dma_page iterator on umem SGL (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as it's only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 lib/scatterlist: Provide a DMA page iterator (Jason Gunthorpe)
Commit 2db76d7c3c6d ("lib/scatterlist: sg_page_iter: support sg lists w/o backing pages") introduced the sg_page_iter_dma_address() function without providing a way to use it in the general case. If sg_dma_len() is not equal to the sg length, callers cannot safely use the for_each_sg_page/sg_page_iter_dma_address combination. Resolve this API mistake by providing a DMA-specific iterator, for_each_sg_dma_page(), that uses the right length so sg_page_iter_dma_address() works as expected with all sglists. A new iterator type is introduced to provide compile-time safety against wrongly mixing accessors and iterators. Acked-by: Christoph Hellwig <hch@lst.de> (for scatterlist) Acked-by: Thomas Hellstrom <thellstrom@vmware.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> (ipu3-cio2) Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-09 Merge branch 'wip/dl-for-next' into for-next (Doug Ledford)
Due to concurrent work by myself and Jason, a normal fast forward merge was not possible. This brings in a number of hfi1 changes, mainly the hfi1 TID RDMA support (roughly 10,000 LOC change), which was reviewed and integrated over a period of days. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-09 Merge branch 'hfi1-tid' into wip/dl-for-next (Doug Ledford)
Omni-Path TID RDMA Feature

Intel Omni-Path (OPA) TID RDMA support is a feature that accelerates data movement between two OPA nodes through the IB Verbs interface. It improves RDMA READ/WRITE performance by delivering the data payload to a user buffer directly, without any software copying.

Architecture
============
The TID RDMA protocol is implemented at the hfi1 driver level and is therefore transparent to the ULPs. It is designed to facilitate the data transactions for two specific RDMA requests:
- RDMA READ;
- RDMA WRITE.

Previously, when a verbs data packet was received at the destination (the requester side for RDMA READ and the responder side for RDMA WRITE), the data payload was copied to the user buffer by software, which slows down the performance significantly for large requests. Internally, hfi1 converts qualified RDMA READ/WRITE requests into TID RDMA READ/WRITE requests when the requests are posted to the hfi1 driver. Non-qualified RDMA requests are handled by the normal RDMA protocol.

For TID RDMA requests, hardware resources (hardware flow and TID entries) are allocated on the destination side (the requester side for TID RDMA READ and the responder side for TID RDMA WRITE). The information for these resources is conveyed to the data source side (the responder side for TID RDMA READ and the requester side for TID RDMA WRITE) and embedded in data packets. When data packets are received by the destination, hardware will deliver the data payload to the destination buffer without involving software, thereby improving performance.

Details
=======
RDMA READ/WRITE requests are qualified by the following:
- Total data length >= 256K;
- Total data length is a multiple of 4K pages.

Additional qualifications are enforced for the destination buffers.
For RDMA READ:
- Each destination sge buffer is 4K aligned;
- Each destination sge buffer is a multiple of 4K pages.
For RDMA WRITE:
- The destination address is 4K aligned.

In addition, in an OPA fabric, some nodes may support TID RDMA while others may not. As such, it is important for two transacting nodes to exchange information about the features they support. This discovery mechanism is called OPA Feature Negotiation (OPFN) and is described in detail in the patch series. Through OPFN, two nodes can find out whether they both support TID RDMA and subsequently convert RDMA requests into TID RDMA requests.

* hfi1-tid: (46 commits)
  IB/hfi1: Prioritize the sending of ACK packets
  IB/hfi1: Add static trace for TID RDMA WRITE protocol
  IB/hfi1: Enable TID RDMA WRITE protocol
  IB/hfi1: Add interlock between TID RDMA WRITE and other requests
  IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs
  IB/hfi1: Add the dual leg code
  IB/hfi1: Add the TID second leg ACK packet builder
  IB/hfi1: Add the TID second leg send packet builder
  IB/hfi1: Resend the TID RDMA WRITE DATA packets
  IB/hfi1: Add a function to receive TID RDMA RESYNC packet
  IB/hfi1: Add a function to build TID RDMA RESYNC packet
  IB/hfi1: Add TID RDMA retry timer
  IB/hfi1: Add a function to receive TID RDMA ACK packet
  IB/hfi1: Add a function to build TID RDMA ACK packet
  IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet
  IB/hfi1: Add a function to build TID RDMA WRITE DATA packet
  IB/hfi1: Add a function to receive TID RDMA WRITE response
  IB/hfi1: Add TID resource timer
  IB/hfi1: Add a function to build TID RDMA WRITE response
  IB/hfi1: Add functions to receive TID RDMA WRITE request
  ...
Signed-off-by: Doug Ledford <dledford@redhat.com>
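The length qualification rules above reduce to a simple predicate (an illustrative sketch, not driver code; the function name is hypothetical):

    #include <linux/sizes.h>

    /* total_len is the total payload length of the RDMA READ/WRITE request */
    static bool tid_rdma_qualified(u64 total_len)
    {
            return total_len >= SZ_256K && IS_ALIGNED(total_len, SZ_4K);
    }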