summaryrefslogtreecommitdiff
path: root/include/rdma
AgeCommit message (Collapse)Author
2019-05-06RDMA: Add EFA related definitionsGal Pressman
Add EFA driver ID to the IOCTL interface uapi. This patch also adds unspecified node/transport type that will be used by EFA (usnic is left unchanged as it's already part of our ABI). Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06RDMA/umem: Remove hugetlb flagShiraz Saleem
The drivers i40iw and bnxt_re no longer dependent on the hugetlb flag. So remove this flag from ib_umem structure. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocksShiraz Saleem
This helper iterates over a DMA-mapped SGL and returns contiguous memory blocks aligned to a HW supported page size. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06RDMA/umem: Add API to find best driver supported page size in an MRShiraz Saleem
This helper iterates through the SG list to find the best page size to use from a bitmap of HW supported page sizes. Drivers that support multiple page sizes, but not mixed sizes in an MR can use this API. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03RDMA/core: Allow detaching gid attribute netdevice for RoCEParav Pandit
When there is active traffic through a GID, a QP/AH holds reference to this GID entry. RoCE GID entry holds reference to its attached netdevice. Due to this when netdevice is deleted by admin user, its refcount is not dropped. Therefore, while deleting RoCE GID, wait for all GID attribute's netdev users to finish accessing netdev in rcu context. Once all users done accessing it, release the netdev refcount. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03RDMA/cma: Use rdma_read_gid_attr_ndev_rcu to access netdevParav Pandit
To access the netdevice of the GID attribute, use an existing API rdma_read_gid_attr_ndev_rcu(). This further reduces dependency on open access to netdevice of GID attribute. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03RDMA: Introduce and use GID attr helper to read RoCE L2 fieldsParav Pandit
Instead of RoCE drivers figuring out vlan, smac fields while working on QP/AH, provide a helper routine to read the L2 fields such as vlan_id and source mac address. This moves logic from mlx5 driver to core for wider usage for RoCE ports. This is a preparation patch to allow detaching netdev in subsequent patch. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03RDMA: Get rid of iw_cm_verbsKamal Heib
Integrate iw_cm_verbs data members into ib_device_ops and ib_device structs, this is done to achieve the following: 1) Avoid memory related bugs durring error unwind 2) Make the code more cleaner 3) Reduce code duplication Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-01RDMA/core: Introduce RDMA subsystem ibdev_* print functionsGal Pressman
Similarly to dev/netdev/etc printk helpers, add standard printk helpers for the RDMA subsystem. Example output: efa 0000:00:06.0 efa_0: Hello World! efa_0: Hello World! (no parent device set) (NULL ib_device): Hello World! (ibdev is NULL) Cc: Jason Baron <jbaron@akamai.com> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-04-24Merge branch 'rdma_mmap' into rdma.git for-nextJason Gunthorpe
Jason Gunthorpe says: ==================== Upon review it turns out there are some long standing problems in BAR mapping area: * BAR pages intended for read-only can be switched to writable via mprotect. * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page. * Disassociate causes SIGBUS when touching the pages. * CPU pages are being mapped through to the process via remap_pfn_range instead of the more appropriate vm_insert_page, causing weird behaviors during disassociation. This series adds the missing VM_* flag manipulation, adds faulting a zero page for disassociation and revises the CPU page mappings to use vm_insert_page. ==================== For dependencies this branch is based on for-rc from git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git * branch 'rdma_mmap': RDMA: Remove rdma_user_mmap_page RDMA/mlx5: Use get_zeroed_page() for clock_info RDMA/ucontext: Fix regression with disassociate RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages RDMA/mlx5: Do not allow the user to write to the clock page Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24RDMA: Remove rdma_user_mmap_pageJason Gunthorpe
Upon further research drivers that want this should simply call the core function vm_insert_page(). The VMA holds a reference on the page and it will be automatically freed when the last reference drops. No need for disassociate to sequence the cleanup. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24IB/{rdmavt, qib, hfi1}: Use new routine to release reference countsMike Marciniszyn
The reference count adjustments on reference count completion are open coded throughout. Add a routine to do all reference count adjustments and use. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24IB/rdmavt: Fix ab/ba include issuesMike Marciniszyn
The currently include file ordering for rdmavt headers has an ab/ba include issue the precludes using inlines from rdma_vt.h in rdmavt_qp.h. At the heart of the issue is that rdma_vt.h includes rdmavt_qp.h. Fix the ordering issue by adjusting rdma_vt.h to not require rdmavt_qp.h and move qp related inlines to rdmavt_qp.h. Additionally, promote rvt_mmap_info to rdma_vt.h since it is shared by rdmavt_cq.h and rdmavt_qp.h. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24IB/{rdmavt, hfi1): Miscellaneous comment fixesKaike Wan
This patch fixes miscellaneous comment errors. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08RDMA: Handle SRQ allocations by IB/coreLeon Romanovsky
Convert SRQ allocation from drivers to be in the IB/core Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08RDMA: Handle AH allocations by IB/coreLeon Romanovsky
Simplify drivers by ensuring lifetime of ib_ah object. The changes in .create_ah() go hand in hand with relevant update in .destroy_ah(). We will use this opportunity and convert .destroy_ah() to don't fail, as it was suggested a long time ago, because there is nothing to do in case of failure during destroy. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08RDMA/core: Support object allocation in atomic contextLeon Romanovsky
AH objects are allocated in atomic context and those allocations should be done with GFP_ATOMIC. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08IB: When attrs.udata/ufile is available use that instead of uobjectJason Gunthorpe
The ucontext and ufile should not be accessed via the uobject, all these cases have an attrs so use that instead. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEsShiraz Saleem
Combine contiguous regions of PAGE_SIZE pages into single scatter list entry while building the scatter table for a umem. This minimizes the number of the entries in the scatter list and reduces the DMA mapping overhead, particularly with the IOMMU. Set default max_seg_size in core for IB devices to 2G and do not combine if we exceed this limit. Also, purge npages in struct ib_umem as we now DMA map the umem SGL with sg_nents and npage computation is not needed. Drivers should now be using ib_umem_num_pages(), so fix the last stragglers. Move npages tracking to ib_umem_odp as ODP drivers still need it. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Tested-by: Gal Pressman <galpress@amazon.com> Tested-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: Pass only ib_udata in function prototypesShamir Rabinovitch
Now when ib_udata is passed to all the driver's object create/destroy APIs the ib_udata will carry the ib_ucontext for every user command. There is no need to also pass the ib_ucontext via the functions prototypes. Make ib_udata the only argument psssed. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: Pass uverbs_attr_bundle down ib_x destroy pathShamir Rabinovitch
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x destroy path as ib_udata. The next patch will use the ib_udata to free the drivers destroy path from the dependency in 'uobject->context' as we already did for the create path. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: Pass uverbs_attr_bundle down uobject destroy pathShamir Rabinovitch
Pass uverbs_attr_bundle down the uobject destroy path. The next patch will use this to eliminate the dependecy of the drivers in ib_x->uobject pointers. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: ucontext should be set properly for all cmd & ioctl pathsShamir Rabinovitch
the Attempt to use the below commit to initialize the ucontext for the uobject destroy path has shown that the below commit is incomplete. Parts were reverted and the ucontext set up in the uverbs_attr_bundle was moved to rdma_lookup_get_uobject which is called from the uobj_get_XXX macros and rdma_alloc_begin_uobject which is called when uobject is created. Fixes: 3d9dfd060391 ("IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows") Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-29RDMA/core: Don't compare specific bit after boolean ANDLeon Romanovsky
There is no need to perform extra comparison after boolean AND. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28RDMA: Check net namespace access for uverbs, umad, cma and nldevParav Pandit
Introduce an API rdma_dev_access_netns() to check whether a rdma device can be accessed from the specified net namespace or not. Use rdma_dev_access_netns() while opening character uverbs, umad network device and also check while rdma cm_id binds to rdma device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28RDMA/core: Implement compat device/sysfs tree in net namespaceParav Pandit
Implement compatibility layer sysfs entries of ib_core so that non init_net net namespaces can also discover rdma devices. Each non init_net net namespace has ib_core_device created in it. Such ib_core_device sysfs tree resembles rdma devices found in init_net namespace. This allows discovering rdma devices in multiple non init_net net namespaces via sysfs entries and helpful to rdma-core userspace. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28RDMA/core: Introduce ib_core_device to hold deviceParav Pandit
In order to support sysfs entries in multiple net namespaces for a rdma device, introduce a ib_core_device whose scope is limited to hold core device and per port sysfs related entries. This is preparation patch so that multiple ib_core_devices in each net namespace can be created in subsequent patch who all can share ib_device. (a) Move sysfs specific fields to ib_core_device. (b) Make sysfs and device life cycle related routines to work on ib_core_device. (c) Introduce and use rdma_init_coredev() helper to initialize coredev fields. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25RDMA: Use __packed annotation instead of __attribute__ ((packed))Erez Alfasi
"__attribute__" set of macros has been standardized, have became more potentially portable and consistent code back in v2.6.21 by commit 82ddcb040 ("[PATCH] extend the set of "__attribute__" shortcut macros"). Moreover, nowadays checkpatch.pl warns about using __attribute__((packed)) instead of __packed. This patch converts all the "__attribute__ ((packed))" annotations to "__packed" within the RDMA subsystem. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22RDMA: Handle ucontext allocations by IB/coreLeon Romanovsky
Following the PD conversion patch, do the same for ucontext allocations. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK supportSteve Wise
Add support for new LINK messages to allow adding and deleting rdma interfaces. This will be used initially for soft rdma drivers which instantiate device instances dynamically by the admin specifying a netdev device to use. The rdma_rxe module will be the first user of these messages. The design is modeled after RTNL_NEWLINK/DELLINK: rdma drivers register with the rdma core if they provide link add/delete functions. Each driver registers with a unique "type" string, that is used to dispatch messages coming from user space. A new RDMA_NLDEV_ATTR is defined for the "type" string. User mode will pass 3 attributes in a NEWLINK message: RDMA_NLDEV_ATTR_DEV_NAME for the desired rdma device name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface to use for this link. The DELLINK message will contain the RDMA_NLDEV_ATTR_DEV_INDEX of the device to delete. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/rxe: Close a race after ib_register_deviceJason Gunthorpe
Since rxe allows unregistration from other threads the rxe pointer can become invalid any moment after ib_register_driver returns. This could cause a user triggered use after free. Add another driver callback to be called right after the device becomes registered to complete any device setup required post-registration. This callback has enough core locking to prevent the device from becoming unregistered. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/device: Provide APIs from the core code to help unregistrationJason Gunthorpe
These APIs are intended to support drivers that exist outside the usual driver core probe()/remove() callbacks. Normally the driver core will prevent remove() from running concurrently with probe(), once this safety is lost drivers need more support to get the locking and lifetimes right. ib_unregister_driver() is intended to be used during module_exit of a driver using these APIs. It unregisters all the associated ib_devices. ib_unregister_device_and_put() is to be used by a driver-specific removal function (ie removal by name, removal from a netdev notifier, removal from netlink) ib_unregister_queued() is to be used from netdev notifier chains where RTNL is held. The locking is tricky here since once things become async it is possible to race unregister with registration. This is largely solved by relying on the registration refcount, unregistration will only ever work on something that has a positive registration refcount - and then an unregistration mutex serializes all competing unregistrations of the same device. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/device: Add ib_device_get_by_netdev()Jason Gunthorpe
Several drivers need to find the ib_device from a given netdev. rxe needs this at speed in an unsleepable context, so choose to implement the translation using a RCU safe hash table. The hash table can have a many to one mapping. This is intended to support some future case where multiple IB drivers (ie iWarp and RoCE) connect to the same netdevs. driver_ids will need to be different to support this. In the process this makes the struct ib_device and ib_port_data RCU safe by deferring their kfrees. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdevJason Gunthorpe
The associated netdev should not actually be very dynamic, so for most drivers there is no reason for a callback like this. Provide an API to inform the core code about the net dev affiliation and use a core maintained data structure instead. This allows the core code to be more aware of the ndev relationship which will allow some new APIs based around this. This also uses locking that makes some kind of sense, many drivers had a confusing RCU lock, or missing locking which isn't right. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/cache: Move the cache per-port data into the main ib_port_dataJason Gunthorpe
Like the other cases there no real reason to have another array just for the cache. This larger conversion gets its own patch. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/device: Consolidate ib_device per_port data into one placeJason Gunthorpe
There is no reason to have three allocations of per-port data. Combine them together and make the lifetime for all the per-port data match the struct ib_device. Following patches will require more port-specific data, now there is a good place to put it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA: Add and use rdma_for_each_portJason Gunthorpe
We have many loops iterating over all of the end port numbers on a struct ib_device, simplify them with a for_each helper. Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-18RDMA/restrack: Hide restrack DB from IB/coreLeon Romanovsky
There is no need to expose internals of restrack DB to IB/core. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-18RDMA/restrack: Reduce scope of synchronization lock while updating DBLeon Romanovsky
XArray uses internal lock for updates to XArray. This means that our external RW lock is needed to ensure that entry is not deleted while we are performing iteration over list. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-18RDMA/restrack: Translate from ID to restrack objectLeon Romanovsky
Add new general helper to get restrack entry given by ID and their respective type. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-18RDMA/restrack: Convert internal DB from hash to XArrayLeon Romanovsky
The additions of .doit callbacks posses new access pattern to the resource entries by some user visible index. Back then, the legacy DB was implemented as hash because per-index access wasn't needed and XArray wasn't accepted yet. Acceptance of XArray together with per-index access requires the refresh of DB implementation. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIsShamir Rabinovitch
Now when we have the udata passed to all the ib_xxx object creation APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/verbs: Add helper function rdma_udata_to_drv_contextShamir Rabinovitch
Helper function to get driver's context out of ib_udata wrapped in uverbs_attr_bundle for user objects or NULL for kernel objects. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flowsShamir Rabinovitch
Add ib_ucontext to the uverbs_attr_bundle sent down the iocl and cmd flows as soon as the flow has ib_uobject. In addition, remove rdma_get_ucontext helper function that is only used by ib_umem_get. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-13Merge branch 'for-next' of ↵Doug Ledford
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma into for-next I had merged the hfi1-tid code into my local copy of for-next, but was waiting on 0day testing before pushing it (I pushed it to my wip branch). Having waited several days for 0day testing to show up, I'm finally just going to push it out. In the meantime, though, Jason pushed other stuff to for-next, so I needed to merge up the branches before pushing. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-09Merge branch 'wip/dl-for-next' into for-nextDoug Ledford
Due to concurrent work by myself and Jason, a normal fast forward merge was not possible. This brings in a number of hfi1 changes, mainly the hfi1 TID RDMA support (roughly 10,000 LOC change), which was reviewed and integrated over a period of days. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-08RDMA/devices: Re-organize device.c lockingJason Gunthorpe
The locking here started out with a single lock that covered everything and then has lately veered into crazy town. The fundamental problem is that several places need to iterate over a linked list, but also need to drop their locks to avoid deadlock during client callbacks. xarray's restartable iteration offers a simple solution to the problem. Once all the lists are xarrays we can drop locks in the places that need that and rely on xarray to provide consistency and locking for the data structure. The resulting simplification is that each of the three lists has a dedicated rwsem that must be held when working with the list it covers. One data structure is no longer covered by multiple locks. The sleeping semaphore is selected because the read side generally needs to be held over something sleeping, and using RCU reader locking in those cases is overkill. In the process this simplifies the entire registration/unregistration flow to be the expected list of setups and the reversed list of matching teardowns, and the registration lock 'refcount' can now be revised to be released after the ULPs are removed, providing a very sane semantic for this feature. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08RDMA/devices: Use xarray to store the client_dataJason Gunthorpe
Now that we have a small ID for each client we can use xarray instead of linearly searching linked lists for client data. This will give much faster and scalable client data lookup, and will lets us revise the locking scheme. Since xarray can store 'going_down' using a mark just entirely eliminate the struct ib_client_data and directly store the client_data value in the xarray. However this does require a special iterator as we must still iterate over any NULL client_data values. Also eliminate the client_data_lock in favour of internal xarray locking. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08RDMA/devices: Use xarray to store the clientsJason Gunthorpe
This gives each client a unique ID and will let us move client_data to use xarray, and revise the locking scheme. clients have to be add/removed in strict FIFO/LIFO order as they interdepend. To support this the client_ids are assigned to increase in FIFO order. The existing linked list is kept to support reverse iteration until xarray can get a reverse iteration API. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
2019-02-08RDMA/device: Get rid of reg_stateJason Gunthorpe
This really has no purpose anymore, refcount can be used to tell if the device is still registered. Keeping it around just invites mis-use. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>