summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-08-24RDMA/hns: Delete unused hns bitmap interfaceYangyang Li
The resources that use the hns bitmap interface: qp, cq, mr, pd, xrcd, uar, srq, have been changed to IDA interfaces, and the unused hns' own bitmap interfaces need to be deleted. Link: https://lore.kernel.org/r/1629336980-17499-4-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-24RDMA/hns: Use IDA interface to manage srq indexYangyang Li
Switch srq index allocation and release from hns' own bitmap interface to IDA interface. Link: https://lore.kernel.org/r/1629336980-17499-3-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-24RDMA/hns: Use IDA interface to manage uar indexYangyang Li
Switch uar index allocation and release from hns' own bitmap interface to IDA interface. Link: https://lore.kernel.org/r/1629336980-17499-2-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-23RDMA/hns: Ownerbit mode add control fieldLang Cheng
The ownerbit mode is for external card mode. Make it controlled by the firmware. Fixes: aba457ca890c ("RDMA/hns: Support owner mode doorbell") Link: https://lore.kernel.org/r/1629539607-33217-4-git-send-email-liangwenpeng@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-23RDMA/hns: Enable stash feature of HIP09Yixing Liu
The stash feature is enabled by default on HIP09. Fixes: f93c39bc9547 ("RDMA/hns: Add support for QP stash") Fixes: bfefae9f108d ("RDMA/hns: Add support for CQ stash") Link: https://lore.kernel.org/r/1629539607-33217-3-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-23RDMA/hns: Remove unsupport cmdq modeLang Cheng
CMDQ support un-interrupt mode only, and firmware ignores this mode flag, so remove it. Fixes: a04ff739f2a9 ("RDMA/hns: Add command queue support for hip08 RoCE driver") Link: https://lore.kernel.org/r/1629539607-33217-2-git-send-email-liangwenpeng@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-23RDMA: switch from 'pci_' to 'dma_' APIChristophe JAILLET
The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below. It has been hand modified to use 'dma_set_mask_and_coherent()' instead of 'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable. This is less verbose. It has been compile tested. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Link: https://lore.kernel.org/r/259e53b7a00f64bf081d41da8761b171b2ad8f5c.1629634798.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-23IB/core: Remove deprecated current_seq commentsLi Zhijian
current_seq was removed since the commit below. Fixes: 36f30e486dce ("IB/core: Improve ODP to use hmm_range_fault()") Link: https://lore.kernel.org/r/20210823035246.3506-1-lizhijian@cn.fujitsu.com Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-22RDMA/efa: Rename vector field in efa_irq struct to irqnGal Pressman
The vector field naming is quite confusing, it is better referred to as irqn. Link: https://lore.kernel.org/r/20210811151131.39138-4-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-22RDMA/efa: Remove unused cpu field from irq structGal Pressman
The cpu field in efa_irq struct is unused, remove it. Link: https://lore.kernel.org/r/20210811151131.39138-3-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-22RDMA/rtrs: Remove (void) casting for functionsGioh Kim
Casting to (void) does nothing, remove them. Link: https://lore.kernel.org/r/20210806112112.124313-7-haris.iqbal@ionos.com Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-22RDMA/rtrs-clt: Fix counting inflight IOGioh Kim
There are mis-match at counting inflight IO after changing the multipath policy. For example, we started fio test with round-robin policy and then we changed the policy to min-inflight. IOs created under the RR policy is finished under the min-inflight policy and inflight counter only decreased. So the counter would be negative value. And also we started fio test with min-inflight policy and changed the policy to the round-robin. IOs created under the min-inflight policy increased the inflight IO counter but the inflight IO counter was not decreased because the policy was the round-robin when IO was finished. So it should count IOs only if the IO is created under the min-inflight policy. It should not care the policy when the IO is finished. This patch adds a field mp_policy in struct rtrs_clt_io_req and stores the multipath policy when an object of rtrs_clt_io_req is created. Then rtrs-clt checks the mp_policy of only struct rtrs_clt_io_req instead of the struct rtrs_clt. Link: https://lore.kernel.org/r/20210806112112.124313-6-haris.iqbal@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-22RDMA/rtrs: Remove all likely and unlikelyGioh Kim
The IO performance test with fio after swapping the likely and unlikely macros in all if-statement shows no difference. They do not help for the performance of rtrs. Thanks to Haakon Bugge for the test scenario. The fio test did random read on 32 rnbd devices and 64 processes. Test environment: - Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz - 376G memory - kernel version: 5.4.86 - gcc version: gcc (Debian 8.3.0-6) 8.3.0 - Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5] Test result: - before swapping: IOPS=829k, BW=3239MiB/s - after swapping: IOPS=829k, BW=3238MiB/s - remove all (un)likely: IOPS=829k, BW=3238MiB/s Link: https://lore.kernel.org/r/20210806112112.124313-5-haris.iqbal@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-22RDMA/rtrs: Remove unused functionsJack Wang
The two functions are unused, so just remove them. Link: https://lore.kernel.org/r/20210806112112.124313-3-haris.iqbal@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-22RDMA/rtrs-clt: During add_path change for_new_clt according to path_numMd Haris Iqbal
When all the paths are removed for a session, the addition of the first path is like a new session for the storage server. Hence, for_new_clt has to be set to 1. Link: https://lore.kernel.org/r/20210806112112.124313-2-haris.iqbal@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-22Merge branch 'mlx5-next' of ↵Jason Gunthorpe
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== This pulls mlx5-next branch which includes patches already reviewed on net-next and rdma mailing lists. 1) mlx5 single E-Switch FDB for lag 2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq 3) Add DCS caps & fields support We need this in net-next as multiple features are dependent on the single FDB feature. ==================== Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> * mellanox/mlx5-next: net/mlx5: Lag, Create shared FDB when in switchdev mode net/mlx5: E-Switch, add logic to enable shared FDB net/mlx5: Lag, move lag destruction to a workqueue net/mlx5: Lag, properly lock eswitch if needed net/mlx5: Add send to vport rules on paired device net/mlx5: E-Switch, Add event callback for representors net/mlx5e: Use shared mappings for restoring from metadata net/mlx5e: Add an option to create a shared mapping net/mlx5: E-Switch, set flow source for send to uplink rule RDMA/mlx5: Add shared FDB support {net, RDMA}/mlx5: Extend send to vport rules RDMA/mlx5: Fill port info based on the relevant eswitch net/mlx5: Lag, add initial logic for shared FDB net/mlx5: Return mdev from eswitch IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
2021-08-19RDMA/core/sa_query: Remove unused functionHåkon Bugge
ib_sa_service_rec_query() was introduced in kernel v2.6.13 by commit cbae32c56314 ("[PATCH] IB: Add Service Record support to SA client") in 2005. It was not used then and have never been used since. Removing it and related functions/structs. Link: https://lore.kernel.org/r/1628702736-12651-1-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19RDMA/qedr: Move variables reset to qedr_set_common_qp_params()Prabhakar Kushwaha
Qedr code is tightly coupled with existing both INIT transitions. Here, during first INIT transition all variables are reset and the RESET state is checked in post_recv() before any posting. Commit dc70f7c3ed34 ("RDMA/cma: Remove unnecessary INIT->INIT transition") exposed this bug. So moving variables reset to qedr_set_common_qp_params() and also avoid RESET state check for post_recv(). Link: https://lore.kernel.org/r/20210811051650.14914-1-pkushwaha@marvell.com Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19RDMA/hfi1: Stop using seq_get_buf in _driver_stats_seq_showChristoph Hellwig
Just use seq_write to copy the stats into the seq_file buffer instead of poking holes into the seq_file abstraction. Link: https://lore.kernel.org/r/20210810151711.1795374-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19RDMA/rtrs: Remove a useless kfree()Christophe JAILLET
'sess->rbufs' is known to be NULL here, so there is no point in kfree'ing it. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/9a57c9f837fa2c6f0070578a1bc4840688f62962.1628185335.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19RDMA/hns: Fix return in hns_roce_rereg_user_mr()YueHaibing
If re-registering an MR in hns_roce_rereg_user_mr(), we should return NULL instead of passing 0 to ERR_PTR for clarity. Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process") Link: https://lore.kernel.org/r/20210804125939.20516-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-05net/mlx5: Lag, Create shared FDB when in switchdev modeMark Bloch
If both eswitches are in switchdev mode and the uplink representors are enslaved to the same bond device create a shared FDB configuration. When moving to shared FDB mode not only the hardware needs be configured but the RDMA driver needs to reconfigure itself. When such change is done, unload the RDMA devices, configure the hardware and load the RDMA representors. When destroying the lag (can happen if a PCI function is unbinded, driver is unloaded or by just removing a netdev from the bond) make sure to restore the system to the previous state only if possible. For example, if a PCI function is unbinded there is no need to load the representors as the device is going away. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: E-Switch, add logic to enable shared FDBMark Bloch
Shared FDB allows to direct traffic from all the vports in the HCA to a single eswitch. In order to do that three things are needed. 1) Point the ingress ACL of the slave uplink to that of the master. With this, wire traffic from both uplinks will reach the same eswitch with the same metadata where a single steering rule can catch traffic from both ports. 2) Set the FDB root flow table of the slave's eswitch to that of the master. As this flow table can change dynamically make sure to sync it on any set root flow table FDB command. This will make sure traffic from SFs, VFs, ECPFs and PFs reach the master eswitch. 3) Split wire traffic at the eswitch manager egress ACL so that it's directed to the native eswitch manager. We only treat wire traffic from both ports the same at the eswitch level. If such traffic wasn't handled in the eswitch it needs to reach the right representor to be processed by software. For example LACP packets should *always* reach the right uplink representor for correct operation. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: Lag, move lag destruction to a workqueueMark Bloch
If a netdev is removed from the lag the lag should be destroyed. With downstream patches this might trigger a reconfiguration of representors on a different eswitch and such we don't have the proper locking to so from this path. Move the destruction to be done by the workqueue. As the destruction won't affect the netdev side it okay to do so. The RDMA side will be reconfigured and it already coded to handle such reconfiguration. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: Lag, properly lock eswitch if neededMark Bloch
Currently when doing hardware lag we check the eswitch mode but as this isn't done under a lock the check isn't valid. As the code needs to sync between two different devices an extra care is needed. - When going to change eswitch mode, if hardware lag is active destroy it. - While changing eswitch modes block any hardware bond creation. - Delay handling bonding events until there are no mode changes in progress. - When attaching a new mdev to lag, block until there is no mode change in progress. In order for the mode change to finish the interface lock will have to be taken. Release the lock and sleep for 100ms to allow forward progress. As this is a very rare condition (can happen if the user unbinds and binds a PCI function while also changing eswitch mode of the other PCI function) it has no real world impact. As taking multiple eswitch mode locks is now required lockdep will complain about a possible deadlock. Register a key per eswitch to make lockdep happy. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: Add send to vport rules on paired deviceMark Bloch
When two mlx5 devices are paired in switchdev mode, always offload the send-to-vport rule to the peer E-Switch. This allows to abstract the logic when this is really necessary (single FDB) and combine the logic of both cases into one. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: E-Switch, Add event callback for representorsMark Bloch
This callback will allow to notify representors about relevant events when in OFFLOADS mode. In downstream patches, this will be used to notify about PAIR/UNPAIR devcom events. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5e: Use shared mappings for restoring from metadataRoi Dayan
FTEs are added with mapped metadata which is saved per eswitch. When uplink reps are bonded and we are in a single FDB mode, we could fail to find metadata which was stored on one eswitch mapping but not the other or with a different id. To resolve this issue use shared mapping between eswitch ports. We do not have any conflict using a single mapping, for a type, between the ports. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5e: Add an option to create a shared mappingRoi Dayan
The shared mapping is identified by an id and type. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: E-Switch, set flow source for send to uplink ruleAriel Levkovich
Set the flow source param to local vport for the uplink rep send-to-vport rule. This will comply with the recent changes in SW steering that use the flow source as an indication for the rule type - rx or tx. Since the uplink send-to-vport rule is forwarding traffic to the wire it has to indicate that it is an sx rule and can't use the any port value in the flow source. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05RDMA/mlx5: Add shared FDB supportMark Bloch
Shared FDB allows to create a single RDMA device that holds representors from both eswitches. As shared FDB is only active when both uplink representors are enslaved there is a single RDMA port that represents both uplinks. The number of ports is the number of vports on both eswitches minus one as we only need 1 port for both uplinks. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05{net, RDMA}/mlx5: Extend send to vport rulesMark Bloch
In shared FDB there is only one eswitch which is active and it receives traffic from all representors and all vports in the HCA. While the Ethernet representor will always reside on its native PF the IB representor will not. Extend send to vport rule creation to support such flows. Need to account for source vport that sends the traffic (on which the representors resides) and the target eswitch the traffic which reach. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05RDMA/mlx5: Fill port info based on the relevant eswitchMark Bloch
In shared FDB a single RDMA device can have representors that are connected to two different eswitches. Use the right eswitch when preparing the response to userspace. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: Lag, add initial logic for shared FDBMark Bloch
As shared FDB requires changes in two subsystems first expose the needed core functions so the RDMA side can be changed. mlx5_lag_is_master(): return true if a given mlx5 device is the lag master. mlx5_lag_is_shared_fdb(): Returns true if the lag mode is shared FDB. mlx5_lag_get_peer_mdev(): Return the peer mdev in lag. The mentioned functions will be used by downstream patches in order to add support for shared FDB for the RDMA side. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: Return mdev from eswitchMark Bloch
Export a function so users can retrieve the mellanox device that manages the eswitch from the eswitch device. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-03RDMA/core: Create clean QP creations interface for uverbsLeon Romanovsky
Unify create QP creation interface to make clean approach to create XRC_TGT and regular QPs. Link: https://lore.kernel.org/r/5cd50e7d8ad9112545a1a61dea62799a5cb3224a.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/core: Properly increment and decrement QP usecntsLeon Romanovsky
The QP usecnts were incremented through QP attributes structure while decreased through QP itself. Rely on the ib_creat_qp_user() code that initialized all QP parameters prior returning to the user and increment exactly like destroy does. Link: https://lore.kernel.org/r/25d256a3bb1fc480b77d7fe439817b993de48610.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/core: Configure selinux QP during creationLeon Romanovsky
All QP creation flows called ib_create_qp_security(), but differently. This caused to the need to provide exclusion conditions for the XRC_TGT, because such QP already had selinux configuration call. In order to fix it, move ib_create_qp_security() to the general QP creation routine. Link: https://lore.kernel.org/r/4d7cd6f5828aca37fb62283e6b126b73ab86b18c.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/core: Reorganize create QP low-level functionsLeon Romanovsky
The low-level create QP function grew to be larger than any sensible inline function should be. The inline attribute is not really needed for that function and can be implemented as exported symbol. Link: https://lore.kernel.org/r/2c08709d86f876c3dfb77684357b2a939e570ca4.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/core: Remove protection from wrong in-kernel API usageLeon Romanovsky
The ib_create_named_qp() is kernel verb that is not used for user supplied attributes. In such case, it is ULP responsibility to provide valid QP attributes. In-kernel API shouldn't check it, exactly like other functions that don't check device capabilities. Link: https://lore.kernel.org/r/b9b9e981d1af148b750750196e686199dbbf61f8.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/core: Delete duplicated and unreachable codeLeon Romanovsky
The ib_create_named_qp() is kernel verb and no kernel users exist that use XRC_INI QP. Hence such QP path is not reachable. In addition, delete duplicated assignments of QP attributes from the initialization structure. Link: https://lore.kernel.org/r/1b4c0d1def5f8f6d26839e14d19da950cc4a0b05.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/mlx5: Delete not-available udata checkLeon Romanovsky
XRC_TGT QPs are created through kernel verbs and don't have udata at all. Fixes: 6eefa839c4dd ("RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata") Fixes: e383085c2425 ("RDMA/mlx5: Set ECE options during QP create") Link: https://lore.kernel.org/r/b68228597e730675020aa5162745390a2d39d3a2.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/mlx5: Drop in-driver verbs object creationsLeon Romanovsky
There is no real value in bypassing IB/core APIs for creating standard objects with standard types. The open-coded variant didn't have any restrack task management calls and caused to such objects to be not present when running rdmatoool. Link: https://lore.kernel.org/r/f745590e5fb7d56f90fdb25f64ee3983ba17e1e4.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA: Globally allocate and release QP memoryLeon Romanovsky
Convert QP object to follow IB/core general allocation scheme. That change allows us to make sure that restrack properly kref the memory. Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com Reviewed-by: Gal Pressman <galpress@amazon.com> #efa Tested-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/rdmavt: Decouple QP and SGE lists allocationsLeon Romanovsky
The rdmavt QP has fields that are both needed for the control and data path. Such mixed declaration caused to the very specific allocation flow with kzalloc_node and SGE list embedded into the struct rvt_qp. This patch separates QP creation to two: regular memory allocation for the control path and specific code for the SGE list, while the access to the later is performed through derefenced pointer. Such pointer and its context are expected to be in the cache, so performance difference is expected to be negligible, if any exists. Link: https://lore.kernel.org/r/f66c1e20ccefba0db3c69c58ca9c897f062b4d1c.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/mlx5: Rework custom driver QP type creationLeon Romanovsky
Starting from commit 2b1f747071c5 ("RDMA/core: Allow drivers to disable restrack DB") the restrack is able to handle non-standard QP types either. That change allows us to rewrite custom QP calls to their IB/core counterparts, so we will use general QP creation flow even for the driver QP types. Link: https://lore.kernel.org/r/51682ab82298748941f38bd23ee3bf77ef1cab7b.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/mlx5: Delete device resource mutex that didn't protect anythingLeon Romanovsky
The dev->devr.mutex was intended to protect GSI QP pointer change in the struct mlx5_ib_port_resources when it is accessed from the pkey_change_work. However that pointer isn't changed during the runtime and once IB/core adds MAD, it stays stable. Link: https://lore.kernel.org/r/6e338c561033df20d92e1371fc6a7a0d93aad945.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/mlx5: Cancel pkey work before destroying device resourcesLeon Romanovsky
In the driver release flow, we are ensuring that notifier is disabled and no new works can be added to pkey_change_handler. It means that we can cancel that handler before destroying resources to make sure that our unwind routine is symmetrical to the allocation one. Link: https://lore.kernel.org/r/f2b1ea1bad952e4e7a48a6f731de9e0344986b29.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/efa: Remove double QP type assignmentLeon Romanovsky
The QP type is set by the IB/core and shouldn't be set in the driver. Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation") Link: https://lore.kernel.org/r/838c40134c1590167b888ca06ad51071139ff2ae.1627040189.git.leonro@nvidia.com Acked-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/hns: Don't overwrite supplied QP attributesLeon Romanovsky
QP attributes that were supplied by IB/core already have all parameters set when they are passed to the driver. The drivers are not supposed to change anything in struct ib_qp_init_attr. Fixes: 66d86e529dd5 ("RDMA/hns: Add UD support for HIP09") Link: https://lore.kernel.org/r/5987138875e8ade9aa339d4db6e1bd9694ed4591.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>