summaryrefslogtreecommitdiff
path: root/drivers/infiniband/sw/rxe/rxe_comp.c
AgeCommit message (Collapse)Author
2025-06-08treewide, timers: Rename from_timer() to timer_container_of()Ingo Molnar
Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2024-04-22RDMA/rxe: Don't call direct between tasksBob Pearson
Replace calls to rxe_run_task() with rxe_sched_task(). This prevents the tasks from all running on the same cpu. This change slightly reduces performance for single qp send and write benchmarks in loopback mode but greatly improves the performance with multiple qps because if run task is used all the work tends to be performed on one cpu. For actual on the wire benchmarks there is no noticeable performance change. Link: https://lore.kernel.org/r/20240329145513.35381-11-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22RDMA/rxe: Don't call rxe_requester from rxe_completerBob Pearson
Instead of rescheduling rxe_requester from rxe_completer() just extend the duration of rxe_sender() by one pass. Setting run_requester_again forces rxe_completer() to return 0 which will cause rxe_sender() to be called at least one more time. Link: https://lore.kernel.org/r/20240329145513.35381-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22RDMA/rxe: Merge request and complete tasksBob Pearson
Currently the rxe driver has three work queue tasks per qp. These are the req.task, comp.task and resp.task which call rxe_requester(), rxe_completer() and rxe_responder() respectively directly or on work queues. Each of these subroutines checks to see if there is work to be performed on the send queue or on the response packet queue or the request packet queue and will run until there is no work remaining or yield the cpu and reschedule itself until there is no work remaining. This commit combines the req.task and comp.task into a single send.task and renames the resp.task to the recv.task. The combined send.task calls rxe_requester() and rxe_completer() serially and continues until all work on both the send queue and the response packet queue are done. In various benchmarks the performance is either improved or left the same. At high scale there is a significant reduction in the load on the cpu. This is the first step in combining these two tasks. Once they are serialized cross rescheduling of req.task and comp.task can be more efficiently handled by just letting the send.task continue to run. This will be done in the next several patches. Link: https://lore.kernel.org/r/20240329145513.35381-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22RDMA/rxe: Fix seg fault in rxe_comp_queue_pktBob Pearson
In rxe_comp_queue_pkt() an incoming response packet skb is enqueued to the resp_pkts queue and then a decision is made whether to run the completer task inline or schedule it. Finally the skb is dereferenced to bump a 'hw' performance counter. This is wrong because if the completer task is already running in a separate thread it may have already processed the skb and freed it which can cause a seg fault. This has been observed infrequently in testing at high scale. This patch fixes this by changing the order of enqueuing the packet until after the counter is accessed. Link: https://lore.kernel.org/r/20240329145513.35381-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: 0b1e5b99a48b ("IB/rxe: Add port protocol stats") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-01-25RDMA/rxe: Improve newline in printing messagesLi Zhijian
Previously rxe_{dbg,info,err}() macros are appended built-in newline, but some users will add redundant newline sometimes. So remove the built-in newline for these macros. In terms of rxe_{dbg,info,err}_xxx() macros, because they don't have built-in newline, append newline when using them. CC: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20240109083253.3629967-1-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-07-31RDMA/rxe: Fix unsafe drain work queue codeBob Pearson
If create_qp does not fully succeed it is possible for qp cleanup code to attempt to drain the send or recv work queues before the queues have been created causing a seg fault. This patch checks to see if the queues exist before attempting to drain them. Link: https://lore.kernel.org/r/20230620135519.9365-3-rpearsonhpe@gmail.com Reported-by: syzbot+2da1965168e7dbcba136@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-rdma/00000000000012d89205fe7cfe00@google.com/raw Fixes: 49dc9c1f0c7e ("RDMA/rxe: Cleanup reset state handling in rxe_resp.c") Fixes: fbdeb828a21f ("RDMA/rxe: Cleanup error state handling in rxe_comp.c") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-27Merge tag 'v6.4' into rdma.git for-nextJason Gunthorpe
Linux 6.4 Resolve conflicts between rdma rc and next in rxe_cq matching linux-next: drivers/infiniband/sw/rxe/rxe_cq.c: https://lore.kernel.org/r/20230622115246.365d30ad@canb.auug.org.au Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-19RDMA/rxe: Fix comments about removed taskletsDaisuke Matsuda
The commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks") removed tasklets and replaced them with a workqueue, but relevant comments are still remaining in the source code. Fixes: 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks") Link: https://lore.kernel.org/r/20230518070027.942715-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-16RDMA/rxe: Convert spin_{lock_bh,unlock_bh} to ↵Guoqing Jiang
spin_{lock_irqsave,unlock_irqrestore} We need to call spin_lock_irqsave()/spin_unlock_irqrestore() for state_lock in rxe, otherwsie the callchain: ib_post_send_mad -> spin_lock_irqsave -> ib_post_send -> rxe_post_send -> spin_lock_bh -> spin_unlock_bh -> spin_unlock_irqrestore Causes below traces during run block nvmeof-mp/001 test due to mismatched spinlock nesting: WARNING: CPU: 0 PID: 94794 at kernel/softirq.c:376 __local_bh_enable_ip+0xc2/0x140 [ ... ] CPU: 0 PID: 94794 Comm: kworker/u4:1 Tainted: G E 6.4.0-rc1 #9 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 Workqueue: rdma_cm cma_work_handler [rdma_cm] RIP: 0010:__local_bh_enable_ip+0xc2/0x140 Code: 48 85 c0 74 72 5b 41 5c 5d 31 c0 89 c2 89 c1 89 c6 89 c7 41 89 c0 e9 bd 0e 11 01 65 8b 05 f2 65 72 48 85 c0 0f 85 76 ff ff ff <0f> 0b e9 6f ff ff ff e8 d2 39 1c 00 eb 80 4c 89 e7 e8 68 ad 0a 00 RSP: 0018:ffffb7cf818539f0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000201 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000201 RDI: ffffffffc0f25f79 RBP: ffffb7cf81853a00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc0f25f79 R13: ffff8db1f0fa6000 R14: ffff8db2c63ff000 R15: 00000000000000e8 FS: 0000000000000000(0000) GS:ffff8db33bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000559758db0f20 CR3: 0000000105124000 CR4: 00000000003506f0 Call Trace: <TASK> _raw_spin_unlock_bh+0x31/0x40 rxe_post_send+0x59/0x8b0 [rdma_rxe] ib_send_mad+0x26b/0x470 [ib_core] ib_post_send_mad+0x150/0xb40 [ib_core] ? cm_form_tid+0x5b/0x90 [ib_cm] ib_send_cm_req+0x7c8/0xb70 [ib_cm] rdma_connect_locked+0x433/0x940 [rdma_cm] nvme_rdma_cm_handler+0x5d7/0x9c0 [nvme_rdma] cma_cm_event_handler+0x4f/0x170 [rdma_cm] cma_work_handler+0x6a/0xe0 [rdma_cm] process_one_work+0x2a9/0x580 worker_thread+0x52/0x3f0 ? __pfx_worker_thread+0x10/0x10 kthread+0x109/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 0 PID: 94794 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x37/0x60 [ ... ] CPU: 0 PID: 94794 Comm: kworker/u4:1 Tainted: G W E 6.4.0-rc1 #9 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 Workqueue: rdma_cm cma_work_handler [rdma_cm] RIP: 0010:warn_bogus_irq_restore+0x37/0x60 Code: fb 01 77 36 83 e3 01 74 0e 48 8b 5d f8 c9 31 f6 89 f7 e9 ac ea 01 00 48 c7 c7 e0 52 33 b9 c6 05 bb 1c 69 01 01 e8 39 24 f0 fe <0f> 0b 48 8b 5d f8 c9 31 f6 89 f7 e9 89 ea 01 00 0f b6 f3 48 c7 c7 RSP: 0018:ffffb7cf81853a58 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffb7cf81853a60 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8db2cfb1a9e8 R13: ffff8db2cfb1a9d8 R14: ffff8db2c63ff000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8db33bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000559758db0f20 CR3: 0000000105124000 CR4: 00000000003506f0 Call Trace: <TASK> _raw_spin_unlock_irqrestore+0x91/0xa0 ib_send_mad+0x1e3/0x470 [ib_core] ib_post_send_mad+0x150/0xb40 [ib_core] ? cm_form_tid+0x5b/0x90 [ib_cm] ib_send_cm_req+0x7c8/0xb70 [ib_cm] rdma_connect_locked+0x433/0x940 [rdma_cm] nvme_rdma_cm_handler+0x5d7/0x9c0 [nvme_rdma] cma_cm_event_handler+0x4f/0x170 [rdma_cm] cma_work_handler+0x6a/0xe0 [rdma_cm] process_one_work+0x2a9/0x580 worker_thread+0x52/0x3f0 ? __pfx_worker_thread+0x10/0x10 kthread+0x109/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Fixes: f605f26ea196 ("RDMA/rxe: Protect QP state with qp->state_lock") Link: https://lore.kernel.org/r/20230510035056.881196-1-guoqing.jiang@linux.dev Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17RDMA/rxe: Protect QP state with qp->state_lockBob Pearson
Currently the rxe driver makes little effort to make the changes to qp state (which includes qp->attr.qp_state, qp->attr.sq_draining and qp->valid) atomic between different client threads and IO threads. In particular a common template is for an RDMA application to call ib_modify_qp() to move a qp to ERR state and then wait until all the packet and work queues have drained before calling ib_destroy_qp(). None of these state changes are protected by locks to assure that the changes are executed atomically and that memory barriers are included. This has been observed to lead to incorrect behavior around qp cleanup. This patch continues the work of the previous patches in this series and adds locking code around qp state changes and lookups. Link: https://lore.kernel.org/r/20230405042611.6467-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17RDMA/rxe: Move code to check if drained to subroutineBob Pearson
Move two blocks of code in rxe_comp.c and rxe_req.c to subroutines that check if draining is complete in the SQD state and, if so, generate a SQ_DRAINED event. Link: https://lore.kernel.org/r/20230405042611.6467-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17RDMA/rxe: Remove qp->req.stateBob Pearson
The rxe driver has four different QP state variables, qp->attr.qp_state, qp->req.state, qp->comp.state, and qp->resp.state. All of these basically carry the same information. This patch replaces uses of qp->req.state by qp->attr.qp_state and enum rxe_qp_state. This is the third of three patches which will remove all but the qp->attr.qp_state variable. This will bring the driver closer to the IBA description. Link: https://lore.kernel.org/r/20230405042611.6467-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17RDMA/rxe: Remove qp->comp.stateBob Pearson
The rxe driver has four different QP state variables, qp->attr.qp_state, qp->req.state, qp->comp.state, and qp->resp.state. All of these basically carry the same information. This patch replaces uses of qp->comp.state by qp->attr.qp_state. This is the second of three patches which will remove all but the qp->attr.qp_state variable. This will bring the driver closer to the IBA description. Link: https://lore.kernel.org/r/20230405042611.6467-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Make tasks schedule each otherBob Pearson
Replace rxe_run_task() by rxe_sched_task() when tasks call each other. These are not performance critical and mainly involve error paths but they run the risk of causing deadlocks. Link: https://lore.kernel.org/r/20230304174533.11296-8-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Remove qp reference counting in tasksBob Pearson
Currently each of the three tasklets requester, completer and responder in the rxe driver take and release a reference to the qp argument at the beginning and end of the subroutines. The caller passing in the qp argument should be responsible for holding a reference to qp so these are not required. Further doing so breaks the qp cleanup code in rxe_qp_do_cleanup which calls these routines after all the references have been dropped so they cannot drain the packet and work request queues as intended. In fact if these routines are deferred by calling tasklet_schedule there is no guarantee that the calling code does have a qp reference. That is a bug in rxe_task.c which will be fixed later in this series. Link: https://lore.kernel.org/r/20230304174533.11296-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Cleanup error state handling in rxe_comp.cBob Pearson
Cleanup the handling of qp in the error state, reset state and during rxe_qp_do_cleanup. Make the same as rxe_resp.c Link: https://lore.kernel.org/r/20230304174533.11296-5-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Convert tasklet args to queue pairsBob Pearson
Originally is was thought that the tasklet machinery in rxe_task.c would be used in other applications but that has not happened for years. This patch replaces the 'void *arg' by struct 'rxe_qp *qp' in the parameters to the tasklet calls. This change will have no affect on performance but may make the code a little clearer. Link: https://lore.kernel.org/r/20230304174533.11296-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Add error messagesBob Pearson
This patch adds error and debug messages so that every interaction with rdma-core through a verbs API call or a completion error return will generate at least one error message backed up by debug messages with more detail. With dynamic debugging one can follow up after seeing an error message by turning on the appropriate debug messages. Link: https://lore.kernel.org/r/20230303221623.8053-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Implement flush completionLi Zhijian
Per IBA SPEC, FLUSH will ack in rdma read response with 0 length. Use IB_WC_FLUSH (aka IB_UVERBS_WC_FLUSH) code to tell userspace a FLUSH completion. Link: https://lore.kernel.org/r/20221206130201.30986-9-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01RDMA/rxe: Implement atomic write completionXiao Yang
Generate an atomic write completion when the atomic write request has been finished. Link: https://lore.kernel.org/r/1669905568-62-3-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_comp.cBob Pearson
Replace calls to pr_xxx() in rxe_comp.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28RDMA/rxe: Split rxe_run_task() into two subroutinesBob Pearson
Split rxe_run_task(task, sched) into rxe_run_task(task) and rxe_sched_task(task). Link: https://lore.kernel.org/r/20221021200118.2163-5-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-25RDMA/rxe: Handle remote errors in the midst of a Read reply sequenceDaisuke Matsuda
Requesting nodes do not handle a reported error correctly if it is generated in the middle of multi-packet Read responses, and the node tries to resend the request endlessly. Let completer terminate the connection in that case. Link: https://lore.kernel.org/r/20221013014724.3786212-2-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-02RDMA/rxe: Split qp state for requester and completerBob Pearson
Currently the requester can continue to process send wqes after an local qp operation error is detected because the setting of the qp state to the error state is deferred until later. This patch splits the qp state for the completer and requester into two separate states and sets qp->req.state = QP_STATE_ERROR as soon as the error is detected before another wqe can be executed. Link: https://lore.kernel.org/r/1658307368-1851-4-git-send-email-lizhijian@fujitsu.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22RDMA/rxe: Make the tasklet exits the sameBob Pearson
Make changes to the three tasklets so that the exit logic from each is the same. This makes the code easier to understand. Link: https://lore.kernel.org/r/20220630190425.2251-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22RDMA/rxe: Fix rnr retry behaviorBob Pearson
Currently the completer tasklet when retransmit timer or the rnr timer fires the same flag (qp->req.need_retry) is set so that if either timer fires it will attempt to perform a retry flow on the send queue. This has the effect of responding to an RNR NAK at the first retransmit timer event which might not allow the requested rnr timeout. This patch adds a new flag (qp->req.wait_for_rnr_timer) which, if set, prevents a retry flow until the rnr nak timer fires. This patch fixes rnr retry errors which can be observed by running the pyverbs test_rdmacm_async_traffic_external_qp multiple times. With this patch applied they do not occur. Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/ Link: https://lore.kernel.org/linux-rdma/2bafda9e-2bb6-186d-12a1-179e8f6a2678@talpey.com/ Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220630190425.2251-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-09RDMA/rxe: Check rxe_get() return valueBob Pearson
In the tasklets (completer, responder, and requester) check the return value from rxe_get() to detect failures to get a reference. This only occurs if the qp has had its reference count drop to zero which indicates that it no longer should be used. The ref is never 0 today because the tasklets are flushed before the ref is dropped. The next patch changes this so that the ref is dropped then the tasklets are flushed. Link: https://lore.kernel.org/r/20220421014042.26985-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-03-16RDMA/rxe: Use standard names for ref countingBob Pearson
Rename rxe_add_ref() to rxe_get() and rxe_drop_ref() to rxe_put(). Significantly improves readability for new readers. Link: https://lore.kernel.org/r/20220304000808.225811-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19RDMA/rxe: Replace irqsave locks with bh locksBob Pearson
Most of the locks in the rxe driver are _irqsave/restore locks but in fact there are no interrupt threads that run rxe code or share data with rxe. There are softirq threads and data sharing so the appropriate lock type is _bh. This patch replaces all irqsave type locks with bh type locks. Link: https://lore.kernel.org/r/20211103050241.61293-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-06RDMA/rxe: Set partial attributes when completion status != IBV_WC_SUCCESSXiao Yang
As ibv_poll_cq()'s manual said, only partial attributes are valid when completion status != IBV_WC_SUCCESS. Link: https://lore.kernel.org/r/20210930094813.226888-4-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-24RDMA/rxe: Add memory barriers to kernel queuesBob Pearson
Earlier patches added memory barriers to protect user space to kernel space communications. The user space queues were previously shown to have occasional memory synchonization errors which were removed by adding smp_load_acquire, smp_store_release barriers. This patch extends that to the case where queues are used between kernel space threads. This patch also extends the queue types to include kernel ULP queues which access the other end of the queues in kernel verbs calls like poll_cq and post_send/recv. Link: https://lore.kernel.org/r/20210914164206.19768-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16RDMA/rxe: Move ICRC generation to a subroutineBob Pearson
Isolate ICRC generation into a single subroutine named rxe_generate_icrc() in rxe_icrc.c. Remove scattered crc generation code from elsewhere. Link: https://lore.kernel.org/r/20210707040040.15434-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22Merge tag 'v5.13-rc7' into rdma.git for-nextJason Gunthorpe
Linux 5.13-rc7 Needed for dependencies in following patches. Merge conflict in rxe_cmop.c resolved by compining both patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/rxe: Implement invalidate MW operationsBob Pearson
Implement invalidate MW and cleaned up invalidate MR operations. Added code to perform remote invalidate for send with invalidate. Added code to perform local invalidation. Deleted some blank lines in rxe_loc.h. Link: https://lore.kernel.org/r/20210608042552.33275-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/rxe: Add support for bind MW work requestsBob Pearson
Add support for bind MW work requests from user space. Since rdma/core does not support bind mw in ib_send_wr there is no way to support bind mw in kernel space. Added bind_mw local operation in rxe_req.c. Added bind_mw WR operation in rxe_opcode.c. Added bind_mw WC in rxe_comp.c. Added additional fields to rxe_mw in rxe_verbs.h. Added rxe_do_dealloc_mw() subroutine to cleanup an mw when rxe_dealloc_mw is called. Added code to implement bind_mw operation in rxe_mw.c Link: https://lore.kernel.org/r/20210608042552.33275-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-03RDMA/rxe: Protext kernel index from user spaceBob Pearson
In order to prevent user space from modifying the index that belongs to the kernel for shared queues let the kernel use a local copy of the index and copy any new values of that index to the shared rxe_queue_bus struct. This adds more switch statements which decreases the performance of the queue API. Move the type into the parameter list for these functions so that the compiler can optimize out the switch statements when the explicit type is known. Modify all the calls in the driver on performance paths to pass in the explicit queue type. Link: https://lore.kernel.org/r/20210527194748.662636-4-rpearsonhpe@gmail.com Link: https://lore.kernel.org/linux-rdma/20210526165239.GP1002214@@nvidia.com/ Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-17RDMA/rxe: Return CQE error if invalid lkey was suppliedLeon Romanovsky
RXE is missing update of WQE status in LOCAL_WRITE failures. This caused the following kernel panic if someone sent an atomic operation with an explicitly wrong lkey. [leonro@vm ~]$ mkt test test_atomic_invalid_lkey (tests.test_atomic.AtomicTest) ... WARNING: CPU: 5 PID: 263 at drivers/infiniband/sw/rxe/rxe_comp.c:740 rxe_completer+0x1a6d/0x2e30 [rdma_rxe] Modules linked in: crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core ptp pps_core CPU: 5 PID: 263 Comm: python3 Not tainted 5.13.0-rc1+ #2936 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:rxe_completer+0x1a6d/0x2e30 [rdma_rxe] Code: 03 0f 8e 65 0e 00 00 3b 93 10 06 00 00 0f 84 82 0a 00 00 4c 89 ff 4c 89 44 24 38 e8 2d 74 a9 e1 4c 8b 44 24 38 e9 1c f5 ff ff <0f> 0b e9 0c e8 ff ff b8 05 00 00 00 41 bf 05 00 00 00 e9 ab e7 ff RSP: 0018:ffff8880158af090 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888016a78000 RCX: ffffffffa0cf1652 RDX: 1ffff9200004b442 RSI: 0000000000000004 RDI: ffffc9000025a210 RBP: dffffc0000000000 R08: 00000000ffffffea R09: ffff88801617740b R10: ffffed1002c2ee81 R11: 0000000000000007 R12: ffff88800f3b63e8 R13: ffff888016a78008 R14: ffffc9000025a180 R15: 000000000000000c FS: 00007f88b622a740(0000) GS:ffff88806d540000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f88b5a1fa10 CR3: 000000000d848004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rxe_do_task+0x130/0x230 [rdma_rxe] rxe_rcv+0xb11/0x1df0 [rdma_rxe] rxe_loopback+0x157/0x1e0 [rdma_rxe] rxe_responder+0x5532/0x7620 [rdma_rxe] rxe_do_task+0x130/0x230 [rdma_rxe] rxe_rcv+0x9c8/0x1df0 [rdma_rxe] rxe_loopback+0x157/0x1e0 [rdma_rxe] rxe_requester+0x1efd/0x58c0 [rdma_rxe] rxe_do_task+0x130/0x230 [rdma_rxe] rxe_post_send+0x998/0x1860 [rdma_rxe] ib_uverbs_post_send+0xd5f/0x1220 [ib_uverbs] ib_uverbs_write+0x847/0xc80 [ib_uverbs] vfs_write+0x1c5/0x840 ksys_write+0x176/0x1d0 do_syscall_64+0x3f/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/11e7b553f3a6f5371c6bb3f57c494bb52b88af99.1620711734.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/rxe: Fix missing acks from responderBob Pearson
All responder errors from request packets that do not consume a receive WQE fail to generate acks for RC QPs. This patch corrects this behavior by making the flow follow the same path as request packets that do consume a WQE after the completion. Link: https://lore.kernel.org/r/20210402001016.3210-1-rpearson@hpe.com Link: https://lore.kernel.org/linux-rdma/1a7286ac-bcea-40fb-2267-480134dd301b@gmail.com/ Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-30RDMA/rxe: Split MEM into MR and MWBob Pearson
In the original rxe implementation it was intended to use a common object to represent MRs and MWs but they are different enough to separate these into two objects. This allows replacing the mem name with mr for MRs which is more consistent with the style for the other objects and less likely to be confusing. This is a long patch that mostly changes mem to mr where it makes sense and adds a new rxe_mw struct. Link: https://lore.kernel.org/r/20210325212425.2792-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-05RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()Bob Pearson
In rxe_comp.c in rxe_completer() the function free_pkt() did not clear skb which triggered a warning at 'done:' and could possibly at 'exit:'. The WARN_ONCE() calls are not actually needed. The call to free_pkt() is moved to the end to clearly show that all skbs are freed. Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()") Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-08RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()Bob Pearson
rxe_udp_encap_recv() drops the reference to rxe->ib_dev taken by rxe_get_dev_from_net() which should be held until each received skb is freed. This patch moves the calls to ib_device_put() to each place a received skb is freed. It also takes references to the ib_device for each cloned skb created to process received multicast packets. Fixes: 4c173f596b3f ("RDMA/rxe: Use ib_device_get_by_netdev() instead of open coding") Link: https://lore.kernel.org/r/20210128233318.2591-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31Merge tag 'v5.9-rc3' into rdma.git for-nextJason Gunthorpe
Required due to dependencies in following patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31RDMA/rxe: Add SPDX hdrs to rxe source filesBob Pearson
Add SPDX headers to all rxe .c and .h files. Link: https://lore.kernel.org/r/20200827145439.2273-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/rxe: Fix style warningsBob Pearson
Fixed several minor checkpatch warnings in existing rxe source. Link: https://lore.kernel.org/r/20200820224638.3212-3-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-02-13RDMA/rxe: Fix soft lockup problem due to using tasklets in softirqZhu Yanjun
When run stress tests with RXE, the following Call Traces often occur watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0] ... Call Trace: <IRQ> create_object+0x3f/0x3b0 kmem_cache_alloc_node_trace+0x129/0x2d0 __kmalloc_reserve.isra.52+0x2e/0x80 __alloc_skb+0x83/0x270 rxe_init_packet+0x99/0x150 [rdma_rxe] rxe_requester+0x34e/0x11a0 [rdma_rxe] rxe_do_task+0x85/0xf0 [rdma_rxe] tasklet_action_common.isra.21+0xeb/0x100 __do_softirq+0xd0/0x298 irq_exit+0xc5/0xd0 smp_apic_timer_interrupt+0x68/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> ... The root cause is that tasklet is actually a softirq. In a tasklet handler, another softirq handler is triggered. Usually these softirq handlers run on the same cpu core. So this will cause "soft lockup Bug". Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-07ibverbs/rxe: Remove variable self-initializationMaksym Planeta
In some cases (not in this particular one) variable self-initialization can lead to undefined behavior. In this case, it is just obscure code. Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21IB/rxe: Remove unnecessary rxe variableZhu Yanjun
The variable rxe in the function is not used. So it is removed. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-08RDMA/rxe: Add link_down, rdma_sends, rdma_recvs stats countersAndrew Boyer
link_down is self-explanatory. rdma_sends and rdma_recvs count the number of RDMA Send and RDMA Receive operations completed successfully. This is different from the existing sent_pkts and rcvd_pkts counters because the existing counters measure packets, not RDMA operations. ack_deffered is renamed to ack_deferred to fix the spelling. out_of_sequence is renamed to out_of_seq_request to make clear that it is counting only requests and not other packets which can be out of sequence. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>