summaryrefslogtreecommitdiff
path: root/include/linux/sunrpc/svc_rdma.h
AgeCommit message (Collapse)Author
2020-07-13svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()Chuck Lever
First, refactor: Dereference the svc_rdma_send_ctxt inside svc_rdma_send() instead of at every call site. Then, it can be passed into trace_svcrdma_post_send() to get the proper completion ID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Introduce Send completion IDsChuck Lever
Set up a completion ID in each svc_rdma_send_ctxt. The ID is used to match an incoming Send completion to a transport and to a previous ib_post_send(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Introduce Receive completion IDsChuck Lever
Set up a completion ID in each svc_rdma_recv_ctxt. The ID is used to match an incoming Receive completion to a transport and to a previous ib_post_recv(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Remove declarations for functions long removedChuck Lever
Pavane pour une infante défunte. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Make svc_rdma_send_error_msg() a global functionChuck Lever
Prepare for svc_rdma_send_error_msg() to be invoked from another source file. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18svcrdma: Remove the SVCRDMA_DEBUG macroChuck Lever
Clean up: Commit d21b05f101ae ("rdma: SVCRMDA Header File") introduced the SVCRDMA_DEBUG macro, but it doesn't seem to have been used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18svcrdma: Fix backchannel return codeChuck Lever
Way back when I was writing the RPC/RDMA server-side backchannel code, I misread the TCP backchannel reply handler logic. When svc_tcp_recvfrom() successfully receives a backchannel reply, it does not return -EAGAIN. It sets XPT_DATA and returns zero. Update svc_rdma_recvfrom() to return zero. Here, XPT_DATA doesn't need to be set again: it is set whenever a new message is received, behind a spin lock in a single threaded context. Also, if handling the cb reply is not successful, the message is simply dropped. There's no special message framing to deal with as there is in the TCP case. Now that the handle_bc_reply() return value is ignored, I've removed the dprintk call sites in the error exit of handle_bc_reply() in favor of trace points in other areas that already report the error cases. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-04-17svcrdma: Fix leak of svc_rdma_recv_ctxt objectsChuck Lever
Utilize the xpo_release_rqst transport method to ensure that each rqstp's svc_rdma_recv_ctxt object is released even when the server cannot return a Reply for that rqstp. Without this fix, each RPC whose Reply cannot be sent leaks one svc_rdma_recv_ctxt. This is a 2.5KB structure, a 4KB DMA-mapped Receive buffer, and any pages that might be part of the Reply message. The leak is infrequent unless the network fabric is unreliable or Kerberos is in use, as GSS sequence window overruns, which result in connection loss, are more common on fast transports. Fixes: 3a88092ee319 ("svcrdma: Preserve Receive buffer until svc_rdma_sendto") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16svcrdma: Avoid DMA mapping small RPC RepliesChuck Lever
On some platforms, DMA mapping part of a page is more costly than copying bytes. Indeed, not involving the I/O MMU can help the RPC/RDMA transport scale better for tiny I/Os across more RDMA devices. This is because interaction with the I/O MMU is eliminated for each of these small I/Os. Without the explicit unmapping, the NIC no longer needs to do a costly internal TLB shoot down for buffers that are just a handful of bytes. Since pull-up is now a more a frequent operation, I've introduced a trace point in the pull-up path. It can be used for debugging or user-space tools that count pull-up frequency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16svcrdma: Fix double sync of transport header bufferChuck Lever
Performance optimization: Avoid syncing the transport buffer twice when Reply buffer pull-up is necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16svcrdma: Refactor chunk list encodersChuck Lever
Same idea as the receive-side changes I did a while back: use xdr_stream helpers rather than open-coding the XDR chunk list encoders. This builds the Reply transport header from beginning to end without backtracking. As additional clean-ups, fill in documenting comments for the XDR encoders and sprinkle some trace points in the new encoding functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16svcrdma: Update synopsis of svc_rdma_map_reply_msg()Chuck Lever
Preparing for subsequent patches, no behavior change expected. Pass the RPC Call's svc_rdma_recv_ctxt deeper into the sendto() path. This enables passing more information about Requester- provided Write and Reply chunks into those lower-level functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16svcrdma: Update synopsis of svc_rdma_send_reply_chunk()Chuck Lever
Preparing for subsequent patches, no behavior change expected. Pass the RPC Call's svc_rdma_recv_ctxt deeper into the sendto() path. This enables passing more information about Requester- provided Write and Reply chunks into the lower-level send functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16svcrdma: De-duplicate code that locates Write and Reply chunksChuck Lever
Cache the locations of the Requester-provided Write list and Reply chunk so that the Send path doesn't need to parse the Call header again. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16svcrdma: Use struct xdr_stream to decode ingress transport headersChuck Lever
The logic that checks incoming network headers has to be scrupulous. De-duplicate: replace open-coded buffer overflow checks with the use of xdr_stream helpers that are used most everywhere else XDR decoding is done. One minor change to the sanity checks: instead of checking the length of individual segments, cap the length of the whole chunk to be sure it can fit in the set of pages available in rq_pages. This should be a better test of whether the server can handle the chunks in each request. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16nfsd: Fix NFSv4 READ on RDMA when using readvChuck Lever
svcrdma expects that the payload falls precisely into the xdr_buf page vector. This does not seem to be the case for nfsd4_encode_readv(). This code is called only when fops->splice_read is missing or when RQ_SPLICE_OK is clear, so it's not a noticeable problem in many common cases. Add new transport method: ->xpo_read_payload so that when a READ payload does not fit exactly in rq_res's page vector, the XDR encoder can inform the RPC transport exactly where that payload is, without the payload's XDR pad. That way, when a Write chunk is present, the transport knows what byte range in the Reply message is supposed to be matched with the chunk. Note that the Linux NFS server implementation of NFS/RDMA can currently handle only one Write chunk per RPC-over-RDMA message. This simplifies the implementation of this fix. Fixes: b04209806384 ("nfsd4: allow exotic read compounds") Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053 Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2019-08-19svcrdma: Use llist for managing cache of recv_ctxtsChuck Lever
Use a wait-free mechanism for managing the svc_rdma_recv_ctxts free list. Subsequently, sc_recv_lock can be eliminated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19svcrdma: Remove svc_rdma_wqChuck Lever
Clean up: the system workqueue will work just as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: remove unused xpo_prep_reply_hdr callbackVasily Averin
xpo_prep_reply_hdr are not used now. It was defined for tcp transport only, however it cannot be called indirectly, so let's move it to its caller and remove unused callback. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-28svcrdma: Optimize the logic that selects the R_key to invalidateChuck Lever
o Select the R_key to invalidate while the CPU cache still contains the received RPC Call transport header, rather than waiting until we're about to send the RPC Reply. o Choose Send With Invalidate if there is exactly one distinct R_key in the received transport header. If there's more than one, the client will have to perform local invalidation after it has already waited for remote invalidation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29svcrdma: Increase the default connection credit limitChuck Lever
Reduce queuing on clients by allowing more credits by default. 64 is the default NFSv4.1 slot table size on Linux clients. This size prevents the credit limit from putting RPC requests to sleep again after they have already slept waiting for a session slot. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Remove unused svc_rdma_op_ctxtChuck Lever
Clean up: Eliminate a structure that is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Persistently allocate and DMA-map Send buffersChuck Lever
While sending each RPC Reply, svc_rdma_sendto allocates and DMA- maps a separate buffer where the RPC/RDMA transport header is constructed. The buffer is unmapped and released in the Send completion handler. This is significant per-RPC overhead, especially for small RPCs. Instead, allocate and DMA-map a buffer, and cache it in each svc_rdma_send_ctxt. This buffer and its mapping can be re-used for each RPC, saving the cost of memory allocation and DMA mapping. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Remove post_send_wrChuck Lever
Clean up: Now that the send_wr is part of the svc_rdma_send_ctxt, svc_rdma_post_send_wr is nearly empty. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxtChuck Lever
Receive buffers are always the same size, but each Send WR has a variable number of SGEs, based on the contents of the xdr_buf being sent. While assembling a Send WR, keep track of the number of SGEs so that we don't exceed the device's maximum, or walk off the end of the Send SGE array. For now the Send path just fails if it exceeds the maximum. The current logic in svc_rdma_accept bases the maximum number of Send SGEs on the largest NFS request that can be sent or received. In the transport layer, the limit is actually based on the capabilities of the underlying device, not on properties of the Upper Layer Protocol. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Introduce svc_rdma_send_ctxtChuck Lever
svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt free list. This eliminates the overhead of calling kmalloc / kfree, both of which grab a globally shared lock that disables interrupts. Introduce a replacement to svc_rdma_op_ctxt's that is built especially for the svcrdma Send path. Subsequent patches will take advantage of this new structure by allocating real resources which are then cached in these objects. The allocations are freed when the transport is torn down. I've renamed the structure so that static type checking can be used to ensure that uses of op_ctxt and send_ctxt are not confused. As an additional clean up, structure fields are renamed to conform with kernel coding conventions. Additional clean ups: - Handle svc_rdma_send_ctxt_get allocation failure at each call site, rather than pre-allocating and hoping we guessed correctly - All send_ctxt_put call-sites request page freeing, so remove the @free_pages argument - All send_ctxt_put call-sites unmap SGEs, so fold that into svc_rdma_send_ctxt_put Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Clean up Send SGE accountingChuck Lever
Clean up: Since there's already a svc_rdma_op_ctxt being passed around with the running count of mapped SGEs, drop unneeded parameters to svc_rdma_post_send_wr(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Refactor svc_rdma_dma_map_bufChuck Lever
Clean up: svc_rdma_dma_map_buf does mostly the same thing as svc_rdma_dma_map_page, so let's fold these together. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Allocate recv_ctxt's on CPU handling ReceivesChuck Lever
There is a significant latency penalty when processing an ingress Receive if the Receive buffer resides in memory that is not on the same NUMA node as the the CPU handling completions for a CQ. The system administrator and the device driver determine which CPU handles completions. This CPU does not change during life of the CQ. Further the Upper Layer does not have any visibility of which CPU it is. Allocating Receive buffers in the Receive completion handler guarantees that Receive buffers are allocated on the preferred NUMA node for that CQ. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Persistently allocate and DMA-map Receive buffersChuck Lever
The current Receive path uses an array of pages which are allocated and DMA mapped when each Receive WR is posted, and then handed off to the upper layer in rqstp::rq_arg. The page flip releases unused pages in the rq_pages pagelist. This mechanism introduces a significant amount of overhead. So instead, kmalloc the Receive buffer, and leave it DMA-mapped while the transport remains connected. This confers a number of benefits: * Each Receive WR requires only one receive SGE, no matter how large the inline threshold is. This helps the server-side NFS/RDMA transport operate on less capable RDMA devices. * The Receive buffer is left allocated and mapped all the time. This relieves svc_rdma_post_recv from the overhead of allocating and DMA-mapping a fresh buffer. * svc_rdma_wc_receive no longer has to DMA unmap the Receive buffer. It has to DMA sync only the number of bytes that were received. * svc_rdma_build_arg_xdr no longer has to free a page in rq_pages for each page in the Receive buffer, making it a constant-time function. * The Receive buffer is now plugged directly into the rq_arg's head[0].iov_vec, and can be larger than a page without spilling over into rq_arg's page list. This enables simplification of the RDMA Read path in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Simplify svc_rdma_recv_ctxt_putChuck Lever
Currently svc_rdma_recv_ctxt_put's callers have to know whether they want to free the ctxt's pages or not. This means the human developers have to know when and why to set that free_pages argument. Instead, the ctxt should carry that information with it so that svc_rdma_recv_ctxt_put does the right thing no matter who is calling. We want to keep track of the number of pages in the Receive buffer separately from the number of pages pulled over by RDMA Read. This is so that the correct number of pages can be freed properly and that number is well-documented. So now, rc_hdr_count is the number of pages consumed by head[0] (ie., the page index where the Read chunk should start); and rc_page_count is always the number of pages that need to be released when the ctxt is put. The @free_pages argument is no longer needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Remove sc_rq_depthChuck Lever
Clean up: No need to retain rq_depth in struct svcrdma_xprt, it is used only in svc_rdma_accept(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Introduce svc_rdma_recv_ctxtChuck Lever
svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt free list. This eliminates the overhead of calling kmalloc / kfree, both of which grab a globally shared lock that disables interrupts. To reduce contention further, separate the use of these objects in the Receive and Send paths in svcrdma. Subsequent patches will take advantage of this separation by allocating real resources which are then cached in these objects. The allocations are freed when the transport is torn down. I've renamed the structure so that static type checking can be used to ensure that uses of op_ctxt and recv_ctxt are not confused. As an additional clean up, structure fields are renamed to conform with kernel coding conventions. As a final clean up, helpers related to recv_ctxt are moved closer to the functions that use them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11svcrdma: Add proper SPDX tags for NetApp-contributed sourceChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-03-20svcrdma: Consult max_qp_init_rd_atom when accepting connectionsChuck Lever
The target needs to return the lesser of the client's Inbound RDMA Read Queue Depth (IRD), provided in the connection parameters, and the local device's Outbound RDMA Read Queue Depth (ORD). The latter limit is max_qp_init_rd_atom, not max_qp_rd_atom. The svcrdma_ord value caps the ORD value for iWARP transports, which do not exchange ORD/IRD values at connection time. Since no other Linux kernel RDMA-enabled storage target sees fit to provide this cap, I'm removing it here too. initiator_depth is a u8, so ensure the computed ORD value does not overflow that field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-01-18svcrdma: Post Receives in the Receive completion handlerChuck Lever
This change improves Receive efficiency by posting Receives only on the same CPU that handles Receive completion. Improved latency and throughput has been noted with this change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Clean up after converting svc_rdma_recvfrom to rdma_rw APIChuck Lever
Clean up: Registration mode details are now handled by the rdma_rw API, and thus can be removed from svcrdma. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Remove frmr cacheChuck Lever
Clean up: Now that the svc_rdma_recvfrom path uses the rdma_rw API, the details of Read sink buffer registration are dealt with by the kernel's RDMA core. This cache is no longer used, and can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Remove unused Read completion handlersChuck Lever
Clean up: The generic RDMA R/W API conversion of svc_rdma_recvfrom replaced the Register, Read, and Invalidate completion handlers. Remove the old ones, which are no longer used. These handlers shared some helper code with svc_rdma_wc_send. Fold the wc_common helper back into the one remaining completion handler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Use generic RDMA R/W API in RPC Call pathChuck Lever
The current svcrdma recvfrom code path has a lot of detail about registration mode and the type of port (iWARP, IB, etc). Instead, use the RDMA core's generic R/W API. This shares code with other RDMA-enabled ULPs that manages the gory details of buffer registration and the posting of RDMA Read Work Requests. Since the Read list marshaling code is being replaced, I took the opportunity to replace C structure-based XDR encoding code with more portable code that uses pointer arithmetic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Add recvfrom helpers to svc_rdma_rw.cChuck Lever
svc_rdma_rw.c already contains helpers for the sendto path. Introduce helpers for the recvfrom path. The plan is to replace the local NFSD bespoke code that constructs and posts RDMA Read Work Requests with calls to the rdma_rw API. This shares code with other RDMA-enabled ULPs that manages the gory details of buffer registration and posting Work Requests. This new code also puts all RDMA_NOMSG-specific logic in one place. Lastly, the use of rqstp->rq_arg.pages is deprecated in favor of using rqstp->rq_pages directly, for clarity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-06-28svcrdma: Remove svc_rdma_marshal.cChuck Lever
svc_rdma_marshal.c has one remaining exported function -- svc_rdma_xdr_decode_req -- and it has a single call site. Take the same approach as the sendto path, and move this function into the source file where it is called. This is a refactoring change only. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25svcrdma: Clean out old XDR encodersChuck Lever
Clean up: These have been replaced and are no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25svcrdma: Remove the req_map cacheChuck Lever
req_maps are no longer used by the send path and can thus be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25svcrdma: Remove unused RDMA Write completion handlerChuck Lever
Clean up. All RDMA Write completions are now handled by svc_rdma_wc_write_ctx. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25svcrdma: Reduce size of sge array in struct svc_rdma_op_ctxtChuck Lever
The sge array in struct svc_rdma_op_ctxt is no longer used for sending RDMA Write WRs. It need only accommodate the construction of Send and Receive WRs. The maximum inline size is the largest payload it needs to handle now. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25svcrdma: Clean up RPC-over-RDMA backchannel reply processingChuck Lever
Replace C structure-based XDR decoding with pointer arithmetic. Pointer arithmetic is considered more portable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25svcrdma: Clean up RDMA_ERROR pathChuck Lever
Now that svc_rdma_sendto has been renovated, svc_rdma_send_error can be refactored to reduce code duplication and remove C structure- based XDR encoding. It is also relocated to the source file that contains its only caller. This is a refactoring change only. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25svcrdma: Use rdma_rw API in RPC reply pathChuck Lever
The current svcrdma sendto code path posts one RDMA Write WR at a time. Each of these Writes typically carries a small number of pages (for instance, up to 30 pages for mlx4 devices). That means a 1MB NFS READ reply requires 9 ib_post_send() calls for the Write WRs, and one for the Send WR carrying the actual RPC Reply message. Instead, use the new rdma_rw API. The details of Write WR chain construction and memory registration are taken care of in the RDMA core. svcrdma can focus on the details of the RPC-over-RDMA protocol. This gives three main benefits: 1. All Write WRs for one RDMA segment are posted in a single chain. As few as one ib_post_send() for each Write chunk. 2. The Write path can now use FRWR to register the Write buffers. If the device's maximum page list depth is large, this means a single Write WR is needed for each RPC's Write chunk data. 3. The new code introduces support for RPCs that carry both a Write list and a Reply chunk. This combination can be used for an NFSv4 READ where the data payload is large, and thus is removed from the Payload Stream, but the Payload Stream is still larger than the inline threshold. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25svcrdma: Introduce local rdma_rw API helpersChuck Lever
The plan is to replace the local bespoke code that constructs and posts RDMA Read and Write Work Requests with calls to the rdma_rw API. This shares code with other RDMA-enabled ULPs that manages the gory details of buffer registration and posting Work Requests. Some design notes: o The structure of RPC-over-RDMA transport headers is flexible, allowing multiple segments per Reply with arbitrary alignment, each with a unique R_key. Write and Send WRs continue to be built and posted in separate code paths. However, one whole chunk (with one or more RDMA segments apiece) gets exactly one ib_post_send and one work completion. o svc_xprt reference counting is modified, since a chain of rdma_rw_ctx structs generates one completion, no matter how many Write WRs are posted. o The current code builds the transport header as it is construct- ing Write WRs. I've replaced that with marshaling of transport header data items in a separate step. This is because the exact structure of client-provided segments may not align with the components of the server's reply xdr_buf, or the pages in the page list. Thus parts of each client-provided segment may be written at different points in the send path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>