summaryrefslogtreecommitdiff
path: root/net/sunrpc/xprtrdma
AgeCommit message (Collapse)Author
2019-01-02xprtrdma: Reduce max_frwr_depthChuck Lever
Some devices advertise a large max_fast_reg_page_list_len capability, but perform optimally when MRs are significantly smaller than that depth -- probably when the MR itself is no larger than a page. By default, the RDMA R/W core API uses max_sge_rd as the maximum page depth for MRs. For some devices, the value of max_sge_rd is 1, which is also not optimal. Thus, when max_sge_rd is larger than 1, use that value. Otherwise use the value of the max_fast_reg_page_list_len attribute. I've tested this with CX-3 Pro, FastLinq, and CX-5 devices. It reproducibly improves the throughput of large I/Os by several percent. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Fix ri_max_segs and the result of ro_maxpagesChuck Lever
With certain combinations of krb5i/p, MR size, and r/wsize, I/O can fail with EMSGSIZE. This is because the calculated value of ri_max_segs (the max number of MRs per RPC) exceeded RPCRDMA_MAX_HDR_SEGS, which caused Read or Write list encoding to walk off the end of the transport header. Once that was addressed, the ro_maxpages result has to be corrected to account for the number of MRs needed for Reply chunks, which is 2 MRs smaller than a normal Read or Write chunk. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Don't wake pending tasks until disconnect is doneChuck Lever
Transport disconnect processing does a "wake pending tasks" at various points. Suppose an RPC Reply is being processed. The RPC task that Reply goes with is waiting on the pending queue. If a disconnect wake-up happens before reply processing is done, that reply, even if it is good, is thrown away, and the RPC has to be sent again. This window apparently does not exist for socket transports because there is a lock held while a reply is being received which prevents the wake-up call until after reply processing is done. To resolve this, all RPC replies being processed on an RPC-over-RDMA transport have to complete before pending tasks are awoken due to a transport disconnect. Callers that already hold the transport write lock may invoke ->ops->close directly. Others use a generic helper that schedules a close when the write lock can be taken safely. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: No qp_event disconnectChuck Lever
After thinking about this more, and auditing other kernel ULP imple- mentations, I believe that a DISCONNECT cm_event will occur after a fatal QP event. If that's the case, there's no need for an explicit disconnect in the QP event handler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueueChuck Lever
To address a connection-close ordering problem, we need the ability to drain the RPC completions running on rpcrdma_receive_wq for just one transport. Give each transport its own RPC completion workqueue, and drain that workqueue when disconnecting the transport. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Refactor Receive accountingChuck Lever
Clean up: Divide the work cleanly: - rpcrdma_wc_receive is responsible only for RDMA Receives - rpcrdma_reply_handler is responsible only for RPC Replies - the posted send and receive counts both belong in rpcrdma_ep Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Ensure MRs are DMA-unmapped when posting LOCAL_INV failsChuck Lever
The recovery case in frwr_op_unmap_sync needs to DMA unmap each MR. frwr_release_mr does not DMA-unmap, but the recycle worker does. Fixes: 61da886bf74e ("xprtrdma: Explicitly resetting MRs is ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Yet another double DMA-unmapChuck Lever
While chasing yet another set of DMAR fault reports, I noticed that the frwr recycler conflates whether or not an MR has been DMA unmapped with frwr->fr_state. Actually the two have only an indirect relationship. It's in fact impossible to guess reliably whether the MR has been DMA unmapped based on its fr_state field, especially as the surrounding code and its assumptions have changed over time. A better approach is to track the DMA mapping status explicitly so that the recycler is less brittle to unexpected situations, and attempts to DMA-unmap a second time are prevented. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-27sunrpc: remove unused xpo_prep_reply_hdr callbackVasily Averin
xpo_prep_reply_hdr are not used now. It was defined for tcp transport only, however it cannot be called indirectly, so let's move it to its caller and remove unused callback. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: remove svc_rdma_bc_classVasily Averin
Remove svc_xprt_class svc_rdma_bc_class and related functions. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: remove unused bc_up operation from rpc_xprt_opsVasily Averin
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: replace svc_serv->sv_bc_xprt by boolean flagVasily Averin
svc_serv-> sv_bc_xprt is netns-unsafe and cannot be used as pointer. To prevent its misuse in future it is replaced by new boolean flag. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-12RDMA: Start use ib_device_opsKamal Heib
Make all the required change to start use the ib_device_ops structure. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-28svcrdma: Optimize the logic that selects the R_key to invalidateChuck Lever
o Select the R_key to invalidate while the CPU cache still contains the received RPC Call transport header, rather than waiting until we're about to send the RPC Reply. o Choose Send With Invalidate if there is exactly one distinct R_key in the received transport header. If there's more than one, the client will have to perform local invalidation after it has already waited for remote invalidation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-30Merge tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Olga added support for the NFSv4.2 asynchronous copy protocol. We already supported COPY, by copying a limited amount of data and then returning a short result, letting the client resend. The asynchronous protocol should offer better performance at the expense of some complexity. The other highlight is Trond's work to convert the duplicate reply cache to a red-black tree, and to move it and some other server caches to RCU. (Previously these have meant taking global spinlocks on every RPC) Otherwise, some RDMA work and miscellaneous bugfixes" * tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux: (30 commits) lockd: fix access beyond unterminated strings in prints nfsd: Fix an Oops in free_session() nfsd: correctly decrement odstate refcount in error path svcrdma: Increase the default connection credit limit svcrdma: Remove try_module_get from backchannel svcrdma: Remove ->release_rqst call in bc reply handler svcrdma: Reduce max_send_sges nfsd: fix fall-through annotations knfsd: Improve lookup performance in the duplicate reply cache using an rbtree knfsd: Further simplify the cache lookup knfsd: Simplify NFS duplicate replay cache knfsd: Remove dead code from nfsd_cache_lookup SUNRPC: Simplify TCP receive code SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock SUNRPC: Remove non-RCU protected lookup NFS: Fix up a typo in nfs_dns_ent_put NFS: Lockless DNS lookups knfsd: Lockless lookup of NFSv4 identities. SUNRPC: Lockless server RPCSEC_GSS context lookup knfsd: Allow lockless lookups of the exports ...
2018-10-29svcrdma: Remove try_module_get from backchannelChuck Lever
Since commit ffe1f0df5862 ("rpcrdma: Merge svcrdma and xprtrdma modules into one"), the forward and backchannel components are part of the same kernel module. A separate try_module_get() call in the backchannel code is no longer necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29svcrdma: Remove ->release_rqst call in bc reply handlerChuck Lever
Similar to a change made in the client's forward channel reply handler: The xprt_release_rqst_cong() call is not necessary. Also, release xprt->recv_lock when taking xprt->transport_lock to avoid disabling and enabling BH's while holding another spin lock. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29svcrdma: Reduce max_send_sgesChuck Lever
There's no need to request a large number of send SGEs because the inline threshold already constrains the number of SGEs per Send. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-18Merge tag 'nfs-rdma-for-4.20-1' of ↵Trond Myklebust
git://git.linux-nfs.org/projects/anna/linux-nfs NFS RDMA client updates for Linux 4.20 Stable bugfixes: - Reset credit grant properly after a disconnect Other bugfixes and cleanups: - xprt_release_rqst_cong is called outside of transport_lock - Create more MRs at a time and toss out old ones during recovery - Various improvements to the RDMA connection and disconnection code: - Improve naming of trace events, functions, and variables - Add documenting comments - Fix metrics and stats reporting - Fix a tracepoint sparse warning Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-10-03xprtrdma: Clean up xprt_rdma_disconnect_injectChuck Lever
Clean up: Use the appropriate C macro instead of open-coding container_of() . Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Add documenting commentsChuck Lever
Clean up: fill in or update documenting comments for transport switch entry points. For xprt_rdma_allocate: The first paragraph is no longer true since commit 5a6d1db45569 ("SUNRPC: Add a transport-specific private field in rpc_rqst"). The second paragraph is no longer true since commit 54cbd6b0c6b9 ("xprtrdma: Delay DMA mapping Send and Receive buffers"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Report when there were zero posted ReceivesChuck Lever
To show that a caller did attempt to allocate and post more Receive buffers, the trace point in rpcrdma_post_recvs() should report when rpcrdma_post_recvs() was invoked but no new Receive buffers were posted. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Move rb_flags initializationChuck Lever
Clean up: rb_flags might be used for other things besides RPCRDMA_BUF_F_EMPTY_SCQ, so initialize it in a generic spot instead of in a send-completion-queue-related helper. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Don't disable BH's in backchannel serverChuck Lever
Clean up: This code was copied from xprtsock.c and backchannel_rqst.c. For rpcrdma, the backchannel server runs exclusively in process context, thus disabling bottom-halves is unnecessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Remove memory address of "ep" from an error messageChuck Lever
Clean up: Replace the hashed memory address of the target rpcrdma_ep with the server's IP address and port. The server address is more useful in an administrative error message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Rename rpcrdma_qp_async_error_upcallChuck Lever
Clean up: Use a function name that is consistent with the RDMA core API and with other consumers. Because this is a function that is invoked from outside the rpcrdma.ko module, add an appropriate documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Simplify RPC wake-ups on connectChuck Lever
Currently, when a connection is established, rpcrdma_conn_upcall invokes rpcrdma_conn_func and then wake_up_all(&ep->rep_connect_wait). The former wakes waiting RPCs, but the connect worker is not done yet, and that leads to races, double wakes, and difficulty understanding how this logic is supposed to work. Instead, collect all the "connection established" logic in the connect worker (xprt_rdma_connect_worker). A disconnect worker is retained to handle provider upcalls safely. Fixes: 254f91e2fa1f ("xprtrdma: RPC/RDMA must invoke ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Re-organize the switch() in rpcrdma_conn_upcallChuck Lever
Clean up: Eliminate the FALLTHROUGH into the default arm to make the switch easier to understand. Also, as long as I'm here, do not display the memory address of the target rpcrdma_ep. A hashed memory address is of marginal use here. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Eliminate "connstate" variable from rpcrdma_conn_upcall()Chuck Lever
Clean up. Since commit 173b8f49b3af ("xprtrdma: Demote "connect" log messages") there has been no need to initialize connstat to zero. In fact, in this code path there's now no reason not to set rep_connected directly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Conventional variable names in rpcrdma_conn_upcallChuck Lever
Clean up: The convention throughout other parts of xprtrdma is to name variables of type struct rpcrdma_xprt "r_xprt", not "xprt". This convention enables the use of the name "xprt" for a "struct rpc_xprt" type variable, as in other parts of the RPC client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Rename rpcrdma_conn_upcallChuck Lever
Clean up: Use a function name that is consistent with the RDMA core API and with other consumers. Because this is a function that is invoked from outside the rpcrdma.ko module, add an appropriate documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02sunrpc: Report connect_time in secondsChuck Lever
The way connection-oriented transports report connect_time is wrong: it's supposed to be in seconds, not in jiffies. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02sunrpc: Fix connect metricsChuck Lever
For TCP, the logic in xprt_connect_status is currently never invoked to record a successful connection. Commit 2a4919919a97 ("SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending") changed the way TCP xprt's are awoken after a connect succeeds. Instead, change connection-oriented transports to bump connect_count and compute connect_time the moment that XPRT_CONNECTED is set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Name MR trace events consistentlyChuck Lever
Clean up the names of trace events related to MRs so that it's easy to enable these with a glob. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Explicitly resetting MRs is no longer necessaryChuck Lever
When a memory operation fails, the MR's driver state might not match its hardware state. The only reliable recourse is to dereg the MR. This is done in ->ro_recover_mr, which then attempts to allocate a fresh MR to replace the released MR. Since commit e2ac236c0b651 ("xprtrdma: Allocate MRs on demand"), xprtrdma dynamically allocates MRs. It can add more MRs whenever they are needed. That makes it possible to simply release an MR when a memory operation fails, instead of "recovering" it. It will automatically be replaced by the on-demand MR allocator. This commit is a little larger than I wanted, but it replaces ->ro_recover_mr, rb_recovery_lock, rb_recovery_worker, and the rb_stale_mrs list with a generic work queue. Since MRs are no longer orphaned, the mrs_orphaned metric is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Create more MRs at a timeChuck Lever
Some devices require more than 3 MRs to build a single 1MB I/O. Ensure that rpcrdma_mrs_create() will add enough MRs to build that I/O. In a subsequent patch I'm changing the MR recovery logic to just toss out the MRs. In that case it's possible for ->send_request to loop acquiring some MRs, not getting enough, getting called again, recycling the previous MRs, then not getting enough, lather rinse repeat. Thus first we need to ensure enough MRs are created to prevent that loop. I'm "reusing" ia->ri_max_segs. All of its accessors seem to want the maximum number of data segments plus two, so I'm going to bake that into the initial calculation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Reset credit grant properly after a disconnectChuck Lever
On a fresh connection, an RPC/RDMA client is supposed to send only one RPC Call until it gets a credit grant in the first RPC Reply from the server [RFC 8166, Section 3.3.3]. There is a bug in the Linux client's credit accounting mechanism introduced by commit e7ce710a8802 ("xprtrdma: Avoid deadlock when credit window is reset"). On connect, it simply dumps all pending RPC Calls onto the new connection. Servers have been tolerant of this bad behavior. Currently no server implementation ever changes its credit grant over reconnects, and servers always repost enough Receives before connections are fully established. To correct this issue, ensure that the client resets both the credit grant _and_ the congestion window when handling a reconnect. Fixes: e7ce710a8802 ("xprtrdma: Avoid deadlock when credit ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: xprt_release_rqst_cong is called outside of transport_lockChuck Lever
Since commit ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect the RPC request receive list") the RPC/RDMA reply handler has been calling xprt_release_rqst_cong without holding xprt->transport_lock. I think the only way this call is ever made is if the credit grant increases and there are RPCs pending. Current server implementations do not change their credit grant during operation (except at connect time). Commit e7ce710a8802 ("xprtrdma: Avoid deadlock when credit window is reset") added the ->release_rqst call because UDP invokes xprt_adjust_cwnd(), which calls __xprt_put_cong() after adjusting xprt->cwnd. Both xprt_release() and ->xprt_release_xprt already wake another task in this case, so it is safe to remove this call from the reply handler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-09-30SUNRPC: Cleanup: remove the unused 'task' argument from the request_send()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Clean up transport write space handlingTrond Myklebust
Treat socket write space handling in the same way we now treat transport congestion: by denying the XPRT_LOCK until the transport signals that it has free buffer space. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Support for congestion control when queuing is enabledTrond Myklebust
Both RDMA and UDP transports require the request to get a "congestion control" credit before they can be transmitted. Right now, this is done when the request locks the socket. We'd like it to happen when a request attempts to be transmitted for the first time. In order to support retransmission of requests that already hold such credits, we also want to ensure that they get queued first, so that we don't deadlock with requests that have yet to obtain a credit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Treat the task and request as separate in the xprt_ops->send_request()Trond Myklebust
When we shift to using the transmit queue, then the task that holds the write lock will not necessarily be the same as the one being transmitted. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Refactor xprt_transmit() to remove the reply queue codeTrond Myklebust
Separate out the action of adding a request to the reply queue so that the backchannel code can simply skip calling it altogether. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Rename xprt->recv_lock to xprt->queue_lockTrond Myklebust
We will use the same lock to protect both the transmit and receive queues. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-08-23Merge tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "These patches include adding async support for the v4.2 COPY operation. I think Bruce is planning to send the server patches for the next release, but I figured we could get the client side out of the way now since it's been in my tree for a while. This shouldn't cause any problems, since the server will still respond with synchronous copies even if the client requests async. Features: - Add support for asynchronous server-side COPY operations Stable bufixes: - Fix an off-by-one in bl_map_stripe() (v3.17+) - NFSv4 client live hangs after live data migration recovery (v4.9+) - xprtrdma: Fix disconnect regression (v4.18+) - Fix locking in pnfs_generic_recover_commit_reqs (v4.14+) - Fix a sleep in atomic context in nfs4_callback_sequence() (v4.9+) Other bugfixes and cleanups: - Optimizations and fixes involving NFS v4.1 / pNFS layout handling - Optimize lseek(fd, SEEK_CUR, 0) on directories to avoid locking - Immediately reschedule writeback when the server replies with an error - Fix excessive attribute revalidation in nfs_execute_ok() - Add error checking to nfs_idmap_prepare_message() - Use new vm_fault_t return type - Return a delegation when reclaiming one that the server has recalled - Referrals should inherit proto setting from parents - Make rpc_auth_create_args a const - Improvements to rpc_iostats tracking - Fix a potential reference leak when there is an error processing a callback - Fix rmdir / mkdir / rename nlink accounting - Fix updating inode change attribute - Fix error handling in nfsn4_sp4_select_mode() - Use an appropriate work queue for direct-write completion - Don't busy wait if NFSv4 session draining is interrupted" * tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (54 commits) pNFS: Remove unwanted optimisation of layoutget pNFS/flexfiles: ff_layout_pg_init_read should exit on error pNFS: Treat RECALLCONFLICT like DELAY... pNFS: When updating the stateid in layoutreturn, also update the recall range NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence() NFSv4: Fix locking in pnfs_generic_recover_commit_reqs NFSv4: Fix a typo in nfs4_init_channel_attrs() NFSv4: Don't busy wait if NFSv4 session draining is interrupted NFS recover from destination server reboot for copies NFS add a simple sync nfs4_proc_commit after async COPY NFS handle COPY ERR_OFFLOAD_NO_REQS NFS send OFFLOAD_CANCEL when COPY killed NFS export nfs4_async_handle_error NFS handle COPY reply CB_OFFLOAD call race NFS add support for asynchronous COPY NFS COPY xdr handle async reply NFS OFFLOAD_CANCEL xdr NFS CB_OFFLOAD xdr NFS: Use an appropriate work queue for direct-write completion NFSv4: Fix error handling in nfs4_sp4_select_mode() ...
2018-08-23Merge tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Chuck Lever fixed a problem with NFSv4.0 callbacks over GSS from multi-homed servers. The only new feature is a minor bit of protocol (change_attr_type) which the client doesn't even use yet. Other than that, various bugfixes and cleanup" * tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux: (27 commits) sunrpc: Add comment defining gssd upcall API keywords nfsd: Remove callback_cred nfsd: Use correct credential for NFSv4.0 callback with GSS sunrpc: Extract target name into svc_cred sunrpc: Enable the kernel to specify the hostname part of service principals sunrpc: Don't use stack buffer with scatterlist rpc: remove unneeded variable 'ret' in rdma_listen_handler nfsd: use true and false for boolean values nfsd: constify write_op[] fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id NFSD: Handle full-length symlinks NFSD: Refactor the generic write vector fill helper svcrdma: Clean up Read chunk path svcrdma: Avoid releasing a page in svc_xprt_release() nfsd: Mark expected switch fall-through sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data' nfsd: fix leaked file lock with nfs exported overlayfs nfsd: don't advertise a SCSI layout for an unsupported request_queue nfsd: fix corrupted reply to badly ordered compound nfsd: clarify check_op_ordering ...
2018-08-09rpc: remove unneeded variable 'ret' in rdma_listen_handlerzhong jiang
The ret is not modified after initalization, So just remove the variable and return 0. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09svcrdma: Clean up Read chunk pathChuck Lever
Simplify the error handling at the tail of recv_read_chunk() by re-arranging rq_pages[] housekeeping and documenting it properly. NB: In this path, svc_rdma_recvfrom returns zero. Therefore no subsequent reply processing is done on the svc_rqstp, and thus the rq_respages field does not need to be updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09svcrdma: Avoid releasing a page in svc_xprt_release()Chuck Lever
svc_xprt_release() invokes svc_free_res_pages(), which releases pages between rq_respages and rq_next_page. Historically, the RPC/RDMA transport has set these two pointers to be different by one, which means: - one page gets released when svc_recv returns 0. This normally happens whenever one or more RDMA Reads need to be dispatched to complete construction of an RPC Call. - one page gets released after every call to svc_send. In both cases, this released page is immediately refilled by svc_alloc_arg. There does not seem to be a reason for releasing this page. To avoid this unnecessary memory allocator traffic, set rq_next_page more carefully. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data'YueHaibing
Variables 'checksumlen','blocksize' and 'data' are being assigned, but are never used, hence they are redundant and can be removed. Fix the following warning: net/sunrpc/auth_gss/gss_krb5_wrap.c:443:7: warning: variable ‘blocksize’ set but not used [-Wunused-but-set-variable] net/sunrpc/auth_gss/gss_krb5_crypto.c:376:15: warning: variable ‘checksumlen’ set but not used [-Wunused-but-set-variable] net/sunrpc/xprtrdma/svc_rdma.c:97:9: warning: variable ‘data’ set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>