summaryrefslogtreecommitdiff
path: root/net/sunrpc
AgeCommit message (Collapse)Author
2018-10-18sunrpc: safely reallow resvport min/max inversionJ. Bruce Fields
Commits ffb6ca33b04b and e08ea3a96fc7 prevent setting xprt_min_resvport greater than xprt_max_resvport, but may also break simple code that sets one parameter then the other, if the new range does not overlap the old. Also it looks racy to me, unless there's some serialization I'm not seeing. Granted it would probably require malicious privileged processes (unless there's a chance these might eventually be settable in unprivileged containers), but still it seems better not to let userspace panic the kernel. Simpler seems to be to allow setting the parameters to whatever you want but interpret xprt_min_resvport > xprt_max_resvport as the empty range. Fixes: ffb6ca33b04b "sunrpc: Prevent resvport min/max inversion..." Fixes: e08ea3a96fc7 "sunrpc: Prevent rexvport min/max inversion..." Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-10-18SUNRPC: Fix a compile warning for cmpxchg64()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-10-05SUNRPC: use cmpxchg64() in gss_seq_send64_fetch_and_inc()Arnd Bergmann
The newly introduced gss_seq_send64_fetch_and_inc() fails to build on 32-bit architectures: net/sunrpc/auth_gss/gss_krb5_seal.c:144:14: note: in expansion of macro 'cmpxchg' seq_send = cmpxchg(&ctx->seq_send64, old, old + 1); ^~~~~~~ arch/x86/include/asm/cmpxchg.h:128:3: error: call to '__cmpxchg_wrong_size' declared with attribute error: Bad argument size for cmpxchg __cmpxchg_wrong_size(); \ As the message tells us, cmpxchg() cannot be used on 64-bit arguments, that's what cmpxchg64() does. Fixes: 571ed1fd2390 ("SUNRPC: Replace krb5_seq_lock with a lockless scheme") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-10-03xprtrdma: Clean up xprt_rdma_disconnect_injectChuck Lever
Clean up: Use the appropriate C macro instead of open-coding container_of() . Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Add documenting commentsChuck Lever
Clean up: fill in or update documenting comments for transport switch entry points. For xprt_rdma_allocate: The first paragraph is no longer true since commit 5a6d1db45569 ("SUNRPC: Add a transport-specific private field in rpc_rqst"). The second paragraph is no longer true since commit 54cbd6b0c6b9 ("xprtrdma: Delay DMA mapping Send and Receive buffers"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Report when there were zero posted ReceivesChuck Lever
To show that a caller did attempt to allocate and post more Receive buffers, the trace point in rpcrdma_post_recvs() should report when rpcrdma_post_recvs() was invoked but no new Receive buffers were posted. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Move rb_flags initializationChuck Lever
Clean up: rb_flags might be used for other things besides RPCRDMA_BUF_F_EMPTY_SCQ, so initialize it in a generic spot instead of in a send-completion-queue-related helper. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03SUNRPC: Refactor sunrpc_cache_lookupTrond Myklebust
This is a trivial split into lookup and insert functions, no change in behavior. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-03SUNRPC: Add lockless lookup of the server's auth domainTrond Myklebust
Avoid taking the global auth_domain_lock in most lookups of the auth domain by adding an RCU protected lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-03SUNRPC: Remove the server 'authtab_lock' and just use RCUTrond Myklebust
Module removal is RCU safe by design, so we really have no need to lock the 'authtab[]' array. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-03xprtrdma: Don't disable BH's in backchannel serverChuck Lever
Clean up: This code was copied from xprtsock.c and backchannel_rqst.c. For rpcrdma, the backchannel server runs exclusively in process context, thus disabling bottom-halves is unnecessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Remove memory address of "ep" from an error messageChuck Lever
Clean up: Replace the hashed memory address of the target rpcrdma_ep with the server's IP address and port. The server address is more useful in an administrative error message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Rename rpcrdma_qp_async_error_upcallChuck Lever
Clean up: Use a function name that is consistent with the RDMA core API and with other consumers. Because this is a function that is invoked from outside the rpcrdma.ko module, add an appropriate documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Simplify RPC wake-ups on connectChuck Lever
Currently, when a connection is established, rpcrdma_conn_upcall invokes rpcrdma_conn_func and then wake_up_all(&ep->rep_connect_wait). The former wakes waiting RPCs, but the connect worker is not done yet, and that leads to races, double wakes, and difficulty understanding how this logic is supposed to work. Instead, collect all the "connection established" logic in the connect worker (xprt_rdma_connect_worker). A disconnect worker is retained to handle provider upcalls safely. Fixes: 254f91e2fa1f ("xprtrdma: RPC/RDMA must invoke ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03xprtrdma: Re-organize the switch() in rpcrdma_conn_upcallChuck Lever
Clean up: Eliminate the FALLTHROUGH into the default arm to make the switch easier to understand. Also, as long as I'm here, do not display the memory address of the target rpcrdma_ep. A hashed memory address is of marginal use here. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Eliminate "connstate" variable from rpcrdma_conn_upcall()Chuck Lever
Clean up. Since commit 173b8f49b3af ("xprtrdma: Demote "connect" log messages") there has been no need to initialize connstat to zero. In fact, in this code path there's now no reason not to set rep_connected directly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Conventional variable names in rpcrdma_conn_upcallChuck Lever
Clean up: The convention throughout other parts of xprtrdma is to name variables of type struct rpcrdma_xprt "r_xprt", not "xprt". This convention enables the use of the name "xprt" for a "struct rpc_xprt" type variable, as in other parts of the RPC client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Rename rpcrdma_conn_upcallChuck Lever
Clean up: Use a function name that is consistent with the RDMA core API and with other consumers. Because this is a function that is invoked from outside the rpcrdma.ko module, add an appropriate documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02sunrpc: Report connect_time in secondsChuck Lever
The way connection-oriented transports report connect_time is wrong: it's supposed to be in seconds, not in jiffies. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02sunrpc: Fix connect metricsChuck Lever
For TCP, the logic in xprt_connect_status is currently never invoked to record a successful connection. Commit 2a4919919a97 ("SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending") changed the way TCP xprt's are awoken after a connect succeeds. Instead, change connection-oriented transports to bump connect_count and compute connect_time the moment that XPRT_CONNECTED is set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Name MR trace events consistentlyChuck Lever
Clean up the names of trace events related to MRs so that it's easy to enable these with a glob. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Explicitly resetting MRs is no longer necessaryChuck Lever
When a memory operation fails, the MR's driver state might not match its hardware state. The only reliable recourse is to dereg the MR. This is done in ->ro_recover_mr, which then attempts to allocate a fresh MR to replace the released MR. Since commit e2ac236c0b651 ("xprtrdma: Allocate MRs on demand"), xprtrdma dynamically allocates MRs. It can add more MRs whenever they are needed. That makes it possible to simply release an MR when a memory operation fails, instead of "recovering" it. It will automatically be replaced by the on-demand MR allocator. This commit is a little larger than I wanted, but it replaces ->ro_recover_mr, rb_recovery_lock, rb_recovery_worker, and the rb_stale_mrs list with a generic work queue. Since MRs are no longer orphaned, the mrs_orphaned metric is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Create more MRs at a timeChuck Lever
Some devices require more than 3 MRs to build a single 1MB I/O. Ensure that rpcrdma_mrs_create() will add enough MRs to build that I/O. In a subsequent patch I'm changing the MR recovery logic to just toss out the MRs. In that case it's possible for ->send_request to loop acquiring some MRs, not getting enough, getting called again, recycling the previous MRs, then not getting enough, lather rinse repeat. Thus first we need to ensure enough MRs are created to prevent that loop. I'm "reusing" ia->ri_max_segs. All of its accessors seem to want the maximum number of data segments plus two, so I'm going to bake that into the initial calculation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: Reset credit grant properly after a disconnectChuck Lever
On a fresh connection, an RPC/RDMA client is supposed to send only one RPC Call until it gets a credit grant in the first RPC Reply from the server [RFC 8166, Section 3.3.3]. There is a bug in the Linux client's credit accounting mechanism introduced by commit e7ce710a8802 ("xprtrdma: Avoid deadlock when credit window is reset"). On connect, it simply dumps all pending RPC Calls onto the new connection. Servers have been tolerant of this bad behavior. Currently no server implementation ever changes its credit grant over reconnects, and servers always repost enough Receives before connections are fully established. To correct this issue, ensure that the client resets both the credit grant _and_ the congestion window when handling a reconnect. Fixes: e7ce710a8802 ("xprtrdma: Avoid deadlock when credit ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02xprtrdma: xprt_release_rqst_cong is called outside of transport_lockChuck Lever
Since commit ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect the RPC request receive list") the RPC/RDMA reply handler has been calling xprt_release_rqst_cong without holding xprt->transport_lock. I think the only way this call is ever made is if the credit grant increases and there are RPCs pending. Current server implementations do not change their credit grant during operation (except at connect time). Commit e7ce710a8802 ("xprtrdma: Avoid deadlock when credit window is reset") added the ->release_rqst call because UDP invokes xprt_adjust_cwnd(), which calls __xprt_put_cong() after adjusting xprt->cwnd. Both xprt_release() and ->xprt_release_xprt already wake another task in this case, so it is safe to remove this call from the reply handler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-09-30SUNRPC: Replace krb5_seq_lock with a lockless schemeTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Lockless lookup of RPCSEC_GSS mechanismsTrond Myklebust
Use RCU protected lookups for discovering the supported mechanisms. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Remove rpc_authflavor_lock in favour of RCU lockingTrond Myklebust
Module removal is RCU safe by design, so we really have no need to lock the auth_flavors[] array. Substitute a lockless scheme to add/remove entries in the array, and then use rcu. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Unexport xdr_partial_copy_from_skb()Trond Myklebust
It is no longer used outside of net/sunrpc/socklib.c Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Clean up xs_udp_data_receive()Trond Myklebust
Simplify the retry logic. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Allow AF_LOCAL sockets to use the generic stream receiveTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive()Trond Myklebust
In preparation for sharing with AF_LOCAL. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Simplify TCP receive code by switching to using iteratorsTrond Myklebust
Most of this code should also be reusable with other socket types. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter()Trond Myklebust
Add a bvec array to struct xdr_buf, and have the client allocate it when we need to receive data into pages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Add a label for RPC calls that require allocation on receiveTrond Myklebust
If the RPC call relies on the receive call allocating pages as buffers, then let's label it so that we a) Don't leak memory by allocating pages for requests that do not expect this behaviour b) Can optimise for the common case where calls do not require allocation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Convert the xprt->sending queue back to an ordinary wait queueTrond Myklebust
We no longer need priority semantics on the xprt->sending queue, because the order in which tasks are sent is now dictated by their position in the send queue. Note that the backlog queue remains a priority queue, meaning that slot resources are still managed in order of task priority. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Fix priority queue fairnessTrond Myklebust
Fix up the priority queue to not batch by owner, but by queue, so that we allow '1 << priority' elements to be dequeued before switching to the next priority queue. The owner field is still used to wake up requests in round robin order by owner to avoid single processes hogging the RPC layer by loading the queues. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Convert xprt receive queue to use an rbtreeTrond Myklebust
If the server is slow, we can find ourselves with quite a lot of entries on the receive queue. Converting the search from an O(n) to O(log(n)) can make a significant difference, particularly since we have to hold a number of locks while searching. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCKTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Cleanup: remove the unused 'task' argument from the request_send()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Clean up transport write space handlingTrond Myklebust
Treat socket write space handling in the same way we now treat transport congestion: by denying the XPRT_LOCK until the transport signals that it has free buffer space. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Turn off throttling of RPC slots for TCP socketsTrond Myklebust
The theory was that we would need to grab the socket lock anyway, so we might as well use it to gate the allocation of RPC slots for a TCP socket. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCKTrond Myklebust
This no longer causes them to lose their place in the transmission queue. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queueTrond Myklebust
Rather than forcing each and every RPC task to grab the socket write lock in order to send itself, we allow whichever task is holding the write lock to attempt to drain the entire transmit queue. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queueTrond Myklebust
Avoid memory starvation by giving RPCs that are tagged with the RPC_TASK_SWAPPER flag the highest priority. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Support for congestion control when queuing is enabledTrond Myklebust
Both RDMA and UDP transports require the request to get a "congestion control" credit before they can be transmitted. Right now, this is done when the request locks the socket. We'd like it to happen when a request attempts to be transmitted for the first time. In order to support retransmission of requests that already hold such credits, we also want to ensure that they get queued first, so that we don't deadlock with requests that have yet to obtain a credit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Improve latency for interactive tasksTrond Myklebust
One of the intentions with the priority queues was to ensure that no single process can hog the transport. The field task->tk_owner therefore identifies the RPC call's origin, and is intended to allow the RPC layer to organise queues for fairness. This commit therefore modifies the transmit queue to group requests by task->tk_owner, and ensures that we round robin among those groups. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Move RPC retransmission stat counter to xprt_transmit()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Simplify xprt_prepare_transmit()Trond Myklebust
Remove the checks for whether or not we need to transmit, and whether or not a reply has been received. Those are already handled in call_transmit() itself. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCKTrond Myklebust
If the request is still on the queue, this will be incorrect behaviour. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>