linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-09-30	SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive	Trond Myklebust
	Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive()	Trond Myklebust
	In preparation for sharing with AF_LOCAL. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Simplify TCP receive code by switching to using iterators	Trond Myklebust
	Most of this code should also be reusable with other socket types. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter()	Trond Myklebust
	Add a bvec array to struct xdr_buf, and have the client allocate it when we need to receive data into pages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Add a label for RPC calls that require allocation on receive	Trond Myklebust
	If the RPC call relies on the receive call allocating pages as buffers, then let's label it so that we a) Don't leak memory by allocating pages for requests that do not expect this behaviour b) Can optimise for the common case where calls do not require allocation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue	Trond Myklebust
	We no longer need priority semantics on the xprt->sending queue, because the order in which tasks are sent is now dictated by their position in the send queue. Note that the backlog queue remains a priority queue, meaning that slot resources are still managed in order of task priority. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Fix priority queue fairness	Trond Myklebust
	Fix up the priority queue to not batch by owner, but by queue, so that we allow '1 << priority' elements to be dequeued before switching to the next priority queue. The owner field is still used to wake up requests in round robin order by owner to avoid single processes hogging the RPC layer by loading the queues. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Convert xprt receive queue to use an rbtree	Trond Myklebust
	If the server is slow, we can find ourselves with quite a lot of entries on the receive queue. Converting the search from an O(n) to O(log(n)) can make a significant difference, particularly since we have to hold a number of locks while searching. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK	Trond Myklebust
	Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Cleanup: remove the unused 'task' argument from the request_send()	Trond Myklebust
	Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Clean up transport write space handling	Trond Myklebust
	Treat socket write space handling in the same way we now treat transport congestion: by denying the XPRT_LOCK until the transport signals that it has free buffer space. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Turn off throttling of RPC slots for TCP sockets	Trond Myklebust
	The theory was that we would need to grab the socket lock anyway, so we might as well use it to gate the allocation of RPC slots for a TCP socket. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK	Trond Myklebust
	This no longer causes them to lose their place in the transmission queue. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue	Trond Myklebust
	Rather than forcing each and every RPC task to grab the socket write lock in order to send itself, we allow whichever task is holding the write lock to attempt to drain the entire transmit queue. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue	Trond Myklebust
	Avoid memory starvation by giving RPCs that are tagged with the RPC_TASK_SWAPPER flag the highest priority. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Support for congestion control when queuing is enabled	Trond Myklebust
	Both RDMA and UDP transports require the request to get a "congestion control" credit before they can be transmitted. Right now, this is done when the request locks the socket. We'd like it to happen when a request attempts to be transmitted for the first time. In order to support retransmission of requests that already hold such credits, we also want to ensure that they get queued first, so that we don't deadlock with requests that have yet to obtain a credit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Improve latency for interactive tasks	Trond Myklebust
	One of the intentions with the priority queues was to ensure that no single process can hog the transport. The field task->tk_owner therefore identifies the RPC call's origin, and is intended to allow the RPC layer to organise queues for fairness. This commit therefore modifies the transmit queue to group requests by task->tk_owner, and ensures that we round robin among those groups. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Move RPC retransmission stat counter to xprt_transmit()	Trond Myklebust
	Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Simplify xprt_prepare_transmit()	Trond Myklebust
	Remove the checks for whether or not we need to transmit, and whether or not a reply has been received. Those are already handled in call_transmit() itself. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK	Trond Myklebust
	If the request is still on the queue, this will be incorrect behaviour. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Treat the task and request as separate in the xprt_ops->send_request()	Trond Myklebust
	When we shift to using the transmit queue, then the task that holds the write lock will not necessarily be the same as the one being transmitted. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Fix up the back channel transmit	Trond Myklebust
	Fix up the back channel code to recognise that it has already been transmitted, so does not need to be called again. Also ensure that we set req->rq_task. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Refactor RPC call encoding	Trond Myklebust
	Move the call encoding so that it occurs before the transport connection etc. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Add a transmission queue for RPC requests	Trond Myklebust
	Add the queue that will enforce the ordering of RPC task transmission. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Distinguish between the slot allocation list and receive queue	Trond Myklebust
	When storing a struct rpc_rqst on the slot allocation list, we currently use the same field 'rq_list' as we use to store the request on the receive queue. Since the structure is never on both lists at the same time, this is OK. However, for clarity, let's make that a union with different names for the different lists so that we can more easily distinguish between the two states. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Minor cleanup for call_transmit()	Trond Myklebust
	Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Refactor xprt_transmit() to remove wait for reply code	Trond Myklebust
	Allow the caller in clnt.c to call into the code to wait for a reply after calling xprt_transmit(). Again, the reason is that the backchannel code does not need this functionality. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Refactor xprt_transmit() to remove the reply queue code	Trond Myklebust
	Separate out the action of adding a request to the reply queue so that the backchannel code can simply skip calling it altogether. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Rename xprt->recv_lock to xprt->queue_lock	Trond Myklebust
	We will use the same lock to protect both the transmit and receive queues. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Don't wake queued RPC calls multiple times in xprt_transmit	Trond Myklebust
	Rather than waking up the entire queue of RPC messages a second time, just wake up the task that was put to sleep. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Test whether the task is queued before grabbing the queue spinlocks	Trond Myklebust
	When asked to wake up an RPC task, it makes sense to test whether or not the task is still queued. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Add a helper to wake up a sleeping rpc_task and set its status	Trond Myklebust
	Add a helper that will wake up a task that is sleeping on a specific queue, and will set the value of task->tk_status. This is mainly intended for use by the transport layer to notify the task of an error condition. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Refactor the transport request pinning	Trond Myklebust
	We are going to need to pin for both send and receive. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Simplify dealing with aborted partially transmitted messages	Trond Myklebust
	If the previous message was only partially transmitted, we need to close the socket in order to avoid corruption of the message stream. To do so, we currently hijack the unlocking of the socket in order to schedule the close. Now that we track the message offset in the socket state, we can move that kind of checking out of the socket lock code, which is needed to allow messages to remain queued after dropping the socket lock. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Add socket transmit queue offset tracking	Trond Myklebust
	Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Move reset of TCP state variables into the reconnect code	Trond Myklebust
	Rather than resetting state variables in socket state_change() callback, do it in the sunrpc TCP connect function itself. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Rename TCP receive-specific state variables	Trond Myklebust
	Since we will want to introduce similar TCP state variables for the transmission of requests, let's rename the existing ones to label that they are for the receive side. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Avoid holding locks across the XDR encoding of the RPC message	Trond Myklebust
	Currently, we grab the socket bit lock before we allow the message to be XDR encoded. That significantly slows down the transmission rate, since we serialise on a potentially blocking operation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Simplify identification of when the message send/receive is complete	Trond Myklebust
	Add states to indicate that the message send and receive are not yet complete. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: The transmitted message must lie in the RPCSEC window of validity	Trond Myklebust
	If a message has been encoded using RPCSEC_GSS, the server is maintaining a window of sequence numbers that it considers valid. The client should normally be tracking that window, and needs to verify that the sequence number used by the message being transmitted still lies inside the window of validity. So far, we've been able to assume this condition would be realised automatically, since the client has been encoding the message only after taking the socket lock. Once we change that condition, we will need the explicit check. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: If there is no reply expected, bail early from call_decode	Trond Myklebust
	Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30	SUNRPC: Clean up initialisation of the struct rpc_rqst	Trond Myklebust
	Move the initialisation back into xprt.c. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-29	tipc: fix failover problem	LUU Duc Canh
	We see the following scenario: 1) Link endpoint B on node 1 discovers that its peer endpoint is gone. Since there is a second working link, failover procedure is started. 2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint A on node 2. The node item 1->2 goes to state FAILINGOVER. 3) Linke endpoint A/2 receives the failover, and is supposed to take down its parallell link endpoint B/2, while producing a FAILOVER message to send back to A/1. 4) However, B/2 has already been deleted, so no FAILOVER message can created. 5) Node 1->2 remains in state FAILINGOVER forever, refusing to receive any messages that can bring B/1 up again. We are left with a non- redundant link between node 1 and 2. We fix this with letting endpoint A/2 build a dummy FAILOVER message to send to back to A/1, so that the situation can be resolved. Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	openvswitch: Use correct reply values in datapath and vport ops	Yifeng Sun
	This patch fixes the bug that all datapath and vport ops are returning wrong values (OVS_FLOW_CMD_NEW or OVS_DP_CMD_NEW) in their replies. Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tls: Remove redundant vars from tls record structure	Vakul Garg
	Structure 'tls_rec' contains sg_aead_in and sg_aead_out which point to a aad_space and then chain scatterlists sg_plaintext_data, sg_encrypted_data respectively. Rather than using chained scatterlists for plaintext and encrypted data in aead_req, it is efficient to store aad_space in sg_encrypted_data and sg_plaintext_data itself in the first index and get rid of sg_aead_in, sg_aead_in and further chaining. This requires increasing size of sg_encrypted_data & sg_plaintext_data arrarys by 1 to accommodate entry for aad_space. The code which uses sg_encrypted_data and sg_plaintext_data has been modified to skip first index as it points to aad_space. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	Merge tag 'rxrpc-fixes-20180928' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fixes Here are some miscellaneous fixes for AF_RXRPC: (1) Remove a duplicate variable initialisation. (2) Fix one of the checks made when we decide to set up a new incoming service call in which a flag is being checked in the wrong field of the packet header. This check is abstracted out into helper functions. (3) Fix RTT gathering. The code has been trying to make use of socket timestamps, but wasn't actually enabling them. The code has also been recording a transmit time for the outgoing packet for which we're going to measure the RTT after sending the message - but we can get the incoming packet before we get to that and record a negative RTT. (4) Fix the emission of BUSY packets (we are emitting ABORTs instead). (5) Improve error checking on incoming packets. (6) Try to fix a bug in new service call handling whereby a BUG we should never be able to reach somehow got triggered. Do this by moving much of the checking as early as possible and not repeating it later (depends on (5) above). (7) Fix the sockopts set on a UDP6 socket to include the ones set on a UDP4 socket so that we receive UDP4 errors and packet-too-large notifications too. (8) Fix the distribution of errors so that we do it at the point of receiving an error in the UDP callback rather than deferring it thereby cutting short any transmissions that would otherwise occur in the window. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tipc: buffer overflow handling in listener socket	Tung Nguyen
	Default socket receive buffer size for a listener socket is 2Mb. For each arriving empty SYN, the linux kernel allocates a 768 bytes buffer. This means that a listener socket can serve maximum 2700 simultaneous empty connection setup requests before it hits a receive buffer overflow, and much fewer if the SYN is carrying any significant amount of data. When this happens the setup request is rejected, and the client receives an ECONNREFUSED error. This commit mitigates this problem by letting the client socket try to retransmit the SYN message multiple times when it sees it rejected with the code TIPC_ERR_OVERLOAD. Retransmission is done at random intervals in the range of [100 ms, setup_timeout / 4], as many times as there is room for within the setup timeout limit. Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tipc: add SYN bit to connection setup messages	Jon Maloy
	Messages intended for intitating a connection are currently indistinguishable from regular datagram messages. The TIPC protocol specification defines bit 17 in word 0 as a SYN bit to allow sanity check of such messages in the listening socket, but this has so far never been implemented. We do that in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tipc: refactor function tipc_sk_filter_connect()	Jon Maloy
	We refactor the function tipc_sk_filter_connect(), both to make it more readable and as a preparation for the next commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tipc: refactor function tipc_sk_timeout()	Jon Maloy
	We refactor this function as a preparation for the coming commits in the same series. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>