path: root/net
Age | Commit message | Author
2016-09-18 | pkt_sched: fq: use proper locking in fq_dump_stats() | Eric Dumazet
When fq is used on 32bit kernels, we need to lock the qdisc before copying 64bit fields. Otherwise "tc -s qdisc ..." might report bogus values. Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
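On a 32-bit CPU a 64-bit counter is loaded as two 32-bit halves, so an unlocked reader racing with a writer can copy a torn value. Here is a minimal user-space sketch of the idea behind the fix. It assumes that a pthread mutex stands in for the qdisc lock, and the structure and field names are hypothetical, not fq's:

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical counters; on 32-bit CPUs each uint64_t load/store is two
 * separate 32-bit accesses. */
struct fq_counters {
    uint64_t gc_flows;
    uint64_t throttled;
};

/* The mutex plays the role of the qdisc lock protecting the counters. */
struct fq_sched_sketch {
    pthread_mutex_t lock;
    struct fq_counters c;
};

/* Writer side: counters are only modified with the lock held. */
static void fq_stats_add(struct fq_sched_sketch *q, uint64_t flows, uint64_t thr)
{
    pthread_mutex_lock(&q->lock);
    q->c.gc_flows += flows;
    q->c.throttled += thr;
    pthread_mutex_unlock(&q->lock);
}

/* Dump side: copy all 64-bit fields under the same lock, so the copy can
 * never contain a half-updated value. */
static struct fq_counters fq_stats_snapshot(struct fq_sched_sketch *q)
{
    struct fq_counters copy;

    pthread_mutex_lock(&q->lock);
    copy = q->c;
    pthread_mutex_unlock(&q->lock);
    return copy;
}
```

The writer has to take the same lock. Locking only in the dump path would not prevent a torn read.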
2016-09-18 | openvswitch: use percpu flow stats | Thadeu Lima de Souza Cascardo
Instead of using flow stats per NUMA node, use them per CPU. When using megaflows, the stats lock can be a scalability bottleneck. On an E5-2690 12-core system, typical throughput went from ~4Mpps to ~15Mpps when forwarding between two 40GbE ports with a single flow configured on the datapath. This has been tested on a system with possible CPUs 0-7,16-23. After module removal, there was no corruption in the slab cache. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Cc: pravin shelar <pshelar@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
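The per-CPU layout can be sketched as follows. Each CPU writes only its own slot, so the fast path needs no shared lock, and a stats query folds the slots of all possible CPUs together. This is a user-space sketch under stated assumptions: `NR_CPUS_SKETCH`, the struct names and the bitmap standing in for the possible-CPU mask are hypothetical, and the synchronization that a concurrent reader would need is omitted.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 32   /* hypothetical upper bound on CPU ids */

struct flow_stats_sketch {
    uint64_t packets;
    uint64_t bytes;
};

/* One stats slot per CPU id. */
struct flow_sketch {
    struct flow_stats_sketch stats[NR_CPUS_SKETCH];
};

/* Fast path: runs on the CPU that processed the packet and touches only
 * that CPU's slot, so writers never contend with each other. */
static void flow_stats_update(struct flow_sketch *f, int cpu, uint64_t len)
{
    f->stats[cpu].packets++;
    f->stats[cpu].bytes += len;
}

/* Slow path: sum the slots of every CPU set in possible_mask
 * (bit n set = CPU n possible; e.g. 0x00FF00FF for CPUs 0-7,16-23). */
static struct flow_stats_sketch flow_stats_get(const struct flow_sketch *f,
                                               uint32_t possible_mask)
{
    struct flow_stats_sketch sum = {0, 0};

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        if (!(possible_mask & (UINT32_C(1) << cpu)))
            continue;
        sum.packets += f->stats[cpu].packets;
        sum.bytes += f->stats[cpu].bytes;
    }
    return sum;
}
```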
2016-09-18 | openvswitch: fix flow stats accounting when node 0 is not possible | Thadeu Lima de Souza Cascardo
On a system where only node 1 is possible, all statistics are accounted on node 0, as it will have a single writer. However, when getting and clearing the statistics, node 0 is not considered, as it's not a possible node. Tested that statistics are not zero on a system with only node 1 possible. Also compile-tested with CONFIG_NUMA off. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
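The fix described above means the reader must visit node 0's slot even when node 0 is not a possible node, because that slot holds the single writer's stats. A minimal sketch follows. It assumes a plain bitmask in place of the kernel's possible-node map, and `next_possible_node()` and `node_stats_sum()` are hypothetical helpers.

```c
#include <stdint.h>

#define MAX_NODES_SKETCH 8   /* hypothetical node count */

/* Next node after 'node' whose bit is set in possible_mask, or
 * MAX_NODES_SKETCH when there is none. */
static int next_possible_node(int node, uint32_t possible_mask)
{
    for (int n = node + 1; n < MAX_NODES_SKETCH; n++)
        if (possible_mask & (UINT32_C(1) << n))
            return n;
    return MAX_NODES_SKETCH;
}

/* Start the walk at node 0 unconditionally, then follow only the possible
 * nodes.  A walk that started at the first possible node would miss the
 * stats in slot 0 on a machine where only node 1 is possible. */
static uint64_t node_stats_sum(const uint64_t stats[MAX_NODES_SKETCH],
                               uint32_t possible_mask)
{
    uint64_t sum = 0;

    for (int node = 0; node < MAX_NODES_SKETCH;
         node = next_possible_node(node, possible_mask))
        sum += stats[node];
    return sum;
}
```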
2016-09-18 | sctp: not return ENOMEM err back in sctp_packet_transmit | Xin Long
As suggested by David and Marcelo, the ENOMEM error shouldn't be returned to the user in the transmit path. Instead, sctp's retransmit mechanism takes care of the chunks that fail to send because of ENOMEM. This patch only does the necessary release work when alloc_skb fails and no longer returns ENOMEM. It also cleans up the err path of sctp_packet_transmit and fixes some issues in it:
- It didn't free the head skb in the nomem: path.
- There is no need to check nskb in the no_route: path.
- It should goto the err: path if alloc_skb fails for head.
- Not all the NOMEMs should free nskb.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
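The error-path rules listed above follow the usual C goto-unwind pattern. Each label frees exactly what was allocated before the jump, so a failure on the second allocation still releases the head buffer. This is a generic sketch, not the sctp code. The allocator wrappers, the failure-injection counter and `packet_transmit_sketch()` are all hypothetical.

```c
#include <stdlib.h>

/* Hypothetical allocator wrappers that count live buffers, so the sketch
 * can show that every exit path releases what it allocated. */
static int live_buffers;
static int fail_countdown = -1;   /* allocations left before a forced failure; -1 = never */

static void *buf_alloc(void)
{
    if (fail_countdown == 0)
        return NULL;                  /* simulated ENOMEM */
    if (fail_countdown > 0)
        fail_countdown--;
    void *p = malloc(64);
    if (p)
        live_buffers++;
    return p;
}

static void buf_free(void *p)
{
    if (p) {
        live_buffers--;
        free(p);
    }
}

/* Allocation failures are not reported to the caller (retransmission is
 * assumed to deal with unsent data); the error path only releases
 * resources. */
static void packet_transmit_sketch(void)
{
    void *head, *nskb;

    head = buf_alloc();
    if (!head)
        goto out;                     /* nothing allocated yet */

    nskb = buf_alloc();
    if (!nskb)
        goto free_head;               /* the head buffer must still be freed */

    /* ... build the packet in nskb and hand it to the lower layer ... */
    buf_free(nskb);                   /* stands in for the lower layer consuming it */
free_head:
    buf_free(head);
out:
    return;
}
```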
2016-09-18 | sctp: make sctp_outq_flush/tail/uncork return void | Xin Long
The return value of sctp_outq_flush is now meaningless, so this patch makes sctp_outq_flush return void, as well as sctp_outq_fail and sctp_outq_uncork. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-18 | sctp: save transmit error to sk_err in sctp_outq_flush | Xin Long
Every time sctp calls sctp_outq_flush, it sends out the chunks of the control queue, retransmit queue and data queue. Even if some chunks fail to transmit, it still has to flush all the transports, as this is the only chance to clean that transmit_list. So the latest transmit error here should be returned. This transmit error is an internal error of the sctp stack. I checked all the places that use the transmit error (the return value of sctp_outq_flush); most of them actually just save it to sk_err. The exceptions are sctp_assoc/endpoint_bh_rcv, which drop the chunk if sending a REPLY fails. That is actually incorrect, as we can't be sure that the error sctp_outq_flush returns comes from sending that REPLY. So it's meaningless for sctp_outq_flush to return an error. This patch saves the transmit error to sk_err in sctp_outq_flush; a new error overwrites the old value. Eventually, sctp_wait_for_* will check for it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
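The behaviour described above works as follows. The flush keeps going after a failure, records only the most recent error in the socket, and returns nothing. A later waiter then picks the error up. This is a minimal sketch with hypothetical types and names (`sock_sketch`, `outq_flush_sketch()`, `sock_error_sketch()`, `demo_xmit()`), not the sctp data structures.

```c
#include <errno.h>

/* Hypothetical socket: sk_err holds the last asynchronous error as a
 * positive errno value, 0 if none. */
struct sock_sketch {
    int sk_err;
};

/* Sends one transport's queued chunks: returns 0 or -errno. */
typedef int (*xmit_fn)(int transport);

/* Flush every transport even if some fail, since this is the only chance
 * to drain the transmit list; remember the latest error in sk_err instead
 * of returning it. */
static void outq_flush_sketch(struct sock_sketch *sk, int ntransports, xmit_fn xmit)
{
    for (int t = 0; t < ntransports; t++) {
        int err = xmit(t);

        if (err)
            sk->sk_err = -err;        /* a newer error overwrites an older one */
    }
}

/* What a waiter might do: report the pending error once and clear it. */
static int sock_error_sketch(struct sock_sketch *sk)
{
    int err = sk->sk_err;

    sk->sk_err = 0;
    return -err;
}

/* Example transmitter: transports 1 and 3 fail with different errors. */
static int demo_xmit(int transport)
{
    if (transport == 1)
        return -ENOMEM;
    if (transport == 3)
        return -EHOSTUNREACH;
    return 0;
}
```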
2016-09-18 | sctp: free msg->chunks when sctp_primitive_SEND return err | Xin Long
The previous patch, "sctp: do not return the transmit err back to sctp_sendmsg", made sctp_primitive_SEND return an error only when the asoc state is unavailable. In this case the chunks are not enqueued, so they will never be freed unless we take care of them later. This patch effectively reverts commit 1cd4d5c4326a ("sctp: remove the unused sctp_datamsg_free()"), commit 69b5777f2e57 ("sctp: hold the chunks only after the chunk is enqueued in outq") and commit 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg"), so that sctp_datamsg_free is used to free the chunks of the current msg. Fixes: 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-18 | sctp: do not return the transmit err back to sctp_sendmsg | Xin Long
Once a chunk is enqueued successfully, the sctp queues take care of it. Even if it fails to transmit (for example because of nomem), it should be put into the retransmit queue. If sctp reports this error to users, it confuses them: they may resend that msg, although the in-kernel sctp stack is already in charge of retransmitting it. Besides, this error probably does not come from a failure to transmit the current msg, but from transmitting or retransmitting another msg's chunks, as sctp_outq_flush just tries to send out all transports' chunks. This patch makes sctp_cmd_send_msg return void, and stops returning the transmit err back to sctp_sendmsg. Fixes: 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-18 | sctp: remove the unnecessary state check in sctp_outq_tail | Xin Long
Data Chunks are only sent by sctp_primitive_SEND, in which sctp checks the asoc's state through the statetable before calling sctp_outq_tail. So there's no need to check the asoc's state again in sctp_outq_tail. Besides, sctp_do_sm is protected by lock_sock: even if sending a msg is interrupted by timer events, the event handlers still need to acquire lock_sock first. This means no other CMDs can be enqueued into the side effect list before CMD_SEND_MSG to change asoc->state, so it's safe to remove the check. This patch removes the redundant asoc->state check from sctp_outq_tail. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 | ip6_tunnel: add collect_md mode to IPv6 tunnels | Alexei Starovoitov
Similar to gre, vxlan and geneve tunnels, allow IPIP6 and IP6IP6 tunnels to operate in 'collect metadata' mode. Unlike the ipv4 code, here it's possible to reuse the ip6_tnl_xmit() function for both collect_md and traditional tunnels. The bpf_skb_[gs]et_tunnel_key() helpers and ovs (in the future) are the users. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 | ip_tunnel: add collect_md mode to IPIP tunnel | Alexei Starovoitov
Similar to gre, vxlan and geneve tunnels, allow IPIP tunnels to operate in 'collect metadata' mode. The bpf_skb_[gs]et_tunnel_key() helpers can make use of it right away. ovs can use it as well in the future (once appropriate ovs-vport abstractions and user apis are added). Note that, just like in other tunnels, we cannot cache the dst, since the tunnel_info metadata can be different for every packet. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 | l2tp: constify net_device_ops structures | Julia Lawall
Check for net_device_ops structures that are only stored in the netdev_ops field of a net_device structure. This field is declared const, so net_device_ops structures that have this property can be declared as const also. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)
    // <smpl>
    @r disable optional_qualifier@
    identifier i;
    position p;
    @@
    static struct net_device_ops i@p = { ... };
    @ok@
    identifier r.i;
    struct net_device e;
    position p;
    @@
    e.netdev_ops = &i@p;
    @bad@
    position p != {r.p,ok.p};
    identifier r.i;
    struct net_device_ops e;
    @@
    e@i@p
    @depends on !bad disable optional_qualifier@
    identifier r.i;
    @@
    static
    +const
     struct net_device_ops i = { ... };
    // </smpl>
The result of size on this file before the change is:
       text    data     bss     dec     hex filename
       3401     931      44    4376    1118 net/l2tp/l2tp_eth.o
and after the change it is:
       text    data     bss     dec     hex filename
       3993     347      44    4384    1120 net/l2tp/l2tp_eth.o
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
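The constification rule above can be illustrated in miniature. When an ops table is only ever referenced through a pointer-to-const field, the table itself can be declared `const`. The toolchain can then place it in read-only data, which `size` counts under text rather than data. The types and functions below are hypothetical stand-ins, not `struct net_device_ops`:

```c
/* Hypothetical miniature of a driver ops table and a device using it. */
struct ops_sketch {
    int (*open)(void);
    int (*stop)(void);
};

struct device_sketch {
    const struct ops_sketch *ops;   /* pointer-to-const, like netdev_ops */
};

static int demo_open(void) { return 1; }
static int demo_stop(void) { return 2; }

/* Only its address is stored in a pointer-to-const field, so the table
 * can itself be const and live in read-only memory. */
static const struct ops_sketch demo_ops = {
    .open = demo_open,
    .stop = demo_stop,
};

static void device_setup(struct device_sketch *dev)
{
    dev->ops = &demo_ops;
}
```

If some code wrote to the table after setup, the `bad` rule in the semantic patch would match that use, and the table would be left non-const.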
2016-09-17 | llc: switch type to bool as the timeout is only tested versus 0 | Alan Cox
(As asked by Dave in February) Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 | tcp: prepare skbs for better sack shifting | Eric Dumazet
With large BDP TCP flows and lossy networks, it is very important to keep a low number of skbs in the write queue, since RACK and SACK processing can perform a linear scan of it. We should avoid putting any payload in skb->head, so that SACK shifting can be done if needed. With this patch, we allow packing ~0.5 MB per skb instead of the 64KB initially cooked at tcp_sendmsg() time. This reduces the number of skbs in the write queue by a factor of eight. tcp_rack_detect_loss() likes this. We still allow payload in skb->head for the first skb put in the queue, to not impact RPC workloads. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 | irda: Free skb on irda_accept error path. | phil.turnbull@oracle.com
skb is not freed if newsk is NULL. Rework the error path so free_skb is unconditionally called on function exit. Fixes: c3ea9fa27413 ("[IrDA] af_irda: IRDA_ASSERT cleanups") Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 | tcp: fix overflow in __tcp_retransmit_skb() | Eric Dumazet
If a TCP socket gets a large write queue, an overflow can happen in a test in __tcp_retransmit_skb(), preventing all retransmits. The flow then stalls and resets after timeouts.
Tested:
    sysctl -w net.core.wmem_max=1000000000
    netperf -H dest -- -s 1000000000
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
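The commit message does not quote the failing test. The sketch below assumes a limit of the form "queued bytes plus 25% headroom, capped at the send buffer size", evaluated in 32-bit `int`. It shows how widening the sum avoids the wrap. `may_retransmit()` and `wrapped_int32_sum()` are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Overflow-safe "queued + queued/4, capped at sndbuf" budget check: the
 * sum is computed in 64 bits, so it cannot wrap. */
static bool may_retransmit(int32_t wmem_alloc, int32_t wmem_queued, int32_t sndbuf)
{
    int64_t limit = (int64_t)wmem_queued + (wmem_queued >> 2);

    if (limit > sndbuf)
        limit = sndbuf;
    return wmem_alloc <= limit;
}

/* The value a 32-bit two's-complement addition would wrap to, computed
 * without signed overflow (which is undefined behaviour in C). */
static int64_t wrapped_int32_sum(int32_t a, int32_t b)
{
    uint32_t u = (uint32_t)a + (uint32_t)b;   /* unsigned wrap-around is well defined */

    return u > INT32_MAX ? (int64_t)u - 4294967296LL : (int64_t)u;
}
```

Under this assumed model, a 1.8 GB queue makes the 32-bit sum 1800000000 + 450000000 wrap to a negative limit. Every check then looks over budget, and nothing is retransmitted. The 64-bit version instead caps the limit at the send buffer size.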
2016-09-17 | rxrpc: Add config to inject packet loss | David Howells
Add a configuration option to inject packet loss by discarding approximately every 8th packet received and approximately every 8th DATA packet transmitted. Note that no locking is used, but it shouldn't really matter. Signed-off-by: David Howells <dhowells@redhat.com>
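An "approximately every 8th packet" dropper can be as small as a counter and a mask. This is a hypothetical sketch, not the rxrpc code. In the kernel, an unlocked counter that two CPUs race on only makes the drop pattern slightly irregular. Portable multithreaded user-space code would want an atomic counter instead.

```c
#include <stdbool.h>

/* Hypothetical loss injector: drops one packet in every eight.  Not locked,
 * as in the option described above. */
static unsigned int lose_counter;

static bool should_drop_packet(void)
{
    return (++lose_counter & 7) == 0;   /* true on every 8th call */
}
```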
2016-09-17 | rxrpc: Improve skb tracing | David Howells
Improve sk_buff tracing within AF_RXRPC by the following means:
(1) Use an enum to note the event type rather than plain integers and use an array of event names rather than a big multi ?: list.
(2) Distinguish Rx from Tx packets and account them separately. This requires the call phase to be tracked so that we know what we might find in rxtx_buffer[].
(3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the event type.
(4) A pair of 'rotate' events are added to indicate packets that are about to be rotated out of the Rx and Tx windows.
(5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for packet loss injection recording.
Signed-off-by: David Howells <dhowells@redhat.com>
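Point (1) replaces a chain of `?:` expressions with an enum and a name table indexed by that enum. Here is a minimal sketch. The event names and strings are hypothetical and do not come from the rxrpc tracepoints.

```c
/* Hypothetical trace event types. */
enum skb_trace_sketch {
    SKB_RX_RECEIVED,
    SKB_RX_ROTATED,
    SKB_TX_QUEUED,
    SKB_TX_ROTATED,
    SKB_TX_LOST,
    SKB_TRACE__NR            /* number of event types */
};

/* Designated initializers keep each name next to its enum value, so
 * reordering the enum cannot silently mismatch the strings. */
static const char *const skb_trace_names[SKB_TRACE__NR] = {
    [SKB_RX_RECEIVED] = "Rx RCV",
    [SKB_RX_ROTATED]  = "Rx ROT",
    [SKB_TX_QUEUED]   = "Tx QUE",
    [SKB_TX_ROTATED]  = "Tx ROT",
    [SKB_TX_LOST]     = "Tx LOS",
};

/* Lookup with a fallback for out-of-range or unnamed values. */
static const char *skb_trace_name(int ev)
{
    if (ev < 0 || ev >= SKB_TRACE__NR || !skb_trace_names[ev])
        return "?";
    return skb_trace_names[ev];
}
```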
2016-09-17 | rxrpc: Remove printks from rxrpc_recvmsg_data() to fix uninit var | David Howells
Remove _enter/_debug/_leave calls from rxrpc_recvmsg_data() of which one uses an uninitialised variable. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Add a tracepoint to follow what recvmsg does | David Howells
Add a tracepoint to follow what recvmsg does within AF_RXRPC. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Add a tracepoint to follow packets in the Rx buffer | David Howells
Add a tracepoint to follow the life of packets that get added to a call's receive buffer. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Add a tracepoint to log ACK transmission | David Howells
Add a tracepoint to log information about ACK transmission. Signed-off-by: David Howels <dhowells@redhat.com>
2016-09-17 | rxrpc: Add a tracepoint to log received ACK packets | David Howells
Add a tracepoint to log information from received ACK packets. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Add a tracepoint to follow the life of a packet in the Tx buffer | David Howells
Add a tracepoint to follow the insertion of a packet into the transmit buffer, its transmission and its rotation out of the buffer. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Add connection tracepoint and client conn state tracepoint | David Howells
Add a pair of tracepoints, one to track rxrpc_connection struct ref counting and the other to track the client connection cache state. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Add some additional call tracing | David Howells
Add additional call tracepoint points for noting call-connected, call-released and connection-failed events. Also fix one tracepoint that was using an integer instead of the corresponding enum value as the point type. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Print the packet type name in the Rx packet trace | David Howells
Print a symbolic packet type name for each valid received packet in the trace output, not just a number. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Fix the basic transmit DATA packet content size at 1412 bytes | David Howells
Fix the basic transmit DATA packet content size at 1412 bytes so that they can be arbitrarily assembled into jumbo packets. In the future, I'm thinking of moving to keeping a jumbo packet header at the beginning of each packet in the Tx queue and creating the packet header on the spot when kernel_sendmsg() is invoked. That way, jumbo packets can be assembled on the spur of the moment for (re-)transmission. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Be consistent about switch value in rxrpc_send_call_packet() | David Howells
rxrpc_send_call_packet() should use type in both its switch-statements rather than using pkt->whdr.type. This might give the compiler an easier job of uninitialised variable checking. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Don't transmit an ACK if there's no reason set | David Howells
Don't transmit an ACK if call->ackr_reason is unset. There's the possibility of a race between recvmsg() sending an ACK and the background processing thread trying to send the same one. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Fix retransmission algorithm | David Howells
Make the retransmission algorithm use for-loops instead of do-loops and move the counter increments into the for-statement increment slots. Though the do-loops are slightly more efficient, since there will be at least one pass through each loop, the counter increments are harder to get right, as the continue-statements skip them. Without this, if there are any positive acks within the loop, the do-loop will cycle forever because the counter increment is never done. Signed-off-by: David Howells <dhowells@redhat.com>
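The loop hazard described above is a C language detail. In a `do { ... } while (cond)` loop, `continue` jumps straight to the condition and skips an increment at the bottom of the body. In a `for` loop, `continue` still runs the increment expression. Here is a minimal sketch with a hypothetical window representation:

```c
/* Hypothetical Tx window: one byte per slot, 1 = positively acked. */
enum { SLOT_UNACKED = 0, SLOT_ACKED = 1 };

/* Count the slots that need retransmission.  The increment lives in the
 * for-statement, so 'continue' cannot skip it.  The equivalent
 *     do { if (slots[i] == SLOT_ACKED) continue; resend++; i++; } while (i < n);
 * would re-test the same acked slot forever, because 'continue' jumps to
 * the condition without executing i++. */
static int count_resend(const unsigned char *slots, int n)
{
    int resend = 0;

    for (int i = 0; i < n; i++) {
        if (slots[i] == SLOT_ACKED)
            continue;               /* i++ still runs */
        resend++;
    }
    return resend;
}
```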
2016-09-17 | rxrpc: Fix the parsing of soft-ACKs | David Howells
The soft-ACK parser doesn't increment the pointer into the soft-ACK list, resulting in the first ACK/NACK value being applied to all the relevant packets in the Tx queue. This has the potential to miss retransmissions and cause excessive retransmissions. Fix this by incrementing the pointer. Signed-off-by: David Howells <dhowells@redhat.com>
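The soft-ACK bug comes down to a cursor that never moves. This sketch assumes one soft-ACK byte per packet in the window, with 0 meaning NACK and 1 meaning ACK. The function and array names are hypothetical.

```c
/* Hypothetical soft-ACK codes, one byte per packet in the window. */
enum { SACK_NACK = 0, SACK_ACK = 1 };

/* Record each packet's ACK/NACK state in tx_acked[].  The pointer 'p' must
 * advance once per packet; without p++, every packet would be given the
 * first entry's value, which is the bug described above. */
static void apply_soft_acks(const unsigned char *acks, int nacks,
                            unsigned char *tx_acked)
{
    const unsigned char *p = acks;

    for (int i = 0; i < nacks; i++, p++)
        tx_acked[i] = (*p == SACK_ACK);
}
```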
2016-09-17 | rxrpc: Fix unexposed client conn release | David Howells
If the last call on a client connection is released after the connection has had a bunch of calls allocated but before any DATA packets are sent (so that it's not yet marked RXRPC_CONN_EXPOSED), an assertion will fail in rxrpc_disconnect_client_call():
    af_rxrpc: Assertion failed - 1(0x1) >= 2(0x2) is false
    ------------[ cut here ]------------
    kernel BUG at ../net/rxrpc/conn_client.c:753!
This is because it's expecting the conn to have been exposed and to have 2 or more refs, but this isn't necessarily the case. Simply remove the assertion. This allows the conn to be moved into the inactive state and deleted if it isn't resurrected before the final put is called.
Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call() | David Howells
Call rxrpc_release_call() on getting an error in rxrpc_new_client_call() rather than trying to do the cleanup ourselves. This isn't a problem, provided we set RXRPC_CALL_HAS_USERID only if we actually add the call to the calls tree, as the cleanup code fragments that would otherwise cause problems are conditional. Without this, we miss some of the cleanup. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Fix the putting of client connections | David Howells
In rxrpc_put_one_client_conn(), if a connection has RXRPC_CONN_COUNTED set on it, then it's accounted for in rxrpc_nr_client_conns and may be on various lists, and this is cleaned up correctly. However, if the connection doesn't have RXRPC_CONN_COUNTED set on it, then the put routine returns rather than just skipping the extra bit of cleanup. Fix this by making the extra bit of cleanup conditional instead and always killing off the connection. The bug manifests itself as connections with a zero usage count hanging around in /proc/net/rxrpc_conns, because the connection was allocated but then discarded due to a race with another process that set up a parallel connection, which was then shared instead. Signed-off-by: David Howells <dhowells@redhat.com>
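The shape of the fix is "conditional extra cleanup, unconditional free" instead of an early return. Here is a minimal sketch. The struct, the `counted` flag standing in for RXRPC_CONN_COUNTED and the counters standing in for rxrpc_nr_client_conns are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

struct conn_sketch {
    int usage;
    bool counted;              /* stands in for RXRPC_CONN_COUNTED */
};

static int nr_counted_conns;   /* stands in for rxrpc_nr_client_conns */
static int nr_live_conns;      /* lets the sketch check that nothing leaks */

static struct conn_sketch *conn_alloc(bool counted)
{
    struct conn_sketch *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->usage = 1;
    c->counted = counted;
    if (counted)
        nr_counted_conns++;
    nr_live_conns++;
    return c;
}

/* Drop a reference.  Only counted connections need the extra accounting
 * undone, but every connection whose usage reaches zero is freed.
 * Returning early for uncounted connections would leak them with a zero
 * usage count. */
static void conn_put(struct conn_sketch *conn)
{
    if (--conn->usage > 0)
        return;
    if (conn->counted)
        nr_counted_conns--;    /* the extra cleanup, now merely conditional */
    nr_live_conns--;
    free(conn);
}
```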
2016-09-17 | rxrpc: Purge the to_be_accepted queue on socket release | David Howells
Purge the queue of to_be_accepted calls on socket release. Note that purging sock_calls doesn't release the ref owned by to_be_accepted. The sock_calls list is probably redundant, given purges of the recvmsg_q, the to_be_accepted queue and the calls tree. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Record calls that need to be accepted | David Howells
Record calls that need to be accepted using sk_acceptq_added(); otherwise the backlog counter goes negative because sk_acceptq_removed() is called. This causes the preallocator to malfunction. Calls that are preaccepted by AFS within the kernel aren't affected by this. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Fix handling of the last packet in rxrpc_recvmsg_data() | David Howells
The code for determining the last packet in rxrpc_recvmsg_data() has been using the RXRPC_CALL_RX_LAST flag to determine if the rx_top pointer points to the last packet or not. This isn't a good idea, however, as the input code may be running simultaneously on another CPU and that sets the flag *before* updating the top pointer. Fix this by the following means:
(1) Restrict the use of RXRPC_CALL_RX_LAST to the input routines only. There's otherwise a synchronisation problem between detecting the flag and checking tx_top. This could probably be dealt with by appropriate application of memory barriers, but there's a simpler way.
(2) Set RXRPC_CALL_RX_LAST after setting rx_top.
(3) Make rxrpc_rotate_rx_window() consult the flags header field of the DATA packet it's about to discard to see if that was the last packet. Use this as the basis for ending the Rx phase. This shouldn't be a problem because the recvmsg side of things is guaranteed to see the packets in order.
(4) Make rxrpc_recvmsg_data() return 1 to indicate the end of the data if:
    (a) the packet it has just processed is marked as RXRPC_LAST_PACKET, or
    (b) the call's Rx phase has been ended.
Signed-off-by: David Howells <dhowells@redhat.com>
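Point (3) can be sketched single-threaded. The consumer decides whether a packet was the final one from that packet's own header flags, not from a call-wide flag. This sketch leaves out the cross-CPU ordering that motivates the change. The ring size, flag bit and struct are hypothetical, not rxrpc's.

```c
#include <stdbool.h>

#define RX_RING_SKETCH  8       /* hypothetical ring size, a power of two */
#define PKT_LAST_SKETCH 0x04    /* hypothetical "last packet" bit in a DATA header */

struct rx_call_sketch {
    unsigned char flags[RX_RING_SKETCH]; /* header flags of buffered DATA packets, by seq */
    unsigned int hard_ack;               /* highest seq already consumed */
    unsigned int top;                    /* highest seq received (set by the input path) */
    bool rx_phase_ended;
};

/* Consume the next in-order packet, if any.  Because packets are consumed
 * strictly in order, seeing the LAST bit on the packet being rotated out
 * is enough to end the Rx phase at the right point.  Returns true once
 * the Rx phase has ended. */
static bool rotate_rx_window(struct rx_call_sketch *call)
{
    if (call->rx_phase_ended || call->hard_ack == call->top)
        return call->rx_phase_ended;      /* nothing (more) to consume */

    unsigned int seq = call->hard_ack + 1;
    unsigned char f = call->flags[seq & (RX_RING_SKETCH - 1)];

    call->hard_ack = seq;
    if (f & PKT_LAST_SKETCH)
        call->rx_phase_ended = true;
    return call->rx_phase_ended;
}
```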
2016-09-17 | rxrpc: Check the return value of rxrpc_locate_data() | David Howells
Check the return value of rxrpc_locate_data() in rxrpc_recvmsg_data(). Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Move the check of rx_pkt_offset from rxrpc_locate_data() to caller | David Howells
Move the check of rx_pkt_offset from rxrpc_locate_data() to the caller, rxrpc_recvmsg_data(), so that it's more clear what's going on there. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Remove some whitespace. | David Howells
Remove a tab that's on a line that should otherwise be blank. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-17 | rxrpc: Make IPv6 support conditional on CONFIG_IPV6 | David Howells
Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it. This is then made conditional on CONFIG_IPV6. Without this, the following can be seen:
    net/built-in.o: In function `rxrpc_init_peer':
    >> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 | Merge tag 'nfsd-4.8-2' of git://linux-nfs.org/~bfields/linux | Linus Torvalds
Pull nfsd bugfix from Bruce Fields:
    "Fix a memory corruption bug that I introduced in 4.7"
* tag 'nfsd-4.8-2' of git://linux-nfs.org/~bfields/linux:
    svcauth_gss: Revert 64c59a3726f2 ("Remove unnecessary allocation")
2016-09-16 | cfg80211: add helper to find an IE that matches a byte-array | Luca Coelho
There are a few places where an IE that matches not only the EID, but also other bytes inside the element, needs to be found. To simplify that and reduce the amount of similar code, implement a new helper function to match the EID and an extra array of bytes. Additionally, simplify cfg80211_find_vendor_ie() by using the new match function. Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
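The helper described above boils down to a scan over information elements, each laid out as a 1-byte element ID, a 1-byte length and then the data. The scan matches both the ID and a byte array at a given offset. Here is a minimal sketch. The function names are hypothetical, and so is the convention that the offset counts from the start of the element data rather than the element header. The vendor-specific wrapper uses element ID 221 with the 3-byte OUI and 1-byte OUI type at the start of the data.

```c
#include <stddef.h>
#include <string.h>

/* Return the first element with ID 'eid' whose data holds 'match' at byte
 * 'match_offset' of the data, or NULL.  Stops at a truncated element. */
static const unsigned char *find_ie_match_sketch(unsigned char eid,
                                                 const unsigned char *ies, size_t len,
                                                 const unsigned char *match,
                                                 size_t match_len, size_t match_offset)
{
    while (len >= 2) {
        size_t elen = ies[1];

        if (elen > len - 2)
            return NULL;                          /* element runs past the buffer */
        if (ies[0] == eid && match_offset + match_len <= elen &&
            (match_len == 0 ||
             memcmp(ies + 2 + match_offset, match, match_len) == 0))
            return ies;
        ies += 2 + elen;
        len -= 2 + elen;
    }
    return NULL;
}

/* Vendor-specific lookup expressed through the generic matcher. */
static const unsigned char *find_vendor_ie_sketch(const unsigned char oui[3],
                                                  unsigned char oui_type,
                                                  const unsigned char *ies, size_t len)
{
    const unsigned char match[4] = { oui[0], oui[1], oui[2], oui_type };

    return find_ie_match_sketch(221, ies, len, match, sizeof(match), 0);
}
```

Keeping the bounds checks in one matcher means callers such as the vendor lookup no longer need their own TLV loop, which is the simplification the commit describes.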
2016-09-16 | mac80211: allow using AP_LINK_PS with mac80211-generated TIM IE | Emmanuel Grumbach
In 46fa38e84b65 ("mac80211: allow software PS-Poll/U-APSD with AP_LINK_PS"), Johannes made it possible to use mac80211's code for handling stations that go to PS or send PS-Poll / uAPSD trigger frames for devices that enable RSS. This means that mac80211 doesn't look at frames anymore but rather relies on a notification that will come from the device when a PS transition occurs or when a PS-Poll / trigger frame is detected by the device. iwlwifi will need this capability but still needs mac80211 to take care of the TIM IE. Today, if a driver sets AP_LINK_PS, mac80211 will not update the TIM IE. Change mac80211 to check for the existence of the set_tim driver callback rather than using AP_LINK_PS to decide whether or not the driver handles the TIM IE internally. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> [reword commit message a bit] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-09-16 | net-next: dsa: add Qualcomm tag RX/TX handler | John Crispin
Add support for the 2-byte Qualcomm tag that gigabit switches such as the QCA8337/N might insert when receiving packets, or that we need to insert while targeting specific switch ports. The tag is inserted directly behind the ethernet header. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 | net: VRF: Pass original iif to ip_route_input() | Mark Tomlinson
The function ip_rcv_finish() calls l3mdev_ip_rcv(). On any VRF except the global VRF, this replaces skb->dev with the VRF master interface. When calling ip_route_input_noref() from here, the checks for forwarding look at this master device instead of the initial ingress interface. This will allow packets to be routed which normally would be dropped. For example, an interface that is not assigned an IP address should drop packets, but because the checking is against the master device, the packet will be forwarded. The fix here is to still call l3mdev_ip_rcv(), but remember the initial net_device. This is passed to the other functions within ip_rcv_finish, so they still see the original interface. Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 | Merge tag 'batadv-net-for-davem-20160914' of git://git.open-mesh.org/linux-merge | David S. Miller
Simon Wunderlich says:
====================
Here are two batman-adv bugfix patches:
- Fix reference counting for last_bonding_candidate, by Sven Eckelmann
- Fix head room reservation for ELP packets, by Linus Luessing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 | tcp: fix a stale ooo_last_skb after a replace | Eric Dumazet
When an skb replaces another one in the ooo queue, I forgot to also update tp->ooo_last_skb if the replaced skb was the last one in the queue. To fix this, we can simply re-use the code that runs after an insertion, trying to merge skbs to the right of the current skb. This not only fixes the bug, but also removes all small skbs that might be a subset of the new one.
Example: we receive segments 2001:3001 and 4001:5001. Then we receive 2001:8001: we should replace 2001:3001 with the big skb, but also remove 4001:5001 from the queue to save space.
packetdrill test demonstrating the bug:
    0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    +0 bind(3, ..., ...) = 0
    +0 listen(3, 1) = 0
    +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
    +0.100 < . 1:1(0) ack 1 win 1024
    +0 accept(3, ..., ...) = 4
    +0.01 < . 1001:2001(1000) ack 1 win 1024
    +0 > . 1:1(0) ack 1 <nop,nop, sack 1001:2001>
    +0.01 < . 1001:3001(2000) ack 1 win 1024
    +0 > . 1:1(0) ack 1 <nop,nop, sack 1001:2001 1001:3001>
Fixes: 9f5afeae5152 ("tcp: use an RB tree for ooo receive queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yuchung Cheng <ycheng@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
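The example above can be replayed with a simplified model of the out-of-order queue. It is a sorted array of half-open sequence ranges with a cached pointer to the last segment, standing in for tp->ooo_last_skb. The kernel uses an RB tree of skbs. This sketch ignores sequence-number wraparound and does not trim partial overlaps, and every name in it is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define OOO_MAX 16                       /* hypothetical capacity */

struct seg { uint32_t start, end; };     /* half-open [start, end) */

struct ooo_queue {
    struct seg segs[OOO_MAX];            /* sorted by start */
    size_t n;
    const struct seg *last;              /* cached last segment */
};

/* Insert [start, end).  Segments fully covered by the new one are removed,
 * so 2001:8001 swallows both 2001:3001 and 4001:5001, and 'last' is
 * recomputed after every change, including a replace.
 * Returns 0, or -1 if the queue is full. */
static int ooo_insert(struct ooo_queue *q, uint32_t start, uint32_t end)
{
    size_t i, w = 0;

    for (i = 0; i < q->n; i++) {
        const struct seg *s = &q->segs[i];

        if (s->start <= start && end <= s->end)
            return 0;                    /* data already fully queued */
    }
    /* Compact away every segment covered by [start, end). */
    for (i = 0; i < q->n; i++) {
        struct seg s = q->segs[i];

        if (!(start <= s.start && s.end <= end))
            q->segs[w++] = s;
    }
    q->n = w;
    if (q->n == OOO_MAX)
        return -1;
    /* Shift larger segments right to keep the array sorted. */
    for (i = q->n; i > 0 && q->segs[i - 1].start > start; i--)
        q->segs[i] = q->segs[i - 1];
    q->segs[i] = (struct seg){ start, end };
    q->n++;
    q->last = &q->segs[q->n - 1];        /* never left stale */
    return 0;
}
```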
2016-09-16 | Merge tag 'rxrpc-rewrite-20160913-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs | David S. Miller
David Howells says:
====================
rxrpc: Support IPv6
Here is a set of patches that add IPv6 support. They need to be applied on top of the just-posted miscellaneous fix patches. They are:
(1) Make autobinding of an unconnected socket work when sendmsg() is called to initiate a client call.
(2) Don't specify the protocol when creating the client socket, but rather take the default instead.
(3) Use rxrpc_extract_addr_from_skb() in a couple of places that were doing the same thing manually. This allows the IPv6 address extraction to be done in fewer places.
(4) Add IPv6 support. With this, calls can be made to IPv6 servers from userspace AF_RXRPC programs; AFS, however, can't use IPv6 yet as the RPC calls need to be upgradeable.
====================
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>