summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-11-02sctp: clean up sctp_packet_transmitXin Long
After adding sctp gso, sctp_packet_transmit is a quite big function now. This patch is to extract the codes for packing packet to sctp_packet_pack from sctp_packet_transmit, and add some comments, simplify the err path by freeing auth chunk when freeing packet chunk_list in out path and freeing head skb early if it fails to pack packet. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02net/sched: cls_flower: merge filter delete/destroy common codeRoi Dayan
Move common code from fl_delete and fl_detroy to __fl_delete. Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02net/sched: cls_flower: add missing unbind call when destroying flowsRoi Dayan
tcf_unbind was called in fl_delete but was missing in fl_destroy when force deleting flows. Fixes: 77b9900ef53a ('tc: introduce Flower classifier') Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. This includes better integration with the routing subsystem for nf_tables, explicit notrack support and smaller updates. More specifically, they are: 1) Add fib lookup expression for nf_tables, from Florian Westphal. This new expression provides a native replacement for iptables addrtype and rp_filter matches. This is more flexible though, since we can populate the kernel flowi representation to inquire fib to accomodate new usecases, such as RTBH through skb mark. 2) Introduce rt expression for nf_tables, from Anders K. Pedersen. This new expression allow you to access skbuff route metadata, more specifically nexthop and classid fields. 3) Add notrack support for nf_tables, to skip conntracking, requested by many users already. 4) Add boilerplate code to allow to use nf_log infrastructure from nf_tables ingress. 5) Allow to mangle pkttype from nf_tables prerouting chain, to emulate the xtables cluster match, from Liping Zhang. 6) Move socket lookup code into generic nf_socket_* infrastructure so we can provide a native replacement for the xtables socket match. 7) Make sure nfnetlink_queue data that is updated on every packets is placed in a different cache from read-only data, from Florian Westphal. 8) Handle NF_STOLEN from nf_tables core, also from Florian Westphal. 9) Start round robin number generation in nft_numgen from zero, instead of n-1, for consistency with xtables statistics match, patch from Liping Zhang. 10) Set GFP_NOWARN flag in skbuff netlink allocations in nfnetlink_log, given we retry with a smaller allocation on failure, from Calvin Owens. 11) Cleanup xt_multiport to use switch(), from Gao feng. 12) Remove superfluous check in nft_immediate and nft_cmp, from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01netfilter: nf_queue: place volatile data in own cachelineFlorian Westphal
As the comment indicates, the data at the end of nfqnl_instance struct is written on every queue/dequeue, so it should reside in its own cacheline. Before this change, 'lock' was in first cacheline so we dirtied both. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_tables: remove useless U8_MAX validationLiping Zhang
After call nft_data_init, size is already validated and desc.len will not exceed the sizeof(struct nft_data), i.e. 16 bytes. So it will never exceed U8_MAX. Furthermore, in nft_immediate_init, we forget to call nft_data_uninit when desc.len exceeds U8_MAX, although this will not happen, but it's a logical mistake. Now remove these redundant validation introduced by commit 36b701fae12a ("netfilter: nf_tables: validate maximum value of u32 netlink attributes") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_tables: introduce routing expressionAnders K. Pedersen
Introduces an nftables rt expression for routing related data with support for nexthop (i.e. the directly connected IP address that an outgoing packet is sent to), which can be used either for matching or accounting, eg. # nft add rule filter postrouting \ ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop This will drop any traffic to 192.168.1.0/24 that is not routed via 192.168.0.1. # nft add rule filter postrouting \ flow table acct { rt nexthop timeout 600s counter } # nft add rule ip6 filter postrouting \ flow table acct { rt nexthop timeout 600s counter } These rules count outgoing traffic per nexthop. Note that the timeout releases an entry if no traffic is seen for this nexthop within 10 minutes. # nft add rule inet filter postrouting \ ether type ip \ flow table acct { rt nexthop timeout 600s counter } # nft add rule inet filter postrouting \ ether type ip6 \ flow table acct { rt nexthop timeout 600s counter } Same as above, but via the inet family, where the ether type must be specified explicitly. "rt classid" is also implemented identical to "meta rtclassid", since it is more logical to have this match in the routing expression going forward. Signed-off-by: Anders K. Pedersen <akp@cohaesio.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.cPablo Neira Ayuso
We need this split to reuse existing codebase for the upcoming nf_tables socket expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_log: add packet logging for netdev familyPablo Neira Ayuso
Move layer 2 packet logging into nf_log_l2packet() that resides in nf_log_common.c, so this can be shared by both bridge and netdev families. This patch adds the boiler plate code to register the netdev logging family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_tables: add fib expressionFlorian Westphal
Add FIB expression, supported for ipv4, ipv6 and inet family (the latter just dispatches to ipv4 or ipv6 one based on nfproto). Currently supports fetching output interface index/name and the rtm_type associated with an address. This can be used for adding path filtering. rtm_type is useful to e.g. enforce a strong-end host model where packets are only accepted if daddr is configured on the interface the packet arrived on. The fib expression is a native nftables alternative to the xtables addrtype and rp_filter matches. FIB result order for oif/oifname retrieval is as follows: - if packet is local (skb has rtable, RTF_LOCAL set, this will also catch looped-back multicast packets), set oif to the loopback interface. - if fib lookup returns an error, or result points to local, store zero result. This means '--local' option of -m rpfilter is not supported. It is possible to use 'fib type local' or add explicit saddr/daddr matching rules to create exceptions if this is really needed. - store result in the destination register. In case of multiple routes, search set for desired oif in case strict matching is requested. ipv4 and ipv6 behave fib expressions are supposed to behave the same. [ I have collapsed Arnd Bergmann's ("netfilter: nf_tables: fib warnings") http://patchwork.ozlabs.org/patch/688615/ to address fallout from this patch after rebasing nf-next, that was posted to address compilation warnings. --pablo ] Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01sunrpc: GFP_KERNEL should be GFP_NOFS in crypto codeJ. Bruce Fields
Writes may depend on the auth_gss crypto code, so we shouldn't be allocating with GFP_KERNEL there. This still leaves some crypto_alloc_* calls which end up doing GFP_KERNEL allocations in the crypto code. Those could probably done at crypto import time. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01svcrdma: backchannel cannot share a page for send and rcv buffersChuck Lever
The underlying transport releases the page pointed to by rq_buffer during xprt_rdma_bc_send_request. When the backchannel reply arrives, rq_rbuffer then points to freed memory. Fixes: 68778945e46f ('SUNRPC: Separate buffer pointers for RPC ...') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01unix: escape all null bytes in abstract unix domain socketIsaac Boukris
Abstract unix domain socket may embed null characters, these should be translated to '@' when printed out to proc the same way the null prefix is currently being translated. This helps for tools such as netstat, lsof and the proc based implementation in ss to show all the significant bytes of the name (instead of getting cut at the first null occurrence). Signed-off-by: Isaac Boukris <iboukris@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01genetlink: fix error return code in genl_register_family()Wei Yongjun
Fix to return a negative error code from the idr_alloc() error handling case instead of 0, as done elsewhere in this function. Also fix the return value check of idr_alloc() since idr_alloc return negative errors on failure, not zero. Fixes: 2ae0f17df1cd ("genetlink: use idr to track families") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01net: Enable support for VRF with ipv4 multicastDavid Ahern
Enable support for IPv4 multicast: - similar to unicast the flow struct is updated to L3 master device if relevant prior to calling fib_rules_lookup. The table id is saved to the lookup arg so the rule action for ipmr can return the table associated with the device. - ip_mr_forward needs to check for master device mismatch as well since the skb->dev is set to it - allow multicast address on VRF device for Rx by checking for the daddr in the VRF device as well as the original ingress device - on Tx need to drop to __mkroute_output when FIB lookup fails for multicast destination address. - if CONFIG_IP_MROUTE_MULTIPLE_TABLES is enabled VRF driver creates IPMR FIB rules on first device create similar to FIB rules. In addition the VRF driver does not divert IPv4 multicast packets: it breaks on Tx since the fib lookup fails on the mcast address. With this patch, ipmr forwarding and local rx/tx work. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove SS_CONNECTED sock stateParthasarathy Bhuvaragan
In this commit, we replace references to sock->state SS_CONNECTE with sk_state TIPC_ESTABLISHED. Finally, the sock->state is no longer explicitly used by tipc. The FSM below is for various types of connection oriented sockets. Stream Server Listening Socket: +-----------+ +-------------+ | TIPC_OPEN |------>| TIPC_LISTEN | +-----------+ +-------------+ Stream Server Data Socket: +-----------+ +------------------+ | TIPC_OPEN |------>| TIPC_ESTABLISHED | +-----------+ +------------------+ ^ | | | | v +--------------------+ | TIPC_DISCONNECTING | +--------------------+ Stream Socket Client: +-----------+ +-----------------+ | TIPC_OPEN |------>| TIPC_CONNECTING |------+ +-----------+ +-----------------+ | | | | | v | +------------------+ | | TIPC_ESTABLISHED | | +------------------+ | ^ | | | | | | v | +--------------------+ | | TIPC_DISCONNECTING |<--+ +--------------------+ Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_CONNECTING as a new sk_stateParthasarathy Bhuvaragan
In this commit, we create a new tipc socket state TIPC_CONNECTING by primarily replacing the SS_CONNECTING with TIPC_CONNECTING. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove SS_DISCONNECTING stateParthasarathy Bhuvaragan
In this commit, we replace the references to SS_DISCONNECTING with the combination of sk_state TIPC_DISCONNECTING and flags set in sk_shutdown. We introduce a new function _tipc_shutdown(), which provides the common code required by tipc_release() and tipc_shutdown(). Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_DISCONNECTING as a new sk_stateParthasarathy Bhuvaragan
In this commit, we create a new tipc socket state TIPC_DISCONNECTING in sk_state. TIPC_DISCONNECTING is replacing the socket connection status update using SS_DISCONNECTING. TIPC_DISCONNECTING is set for connection oriented sockets at: - tipc_shutdown() - connection probe timeout - when we receive an error message on the connection. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_OPEN as a new sk_stateParthasarathy Bhuvaragan
In this commit, we create a new tipc socket state TIPC_OPEN in sk_state. We primarily replace the SS_UNCONNECTED sock->state with TIPC_OPEN. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_ESTABLISHED as a new sk_stateParthasarathy Bhuvaragan
Until now, tipc maintains probing state for connected sockets in tsk->probing_state variable. In this commit, we express this information as socket states and this remove the variable. We set probe_unacked flag when a probe is sent out and reset it if we receive a reply. Instead of the probing state TIPC_CONN_OK, we create a new state TIPC_ESTABLISHED. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_LISTEN as a new sk_stateParthasarathy Bhuvaragan
Until now, tipc maintains the socket state in sock->state variable. This is used to maintain generic socket states, but in tipc we overload it and save tipc socket states like TIPC_LISTEN. Other protocols like TCP, UDP store protocol specific states in sk->sk_state instead. In this commit, we : - declare a new tipc state TIPC_LISTEN, that replaces SS_LISTEN - Create a new function tipc_set_state(), to update sk->sk_state. - TIPC_LISTEN state is maintained in sk->sk_state. - replace references to SS_LISTEN with TIPC_LISTEN. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove socket state SS_READYParthasarathy Bhuvaragan
Until now, tipc socket state SS_READY declares that the socket is a connectionless socket. In this commit, we remove the state SS_READY and replace it with a condition which returns true for datagram / connectionless sockets. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove probing_intv from tipc_sockParthasarathy Bhuvaragan
Until now, probing_intv is a variable in struct tipc_sock but is always set to a constant CONN_PROBING_INTERVAL. The socket connection is probed based on this value. In this commit, we remove this variable and setup the socket timer based on the constant CONN_PROBING_INTERVAL. There is no functional change in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove tsk->connected from tipc_sockParthasarathy Bhuvaragan
Until now, we determine if a socket is connected or not based on tsk->connected, which is set once when the probing state is set to TIPC_CONN_OK. It is unset when the sock->state is updated from SS_CONNECTED to any other state. In this commit, we remove connected variable from tipc_sock and derive socket connection status from the following condition: sock->state == SS_CONNECTED => tsk->connected There is no functional change in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove tsk->connected for connectionless socketsParthasarathy Bhuvaragan
Until now, for connectionless sockets the peer information during connect is stored in tsk->peer and a connection state is set in tsk->connected. This is redundant. In this commit, for connectionless sockets we update: - __tipc_sendmsg(), when the destination is NULL the peer existence is determined by tsk->peer.family, instead of tsk->connected. - tipc_connect(), remove set/unset of tsk->connected. Hence tsk->connected is no longer used for connectionless sockets. There is no functional change in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: rename tsk->remote to tsk->peer for consistent namingParthasarathy Bhuvaragan
Until now, the peer information for connect is stored in tsk->remote but the rest of code uses the name peer for peer/remote. In this commit, we rename tsk->remote to tsk->peer to align with naming convention followed in the rest of the code. There is no functional change in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: rename struct tipc_skb_cb member handle to bytes_readParthasarathy Bhuvaragan
In this commit, we rename handle to bytes_read indicating the purpose of the member. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: set kern=0 in sk_alloc() during tipc_accept()Parthasarathy Bhuvaragan
Until now, tipc_accept() calls sk_alloc() with kern=1. This is incorrect as the data socket's owner is the user application. Thus for these accepted data sockets the network namespace refcount is skipped. In this commit, we fix this by setting kern=0. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: wakeup sleeping users at disconnectParthasarathy Bhuvaragan
Until now, in filter_connect() when we terminate a connection due to an error message from peer, we set the socket state to DISCONNECTING. The socket is notified about this broken connection using EPIPE when a user tries to send a message. However if a socket was waiting on a poll() while the connection is being terminated, we fail to wakeup that socket. In this commit, we wakeup sleeping sockets at connection termination. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: return early for non-blocking sockets at link congestionParthasarathy Bhuvaragan
Until now, in stream/mcast send() we pass the message to the link layer even when the link is congested and add the socket to the link's wakeup queue. This is unnecessary for non-blocking sockets. If a socket is set to non-blocking and sends multicast with zero back off time while receiving EAGAIN, we exhaust the memory. In this commit, we return immediately at stream/mcast send() for non-blocking sockets. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31Merge tag 'linux-can-fixes-for-4.9-20161031' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2016-10-31 this is a pull request of two patches for the upcoming v4.9 release. The first patch is by Lukas Resch for the sja1000 plx_pci driver that adds support for Moxa CAN devices. The second patch is by Oliver Hartkopp and fixes a potential kernel panic in the CAN broadcast manager. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31sctp: hold transport instead of assoc when lookup assoc in rx pathXin Long
Prior to this patch, in rx path, before calling lock_sock, it needed to hold assoc when got it by __sctp_lookup_association, in case other place would free/put assoc. But in __sctp_lookup_association, it lookup and hold transport, then got assoc by transport->assoc, then hold assoc and put transport. It means it didn't hold transport, yet it was returned and later on directly assigned to chunk->transport. Without the protection of sock lock, the transport may be freed/put by other places, which would cause a use-after-free issue. This patch is to fix this issue by holding transport instead of assoc. As holding transport can make sure to access assoc is also safe, and actually it looks up assoc by searching transport rhashtable, to hold transport here makes more sense. Note that the function will be renamed later on on another patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31sctp: return back transport in __sctp_rcv_init_lookupXin Long
Prior to this patch, it used a local variable to save the transport that is looked up by __sctp_lookup_association(), and didn't return it back. But in sctp_rcv, it is used to initialize chunk->transport. So when hitting this, even if it found the transport, it was still initializing chunk->transport with null instead. This patch is to return the transport back through transport pointer that is from __sctp_rcv_lookup_harder(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31sctp: hold transport instead of assoc in sctp_diagXin Long
In sctp_transport_lookup_process(), Commit 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") moved cb() out of rcu lock, but it put transport and hold assoc instead, and ignore that cb() still uses transport. It may cause a use-after-free issue. This patch is to hold transport instead of assoc there. Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31bridge: mcast: add router port on PIM hello messageNikolay Aleksandrov
When we receive a PIM Hello message on a port we can consider that it has a multicast router attached, thus it is correct to add it to the router list. The only catch is it shouldn't be considered for a querier. Using Daniel's description: leaf-11 leaf-12 leaf-13 \ | / bridge-1 / \ host-11 host-12 - all ports in bridge-1 are in a single vlan aware bridge - leaf-11 is the IGMP querier - leaf-13 is the PIM DR - host-11 TXes packets to 226.10.10.10 - bridge-1 only forwards the 226.10.10.10 traffic out the port to leaf-11, it should also forward this traffic out the port to leaf-13 Suggested-by: Daniel Walton <dwalton@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31net: pim: add all RFC7761 message typesNikolay Aleksandrov
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31can: bcm: fix warning in bcm_connect/proc_registerOliver Hartkopp
Andrey Konovalov reported an issue with proc_register in bcm.c. As suggested by Cong Wang this patch adds a lock_sock() protection and a check for unsuccessful proc_create_data() in bcm_connect(). Reference: http://marc.info/?l=linux-netdev&m=147732648731237 Reported-by: Andrey Konovalov <andreyknvl@google.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-10-31net: mangle zero checksum in skb_checksum_help()Eric Dumazet
Sending zero checksum is ok for TCP, but not for UDP. UDPv6 receiver should by default drop a frame with a 0 checksum, and UDPv4 would not verify the checksum and might accept a corrupted packet. Simply replace such checksum by 0xffff, regardless of transport. This error was caught on SIT tunnels, but seems generic. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31net: clear sk_err_soft in sk_clone_lock()Eric Dumazet
At accept() time, it is possible the parent has a non zero sk_err_soft, leftover from a prior error. Make sure we do not leave this value in the child, as it makes future getsockopt(SO_ERROR) calls quite unreliable. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31dctcp: avoid bogus doubling of cwnd after lossFlorian Westphal
If a congestion control module doesn't provide .undo_cwnd function, tcp_undo_cwnd_reduction() will set cwnd to tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1); ... which makes sense for reno (it sets ssthresh to half the current cwnd), but it makes no sense for dctcp, which sets ssthresh based on the current congestion estimate. This can cause severe growth of cwnd (eventually overflowing u32). Fix this by saving last cwnd on loss and restore cwnd based on that, similar to cubic and other algorithms. Fixes: e3118e8359bb7c ("net: tcp: add DCTCP congestion control algorithm") Cc: Lawrence Brakmo <brakmo@fb.com> Cc: Andrew Shewmaker <agshew@gmail.com> Cc: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31net: Add support for XPS with QoS via traffic classesAlexander Duyck
This patch adds support for setting and using XPS when QoS via traffic classes is enabled. With this change we will factor in the priority and traffic class mapping of the packet and use that information to correctly select the queue. This allows us to define a set of queues for a given traffic class via mqprio and then configure the XPS mapping for those queues so that the traffic flows can avoid head-of-line blocking between the individual CPUs if so desired. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31net: Refactor removal of queues from XPS map and apply on num_tc changesAlexander Duyck
This patch updates the code for removing queues from the XPS map and makes it so that we can apply the code any time we change either the number of traffic classes or the mapping of a given block of queues. This way we avoid having queues pulling traffic from a foreign traffic class. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31net: Add sysfs value to determine queue traffic classAlexander Duyck
Add a sysfs attribute for a Tx queue that allows us to determine the traffic class for a given queue. This will allow us to more easily determine this in the future. It is needed as XPS will take the traffic class for a group of queues into account in order to avoid pulling traffic from one traffic class into another. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31net: Move functions for configuring traffic classes out of inline headersAlexander Duyck
The functions for configuring the traffic class to queue mappings have other effects that need to be addressed. Instead of trying to export a bunch of new functions just relocate the functions so that we can instrument them directly with the functionality they will need. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31ipv6: add mtu lock check in __ip6_rt_update_pmtuXin Long
Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu. It leaded to that mtu lock doesn't really work when receiving the pkt of ICMPV6_PKT_TOOBIG. This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4 did in __ip_rt_update_pmtu. Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31ipv6: Don't use ufo handling on later transformed packetsJakub Sitnicki
Similar to commit c146066ab802 ("ipv4: Don't use ufo handling on later transformed packets"), don't perform UFO on packets that will be IPsec transformed. To detect it we rely on the fact that headerlen in dst_entry is non-zero only for transformation bundles (xfrm_dst objects). Unwanted segmentation can be observed with a NETIF_F_UFO capable device, such as a dummy device: DEV=dum0 LEN=1493 ip li add $DEV type dummy ip addr add fc00::1/64 dev $DEV nodad ip link set $DEV up ip xfrm policy add dir out src fc00::1 dst fc00::2 \ tmpl src fc00::1 dst fc00::2 proto esp spi 1 ip xfrm state add src fc00::1 dst fc00::2 \ proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b tcpdump -n -nn -i $DEV -t & socat /dev/zero,readbytes=$LEN udp6:[fc00::2]:$LEN tcpdump output before: IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448 IP6 fc00::1 > fc00::2: frag (1448|48) IP6 fc00::1 > fc00::2: ESP(spi=0x00000001,seq=0x2), length 88 ... and after: IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448 IP6 fc00::1 > fc00::2: frag (1448|80) Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach") Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31net: add an ioctl to get a socket network namespaceAndrey Vagin
Each socket operates in a network namespace where it has been created, so if we want to dump and restore a socket, we have to know its network namespace. We have a socket_diag to get information about sockets, it doesn't report sockets which are not bound or connected. This patch introduces a new socket ioctl, which is called SIOCGSKNS and used to get a file descriptor for a socket network namespace. A task must have CAP_NET_ADMIN in a target network namespace to use this ioctl. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31netfilter: nft_dup: do not use sreg_dev if the user doesn't specify itLiping Zhang
The NFTA_DUP_SREG_DEV attribute is not a must option, so we should use it in routing lookup only when the user specify it. Fixes: d877f07112f1 ("netfilter: nf_tables: add nft_dup expression") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-31netfilter: nf_tables: destroy the set if fail to add transactionLiping Zhang
When the memory is exhausted, then we will fail to add the NFT_MSG_NEWSET transaction. In such case, we should destroy the set before we free it. Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>