summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-11-03netfilter: merge nf_iterate() into nf_hook_slow()Pablo Neira Ayuso
nf_iterate() has become rather simple, we can integrate this code into nf_hook_slow() to reduce the amount of LOC in the core path. However, we still need nf_iterate() around for nf_queue packet handling, so move this function there where we only need it. I think it should be possible to refactor nf_queue code to get rid of it definitely, but given this is slow path anyway, let's have a look this later. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: remove hook_entries field from nf_hook_statePablo Neira Ayuso
This field is only useful for nf_queue, so store it in the nf_queue_entry structure instead, away from the core path. Pass hook_head to nf_hook_slow(). Since we always have a valid entry on the first iteration in nf_iterate(), we can use 'do { ... } while (entry)' loop instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: use switch() to handle verdict cases from nf_hook_slow()Pablo Neira Ayuso
Use switch() for verdict handling and add explicit handling for NF_STOLEN and other non-conventional verdicts. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: nf_tables: use hook state from xt_action_param structurePablo Neira Ayuso
Don't copy relevant fields from hook state structure, instead use the one that is already available in struct xt_action_param. This patch also adds a set of new wrapper functions to fetch relevant hook state structure fields. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: x_tables: move hook state into xt_action_param structurePablo Neira Ayuso
Place pointer to hook state in xt_action_param structure instead of copying the fields that we need. After this change xt_action_param fits into one cacheline. This patch also adds a set of new wrapper functions to fetch relevant hook state structure fields. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: deprecate NF_STOPPablo Neira Ayuso
NF_STOP is only used by br_netfilter these days, and it can be emulated with a combination of NF_STOLEN plus explicit call to the ->okfn() function as Florian suggests. To retain binary compatibility with userspace nf_queue application, we have to keep NF_STOP around, so libnetfilter_queue userspace userspace applications still work if they use NF_STOP for some exotic reason. Out of tree modules using NF_STOP would break, but we don't care about those. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: kill NF_HOOK_THRESH() and state->treshPablo Neira Ayuso
Patch c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh") introduced br_nf_hook_thresh(). Replace NF_HOOK_THRESH() by br_nf_hook_thresh from br_nf_forward_finish(), so we have no more callers for this macro. As a result, state->thresh and explicit thresh parameter in the hook state structure is not required anymore. And we can get rid of skip-hook-under-thresh loop in nf_iterate() in the core path that is only used by br_netfilter to search for the filter hook. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: remove comments that predate rcu daysPablo Neira Ayuso
We cannot block/sleep on nf_iterate because netfilter runs under rcu read lock these days, where blocking is well-known to be illegal. So let's remove these old comments. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: get rid of useless debugging from corePablo Neira Ayuso
This patch remove compile time code to catch inconventional verdicts. We have better ways to handle this case these days, eg. pr_debug() but even though I don't think this is useful at all, so let's remove this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-02ila: Fix crash caused by rhashtable changesTom Herbert
commit ca26893f05e86 ("rhashtable: Add rhlist interface") added a field to rhashtable_iter so that length became 56 bytes and would exceed the size of args in netlink_callback (which is 48 bytes). The netlink diag dump function already has been allocating a iter structure and storing the pointed to that in the args of netlink_callback. ila_xlat also uses rhahstable_iter but is still putting that directly in the arg block. Now since rhashtable_iter size is increased we are overwriting beyond the structure. The next field happens to be cb_mutex pointer in netlink_sock and hence the crash. Fix is to alloc the rhashtable_iter and save it as pointer in arg. Tested: modprobe ila ./ip ila add loc 3333:0:0:0 loc_match 2222:0:0:1, ./ip ila list # NO crash now Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02net: ip, diag -- Adjust raw_abort to use unlocked __udp_disconnectCyrill Gorcunov
While being preparing patches for killing raw sockets via diag netlink interface I noticed that my runs are stuck: | [root@pcs7 ~]# cat /proc/`pidof ss`/stack | [<ffffffff816d1a76>] __lock_sock+0x80/0xc4 | [<ffffffff816d206a>] lock_sock_nested+0x47/0x95 | [<ffffffff8179ded6>] udp_disconnect+0x19/0x33 | [<ffffffff8179b517>] raw_abort+0x33/0x42 | [<ffffffff81702322>] sock_diag_destroy+0x4d/0x52 which has not been the case before. I narrowed it down to the commit | commit 286c72deabaa240b7eebbd99496ed3324d69f3c0 | Author: Eric Dumazet <edumazet@google.com> | Date: Thu Oct 20 09:39:40 2016 -0700 | | udp: must lock the socket in udp_disconnect() where we start locking the socket for different reason. So the raw_abort escaped the renaming and we have to fix this typo using __udp_disconnect instead. Fixes: 286c72deabaa ("udp: must lock the socket in udp_disconnect()") CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Ahern <dsa@cumulusnetworks.com> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> CC: Andrey Vagin <avagin@openvz.org> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02tcp: enhance tcp collapsingEric Dumazet
As Ilya Lesokhin suggested, we can collapse two skbs at retransmit time even if the skb at the right has fragments. We simply have to use more generic skb_copy_bits() instead of skb_copy_from_linear_data() in tcp_collapse_retrans() Also need to guard this skb_copy_bits() in case there is nothing to copy, otherwise skb_put() could panic if left skb has frags. Tested: Used following packetdrill test // Establish a connection. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> +.100 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 write(4, ..., 200) = 200 +0 > P. 1:201(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 201:401(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 401:601(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 601:801(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 801:1001(200) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1001:1101(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1101:1201(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1201:1301(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1301:1401(100) ack 1 +.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401> // Check that TCP collapse works : +0 > P. 1:1001(1000) ack 1 Reported-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02ip6_udp_tunnel: remove unused IPCB related codesEli Cooper
Some IPCB fields are currently set in udp_tunnel6_xmit_skb(), which are never used before it reaches ip6tunnel_xmit(), and past that point the control buffer is no longer interpreted as IPCB. This clears these unused IPCB related codes. Currently there is no skb scrubbing in ip6_udp_tunnel, otherwise IPCB(skb)->opt might need to be cleared for IPv4 packets, as shown in 5146d1f1511 ("tunnel: Clear IPCB(skb)->opt before dst_link_failure called"). Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02sctp: clean up sctp_packet_transmitXin Long
After adding sctp gso, sctp_packet_transmit is a quite big function now. This patch is to extract the codes for packing packet to sctp_packet_pack from sctp_packet_transmit, and add some comments, simplify the err path by freeing auth chunk when freeing packet chunk_list in out path and freeing head skb early if it fails to pack packet. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02net/sched: cls_flower: merge filter delete/destroy common codeRoi Dayan
Move common code from fl_delete and fl_detroy to __fl_delete. Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02net/sched: cls_flower: add missing unbind call when destroying flowsRoi Dayan
tcf_unbind was called in fl_delete but was missing in fl_destroy when force deleting flows. Fixes: 77b9900ef53a ('tc: introduce Flower classifier') Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. This includes better integration with the routing subsystem for nf_tables, explicit notrack support and smaller updates. More specifically, they are: 1) Add fib lookup expression for nf_tables, from Florian Westphal. This new expression provides a native replacement for iptables addrtype and rp_filter matches. This is more flexible though, since we can populate the kernel flowi representation to inquire fib to accomodate new usecases, such as RTBH through skb mark. 2) Introduce rt expression for nf_tables, from Anders K. Pedersen. This new expression allow you to access skbuff route metadata, more specifically nexthop and classid fields. 3) Add notrack support for nf_tables, to skip conntracking, requested by many users already. 4) Add boilerplate code to allow to use nf_log infrastructure from nf_tables ingress. 5) Allow to mangle pkttype from nf_tables prerouting chain, to emulate the xtables cluster match, from Liping Zhang. 6) Move socket lookup code into generic nf_socket_* infrastructure so we can provide a native replacement for the xtables socket match. 7) Make sure nfnetlink_queue data that is updated on every packets is placed in a different cache from read-only data, from Florian Westphal. 8) Handle NF_STOLEN from nf_tables core, also from Florian Westphal. 9) Start round robin number generation in nft_numgen from zero, instead of n-1, for consistency with xtables statistics match, patch from Liping Zhang. 10) Set GFP_NOWARN flag in skbuff netlink allocations in nfnetlink_log, given we retry with a smaller allocation on failure, from Calvin Owens. 11) Cleanup xt_multiport to use switch(), from Gao feng. 12) Remove superfluous check in nft_immediate and nft_cmp, from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01netfilter: nf_queue: place volatile data in own cachelineFlorian Westphal
As the comment indicates, the data at the end of nfqnl_instance struct is written on every queue/dequeue, so it should reside in its own cacheline. Before this change, 'lock' was in first cacheline so we dirtied both. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_tables: remove useless U8_MAX validationLiping Zhang
After call nft_data_init, size is already validated and desc.len will not exceed the sizeof(struct nft_data), i.e. 16 bytes. So it will never exceed U8_MAX. Furthermore, in nft_immediate_init, we forget to call nft_data_uninit when desc.len exceeds U8_MAX, although this will not happen, but it's a logical mistake. Now remove these redundant validation introduced by commit 36b701fae12a ("netfilter: nf_tables: validate maximum value of u32 netlink attributes") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_tables: introduce routing expressionAnders K. Pedersen
Introduces an nftables rt expression for routing related data with support for nexthop (i.e. the directly connected IP address that an outgoing packet is sent to), which can be used either for matching or accounting, eg. # nft add rule filter postrouting \ ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop This will drop any traffic to 192.168.1.0/24 that is not routed via 192.168.0.1. # nft add rule filter postrouting \ flow table acct { rt nexthop timeout 600s counter } # nft add rule ip6 filter postrouting \ flow table acct { rt nexthop timeout 600s counter } These rules count outgoing traffic per nexthop. Note that the timeout releases an entry if no traffic is seen for this nexthop within 10 minutes. # nft add rule inet filter postrouting \ ether type ip \ flow table acct { rt nexthop timeout 600s counter } # nft add rule inet filter postrouting \ ether type ip6 \ flow table acct { rt nexthop timeout 600s counter } Same as above, but via the inet family, where the ether type must be specified explicitly. "rt classid" is also implemented identical to "meta rtclassid", since it is more logical to have this match in the routing expression going forward. Signed-off-by: Anders K. Pedersen <akp@cohaesio.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.cPablo Neira Ayuso
We need this split to reuse existing codebase for the upcoming nf_tables socket expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_log: add packet logging for netdev familyPablo Neira Ayuso
Move layer 2 packet logging into nf_log_l2packet() that resides in nf_log_common.c, so this can be shared by both bridge and netdev families. This patch adds the boiler plate code to register the netdev logging family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01netfilter: nf_tables: add fib expressionFlorian Westphal
Add FIB expression, supported for ipv4, ipv6 and inet family (the latter just dispatches to ipv4 or ipv6 one based on nfproto). Currently supports fetching output interface index/name and the rtm_type associated with an address. This can be used for adding path filtering. rtm_type is useful to e.g. enforce a strong-end host model where packets are only accepted if daddr is configured on the interface the packet arrived on. The fib expression is a native nftables alternative to the xtables addrtype and rp_filter matches. FIB result order for oif/oifname retrieval is as follows: - if packet is local (skb has rtable, RTF_LOCAL set, this will also catch looped-back multicast packets), set oif to the loopback interface. - if fib lookup returns an error, or result points to local, store zero result. This means '--local' option of -m rpfilter is not supported. It is possible to use 'fib type local' or add explicit saddr/daddr matching rules to create exceptions if this is really needed. - store result in the destination register. In case of multiple routes, search set for desired oif in case strict matching is requested. ipv4 and ipv6 behave fib expressions are supposed to behave the same. [ I have collapsed Arnd Bergmann's ("netfilter: nf_tables: fib warnings") http://patchwork.ozlabs.org/patch/688615/ to address fallout from this patch after rebasing nf-next, that was posted to address compilation warnings. --pablo ] Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-01sunrpc: GFP_KERNEL should be GFP_NOFS in crypto codeJ. Bruce Fields
Writes may depend on the auth_gss crypto code, so we shouldn't be allocating with GFP_KERNEL there. This still leaves some crypto_alloc_* calls which end up doing GFP_KERNEL allocations in the crypto code. Those could probably done at crypto import time. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01svcrdma: backchannel cannot share a page for send and rcv buffersChuck Lever
The underlying transport releases the page pointed to by rq_buffer during xprt_rdma_bc_send_request. When the backchannel reply arrives, rq_rbuffer then points to freed memory. Fixes: 68778945e46f ('SUNRPC: Separate buffer pointers for RPC ...') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01unix: escape all null bytes in abstract unix domain socketIsaac Boukris
Abstract unix domain socket may embed null characters, these should be translated to '@' when printed out to proc the same way the null prefix is currently being translated. This helps for tools such as netstat, lsof and the proc based implementation in ss to show all the significant bytes of the name (instead of getting cut at the first null occurrence). Signed-off-by: Isaac Boukris <iboukris@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01genetlink: fix error return code in genl_register_family()Wei Yongjun
Fix to return a negative error code from the idr_alloc() error handling case instead of 0, as done elsewhere in this function. Also fix the return value check of idr_alloc() since idr_alloc return negative errors on failure, not zero. Fixes: 2ae0f17df1cd ("genetlink: use idr to track families") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01net: Enable support for VRF with ipv4 multicastDavid Ahern
Enable support for IPv4 multicast: - similar to unicast the flow struct is updated to L3 master device if relevant prior to calling fib_rules_lookup. The table id is saved to the lookup arg so the rule action for ipmr can return the table associated with the device. - ip_mr_forward needs to check for master device mismatch as well since the skb->dev is set to it - allow multicast address on VRF device for Rx by checking for the daddr in the VRF device as well as the original ingress device - on Tx need to drop to __mkroute_output when FIB lookup fails for multicast destination address. - if CONFIG_IP_MROUTE_MULTIPLE_TABLES is enabled VRF driver creates IPMR FIB rules on first device create similar to FIB rules. In addition the VRF driver does not divert IPv4 multicast packets: it breaks on Tx since the fib lookup fails on the mcast address. With this patch, ipmr forwarding and local rx/tx work. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove SS_CONNECTED sock stateParthasarathy Bhuvaragan
In this commit, we replace references to sock->state SS_CONNECTE with sk_state TIPC_ESTABLISHED. Finally, the sock->state is no longer explicitly used by tipc. The FSM below is for various types of connection oriented sockets. Stream Server Listening Socket: +-----------+ +-------------+ | TIPC_OPEN |------>| TIPC_LISTEN | +-----------+ +-------------+ Stream Server Data Socket: +-----------+ +------------------+ | TIPC_OPEN |------>| TIPC_ESTABLISHED | +-----------+ +------------------+ ^ | | | | v +--------------------+ | TIPC_DISCONNECTING | +--------------------+ Stream Socket Client: +-----------+ +-----------------+ | TIPC_OPEN |------>| TIPC_CONNECTING |------+ +-----------+ +-----------------+ | | | | | v | +------------------+ | | TIPC_ESTABLISHED | | +------------------+ | ^ | | | | | | v | +--------------------+ | | TIPC_DISCONNECTING |<--+ +--------------------+ Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_CONNECTING as a new sk_stateParthasarathy Bhuvaragan
In this commit, we create a new tipc socket state TIPC_CONNECTING by primarily replacing the SS_CONNECTING with TIPC_CONNECTING. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove SS_DISCONNECTING stateParthasarathy Bhuvaragan
In this commit, we replace the references to SS_DISCONNECTING with the combination of sk_state TIPC_DISCONNECTING and flags set in sk_shutdown. We introduce a new function _tipc_shutdown(), which provides the common code required by tipc_release() and tipc_shutdown(). Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_DISCONNECTING as a new sk_stateParthasarathy Bhuvaragan
In this commit, we create a new tipc socket state TIPC_DISCONNECTING in sk_state. TIPC_DISCONNECTING is replacing the socket connection status update using SS_DISCONNECTING. TIPC_DISCONNECTING is set for connection oriented sockets at: - tipc_shutdown() - connection probe timeout - when we receive an error message on the connection. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_OPEN as a new sk_stateParthasarathy Bhuvaragan
In this commit, we create a new tipc socket state TIPC_OPEN in sk_state. We primarily replace the SS_UNCONNECTED sock->state with TIPC_OPEN. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_ESTABLISHED as a new sk_stateParthasarathy Bhuvaragan
Until now, tipc maintains probing state for connected sockets in tsk->probing_state variable. In this commit, we express this information as socket states and this remove the variable. We set probe_unacked flag when a probe is sent out and reset it if we receive a reply. Instead of the probing state TIPC_CONN_OK, we create a new state TIPC_ESTABLISHED. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: create TIPC_LISTEN as a new sk_stateParthasarathy Bhuvaragan
Until now, tipc maintains the socket state in sock->state variable. This is used to maintain generic socket states, but in tipc we overload it and save tipc socket states like TIPC_LISTEN. Other protocols like TCP, UDP store protocol specific states in sk->sk_state instead. In this commit, we : - declare a new tipc state TIPC_LISTEN, that replaces SS_LISTEN - Create a new function tipc_set_state(), to update sk->sk_state. - TIPC_LISTEN state is maintained in sk->sk_state. - replace references to SS_LISTEN with TIPC_LISTEN. There is no functional change in this commit. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove socket state SS_READYParthasarathy Bhuvaragan
Until now, tipc socket state SS_READY declares that the socket is a connectionless socket. In this commit, we remove the state SS_READY and replace it with a condition which returns true for datagram / connectionless sockets. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove probing_intv from tipc_sockParthasarathy Bhuvaragan
Until now, probing_intv is a variable in struct tipc_sock but is always set to a constant CONN_PROBING_INTERVAL. The socket connection is probed based on this value. In this commit, we remove this variable and setup the socket timer based on the constant CONN_PROBING_INTERVAL. There is no functional change in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove tsk->connected from tipc_sockParthasarathy Bhuvaragan
Until now, we determine if a socket is connected or not based on tsk->connected, which is set once when the probing state is set to TIPC_CONN_OK. It is unset when the sock->state is updated from SS_CONNECTED to any other state. In this commit, we remove connected variable from tipc_sock and derive socket connection status from the following condition: sock->state == SS_CONNECTED => tsk->connected There is no functional change in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: remove tsk->connected for connectionless socketsParthasarathy Bhuvaragan
Until now, for connectionless sockets the peer information during connect is stored in tsk->peer and a connection state is set in tsk->connected. This is redundant. In this commit, for connectionless sockets we update: - __tipc_sendmsg(), when the destination is NULL the peer existence is determined by tsk->peer.family, instead of tsk->connected. - tipc_connect(), remove set/unset of tsk->connected. Hence tsk->connected is no longer used for connectionless sockets. There is no functional change in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: rename tsk->remote to tsk->peer for consistent namingParthasarathy Bhuvaragan
Until now, the peer information for connect is stored in tsk->remote but the rest of code uses the name peer for peer/remote. In this commit, we rename tsk->remote to tsk->peer to align with naming convention followed in the rest of the code. There is no functional change in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: rename struct tipc_skb_cb member handle to bytes_readParthasarathy Bhuvaragan
In this commit, we rename handle to bytes_read indicating the purpose of the member. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: set kern=0 in sk_alloc() during tipc_accept()Parthasarathy Bhuvaragan
Until now, tipc_accept() calls sk_alloc() with kern=1. This is incorrect as the data socket's owner is the user application. Thus for these accepted data sockets the network namespace refcount is skipped. In this commit, we fix this by setting kern=0. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: wakeup sleeping users at disconnectParthasarathy Bhuvaragan
Until now, in filter_connect() when we terminate a connection due to an error message from peer, we set the socket state to DISCONNECTING. The socket is notified about this broken connection using EPIPE when a user tries to send a message. However if a socket was waiting on a poll() while the connection is being terminated, we fail to wakeup that socket. In this commit, we wakeup sleeping sockets at connection termination. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-01tipc: return early for non-blocking sockets at link congestionParthasarathy Bhuvaragan
Until now, in stream/mcast send() we pass the message to the link layer even when the link is congested and add the socket to the link's wakeup queue. This is unnecessary for non-blocking sockets. If a socket is set to non-blocking and sends multicast with zero back off time while receiving EAGAIN, we exhaust the memory. In this commit, we return immediately at stream/mcast send() for non-blocking sockets. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31Merge tag 'linux-can-fixes-for-4.9-20161031' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2016-10-31 this is a pull request of two patches for the upcoming v4.9 release. The first patch is by Lukas Resch for the sja1000 plx_pci driver that adds support for Moxa CAN devices. The second patch is by Oliver Hartkopp and fixes a potential kernel panic in the CAN broadcast manager. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31sctp: hold transport instead of assoc when lookup assoc in rx pathXin Long
Prior to this patch, in rx path, before calling lock_sock, it needed to hold assoc when got it by __sctp_lookup_association, in case other place would free/put assoc. But in __sctp_lookup_association, it lookup and hold transport, then got assoc by transport->assoc, then hold assoc and put transport. It means it didn't hold transport, yet it was returned and later on directly assigned to chunk->transport. Without the protection of sock lock, the transport may be freed/put by other places, which would cause a use-after-free issue. This patch is to fix this issue by holding transport instead of assoc. As holding transport can make sure to access assoc is also safe, and actually it looks up assoc by searching transport rhashtable, to hold transport here makes more sense. Note that the function will be renamed later on on another patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31sctp: return back transport in __sctp_rcv_init_lookupXin Long
Prior to this patch, it used a local variable to save the transport that is looked up by __sctp_lookup_association(), and didn't return it back. But in sctp_rcv, it is used to initialize chunk->transport. So when hitting this, even if it found the transport, it was still initializing chunk->transport with null instead. This patch is to return the transport back through transport pointer that is from __sctp_rcv_lookup_harder(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31sctp: hold transport instead of assoc in sctp_diagXin Long
In sctp_transport_lookup_process(), Commit 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") moved cb() out of rcu lock, but it put transport and hold assoc instead, and ignore that cb() still uses transport. It may cause a use-after-free issue. This patch is to hold transport instead of assoc there. Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31bridge: mcast: add router port on PIM hello messageNikolay Aleksandrov
When we receive a PIM Hello message on a port we can consider that it has a multicast router attached, thus it is correct to add it to the router list. The only catch is it shouldn't be considered for a querier. Using Daniel's description: leaf-11 leaf-12 leaf-13 \ | / bridge-1 / \ host-11 host-12 - all ports in bridge-1 are in a single vlan aware bridge - leaf-11 is the IGMP querier - leaf-13 is the PIM DR - host-11 TXes packets to 226.10.10.10 - bridge-1 only forwards the 226.10.10.10 traffic out the port to leaf-11, it should also forward this traffic out the port to leaf-13 Suggested-by: Daniel Walton <dwalton@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31net: pim: add all RFC7761 message typesNikolay Aleksandrov
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>