summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-08-14ipv6: release rt6->rt6i_idev properly during ifdownWei Wang
When a dst is created by addrconf_dst_alloc() for a host route or an anycast route, dst->dev points to loopback dev while rt6->rt6i_idev points to a real device. When the real device goes down, the current cleanup code only checks for dst->dev and assumes rt6->rt6i_idev->dev is the same. This causes the refcount leak on the real device in the above situation. This patch makes sure to always release the refcount taken on rt6->rt6i_idev during dst_dev_put(). Fixes: 587fea741134 ("ipv6: mark DST_NOGC and remove the operation of dst_free()") Reported-by: John Stultz <john.stultz@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Tested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-14af_key: do not use GFP_KERNEL in atomic contextsEric Dumazet
pfkey_broadcast() might be called from non process contexts, we can not use GFP_KERNEL in these cases [1]. This patch partially reverts commit ba51b6be38c1 ("net: Fix RCU splat in af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock() section. [1] : syzkaller reported : in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439 3 locks held by syzkaller183439/2932: #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649 #1: (&pfk->dump_lock){+.+.+.}, at: [<ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293 #2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline] #2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028 CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994 __might_sleep+0x95/0x190 kernel/sched/core.c:5947 slab_pre_alloc_hook mm/slab.h:416 [inline] slab_alloc mm/slab.c:3383 [inline] kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559 skb_clone+0x1a0/0x400 net/core/skbuff.c:1037 pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207 pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281 dump_sp+0x3d6/0x500 net/key/af_key.c:2685 xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042 pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695 pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299 pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722 pfkey_process+0x606/0x710 net/key/af_key.c:2814 pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 ___sys_sendmsg+0x755/0x890 net/socket.c:2035 __sys_sendmsg+0xe5/0x210 net/socket.c:2069 SYSC_sendmsg net/socket.c:2080 [inline] SyS_sendmsg+0x2d/0x50 net/socket.c:2076 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x445d79 RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79 RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008 RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700 R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000 R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000 Fixes: ba51b6be38c1 ("net: Fix RCU splat in af_key") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: David Ahern <dsa@cumulusnetworks.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-14tcp: ulp: avoid module refcnt leak in tcp_set_ulpSabrina Dubroca
__tcp_ulp_find_autoload returns tcp_ulp_ops after taking a reference on the module. Then, if ->init fails, tcp_set_ulp propagates the error but nothing releases that reference. Fixes: 734942cc4ea6 ("tcp: ULP infrastructure") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-14tipc: avoid inheriting msg_non_seq flag when message is returnedJon Paul Maloy
In the function msg_reverse(), we reverse the header while trying to reuse the original buffer whenever possible. Those rejected/returned messages are always transmitted as unicast, but the msg_non_seq field is not explicitly set to zero as it should be. We have seen cases where multicast senders set the message type to "NOT dest_droppable", meaning that a multicast message shorter than one MTU will be returned, e.g., during receive buffer overflow, by reusing the original buffer. This has the effect that even the 'msg_non_seq' field is inadvertently inherited by the rejected message, although it is now sent as a unicast message. This again leads the receiving unicast link endpoint to steer the packet toward the broadcast link receive function, where it is dropped. The affected unicast link is thereafter (after 100 failed retransmissions) declared 'stale' and reset. We fix this by unconditionally setting the 'msg_non_seq' flag to zero for all rejected/returned messages. Reported-by: Canh Duc Luu <canh.d.luu@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-14tipc: accept PACKET_MULTICAST packetsJon Paul Maloy
On L2 bearers, the TIPC broadcast function is sending out packets using the corresponding L2 broadcast address. At reception, we filter such packets under the assumption that they will also be delivered as broadcast packets. This assumption doesn't always hold true. Under high load, we have seen that a switch may convert the destination address and deliver the packet as a PACKET_MULTICAST, something leading to inadvertently dropped packets and a stale and reset broadcast link. We fix this by extending the reception filtering to accept packets of type PACKET_MULTICAST. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-14ipv4: route: fix inet_rtm_getroute induced crashFlorian Westphal
"ip route get $daddr iif eth0 from $saddr" causes: BUG: KASAN: use-after-free in ip_route_input_rcu+0x1535/0x1b50 Call Trace: ip_route_input_rcu+0x1535/0x1b50 ip_route_input_noref+0xf9/0x190 tcp_v4_early_demux+0x1a4/0x2b0 ip_rcv+0xbcb/0xc05 __netif_receive_skb+0x9c/0xd0 netif_receive_skb_internal+0x5a8/0x890 Problem is that inet_rtm_getroute calls either ip_route_input_rcu (if an iif was provided) or ip_route_output_key_hash_rcu. But ip_route_input_rcu, unlike ip_route_output_key_hash_rcu, already associates the dst_entry with the skb. This clears the SKB_DST_NOREF bit (i.e. skb_dst_drop will release/free the entry while it should not). Thus only set the dst if we called ip_route_output_key_hash_rcu(). I tested this patch by running: while true;do ip r get 10.0.1.2;done > /dev/null & while true;do ip r get 10.0.1.2 iif eth0 from 10.0.1.1;done > /dev/null & ... and saw no crash or memory leak. Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: David Ahern <dsahern@gmail.com> Fixes: ba52d61e0ff ("ipv4: route: restore skb_dst_set in inet_rtm_getroute") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-13net: ipv4: add check for l3slave for index returned in IP_PKTINFODavid Ahern
Similar to the loopback device, for packets sent through a VRF device the index returned in ipi_ifindex needs to be the saved index in rt_iif. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-13net: ipv4: remove unnecessary check on orig_oifDavid Ahern
rt_iif is going to be set to either 0 or orig_oif. If orig_oif is 0 it amounts to the same end result so remove the check. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-13net: export some generic xdp helpersJason Wang
This patch tries to export some generic xdp helpers to drivers. This can let driver to do XDP for a specific skb. This is useful for the case when the packet is hard to be processed at page level directly (e.g jumbo/GSO frame). With this patch, there's no need for driver to forbid the XDP set when configuration is not suitable. Instead, it can defer the XDP for packets that is hard to be processed directly after skb is created. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-13rtnelink: Move link dump consistency check out of the loopJakub Sitnicki
Calls to rtnl_dump_ifinfo() are protected by RTNL lock. So are the {list,unlist}_netdevice() calls where we bump the net->dev_base_seq number. For this reason net->dev_base_seq can't change under out feet while we're looping over links in rtnl_dump_ifinfo(). So move the check for net->dev_base_seq change (since the last time we were called) out of the loop. This way we avoid giving a wrong impression that there are concurrent updates to the link list going on while we're iterating over them. Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11bpf: fix two missing target_size settings in bpf_convert_ctx_accessDaniel Borkmann
When CONFIG_NET_SCHED or CONFIG_NET_RX_BUSY_POLL is /not/ set and we try a narrow __sk_buff load of tc_index or napi_id, respectively, then verifier rightfully complains that it's misconfigured, because we need to set target_size in each of the two cases. The rewrite for the ctx access is just a dummy op, but needs to pass, so fix this up. Fixes: f96da09473b5 ("bpf: simplify narrower ctx access") Reported-by: Shubham Bansal <illusionist.neo@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11openvswitch: Remove unnecessary newlines from OVS_NLERR usesJoe Perches
OVS_NLERR already adds a newline so these just add blank lines to the logging. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11net: skb_needs_check() removes CHECKSUM_UNNECESSARY check for tx.Tonghao Zhang
Because we remove the UFO support, we will also remove the CHECKSUM_UNNECESSARY check in skb_needs_check(). Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11net: ipv4: set orig_oif based on fib result for local trafficDavid Ahern
Attempts to connect to a local address with a socket bound to a device with the local address hangs if there is no listener: $ ip addr sh dev eth1 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 02:e0:f9:1c:00:37 brd ff:ff:ff:ff:ff:ff inet 10.100.1.4/24 scope global eth1 valid_lft forever preferred_lft forever inet6 2001:db8:1::4/120 scope global valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe1c:37/64 scope link valid_lft forever preferred_lft forever $ vrf-test -I eth1 -r 10.100.1.4 <hangs when there is no server> (don't let the command name fool you; vrf-test works without vrfs.) The problem is that the original intended device, eth1 in this case, is lost when the tcp reset is sent, so the socket lookup does not find a match for the reset and the connect attempt hangs. Fix by adjusting orig_oif for local traffic to the device from the fib lookup result. With this patch you get the more user friendly: $ vrf-test -I eth1 -r 10.100.1.4 connect failed: 111: Connection refused orig_oif is saved to the newly created rtable as rt_iif and when set it is used as the dif for socket lookups. It is set based on flowi4_oif passed in to ip_route_output_key_hash_rcu and will be set to either the loopback device, an l3mdev device, nothing (flowi4_oif = 0 which is the case in the example above) or a netdev index depending on the lookup path. In each case, resetting orig_oif to the device in the fib result for the RTN_LOCAL case allows the actual device to be preserved as the skb tx and rx is done over the loopback or VRF device. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11net/sched/hfsc: allocate tcf block for hfsc root classKonstantin Khlebnikov
Without this filters cannot be attached. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11net: dsa: ksz: fix skb freeingVivien Didelot
The DSA layer frees the original skb when an xmit function returns NULL, meaning an error occurred. But if the tagging code copied the original skb, it is responsible of freeing the copy if an error occurs. The ksz tagging code currently has two issues: if skb_put_padto fails, the skb copy is not freed, and the original skb will be freed twice. To fix that, move skb_put_padto inside both branches of the skb_tailroom condition, before freeing the original skb, and free the copy on error. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11net: sched: remove cops->tcf_cl_offloadJiri Pirko
cops->tcf_cl_offload is no longer needed, as the drivers check what they can and cannot offload using the classid identify helpers. So remove this. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11net: sched: use newly added classid identity helpersJiri Pirko
Instead of checking handle, which does not have the inner class information and drivers wrongly assume clsact->egress as ingress, use the newly introduced classid identification helpers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11Bluetooth: kfree tmp rather than an alias to itColin Ian King
While the kfree of dhkey_a is of the same address of tmp, it probably is clearer and more human readable if tmp is kfree'd rather than dhkey_a. Detected by CoverityScan, CID#1448650 ("Free of address-of expression") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-08-11xprtrdma: Clean up rpcrdma_bc_marshal_reply()Chuck Lever
Same changes as in rpcrdma_marshal_req(). This removes C-structure style encoding from the backchannel. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11xprtrdma: Harden chunk list encoding against send buffer overflowChuck Lever
While marshaling chunk lists which are variable-length XDR objects, check for XDR buffer overflow at every step. Measurements show no significant changes in CPU utilization. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11xprtrdma: Set up an xdr_stream in rpcrdma_marshal_req()Chuck Lever
Initialize an xdr_stream at the top of rpcrdma_marshal_req(), and use it to encode the fixed transport header fields. This xdr_stream will be used to encode the chunk lists in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11xprtrdma: Remove rpclen from rpcrdma_marshal_reqChuck Lever
Clean up: Remove a variable whose result is no longer used. Commit 655fec6987be ("xprtrdma: Use gathered Send for large inline messages") should have removed it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11xprtrdma: Clean up rpcrdma_marshal_req() synopsisChuck Lever
Clean up: The caller already has rpcrdma_xprt, so pass that directly instead. And provide a documenting comment for this critical function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11sctp: fix some indents in sm_make_chunk.cXin Long
There are some bad indents of functions' defination in sm_make_chunk.c. They have been there since beginning, it was probably caused by that the typedef sctp_chunk_t was replaced with struct sctp_chunk. So it's the best time to fix them in this patchset, it's also to fix some bad indents in other functions' defination in sm_make_chunk.c. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11sctp: remove the typedef sctp_disposition_tXin Long
This patch is to remove the typedef sctp_disposition_t, and replace with enum sctp_disposition in the places where it's using this typedef. It's also to fix the indent for many functions' defination. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11sctp: remove the typedef sctp_sm_table_entry_tXin Long
This patch is to remove the typedef sctp_sm_table_entry_t, and replace with struct sctp_sm_table_entry in the places where it's using this typedef. It is also to fix some indents. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11sctp: remove the typedef sctp_verb_tXin Long
This patch is to remove the typedef sctp_verb_t, and replace with enum sctp_verb in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11sctp: remove the typedef sctp_arg_tXin Long
This patch is to remove the typedef sctp_arg_t, and replace with union sctp_arg in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11sctp: remove the typedef sctp_cmd_seq_tXin Long
This patch is to remove the typedef sctp_cmd_seq_t, and replace with struct sctp_cmd_seq in the places where it's using this typedef. Note that it doesn't fix many indents although it should, as sctp_disposition_t's removal would mess them up again. So better to fix them when removing sctp_disposition_t in the later patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11sctp: remove the typedef sctp_cmd_tXin Long
This patch is to remove the typedef sctp_cmd_t, and replace with enum sctp_cmd in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11sctp: remove the typedef sctp_socket_type_tXin Long
This patch is to remove the typedef sctp_socket_type_t, and replace with enum sctp_socket_type in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11sctp: remove the typedef sctp_dbg_objcnt_entry_tXin Long
This patch is to remove the typedef sctp_dbg_objcnt_entry_t, and replace with struct sctp_dbg_objcnt_entry in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11sctp: remove the typedef sctp_cmsgs_tXin Long
This patch is to remove the typedef sctp_cmsgs_t, and replace with struct sctp_cmsgs in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11sctp: remove the typedef sctp_sender_hb_info_tXin Long
This patch is to remove the typedef sctp_sender_hb_info_t, and replace with struct sctp_sender_hb_info in the places where it's using this typedef. It is also to use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar
Conflicts: include/linux/mm_types.h mm/huge_memory.c I removed the smp_mb__before_spinlock() like the following commit does: 8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()") and fixed up the affected commits. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-11net: xfrm: support setting an output mark.Lorenzo Colitti
On systems that use mark-based routing it may be necessary for routing lookups to use marks in order for packets to be routed correctly. An example of such a system is Android, which uses socket marks to route packets via different networks. Currently, routing lookups in tunnel mode always use a mark of zero, making routing incorrect on such systems. This patch adds a new output_mark element to the xfrm state and a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output mark differs from the existing xfrm mark in two ways: 1. The xfrm mark is used to match xfrm policies and states, while the xfrm output mark is used to set the mark (and influence the routing) of the packets emitted by those states. 2. The existing mark is constrained to be a subset of the bits of the originating socket or transformed packet, but the output mark is arbitrary and depends only on the state. The use of a separate mark provides additional flexibility. For example: - A packet subject to two transforms (e.g., transport mode inside tunnel mode) can have two different output marks applied to it, one for the transport mode SA and one for the tunnel mode SA. - On a system where socket marks determine routing, the packets emitted by an IPsec tunnel can be routed based on a mark that is determined by the tunnel, not by the marks of the unencrypted packets. - Support for setting the output marks can be introduced without breaking any existing setups that employ both mark-based routing and xfrm tunnel mode. Simply changing the code to use the xfrm mark for routing output packets could xfrm mark could change behaviour in a way that breaks these setups. If the output mark is unspecified or set to zero, the mark is not set or changed. Tested: make allyesconfig; make -j64 Tested: https://android-review.googlesource.com/452776 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Mainline had UFO fixes, but UFO is removed in net-next so we take the HEAD hunks. Minor context conflict in bcmsysport statistics bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10packet: fix tp_reserve race in packet_set_ringWillem de Bruijn
Updates to tp_reserve can race with reads of the field in packet_set_ring. Avoid this by holding the socket lock during updates in setsockopt PACKET_RESERVE. This bug was discovered by syzkaller. Fixes: 8913336a7e8d ("packet: add PACKET_RESERVE sockopt") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10udp: consistently apply ufo or fragmentationWillem de Bruijn
When iteratively building a UDP datagram with MSG_MORE and that datagram exceeds MTU, consistently choose UFO or fragmentation. Once skb_is_gso, always apply ufo. Conversely, once a datagram is split across multiple skbs, do not consider ufo. Sendpage already maintains the first invariant, only add the second. IPv6 does not have a sendpage implementation to modify. A gso skb must have a partial checksum, do not follow sk_no_check_tx in udp_send_skb. Found by syzkaller. Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10rtnetlink: fallback to UNSPEC if current family has no doit callbackFlorian Westphal
We need to use PF_UNSPEC in case the requested family has no doit callback, otherwise this now fails with EOPNOTSUPP instead of running the unspec doit callback, as before. Fixes: 6853dd488119 ("rtnetlink: protect handler table with rcu") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10rtnetlink: init handler refcounts to 1Florian Westphal
If using CONFIG_REFCOUNT_FULL=y we get following splat: refcount_t: increment on 0; use-after-free. WARNING: CPU: 0 PID: 304 at lib/refcount.c:152 refcount_inc+0x47/0x50 Call Trace: rtnetlink_rcv_msg+0x191/0x260 ... This warning is harmless (0 is "no callback running", not "memory was freed"). Use '1' as the new 'no handler is running' base instead of 0 to avoid this. Fixes: 019a316992ee ("rtnetlink: add reference counting to prevent module unload while dump is in progress") Reported-by: Sabrina Dubroca <sdubroca@redhat.com> Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10rtnetlink: switch rtnl_link_get_slave_info_data_size to rcuFlorian Westphal
David Ahern reports following splat: RTNL: assertion failed at net/core/dev.c (5717) netdev_master_upper_dev_get+0x5f/0x70 if_nlmsg_size+0x158/0x240 rtnl_calcit.isra.26+0xa3/0xf0 rtnl_link_get_slave_info_data_size currently assumes RTNL protection, but there appears to be no hard requirement for this, so use rcu instead. At the time of this writing, there are three 'get_slave_size' callbacks (now invoked under rcu): bond_get_slave_size, vrf_get_slave_size and br_port_get_slave_size, all return constant only (i.e. they don't sleep). Fixes: 6853dd488119 ("rtnetlink: protect handler table with rcu") Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10rtnetlink: do not use RTM_GETLINK directlyFlorian Westphal
Userspace sends RTM_GETLINK type, but the kernel substracts RTM_BASE from this, i.e. 'type' doesn't contain RTM_GETLINK anymore but instead RTM_GETLINK - RTM_BASE. This caused the calcit callback to not be invoked when it should have been (and vice versa). While at it, also fix a off-by one when checking family index. vs handler array size. Fixes: e1fa6d216dd ("rtnetlink: call rtnl_calcit directly") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10rtnetlink: use rcu_dereference_raw to silence rcu splatFlorian Westphal
Ido reports a rcu splat in __rtnl_register. The splat is correct; as rtnl_register doesn't grab any logs and doesn't use rcu locks either. It has always been like this. handler families are not registered in parallel so there are no races wrt. the kmalloc ordering. The only reason to use rcu_dereference in the first place was to avoid sparse from complaining about this. Thus this switches to _raw() to not have rcu checks here. The alternative is to add rtnl locking to register/unregister, however, I don't see a compelling reason to do so as this has been lockless for the past twenty years or so. Fixes: 6853dd4881 ("rtnetlink: protect handler table with rcu") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10net: core: fix compile error inside flow_dissector due to new dsa callbackJohn Crispin
The following error was introduced by commit 43e665287f93 ("net-next: dsa: fix flow dissection") due to a missing #if guard net/core/flow_dissector.c: In function '__skb_flow_dissect': net/core/flow_dissector.c:448:18: error: 'struct net_device' has no member named 'dsa_ptr' ops = skb->dev->dsa_ptr->tag_ops; ^ make[3]: *** [net/core/flow_dissector.o] Error 1 Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10jump_label: Do not use unserialized static_key_enabled()Paolo Bonzini
Any use of key->enabled (that is static_key_enabled and static_key_count) outside jump_label_lock should handle its own serialization. The only two that are not doing so are the UDP encapsulation static keys. Change them to use static_key_enable, which now correctly tests key->enabled under the jump label lock. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1501601046-35683-3-git-send-email-pbonzini@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-09net-next: dsa: fix flow dissectionJohn Crispin
RPS and probably other kernel features are currently broken on some if not all DSA devices. The root cause of this is that skb_hash will call the flow_dissector. At this point the skb still contains the magic switch header and the skb->protocol field is not set up to the correct 802.3 value yet. By the time the tag specific code is called, removing the header and properly setting the protocol an invalid hash is already set. In the case of the mt7530 this will result in all flows always having the same hash. Signed-off-by: Muciri Gatimu <muciri@openmesh.com> Signed-off-by: Shashidhar Lakkavalli <shashidhar.lakkavalli@openmesh.com> Signed-off-by: John Crispin <john@phrozen.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net-next: tag_mtk: add flow_dissect callback to the ops structJohn Crispin
The MT7530 inserts the 4 magic header in between the 802.3 address and protocol field. The patch implements the callback that can be called by the flow dissector to figure out the real protocol and offset of the network header. With this patch applied we can properly parse the packet and thus make hashing function properly. Signed-off-by: Muciri Gatimu <muciri@openmesh.com> Signed-off-by: Shashidhar Lakkavalli <shashidhar.lakkavalli@openmesh.com> Signed-off-by: John Crispin <john@phrozen.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net-next: dsa: move struct dsa_device_ops to the global header fileJohn Crispin
We need to access this struct from within the flow_dissector to fix dissection for packets coming in on DSA devices. Signed-off-by: Muciri Gatimu <muciri@openmesh.com> Signed-off-by: Shashidhar Lakkavalli <shashidhar.lakkavalli@openmesh.com> Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>