summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-10-25ip6_tunnel: Allow rcv/xmit even if remote address is a local addressShmulik Ladkani
Currently, ip6_tnl_xmit_ctl drops tunneled packets if the remote address (outer v6 destination) is one of host's locally configured addresses. Same applies to ip6_tnl_rcv_ctl: it drops packets if the remote address (outer v6 source) is a local address. This prevents using ipxip6 (and ip6_gre) tunnels whose local/remote endpoints are on same host; OTOH v4 tunnels (ipip or gre) allow such configurations. An example where this proves useful is a system where entities are identified by their unique v6 addresses, and use tunnels to encapsulate traffic between them. The limitation prevents placing several entities on same host. Introduce IP6_TNL_F_ALLOW_LOCAL_REMOTE which allows to bypass this restriction. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24netfilter: nf_ct_h323: Out Of Bound Read in Netfilter ConntrackEric Sesterhenn
Add missing counter decrement to prevent out of bounds memory read. Signed-off-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-24netfilter: xt_connlimit: don't store address in the conn nodesFlorian Westphal
Only stored, never read. This is a leftover from commit 7d08487777c8 ("netfilter: connlimit: use rbtree for per-host conntrack obj storage"), which added the rbtree node struct that stores the address instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-24netfilter: nf_conntrack_h323: Remove typedef structHarsha Sharma
Remove typedef from struct as linux-kernel coding style tends to avoid using typedefs. Done using following coccinelle semantic patch @r1@ type T; @@ typedef struct { ... } T; @script:python c1@ T2; T << r1.T; @@ if T[-2:] =="_t" or T[-2:] == "_T": coccinelle.T2 = T[:-2]; else: coccinelle.T2 = T; print T, coccinelle.T2 @r2@ type r1.T; identifier c1.T2; @@ -typedef struct + T2 { ... } -T ; @r3@ type r1.T; identifier c1.T2; @@ -T +struct T2 Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-24netfilter: conntrack: make l3proto trackers constFlorian Westphal
previous patches removed all writes to them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-24netfilter: x_tables: don't use seqlock when fetching old countersFlorian Westphal
after previous commit xt_replace_table will wait until all cpus had even seqcount (i.e., no cpu is accessing old ruleset). Add a 'old' counter retrival version that doesn't synchronize counters. Its not needed, the old counters are not in use anymore at this point. This speeds up table replacement on busy systems with large tables (and many cores). Cc: Dan Williams <dcbw@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-24netfilter: x_tables: make xt_replace_table wait until old rules are not used ↵Florian Westphal
anymore xt_replace_table relies on table replacement counter retrieval (which uses xt_recseq to synchronize pcpu counters). This is fine, however with large rule set get_counters() can take a very long time -- it needs to synchronize all counters because it has to assume concurrent modifications can occur. Make xt_replace_table synchronize by itself by waiting until all cpus had an even seqcount. This allows a followup patch to copy the counters of the old ruleset without any synchonization after xt_replace_table has completed. Cc: Dan Williams <dcbw@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-24netfilter: conntrack: remove pf argument from l4 packet functionsFlorian Westphal
not needed/used anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-24netfilter: conntrack: add and use nf_ct_l4proto_log_invalidFlorian Westphal
We currently pass down the l4 protocol to the conntrack ->packet() function, but the only user of this is the debug info decision. Same information can be derived from struct nf_conn. Add a wrapper for the previous patch that extracs the information from nf_conn and passes it to nf_l4proto_log_invalid(). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-24netfilter: conntrack: add and use nf_l4proto_log_invalidFlorian Westphal
We currently pass down the l4 protocol to the conntrack ->packet() function, but the only user of this is the debug info decision. Same information can be derived from struct nf_conn. As a first step, add and use a new log function for this, similar to nf_ct_helper_log(). Add __cold annotation -- invalid packets should be infrequent so gcc can consider all call paths that lead to such a function as unlikely. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-24netfilter: nat: use test_and_clear_bit when deleting ct from bysource listFlorian Westphal
We can use a single statement for this. While at it, fixup the comment -- we don't have pernet table/ops anymore, the function is only called from module exit path. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-24xfrm: Fix xfrm_dst_cache memleakSteffen Klassert
We have a memleak whenever a flow matches a policy without a matching SA. In this case we generate a dummy bundle and take an additional refcount on the dst_entry. This was needed as long as we had the flowcache. The flowcache removal patches deleted all related refcounts but forgot the one for the dummy bundle case. Fix the memleak by removing this refcount. Fixes: 3ca28286ea80 ("xfrm_policy: bypass flow_cache_lookup") Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-10-24Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-10-24 1) Fix a memleak when we don't find a inner_mode during bundle creation. From David Miller. 2) Fix a xfrm policy dump crash. We may crash on error when dumping policies via netlink. Fix this by initializing the policy walk with the cb->start method. This fix is a serious stable candidate. From Herbert Xu. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24net/sched: Fix actions list corruption when adding offloaded tc flowsOr Gerlitz
Prior to commit b3f55bdda8df, the networking core doesn't wire an in-place actions list the when the low level driver is called to offload the flow, but all low level drivers do that (call tcf_exts_to_list()) in their offloading "add" logic. Now, the in-place list is set in the core which goes over the list in a loop, but also by the hw driver when their offloading code is invoked indirectly: cls_xxx add flow -> tc_setup_cb_call -> tc_exts_setup_cb_egdev_call -> hw driver which messes up the core list instance upon driver return. Fix that by avoiding in-place list on the net core code that deals with adding flows. Fixes: b3f55bdda8df ('net: sched: introduce per-egress action device callbacks') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24ipv6: add ip6_null_entry check in rt6_select()Wei Wang
In rt6_select(), fn->leaf could be pointing to net->ipv6.ip6_null_entry. In this case, we should directly return instead of trying to carry on with the rest of the process. If not, we could crash at: spin_lock_bh(&leaf->rt6i_table->rt6_lock); because net->ipv6.ip6_null_entry does not have rt6i_table set. Syzkaller recently reported following issue on net-next: Use struct sctp_sack_info instead kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: sctp: [Deprecated]: syz-executor4 (pid 26496) Use of struct sctp_assoc_value in delayed_ack socket option. Use struct sctp_sack_info instead CPU: 1 PID: 26523 Comm: syz-executor6 Not tainted 4.14.0-rc4+ #85 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d147e3c0 task.stack: ffff8801a4328000 RIP: 0010:debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline] RIP: 0010:do_raw_spin_lock+0x23/0x1e0 kernel/locking/spinlock_debug.c:112 RSP: 0018:ffff8801a432ed70 EFLAGS: 00010207 RAX: dffffc0000000000 RBX: 0000000000000018 RCX: 0000000000000000 RDX: 0000000000000003 RSI: 0000000000000000 RDI: 000000000000001c RBP: ffff8801a432ed90 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: ffffffff8482b279 R12: ffff8801ce2ff3a0 sctp: [Deprecated]: syz-executor1 (pid 26546) Use of int in maxseg socket option. Use struct sctp_assoc_value instead R13: dffffc0000000000 R14: ffff8801d971e000 R15: ffff8801ce2ff0d8 FS: 00007f56e82f5700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001ddbc22000 CR3: 00000001a4a04000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline] _raw_spin_lock_bh+0x39/0x40 kernel/locking/spinlock.c:175 spin_lock_bh include/linux/spinlock.h:321 [inline] rt6_select net/ipv6/route.c:786 [inline] ip6_pol_route+0x1be3/0x3bd0 net/ipv6/route.c:1650 sctp: [Deprecated]: syz-executor1 (pid 26576) Use of int in maxseg socket option. Use struct sctp_assoc_value instead TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies. Check SNMP counters. ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1843 fib6_rule_lookup+0x9e/0x2a0 net/ipv6/ip6_fib.c:309 ip6_route_output_flags+0x1f1/0x2b0 net/ipv6/route.c:1871 ip6_route_output include/net/ip6_route.h:80 [inline] ip6_dst_lookup_tail+0x4ea/0x970 net/ipv6/ip6_output.c:953 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1076 sctp_v6_get_dst+0x675/0x1c30 net/sctp/ipv6.c:274 sctp_transport_route+0xa8/0x430 net/sctp/transport.c:287 sctp_assoc_add_peer+0x4fe/0x1100 net/sctp/associola.c:656 __sctp_connect+0x251/0xc80 net/sctp/socket.c:1187 sctp_connect+0xb4/0xf0 net/sctp/socket.c:4209 inet_dgram_connect+0x16b/0x1f0 net/ipv4/af_inet.c:541 SYSC_connect+0x20a/0x480 net/socket.c:1642 SyS_connect+0x24/0x30 net/socket.c:1623 entry_SYSCALL_64_fastpath+0x1f/0xbe Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24tcp: Configure TFO without cookie per socket and/or per routeChristoph Paasch
We already allow to enable TFO without a cookie by using the fastopen-sysctl and setting it to TFO_SERVER_COOKIE_NOT_REQD (or TFO_CLIENT_NO_COOKIE). This is safe to do in certain environments where we know that there isn't a malicous host (aka., data-centers) or when the application-protocol already provides an authentication mechanism in the first flight of data. A server however might be providing multiple services or talking to both sides (public Internet and data-center). So, this server would want to enable cookie-less TFO for certain services and/or for connections that go to the data-center. This patch exposes a socket-option and a per-route attribute to enable such fine-grained configurations. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24ipv4: tcp_minisocks: use BUG_ON instead of if condition followed by BUGGustavo A. R. Silva
Use BUG_ON instead of if condition followed by BUG in tcp_time_wait. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24ipv4: icmp: use BUG_ON instead of if condition followed by BUGGustavo A. R. Silva
Use BUG_ON instead of if condition followed by BUG in icmp_timestamp. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBINDLaszlo Toth
Commit 9b9742022888 ("sctp: support ipv6 nonlocal bind") introduced support for the above options as v4 sctp did, so patched sctp_v6_available(). In the v4 implementation it's enough, because sctp_inet_bind_verify() just returns with sctp_v4_available(). However sctp_inet6_bind_verify() has an extra check before that for link-local scope_id, which won't respect the above options. Added the checks before calling ipv6_chk_addr(), but not before the validation of scope_id. before (w/ both options): ./v6test fe80::10 sctp bind failed, errno: 99 (Cannot assign requested address) ./v6test fe80::10 tcp bind success, errno: 0 (Success) after (w/ both options): ./v6test fe80::10 sctp bind success, errno: 0 (Success) Signed-off-by: Laszlo Toth <laszlth@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24net: smc_close: mark expected switch fall-throughGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in this particular case I placed the "fall through" comment on its own line, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24net: rxrpc: mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24ipv6: addrconf: do not block BH in ipv6_chk_home_addr()Eric Dumazet
rcu_read_lock() is enough here, no need to block BH. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24ipv6: addrconf: do not block BH in /proc/net/if_inet6 handlingEric Dumazet
Table is really RCU protected, no need to block BH Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24ipv6: addrconf: do not block BH in ipv6_get_ifaddr()Eric Dumazet
rcu_read_lock() is enough here, no need to block BH. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24ipv6: addrconf: do not block BH in ipv6_chk_addr_and_flags()Eric Dumazet
rcu_read_lock() is enough here, as inet6_ifa_finish_destroy() uses kfree_rcu() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24ipv6: addrconf: add per netns perturbation in inet6_addr_hash()Eric Dumazet
Bring IPv6 in par with IPv4 : - Use net_hash_mix() to spread addresses a bit more. - Use 256 slots hash table instead of 16 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24ipv6: addrconf: factorize inet6_addr_hash() callEric Dumazet
ipv6_add_addr_hash() can compute the hash value outside of locked section and pass it to ipv6_chk_same_addr(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24ipv6: addrconf: move ipv6_chk_same_addr() to avoid forward declarationEric Dumazet
ipv6_chk_same_addr() is only used by ipv6_add_addr_hash(), so moving it avoids a forward declaration. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23net/9p: Switch to wait_event_killable()Tuomas Tynkkynen
Because userspace gets Very Unhappy when calls like stat() and execve() return -EINTR on 9p filesystem mounts. For instance, when bash is looking in PATH for things to execute and some SIGCHLD interrupts stat(), bash can throw a spurious 'command not found' since it doesn't retry the stat(). In practice, hitting the problem is rare and needs a really slow/bogged down 9p server. Cc: stable@vger.kernel.org Signed-off-by: Tuomas Tynkkynen <tuomas@tuxera.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-10-24tcp: add tracepoint trace_tcp_set_state()Song Liu
This patch adds tracepoint trace_tcp_set_state. Besides usual fields (s/d ports, IP addresses), old and new state of the socket is also printed with TP_printk, with __print_symbolic(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24tcp: add tracepoint trace_tcp_destroy_sockSong Liu
This patch adds trace event trace_tcp_destroy_sock. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24tcp: add tracepoint trace_tcp_receive_resetSong Liu
New tracepoint trace_tcp_receive_reset is added and called from tcp_reset(). This tracepoint is define with a new class tcp_event_sk. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24tcp: add tracepoint trace_tcp_send_resetSong Liu
New tracepoint trace_tcp_send_reset is added and called from tcp_v4_send_reset(), tcp_v6_send_reset() and tcp_send_active_reset(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24Merge tag 'batadv-next-for-davem-20171023' of ↵David S. Miller
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This documentation/cleanup patchset includes the following patches: - Fix parameter kerneldoc which caused kerneldoc warnings, by Sven Eckelmann - Remove spurious warnings in B.A.T.M.A.N. V neighbor comparison, by Sven Eckelmann - Use inline kernel-doc style for UAPI constants, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23esp6: remove redundant initialization of esphColin Ian King
The pointer esph is being initialized with a value that is never read and then being updated. Remove the redundant initialization and move the declaration and initializtion of esph to the local code block. Cleans up clang warning: net/ipv6/esp6.c:562:21: warning: Value stored to 'esph' during its initialization is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-10-23ipsec: Fix aborted xfrm policy dump crashHerbert Xu
An independent security researcher, Mohamed Ghannam, has reported this vulnerability to Beyond Security's SecuriTeam Secure Disclosure program. The xfrm_dump_policy_done function expects xfrm_dump_policy to have been called at least once or it will crash. This can be triggered if a dump fails because the target socket's receive buffer is full. This patch fixes it by using the cb->start mechanism to ensure that the initialisation is always done regardless of the buffer situation. Fixes: 12a169e7d8f4 ("ipsec: Put dumpers on the dump list") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-10-23tcp/dccp: fix lockdep splat in inet_csk_route_req()Eric Dumazet
This patch fixes the following lockdep splat in inet_csk_route_req() lockdep_rcu_suspicious inet_csk_route_req tcp_v4_send_synack tcp_rtx_synack inet_rtx_syn_ack tcp_fastopen_synack_time tcp_retransmit_timer tcp_write_timer_handler tcp_write_timer call_timer_fn Thread running inet_csk_route_req() owns a reference on the request socket, so we have the guarantee ireq->ireq_opt wont be changed or freed. lockdep can enforce this invariant for us. Fixes: c92e8c02fe66 ("tcp/dccp: fix ireq->opt races") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23tcp: do tcp_mstamp_refresh before retransmits on TSQ handlerKoichiro Den
When retransmission on TSQ handler was introduced in the commit f9616c35a0d7 ("tcp: implement TSQ for retransmits"), the retransmitted skbs' timestamps were updated on the actual transmission. In the later commit 385e20706fac ("tcp: use tp->tcp_mstamp in output path"), it stops being done so. In the commit, the comment says "We try to refresh tp->tcp_mstamp only when necessary", and at present tcp_tsq_handler and tcp_v4_mtu_reduced applies to this. About the latter, it's okay since it's rare enough. About the former, even though possible retransmissions on the tasklet comes just after the destructor run in NET_RX softirq handling, the time between them could be nonnegligibly large to the extent that tcp_rack_advance or rto rearming be affected if other (remaining) RX, BLOCK and (preceding) TASKLET sofirq handlings are unexpectedly heavy. So in the same way as tcp_write_timer_handler does, doing tcp_mstamp_refresh ensures the accuracy of algorithms relying on it. Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path") Signed-off-by: Koichiro Den <den@klaipeden.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23net: core: rtnetlink: use BUG_ON instead of if condition followed by BUGGustavo A. R. Silva
Use BUG_ON instead of if condition followed by BUG in do_setlink. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
There were quite a few overlapping sets of changes here. Daniel's bug fix for off-by-ones in the new BPF branch instructions, along with the added allowances for "data_end > ptr + x" forms collided with the metadata additions. Along with those three changes came veritifer test cases, which in their final form I tried to group together properly. If I had just trimmed GIT's conflict tags as-is, this would have split up the meta tests unnecessarily. In the socketmap code, a set of preemption disabling changes overlapped with the rename of bpf_compute_data_end() to bpf_compute_data_pointers(). Changes were made to the mv88e6060.c driver set addr method which got removed in net-next. The hyperv transport socket layer had a locking change in 'net' which overlapped with a change of socket state macro usage in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "A little more than usual this time around. Been travelling, so that is part of it. Anyways, here are the highlights: 1) Deal with memcontrol races wrt. listener dismantle, from Eric Dumazet. 2) Handle page allocation failures properly in nfp driver, from Jaku Kicinski. 3) Fix memory leaks in macsec, from Sabrina Dubroca. 4) Fix crashes in pppol2tp_session_ioctl(), from Guillaume Nault. 5) Several fixes in bnxt_en driver, including preventing potential NVRAM parameter corruption from Michael Chan. 6) Fix for KRACK attacks in wireless, from Johannes Berg. 7) rtnetlink event generation fixes from Xin Long. 8) Deadlock in mlxsw driver, from Ido Schimmel. 9) Disallow arithmetic operations on context pointers in bpf, from Jakub Kicinski. 10) Missing sock_owned_by_user() check in sctp_icmp_redirect(), from Xin Long. 11) Only TCP is supported for sockmap, make that explicit with a check, from John Fastabend. 12) Fix IP options state races in DCCP and TCP, from Eric Dumazet. 13) Fix panic in packet_getsockopt(), also from Eric Dumazet. 14) Add missing locked in hv_sock layer, from Dexuan Cui. 15) Various aquantia bug fixes, including several statistics handling cures. From Igor Russkikh et al. 16) Fix arithmetic overflow in devmap code, from John Fastabend. 17) Fix busted socket memory accounting when we get a fault in the tcp zero copy paths. From Willem de Bruijn. 18) Don't leave opt->tot_len uninitialized in ipv6, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) stmmac: Don't access tx_q->dirty_tx before netif_tx_lock ipv6: flowlabel: do not leave opt->tot_len with garbage of_mdio: Fix broken PHY IRQ in case of probe deferral textsearch: fix typos in library helpers rxrpc: Don't release call mutex on error pointer net: stmmac: Prevent infinite loop in get_rx_timestamp_status() net: stmmac: Fix stmmac_get_rx_hwtstamp() net: stmmac: Add missing call to dev_kfree_skb() mlxsw: spectrum_router: Configure TIGCR on init mlxsw: reg: Add Tunneling IPinIP General Configuration Register net: ethtool: remove error check for legacy setting transceiver type soreuseport: fix initialization race net: bridge: fix returning of vlan range op errors sock: correct sk_wmem_queued accounting on efault in tcp zerocopy bpf: add test cases to bpf selftests to cover all access tests bpf: fix pattern matches for direct packet access bpf: fix off by one for range markings with L{T, E} patterns bpf: devmap fix arithmetic overflow in bitmap_size calculation net: aquantia: Bad udp rate on default interrupt coalescing net: aquantia: Enable coalescing management via ethtool interface ...
2017-10-22ipv6: flowlabel: do not leave opt->tot_len with garbageEric Dumazet
When syzkaller team brought us a C repro for the crash [1] that had been reported many times in the past, I finally could find the root cause. If FlowLabel info is merged by fl6_merge_options(), we leave part of the opt_space storage provided by udp/raw/l2tp with random value in opt_space.tot_len, unless a control message was provided at sendmsg() time. Then ip6_setup_cork() would use this random value to perform a kzalloc() call. Undefined behavior and crashes. Fix is to properly set tot_len in fl6_merge_options() At the same time, we can also avoid consuming memory and cpu cycles to clear it, if every option is copied via a kmemdup(). This is the change in ip6_setup_cork(). [1] kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801cb64a100 task.stack: ffff8801cc350000 RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: 0018:ffff8801cc357550 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010 RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014 RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10 R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0 R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0 FS: 00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0 DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Call Trace: ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729 udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x358/0x5a0 net/socket.c:1750 SyS_sendto+0x40/0x50 net/socket.c:1718 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4520a9 RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9 RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016 RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029 Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22bpf: Add BPF_SOCKET_OPS_BASE_RTT support to tcp_nvLawrence Brakmo
TCP_NV will try to get the base RTT from a socket_ops BPF program if one is loaded. NV will then use the base RTT to bound its min RTT (its notion of the base RTT). It uses the base RTT as an upper bound and 80% of the base RTT as its lower bound. In other words, NV will consider filtered RTTs larger than base RTT as a sign of congestion. As a result, there is no minRTT inflation when there is a lot of congestion. For example, in a DC where the RTTs are less than 40us when there is no congestion, a base RTT value of 80us improves the performance of NV. The difference between the uncongested RTT and the base RTT provided represents how much queueing we are willing to have (in practice it can be higher). NV has been tunned to reduce congestion when there are many flows at the cost of one flow not achieving full bandwith utilization. When a reasonable base RTT is provided, one NV flow can now fully utilize the full bandwidth. In addition, the performance is also improved when there are many flows. In the following examples the NV results are using a kernel with this patch set (i.e. both NV results are using the new nv_loss_dec_factor). With one host sending to another host and only one flow the goodputs are: Cubic: 9.3 Gbps, NV: 5.5 Gbps, NV (baseRTT=80us): 9.2 Gbps With 2 hosts sending to one host (1 flow per host, the goodput per flow is: Cubic: 4.6 Gbps, NV: 4.5 Gbps, NV (baseRTT=80us)L 4.6 Gbps But the RTTs seen by a ping process in the sender is: Cubic: 3.3ms NV: 97us, NV (baseRTT=80us): 146us With a lot of flows things look even better for NV with baseRTT. Here we have 3 hosts sending to one host. Each sending host has 6 flows: 1 stream, 4x1MB RPC, 1x10KB RPC. Cubic, NV and NV with baseRTT all fully utilize the full available bandwidth. However, the distribution of bandwidth among the flows is very different. For the 10KB RPC flow: Cubic: 27Mbps, NV: 111Mbps, NV (baseRTT=80us): 222Mbps The 99% latencies for the 10KB flows are: Cubic: 26ms, NV: 1ms, NV (baseRTT=80us): 500us The RTT seen by a ping process at the senders: Cubic: 3.2ms NV: 720us, NV (baseRTT=80us): 330us Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22bpf: Adding helper function bpf_getsockopsLawrence Brakmo
Adding support for helper function bpf_getsockops to socket_ops BPF programs. This patch only supports TCP_CONGESTION. Signed-off-by: Vlad Vysotsky <vlad@cs.ucla.edu> Acked-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22net: x25: mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22net: af_unix: mark expected switch fall-throughGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22rxrpc: Don't release call mutex on error pointerDavid Howells
Don't release call mutex at the end of rxrpc_kernel_begin_call() if the call pointer actually holds an error value. Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22tipc: refactor tipc_sk_timeout() functionJon Maloy
The function tipc_sk_timeout() is more complex than necessary, and even seems to contain an undetected bug. At one of the occurences where we renew the timer we just order it with (HZ / 20), instead of (jiffies + HZ / 20); In this commit we clean up the function. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22net: ethtool: remove error check for legacy setting transceiver typeNiklas Söderlund
Commit 9cab88726929605 ("net: ethtool: Add back transceiver type") restores the transceiver type to struct ethtool_link_settings and convert_link_ksettings_to_legacy_settings() but forgets to remove the error check for the same in convert_legacy_settings_to_link_ksettings(). This prevents older versions of ethtool to change link settings. # ethtool --version ethtool version 3.16 # ethtool -s eth0 autoneg on speed 100 duplex full Cannot set new settings: Invalid argument not setting speed not setting duplex not setting autoneg While newer versions of ethtool works. # ethtool --version ethtool version 4.10 # ethtool -s eth0 autoneg on speed 100 duplex full [ 57.703268] sh-eth ee700000.ethernet eth0: Link is Down [ 59.618227] sh-eth ee700000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx Fixes: 19cab88726929605 ("net: ethtool: Add back transceiver type") Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reported-by: Renjith R V <renjith.rv@quest-global.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22net: sched: mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>