summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-01-02net_sch: red: Fix the new offload indicationNogah Frankel
Update the offload flag, TCQ_F_OFFLOADED, in each dump call (and ignore the offloading function return value in relation to this flag). This is done because a qdisc is being initialized, and therefore offloaded before being grafted. Since the ability of the driver to offload the qdisc depends on its location, a qdisc can be offloaded and un-offloaded by graft calls, that doesn't effect the qdisc itself. Fixes: 428a68af3a7c ("net: sched: Move to new offload indication in RED" Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02ip6_tunnel: allow ip6gre dev mtu to be set below 1280Xin Long
Commit 582442d6d5bc ("ipv6: Allow the MTU of ipip6 tunnel to be set below 1280") fixed a mtu setting issue. It works for ipip6 tunnel. But ip6gre dev updates the mtu also with ip6_tnl_change_mtu. Since the inner packet over ip6gre can be ipv4 and it's mtu should also be allowed to set below 1280, the same issue also exists on ip6gre. This patch is to fix it by simply changing to check if parms.proto is IPPROTO_IPV6 in ip6_tnl_change_mtu instead, to make ip6gre to go to 'else' branch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02ip6_tunnel: disable dst caching if tunnel is dual-stackEli Cooper
When an ip6_tunnel is in mode 'any', where the transport layer protocol can be either 4 or 41, dst_cache must be disabled. This is because xfrm policies might apply to only one of the two protocols. Caching dst would cause xfrm policies for one protocol incorrectly used for the other. Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02Revert "net: core: dev_get_valid_name is now the same as dev_alloc_name_ns"David S. Miller
This reverts commit 87c320e51519a83c496ab7bfb4e96c8f9c001e89. Changing the error return code in some situations turns out to be harmful in practice. In particular Michael Ellerman reports that DHCP fails on his powerpc machines, and this revert gets things working again. Johannes Berg agrees that this revert is the best course of action for now. Fixes: 029b6d140550 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02Merge 4.15-rc6 into staging-nextGreg Kroah-Hartman
We need the staging fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-31xfrm: fix rcu usage in xfrm_get_type_offloadSabrina Dubroca
request_module can sleep, thus we cannot hold rcu_read_lock() while calling it. The function also jumps back and takes rcu_read_lock() again (in xfrm_state_get_afinfo()), resulting in an imbalance. This codepath is triggered whenever a new offloaded state is created. Fixes: ffdb5211da1c ("xfrm: Auto-load xfrm offload modules") Reported-by: syzbot+ca425f44816d749e8eb49755567a75ee48cf4a30@syzkaller.appspotmail.com Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-30af_key: fix buffer overread in parse_exthdrs()Eric Biggers
If a message sent to a PF_KEY socket ended with an incomplete extension header (fewer than 4 bytes remaining), then parse_exthdrs() read past the end of the message, into uninitialized memory. Fix it by returning -EINVAL in this case. Reproducer: #include <linux/pfkeyv2.h> #include <sys/socket.h> #include <unistd.h> int main() { int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2); char buf[17] = { 0 }; struct sadb_msg *msg = (void *)buf; msg->sadb_msg_version = PF_KEY_V2; msg->sadb_msg_type = SADB_DELETE; msg->sadb_msg_len = 2; write(sock, buf, 17); } Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-30af_key: fix buffer overread in verify_address_len()Eric Biggers
If a message sent to a PF_KEY socket ended with one of the extensions that takes a 'struct sadb_address' but there were not enough bytes remaining in the message for the ->sa_family member of the 'struct sockaddr' which is supposed to follow, then verify_address_len() read past the end of the message, into uninitialized memory. Fix it by returning -EINVAL in this case. This bug was found using syzkaller with KMSAN. Reproducer: #include <linux/pfkeyv2.h> #include <sys/socket.h> #include <unistd.h> int main() { int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2); char buf[24] = { 0 }; struct sadb_msg *msg = (void *)buf; struct sadb_address *addr = (void *)(msg + 1); msg->sadb_msg_version = PF_KEY_V2; msg->sadb_msg_type = SADB_DELETE; msg->sadb_msg_len = 3; addr->sadb_address_len = 1; addr->sadb_address_exttype = SADB_EXT_ADDRESS_SRC; write(sock, buf, 24); } Reported-by: Alexander Potapenko <glider@google.com> Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-30xfrm: skip policies marked as dead while rehashingFlorian Westphal
syzkaller triggered following KASAN splat: BUG: KASAN: slab-out-of-bounds in xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618 read of size 2 at addr ffff8801c8e92fe4 by task kworker/1:1/23 [..] Workqueue: events xfrm_hash_rebuild [..] __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:428 xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618 process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112 worker_thread+0x223/0x1990 kernel/workqueue.c:2246 [..] The reproducer triggers: 1016 if (error) { 1017 list_move_tail(&walk->walk.all, &x->all); 1018 goto out; 1019 } in xfrm_policy_walk() via pfkey (it sets tiny rcv space, dump callback returns -ENOBUFS). In this case, *walk is located the pfkey socket struct, so this socket becomes visible in the global policy list. It looks like this is intentional -- phony walker has walk.dead set to 1 and all other places skip such "policies". Ccing original authors of the two commits that seem to expose this issue (first patch missed ->dead check, second patch adds pfkey sockets to policies dumper list). Fixes: 880a6fab8f6ba5b ("xfrm: configure policy hash table thresholds by netlink") Fixes: 12a169e7d8f4b1c ("ipsec: Put dumpers on the dump list") Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Timo Teras <timo.teras@iki.fi> Cc: Christophe Gouault <christophe.gouault@6wind.com> Reported-by: syzbot <bot+c028095236fcb6f4348811565b75084c754dc729@syzkaller.appspotmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-30xfrm: Forbid state updates from changing encap typeHerbert Xu
Currently we allow state updates to competely replace the contents of x->encap. This is bad because on the user side ESP only sets up header lengths depending on encap_type once when the state is first created. This could result in the header lengths getting out of sync with the actual state configuration. In practice key managers will never do a state update to change the encapsulation type. Only the port numbers need to be changed as the peer NAT entry is updated. Therefore this patch adds a check in xfrm_state_update to forbid any changes to the encap_type. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
net/ipv6/ip6_gre.c is a case of parallel adds. include/trace/events/tcp.h is a little bit more tricky. The removal of in-trace-macro ifdefs in 'net' paralleled with moving show_tcp_state_name and friends over to include/trace/events/sock.h in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28strparser: Call sock_owned_by_user_nocheckTom Herbert
strparser wants to check socket ownership without producing any warnings. As indicated by the comment in the code, it is permissible for owned_by_user to return true. Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-and-tested-by: <syzbot+c91c53af67f9ebe599a337d2e70950366153b295@syzkaller.appspotmail.com> Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28skbuff: in skb_copy_ubufs unclone before releasing zerocopyWillem de Bruijn
skb_copy_ubufs must unclone before it is safe to modify its skb_shared_info with skb_zcopy_clear. Commit b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even without user frags") ensures that all skbs release their zerocopy state, even those without frags. But I forgot an edge case where such an skb arrives that is cloned. The stack does not build such packets. Vhost/tun skbs have their frags orphaned before cloning. TCP skbs only attach zerocopy state when a frag is added. But if TCP packets can be trimmed or linearized, this might occur. Tracing the code I found no instance so far (e.g., skb_linearize ends up calling skb_zcopy_clear if !skb->data_len). Still, it is non-obvious that no path exists. And it is fragile to rely on this. Fixes: b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even without user frags") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28net: sched: don't set extack message in case the qdisc will be createdJiri Pirko
If the qdisc is not found here, it is going to be created. Therefore, this is not an error path. Remove the extack message set and don't confuse user with error message in case the qdisc was created successfully. Fixes: 09215598119e ("net: sched: sch_api: handle generic qdisc errors") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28tipc: fix hanging poll() for stream socketsParthasarathy Bhuvaragan
In commit 42b531de17d2f6 ("tipc: Fix missing connection request handling"), we replaced unconditional wakeup() with condtional wakeup for clients with flags POLLIN | POLLRDNORM | POLLRDBAND. This breaks the applications which do a connect followed by poll with POLLOUT flag. These applications are not woken when the connection is ESTABLISHED and hence sleep forever. In this commit, we fix it by including the POLLOUT event for sockets in TIPC_CONNECTING state. Fixes: 42b531de17d2f6 ("tipc: Fix missing connection request handling") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2017-12-28 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Fix incorrect state pruning related to recognition of zero initialized stack slots, where stacksafe exploration would mistakenly return a positive pruning verdict too early ignoring other slots, from Gianluca. 2) Various BPF to BPF calls related follow-up fixes. Fix an off-by-one in maximum call depth check, and rework maximum stack depth tracking logic to fix a bypass of the total stack size check reported by Jann. Also fix a bug in arm64 JIT where prog->jited_len was uninitialized. Addition of various test cases to BPF selftests, from Alexei. 3) Addition of a BPF selftest to test_verifier that is related to BPF to BPF calls which demonstrates a late caller stack size increase and thus out of bounds access. Fixed above in 2). Test case from Jann. 4) Addition of correlating BPF helper calls, BPF to BPF calls as well as BPF maps to bpftool xlated dump in order to allow for better BPF program introspection and debugging, from Daniel. 5) Fixing several bugs in BPF to BPF calls kallsyms handling in order to get it actually to work for subprogs, from Daniel. 6) Extending sparc64 JIT support for BPF to BPF calls and fix a couple of build errors for libbpf on sparc64, from David. 7) Allow narrower context access for BPF dev cgroup typed programs in order to adapt to LLVM code generation. Also adjust memlock rlimit in the test_dev_cgroup BPF selftest, from Yonghong. 8) Add netdevsim Kconfig entry to BPF selftests since test_offload.py relies on netdevsim device being available, from Jakub. 9) Reduce scope of xdp_do_generic_redirect_map() to being static, from Xiongwei. 10) Minor cleanups and spelling fixes in BPF verifier, from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27tcp: do not allocate linear memory for zerocopy skbsWillem de Bruijn
Zerocopy payload is now always stored in frags, and space for headers is reversed, so this memory is unused. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27tcp: place all zerocopy payload in fragsWillem de Bruijn
This avoids an unnecessary copy of 1-2KB and improves tso_fragment, which has to fall back to tcp_fragment if skb->len != skb_data_len. It also avoids a surprising inconsistency in notifications: Zerocopy packets sent over loopback have their frags copied, so set SO_EE_CODE_ZEROCOPY_COPIED in the notification. But this currently does not happen for small packets, because when all data fits in the linear fragment, data is not copied in skb_orphan_frags_rx. Reported-by: Tom Deseyn <tom.deseyn@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27tcp: push full zerocopy packetsWillem de Bruijn
Skbs that reach MAX_SKB_FRAGS cannot be extended further. Do the same for zerocopy frags as non-zerocopy frags and set the PSH bit. This improves GRO assembly. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27skbuff: in skb_segment, call zerocopy functions once per nskbWillem de Bruijn
This is a net-next follow-up to commit 268b79067942 ("skbuff: orphan frags before zerocopy clone"), which fixed a bug in net, but added a call to skb_zerocopy_clone at each frag to do so. When segmenting skbs with user frags, either the user frags must be replaced with private copies and uarg released, or the uarg must have its refcount increased for each new skb. skb_orphan_frags does the first, except for cases that can handle reference counting. skb_zerocopy_clone then does the second. Call these once per nskb, instead of once per frag. That is, in the common case. With a frag list, also refresh when the origin skb (frag_skb) changes. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27sctp: Replace use of sockets_allocated with specified macro.Tonghao Zhang
The patch(180d8cd942ce) replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to accessor macros. But the sockets_allocated field of sctp sock is not replaced at all. Then replace it now for unifying the code. Fixes: 180d8cd942ce ("foundations of per-cgroup memory pressure controlling.") Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27rds: tcp: cleanup if kmem_cache_alloc fails in rds_tcp_conn_alloc()Sowmini Varadhan
If kmem_cache_alloc() fails in the middle of the for() loop, cleanup anything that might have been allocated so far. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27rds: tcp: initialize t_tcp_detached to falseSowmini Varadhan
Commit f10b4cff98c6 ("rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete") adds the field t_tcp_detached, but this needs to be initialized explicitly to false. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27rds; Reset rs->rs_bound_addr in rds_add_bound() failure pathSowmini Varadhan
If the rds_sock is not added to the bind_hash_table, we must reset rs_bound_addr so that rds_remove_bound will not trip on this rds_sock. rds_add_bound() does a rds_sock_put() in this failure path, so failing to reset rs_bound_addr will result in a socket refcount bug, and will trigger a WARN_ON with the stack shown below when the application subsequently tries to close the PF_RDS socket. WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 \ rds_sock_destruct+0x15/0x30 [rds] : __sk_destruct+0x21/0x190 rds_remove_bound.part.13+0xb6/0x140 [rds] rds_release+0x71/0x120 [rds] sock_release+0x1a/0x70 sock_close+0xe/0x20 __fput+0xd5/0x210 task_work_run+0x82/0xa0 do_exit+0x2ce/0xb30 ? syscall_trace_enter+0x1cc/0x2b0 do_group_exit+0x39/0xa0 SyS_exit_group+0x10/0x10 do_syscall_64+0x61/0x1a0 Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27l2tp: add peer_offset parameterLorenzo Bianconi
Introduce peer_offset parameter in order to add the capability to specify two different values for payload offset on tx/rx side. If just offset is provided by userspace use it for rx side as well in order to maintain compatibility with older l2tp versions Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27l2tp: fix missing print session offset infoHangbin Liu
Report offset parameter in L2TP_CMD_SESSION_GET command if it has been configured by userspace Fixes: 309795f4bec ("l2tp: Add netlink control API for L2TP") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-12-22 1) Separate ESP handling from segmentation for GRO packets. This unifies the IPsec GSO and non GSO codepath. 2) Add asynchronous callbacks for xfrm on layer 2. This adds the necessary infrastructure to core networking. 3) Allow to use the layer2 IPsec GSO codepath for software crypto, all infrastructure is there now. 4) Also allow IPsec GSO with software crypto for local sockets. 5) Don't require synchronous crypto fallback on IPsec offloading, it is not needed anymore. 6) Check for xdo_dev_state_free and only call it if implemented. From Shannon Nelson. 7) Check for the required add and delete functions when a driver registers xdo_dev_ops. From Shannon Nelson. 8) Define xfrmdev_ops only with offload config. From Shannon Nelson. 9) Update the xfrm stats documentation. From Shannon Nelson. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-12-22 1) Check for valid id proto in validate_tmpl(), otherwise we may trigger a warning in xfrm_state_fini(). From Cong Wang. 2) Fix a typo on XFRMA_OUTPUT_MARK policy attribute. From Michal Kubecek. 3) Verify the state is valid when encap_type < 0, otherwise we may crash on IPsec GRO . From Aviv Heller. 4) Fix stack-out-of-bounds read on socket policy lookup. We access the flowi of the wrong address family in the IPv4 mapped IPv6 case, fix this by catching address family missmatches before we do the lookup. 5) fix xfrm_do_migrate() with AEAD to copy the geniv field too. Otherwise the state is not fully initialized and migration fails. From Antony Antony. 6) Fix stack-out-of-bounds with misconfigured transport mode policies. Our policy template validation is not strict enough. It is possible to configure policies with transport mode template where the address family of the template does not match the selectors address family. Fix this by refusing such a configuration, address family can not change on transport mode. 7) Fix a policy reference leak when reusing pcpu xdst entry. From Florian Westphal. 8) Reinject transport-mode packets through tasklet, otherwise it is possible to reate a recursion loop. From Herbert Xu. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error pathTommi Rantala
Calling tipc_mon_delete() before the monitor has been created will oops. This can happen in tipc_enable_bearer() error path if tipc_disc_create() fails. [ 48.589074] BUG: unable to handle kernel paging request at 0000000000001008 [ 48.590266] IP: tipc_mon_delete+0xea/0x270 [tipc] [ 48.591223] PGD 1e60c5067 P4D 1e60c5067 PUD 1eb0cf067 PMD 0 [ 48.592230] Oops: 0000 [#1] SMP KASAN [ 48.595610] CPU: 5 PID: 1199 Comm: tipc Tainted: G B 4.15.0-rc4-pc64-dirty #5 [ 48.597176] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 48.598489] RIP: 0010:tipc_mon_delete+0xea/0x270 [tipc] [ 48.599347] RSP: 0018:ffff8801d827f668 EFLAGS: 00010282 [ 48.600705] RAX: ffff8801ee813f00 RBX: 0000000000000204 RCX: 0000000000000000 [ 48.602183] RDX: 1ffffffff1de6a75 RSI: 0000000000000297 RDI: 0000000000000297 [ 48.604373] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff1dd1533 [ 48.605607] R10: ffffffff8eafbb05 R11: fffffbfff1dd1534 R12: 0000000000000050 [ 48.607082] R13: dead000000000200 R14: ffffffff8e73f310 R15: 0000000000001020 [ 48.608228] FS: 00007fc686484800(0000) GS:ffff8801f5540000(0000) knlGS:0000000000000000 [ 48.610189] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.611459] CR2: 0000000000001008 CR3: 00000001dda70002 CR4: 00000000003606e0 [ 48.612759] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 48.613831] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 48.615038] Call Trace: [ 48.615635] tipc_enable_bearer+0x415/0x5e0 [tipc] [ 48.620623] tipc_nl_bearer_enable+0x1ab/0x200 [tipc] [ 48.625118] genl_family_rcv_msg+0x36b/0x570 [ 48.631233] genl_rcv_msg+0x5a/0xa0 [ 48.631867] netlink_rcv_skb+0x1cc/0x220 [ 48.636373] genl_rcv+0x24/0x40 [ 48.637306] netlink_unicast+0x29c/0x350 [ 48.639664] netlink_sendmsg+0x439/0x590 [ 48.642014] SYSC_sendto+0x199/0x250 [ 48.649912] do_syscall_64+0xfd/0x2c0 [ 48.650651] entry_SYSCALL64_slow_path+0x25/0x25 [ 48.651843] RIP: 0033:0x7fc6859848e3 [ 48.652539] RSP: 002b:00007ffd25dff938 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 48.654003] RAX: ffffffffffffffda RBX: 00007ffd25dff990 RCX: 00007fc6859848e3 [ 48.655303] RDX: 0000000000000054 RSI: 00007ffd25dff990 RDI: 0000000000000003 [ 48.656512] RBP: 00007ffd25dff980 R08: 00007fc685c35fc0 R09: 000000000000000c [ 48.657697] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000d13010 [ 48.658840] R13: 00007ffd25e009c0 R14: 0000000000000000 R15: 0000000000000000 [ 48.662972] RIP: tipc_mon_delete+0xea/0x270 [tipc] RSP: ffff8801d827f668 [ 48.664073] CR2: 0000000000001008 [ 48.664576] ---[ end trace e811818d54d5ce88 ]--- Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27tipc: error path leak fixes in tipc_enable_bearer()Tommi Rantala
Fix memory leak in tipc_enable_bearer() if enable_media() fails, and cleanup with bearer_disable() if tipc_mon_create() fails. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27RDS: Check cmsg_len before dereferencing CMSG_DATAAvinash Repaka
RDS currently doesn't check if the length of the control message is large enough to hold the required data, before dereferencing the control message data. This results in following crash: BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013 [inline] BUG: KASAN: stack-out-of-bounds in rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066 Read of size 8 at addr ffff8801c928fb70 by task syzkaller455006/3157 CPU: 0 PID: 3157 Comm: syzkaller455006 Not tainted 4.15.0-rc3+ #161 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430 rds_rdma_bytes net/rds/send.c:1013 [inline] rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066 sock_sendmsg_nosec net/socket.c:628 [inline] sock_sendmsg+0xca/0x110 net/socket.c:638 ___sys_sendmsg+0x320/0x8b0 net/socket.c:2018 __sys_sendmmsg+0x1ee/0x620 net/socket.c:2108 SYSC_sendmmsg net/socket.c:2139 [inline] SyS_sendmmsg+0x35/0x60 net/socket.c:2134 entry_SYSCALL_64_fastpath+0x1f/0x96 RIP: 0033:0x43fe49 RSP: 002b:00007fffbe244ad8 EFLAGS: 00000217 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fe49 RDX: 0000000000000001 RSI: 000000002020c000 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004017b0 R13: 0000000000401840 R14: 0000000000000000 R15: 0000000000000000 To fix this, we verify that the cmsg_len is large enough to hold the data to be read, before proceeding further. Reported-by: syzbot <syzkaller-bugs@googlegroups.com> Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26net: erspan: remove md NULL checkWilliam Tu
The 'md' is allocated from 'tun_dst = ip_tun_rx_dst' and since we've checked 'tun_dst', 'md' will never be NULL. The patch removes it at both ipv4 and ipv6 erspan. Fixes: afb4c97d90e6 ("ip6_gre: fix potential memory leak in ip6erspan_rcv") Fixes: 50670b6ee9bc ("ip_gre: fix potential memory leak in erspan_rcv") Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26tcp: md5: Handle RCU dereference of md5sig_infoMat Martineau
Dereference tp->md5sig_info in tcp_v4_destroy_sock() the same way it is done in the adjacent call to tcp_clear_md5_list(). Resolves this sparse warning: net/ipv4/tcp_ipv4.c:1914:17: warning: incorrect type in argument 1 (different address spaces) net/ipv4/tcp_ipv4.c:1914:17: expected struct callback_head *head net/ipv4/tcp_ipv4.c:1914:17: got struct callback_head [noderef] <asn:4>*<noident> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26ipv6: Reinject IPv6 packets if IPsec policy matches after SNATTobias Brunner
If SNAT modifies the source address the resulting packet might match an IPsec policy, reinject the packet if that's the case. The exact same thing is already done for IPv4. Signed-off-by: Tobias Brunner <tobias@strongswan.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26tipc: fix memory leak of group member when peer node is lostJon Maloy
When a group member receives a member WITHDRAW event, this might have two reasons: either the peer member is leaving the group, or the link to the member's node has been lost. In the latter case we need to issue a DOWN event to the user right away, and let function tipc_group_filter_msg() perform delete of the member item. However, in this case we miss to change the state of the member item to MBR_LEAVING, so the member item is not deleted, and we have a memory leak. We now separate better between the four sub-cases of a WITHRAW event and make sure that each case is handled correctly. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26net: sched: fix possible null pointer deref in tcf_block_putJiri Pirko
We need to check block for being null in both tcf_block_put and tcf_block_put_ext. Fixes: 343723dd51ef ("net: sched: fix clsact init error path") Reported-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26tipc: base group replicast ack counter on number of actual receiversJon Maloy
In commit 2f487712b893 ("tipc: guarantee that group broadcast doesn't bypass group unicast") we introduced a mechanism that requires the first (replicated) broadcast sent after a unicast to be acknowledged by all receivers before permitting sending of the next (true) broadcast. The counter for keeping track of the number of acknowledges to expect is based on the tipc_group::member_cnt variable. But this misses that some of the known members may not be ready for reception, and will never acknowledge the message, either because they haven't fully joined the group or because they are leaving the group. Such members are identified by not fulfilling the condition tested for in the function tipc_group_is_enabled(). We now set the counter for the actual number of acks to receive at the moment the message is sent, by just counting the number of recipients satisfying the tipc_group_is_enabled() test. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26net_sched: fix a missing rcu barrier in mini_qdisc_pair_swap()Cong Wang
The rcu_barrier_bh() in mini_qdisc_pair_swap() is to wait for flying RCU callback installed by a previous mini_qdisc_pair_swap(), however we miss it on the tp_head==NULL path, which leads to that the RCU callback still uses miniq_old->rcu after it is freed together with qdisc in qdisc_graft(). So just add it on that path too. Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath ") Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26ip6_gre: fix device features for ioctl setupAlexey Kodanev
When ip6gre is created using ioctl, its features, such as scatter-gather, GSO and tx-checksumming will be turned off: # ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1 # ethtool -k gre6 (truncated output) tx-checksumming: off scatter-gather: off tcp-segmentation-offload: off generic-segmentation-offload: off [requested on] But when netlink is used, they will be enabled: # ip link add gre6 type ip6gre remote fd00::1 # ethtool -k gre6 (truncated output) tx-checksumming: on scatter-gather: on tcp-segmentation-offload: on generic-segmentation-offload: on This results in a loss of performance when gre6 is created via ioctl. The issue was found with LTP/gre tests. Fix it by moving the setup of device features to a separate function and invoke it with ndo_init callback because both netlink and ioctl will eventually call it via register_netdevice(): register_netdevice() - ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init() - ip6gre_tunnel_init_common() - ip6gre_tnl_init_features() The moved code also contains two minor style fixes: * removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line. * fixed the issue reported by checkpatch: "Unnecessary parentheses around 'nt->encap.type == TUNNEL_ENCAP_NONE'" Fixes: ac4eb009e477 ("ip6gre: Add support for basic offloads offloads excluding GSO") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26netfilter: nf_tables: fix potential NULL-ptr deref in nf_tables_dump_obj_done()Hangbin Liu
If there is no NFTA_OBJ_TABLE and NFTA_OBJ_TYPE, the c.data will be NULL in nf_tables_getobj(). So before free filter->table in nf_tables_dump_obj_done(), we need to check if filter is NULL first. Fixes: e46abbcc05aa ("netfilter: nf_tables: Allow table names of up to 255 chars") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Lots of overlapping changes. Also on the net-next side the XDP state management is handled more in the generic layers so undo the 'net' nfp fix which isn't applicable in net-next. Include a necessary change by Jakub Kicinski, with log message: ==================== cls_bpf no longer takes care of offload tracking. Make sure netdevsim performs necessary checks. This fixes a warning caused by TC trying to remove a filter it has not added. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21flow_dissector: Parse batman-adv unicast headersSven Eckelmann
The batman-adv unicast packets contain a full layer 2 frame in encapsulated form. The flow dissector must therefore be able to parse the batman-adv unicast header to reach the layer 2+3 information. +--------------------+ | ip(v6)hdr | +--------------------+ | inner ethhdr | +--------------------+ | batadv unicast hdr | +--------------------+ | outer ethhdr | +--------------------+ The obtained information from the upper layer can then be used by RPS to schedule the processing on separate cores. This allows better distribution of multiple flows from the same neighbor to different cores. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Convert packet.h to uapi headerSven Eckelmann
The header file is used by different userspace programs to inject packets or to decode sniffed packets. It should therefore be available to them as userspace header. Also other components in the kernel (like the flow dissector) require access to the packet definitions to be able to decode ETH_P_BATMAN ethernet packets. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Remove kernel fixed width types in packet.hSven Eckelmann
The uapi headers use the __u8/__u16/... version of the fixed width types instead of u8/u16/... The use of the latter must be avoided before packet.h is copied to include/uapi/linux/. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Remove usage of BIT(x) in packet.hSven Eckelmann
The BIT(x) macro is no longer available for uapi headers because it is defined outside of it (linux/bitops.h). The use of it must therefore be avoided and replaced by an appropriate other representation. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Let packet.h include its headers directlySven Eckelmann
The headers used by packet.h should also be included by it directly. main.h is currently dealing with it in batman-adv, but this will no longer work when this header is moved to include/uapi/linux/. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21skbuff: skb_copy_ubufs must release uarg even without user fragsWillem de Bruijn
skb_copy_ubufs creates a private copy of frags[] to release its hold on user frags, then calls uarg->callback to notify the owner. Call uarg->callback even when no frags exist. This edge case can happen when zerocopy_sg_from_iter finds enough room in skb_headlen to copy all the data. Fixes: 3ece782693c4 ("sock: skb_copy_ubufs support for compound pages") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21skbuff: orphan frags before zerocopy cloneWillem de Bruijn
Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate calls to skb_uarg(skb)->callback for the same data. skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb with each segment. This is only safe for uargs that do refcounting, which is those that pass skb_orphan_frags without dropping their shared frags. For others, skb_orphan_frags drops the user frags and sets the uarg to NULL, after which sock_zerocopy_clone has no effect. Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback calls for the same data causing the vhost_net_ubuf_ref_>refcount to drop below zero. Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com> Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY") Reported-by: Andreas Hartmann <andihartmann@01019freenet.de> Reported-by: David Hill <dhill@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: reevalulate autoflowlabel setting after sysctl settingShaohua Li
sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2. If sockopt doesn't set autoflowlabel, outcome packets from the hosts are supposed to not include flowlabel. This is true for normal packet, but not for reset packet. The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't changed, so the sock will keep the old behavior in terms of auto flowlabel. Reset packet is suffering from this problem, because reset packet is sent from a special control socket, which is created at boot time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control socket will always have its ipv6_pinfo.autoflowlabel set, even after user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always have flowlabel. Normal sock created before sysctl setting suffers from the same issue. We can't even turn off autoflowlabel unless we kill all socks in the hosts. To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the autoflowlabel setting from user, otherwise we always call ip6_default_np_autolabel() which has the new settings of sysctl. Note, this changes behavior a little bit. Before commit 42240901f7c4 (ipv6: Implement different admin modes for automatic flow labels), the autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes, existing connection will change autoflowlabel behavior. After that commit, autoflowlabel behavior is sticky in the whole life of the sock. With this patch, the behavior isn't sticky again. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tom Herbert <tom@quantonium.net> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21openvswitch: Fix pop_vlan action for double tagged framesEric Garver
skb_vlan_pop() expects skb->protocol to be a valid TPID for double tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop() shift the true ethertype into position for us. Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets") Signed-off-by: Eric Garver <e@erig.me> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>