summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-08-29packet: Don't write vnet header beyond end of bufferBenjamin Poirier
... which may happen with certain values of tp_reserve and maclen. Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29tipc: permit bond slave as bearerParthasarathy Bhuvaragan
For a bond slave device as a tipc bearer, the dev represents the bond interface and orig_dev represents the slave in tipc_l2_rcv_msg(). Since we decode the tipc_ptr from bonding device (dev), we fail to find the bearer and thus tipc links are not established. In this commit, we register the tipc protocol callback per device and look for tipc bearer from both the devices. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29ipv6: do not set sk_destruct in IPV6_ADDRFORM sockoptXin Long
ChunYu found a kernel warn_on during syzkaller fuzzing: [40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0 [40226.144849] Call Trace: [40226.147590] <IRQ> [40226.149859] dump_stack+0xe2/0x186 [40226.176546] __warn+0x1a4/0x1e0 [40226.180066] warn_slowpath_null+0x31/0x40 [40226.184555] inet_sock_destruct+0x78d/0x9a0 [40226.246355] __sk_destruct+0xfa/0x8c0 [40226.290612] rcu_process_callbacks+0xaa0/0x18a0 [40226.336816] __do_softirq+0x241/0x75e [40226.367758] irq_exit+0x1f6/0x220 [40226.371458] smp_apic_timer_interrupt+0x7b/0xa0 [40226.376507] apic_timer_interrupt+0x93/0xa0 The warn_on happned when sk->sk_rmem_alloc wasn't 0 in inet_sock_destruct. As after commit f970bd9e3a06 ("udp: implement memory accounting helpers"), udp has changed to use udp_destruct_sock as sk_destruct where it would udp_rmem_release all rmem. But IPV6_ADDRFORM sockopt sets sk_destruct with inet_sock_destruct after changing family to PF_INET. If rmem is not 0 at that time, and there is no place to release rmem before calling inet_sock_destruct, the warn_on will be triggered. This patch is to fix it by not setting sk_destruct in IPV6_ADDRFORM sockopt any more. As IPV6_ADDRFORM sockopt only works for tcp and udp. TCP sock has already set it's sk_destruct with inet_sock_destruct and UDP has set with udp_destruct_sock since they're created. Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers") Reported-by: ChunYu Wang <chunwang@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-08-29 1) Fix dst_entry refcount imbalance when using socket policies. From Lorenzo Colitti. 2) Fix locking when adding the ESP trailers. 3) Fix tailroom calculation for the ESP trailer by using skb_tailroom instead of skb_availroom. 4) Fix some info leaks in xfrm_user. From Mathias Krause. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28net: dsa: Don't dereference dst->cpu_dp->netdevFlorian Fainelli
If we do not have a master network device attached dst->cpu_dp will be NULL and accessing cpu_dp->netdev will create a trace similar to the one below. The correct check is on dst->cpu_dp period. [ 1.004650] DSA: switch 0 0 parsed [ 1.008078] Unable to handle kernel NULL pointer dereference at virtual address 00000010 [ 1.016195] pgd = c0003000 [ 1.018918] [00000010] *pgd=80000000004003, *pmd=00000000 [ 1.024349] Internal error: Oops: 206 [#1] SMP ARM [ 1.029157] Modules linked in: [ 1.032228] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc6-00071-g45b45afab9bd-dirty #7 [ 1.040772] Hardware name: Broadcom STB (Flattened Device Tree) [ 1.046704] task: ee08f840 task.stack: ee090000 [ 1.051258] PC is at dsa_register_switch+0x5e0/0x9dc [ 1.056234] LR is at dsa_register_switch+0x5d0/0x9dc [ 1.061211] pc : [<c08fb28c>] lr : [<c08fb27c>] psr: 60000213 [ 1.067491] sp : ee091d88 ip : 00000000 fp : 0000000c [ 1.072728] r10: 00000000 r9 : 00000001 r8 : ee208010 [ 1.077965] r7 : ee2b57b0 r6 : ee2b5780 r5 : 00000000 r4 : ee208e0c [ 1.084506] r3 : 00000000 r2 : 00040d00 r1 : 2d1b2000 r0 : 00000016 [ 1.091050] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 1.098199] Control: 32c5387d Table: 00003000 DAC: fffffffd [ 1.103957] Process swapper/0 (pid: 1, stack limit = 0xee090210) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 6d3c8c0dd88a ("net: dsa: Remove master_netdev and use dst->cpu_dp->netdev") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28bridge: check for null fdb->dst before notifying switchdev driversRoopa Prabhu
current switchdev drivers dont seem to support offloading fdb entries pointing to the bridge device which have fdb->dst not set to any port. This patch adds a NULL fdb->dst check in the switchdev notifier code. This patch fixes the below NULL ptr dereference: $bridge fdb add 00:02:00:00:00:33 dev br0 self [ 69.953374] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 69.954044] IP: br_switchdev_fdb_notify+0x29/0x80 [ 69.954044] PGD 66527067 [ 69.954044] P4D 66527067 [ 69.954044] PUD 7899c067 [ 69.954044] PMD 0 [ 69.954044] [ 69.954044] Oops: 0000 [#1] SMP [ 69.954044] Modules linked in: [ 69.954044] CPU: 1 PID: 3074 Comm: bridge Not tainted 4.13.0-rc6+ #1 [ 69.954044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 69.954044] task: ffff88007b827140 task.stack: ffffc90001564000 [ 69.954044] RIP: 0010:br_switchdev_fdb_notify+0x29/0x80 [ 69.954044] RSP: 0018:ffffc90001567918 EFLAGS: 00010246 [ 69.954044] RAX: 0000000000000000 RBX: ffff8800795e0880 RCX: 00000000000000c0 [ 69.954044] RDX: ffffc90001567920 RSI: 000000000000001c RDI: ffff8800795d0600 [ 69.954044] RBP: ffffc90001567938 R08: ffff8800795d0600 R09: 0000000000000000 [ 69.954044] R10: ffffc90001567a88 R11: ffff88007b849400 R12: ffff8800795e0880 [ 69.954044] R13: ffff8800795d0600 R14: ffffffff81ef8880 R15: 000000000000001c [ 69.954044] FS: 00007f93d3085700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [ 69.954044] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 69.954044] CR2: 0000000000000008 CR3: 0000000066551000 CR4: 00000000000006e0 [ 69.954044] Call Trace: [ 69.954044] fdb_notify+0x3f/0xf0 [ 69.954044] __br_fdb_add.isra.12+0x1a7/0x370 [ 69.954044] br_fdb_add+0x178/0x280 [ 69.954044] rtnl_fdb_add+0x10a/0x200 [ 69.954044] rtnetlink_rcv_msg+0x1b4/0x240 [ 69.954044] ? skb_free_head+0x21/0x40 [ 69.954044] ? rtnl_calcit.isra.18+0xf0/0xf0 [ 69.954044] netlink_rcv_skb+0xed/0x120 [ 69.954044] rtnetlink_rcv+0x15/0x20 [ 69.954044] netlink_unicast+0x180/0x200 [ 69.954044] netlink_sendmsg+0x291/0x370 [ 69.954044] ___sys_sendmsg+0x180/0x2e0 [ 69.954044] ? filemap_map_pages+0x2db/0x370 [ 69.954044] ? do_wp_page+0x11d/0x420 [ 69.954044] ? __handle_mm_fault+0x794/0xd80 [ 69.954044] ? vma_link+0xcb/0xd0 [ 69.954044] __sys_sendmsg+0x4c/0x90 [ 69.954044] SyS_sendmsg+0x12/0x20 [ 69.954044] do_syscall_64+0x63/0xe0 [ 69.954044] entry_SYSCALL64_slow_path+0x25/0x25 [ 69.954044] RIP: 0033:0x7f93d2bad690 [ 69.954044] RSP: 002b:00007ffc7217a638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 69.954044] RAX: ffffffffffffffda RBX: 00007ffc72182eac RCX: 00007f93d2bad690 [ 69.954044] RDX: 0000000000000000 RSI: 00007ffc7217a670 RDI: 0000000000000003 [ 69.954044] RBP: 0000000059a1f7f8 R08: 0000000000000006 R09: 000000000000000a [ 69.954044] R10: 00007ffc7217a400 R11: 0000000000000246 R12: 00007ffc7217a670 [ 69.954044] R13: 00007ffc72182a98 R14: 00000000006114c0 R15: 00007ffc72182aa0 [ 69.954044] Code: 1f 00 66 66 66 66 90 55 48 89 e5 48 83 ec 20 f6 47 20 04 74 0a 83 fe 1c 74 09 83 fe 1d 74 2c c9 66 90 c3 48 8b 47 10 48 8d 55 e8 <48> 8b 70 08 0f b7 47 1e 48 83 c7 18 48 89 7d f0 bf 03 00 00 00 [ 69.954044] RIP: br_switchdev_fdb_notify+0x29/0x80 RSP: ffffc90001567918 [ 69.954044] CR2: 0000000000000008 [ 69.954044] ---[ end trace 03e9eec4a82c238b ]--- Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28ipv6: set dst.obsolete when a cached route has expiredXin Long
Now it doesn't check for the cached route expiration in ipv6's dst_ops->check(), because it trusts dst_gc that would clean the cached route up when it's expired. The problem is in dst_gc, it would clean the cached route only when it's refcount is 1. If some other module (like xfrm) keeps holding it and the module only release it when dst_ops->check() fails. But without checking for the cached route expiration, .check() may always return true. Meanwhile, without releasing the cached route, dst_gc couldn't del it. It will cause this cached route never to expire. This patch is to set dst.obsolete with DST_OBSOLETE_KILL in .gc when it's expired, and check obsolete != DST_OBSOLETE_FORCE_CHK in .check. Note that this is even needed when ipv6 dst_gc timer is removed one day. It would set dst.obsolete in .redirect and .update_pmtu instead, and check for cached route expiration when getting it, just like what ipv4 route does. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28ipv6: fix sparse warning on rt6i_nodeWei Wang
Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This generates a new sparse warning on rt->rt6i_node related code: net/ipv6/route.c:1394:30: error: incompatible types in comparison expression (different address spaces) ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison expression (different address spaces) This commit adds "__rcu" tag for rt6i_node and makes sure corresponding rcu API is used for it. After this fix, sparse no longer generates the above warning. Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: hold tunnel used while creating sessions with netlinkGuillaume Nault
Use l2tp_tunnel_get() to retrieve tunnel, so that it can't go away on us. Otherwise l2tp_tunnel_destruct() might release the last reference count concurrently, thus freeing the tunnel while we're using it. Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: hold tunnel while handling genl TUNNEL_GET commandsGuillaume Nault
Use l2tp_tunnel_get() instead of l2tp_tunnel_find() so that we get a reference on the tunnel, preventing l2tp_tunnel_destruct() from freeing it from under us. Also move l2tp_tunnel_get() below nlmsg_new() so that we only take the reference when needed. Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: hold tunnel while handling genl tunnel updatesGuillaume Nault
We need to make sure the tunnel is not going to be destroyed by l2tp_tunnel_destruct() concurrently. Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: hold tunnel while processing genl delete commandGuillaume Nault
l2tp_nl_cmd_tunnel_delete() needs to take a reference on the tunnel, to prevent it from being concurrently freed by l2tp_tunnel_destruct(). Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: hold tunnel while looking up sessions in l2tp_netlinkGuillaume Nault
l2tp_tunnel_find() doesn't take a reference on the returned tunnel. Therefore, it's unsafe to use it because the returned tunnel can go away on us anytime. Fix this by defining l2tp_tunnel_get(), which works like l2tp_tunnel_find(), but takes a reference on the returned tunnel. Caller then has to drop this reference using l2tp_tunnel_dec_refcount(). As l2tp_tunnel_dec_refcount() needs to be moved to l2tp_core.h, let's simplify the patch and not move the L2TP_REFCNT_DEBUG part. This code has been broken (not even compiling) in May 2012 by commit a4ca44fa578c ("net: l2tp: Standardize logging styles") and fixed more than two years later by commit 29abe2fda54f ("l2tp: fix missing line continuation"). So it doesn't appear to be used by anyone. Same thing for l2tp_tunnel_free(); instead of moving it to l2tp_core.h, let's just simplify things and call kfree_rcu() directly in l2tp_tunnel_dec_refcount(). Extra assertions and debugging code provided by l2tp_tunnel_free() didn't help catching any of the reference counting and socket handling issues found while working on this series. Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28l2tp: initialise session's refcount before making it reachableGuillaume Nault
Sessions must be fully initialised before calling l2tp_session_add_to_tunnel(). Otherwise, there's a short time frame where partially initialised sessions can be accessed by external users. Fixes: dbdbc73b4478 ("l2tp: fix duplicate session creation") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28net: missing call of trace_napi_poll in busy_poll_stopJesper Dangaard Brouer
Noticed that busy_poll_stop() also invoke the drivers napi->poll() function pointer, but didn't have an associated call to trace_napi_poll() like all other call sites. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28xfrm_user: fix info leak in build_aevent()Mathias Krause
The memory reserved to dump the ID of the xfrm state includes a padding byte in struct xfrm_usersa_id added by the compiler for alignment. To prevent the heap info leak, memset(0) the sa_id before filling it. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Fixes: d51d081d6504 ("[IPSEC]: Sync series - user") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-28xfrm_user: fix info leak in build_expire()Mathias Krause
The memory reserved to dump the expired xfrm state includes padding bytes in struct xfrm_user_expire added by the compiler for alignment. To prevent the heap info leak, memset(0) the remainder of the struct. Initializing the whole structure isn't needed as copy_to_user_state() already takes care of clearing the padding bytes within the 'state' member. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-28xfrm_user: fix info leak in xfrm_notify_sa()Mathias Krause
The memory reserved to dump the ID of the xfrm state includes a padding byte in struct xfrm_usersa_id added by the compiler for alignment. To prevent the heap info leak, memset(0) the whole struct before filling it. Cc: Herbert Xu <herbert@gondor.apana.org.au> Fixes: 0603eac0d6b7 ("[IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete notification") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-28xfrm_user: fix info leak in copy_user_offload()Mathias Krause
The memory reserved to dump the xfrm offload state includes padding bytes of struct xfrm_user_offload added by the compiler for alignment. Add an explicit memset(0) before filling the buffer to avoid the heap info leak. Cc: Steffen Klassert <steffen.klassert@secunet.com> Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-25udp6: set rx_dst_cookie on rx_dst updatesPaolo Abeni
Currently, in the udp6 code, the dst cookie is not initialized/updated concurrently with the RX dst used by early demux. As a result, the dst_check() in the early_demux path always fails, the rx dst cache is always invalidated, and we can't really leverage significant gain from the demux lookup. Fix it adding udp6 specific variant of sk_rx_dst_set() and use it to set the dst cookie when the dst entry is really changed. The issue is there since the introduction of early demux for ipv6. Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast") Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25tcp: fix refcnt leak with ebpf congestion controlSabrina Dubroca
There are a few bugs around refcnt handling in the new BPF congestion control setsockopt: - The new ca is assigned to icsk->icsk_ca_ops even in the case where we cannot get a reference on it. This would lead to a use after free, since that ca is going away soon. - Changing the congestion control case doesn't release the refcnt on the previous ca. - In the reinit case, we first leak a reference on the old ca, then we call tcp_reinit_congestion_control on the ca that we have just assigned, leading to deinitializing the wrong ca (->release of the new ca on the old ca's data) and releasing the refcount on the ca that we actually want to use. This is visible by building (for example) BIC as a module and setting net.ipv4.tcp_congestion_control=bic, and using tcp_cong_kern.c from samples/bpf. This patch fixes the refcount issues, and moves reinit back into tcp core to avoid passing a ca pointer back to BPF. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25ipv6: Fix may be used uninitialized warning in rt6_checkSteffen Klassert
rt_cookie might be used uninitialized, fix this by initializing it. Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25esp: Fix skb tailroom calculationSteffen Klassert
We use skb_availroom to calculate the skb tailroom for the ESP trailer. skb_availroom calculates the tailroom and subtracts this value by reserved_tailroom. However reserved_tailroom is a union with the skb mark. This means that we subtract the tailroom by the skb mark if set. Fix this by using skb_tailroom instead. Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-25esp: Fix locking on page fragment allocationSteffen Klassert
We allocate the page fragment for the ESP trailer inside a spinlock, but consume it outside of the lock. This is racy as some other cou could get the same page fragment then. Fix this by consuming the page fragment inside the lock too. Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-24tipc: context imbalance at node read unlockParthasarathy Bhuvaragan
If we fail to find a valid bearer in tipc_node_get_linkname(), node_read_unlock() is called without holding the node read lock. This commit fixes this error. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24tipc: reassign pointers after skb reallocation / linearizationParthasarathy Bhuvaragan
In tipc_msg_reverse(), we assign skb attributes to local pointers in stack at startup. This is followed by skb_linearize() and for cloned buffers we perform skb relocation using pskb_expand_head(). Both these methods may update the skb attributes and thus making the pointers incorrect. In this commit, we fix this error by ensuring that the pointers are re-assigned after any of these skb operations. Fixes: 29042e19f2c60 ("tipc: let function tipc_msg_reverse() expand header when needed") Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24tipc: perform skb_linearize() before parsing the inner headerParthasarathy Bhuvaragan
In tipc_rcv(), we linearize only the header and usually the packets are consumed as the nodes permit direct reception. However, if the skb contains tunnelled message due to fail over or synchronization we parse it in tipc_node_check_state() without performing linearization. This will cause link disturbances if the skb was non linear. In this commit, we perform linearization for the above messages. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24net_sched: fix a refcount_t issue with noop_qdiscEric Dumazet
syzkaller reported a refcount_t warning [1] Issue here is that noop_qdisc refcnt was never really considered as a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set : if (qdisc->flags & TCQ_F_BUILTIN || !refcount_dec_and_test(&qdisc->refcnt))) return; Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not really needed, but harmless until refcount_t came. To fix this problem, we simply need to not increment noop_qdisc.refcnt, since we never decrement it. [1] refcount_t: increment on 0; use-after-free. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ #20 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x417 kernel/panic.c:180 __warn+0x1c4/0x1d9 kernel/panic.c:541 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:273 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846 RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152 RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282 RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000 RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8 RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0 R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000 attach_default_qdiscs net/sched/sch_generic.c:792 [inline] dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833 __dev_open+0x227/0x330 net/core/dev.c:1380 __dev_change_flags+0x695/0x990 net/core/dev.c:6726 dev_change_flags+0x88/0x140 net/core/dev.c:6792 dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256 dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554 sock_do_ioctl+0x94/0xb0 net/socket.c:968 sock_ioctl+0x2c2/0x440 net/socket.c:1058 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 Fixes: 7b9364050246 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Reshetova, Elena <elena.reshetova@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24bpf: fix bpf_setsockopts return valueYuchung Cheng
This patch fixes a bug causing any sock operations to always return EINVAL. Fixes: a5192c52377e ("bpf: fix to bpf_setsockops"). Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Craig Gallek <kraig@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24tipc: Fix tipc_sk_reinit handling of -EAGAINBob Peterson
In 9dbbfb0ab6680c6a85609041011484e6658e7d3c function tipc_sk_reinit had additional logic added to loop in the event that function rhashtable_walk_next() returned -EAGAIN. No worries. However, if rhashtable_walk_start returns -EAGAIN, it does "continue", and therefore skips the call to rhashtable_walk_stop(). That has the effect of calling rcu_read_lock() without its paired call to rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem may not be apparent for a while, especially since resize events may be rare. But the comments to rhashtable_walk_start() state: * ...Note that we take the RCU lock in all * cases including when we return an error. So you must always call * rhashtable_walk_stop to clean up. This patch replaces the continue with a goto and label to ensure a matching call to rhashtable_walk_stop(). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix use after free of struct proc_dir_entry in ipt_CLUSTERIP, patch from Sabrina Dubroca. 2) Fix spurious EINVAL errors from iptables over nft compatibility layer. 3) Reload pointer to ip header only if there is non-terminal verdict, ie. XT_CONTINUE, otherwise invalid memory access may happen, patch from Taehee Yoo. 4) Fix interaction between SYNPROXY and NAT, SYNPROXY adds sequence adjustment already, however from nf_nat_setup() assumes there's not. Patch from Xin Long. 5) Fix burst arithmetics in nft_limit as Joe Stringer mentioned during NFWS in Faro. Patch from Andy Zhou. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24netfilter: nf_tables: Fix nft limit burst handlingandy zhou
Current implementation treats the burst configuration the same as rate configuration. This can cause the per packet cost to be lower than configured. In effect, this bug causes the token bucket to be refilled at a higher rate than what user has specified. This patch changes the implementation so that the token bucket size is controlled by "rate + burst", while maintain the token bucket refill rate the same as user specified. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-08-24netfilter: check for seqadj ext existence before adding it in nf_nat_setup_infoXin Long
Commit 4440a2ab3b9f ("netfilter: synproxy: Check oom when adding synproxy and seqadj ct extensions") wanted to drop the packet when it fails to add seqadj ext due to no memory by checking if nfct_seqadj_ext_add returns NULL. But that nfct_seqadj_ext_add returns NULL can also happen when seqadj ext already exists in a nf_conn. It will cause that userspace protocol doesn't work when both dnat and snat are configured. Li Shuang found this issue in the case: Topo: ftp client router ftp server 10.167.131.2 <-> 10.167.131.254 10.167.141.254 <-> 10.167.141.1 Rules: # iptables -t nat -A PREROUTING -i eth1 -p tcp -m tcp --dport 21 -j \ DNAT --to-destination 10.167.141.1 # iptables -t nat -A POSTROUTING -o eth2 -p tcp -m tcp --dport 21 -j \ SNAT --to-source 10.167.141.254 In router, when both dnat and snat are added, nf_nat_setup_info will be called twice. The packet can be dropped at the 2nd time for DNAT due to seqadj ext is already added at the 1st time for SNAT. This patch is to fix it by checking for seqadj ext existence before adding it, so that the packet will not be dropped if seqadj ext already exists. Note that as Florian mentioned, as a long term, we should review ext_add() behaviour, it's better to return a pointer to the existing ext instead. Fixes: 4440a2ab3b9f ("netfilter: synproxy: Check oom when adding synproxy and seqadj ct extensions") Reported-by: Li Shuang <shuali@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-08-24net: xfrm: don't double-hold dst when sk_policy in use.Lorenzo Colitti
While removing dst_entry garbage collection, commit 52df157f17e5 ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle") changed xfrm_resolve_and_create_bundle so it returns an xdst with a refcount of 1 instead of 0. However, it did not delete the dst_hold performed by xfrm_lookup when a per-socket policy is in use. This means that when a socket policy is in use, dst entries returned by xfrm_lookup have a refcount of 2, and are not freed when no longer in use. Cc: Wei Wang <weiwan@google.com> Fixes: 52df157f17 ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle") Tested: https://android-review.googlesource.com/417481 Tested: https://android-review.googlesource.com/418659 Tested: https://android-review.googlesource.com/424463 Tested: https://android-review.googlesource.com/452776 passes on net-next Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-23sctp: Avoid out-of-bounds reads from address storageStefano Brivio
inet_diag_msg_sctp{,l}addr_fill() and sctp_get_sctp_info() copy sizeof(sockaddr_storage) bytes to fill in sockaddr structs used to export diagnostic information to userspace. However, the memory allocated to store sockaddr information is smaller than that and depends on the address family, so we leak up to 100 uninitialized bytes to userspace. Just use the size of the source structs instead, in all the three cases this is what userspace expects. Zero out the remaining memory. Unused bytes (i.e. when IPv4 addresses are used) in source structs sctp_sockaddr_entry and sctp_transport are already cleared by sctp_add_bind_addr() and sctp_transport_new(), respectively. Noticed while testing KASAN-enabled kernel with 'ss': [ 2326.885243] BUG: KASAN: slab-out-of-bounds in inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] at addr ffff881be8779800 [ 2326.896800] Read of size 128 by task ss/9527 [ 2326.901564] CPU: 0 PID: 9527 Comm: ss Not tainted 4.11.0-22.el7a.x86_64 #1 [ 2326.909236] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017 [ 2326.917585] Call Trace: [ 2326.920312] dump_stack+0x63/0x8d [ 2326.924014] kasan_object_err+0x21/0x70 [ 2326.928295] kasan_report+0x288/0x540 [ 2326.932380] ? inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] [ 2326.938500] ? skb_put+0x8b/0xd0 [ 2326.942098] ? memset+0x31/0x40 [ 2326.945599] check_memory_region+0x13c/0x1a0 [ 2326.950362] memcpy+0x23/0x50 [ 2326.953669] inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] [ 2326.959596] ? inet_diag_msg_sctpasoc_fill+0x460/0x460 [sctp_diag] [ 2326.966495] ? __lock_sock+0x102/0x150 [ 2326.970671] ? sock_def_wakeup+0x60/0x60 [ 2326.975048] ? remove_wait_queue+0xc0/0xc0 [ 2326.979619] sctp_diag_dump+0x44a/0x760 [sctp_diag] [ 2326.985063] ? sctp_ep_dump+0x280/0x280 [sctp_diag] [ 2326.990504] ? memset+0x31/0x40 [ 2326.994007] ? mutex_lock+0x12/0x40 [ 2326.997900] __inet_diag_dump+0x57/0xb0 [inet_diag] [ 2327.003340] ? __sys_sendmsg+0x150/0x150 [ 2327.007715] inet_diag_dump+0x4d/0x80 [inet_diag] [ 2327.012979] netlink_dump+0x1e6/0x490 [ 2327.017064] __netlink_dump_start+0x28e/0x2c0 [ 2327.021924] inet_diag_handler_cmd+0x189/0x1a0 [inet_diag] [ 2327.028045] ? inet_diag_rcv_msg_compat+0x1b0/0x1b0 [inet_diag] [ 2327.034651] ? inet_diag_dump_compat+0x190/0x190 [inet_diag] [ 2327.040965] ? __netlink_lookup+0x1b9/0x260 [ 2327.045631] sock_diag_rcv_msg+0x18b/0x1e0 [ 2327.050199] netlink_rcv_skb+0x14b/0x180 [ 2327.054574] ? sock_diag_bind+0x60/0x60 [ 2327.058850] sock_diag_rcv+0x28/0x40 [ 2327.062837] netlink_unicast+0x2e7/0x3b0 [ 2327.067212] ? netlink_attachskb+0x330/0x330 [ 2327.071975] ? kasan_check_write+0x14/0x20 [ 2327.076544] netlink_sendmsg+0x5be/0x730 [ 2327.080918] ? netlink_unicast+0x3b0/0x3b0 [ 2327.085486] ? kasan_check_write+0x14/0x20 [ 2327.090057] ? selinux_socket_sendmsg+0x24/0x30 [ 2327.095109] ? netlink_unicast+0x3b0/0x3b0 [ 2327.099678] sock_sendmsg+0x74/0x80 [ 2327.103567] ___sys_sendmsg+0x520/0x530 [ 2327.107844] ? __get_locked_pte+0x178/0x200 [ 2327.112510] ? copy_msghdr_from_user+0x270/0x270 [ 2327.117660] ? vm_insert_page+0x360/0x360 [ 2327.122133] ? vm_insert_pfn_prot+0xb4/0x150 [ 2327.126895] ? vm_insert_pfn+0x32/0x40 [ 2327.131077] ? vvar_fault+0x71/0xd0 [ 2327.134968] ? special_mapping_fault+0x69/0x110 [ 2327.140022] ? __do_fault+0x42/0x120 [ 2327.144008] ? __handle_mm_fault+0x1062/0x17a0 [ 2327.148965] ? __fget_light+0xa7/0xc0 [ 2327.153049] __sys_sendmsg+0xcb/0x150 [ 2327.157133] ? __sys_sendmsg+0xcb/0x150 [ 2327.161409] ? SyS_shutdown+0x140/0x140 [ 2327.165688] ? exit_to_usermode_loop+0xd0/0xd0 [ 2327.170646] ? __do_page_fault+0x55d/0x620 [ 2327.175216] ? __sys_sendmsg+0x150/0x150 [ 2327.179591] SyS_sendmsg+0x12/0x20 [ 2327.183384] do_syscall_64+0xe3/0x230 [ 2327.187471] entry_SYSCALL64_slow_path+0x25/0x25 [ 2327.192622] RIP: 0033:0x7f41d18fa3b0 [ 2327.196608] RSP: 002b:00007ffc3b731218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 2327.205055] RAX: ffffffffffffffda RBX: 00007ffc3b731380 RCX: 00007f41d18fa3b0 [ 2327.213017] RDX: 0000000000000000 RSI: 00007ffc3b731340 RDI: 0000000000000003 [ 2327.220978] RBP: 0000000000000002 R08: 0000000000000004 R09: 0000000000000040 [ 2327.228939] R10: 00007ffc3b730f30 R11: 0000000000000246 R12: 0000000000000003 [ 2327.236901] R13: 00007ffc3b731340 R14: 00007ffc3b7313d0 R15: 0000000000000084 [ 2327.244865] Object at ffff881be87797e0, in cache kmalloc-64 size: 64 [ 2327.251953] Allocated: [ 2327.254581] PID = 9484 [ 2327.257215] save_stack_trace+0x1b/0x20 [ 2327.261485] save_stack+0x46/0xd0 [ 2327.265179] kasan_kmalloc+0xad/0xe0 [ 2327.269165] kmem_cache_alloc_trace+0xe6/0x1d0 [ 2327.274138] sctp_add_bind_addr+0x58/0x180 [sctp] [ 2327.279400] sctp_do_bind+0x208/0x310 [sctp] [ 2327.284176] sctp_bind+0x61/0xa0 [sctp] [ 2327.288455] inet_bind+0x5f/0x3a0 [ 2327.292151] SYSC_bind+0x1a4/0x1e0 [ 2327.295944] SyS_bind+0xe/0x10 [ 2327.299349] do_syscall_64+0xe3/0x230 [ 2327.303433] return_from_SYSCALL_64+0x0/0x6a [ 2327.308194] Freed: [ 2327.310434] PID = 4131 [ 2327.313065] save_stack_trace+0x1b/0x20 [ 2327.317344] save_stack+0x46/0xd0 [ 2327.321040] kasan_slab_free+0x73/0xc0 [ 2327.325220] kfree+0x96/0x1a0 [ 2327.328530] dynamic_kobj_release+0x15/0x40 [ 2327.333195] kobject_release+0x99/0x1e0 [ 2327.337472] kobject_put+0x38/0x70 [ 2327.341266] free_notes_attrs+0x66/0x80 [ 2327.345545] mod_sysfs_teardown+0x1a5/0x270 [ 2327.350211] free_module+0x20/0x2a0 [ 2327.354099] SyS_delete_module+0x2cb/0x2f0 [ 2327.358667] do_syscall_64+0xe3/0x230 [ 2327.362750] return_from_SYSCALL_64+0x0/0x6a [ 2327.367510] Memory state around the buggy address: [ 2327.372855] ffff881be8779700: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc [ 2327.380914] ffff881be8779780: fb fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00 [ 2327.388972] >ffff881be8779800: 00 00 00 00 fc fc fc fc fb fb fb fb fb fb fb fb [ 2327.397031] ^ [ 2327.401792] ffff881be8779880: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc [ 2327.409850] ffff881be8779900: 00 00 00 00 00 04 fc fc fc fc fc fc 00 00 00 00 [ 2327.417907] ================================================================== This fixes CVE-2017-7558. References: https://bugzilla.redhat.com/show_bug.cgi?id=1480266 Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Cc: Xin Long <lucien.xin@gmail.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23net: dsa: use consume_skb()Eric Dumazet
Two kfree_skb() should be consume_skb(), to be friend with drop monitor (perf record ... -e skb:kfree_skb) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23net: dsa: skb_put_padto() already frees nskbFlorian Fainelli
The first call of skb_put_padto() will free up the SKB on error, but we return NULL which tells dsa_slave_xmit() that the original SKB should be freed so this would lead to a double free here. The second skb_put_padto() already frees the passed sk_buff reference upon error, so calling kfree_skb() on it again is not necessary. Detected by CoverityScan, CID#1416687 ("USE_AFTER_FREE") Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23net: core: Specify skb_pad()/skb_put_padto() SKB freeingFlorian Fainelli
Rename skb_pad() into __skb_pad() and make it take a third argument: free_on_error which controls whether kfree_skb() should be called or not, skb_pad() directly makes use of it and passes true to preserve its existing behavior. Do exactly the same thing with __skb_put_padto() and skb_put_padto(). Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22net: sched: don't do tcf_chain_flush from tcf_chain_destroyJiri Pirko
tcf_chain_flush needs to be called with RTNL. However, on free_tcf-> tcf_action_goto_chain_fini-> tcf_chain_put-> tcf_chain_destroy-> tcf_chain_flush callpath, it is called without RTNL. This issue was notified by following warning: [ 155.599052] WARNING: suspicious RCU usage [ 155.603165] 4.13.0-rc5jiri+ #54 Not tainted [ 155.607456] ----------------------------- [ 155.611561] net/sched/cls_api.c:195 suspicious rcu_dereference_protected() usage! Since on this callpath, the chain is guaranteed to be already empty by check in tcf_chain_put, move the tcf_chain_flush call out and call it only where it is needed - into tcf_block_put. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22net: sched: fix use after free when tcf_chain_destroy is called multiple timesJiri Pirko
The goto_chain termination action takes a reference of a chain. In that case, there is an issue when block_put is called tcf_chain_destroy directly. The follo-up call of tcf_chain_put by goto_chain action free works with memory that is already freed. This was caught by kasan: [ 220.337908] BUG: KASAN: use-after-free in tcf_chain_put+0x1b/0x50 [ 220.344103] Read of size 4 at addr ffff88036d1f2cec by task systemd-journal/261 [ 220.353047] CPU: 0 PID: 261 Comm: systemd-journal Not tainted 4.13.0-rc5jiri+ #54 [ 220.360661] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016 [ 220.371784] Call Trace: [ 220.374290] <IRQ> [ 220.376355] dump_stack+0xd5/0x150 [ 220.391485] print_address_description+0x86/0x410 [ 220.396308] kasan_report+0x181/0x4c0 [ 220.415211] tcf_chain_put+0x1b/0x50 [ 220.418949] free_tcf+0x95/0xc0 So allow tcf_chain_destroy to be called multiple times, free only in case the reference count drops to 0. Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22udp: on peeking bad csum, drop packets even if not at headEric Dumazet
When peeking, if a bad csum is discovered, the skb is unlinked from the queue with __sk_queue_drop_skb and the peek operation restarted. __sk_queue_drop_skb only drops packets that match the queue head. This fails if the skb was found after the head, using SO_PEEK_OFF socket option. This causes an infinite loop. We MUST drop this problematic skb, and we can simply check if skb was already removed by another thread, by looking at skb->next : This pointer is set to NULL by the __skb_unlink() operation, that might have happened only under the spinlock protection. Many thanks to syzkaller team (and particularly Dmitry Vyukov who provided us nice C reproducers exhibiting the lockup) and Willem de Bruijn who provided first version for this patch and a test program. Fixes: 627d2d6b5500 ("udp: enable MSG_PEEK at non-zero offset") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22tipc: fix a race condition of releasing subscriber objectYing Xue
No matter whether a request is inserted into workqueue as a work item to cancel a subscription or to delete a subscription's subscriber asynchronously, the work items may be executed in different workers. As a result, it doesn't mean that one request which is raised prior to another request is definitely handled before the latter. By contrast, if the latter request is executed before the former request, below error may happen: [ 656.183644] BUG: spinlock bad magic on CPU#0, kworker/u8:0/12117 [ 656.184487] general protection fault: 0000 [#1] SMP [ 656.185160] Modules linked in: tipc ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio [last unloaded: ip6_udp_tunnel] [ 656.187003] CPU: 0 PID: 12117 Comm: kworker/u8:0 Not tainted 4.11.0-rc7+ #6 [ 656.187920] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 656.188690] Workqueue: tipc_rcv tipc_recv_work [tipc] [ 656.189371] task: ffff88003f5cec40 task.stack: ffffc90004448000 [ 656.190157] RIP: 0010:spin_bug+0xdd/0xf0 [ 656.190678] RSP: 0018:ffffc9000444bcb8 EFLAGS: 00010202 [ 656.191375] RAX: 0000000000000034 RBX: ffff88003f8d1388 RCX: 0000000000000000 [ 656.192321] RDX: ffff88003ba13708 RSI: ffff88003ba0cd08 RDI: ffff88003ba0cd08 [ 656.193265] RBP: ffffc9000444bcd0 R08: 0000000000000030 R09: 000000006b6b6b6b [ 656.194208] R10: ffff8800bde3e000 R11: 00000000000001b4 R12: 6b6b6b6b6b6b6b6b [ 656.195157] R13: ffffffff81a3ca64 R14: ffff88003f8d1388 R15: ffff88003f8d13a0 [ 656.196101] FS: 0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000 [ 656.197172] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 656.197935] CR2: 00007f0b3d2e6000 CR3: 000000003ef9e000 CR4: 00000000000006f0 [ 656.198873] Call Trace: [ 656.199210] do_raw_spin_lock+0x66/0xa0 [ 656.199735] _raw_spin_lock_bh+0x19/0x20 [ 656.200258] tipc_subscrb_subscrp_delete+0x28/0xf0 [tipc] [ 656.200990] tipc_subscrb_rcv_cb+0x45/0x260 [tipc] [ 656.201632] tipc_receive_from_sock+0xaf/0x100 [tipc] [ 656.202299] tipc_recv_work+0x2b/0x60 [tipc] [ 656.202872] process_one_work+0x157/0x420 [ 656.203404] worker_thread+0x69/0x4c0 [ 656.203898] kthread+0x138/0x170 [ 656.204328] ? process_one_work+0x420/0x420 [ 656.204889] ? kthread_create_on_node+0x40/0x40 [ 656.205527] ret_from_fork+0x29/0x40 [ 656.206012] Code: 48 8b 0c 25 00 c5 00 00 48 c7 c7 f0 24 a3 81 48 81 c1 f0 05 00 00 65 8b 15 61 ef f5 7e e8 9a 4c 09 00 4d 85 e4 44 8b 4b 08 74 92 <45> 8b 84 24 40 04 00 00 49 8d 8c 24 f0 05 00 00 eb 8d 90 0f 1f [ 656.208504] RIP: spin_bug+0xdd/0xf0 RSP: ffffc9000444bcb8 [ 656.209798] ---[ end trace e2a800e6eb0770be ]--- In above scenario, the request of deleting subscriber was performed earlier than the request of canceling a subscription although the latter was issued before the former, which means tipc_subscrb_delete() was called before tipc_subscrp_cancel(). As a result, when tipc_subscrb_subscrp_delete() called by tipc_subscrp_cancel() was executed to cancel a subscription, the subscription's subscriber refcnt had been decreased to 1. After tipc_subscrp_delete() where the subscriber was freed because its refcnt was decremented to zero, but the subscriber's lock had to be released, as a consequence, panic happened. By contrast, if we increase subscriber's refcnt before tipc_subscrb_subscrp_delete() is called in tipc_subscrp_cancel(), the panic issue can be avoided. Fixes: d094c4d5f5c7 ("tipc: add subscription refcount to avoid invalid delete") Reported-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22tipc: remove subscription references only for pending timersParthasarathy Bhuvaragan
In commit, 139bb36f754a ("tipc: advance the time of deleting subscription from subscriber->subscrp_list"), we delete the subscription from the subscribers list and from nametable unconditionally. This leads to the following bug if the timer running tipc_subscrp_timeout() in another CPU accesses the subscription list after the subscription delete request. [39.570] general protection fault: 0000 [#1] SMP :: [39.574] task: ffffffff81c10540 task.stack: ffffffff81c00000 [39.575] RIP: 0010:tipc_subscrp_timeout+0x32/0x80 [tipc] [39.576] RSP: 0018:ffff88003ba03e90 EFLAGS: 00010282 [39.576] RAX: dead000000000200 RBX: ffff88003f0f3600 RCX: 0000000000000101 [39.577] RDX: dead000000000100 RSI: 0000000000000201 RDI: ffff88003f0d7948 [39.578] RBP: ffff88003ba03ea0 R08: 0000000000000001 R09: ffff88003ba03ef8 [39.579] R10: 000000000000014f R11: 0000000000000000 R12: ffff88003f0d7948 [39.580] R13: ffff88003f0f3618 R14: ffffffffa006c250 R15: ffff88003f0f3600 [39.581] FS: 0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000 [39.582] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [39.583] CR2: 00007f831c6e0714 CR3: 000000003d3b0000 CR4: 00000000000006f0 [39.584] Call Trace: [39.584] <IRQ> [39.585] call_timer_fn+0x3d/0x180 [39.585] ? tipc_subscrb_rcv_cb+0x260/0x260 [tipc] [39.586] run_timer_softirq+0x168/0x1f0 [39.586] ? sched_clock_cpu+0x16/0xc0 [39.587] __do_softirq+0x9b/0x2de [39.587] irq_exit+0x60/0x70 [39.588] smp_apic_timer_interrupt+0x3d/0x50 [39.588] apic_timer_interrupt+0x86/0x90 [39.589] RIP: 0010:default_idle+0x20/0xf0 [39.589] RSP: 0018:ffffffff81c03e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 [39.590] RAX: 0000000000000000 RBX: ffffffff81c10540 RCX: 0000000000000000 [39.591] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [39.592] RBP: ffffffff81c03e68 R08: 0000000000000000 R09: 0000000000000000 [39.593] R10: ffffc90001cbbe00 R11: 0000000000000000 R12: 0000000000000000 [39.594] R13: ffffffff81c10540 R14: 0000000000000000 R15: 0000000000000000 [39.595] </IRQ> :: [39.603] RIP: tipc_subscrp_timeout+0x32/0x80 [tipc] RSP: ffff88003ba03e90 [39.604] ---[ end trace 79ce94b7216cb459 ]--- Fixes: 139bb36f754a ("tipc: advance the time of deleting subscription from subscriber->subscrp_list") Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22net/hsr: Check skb_put_padto() return valueFlorian Fainelli
skb_put_padto() will free the sk_buff passed as reference in case of errors, but we still need to check its return value and decide what to do. Detected by CoverityScan, CID#1416688 ("CHECKED_RETURN") Fixes: ee1c27977284 ("net/hsr: Added support for HSR v1") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22ipv6: add rcu grace period before freeing fib6_nodeWei Wang
We currently keep rt->rt6i_node pointing to the fib6_node for the route. And some functions make use of this pointer to dereference the fib6_node from rt structure, e.g. rt6_check(). However, as there is neither refcount nor rcu taken when dereferencing rt->rt6i_node, it could potentially cause crashes as rt->rt6i_node could be set to NULL by other CPUs when doing a route deletion. This patch introduces an rcu grace period before freeing fib6_node and makes sure the functions that dereference it takes rcu_read_lock(). Note: there is no "Fixes" tag because this bug was there in a very early stage. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-08-21 1) Fix memleaks when ESP takes an error path. 2) Fix null pointer dereference when creating a sub policy that matches the same outer flow as main policy does. From Koichiro Den. 3) Fix possible out-of-bound access in xfrm_migrate. This patch should go to the stable trees too. From Vladis Dronov. 4) ESP can return positive and negative error values, so treat both cases as an error. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()Stefano Brivio
A packet length of exactly IPV6_MAXPLEN is allowed, we should refuse parsing options only if the size is 64KiB or more. While at it, remove one extra variable and one assignment which were also introduced by the commit that introduced the size check. Checking the sum 'offset + len' and only later adding 'len' to 'offset' doesn't provide any advantage over directly summing to 'offset' and checking it. Fixes: 6399f1fae4ec ("ipv6: avoid overflow of offset in ip6_find_1stfragopt") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-20ipv6: repair fib6 tree in failure caseWei Wang
In fib6_add(), it is possible that fib6_add_1() picks an intermediate node and sets the node's fn->leaf to NULL in order to add this new route. However, if fib6_add_rt2node() fails to add the new route for some reason, fn->leaf will be left as NULL and could potentially cause crash when fn->leaf is accessed in fib6_locate(). This patch makes sure fib6_repair_tree() is called to properly repair fn->leaf in the above failure case. Here is the syzkaller reported general protection fault in fib6_locate: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000 RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline] RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] ipv6_prefix_equal include/net/ipv6.h:492 [inline] RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline] RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233 RSP: 0018:ffff8801d01a36a8 EFLAGS: 00010202 RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000 RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100 RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001 R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000 FS: 00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0 ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0 ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988 Call Trace: [<ffffffff82a223d6>] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109 [<ffffffff82a23f9d>] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075 [<ffffffff82621359>] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450 [<ffffffff8274c1d1>] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281 [<ffffffff82613ddf>] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456 [<ffffffff8274ad38>] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline] [<ffffffff8274ad38>] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232 [<ffffffff8274b83e>] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778 [<ffffffff82564aff>] sock_sendmsg_nosec net/socket.c:609 [inline] [<ffffffff82564aff>] sock_sendmsg+0xcf/0x110 net/socket.c:619 [<ffffffff82564d62>] sock_write_iter+0x222/0x3a0 net/socket.c:834 [<ffffffff8178523d>] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478 [<ffffffff817853f4>] __vfs_write+0xe4/0x110 fs/read_write.c:491 [<ffffffff81786c38>] vfs_write+0x178/0x4b0 fs/read_write.c:538 [<ffffffff817892a9>] SYSC_write fs/read_write.c:585 [inline] [<ffffffff817892a9>] SyS_write+0xd9/0x1b0 fs/read_write.c:577 [<ffffffff82c71e32>] entry_SYSCALL_64_fastpath+0x12/0x17 Note: there is no "Fixes" tag as this seems to be a bug introduced very early. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18net: sched: fix NULL pointer dereference when action calls some targetsXin Long
As we know in some target's checkentry it may dereference par.entryinfo to check entry stuff inside. But when sched action calls xt_check_target, par.entryinfo is set with NULL. It would cause kernel panic when calling some targets. It can be reproduce with: # tc qd add dev eth1 ingress handle ffff: # tc filter add dev eth1 parent ffff: u32 match u32 0 0 action xt \ -j ECN --ecn-tcp-remove It could also crash kernel when using target CLUSTERIP or TPROXY. By now there's no proper value for par.entryinfo in ipt_init_target, but it can not be set with NULL. This patch is to void all these panics by setting it with an ipt_entry obj with all members = 0. Note that this issue has been there since the very beginning. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18rxrpc: Fix oops when discarding a preallocated service callDavid Howells
rxrpc_service_prealloc_one() doesn't set the socket pointer on any new call it preallocates, but does add it to the rxrpc net namespace call list. This, however, causes rxrpc_put_call() to oops when the call is discarded when the socket is closed. rxrpc_put_call() needs the socket to be able to reach the namespace so that it can use a lock held therein. Fix this by setting a call's socket pointer immediately before discarding it. This can be triggered by unloading the kafs module, resulting in an oops like the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 IP: rxrpc_put_call+0x1e2/0x32d PGD 0 P4D 0 Oops: 0000 [#1] SMP Modules linked in: kafs(E-) CPU: 3 PID: 3037 Comm: rmmod Tainted: G E 4.12.0-fscache+ #213 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 task: ffff8803fc92e2c0 task.stack: ffff8803fef74000 RIP: 0010:rxrpc_put_call+0x1e2/0x32d RSP: 0018:ffff8803fef77e08 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8803fab99ac0 RCX: 000000000000000f RDX: ffffffff81c50a40 RSI: 000000000000000c RDI: ffff8803fc92ea88 RBP: ffff8803fef77e30 R08: ffff8803fc87b941 R09: ffffffff82946d20 R10: ffff8803fef77d10 R11: 00000000000076fc R12: 0000000000000005 R13: ffff8803fab99c20 R14: 0000000000000001 R15: ffffffff816c6aee FS: 00007f915a059700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000030 CR3: 00000003fef39000 CR4: 00000000001406e0 Call Trace: rxrpc_discard_prealloc+0x325/0x341 rxrpc_listen+0xf9/0x146 kernel_listen+0xb/0xd afs_close_socket+0x3e/0x173 [kafs] afs_exit+0x1f/0x57 [kafs] SyS_delete_module+0x10f/0x19a do_syscall_64+0x8a/0x149 entry_SYSCALL64_slow_path+0x25/0x25 Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>