summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Reject invalid updates to netfilter expectation policies, from Pablo Neira Ayuso. 2) Fix memory leak in nfnl_cthelper, from Jeffy Chen. 3) Don't do stupid things if we get a neigh_probe() on a neigh entry whose ops lack a solicit method. From Eric Dumazet. 4) Don't transmit packets in r8152 driver when the carrier is off, from Hayes Wang. 5) Fix ipv6 packet type detection in aquantia driver, from Pavel Belous. 6) Don't write uninitialized data into hw registers in bna driver, from Arnd Bergmann. 7) Fix locking in ping_unhash(), from Eric Dumazet. 8) Make BPF verifier range checks able to understand certain sequences emitted by LLVM, from Alexei Starovoitov. 9) Fix use after free in ipconfig, from Mark Rutland. 10) Fix refcount leak on force commit in openvswitch, from Jarno Rajahalme. 11) Fix various overflow checks in AF_PACKET, from Andrey Konovalov. 12) Fix endianness bug in be2net driver, from Suresh Reddy. 13) Don't forget to wake TX queues when processing a timeout, from Grygorii Strashko. 14) ARP header on-stack storage is wrong in flow dissector, from Simon Horman. 15) Lost retransmit and reordering SNMP stats in TCP can be underreported. From Yuchung Cheng. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (82 commits) nfp: fix potential use after free on xdp prog tcp: fix reordering SNMP under-counting tcp: fix lost retransmit SNMP under-counting sctp: get sock from transport in sctp_transport_update_pmtu net: ethernet: ti: cpsw: fix race condition during open() l2tp: fix PPP pseudo-wire auto-loading bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_* l2tp: take reference on sessions being dumped tcp: minimize false-positives on TCP/GRO check sctp: check for dst and pathmtu update in sctp_packet_config flow dissector: correct size of storage for ARP net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeout l2tp: take a reference on sessions used in genetlink handlers l2tp: hold session while sending creation notifications l2tp: fix duplicate session creation l2tp: ensure session can't get removed during pppol2tp_session_ioctl() l2tp: fix race in l2tp_recv_common() sctp: use right in and out stream cnt bpf: add various verifier test cases for self-tests bpf, verifier: fix rejection of unaligned access checks for map_value_adj ...
2017-04-05tcp: fix reordering SNMP under-countingYuchung Cheng
Currently the reordering SNMP counters only increase if a connection sees a higher degree then it has previously seen. It ignores if the reordering degree is not greater than the default system threshold. This significantly under-counts the number of reordering events and falsely convey that reordering is rare on the network. This patch properly and faithfully records the number of reordering events detected by the TCP stack, just like the comment says "this exciting event is worth to be remembered". Note that even so TCP still under-estimate the actual reordering events because TCP requires TS options or certain packet sequences to detect reordering (i.e. ACKing never-retransmitted sequence in recovery or disordered state). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05tcp: fix lost retransmit SNMP under-countingYuchung Cheng
The lost retransmit SNMP stat is under-counting retransmission that uses segment offloading. This patch fixes that so all retransmission related SNMP counters are consistent. Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05sctp: get sock from transport in sctp_transport_update_pmtuXin Long
This patch is almost to revert commit 02f3d4ce9e81 ("sctp: Adjust PMTU updates to accomodate route invalidation."). As t->asoc can't be NULL in sctp_transport_update_pmtu, it could get sk from asoc, and no need to pass sk into that function. It is also to remove some duplicated codes from that function. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-04l2tp: fix PPP pseudo-wire auto-loadingGuillaume Nault
PPP pseudo-wire type is 7 (11 is L2TP_PWTYPE_IP). Fixes: f1f39f911027 ("l2tp: auto load type modules") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-04l2tp: take reference on sessions being dumpedGuillaume Nault
Take a reference on the sessions returned by l2tp_session_find_nth() (and rename it l2tp_session_get_nth() to reflect this change), so that caller is assured that the session isn't going to disappear while processing it. For procfs and debugfs handlers, the session is held in the .start() callback and dropped in .show(). Given that pppol2tp_seq_session_show() dereferences the associated PPPoL2TP socket and that l2tp_dfs_seq_session_show() might call pppol2tp_show(), we also need to call the session's .ref() callback to prevent the socket from going away from under us. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Fixes: 0ad6614048cf ("l2tp: Add debugfs files for dumping l2tp debug info") Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-03tcp: minimize false-positives on TCP/GRO checkMarcelo Ricardo Leitner
Markus Trippelsdorf reported that after commit dcb17d22e1c2 ("tcp: warn on bogus MSS and try to amend it") the kernel started logging the warning for a NIC driver that doesn't even support GRO. It was diagnosed that it was possibly caused on connections that were using TCP Timestamps but some packets lacked the Timestamps option. As we reduce rcv_mss when timestamps are used, the lack of them would cause the packets to be bigger than expected, although this is a valid case. As this warning is more as a hint, getting a clean-cut on the threshold is probably not worth the execution time spent on it. This patch thus alleviates the false-positives with 2 quick checks: by accounting for the entire TCP option space and also checking against the interface MTU if it's available. These changes, specially the MTU one, might mask some real positives, though if they are really happening, it's possible that sooner or later it will be triggered anyway. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-03sctp: check for dst and pathmtu update in sctp_packet_configXin Long
This patch is to move sctp_transport_dst_check into sctp_packet_config from sctp_packet_transmit and add pathmtu check in sctp_packet_config. With this fix, sctp can update dst or pathmtu before appending chunks, which can void dropping packets in sctp_packet_transmit when dst is obsolete or dst's mtu is changed. This patch is also to improve some other codes in sctp_packet_config. It updates packet max_size with gso_max_size, checks for dst and pathmtu, and appends ecne chunk only when packet is empty and asoc is not NULL. It makes sctp flush work better, as we only need to set up them once for one flush schedule. It's also safe, since asoc is NULL only when the packet is created by sctp_ootb_pkt_new in which it just gets the new dst, no need to do more things for it other than set packet with transport's pathmtu. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-03flow dissector: correct size of storage for ARPSimon Horman
The last argument to __skb_header_pointer() should be a buffer large enough to store struct arphdr. This can be a pointer to a struct arphdr structure. The code was previously using a pointer to a pointer to struct arphdr. By my counting the storage available both before and after is 8 bytes on x86_64. Fixes: 55733350e5e8 ("flow disector: ARP support") Reported-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01l2tp: take a reference on sessions used in genetlink handlersGuillaume Nault
Callers of l2tp_nl_session_find() need to hold a reference on the returned session since there's no guarantee that it isn't going to disappear from under them. Relying on the fact that no l2tp netlink message may be processed concurrently isn't enough: sessions can be deleted by other means (e.g. by closing the PPPOL2TP socket of a ppp pseudowire). l2tp_nl_cmd_session_delete() is a bit special: it runs a callback function that may require a previous call to session->ref(). In particular, for ppp pseudowires, the callback is l2tp_session_delete(), which then calls pppol2tp_session_close() and dereferences the PPPOL2TP socket. The socket might already be gone at the moment l2tp_session_delete() calls session->ref(), so we need to take a reference during the session lookup. So we need to pass the do_ref variable down to l2tp_session_get() and l2tp_session_get_by_ifname(). Since all callers have to be updated, l2tp_session_find_by_ifname() and l2tp_nl_session_find() are renamed to reflect their new behaviour. Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01l2tp: hold session while sending creation notificationsGuillaume Nault
l2tp_session_find() doesn't take any reference on the returned session. Therefore, the session may disappear while sending the notification. Use l2tp_session_get() instead and decrement session's refcount once the notification is sent. Fixes: 33f72e6f0c67 ("l2tp : multicast notification to the registered listeners") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01l2tp: fix duplicate session creationGuillaume Nault
l2tp_session_create() relies on its caller for checking for duplicate sessions. This is racy since a session can be concurrently inserted after the caller's verification. Fix this by letting l2tp_session_create() verify sessions uniqueness upon insertion. Callers need to be adapted to check for l2tp_session_create()'s return code instead of calling l2tp_session_find(). pppol2tp_connect() is a bit special because it has to work on existing sessions (if they're not connected) or to create a new session if none is found. When acting on a preexisting session, a reference must be held or it could go away on us. So we have to use l2tp_session_get() instead of l2tp_session_find() and drop the reference before exiting. Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01l2tp: ensure session can't get removed during pppol2tp_session_ioctl()Guillaume Nault
Holding a reference on session is required before calling pppol2tp_session_ioctl(). The session could get freed while processing the ioctl otherwise. Since pppol2tp_session_ioctl() uses the session's socket, we also need to take a reference on it in l2tp_session_get(). Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01l2tp: fix race in l2tp_recv_common()Guillaume Nault
Taking a reference on sessions in l2tp_recv_common() is racy; this has to be done by the callers. To this end, a new function is required (l2tp_session_get()) to atomically lookup a session and take a reference on it. Callers then have to manually drop this reference. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01sctp: use right in and out stream cntXin Long
Since sctp reconf was added in sctp, the real cnt of in/out stream have not been c.sinit_max_instreams and c.sinit_num_ostreams any more. This patch is to replace them with stream->in/outcnt. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01Merge tag 'mac80211-for-davem-2017-03-31' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Two fixes: * don't block netdev queues (indefinitely!) if mac80211 manages traffic queueing itself * check wiphy registration before checking for ops on resume, to avoid crash ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01openvswitch: Fix ovs_flow_key_update()Yi-Hung Wei
ovs_flow_key_update() is called when the flow key is invalid, and it is used to update and revalidate the flow key. Commit 329f45bc4f19 ("openvswitch: add mac_proto field to the flow key") introduces mac_proto field to flow key and use it to determine whether the flow key is valid. However, the commit does not update the code path in ovs_flow_key_update() to revalidate the flow key which may cause BUG_ON() on execute_recirc(). This patch addresses the aforementioned issue. Fixes: 329f45bc4f19 ("openvswitch: add mac_proto field to the flow key") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01Merge tag 'nfsd-4.11-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "The restriction of NFSv4 to TCP went overboard and also broke the backchannel; fix. Also some minor refinements to the nfsd version-setting interface that we'd like to get fixed before release" * tag 'nfsd-4.11-1' of git://linux-nfs.org/~bfields/linux: svcrdma: set XPT_CONG_CTRL flag for bc xprt NFSD: fix nfsd_reset_versions for NFSv4. NFSD: fix nfsd_minorversion(.., NFSD_AVAIL) NFSD: further refinement of content of /proc/fs/nfsd/versions nfsd: map the ENOKEY to nfserr_perm for avoiding warning SUNRPC/backchanel: set XPT_CONG_CTRL flag for bc xprt
2017-03-30sctp: alloc stream info when initializing asocXin Long
When sending a msg without asoc established, sctp will send INIT packet first and then enqueue chunks. Before receiving INIT_ACK, stream info is not yet alloced. But enqueuing chunks needs to access stream info, like out stream state and out stream cnt. This patch is to fix it by allocing out stream info when initializing an asoc, allocing in stream and re-allocing out stream when processing init. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30net/packet: fix overflow in check for tp_reserveAndrey Konovalov
When calculating po->tp_hdrlen + po->tp_reserve the result can overflow. Fix by checking that tp_reserve <= INT_MAX on assign. Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30net/packet: fix overflow in check for tp_frame_nrAndrey Konovalov
When calculating rb->frames_per_block * req->tp_block_nr the result can overflow. Add a check that tp_block_size * tp_block_nr <= UINT_MAX. Since frames_per_block <= tp_block_size, the expression would never overflow. Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30net/packet: fix overflow in check for priv area sizeAndrey Konovalov
Subtracting tp_sizeof_priv from tp_block_size and casting to int to check whether one is less then the other doesn't always work (both of them are unsigned ints). Compare them as is instead. Also cast tp_sizeof_priv to u64 before using BLK_PLUS_PRIV, as it can overflow inside BLK_PLUS_PRIV otherwise. Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains a rather large update with Netfilter fixes, specifically targeted to incorrect RCU usage in several spots and the userspace conntrack helper infrastructure (nfnetlink_cthelper), more specifically they are: 1) expect_class_max is incorrect set via cthelper, as in kernel semantics mandate that this represents the array of expectation classes minus 1. Patch from Liping Zhang. 2) Expectation policy updates via cthelper are currently broken for several reasons: This code allows illegal changes in the policy such as changing the number of expeciation classes, it is leaking the updated policy and such update occurs with no RCU protection at all. Fix this by adding a new nfnl_cthelper_update_policy() that describes what is really legal on the update path. 3) Fix several memory leaks in cthelper, from Jeffy Chen. 4) synchronize_rcu() is missing in the removal path of several modules, this may lead to races since CPU may still be running on code that has just gone. Also from Liping Zhang. 5) Don't use the helper hashtable from cthelper, it is not safe to walk over those bits without the helper mutex. Fix this by introducing a new independent list for userspace helpers. From Liping Zhang. 6) nf_ct_extend_unregister() needs synchronize_rcu() to make sure no packets are walking on any conntrack extension that is gone after module removal, again from Liping. 7) nf_nat_snmp may crash if we fail to unregister the helper due to accidental leftover code, from Gao Feng. 8) Fix leak in nfnetlink_queue with secctx support, from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29Merge branch 'apw' (xfrm_user fixes)Linus Torvalds
Merge xfrm_user validation fixes from Andy Whitcroft: "Two patches we are applying to Ubuntu for XFRM_MSG_NEWAE validation issue reported by ZDI. The first of these is the primary fix, and the second is for a more theoretical issue that Kees pointed out when reviewing the first" * emailed patches from Andy Whitcroft <apw@canonical.com>: xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
2017-03-29l2tp: purge socket queues in the .destruct() callbackGuillaume Nault
The Rx path may grab the socket right before pppol2tp_release(), but nothing guarantees that it will enqueue packets before skb_queue_purge(). Therefore, the socket can be destroyed without its queues fully purged. Fix this by purging queues in pppol2tp_session_destruct() where we're guaranteed nothing is still referencing the socket. Fixes: 9e9cb6221aa7 ("l2tp: fix userspace reception on plain L2TP sockets") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29l2tp: hold tunnel socket when handling control frames in l2tp_ip and l2tp_ip6Guillaume Nault
The code following l2tp_tunnel_find() expects that a new reference is held on sk. Either sk_receive_skb() or the discard_put error path will drop a reference from the tunnel's socket. This issue exists in both l2tp_ip and l2tp_ip6. Fixes: a3c18422a4b4 ("l2tp: hold socket before dropping lock in l2tp_ip{, 6}_recv()") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harderAndy Whitcroft
Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to wrapping issues. To ensure we are correctly ensuring that the two ESN structures are the same size compare both the overall size as reported by xfrm_replay_state_esn_len() and the internal length are the same. CVE-2017-7184 Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-29xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_windowAndy Whitcroft
When a new xfrm state is created during an XFRM_MSG_NEWSA call we validate the user supplied replay_esn to ensure that the size is valid and to ensure that the replay_window size is within the allocated buffer. However later it is possible to update this replay_esn via a XFRM_MSG_NEWAE call. There we again validate the size of the supplied buffer matches the existing state and if so inject the contents. We do not at this point check that the replay_window is within the allocated memory. This leads to out-of-bounds reads and writes triggered by netlink packets. This leads to memory corruption and the potential for priviledge escalation. We already attempt to validate the incoming replay information in xfrm_new_ae() via xfrm_replay_verify_len(). This confirms that the user is not trying to change the size of the replay state buffer which includes the replay_esn. It however does not check the replay_window remains within that buffer. Add validation of the contained replay_window. CVE-2017-7184 Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-29mac80211: unconditionally start new netdev queues with iTXQ supportJohannes Berg
When internal mac80211 TXQs aren't supported, netdev queues must always started out started even when driver queues are stopped while the interface is added. This is necessary because with the internal TXQ support netdev queues are never stopped and packet scheduling/dropping is done in mac80211. Cc: stable@vger.kernel.org # 4.9+ Fixes: 80a83cfc434b1 ("mac80211: skip netdev queue control with software queuing") Reported-and-tested-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-29netfilter: nfnetlink_queue: fix secctx memory leakLiping Zhang
We must call security_release_secctx to free the memory returned by security_secid_to_secctx, otherwise memory may be leaked forever. Fixes: ef493bd930ae ("netfilter: nfnetlink_queue: add security context information") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-29cfg80211: check rdev resume callback only for registered wiphyArend Van Spriel
We got the following use-after-free KASAN report: BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211] at addr ffff8803fc244090 Read of size 8 by task kworker/u16:24/2587 CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+ Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016 Workqueue: events_unbound async_run_entry_fn ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088 ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080 ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9 Call Trace: [<ffffffffaeedb541>] dump_stack+0x85/0xc4 [<ffffffffae84d7a1>] kasan_object_err+0x21/0x70 [<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500 [<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211] [<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211] [<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211] [<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70 [<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211] [<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211] [<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211] [<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211] [<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0 [<ffffffffaf3b31b2>] device_resume+0x1c2/0x670 [<ffffffffaf3b367d>] async_resume+0x1d/0x50 [<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610 [<ffffffffae3d0666>] process_one_work+0x716/0x1a50 [<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50 [<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60 [<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0 [<ffffffffae3d1a80>] worker_thread+0xe0/0x1460 [<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50 [<ffffffffae3e54c2>] kthread+0x222/0x2e0 [<ffffffffae3e52a0>] ? kthread_park+0x80/0x80 [<ffffffffae3e52a0>] ? kthread_park+0x80/0x80 [<ffffffffae3e52a0>] ? kthread_park+0x80/0x80 [<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40 Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024 Allocated: PID = 71 save_stack_trace+0x1b/0x20 save_stack+0x46/0xd0 kasan_kmalloc+0xad/0xe0 kasan_slab_alloc+0x12/0x20 __kmalloc_track_caller+0x134/0x360 kmemdup+0x20/0x50 brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac] brcmf_bus_start+0x19a/0x9a0 [brcmfmac] brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac] brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac] request_firmware_work_func+0x135/0x280 process_one_work+0x716/0x1a50 worker_thread+0xe0/0x1460 kthread+0x222/0x2e0 ret_from_fork+0x2a/0x40 Freed: PID = 2568 save_stack_trace+0x1b/0x20 save_stack+0x46/0xd0 kasan_slab_free+0x71/0xb0 kfree+0xe8/0x2e0 brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac] brcmf_detach+0x14a/0x2b0 [brcmfmac] brcmf_pcie_remove+0x140/0x5d0 [brcmfmac] brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac] pci_pm_resume+0x186/0x220 dpm_run_callback+0x6e/0x4f0 device_resume+0x1c2/0x670 async_resume+0x1d/0x50 async_run_entry_fn+0xfe/0x610 process_one_work+0x716/0x1a50 worker_thread+0xe0/0x1460 kthread+0x222/0x2e0 ret_from_fork+0x2a/0x40 Memory state around the buggy address: ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb What is happening is that brcmf_pcie_resume() detects a device that is no longer responsive and it decides to unbind resulting in a wiphy_unregister() and wiphy_free() call. Now the wiphy instance remains allocated, because PM needs to call wiphy_resume() for it. However, brcmfmac already does a kfree() for the struct cfg80211_registered_device::ops field. Change the checks in wiphy_resume() to only access the struct cfg80211_registered_device::ops if the wiphy instance is still registered at this time. Cc: stable@vger.kernel.org # 4.10.x, 4.9.x Reported-by: Daniel J Blueman <daniel@quora.org> Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-28openvswitch: Fix refcount leak on force commit.Jarno Rajahalme
The reference count held for skb needs to be released when the skb's nfct pointer is cleared regardless of if nf_ct_delete() is called or not. Failing to release the skb's reference cound led to deferred conntrack cleanup spinning forever within nf_conntrack_cleanup_net_list() when cleaning up a network namespace:    kworker/u16:0-19025 [004] 45981067.173642: sched_switch: kworker/u16:0:19025 [120] R ==> rcu_preempt:7 [120]    kworker/u16:0-19025 [004] 45981067.173651: kernel_stack: <stack trace> => ___preempt_schedule (ffffffffa001ed36) => _raw_spin_unlock_bh (ffffffffa0713290) => nf_ct_iterate_cleanup (ffffffffc00a4454) => nf_conntrack_cleanup_net_list (ffffffffc00a5e1e) => nf_conntrack_pernet_exit (ffffffffc00a63dd) => ops_exit_list.isra.1 (ffffffffa06075f3) => cleanup_net (ffffffffa0607df0) => process_one_work (ffffffffa0084c31) => worker_thread (ffffffffa008592b) => kthread (ffffffffa008bee2) => ret_from_fork (ffffffffa071b67c) Fixes: dd41d33f0b03 ("openvswitch: Add force commit.") Reported-by: Yang Song <yangsong@vmware.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28svcrdma: set XPT_CONG_CTRL flag for bc xprtChuck Lever
Same change as Kinglong Mee's fix for the TCP backchannel service. Fixes: 5283b03ee5cd ("nfs/nfsd/sunrpc: enforce transport...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-03-28sctp: change to save MSG_MORE flag into assocXin Long
David Laight noticed the support for MSG_MORE with datamsg->force_delay didn't really work as we expected, as the first msg with MSG_MORE set would always block the following chunks' dequeuing. This Patch is to rewrite it by saving the MSG_MORE flag into assoc as David Laight suggested. asoc->force_delay is used to save MSG_MORE flag before a msg is sent. All chunks in queue would not be sent out if asoc->force_delay is set by the msg with MSG_MORE flag, until a new msg without MSG_MORE flag clears asoc->force_delay. Note that this change would not affect the flush is generated by other triggers, like asoc->state != ESTABLISHED, queue size > pmtu etc. v1->v2: Not clear asoc->force_delay after sending the msg with MSG_MORE flag. Fixes: 4ea0c32f5f42 ("sctp: add support for MSG_MORE") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: David Laight <david.laight@aculab.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27net: ipconfig: fix ic_close_devs() use-after-freeMark Rutland
Our chosen ic_dev may be anywhere in our list of ic_devs, and we may free it before attempting to close others. When we compare d->dev and ic_dev->dev, we're potentially dereferencing memory returned to the allocator. This causes KASAN to scream for each subsequent ic_dev we check. As there's a 1-1 mapping between ic_devs and netdevs, we can instead compare d and ic_dev directly, which implicitly handles the !ic_dev case, and avoids the use-after-free. The ic_dev pointer may be stale, but we will not dereference it. Original splat: [ 6.487446] ================================================================== [ 6.494693] BUG: KASAN: use-after-free in ic_close_devs+0xc4/0x154 at addr ffff800367efa708 [ 6.503013] Read of size 8 by task swapper/0/1 [ 6.507452] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc3-00002-gda42158 #8 [ 6.514993] Hardware name: AppliedMicro Mustang/Mustang, BIOS 3.05.05-beta_rc Jan 27 2016 [ 6.523138] Call trace: [ 6.525590] [<ffff200008094778>] dump_backtrace+0x0/0x570 [ 6.530976] [<ffff200008094d08>] show_stack+0x20/0x30 [ 6.536017] [<ffff200008bee928>] dump_stack+0x120/0x188 [ 6.541231] [<ffff20000856d5e4>] kasan_object_err+0x24/0xa0 [ 6.546790] [<ffff20000856d924>] kasan_report_error+0x244/0x738 [ 6.552695] [<ffff20000856dfec>] __asan_report_load8_noabort+0x54/0x80 [ 6.559204] [<ffff20000aae86ac>] ic_close_devs+0xc4/0x154 [ 6.564590] [<ffff20000aaedbac>] ip_auto_config+0x2ed4/0x2f1c [ 6.570321] [<ffff200008084b04>] do_one_initcall+0xcc/0x370 [ 6.575882] [<ffff20000aa31de8>] kernel_init_freeable+0x5f8/0x6c4 [ 6.581959] [<ffff20000a16df00>] kernel_init+0x18/0x190 [ 6.587171] [<ffff200008084710>] ret_from_fork+0x10/0x40 [ 6.592468] Object at ffff800367efa700, in cache kmalloc-128 size: 128 [ 6.598969] Allocated: [ 6.601324] PID = 1 [ 6.603427] save_stack_trace_tsk+0x0/0x418 [ 6.607603] save_stack_trace+0x20/0x30 [ 6.611430] kasan_kmalloc+0xd8/0x188 [ 6.615087] ip_auto_config+0x8c4/0x2f1c [ 6.619002] do_one_initcall+0xcc/0x370 [ 6.622832] kernel_init_freeable+0x5f8/0x6c4 [ 6.627178] kernel_init+0x18/0x190 [ 6.630660] ret_from_fork+0x10/0x40 [ 6.634223] Freed: [ 6.636233] PID = 1 [ 6.638334] save_stack_trace_tsk+0x0/0x418 [ 6.642510] save_stack_trace+0x20/0x30 [ 6.646337] kasan_slab_free+0x88/0x178 [ 6.650167] kfree+0xb8/0x478 [ 6.653131] ic_close_devs+0x130/0x154 [ 6.656875] ip_auto_config+0x2ed4/0x2f1c [ 6.660875] do_one_initcall+0xcc/0x370 [ 6.664705] kernel_init_freeable+0x5f8/0x6c4 [ 6.669051] kernel_init+0x18/0x190 [ 6.672534] ret_from_fork+0x10/0x40 [ 6.676098] Memory state around the buggy address: [ 6.680880] ffff800367efa600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 6.688078] ffff800367efa680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 6.695276] >ffff800367efa700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 6.702469] ^ [ 6.705952] ffff800367efa780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 6.713149] ffff800367efa800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 6.720343] ================================================================== [ 6.727536] Disabling lock debugging due to kernel taint Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: David S. Miller <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Patrick McHardy <kaber@trash.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to registerGao Feng
In the commit 93557f53e1fb ("netfilter: nf_conntrack: nf_conntrack snmp helper"), the snmp_helper is replaced by nf_nat_snmp_hook. So the snmp_helper is never registered. But it still tries to unregister the snmp_helper, it could cause the panic. Now remove the useless snmp_helper and the unregister call in the error handler. Fixes: 93557f53e1fb ("netfilter: nf_conntrack: nf_conntrack snmp helper") Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-27netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregisterLiping Zhang
If one cpu is doing nf_ct_extend_unregister while another cpu is doing __nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover, there's no synchronize_rcu invocation after set nf_ct_ext_types[id] to NULL, so it's possible that we may access invalid pointer. But actually, most of the ct extends are built-in, so the problem listed above will not happen. However, there are two exceptions: NF_CT_EXT_NAT and NF_CT_EXT_SYNPROXY. For _EXT_NAT, the panic will not happen, since adding the nat extend and unregistering the nat extend are located in the same file(nf_nat_core.c), this means that after the nat module is removed, we cannot add the nat extend too. For _EXT_SYNPROXY, synproxy extend may be added by init_conntrack, while synproxy extend unregister will be done by synproxy_core_exit. So after nf_synproxy_core.ko is removed, we may still try to add the synproxy extend, then kernel panic may happen. I know it's very hard to reproduce this issue, but I can play a tricky game to make it happen very easily :) Step 1. Enable SYNPROXY for tcp dport 1234 at FORWARD hook: # iptables -I FORWARD -p tcp --dport 1234 -j SYNPROXY Step 2. Queue the syn packet to the userspace at raw table OUTPUT hook. Also note, in the userspace we only add a 20s' delay, then reinject the syn packet to the kernel: # iptables -t raw -I OUTPUT -p tcp --syn -j NFQUEUE --queue-num 1 Step 3. Using "nc 2.2.2.2 1234" to connect the server. Step 4. Now remove the nf_synproxy_core.ko quickly: # iptables -F FORWARD # rmmod ipt_SYNPROXY # rmmod nf_synproxy_core Step 5. After 20s' delay, the syn packet is reinjected to the kernel. Now you will see the panic like this: kernel BUG at net/netfilter/nf_conntrack_extend.c:91! Call Trace: ? __nf_ct_ext_add_length+0x53/0x3c0 [nf_conntrack] init_conntrack+0x12b/0x600 [nf_conntrack] nf_conntrack_in+0x4cc/0x580 [nf_conntrack] ipv4_conntrack_local+0x48/0x50 [nf_conntrack_ipv4] nf_reinject+0x104/0x270 nfqnl_recv_verdict+0x3e1/0x5f9 [nfnetlink_queue] ? nfqnl_recv_verdict+0x5/0x5f9 [nfnetlink_queue] ? nla_parse+0xa0/0x100 nfnetlink_rcv_msg+0x175/0x6a9 [nfnetlink] [...] One possible solution is to make NF_CT_EXT_SYNPROXY extend built-in, i.e. introduce nf_conntrack_synproxy.c and only do ct extend register and unregister in it, similar to nf_conntrack_timeout.c. But having such a obscure restriction of nf_ct_extend_unregister is not a good idea, so we should invoke synchronize_rcu after set nf_ct_ext_types to NULL, and check the NULL pointer when do __nf_ct_ext_add_length. Then it will be easier if we add new ct extend in the future. Last, we use kfree_rcu to free nf_ct_ext, so rcu_barrier() is unnecessary anymore, remove it too. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-27netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash tableLiping Zhang
The nf_ct_helper_hash table is protected by nf_ct_helper_mutex, while nfct_helper operation is protected by nfnl_lock(NFNL_SUBSYS_CTHELPER). So it's possible that one CPU is walking the nf_ct_helper_hash for cthelper add/get/del, another cpu is doing nf_conntrack_helpers_unregister at the same time. This is dangrous, and may cause use after free error. Note, delete operation will flush all cthelpers added via nfnetlink, so using rcu to do protect is not easy. Now introduce a dummy list to record all the cthelpers added via nfnetlink, then we can walk the dummy list instead of walking the nf_ct_helper_hash. Also, keep nfnl_cthelper_dump_table unchanged, it may be invoked without nfnl_lock(NFNL_SUBSYS_CTHELPER) held. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-27netfilter: invoke synchronize_rcu after set the _hook_ to NULLLiping Zhang
Otherwise, another CPU may access the invalid pointer. For example: CPU0 CPU1 - rcu_read_lock(); - pfunc = _hook_; _hook_ = NULL; - mod unload - - pfunc(); // invalid, panic - rcu_read_unlock(); So we must call synchronize_rcu() to wait the rcu reader to finish. Also note, in nf_nat_snmp_basic_fini, synchronize_rcu() will be invoked by later nf_conntrack_helper_unregister, but I'm inclined to add a explicit synchronize_rcu after set the nf_nat_snmp_hook to NULL. Depend on such obscure assumptions is not a good idea. Last, in nfnetlink_cttimeout, we use kfree_rcu to free the time object, so in cttimeout_exit, invoking rcu_barrier() is not necessary at all, remove it too. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-24ping: implement proper lockingEric Dumazet
We got a report of yet another bug in ping http://www.openwall.com/lists/oss-security/2017/03/24/6 ->disconnect() is not called with socket lock held. Fix this by acquiring ping rwlock earlier. Thanks to Daniel, Alexander and Andrey for letting us know this problem. Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Daniel Jiang <danieljiang0415@gmail.com> Reported-by: Solar Designer <solar@openwall.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24secure_seq: downgrade to per-host timestamp offsetsFlorian Westphal
Unfortunately too many devices (not under our control) use tcp_tw_recycle=1, which depends on timestamps being identical of the same saddr. Although tcp_tw_recycle got removed in net-next we can't make such end hosts disappear so downgrade to per-host timestamp offsets. Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Reported-by: Yvan Vanrossomme <yvan@vanrossomme.net> Fixes: 95a22caee396c ("tcp: randomize tcp timestamp offsets for each connection") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Do not allow negative values for busy_read and busy_poll sysctl interfacesAlexander Duyck
This change basically codifies what I think was already the limitations on the busy_poll and busy_read sysctl interfaces. We weren't checking the lower bounds and as such could input negative values. The behavior when that was used was dependent on the architecture. In order to prevent any issues with that I am just disabling support for values less than 0 since this way we don't have to worry about any odd behaviors. By limiting the sysctl values this way it also makes it consistent with how we handle the SO_BUSY_POLL socket option since the value appears to be reported as a signed integer value and negative values are rejected. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24Merge tag 'ceph-for-4.11-rc4' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fix from Ilya Dryomov: "A fix for a writeback deadlock caused by a GFP_KERNEL allocation on the reclaim path, tagged for stable" * tag 'ceph-for-4.11-rc4' of git://github.com/ceph/ceph-client: libceph: force GFP_NOIO for socket allocations
2017-03-24kcm: return immediately after copy_from_user() failureWANG Cong
There is no reason to continue after a copy_from_user() failure. Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23net: neigh: guard against NULL solicit() methodEric Dumazet
Dmitry posted a nice reproducer of a bug triggering in neigh_probe() when dereferencing a NULL neigh->ops->solicit method. This can happen for arp_direct_ops/ndisc_direct_ops and similar, which can be used for NUD_NOARP neighbours (created when dev->header_ops is NULL). Admin can then force changing nud_state to some other state that would fire neigh timer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Several netfilter fixes from Pablo and the crew: - Handle fragmented packets properly in netfilter conntrack, from Florian Westphal. - Fix SCTP ICMP packet handling, from Ying Xue. - Fix big-endian bug in nftables, from Liping Zhang. - Fix alignment of fake conntrack entry, from Steven Rostedt. 2) Fix feature flags setting in fjes driver, from Taku Izumi. 3) Openvswitch ipv6 tunnel source address not set properly, from Or Gerlitz. 4) Fix jumbo MTU handling in amd-xgbe driver, from Thomas Lendacky. 5) sk->sk_frag.page not released properly in some cases, from Eric Dumazet. 6) Fix RTNL deadlocks in nl80211, from Johannes Berg. 7) Fix erroneous RTNL lockdep splat in crypto, from Herbert Xu. 8) Cure improper inflight handling during AF_UNIX GC, from Andrey Ulanov. 9) sch_dsmark doesn't write to packet headers properly, from Eric Dumazet. 10) Fix SCM_TIMESTAMPING_OPT_STATS handling in TCP, from Soheil Hassas Yeganeh. 11) Add some IDs for Motorola qmi_wwan chips, from Tony Lindgren. 12) Fix nametbl deadlock in tipc, from Ying Xue. 13) GRO and LRO packets not counted correctly in mlx5 driver, from Gal Pressman. 14) Fix reset of internal PHYs in bcmgenet, from Doug Berger. 15) Fix hashmap allocation handling, from Alexei Starovoitov. 16) nl_fib_input() needs stronger netlink message length checking, from Eric Dumazet. 17) Fix double-free of sk->sk_filter during sock clone, from Daniel Borkmann. 18) Fix RX checksum offloading in aquantia driver, from Pavel Belous. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits) net:ethernet:aquantia: Fix for RX checksum offload. amd-xgbe: Fix the ECC-related bit position definitions sfc: cleanup a condition in efx_udp_tunnel_del() Bluetooth: btqcomsmd: fix compile-test dependency inet: frag: release spinlock before calling icmp_send() tcp: initialize icsk_ack.lrcvtime at session start time genetlink: fix counting regression on ctrl_dumpfamily() socket, bpf: fix sk_filter use after free in sk_clone_lock ipv4: provide stronger user input validation in nl_fib_input() bpf: fix hashmap extra_elems logic enic: update enic maintainers net: bcmgenet: remove bcmgenet_internal_phy_setup() ipv6: make sure to initialize sockc.tsflags before first use fjes: Do not load fjes driver if extended socket device is not power on. fjes: Do not load fjes driver if system does not have extended socket device. net/mlx5e: Count LRO packets correctly net/mlx5e: Count GSO packets correctly net/mlx5: Increase number of max QPs in default profile net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps net/mlx5e: Use the proper UAPI values when offloading TC vlan actions ...
2017-03-23libceph: force GFP_NOIO for socket allocationsIlya Dryomov
sock_alloc_inode() allocates socket+inode and socket_wq with GFP_KERNEL, which is not allowed on the writeback path: Workqueue: ceph-msgr con_work [libceph] ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000 0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00 ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148 Call Trace: [<ffffffff816dd629>] schedule+0x29/0x70 [<ffffffff816e066d>] schedule_timeout+0x1bd/0x200 [<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120 [<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70 [<ffffffff816deb5f>] wait_for_completion+0xbf/0x180 [<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390 [<ffffffff81086335>] flush_work+0x165/0x250 [<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0 [<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs] [<ffffffff816d6b42>] ? __slab_free+0xee/0x234 [<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs] [<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30 [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs] [<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs] [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs] [<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs] [<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40 [<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs] [<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs] [<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs] [<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs] [<ffffffff811c0c18>] super_cache_scan+0x178/0x180 [<ffffffff8115912e>] shrink_slab_node+0x14e/0x340 [<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450 [<ffffffff8115af70>] shrink_slab+0x100/0x140 [<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490 [<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0 [<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be [<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40 [<ffffffff8119743e>] alloc_pages_current+0x9e/0x110 [<ffffffff811a0ac5>] new_slab+0x2c5/0x390 [<ffffffff816d71c4>] __slab_alloc+0x33b/0x459 [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0 [<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0 [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0 [<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0 [<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0 [<ffffffff811d8566>] alloc_inode+0x26/0xa0 [<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70 [<ffffffff815b933e>] sock_alloc+0x1e/0x80 [<ffffffff815ba855>] __sock_create+0x95/0x220 [<ffffffff815baa04>] sock_create_kern+0x24/0x30 [<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph] [<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd] [<ffffffff81084c19>] process_one_work+0x159/0x4f0 [<ffffffff8108561b>] worker_thread+0x11b/0x530 [<ffffffff81085500>] ? create_worker+0x1d0/0x1d0 [<ffffffff8108b6f9>] kthread+0xc9/0xe0 [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90 [<ffffffff816e1b98>] ret_from_fork+0x58/0x90 [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90 Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here. Cc: stable@vger.kernel.org # 3.10+, needs backporting Link: http://tracker.ceph.com/issues/19309 Reported-by: Sergey Jerusalimov <wintchester@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2017-03-22inet: frag: release spinlock before calling icmp_send()Eric Dumazet
Dmitry reported a lockdep splat [1] (false positive) that we can fix by releasing the spinlock before calling icmp_send() from ip_expire() This is a false positive because sending an ICMP message can not possibly re-enter the IP frag engine. [1] [ INFO: possible circular locking dependency detected ] 4.10.0+ #29 Not tainted ------------------------------------------------------- modprobe/12392 is trying to acquire lock: (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] spin_lock include/linux/spinlock.h:299 [inline] (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] __netif_tx_lock include/linux/netdevice.h:3486 [inline] (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 but task is already holding lock: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock include/linux/spinlock.h:299 [inline] (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&q->lock)->rlock){+.-...}: validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] ip_defrag+0x3a2/0x4130 net/ipv4/ip_fragment.c:669 ip_check_defrag+0x4e3/0x8b0 net/ipv4/ip_fragment.c:713 packet_rcv_fanout+0x282/0x800 net/packet/af_packet.c:1459 deliver_skb net/core/dev.c:1834 [inline] dev_queue_xmit_nit+0x294/0xa90 net/core/dev.c:1890 xmit_one net/core/dev.c:2903 [inline] dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2923 sch_direct_xmit+0x31f/0x6d0 net/sched/sch_generic.c:182 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_resolve_output+0x6b9/0xb10 net/core/neighbour.c:1308 neigh_output include/net/neighbour.h:478 [inline] ip_finish_output2+0x8b8/0x15a0 net/ipv4/ip_output.c:228 ip_do_fragment+0x1d93/0x2720 net/ipv4/ip_output.c:672 ip_fragment.constprop.54+0x145/0x200 net/ipv4/ip_output.c:545 ip_finish_output+0x82d/0xe10 net/ipv4/ip_output.c:314 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 raw_sendmsg+0x26de/0x3a00 net/ipv4/raw.c:655 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 ___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985 __sys_sendmmsg+0x25c/0x750 net/socket.c:2075 SYSC_sendmmsg net/socket.c:2106 [inline] SyS_sendmmsg+0x35/0x60 net/socket.c:2101 do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281 return_from_SYSCALL_64+0x0/0x7a -> #0 (_xmit_ETHER#2){+.-...}: check_prev_add kernel/locking/lockdep.c:1830 [inline] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940 validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] __netif_tx_lock include/linux/netdevice.h:3486 [inline] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_hh_output include/net/neighbour.h:468 [inline] neigh_output include/net/neighbour.h:476 [inline] ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228 ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394 icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754 ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239 call_timer_fn+0x241/0x820 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 [inline] __run_timers+0x960/0xcf0 kernel/time/timer.c:1601 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:657 [inline] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 __read_once_size include/linux/compiler.h:254 [inline] atomic_read arch/x86/include/asm/atomic.h:26 [inline] rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline] __rcu_is_watching kernel/rcu/tree.c:1133 [inline] rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147 rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293 radix_tree_deref_slot include/linux/radix-tree.h:238 [inline] filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335 do_fault_around mm/memory.c:3231 [inline] do_read_fault mm/memory.c:3265 [inline] do_fault+0xbd5/0x2080 mm/memory.c:3370 handle_pte_fault mm/memory.c:3600 [inline] __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714 handle_mm_fault+0x1e2/0x480 mm/memory.c:3751 __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397 do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460 page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&q->lock)->rlock); lock(_xmit_ETHER#2); lock(&(&q->lock)->rlock); lock(_xmit_ETHER#2); *** DEADLOCK *** 10 locks held by modprobe/12392: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff81329758>] __do_page_fault+0x2b8/0xb60 arch/x86/mm/fault.c:1336 #1: (rcu_read_lock){......}, at: [<ffffffff8188cab6>] filemap_map_pages+0x1e6/0x1570 mm/filemap.c:2324 #2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] spin_lock include/linux/spinlock.h:299 [inline] #2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] pte_alloc_one_map mm/memory.c:2944 [inline] #2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] alloc_set_pte+0x13b8/0x1b90 mm/memory.c:3072 #3: (((&q->timer))){+.-...}, at: [<ffffffff81627e72>] lockdep_copy_map include/linux/lockdep.h:175 [inline] #3: (((&q->timer))){+.-...}, at: [<ffffffff81627e72>] call_timer_fn+0x1c2/0x820 kernel/time/timer.c:1258 #4: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock include/linux/spinlock.h:299 [inline] #4: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201 #5: (rcu_read_lock){......}, at: [<ffffffff8389a633>] ip_expire+0x1b3/0x6c0 net/ipv4/ip_fragment.c:216 #6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] spin_trylock include/linux/spinlock.h:309 [inline] #6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_xmit_lock net/ipv4/icmp.c:219 [inline] #6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_send+0x803/0x1c80 net/ipv4/icmp.c:681 #7: (rcu_read_lock_bh){......}, at: [<ffffffff838ab9a1>] ip_finish_output2+0x2c1/0x15a0 net/ipv4/ip_output.c:198 #8: (rcu_read_lock_bh){......}, at: [<ffffffff836d1dee>] __dev_queue_xmit+0x23e/0x1e60 net/core/dev.c:3324 #9: (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at: [<ffffffff836d3a27>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 stack backtrace: CPU: 0 PID: 12392 Comm: modprobe Not tainted 4.10.0+ #29 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:52 print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204 check_prev_add kernel/locking/lockdep.c:1830 [inline] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940 validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] __netif_tx_lock include/linux/netdevice.h:3486 [inline] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_hh_output include/net/neighbour.h:468 [inline] neigh_output include/net/neighbour.h:476 [inline] ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228 ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394 icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754 ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239 call_timer_fn+0x241/0x820 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 [inline] __run_timers+0x960/0xcf0 kernel/time/timer.c:1601 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:657 [inline] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline] RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline] RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline] RIP: 0010:__rcu_is_watching kernel/rcu/tree.c:1133 [inline] RIP: 0010:rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147 RSP: 0000:ffff8801c391f120 EFLAGS: 00000a03 ORIG_RAX: ffffffffffffff10 RAX: dffffc0000000000 RBX: ffff8801c391f148 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000055edd4374000 RDI: ffff8801dbe1ae0c RBP: ffff8801c391f1a0 R08: 0000000000000002 R09: 0000000000000000 R10: dffffc0000000000 R11: 0000000000000002 R12: 1ffff10038723e25 R13: ffff8801dbe1ae00 R14: ffff8801c391f680 R15: dffffc0000000000 </IRQ> rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293 radix_tree_deref_slot include/linux/radix-tree.h:238 [inline] filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335 do_fault_around mm/memory.c:3231 [inline] do_read_fault mm/memory.c:3265 [inline] do_fault+0xbd5/0x2080 mm/memory.c:3370 handle_pte_fault mm/memory.c:3600 [inline] __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714 handle_mm_fault+0x1e2/0x480 mm/memory.c:3751 __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397 do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460 page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011 RIP: 0033:0x7f83172f2786 RSP: 002b:00007fffe859ae80 EFLAGS: 00010293 RAX: 000055edd4373040 RBX: 00007f83175111c8 RCX: 000055edd4373238 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f8317510970 RBP: 00007fffe859afd0 R08: 0000000000000009 R09: 0000000000000000 R10: 0000000000000064 R11: 0000000000000000 R12: 000055edd4373040 R13: 0000000000000000 R14: 00007fffe859afe8 R15: 0000000000000000 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22tcp: initialize icsk_ack.lrcvtime at session start timeEric Dumazet
icsk_ack.lrcvtime has a 0 value at socket creation time. tcpi_last_data_recv can have bogus value if no payload is ever received. This patch initializes icsk_ack.lrcvtime for active sessions in tcp_finish_connect(), and for passive sessions in tcp_create_openreq_child() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22genetlink: fix counting regression on ctrl_dumpfamily()Stanislaw Gruszka
Commit 2ae0f17df1cd ("genetlink: use idr to track families") replaced if (++n < fams_to_skip) continue; into: if (n++ < fams_to_skip) continue; This subtle change cause that on retry ctrl_dumpfamily() call we omit one family that failed to do ctrl_fill_info() on previous call, because cb->args[0] = n number counts also family that failed to do ctrl_fill_info(). Patch fixes the problem and avoid confusion in the future just decrease n counter when ctrl_fill_info() fail. User visible problem caused by this bug is failure to get access to some genetlink family i.e. nl80211. However problem is reproducible only if number of registered genetlink families is big enough to cause second call of ctrl_dumpfamily(). Cc: Xose Vazquez Perez <xose.vazquez@gmail.com> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: Johannes Berg <johannes@sipsolutions.net> Fixes: 2ae0f17df1cd ("genetlink: use idr to track families") Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>