summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-06-14net: don't global ICMP rate limit packets originating from loopbackJesper Dangaard Brouer
Florian Weimer seems to have a glibc test-case which requires that loopback interfaces does not get ICMP ratelimited. This was broken by commit c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited"). An ICMP response will usually be routed back-out the same incoming interface. Thus, take advantage of this and skip global ICMP ratelimit when the incoming device is loopback. In the unlikely event that the outgoing it not loopback, due to strange routing policy rules, ICMP rate limiting still works via peer ratelimiting via icmpv4_xrlim_allow(). Thus, we should still comply with RFC1812 (section 4.3.2.8 "Rate Limiting"). This seems to fix the reproducer given by Florian. While still avoiding to perform expensive and unneeded outgoing route lookup for rate limited packets (in the non-loopback case). Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited") Reported-by: Florian Weimer <fweimer@redhat.com> Reported-by: "H.J. Lu" <hjl.tools@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14net/act_pedit: fix an error codeDan Carpenter
I'm reviewing static checker warnings where we do ERR_PTR(0), which is the same as NULL. I'm pretty sure we intended to return ERR_PTR(-EINVAL) here. Sometimes these bugs lead to a NULL dereference but I don't immediately see that problem here. Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14net_sched: move tcf_lock down after gen_replace_estimator()WANG Cong
Laura reported a sleep-in-atomic kernel warning inside tcf_act_police_init() which calls gen_replace_estimator() with spinlock protection. It is not necessary in this case, we already have RTNL lock here so it is enough to protect concurrent writers. For the reader, i.e. tcf_act_police(), it needs to make decision based on this rate estimator, in the worst case we drop more/less packets than necessary while changing the rate in parallel, it is still acceptable. Reported-by: Laura Abbott <labbott@redhat.com> Reported-by: Nick Huber <nicholashuber@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13caif: Add sockaddr length check before accessing sa_family in connect handlerMateusz Jurczyk
Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in the connect() handler of the AF_CAIF socket. Since the syscall doesn't enforce a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing sa_family. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13Merge tag 'batadv-net-for-davem-20170613' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here are two batman-adv bugfixes: - fix rx packet counters for local ARP replies, by Sven Eckelmann - fix memory leaks for unicast packetes received from another gateway in bridge loop avoidance, by Andreas Pape ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13Merge tag 'mac80211-for-davem-2017-06-13' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Some fixes: * Avi fixes some fallout from my mac80211 RX flags changes * Emmanuel fixes an issue with adhering to the spec, and an oversight in the SMPS management code * Jason's patch makes mac80211 use constant-time memory comparisons for message authentication, to avoid having potentially observable timing differences * my fix makes mac80211 set the basic rates bitmap before the channel so the next update to the driver has more consistent data - this required another rework patch to remove some useless 5/10 MHz code that can never be hit ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13igmp: acquire pmc lock for ip_mc_clear_src()WANG Cong
Andrey reported a use-after-free in add_grec(): for (psf = *psf_list; psf; psf = psf_next) { ... psf_next = psf->sf_next; where the struct ip_sf_list's were already freed by: kfree+0xe8/0x2b0 mm/slub.c:3882 ip_mc_clear_src+0x69/0x1c0 net/ipv4/igmp.c:2078 ip_mc_dec_group+0x19a/0x470 net/ipv4/igmp.c:1618 ip_mc_drop_socket+0x145/0x230 net/ipv4/igmp.c:2609 inet_release+0x4e/0x1c0 net/ipv4/af_inet.c:411 sock_release+0x8d/0x1e0 net/socket.c:597 sock_close+0x16/0x20 net/socket.c:1072 This happens because we don't hold pmc->lock in ip_mc_clear_src() and a parallel mr_ifc_timer timer could jump in and access them. The RCU lock is there but it is merely for pmc itself, this spinlock could actually ensure we don't access them in parallel. Thanks to Eric and Long for discussion on this bug. Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13net: rps: fix uninitialized symbol warningAshwanth Goli
This patch fixes uninitialized symbol warning that got introduced by the following commit 773fc8f6e8d6 ("net: rps: send out pending IPI's on CPU hotplug") Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13mac80211: don't send SMPS action frame in AP mode when not neededEmmanuel Grumbach
mac80211 allows to modify the SMPS state of an AP both, when it is started, and after it has been started. Such a change will trigger an action frame to all the peers that are currently connected, and will be remembered so that new peers will get notified as soon as they connect (since the SMPS setting in the beacon may not be the right one). This means that we need to remember the SMPS state currently requested as well as the SMPS state that was configured initially (and advertised in the beacon). The former is bss->req_smps and the latter is sdata->smps_mode. Initially, the AP interface could only be started with SMPS_OFF, which means that sdata->smps_mode was SMPS_OFF always. Later, a nl80211 API was added to be able to start an AP with a different AP mode. That code forgot to update bss->req_smps and because of that, if the AP interface was started with SMPS_DYNAMIC, we had: sdata->smps_mode = SMPS_DYNAMIC bss->req_smps = SMPS_OFF That configuration made mac80211 think it needs to fire off an action frame to any new station connecting to the AP in order to let it know that the actual SMPS configuration is SMPS_OFF. Fix that by properly setting bss->req_smps in ieee80211_start_ap. Fixes: f69931748730 ("mac80211: set smps_mode according to ap params") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-06-13mac80211/wpa: use constant time memory comparison for MACsJason A. Donenfeld
Otherwise, we enable all sorts of forgeries via timing attack. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: linux-wireless@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-06-13mac80211: set bss_info data before configuring the channelJohannes Berg
When mac80211 changes the channel, it also calls into the driver's bss_info_changed() callback, e.g. with BSS_CHANGED_IDLE. The driver may, like iwlwifi does, access more data from bss_info in that case and iwlwifi accesses the basic_rates bitmap, but if changing from a band with more (basic) rates to one with fewer, an out-of-bounds access of the rate array may result. While we can't avoid having invalid data at some point in time, we can avoid having it while we call the driver - so set up all the data before configuring the channel, and then apply it afterwards. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=195677 Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de> Tested-by: Johannes Hirte <johannes.hirte@datenkhaos.de> Debugged-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-06-13mac80211: remove 5/10 MHz rate code from station MLMEJohannes Berg
There's no need for the station MLME code to handle bitrates for 5 or 10 MHz channels when it can't ever create such a configuration. Remove the unnecessary code. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-06-13mac80211: Fix incorrect condition when checking rx timestampAvraham Stern
If the driver reports the rx timestamp at PLCP start, mac80211 can only handle legacy encoding, but the code checks that the encoding is not legacy. Fix this. Fixes: da6a4352e7c8 ("mac80211: separate encoding/bandwidth from flags") Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-06-13mac80211: don't look at the PM bit of BAR framesEmmanuel Grumbach
When a peer sends a BAR frame with PM bit clear, we should not modify its PM state as madated by the spec in 802.11-20012 10.2.1.2. Cc: stable@vger.kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-06-12hsr: fix incorrect warningKaricheri, Muralidharan
When HSR interface is setup using ip link command, an annoying warning appears with the trace as below:- [ 203.019828] hsr_get_node: Non-HSR frame [ 203.019833] Modules linked in: [ 203.019848] CPU: 0 PID: 158 Comm: sd-resolve Tainted: G W 4.12.0-rc3-00052-g9fa6bf70 #2 [ 203.019853] Hardware name: Generic DRA74X (Flattened Device Tree) [ 203.019869] [<c0110280>] (unwind_backtrace) from [<c010c2f4>] (show_stack+0x10/0x14) [ 203.019880] [<c010c2f4>] (show_stack) from [<c04b9f64>] (dump_stack+0xac/0xe0) [ 203.019894] [<c04b9f64>] (dump_stack) from [<c01374e8>] (__warn+0xd8/0x104) [ 203.019907] [<c01374e8>] (__warn) from [<c0137548>] (warn_slowpath_fmt+0x34/0x44) root@am57xx-evm:~# [ 203.019921] [<c0137548>] (warn_slowpath_fmt) from [<c081126c>] (hsr_get_node+0x148/0x170) [ 203.019932] [<c081126c>] (hsr_get_node) from [<c0814240>] (hsr_forward_skb+0x110/0x7c0) [ 203.019942] [<c0814240>] (hsr_forward_skb) from [<c0811d64>] (hsr_dev_xmit+0x2c/0x34) [ 203.019954] [<c0811d64>] (hsr_dev_xmit) from [<c06c0828>] (dev_hard_start_xmit+0xc4/0x3bc) [ 203.019963] [<c06c0828>] (dev_hard_start_xmit) from [<c06c13d8>] (__dev_queue_xmit+0x7c4/0x98c) [ 203.019974] [<c06c13d8>] (__dev_queue_xmit) from [<c0782f54>] (ip6_finish_output2+0x330/0xc1c) [ 203.019983] [<c0782f54>] (ip6_finish_output2) from [<c0788f0c>] (ip6_output+0x58/0x454) [ 203.019994] [<c0788f0c>] (ip6_output) from [<c07b16cc>] (mld_sendpack+0x420/0x744) As this is an expected path to hsr_get_node() with frame coming from the master interface, add a check to ensure packet is not from the master port and then warn. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-12proc: snmp6: Use correct type in memsetChristian Perle
Reading /proc/net/snmp6 yields bogus values on 32 bit kernels. Use "u64" instead of "unsigned long" in sizeof(). Fixes: 4a4857b1c81e ("proc: Reduce cache miss in snmp6_seq_show") Signed-off-by: Christian Perle <christian.perle@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-11net: ipmr: Fix some mroute forwarding issues in vrf'sDonald Sharp
This patch fixes two issues: 1) When forwarding on *,G mroutes that are in a vrf, the kernel was dropping information about the actual incoming interface when calling ip_mr_forward from ip_mr_input. This caused ip_mr_forward to send the multicast packet back out the incoming interface. Fix this by modifying ip_mr_forward to be handed the correctly resolved dev. 2) When a unresolved cache entry is created we store the incoming skb on the unresolved cache entry and upon mroute resolution from the user space daemon, we attempt to forward the packet. Again we were not resolving to the correct incoming device for a vrf scenario, before calling ip_mr_forward. Fix this by resolving to the correct interface and calling ip_mr_forward with the result. Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast") Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-10net: tipc: Fix a sleep-in-atomic bug in tipc_msg_reverseJia-Ju Bai
The kernel may sleep under a rcu read lock in tipc_msg_reverse, and the function call path is: tipc_l2_rcv_msg (acquire the lock by rcu_read_lock) tipc_rcv tipc_sk_rcv tipc_msg_reverse pskb_expand_head(GFP_KERNEL) --> may sleep tipc_node_broadcast tipc_node_xmit_skb tipc_node_xmit tipc_sk_rcv tipc_msg_reverse pskb_expand_head(GFP_KERNEL) --> may sleep To fix it, "GFP_KERNEL" is replaced with "GFP_ATOMIC". Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-10net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfxJia-Ju Bai
The kernel may sleep under a rcu read lock in cfpkt_create_pfx, and the function call path is: cfcnfg_linkup_rsp (acquire the lock by rcu_read_lock) cfctrl_linkdown_req cfpkt_create cfpkt_create_pfx alloc_skb(GFP_KERNEL) --> may sleep cfserl_receive (acquire the lock by rcu_read_lock) cfpkt_split cfpkt_create_pfx alloc_skb(GFP_KERNEL) --> may sleep There is "in_interrupt" in cfpkt_create_pfx to decide use "GFP_KERNEL" or "GFP_ATOMIC". In this situation, "GFP_KERNEL" is used because the function is called under a rcu read lock, instead in interrupt. To fix it, only "GFP_ATOMIC" is used in cfpkt_create_pfx. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-10sctp: disable BH in sctp_for_each_endpointXin Long
Now sctp holds read_lock when foreach sctp_ep_hashtable without disabling BH. If CPU schedules to another thread A at this moment, the thread A may be trying to hold the write_lock with disabling BH. As BH is disabled and CPU cannot schedule back to the thread holding the read_lock, while the thread A keeps waiting for the read_lock. A dead lock would be triggered by this. This patch is to fix this dead lock by calling read_lock_bh instead to disable BH when holding the read_lock in sctp_for_each_endpoint. Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-10l2tp: cast l2tp traffic counter to unsignedDominik Heidler
This fixes a counter problem on 32bit systems: When the rx_bytes counter reached 2 GiB, it jumpd to (2^64 Bytes - 2GiB) Bytes. rtnl_link_stats64 has __u64 type and atomic_long_read returns atomic_long_t which is signed. Due to the conversation we get an incorrect value on 32bit systems if the MSB of the atomic_long_t value is set. CC: Tom Parkin <tparkin@katalix.com> Fixes: 7b7c0719cd7a ("l2tp: avoid deadlock in l2tp stats update") Signed-off-by: Dominik Heidler <dheidler@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09Merge tag 'linux-can-fixes-for-4.12-20170609' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2017-06-09 this is a pull request of 6 patches for net/master. There's a patch by Stephane Grosjean that fixes an uninitialized symbol warning in the peak_canfd driver. A patch by Johan Hovold to fix the product-id endianness in an error message in the the peak_usb driver. A patch by Oliver Hartkopp to enable CAN FD for virtual CAN devices by default. Three patches by me, one makes the helper function can_change_state() robust to be called with cf == NULL. The next patch fixes a memory leak in the gs_usb driver. And the last one fixes a lockdep splat by properly initialize the per-net can_rcvlists_lock spin_lock. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09mac80211: free netdev on dev_alloc_name() errorJohannes Berg
The change to remove free_netdev() from ieee80211_if_free() erroneously didn't add the necessary free_netdev() for when ieee80211_if_free() is called directly in one place, rather than as the priv_destructor. Add the missing call. Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09net: rps: send out pending IPI's on CPU hotplugashwanth@codeaurora.org
IPI's from the victim cpu are not handled in dev_cpu_callback. So these pending IPI's would be sent to the remote cpu only when NET_RX is scheduled on the victim cpu and since this trigger is unpredictable it would result in packet latencies on the remote cpu. This patch add support to send the pending ipi's of victim cpu. Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09Fix an intermittent pr_emerg warning about lo becoming free.Krister Johansen
It looks like this: Message from syslogd@flamingo at Apr 26 00:45:00 ... kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4 They seem to coincide with net namespace teardown. The message is emitted by netdev_wait_allrefs(). Forced a kdump in netdev_run_todo, but found that the refcount on the lo device was already 0 at the time we got to the panic. Used bcc to check the blocking in netdev_run_todo. The only places where we're off cpu there are in the rcu_barrier() and msleep() calls. That behavior is expected. The msleep time coincides with the amount of time we spend waiting for the refcount to reach zero; the rcu_barrier() wait times are not excessive. After looking through the list of callbacks that the netdevice notifiers invoke in this path, it appears that the dst_dev_event is the most interesting. The dst_ifdown path places a hold on the loopback_dev as part of releasing the dev associated with the original dst cache entry. Most of our notifier callbacks are straight-forward, but this one a) looks complex, and b) places a hold on the network interface in question. I constructed a new bcc script that watches various events in the liftime of a dst cache entry. Note that dst_ifdown will take a hold on the loopback device until the invalidated dst entry gets freed. [ __dst_free] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282115677036183 __dst_free rcu_nocb_kthread kthread ret_from_fork Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09af_unix: Add sockaddr length checks before accessing sa_family in bind and ↵Mateusz Jurczyk
connect handlers Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in bind() and connect() handlers of the AF_UNIX socket. Since neither syscall enforces a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing .sa_family. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09can: af_can: namespace support: fix lockdep splat: properly initialize spin_lockMarc Kleine-Budde
This patch uses spin_lock_init() instead of __SPIN_LOCK_UNLOCKED() to initialize the per namespace net->can.can_rcvlists_lock lock to fix this lockdep warning: | INFO: trying to register non-static key. | the code is fine but needs lockdep annotation. | turning off the locking correctness validator. | CPU: 0 PID: 186 Comm: candump Not tainted 4.12.0-rc3+ #47 | Hardware name: Marvell Kirkwood (Flattened Device Tree) | [<c0016644>] (unwind_backtrace) from [<c00139a8>] (show_stack+0x18/0x1c) | [<c00139a8>] (show_stack) from [<c0058c8c>] (register_lock_class+0x1e4/0x55c) | [<c0058c8c>] (register_lock_class) from [<c005bdfc>] (__lock_acquire+0x148/0x1990) | [<c005bdfc>] (__lock_acquire) from [<c005deec>] (lock_acquire+0x174/0x210) | [<c005deec>] (lock_acquire) from [<c04a6780>] (_raw_spin_lock+0x50/0x88) | [<c04a6780>] (_raw_spin_lock) from [<bf02116c>] (can_rx_register+0x94/0x15c [can]) | [<bf02116c>] (can_rx_register [can]) from [<bf02a868>] (raw_enable_filters+0x60/0xc0 [can_raw]) | [<bf02a868>] (raw_enable_filters [can_raw]) from [<bf02ac14>] (raw_enable_allfilters+0x2c/0xa0 [can_raw]) | [<bf02ac14>] (raw_enable_allfilters [can_raw]) from [<bf02ad38>] (raw_bind+0xb0/0x250 [can_raw]) | [<bf02ad38>] (raw_bind [can_raw]) from [<c03b5fb8>] (SyS_bind+0x70/0xac) | [<c03b5fb8>] (SyS_bind) from [<c000f8c0>] (ret_fast_syscall+0x0/0x1c) Cc: Mario Kicherer <dev@kicherer.org> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-06-08ila_xlat: add missing hash secret initializationArnd Bergmann
While discussing the possible merits of clang warning about unused initialized functions, I found one function that was clearly meant to be called but never actually is. __ila_hash_secret_init() initializes the hash value for the ila locator, apparently this is intended to prevent hash collision attacks, but this ends up being a read-only zero constant since there is no caller. I could find no indication of why it was never called, the earliest patch submission for the module already was like this. If my interpretation is right, we certainly want to backport the patch to stable kernels as well. I considered adding it to the ila_xlat_init callback, but for best effect the random data is read as late as possible, just before it is first used. The underlying net_get_random_once() is already highly optimized to avoid overhead when called frequently. Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility") Cc: stable@vger.kernel.org Link: https://www.spinics.net/lists/kernel/msg2527243.html Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08net: ipv6: Release route when device is unregisteringDavid Ahern
Roopa reported attempts to delete a bond device that is referenced in a multipath route is hanging: $ ifdown bond2 # ifupdown2 command that deletes virtual devices unregister_netdevice: waiting for bond2 to become free. Usage count = 2 Steps to reproduce: echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown ip link add dev bond12 type bond ip link add dev bond13 type bond ip addr add 2001:db8:2::0/64 dev bond12 ip addr add 2001:db8:3::0/64 dev bond13 ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2 ip link del dev bond12 ip link del dev bond13 The root cause is the recent change to keep routes on a linkdown. Update the check to detect when the device is unregistering and release the route for that case. Fixes: a1a22c12060e4 ("net: ipv6: Keep nexthop of multipath route on admin down") Reported-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08net: Zero ifla_vf_info in rtnl_fill_vfinfo()Mintz, Yuval
Some of the structure's fields are not initialized by the rtnetlink. If driver doesn't set those in ndo_get_vf_config(), they'd leak memory to user. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> CC: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skbMateusz Jurczyk
Verify that the length of the socket buffer is sufficient to cover the nlmsghdr structure before accessing the nlh->nlmsg_len field for further input sanitization. If the client only supplies 1-3 bytes of data in sk_buff, then nlh->nlmsg_len remains partially uninitialized and contains leftover memory from the corresponding kernel allocation. Operating on such data may result in indeterminate evaluation of the nlmsg_len < sizeof(*nlh) expression. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. The patch prevents this and other similar tools (e.g. KMSAN) from flagging this behavior in the future. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08Revert "decnet: dn_rtmsg: Improve input length sanitization in ↵David S. Miller
dnrmg_receive_user_skb" This reverts commit 85eac2ba35a2dbfbdd5767c7447a4af07444a5b4. There is an updated version of this fix which we should use instead. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skbMateusz Jurczyk
Verify that the length of the socket buffer is sufficient to cover the entire nlh->nlmsg_len field before accessing that field for further input sanitization. If the client only supplies 1-3 bytes of data in sk_buff, then nlh->nlmsg_len remains partially uninitialized and contains leftover memory from the corresponding kernel allocation. Operating on such data may result in indeterminate evaluation of the nlmsg_len < sizeof(*nlh) expression. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. The patch prevents this and other similar tools (e.g. KMSAN) from flagging this behavior in the future. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07net: Fix inconsistent teardown and release of private netdev state.David S. Miller
Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07net: don't call strlen on non-terminated string in dev_set_alias()Alexander Potapenko
KMSAN reported a use of uninitialized memory in dev_set_alias(), which was caused by calling strlcpy() (which in turn called strlen()) on the user-supplied non-terminated string. Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Made TCP congestion control documentation match current reality, from Anmol Sarma. 2) Various build warning and failure fixes from Arnd Bergmann. 3) Fix SKB list leak in ipv6_gso_segment(). 4) Use after free in ravb driver, from Eugeniu Rosca. 5) Don't use udp_poll() in ping protocol driver, from Eric Dumazet. 6) Don't crash in PCI error recovery of cxgb4 driver, from Guilherme Piccoli. 7) _SRC_NAT_DONE_BIT needs to be cleared using atomics, from Liping Zhang. 8) Use after free in vxlan deletion, from Mark Bloch. 9) Fix ordering of NAPI poll enabled in ethoc driver, from Max Filippov. 10) Fix stmmac hangs with TSO, from Niklas Cassel. 11) Fix crash in CALIPSO ipv6, from Richard Haines. 12) Clear nh_flags properly on mpls link up. From Roopa Prabhu. 13) Fix regression in sk_err socket error queue handling, noticed by ping applications. From Soheil Hassas Yeganeh. 14) Update mlx4/mlx5 MAINTAINERS information. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits) net: stmmac: fix a broken u32 less than zero check net: stmmac: fix completely hung TX when using TSO net: ethoc: enable NAPI before poll may be scheduled net: bridge: fix a null pointer dereference in br_afspec ravb: Fix use-after-free on `ifconfig eth0 down` net/ipv6: Fix CALIPSO causing GPF with datagram support net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value Revert "sit: reload iphdr in ipip6_rcv" i40e/i40evf: proper update of the page_offset field i40e: Fix state flags for bit set and clean operations of PF iwlwifi: fix host command memory leaks iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265 iwlwifi: mvm: clear new beacon command template struct iwlwifi: mvm: don't fail when removing a key from an inexisting sta iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3 iwlwifi: mvm: fix firmware debug restart recording iwlwifi: tt: move ucode_loaded check under mutex iwlwifi: mvm: support ibss in dqa mode iwlwifi: mvm: Fix command queue number on d0i3 flow iwlwifi: mvm: rs: start using LQ command color ...
2017-06-06net: bridge: fix a null pointer dereference in br_afspecNikolay Aleksandrov
We might call br_afspec() with p == NULL which is a valid use case if the action is on the bridge device itself, but the bridge tunnel code dereferences the p pointer without checking, so check if p is null first. Reported-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06net/ipv6: Fix CALIPSO causing GPF with datagram supportRichard Haines
When using CALIPSO with IPPROTO_UDP it is possible to trigger a GPF as the IP header may have moved. Also update the payload length after adding the CALIPSO option. Signed-off-by: Richard Haines <richard_c_haines@btinternet.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06Revert "sit: reload iphdr in ipip6_rcv"David S. Miller
This reverts commit b699d0035836f6712917a41e7ae58d84359b8ff9. As per Eric Dumazet, the pskb_may_pull() is a NOP in this particular case, so the 'iph' reload is unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05devlink: fix potential memort leakHaishuang Yan
We must free allocated skb when genlmsg_put() return fails. Fixes: 1555d204e743 ("devlink: Support for pipeline debug (dpipe)") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04sit: reload iphdr in ipip6_rcvHaishuang Yan
Since iptunnel_pull_header() can call pskb_may_pull(), we must reload any pointer that was related to skb->head. Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04net: ping: do not abuse udp_poll()Eric Dumazet
Alexander reported various KASAN messages triggered in recent kernels The problem is that ping sockets should not use udp_poll() in the first place, and recent changes in UDP stack finally exposed this old bug. Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sasha Levin <alexander.levin@verizon.com> Cc: Solar Designer <solar@openwall.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Lorenzo Colitti <lorenzo@google.com> Acked-By: Lorenzo Colitti <lorenzo@google.com> Tested-By: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04net: dsa: Fix stale cpu_switch reference after unbind then bindFlorian Fainelli
Commit 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]") replaced the use of dst->ds[0] with dst->cpu_switch since that is functionally equivalent, however, we can now run into an use after free scenario after unbinding then rebinding the switch driver. The use after free happens because we do correctly initialize dst->cpu_switch the first time we probe in dsa_cpu_parse(), then we unbind the driver: dsa_dst_unapply() is called, and we rebind again. dst->cpu_switch now points to a freed "ds" structure, and so when we finally dereference it in dsa_cpu_port_ethtool_setup(), we oops. To fix this, simply set dst->cpu_switch to NULL in dsa_dst_unapply() which guarantees that we always correctly re-assign dst->cpu_switch in dsa_cpu_parse(). Fixes: 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04ipv6: Fix leak in ipv6_gso_segment().David S. Miller
If ip6_find_1stfragopt() fails and we return an error we have to free up 'segs' because nobody else is going to. Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options") Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04sock: reset sk_err when the error queue is emptySoheil Hassas Yeganeh
Prior to f5f99309fa74 (sock: do not set sk_err in sock_dequeue_err_skb), sk_err was reset to the error of the skb on the head of the error queue. Applications, most notably ping, are relying on this behavior to reset sk_err for ICMP packets. Set sk_err to the ICMP error when there is an ICMP packet at the head of the error queue. Fixes: f5f99309fa74 (sock: do not set sk_err in sock_dequeue_err_skb) Reported-by: Cyril Hrubis <chrubis@suse.cz> Tested-by: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04ip6_tunnel: fix traffic class routing for tunnelsLiam McBirnie
ip6_route_output() requires that the flowlabel contains the traffic class for policy routing. Commit 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets") removed the code which previously added the traffic class to the flowlabel. The traffic class is added here because only route lookup needs the flowlabel to contain the traffic class. Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets") Signed-off-by: Liam McBirnie <liam.mcbirnie@boeing.com> Acked-by: Peter Dawson <peter.a.dawson@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04Merge tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Bugfixes include: - Fix a typo in commit e092693443b ("NFS append COMMIT after synchronous COPY") that breaks copy offload - Fix the connect error propagation in xs_tcp_setup_socket() - Fix a lock leak in nfs40_walk_client_list - Verify that pNFS requests lie within the offset range of the layout segment" * tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: Mark unnecessarily extern functions as static SUNRPC: ensure correct error is reported by xs_tcp_setup_socket() NFSv4.0: Fix a lock leak in nfs40_walk_client_list pnfs: Fix the check for requests in range of layout segment xprtrdma: Delete an error message for a failed memory allocation in xprt_rdma_bc_setup() pNFS/flexfiles: missing error code in ff_layout_alloc_lseg() NFS fix COMMIT after COPY
2017-06-02tcp: disallow cwnd undo when switching congestion controlYuchung Cheng
When the sender switches its congestion control during loss recovery, if the recovery is spurious then it may incorrectly revert cwnd and ssthresh to the older values set by a previous congestion control. Consider a congestion control (like BBR) that does not use ssthresh and keeps it infinite: the connection may incorrectly revert cwnd to an infinite value when switching from BBR to another congestion control. This patch fixes it by disallowing such cwnd undo operation upon switching congestion control. Note that undo_marker is not reset s.t. the packets that were incorrectly marked lost would be corrected. We only avoid undoing the cwnd in tcp_undo_cwnd_reduction(). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-02ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()Ben Hutchings
xfrm6_find_1stfragopt() may now return an error code and we must not treat it as a length. Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-02Merge tag 'mac80211-for-davem-2017-06-02' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Just two fixes: * fix the per-CPU drop counters to not be added to the rx_packets counter, but really the drop counter * fix TX aggregation start/stop callback races by setting bits instead of allocating and queueing an skb ==================== Signed-off-by: David S. Miller <davem@davemloft.net>