summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-06-14tcp: add tcp_rx_skb_cache sysctlEric Dumazet
Instead of relying on rps_needed, it is safer to use a separate static key, since we do not want to enable TCP rx_skb_cache by default. This feature can cause huge increase of memory usage on hosts with millions of sockets. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14sysctl: define proc_do_static_key()Eric Dumazet
Convert proc_dointvec_minmax_bpf_stats() into a more generic helper, since we are going to use jump labels more often. Note that sysctl_bpf_stats_enabled is removed, since it is no longer needed/used. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14hv_netvsc: Set probe mode to syncHaiyang Zhang
For better consistency of synthetic NIC names, we set the probe mode to PROBE_FORCE_SYNCHRONOUS. So the names can be aligned with the vmbus channel offer sequence. Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14net: sched: flower: don't call synchronize_rcu() on mask creationVlad Buslov
Current flower mask creating code assumes that temporary mask that is used when inserting new filter is stack allocated. To prevent race condition with data patch synchronize_rcu() is called every time fl_create_new_mask() replaces temporary stack allocated mask. As reported by Jiri, this increases runtime of creating 20000 flower classifiers from 4 seconds to 163 seconds. However, this design is no longer necessary since temporary mask was converted to be dynamically allocated by commit 2cddd2014782 ("net/sched: cls_flower: allocate mask dynamically in fl_change()"). Remove synchronize_rcu() calls from mask creation code. Instead, refactor fl_change() to always deallocate temporary mask with rcu grace period. Fixes: 195c234d15c9 ("net: sched: flower: handle concurrent mask insertion") Reported-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Tested-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14net: dsa: fix warning same module namesAnders Roxell
When building with CONFIG_NET_DSA_REALTEK_SMI and CONFIG_REALTEK_PHY enabled as loadable modules, we see the following warning: warning: same module names found: drivers/net/phy/realtek.ko drivers/net/dsa/realtek.ko Rework so the driver name is realtek-smi instead of realtek. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14sctp: Free cookie before we memdup a new oneNeil Horman
Based on comments from Xin, even after fixes for our recent syzbot report of cookie memory leaks, its possible to get a resend of an INIT chunk which would lead to us leaking cookie memory. To ensure that we don't leak cookie memory, free any previously allocated cookie first. Change notes v1->v2 update subsystem tag in subject (davem) repeat kfree check for peer_random and peer_hmacs (xin) v2->v3 net->sctp also free peer_chunks v3->v4 fix subject tags v4->v5 remove cut line Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> CC: Xin Long <lucien.xin@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14net: dsa: microchip: Don't try to read stats for unused portsRobert Hancock
If some of the switch ports were not listed in the device tree, due to being unused, the ksz_mib_read_work function ended up accessing a NULL dp->slave pointer and causing an oops. Skip checking statistics for any unused ports. Fixes: 7c6ff470aa867f53 ("net: dsa: microchip: add MIB counter reading support") Signed-off-by: Robert Hancock <hancock@sedsystems.ca> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14Merge branch 'qmi_wwan-fix-QMAP-handling'David S. Miller
Reinhard Speyerer says: ==================== qmi_wwan: fix QMAP handling This series addresses the following issues observed when using the QMAP support of the qmi_wwan driver: 1. The QMAP code in the qmi_wwan driver is based on the CodeAurora GobiNet driver ([1], [2]) which does not process QMAP padding in the RX path correctly. This causes qmimux_rx_fixup() to pass incorrect data to the IP stack when padding is used. 2. qmimux devices currently lack proper network device usage statistics. 3. RCU stalls on device disconnect with QMAP activated like this # echo Y > /sys/class/net/wwan0/qmi/raw_ip # echo 1 > /sys/class/net/wwan0/qmi/add_mux # echo 2 > /sys/class/net/wwan0/qmi/add_mux # echo 3 > /sys/class/net/wwan0/qmi/add_mux have been observed in certain setups: [ 2273.676593] option1 ttyUSB16: GSM modem (1-port) converter now disconnected from ttyUSB16 [ 2273.676617] option 6-1.2:1.0: device disconnected [ 2273.676774] WARNING: CPU: 1 PID: 141 at kernel/rcu/tree_plugin.h:342 rcu_note_context_switch+0x2a/0x3d0 [ 2273.676776] Modules linked in: option qmi_wwan cdc_mbim cdc_ncm qcserial cdc_wdm usb_wwan sierra sierra_net usbnet mii edd coretemp iptable_mangle ip6_tables iptable_filter ip_tables cdc_acm dm_mod dax iTCO_wdt evdev iTCO_vendor_support sg ftdi_sio usbserial e1000e ptp pps_core i2c_i801 ehci_pci button lpc_ich i2c_core mfd_core uhci_hcd ehci_hcd rtc_cmos usbcore usb_common sd_mod fan ata_piix thermal [ 2273.676817] CPU: 1 PID: 141 Comm: kworker/1:1 Not tainted 4.19.38-rsp-1 #1 [ 2273.676819] Hardware name: Not Applicable Not Applicable /CX-GS/GM45-GL40 , BIOS V1.11 03/23/2011 [ 2273.676828] Workqueue: usb_hub_wq hub_event [usbcore] [ 2273.676832] EIP: rcu_note_context_switch+0x2a/0x3d0 [ 2273.676834] Code: 55 89 e5 57 56 89 c6 53 83 ec 14 89 45 f0 e8 5d ff ff ff 89 f0 64 8b 3d 24 a6 86 c0 84 c0 8b 87 04 02 00 00 75 7a 85 c0 7e 7a <0f> 0b 80 bf 08 02 00 00 00 0f 84 87 00 00 00 e8 b2 e2 ff ff bb dc [ 2273.676836] EAX: 00000001 EBX: f614bc00 ECX: 00000001 EDX: c0715b81 [ 2273.676838] ESI: 00000000 EDI: f18beb40 EBP: f1a3dc20 ESP: f1a3dc00 [ 2273.676840] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010002 [ 2273.676842] CR0: 80050033 CR2: b7e97230 CR3: 2f9c4000 CR4: 000406b0 [ 2273.676843] Call Trace: [ 2273.676847] ? preempt_count_add+0xa5/0xc0 [ 2273.676852] __schedule+0x4e/0x4f0 [ 2273.676855] ? __queue_work+0xf1/0x2a0 [ 2273.676858] ? _raw_spin_lock_irqsave+0x14/0x40 [ 2273.676860] ? preempt_count_add+0x52/0xc0 [ 2273.676862] schedule+0x33/0x80 [ 2273.676865] _synchronize_rcu_expedited+0x24e/0x280 [ 2273.676867] ? rcu_accelerate_cbs_unlocked+0x70/0x70 [ 2273.676871] ? wait_woken+0x70/0x70 [ 2273.676873] ? rcu_accelerate_cbs_unlocked+0x70/0x70 [ 2273.676875] ? _synchronize_rcu_expedited+0x280/0x280 [ 2273.676877] synchronize_rcu_expedited+0x22/0x30 [ 2273.676881] synchronize_net+0x25/0x30 [ 2273.676885] dev_deactivate_many+0x133/0x230 [ 2273.676887] ? preempt_count_add+0xa5/0xc0 [ 2273.676890] __dev_close_many+0x4d/0xc0 [ 2273.676892] ? skb_dequeue+0x40/0x50 [ 2273.676895] dev_close_many+0x5d/0xd0 [ 2273.676898] rollback_registered_many+0xbf/0x4c0 [ 2273.676901] ? raw_notifier_call_chain+0x1a/0x20 [ 2273.676904] ? call_netdevice_notifiers_info+0x23/0x60 [ 2273.676906] ? netdev_master_upper_dev_get+0xe/0x70 [ 2273.676908] rollback_registered+0x1f/0x30 [ 2273.676911] unregister_netdevice_queue+0x47/0xb0 [ 2273.676915] qmimux_unregister_device+0x1f/0x30 [qmi_wwan] [ 2273.676917] qmi_wwan_disconnect+0x5d/0x90 [qmi_wwan] ... [ 2273.677001] ---[ end trace 0fcc5f88496b485a ]--- [ 2294.679136] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 2294.679140] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P141 [ 2294.679144] rcu: (detected by 0, t=21002 jiffies, g=265857, q=8446) [ 2294.679148] kworker/1:1 D 0 141 2 0x80000000 In addition the permitted QMAP mux_id value range is extended for compatibility with ip(8) and the rmnet driver. Reinhard [1]: https://portland.source.codeaurora.org/patches/quic/gobi [2]: https://portland.source.codeaurora.org/quic/qsdk/oss/lklm/gobinet/ ==================== Tested-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14qmi_wwan: extend permitted QMAP mux_id value rangeReinhard Speyerer
Permit mux_id values up to 254 to be used in qmimux_register_device() for compatibility with ip(8) and the rmnet driver. Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support") Cc: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: Reinhard Speyerer <rspmn@arcor.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14qmi_wwan: avoid RCU stalls on device disconnect when in QMAP modeReinhard Speyerer
Switch qmimux_unregister_device() and qmi_wwan_disconnect() to use unregister_netdevice_queue() and unregister_netdevice_many() instead of unregister_netdevice(). This avoids RCU stalls which have been observed on device disconnect in certain setups otherwise. Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support") Cc: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: Reinhard Speyerer <rspmn@arcor.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14qmi_wwan: add network device usage statistics for qmimux devicesReinhard Speyerer
Add proper network device usage statistics for qmimux devices instead of reporting all-zero values for them. Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support") Cc: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: Reinhard Speyerer <rspmn@arcor.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14qmi_wwan: add support for QMAP padding in the RX pathReinhard Speyerer
The QMAP code in the qmi_wwan driver is based on the CodeAurora GobiNet driver which does not process QMAP padding in the RX path correctly. Add support for QMAP padding to qmimux_rx_fixup() according to the description of the rmnet driver. Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support") Cc: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: Reinhard Speyerer <rspmn@arcor.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14Merge tag 'mac80211-for-davem-2019-06-14' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Various fixes, all over: * a few memory leaks * fixes for management frame protection security and A2/A3 confusion (affecting TDLS as well) * build fix for certificates * etc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14net: phylink: further mac_config documentation improvementsRussell King - ARM Linux admin
While reviewing the DPAA2 work, it has become apparent that we need better documentation about which members of the phylink link state structure are valid in the mac_config call. Improve this documentation. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14nfc: Ensure presence of required attributes in the deactivate_target handlerYoung Xiao
Check that the NFC_ATTR_TARGET_INDEX attributes (in addition to NFC_ATTR_DEVICE_INDEX) are provided by the netlink client prior to accessing them. This prevents potential unhandled NULL pointer dereference exceptions which can be triggered by malicious user-mode programs, if they omit one or both of these attributes. Signed-off-by: Young Xiao <92siuyang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14cfg80211: report measurement start TSF correctlyAvraham Stern
Instead of reporting the AP's TSF, host time was reported. Fix it. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-06-14cfg80211: fix memory leak of wiphy device nameEric Biggers
In wiphy_new_nm(), if an error occurs after dev_set_name() and device_initialize() have already been called, it's necessary to call put_device() (via wiphy_free()) to avoid a memory leak. Reported-by: syzbot+7fddca22578bc67c3fe4@syzkaller.appspotmail.com Fixes: 1f87f7d3a3b4 ("cfg80211: add rfkill support") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-06-14cfg80211: util: fix bit count off by oneMordechay Goodstein
The bits of Rx MCS Map in VHT capability were enumerated with index transform - index i -> (i + 1) bit => nss i. BUG! while it should be - index i -> (i + 1) bit => (i + 1) nss. The bug was exposed in commit a53b2a0b1245 ("iwlwifi: mvm: implement VHT extended NSS support in rs.c"), where iwlwifi started using the function. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Fixes: b0aa75f0b1b2 ("ieee80211: add new VHT capability fields/parsing") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-06-14mac80211: do not start any work during reconfigure flowNaftali Goldstein
It is not a good idea to try to perform any work (e.g. send an auth frame) during reconfigure flow. Prevent this from happening, and at the end of the reconfigure flow requeue all the works. Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-06-14cfg80211: use BIT_ULL in cfg80211_parse_mbssid_data()Luca Coelho
The seen_indices variable is u64 and in other parts of the code we assume mbssid_index_ie[2] can be up to 45, so we should use the 64-bit versions of BIT, namely, BIT_ULL(). Reported-by: Dan Carpented <dan.carpenter@oracle.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-06-14mac80211: only warn once on chanctx_conf being NULLYibo Zhao
In multiple SSID cases, it takes time to prepare every AP interface to be ready in initializing phase. If a sta already knows everything it needs to join one of the APs and sends authentication to the AP which is not fully prepared at this point of time, AP's channel context could be NULL. As a result, warning message occurs. Even worse, if the AP is under attack via tools such as MDK3 and massive authentication requests are received in a very short time, console will be hung due to kernel warning messages. WARN_ON_ONCE() could be a better way for indicating warning messages without duplicate messages to flood the console. Johannes: We still need to address the underlying problem, but we don't really have a good handle on it yet. Suppress the worst side-effects for now. Signed-off-by: Zhi Chen <zhichen@codeaurora.org> Signed-off-by: Yibo Zhao <yiboz@codeaurora.org> [johannes: add note, change subject] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-06-14mac80211: drop robust management frames from unknown TAJohannes Berg
When receiving a robust management frame, drop it if we don't have rx->sta since then we don't have a security association and thus couldn't possibly validate the frame. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-06-12Merge branch 'net-mvpp2-prs-Fixes-for-VID-filtering'David S. Miller
Maxime Chevallier says: ==================== net: mvpp2: prs: Fixes for VID filtering This series fixes some issues with VID filtering offload, mainly due to the wrong ranges being used in the TCAM header parser. The first patch fixes a bug where removing a VLAN from a port's whitelist would also remove it from other port's, if they are on the same PPv2 instance. The second patch makes so that we don't invalidate the wrong TCAM entries when clearing the whole whitelist. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12net: mvpp2: prs: Use the correct helpers when removing all VID filtersMaxime Chevallier
When removing all VID filters, the mvpp2_prs_vid_entry_remove would be called with the TCAM id incorrectly used as a VID, causing the wrong TCAM entries to be invalidated. Fix this by directly invalidating entries in the VID range. Fixes: 56beda3db602 ("net: mvpp2: Add hardware offloading for VLAN filtering") Suggested-by: Yuri Chipchev <yuric@marvell.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12net: mvpp2: prs: Fix parser range for VID filteringMaxime Chevallier
VID filtering is implemented in the Header Parser, with one range of 11 vids being assigned for each no-loopback port. Make sure we use the per-port range when looking for existing entries in the Parser. Since we used a global range instead of a per-port one, this causes VIDs to be removed from the whitelist from all ports of the same PPv2 instance. Fixes: 56beda3db602 ("net: mvpp2: Add hardware offloading for VLAN filtering") Suggested-by: Yuri Chipchev <yuric@marvell.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12Merge branch 'mlxsw-Various-fixes'David S. Miller
Ido Schimmel says: ==================== mlxsw: Various fixes This patchset contains various fixes for mlxsw. Patch #1 fixes an hash polarization problem when a nexthop device is a LAG device. This is caused by the fact that the same seed is used for the LAG and ECMP hash functions. Patch #2 fixes an issue in which the driver fails to refresh a nexthop neighbour after it becomes dead. This prevents the nexthop from ever being written to the adjacency table and used to forward traffic. Patch Patch #4 fixes a wrong extraction of TOS value in flower offload code. Patch #5 is a test case. Patch #6 works around a buffer issue in Spectrum-2 by reducing the default sizes of the shared buffer pools. Patch #7 prevents prio-tagged packets from entering the switch when PVID is removed from the bridge port. Please consider patches #2, #4 and #6 for 5.1.y ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12mlxsw: spectrum: Disallow prio-tagged packets when PVID is removedIdo Schimmel
When PVID is removed from a bridge port, the Linux bridge drops both untagged and prio-tagged packets. Align mlxsw with this behavior. Fixes: 148f472da5db ("mlxsw: reg: Add the Switch Port Acceptable Frame Types register") Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12mlxsw: spectrum_buffers: Reduce pool size on Spectrum-2Petr Machata
Due to an issue on Spectrum-2, in front-panel ports split four ways, 2 out of 32 port buffers cannot be used. To work around this, the next FW release will mark them as unused, and will report correspondingly lower total shared buffer size. mlxsw will pick up the new value through a query to cap_total_buffer_size resource. However the initial size for shared buffer pool 0 is hard-coded and therefore needs to be updated. Thus reduce the pool size by 2.7 MiB (which corresponds to 2/32 of the total size of 42 MiB), and round down to the whole number of cells. Fixes: fe099bf682ab ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12selftests: tc_flower: Add TOS matching testJiri Pirko
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12mlxsw: spectrum_flower: Fix TOS matchingJiri Pirko
The TOS value was not extracted correctly. Fix it. Fixes: 87996f91f739 ("mlxsw: spectrum_flower: Add support for ip tos") Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12selftests: mlxsw: Test nexthop offload indicationIdo Schimmel
Test that IPv4 and IPv6 nexthops are correctly marked with offload indication in response to neighbour events. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12mlxsw: spectrum_router: Refresh nexthop neighbour when it becomes deadIdo Schimmel
The driver tries to periodically refresh neighbours that are used to reach nexthops. This is done by periodically calling neigh_event_send(). However, if the neighbour becomes dead, there is nothing we can do to return it to a connected state and the above function call is basically a NOP. This results in the nexthop never being written to the device's adjacency table and therefore never used to forward packets. Fix this by dropping our reference from the dead neighbour and associating the nexthop with a new neigbhour which we will try to refresh. Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12mlxsw: spectrum: Use different seeds for ECMP and LAG hashIdo Schimmel
The same hash function and seed are used for both ECMP and LAG hash. Therefore, when a LAG device is used as a nexthop device as part of an ECMP group, hash polarization can occur and all the traffic will be hashed to a single LAG slave. Fix this by using a different seed for the LAG hash. Fixes: fa73989f2697 ("mlxsw: spectrum: Use a stable ECMP/LAG seed") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12net: tls, correctly account for copied bytes with multiple sk_msgsJohn Fastabend
tls_sw_do_sendpage needs to return the total number of bytes sent regardless of how many sk_msgs are allocated. Unfortunately, copied (the value we return up the stack) is zero'd before each new sk_msg is allocated so we only return the copied size of the last sk_msg used. The caller (splice, etc.) of sendpage will then believe only part of its data was sent and send the missing chunks again. However, because the data actually was sent the receiver will get multiple copies of the same data. To reproduce this do multiple sendfile calls with a length close to the max record size. This will in turn call splice/sendpage, sendpage may use multiple sk_msg in this case and then returns the incorrect number of bytes. This will cause splice to resend creating duplicate data on the receiver. Andre created a C program that can easily generate this case so we will push a similar selftest for this to bpf-next shortly. The fix is to _not_ zero the copied field so that the total sent bytes is returned. Reported-by: Steinar H. Gunderson <steinar+kernel@gunderson.no> Reported-by: Andre Tomt <andre@tomt.net> Tested-by: Andre Tomt <andre@tomt.net> Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12vrf: Increment Icmp6InMsgs on the original netdevStephen Suryaputra
Get the ingress interface and increment ICMP counters based on that instead of skb->dev when the the dev is a VRF device. This is a follow up on the following message: https://www.spinics.net/lists/netdev/msg560268.html v2: Avoid changing skb->dev since it has unintended effect for local delivery (David Ahern). Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12net: ethtool: Allow matching on vlan DEI bitMaxime Chevallier
Using ethtool, users can specify a classification action matching on the full vlan tag, which includes the DEI bit (also previously called CFI). However, when converting the ethool_flow_spec to a flow_rule, we use dissector keys to represent the matching patterns. Since the vlan dissector key doesn't include the DEI bit, this information was silently discarded when translating the ethtool flow spec in to a flow_rule. This commit adds the DEI bit into the vlan dissector key, and allows propagating the information to the driver when parsing the ethtool flow spec. Fixes: eca4205f9ec3 ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator") Reported-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12linux-next: DOC: RDS: Fix a typo in rds.txtMasanari Iida
This patch fixes a spelling typo in rds.txt Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12mpls: fix af_mpls dependencies for realMatteo Croce
Randy reported that selecting MPLS_ROUTING without PROC_FS breaks the build, because since commit c1a9d65954c6 ("mpls: fix af_mpls dependencies"), MPLS_ROUTING selects PROC_SYSCTL, but Kconfig's select doesn't recursively handle dependencies. Change the select into a dependency. Fixes: c1a9d65954c6 ("mpls: fix af_mpls dependencies") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-11Merge branch 'vxlan-geneve-linear'David S. Miller
Stefano Brivio says: ==================== Don't assume linear buffers in error handlers for VXLAN and GENEVE Guillaume noticed the same issue fixed by commit 26fc181e6cac ("fou, fou6: do not assume linear skbs") for fou and fou6 is also present in VXLAN and GENEVE error handlers: we can't assume linear buffers there, we need to use pskb_may_pull() instead. ==================== Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-11geneve: Don't assume linear buffers in error handlerStefano Brivio
In commit a07966447f39 ("geneve: ICMP error lookup handler") I wrongly assumed buffers from icmp_socket_deliver() would be linear. This is not the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear data. Eric fixed this same issue for fou and fou6 in commits 26fc181e6cac ("fou, fou6: do not assume linear skbs") and 5355ed6388e2 ("fou, fou6: avoid uninit-value in gue_err() and gue6_err()"). Use pskb_may_pull() instead of checking skb->len, and take into account the fact we later access the GENEVE header with udp_hdr(), so we also need to sum skb_transport_header() here. Reported-by: Guillaume Nault <gnault@redhat.com> Fixes: a07966447f39 ("geneve: ICMP error lookup handler") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-11vxlan: Don't assume linear buffers in error handlerStefano Brivio
In commit c3a43b9fec8a ("vxlan: ICMP error lookup handler") I wrongly assumed buffers from icmp_socket_deliver() would be linear. This is not the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear data. Eric fixed this same issue for fou and fou6 in commits 26fc181e6cac ("fou, fou6: do not assume linear skbs") and 5355ed6388e2 ("fou, fou6: avoid uninit-value in gue_err() and gue6_err()"). Use pskb_may_pull() instead of checking skb->len, and take into account the fact we later access the VXLAN header with udp_hdr(), so we also need to sum skb_transport_header() here. Reported-by: Guillaume Nault <gnault@redhat.com> Fixes: c3a43b9fec8a ("vxlan: ICMP error lookup handler") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-11net: openvswitch: do not free vport if register_netdevice() is failed.Taehee Yoo
In order to create an internal vport, internal_dev_create() is used and that calls register_netdevice() internally. If register_netdevice() fails, it calls dev->priv_destructor() to free private data of netdev. actually, a private data of this is a vport. Hence internal_dev_create() should not free and use a vport after failure of register_netdevice(). Test command ovs-dpctl add-dp bonding_masters Splat looks like: [ 1035.667767] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 1035.675958] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 1035.676916] CPU: 1 PID: 1028 Comm: ovs-vswitchd Tainted: G B 5.2.0-rc3+ #240 [ 1035.676916] RIP: 0010:internal_dev_create+0x2e5/0x4e0 [openvswitch] [ 1035.676916] Code: 48 c1 ea 03 80 3c 02 00 0f 85 9f 01 00 00 4c 8b 23 48 b8 00 00 00 00 00 fc ff df 49 8d bc 24 60 05 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 86 01 00 00 49 8b bc 24 60 05 00 00 e8 e4 68 f4 [ 1035.713720] RSP: 0018:ffff88810dcb7578 EFLAGS: 00010206 [ 1035.713720] RAX: dffffc0000000000 RBX: ffff88810d13fe08 RCX: ffffffff84297704 [ 1035.713720] RDX: 00000000000000ac RSI: 0000000000000000 RDI: 0000000000000560 [ 1035.713720] RBP: 00000000ffffffef R08: fffffbfff0d3b881 R09: fffffbfff0d3b881 [ 1035.713720] R10: 0000000000000001 R11: fffffbfff0d3b880 R12: 0000000000000000 [ 1035.768776] R13: 0000607ee460b900 R14: ffff88810dcb7690 R15: ffff88810dcb7698 [ 1035.777709] FS: 00007f02095fc980(0000) GS:ffff88811b400000(0000) knlGS:0000000000000000 [ 1035.777709] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1035.777709] CR2: 00007ffdf01d2f28 CR3: 0000000108258000 CR4: 00000000001006e0 [ 1035.777709] Call Trace: [ 1035.777709] ovs_vport_add+0x267/0x4f0 [openvswitch] [ 1035.777709] new_vport+0x15/0x1e0 [openvswitch] [ 1035.777709] ovs_vport_cmd_new+0x567/0xd10 [openvswitch] [ 1035.777709] ? ovs_dp_cmd_dump+0x490/0x490 [openvswitch] [ 1035.777709] ? __kmalloc+0x131/0x2e0 [ 1035.777709] ? genl_family_rcv_msg+0xa54/0x1030 [ 1035.777709] genl_family_rcv_msg+0x63a/0x1030 [ 1035.777709] ? genl_unregister_family+0x630/0x630 [ 1035.841681] ? debug_show_all_locks+0x2d0/0x2d0 [ ... ] Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-11net: correct udp zerocopy refcnt also when zerocopy only on appendWillem de Bruijn
The below patch fixes an incorrect zerocopy refcnt increment when appending with MSG_MORE to an existing zerocopy udp skb. send(.., MSG_ZEROCOPY | MSG_MORE); // refcnt 1 send(.., MSG_ZEROCOPY | MSG_MORE); // refcnt still 1 (bar frags) But it missed that zerocopy need not be passed at the first send. The right test whether the uarg is newly allocated and thus has extra refcnt 1 is not !skb, but !skb_zcopy. send(.., MSG_MORE); // <no uarg> send(.., MSG_ZEROCOPY); // refcnt 1 Fixes: 100f6d8e09905 ("net: correct zerocopy refcnt with udp MSG_MORE") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-09nfp: ensure skb network header is set for packet redirectJohn Hurley
Packets received at the NFP driver may be redirected to egress of another netdev (e.g. in the case of OvS internal ports). On the egress path, some processes, like TC egress hooks, may expect the network header offset field in the skb to be correctly set. If this is not the case there is potential for abnormal behaviour and even the triggering of BUG() calls. Set the skb network header field before the mac header pull when doing a packet redirect. Fixes: 27f54b582567 ("nfp: allow fallback packets from non-reprs") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-09tcp: fix undo spurious SYNACK in passive Fast OpenYuchung Cheng
Commit 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit") may cause tcp_fastretrans_alert() to warn about pending retransmission in Open state. This is triggered when the Fast Open server both sends data and has spurious SYNACK retransmission during the handshake, and the data packets were lost or reordered. The root cause is a bit complicated: (1) Upon receiving SYN-data: a full socket is created with snd_una = ISN + 1 by tcp_create_openreq_child() (2) On SYNACK timeout the server/sender enters CA_Loss state. (3) Upon receiving the final ACK to complete the handshake, sender does not mark FLAG_SND_UNA_ADVANCED since (1) Sender then calls tcp_process_loss since state is CA_loss by (2) (4) tcp_process_loss() does not invoke undo operations but instead mark REXMIT_LOST to force retransmission (5) tcp_rcv_synrecv_state_fastopen() calls tcp_try_undo_loss(). It changes state to CA_Open but has positive tp->retrans_out (6) Next ACK triggers the WARN_ON in tcp_fastretrans_alert() The step that goes wrong is (4) where the undo operation should have been invoked because the ACK successfully acknowledged the SYN sequence. This fixes that by specifically checking undo when the SYN-ACK sequence is acknowledged. Then after tcp_process_loss() the state would be further adjusted based in tcp_fastretrans_alert() to avoid triggering the warning in (6). Fixes: 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-09mpls: fix af_mpls dependenciesMatteo Croce
MPLS routing code relies on sysctl to work, so let it select PROC_SYSCTL. Reported-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-09Merge branch 'ibmvnic-Fixes-for-device-reset-handling'David S. Miller
Thomas Falcon says: ==================== ibmvnic: Fixes for device reset handling This series contains three unrelated fixes to issues seen during device resets. The first patch fixes an error when the driver requests to deactivate the link of an uninitialized device, resulting in a failure to reset. Next, a patch to fix multicast transmission failures seen after a driver reset. The final patch fixes mishandling of memory allocation failures during device initialization, which caused a kernel oops. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-09ibmvnic: Fix unchecked return codes of memory allocationsThomas Falcon
The return values for these memory allocations are unchecked, which may cause an oops if the driver does not handle them after a failure. Fix by checking the function's return code. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-09ibmvnic: Refresh device multicast list after resetThomas Falcon
It was observed that multicast packets were no longer received after a device reset. The fix is to resend the current multicast list to the backing device after recovery. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-09ibmvnic: Do not close unopened driver during resetThomas Falcon
Check driver state before halting it during a reset. If the driver is not running, do nothing. Otherwise, a request to deactivate a down link can cause an error and the reset will fail. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>