git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-12-03	Merge branch 'mlx4-fixes'	David S. Miller
	Tariq Toukan says: ==================== mlx4 fixes for 4.20-rc This patchset includes small fixes for the mlx4_en driver. First patch by Eran fixes the value used to init the netdevice's min_mtu field. Please queue it to -stable >= v4.10. Second patch by Saeed adds missing Kconfig build dependencies. Series generated against net commit: 35b827b6d061 tun: forbid iface creation with rtnl ops ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net/mlx4_en: Fix build break when CONFIG_INET is off	Saeed Mahameed
	MLX4_EN depends on NETDEVICES, ETHERNET and INET Kconfigs. Make sure they are listed in MLX4_EN Kconfig dependencies. This fixes the following build break: drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: ‘struct iphdr’ declared inside parameter list [enabled by default] struct iphdr *iph) ^ drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] drivers/net/ethernet/mellanox/mlx4/en_rx.c: In function ‘get_fixed_ipv4_csum’: drivers/net/ethernet/mellanox/mlx4/en_rx.c:586:20: error: dereferencing pointer to incomplete type _u8 ipproto = iph->protocol; Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net/mlx4_en: Change min MTU size to ETH_MIN_MTU	Eran Ben Elisha
	NIC driver minimal MTU size shall be set to ETH_MIN_MTU, as defined in the RFC791 and in the network stack. Remove old mlx4_en only define for it, which was set to wrong value. Fixes: b80f71f5816f ("ethernet/mellanox: use core min/max MTU checking") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net/core: tidy up an error message	Qian Cai
	netif_napi_add() could report an error like this below due to it allows to pass a format string for wildcarding before calling dev_get_valid_name(), "netif_napi_add() called with weight 256 on device eth%d" For example, hns_enet_drv module does this. hns_nic_try_get_ae hns_nic_init_ring_data netif_napi_add register_netdev dev_get_valid_name Hence, make it a bit more human-readable by using netdev_err_once() instead. Signed-off-by: Qian Cai <cai@gmx.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	Revert "PCI/ASPM: Do not initialize link state when aspm_disabled is set"	Bjorn Helgaas
	This reverts commit 17c91487364fb33797ed84022564ee7544ac4945. Rafael found that this commit broke the SD card reader in his Acer Aspire S5. Details of the problem are in the bugzilla below. Fixes: 17c91487364f ("PCI/ASPM: Do not initialize link state when aspm_disabled is set") Link: https://bugzilla.kernel.org/show_bug.cgi?id=201801 Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-12-03	mv88e6060: disable hardware level MAC learning	Anderson Luiz Alves
	Disable hardware level MAC learning because it breaks station roaming. When enabled it drops all frames that arrive from a MAC address that is on a different port at learning table. Signed-off-by: Anderson Luiz Alves <alacn1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	macvlan: return correct error value	Matteo Croce
	A MAC address must be unique among all the macvlan devices with the same lower device. The only exception is the passthru [sic] mode, which shares the lower device address. When duplicate addresses are detected, EBUSY is returned when bringing the interface up: # ip link add macvlan0 link eth0 type macvlan # read addr </sys/class/net/eth0/address # ip link set macvlan0 address $addr # ip link set macvlan0 up RTNETLINK answers: Device or resource busy Use correct error code which is EADDRINUSE, and do the check also earlier, on address change: # ip link set macvlan0 address $addr RTNETLINK answers: Address already in use Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	Merge branch 'udp-msg_zerocopy'	David S. Miller
	Willem de Bruijn says: ==================== udp msg_zerocopy Enable MSG_ZEROCOPY for udp sockets Patch 1/3 is the main patch, a rework of RFC patch http://patchwork.ozlabs.org/patch/899630/ more details in the patch commit message Patch 2/3 is an optimization to remove a branch from the UDP hot path and refcount_inc/refcount_dec_and_test pair when zerocopy is used. This used to be included in the first patch in v2. Patch 3/3 runs the already existing udp zerocopy tests as part of kselftest See also recent Linux Plumbers presentation https://linuxplumbersconf.org/event/2/contributions/106/attachments/104/128/willemdebruijn-lpc2018-udpgso-presentation-20181113.pdf Changes: v1 -> v2 - Fixup reverse christmas tree violation v2 -> v3 - Split refcount avoidance optimization into separate patch - Fix refcount leak on error in fragmented case (thanks to Paolo Abeni for pointing this one out!) - Fix refcount inc on zero v3 -> v4 - Move skb_zcopy_set below the only kfree_skb that might cause a premature uarg destroy before skb_zerocopy_put_abort - Move the entire skb_shinfo assignment block, to keep that cacheline access in one place ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	selftests: extend zerocopy tests to udp	Willem de Bruijn
	Both msg_zerocopy and udpgso_bench have udp zerocopy variants. Exercise these as part of the standard kselftest run. With udp, msg_zerocopy has no control channel. Ensure that the receiver exits after the sender by accounting for the initial delay in starting them (in msg_zerocopy.sh). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	udp: elide zerocopy operation in hot path	Willem de Bruijn
	With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info. Release of its last reference triggers a completion notification. The TCP stack in tcp_sendmsg_locked holds an extra ref independent of the skbs, because it can build, send and free skbs within its loop, possibly reaching refcount zero and freeing the ubuf_info too soon. The UDP stack currently also takes this extra ref, but does not need it as all skbs are sent after return from __ip(6)_append_data. Avoid the extra refcount_inc and refcount_dec_and_test, and generally the sock_zerocopy_put in the common path, by passing the initial reference to the first skb. This approach is taken instead of initializing the refcount to 0, as that would generate error "refcount_t: increment on 0" on the next skb_zcopy_set. Changes v3 -> v4 - Move skb_zcopy_set below the only kfree_skb that might cause a premature uarg destroy before skb_zerocopy_put_abort - Move the entire skb_shinfo assignment block, to keep that cacheline access in one place Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	udp: msg_zerocopy	Willem de Bruijn
	Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and interpret flag MSG_ZEROCOPY. This patch was previously part of the zerocopy RFC patchsets. Zerocopy is not effective at small MTU. With segmentation offload building larger datagrams, the benefit of page flipping outweights the cost of generating a completion notification. tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on test patch and making skb_orphan_frags_rx same as skb_orphan_frags: ipv4 udp -t 1 tx=191312 (11938 MB) txc=0 zc=n rx=191312 (11938 MB) ipv4 udp -z -t 1 tx=304507 (19002 MB) txc=304507 zc=y rx=304507 (19002 MB) ok ipv6 udp -t 1 tx=174485 (10888 MB) txc=0 zc=n rx=174485 (10888 MB) ipv6 udp -z -t 1 tx=294801 (18396 MB) txc=294801 zc=y rx=294801 (18396 MB) ok Changes v1 -> v2 - Fixup reverse christmas tree violation v2 -> v3 - Split refcount avoidance optimization into separate patch - Fix refcount leak on error in fragmented case (thanks to Paolo Abeni for pointing this one out!) - Fix refcount inc on zero - Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data. This is needed since commit 5cf4a8532c99 ("tcp: really ignore MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	sctp: kfree_rcu asoc	Xin Long
	In sctp_hash_transport/sctp_epaddr_lookup_transport, it dereferences a transport's asoc under rcu_read_lock while asoc is freed not after a grace period, which leads to a use-after-free panic. This patch fixes it by calling kfree_rcu to make asoc be freed after a grace period. Note that only the asoc's memory is delayed to free in the patch, it won't cause sk to linger longer. Thanks Neil and Marcelo to make this clear. Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable") Fixes: cd2b70875058 ("sctp: check duplicate node before inserting a new transport") Reported-by: syzbot+0b05d8aa7cb185107483@syzkaller.appspotmail.com Reported-by: syzbot+aad231d51b1923158444@syzkaller.appspotmail.com Suggested-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net/ibmvnic: Fix RTNL deadlock during device reset	Thomas Falcon
	Commit a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset") made the change to hold the RTNL lock during driver reset but still calls netdev_notify_peers, which results in a deadlock. Instead, use call_netdevice_notifiers, which is functionally the same except that it does not take the RTNL lock again. Fixes: a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset") Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	vhost: fix IOTLB locking	Jean-Philippe Brucker
	Commit 78139c94dc8c ("net: vhost: lock the vqs one by one") moved the vq lock to improve scalability, but introduced a possible deadlock in vhost-iotlb. vhost_iotlb_notify_vq() now takes vq->mutex while holding the device's IOTLB spinlock. And on the vhost_iotlb_miss() path, the spinlock is taken while holding vq->mutex. Since calling vhost_poll_queue() doesn't require any lock, avoid the deadlock by not taking vq->mutex. Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one") Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	Merge tag 'wireless-drivers-next-for-davem-2018-11-30' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.21 First set of patches for 4.21. Most notable here is support for Quantenna's QSR1000/QSR2000 chipsets and more flexible ways to provide nvram files for brcmfmac. Major changes: brcmfmac * add support for first trying to get a board specific nvram file * add support for getting nvram contents from EFI variables qtnfmac * use single PCIe driver for all platforms and rename Kconfig option CONFIG_QTNFMAC_PEARL_PCIE to CONFIG_QTNFMAC_PCIE * add support for QSR1000/QSR2000 (Topaz) family of chipsets ath10k * add support for WCN3990 firmware crash recovery * add firmware memory dump support for QCA4019 wil6210 * add firmware error recovery while in AP mode ath9k * remove experimental notice from dynack feature iwlwifi * PCI IDs for some new 9000-series cards * improve antenna usage on connection problems * new firmware debugging infrastructure * some more work on 802.11ax * improve support for multiple RF modules with 22000 devices cordic * move cordic macros and defines to a public header file * convert brcmsmac and b43 to fully use cordic library ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	Merge branch 'davinci_emac-read-the-MAC-address-from-nvmem'	David S. Miller
	Bartosz Golaszewski says: ==================== davinci_emac: read the MAC address from nvmem This series is part of a bigger series that aims at removing the platform data structure from the at24 EEPROM driver[1]. We provide a generalized version of of_get_nvmem_mac_address(), switch the only user of the of_ variant to using it, remove the previous implementation and use the new routine in the davinci_emac driver. [1] https://lkml.org/lkml/2018/11/13/884 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net: davinci_emac: use nvmem_get_mac_address()	Bartosz Golaszewski
	All DaVinci boards still supported in board files now define nvmem cells containing the MAC address. We want to stop using the setup callback from at24 so the MAC address for those users will no longer be provided over platform data. If we didn't get a valid MAC in pdata, try nvmem before resorting to a random MAC. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	of: net: kill of_get_nvmem_mac_address()	Bartosz Golaszewski
	We've switched all users to nvmem_get_mac_address(). Remove the now dead code. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net: cadence: switch to using nvmem_get_mac_address()	Bartosz Golaszewski
	We now have a generalized helper routine to read the MAC address from nvmem which takes struct device as argument. The nvmem subsystem will then try device tree first before all other potential providers. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net: ethernet: provide nvmem_get_mac_address()	Bartosz Golaszewski
	We already have of_get_nvmem_mac_address() but some non-DT systems want to read the MAC address from NVMEM too. Implement a generalized routine that takes struct device as argument. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	rhashtable: detect when object movement between tables might have ↵	NeilBrown
	invalidated a lookup Some users of rhashtables might need to move an object from one table to another - this appears to be the reason for the incomplete usage of NULLS markers. To support these, we store a unique NULLS_MARKER at the end of each chain, and when a search fails to find a match, we check if the NULLS marker found was the expected one. If not, the search may not have examined all objects in the target bucket, so it is repeated. The unique NULLS_MARKER is derived from the address of the head of the chain. As this cannot be derived at load-time the static rhnull in rht_bucket_nested() needs to be initialised at run time. Any caller of a lookup function must still be prepared for the possibility that the object returned is in a different table - it might have been there for some time. Note that this does NOT provide support for other uses of NULLS_MARKERs such as allocating with SLAB_TYPESAFE_BY_RCU or changing the key of an object and re-inserting it in the same table. These could only be done safely if new objects were inserted at the start of a hash chain, and that is not currently the case. Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	Merge branch 'hns3-ethtool-dump'	David S. Miller
	Salil Mehta says: ==================== Adds VF/PF PCIe reg dump(ethtool -d) support to HNS3 driver This patchset adds VF/PF PCIe register dump support to HNS3 VF and PF driver using "ethtool -d" command. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net: hns3: Adds support to dump(using ethool-d) PCIe regs in HNS3 PF driver	Jian Shen
	This patch adds support to dump PF PCIe registers using ethtool -d for HNS3 PF Driver. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net: hns3: Support "ethtool -d" for HNS3 VF driver	Jian Shen
	This patch adds "ethtool -d" support for HNS3 VF Driver. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	Merge branch 'phy-micrel-toggling-reset'	David S. Miller
	Yoshihiro Shimoda says: ==================== net: phy: micrel: add toggling phy reset This patch set is for R-Car Gen3 Salvator-XS boards. If we do the following method, the phy cannot link up correctly. 1) Kernel boots by using initramfs. --> No open the nic, so phy_device_register() and phy_probe() deasserts the reset. 2) Kernel enters the suspend. --> So, keep the reset signal as deassert. --> On R-Car Salvator-XS board, unfortunately, the board power is turned off. 3) Kernel returns from suspend. 4) ifconfig eth0 up --> Then, since edge signal of the reset doesn't happen, it cannot link up. 5) ifconfig eth0 down 6) ifconfig eth0 up --> In this case, it can link up. When resolving this issue after I got feedback from Andrew and Heiner, I found an issue that the phy_device.c didn't call phy_resume() if the PHY was not attached. So, patch 1 fixes it and add toggling the phy reset to the micrel phy driver. Changes from v1 (as RFC): - No remove the current code of phy_device.c to avoid any side effects. - Fix the mdio_bus_phy_resume() in phy_device.c. - Add toggling the phy reset in micrel.c if the PHY is not attached. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net: phy: micrel: add toggling phy reset if PHY is not attached	Yoshihiro Shimoda
	This patch adds toggling phy reset if PHY is not attached. Otherwise, some boards (e.g. R-Car H3 Salvator-XS) cannot link up correctly if we do the following method: 1) Kernel boots by using initramfs. --> No open the nic, so phy_device_register() and phy_probe() deasserts the reset. 2) Kernel enters the suspend. --> So, keep the reset signal as deassert. --> On R-Car Salvator-XS board, unfortunately, the board power is turned off. 3) Kernel returns from suspend. 4) ifconfig eth0 up --> Then, since edge signal of the reset doesn't happen, it cannot link up. 5) ifconfig eth0 down 6) ifconfig eth0 up --> In this case, it can link up. Reported-by: Hiromitsu Yamasaki <hiromitsu.yamasaki.ym@renesas.com> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net: phy: Fix not to call phy_resume() if PHY is not attached	Yoshihiro Shimoda
	This patch fixes an issue that mdio_bus_phy_resume() doesn't call phy_resume() if the PHY is not attached. Fixes: 803dd9c77ac3 ("net: phy: avoid suspending twice a PHY") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net: phy: improve generic EEE ethtool functions	Heiner Kallweit
	So far the two functions consider neither member eee_enabled nor eee_active. Therefore network drivers have to do this in some kind of glue code. I think this can be avoided. Getting EEE parameters: When not advertising any EEE mode, we can't consider EEE to be enabled. Therefore interpret "EEE enabled" as "we advertise at least one EEE mode". It's similar with "EEE active": interpret it as "EEE modes advertised by both link partner have at least one mode in common". Setting EEE parameters: If eee_enabled isn't set, don't advertise any EEE mode and restart aneg if needed to switch off EEE. If eee_enabled is set and data->advertised is empty (e.g. because EEE was disabled), advertise everything we support as default. This way EEE can easily switched on/off by doing ethtool --set-eee <if> eee on/off, w/o any additional parameters. The changes to both functions shouldn't break any existing user. Once the changes have been applied, at least some users can be simplified. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	samples: bpf: fix: seg fault with NULL pointer arg	Daniel T. Lee
	When NULL pointer accidentally passed to write_kprobe_events, due to strlen(NULL), segmentation fault happens. Changed code returns -1 to deal with this situation. Bug issued with Smatch, static analysis. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-03	bpf: powerpc64: optimize JIT passes for bpf function calls	Sandipan Das
	Once the JITed images for each function in a multi-function program are generated after the first three JIT passes, we only need to fix the target address for the branch instruction corresponding to each bpf-to-bpf function call. This introduces the following optimizations for reducing the work done by the JIT compiler when handling multi-function programs: [1] Instead of doing two extra passes to fix the bpf function calls, do just one as that would be sufficient. [2] During the extra pass, only overwrite the instruction sequences for the bpf-to-bpf function calls as everything else would still remain exactly the same. This also reduces the number of writes to the JITed image. [3] Do not regenerate the prologue and the epilogue during the extra pass as that would be redundant. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-03	Merge branch 'VXLAN-underlay-VRF'	David S. Miller
	Alexis Bauvin says: ==================== net: Add VRF support for VXLAN underlay v6 -> v7: - proper locking for device in udp_tunnel following Sabrina Dubroca's advice v5 -> v6: - remove automatic rebinding patch following Roopa Prabhu's advice v4 -> v5: - move test script to its own patch (6/6) - add schematic for test script - apply David Ahern comments to the test script v3 -> v4: - rename vxlan_is_in_l3mdev_chain to netdev_is_upper master - move it to net/core/dev.c - make it return bool instead of int - check if remote_ifindex is zero before resolving the l3mdev - add testing script v2 -> v3: - fix build when CONFIG_NET_IPV6 is off - fix build "unused l3mdev_master_upper_ifindex_by_index" build error with some configs v1 -> v2: - move vxlan_get_l3mdev from vxlan driver to l3mdev driver as l3mdev_master_upper_ifindex_by_index - vxlan: rename variables named l3mdev_ifindex to ifindex v0 -> v1: - fix typos We are trying to isolate the VXLAN traffic from different VMs with VRF as shown in the schemas below: +-------------------------+ +----------------------------+ \| +----------+ \| \| +------------+ \| \| \| \| \| \| \| \| \| \| \| tap-red \| \| \| \| tap-blue \| \| \| \| \| \| \| \| \| \| \| +----+-----+ \| \| +-----+------+ \| \| \| \| \| \| \| \| \| \| \| \| \| \| +----+---+ \| \| +----+----+ \| \| \| \| \| \| \| \| \| \| \| br-red \| \| \| \| br-blue \| \| \| \| \| \| \| \| \| \| \| +----+---+ \| \| +----+----+ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| +----+--------+ \| \| +--------------+ \| \| \| \| \| \| \| \| \| \| \| vxlan-red \| \| \| \| vxlan-blue \| \| \| \| \| \| \| \| \| \| \| +------+------+ \| \| +-------+------+ \| \| \| \| \| \| \| \| \| VRF \| \| \| VRF \| \| \| red \| \| \| blue \| +-------------------------+ +----------------------------+ \| \| \| \| +---------------------------------------------------------+ \| \| \| \| \| \| \| \| \| \| +--------------+ \| \| \| \| \| \| \| \| \| +---------+ eth0.2030 +---------+ \| \| \| 10.0.0.1/24 \| \| \| +-----+--------+ VRF \| \| \| green\| +---------------------------------------------------------+ \| \| +----+---+ \| \| \| eth0 \| \| \| +--------+ iproute2 commands to reproduce the setup: ip link add green type vrf table 1 ip link set green up ip link add eth0.2030 link eth0 type vlan id 2030 ip link set eth0.2030 master green ip addr add 10.0.0.1/24 dev eth0.2030 ip link set eth0.2030 up ip link add blue type vrf table 2 ip link set blue up ip link add br-blue type bridge ip link set br-blue master blue ip link set br-blue up ip link add vxlan-blue type vxlan id 2 local 10.0.0.1 dev eth0.2030 \ port 4789 ip link set vxlan-blue master br-blue ip link set vxlan-blue up ip link set tap-blue master br-blue ip link set tap-blue up ip link add red type vrf table 3 ip link set red up ip link add br-red type bridge ip link set br-red master red ip link set br-red up ip link add vxlan-red type vxlan id 3 local 10.0.0.1 dev eth0.2030 \ port 4789 ip link set vxlan-red master br-red ip link set vxlan-red up ip link set tap-red master br-red ip link set tap-red up We faced some issue in the datapath, here are the details: * Egress traffic: The vxlan packets are sent directly to the default VRF because it's where the socket is bound, therefore the traffic has a default route via eth0. the workaround is to force this traffic to VRF green with ip rules. * Ingress traffic: When receiving the traffic on eth0.2030 the vxlan socket is unreachable from VRF green. The workaround is to enable udp_l3mdev_accept sysctl, but this breaks isolation between overlay and underlay: packets sent from blue or red by e.g. a guest VM will be accepted by the socket, allowing injection of VXLAN packets from the overlay. This patch series fixes the issues describe above by allowing VXLAN socket to be bound to a specific VRF device therefore looking up in the correct table. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	test/net: Add script for VXLAN underlay in a VRF	Alexis Bauvin
	This script tests the support of a VXLAN underlay in a non-default VRF. It does so by simulating two hypervisors and two VMs, an extended L2 between the VMs with the hypervisors as VTEPs with the underlay in a VRF, and finally by pinging the two VMs. It also tests that moving the underlay from a VRF to another works when down/up the VXLAN interface. Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	vxlan: add support for underlay in non-default VRF	Alexis Bauvin
	Creating a VXLAN device with is underlay in the non-default VRF makes egress route lookup fail or incorrect since it will resolve in the default VRF, and ingress fail because the socket listens in the default VRF. This patch binds the underlying UDP tunnel socket to the l3mdev of the lower device of the VXLAN device. This will listen in the proper VRF and output traffic from said l3mdev, matching l3mdev routing rules and looking up the correct routing table. When the VXLAN device does not have a lower device, or the lower device is in the default VRF, the socket will not be bound to any interface, keeping the previous behaviour. The underlay l3mdev is deduced from the VXLAN lower device (IFLA_VXLAN_LINK). +----------+ +---------+ \| \| \| \| \| vrf-blue \| \| vrf-red \| \| \| \| \| +----+-----+ +----+----+ \| \| \| \| +----+-----+ +----+----+ \| \| \| \| \| br-blue \| \| br-red \| \| \| \| \| +----+-----+ +---+-+---+ \| \| \| \| +-----+ +-----+ \| \| \| +----+-----+ +------+----+ +----+----+ \| \| lower device \| \| \| \| \| eth0 \| <- - - - - - - \| vxlan-red \| \| tap-red \| (... more taps) \| \| \| \| \| \| +----------+ +-----------+ +---------+ Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	l3mdev: add function to retreive upper master	Alexis Bauvin
	Existing functions to retreive the l3mdev of a device did not walk the master chain to find the upper master. This patch adds a function to find the l3mdev, even indirect through e.g. a bridge: +----------+ \| \| \| vrf-blue \| \| \| +----+-----+ \| \| +----+-----+ \| \| \| br-blue \| \| \| +----+-----+ \| \| +----+-----+ \| \| \| eth0 \| \| \| +----------+ This will properly resolve the l3mdev of eth0 to vrf-blue. Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	udp_tunnel: add config option to bind to a device	Alexis Bauvin
	UDP tunnel sockets are always opened unbound to a specific device. This patch allow the socket to be bound on a custom device, which incidentally makes UDP tunnels VRF-aware if binding to an l3mdev. Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	tun: remove skb access after netif_receive_skb	Prashant Bhole
	In tun.c skb->len was accessed while doing stats accounting after a call to netif_receive_skb. We can not access skb after this call because buffers may be dropped. The fix for this bug would be to store skb->len in local variable and then use it after netif_receive_skb(). IMO using xdp data size for accounting bytes will be better because input for tun_xdp_one() is xdp_buff. Hence this patch: - fixes a bug by removing skb access after netif_receive_skb() - uses xdp data size for accounting bytes [613.019057] BUG: KASAN: use-after-free in tun_sendmsg+0x77c/0xc50 [tun] [613.021062] Read of size 4 at addr ffff8881da9ab7c0 by task vhost-1115/1155 [613.023073] [613.024003] CPU: 0 PID: 1155 Comm: vhost-1115 Not tainted 4.20.0-rc3-vm+ #232 [613.026029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [613.029116] Call Trace: [613.031145] dump_stack+0x5b/0x90 [613.032219] print_address_description+0x6c/0x23c [613.034156] ? tun_sendmsg+0x77c/0xc50 [tun] [613.036141] kasan_report.cold.5+0x241/0x308 [613.038125] tun_sendmsg+0x77c/0xc50 [tun] [613.040109] ? tun_get_user+0x1960/0x1960 [tun] [613.042094] ? __isolate_free_page+0x270/0x270 [613.045173] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.047127] ? peek_head_len.part.13+0x90/0x90 [vhost_net] [613.049096] ? get_tx_bufs+0x5a/0x2c0 [vhost_net] [613.051106] ? vhost_enable_notify+0x2d8/0x420 [vhost] [613.053139] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.053139] ? vhost_net_buf_peek+0x340/0x340 [vhost_net] [613.053139] ? __mutex_lock+0x8d9/0xb30 [613.053139] ? finish_task_switch+0x8f/0x3f0 [613.053139] ? handle_tx+0x32/0x120 [vhost_net] [613.053139] ? mutex_trylock+0x110/0x110 [613.053139] ? finish_task_switch+0xcf/0x3f0 [613.053139] ? finish_task_switch+0x240/0x3f0 [613.053139] ? __switch_to_asm+0x34/0x70 [613.053139] ? __switch_to_asm+0x40/0x70 [613.053139] ? __schedule+0x506/0xf10 [613.053139] handle_tx+0xc7/0x120 [vhost_net] [613.053139] vhost_worker+0x166/0x200 [vhost] [613.053139] ? vhost_dev_init+0x580/0x580 [vhost] [613.053139] ? __kthread_parkme+0x77/0x90 [613.053139] ? vhost_dev_init+0x580/0x580 [vhost] [613.053139] kthread+0x1b1/0x1d0 [613.053139] ? kthread_park+0xb0/0xb0 [613.053139] ret_from_fork+0x35/0x40 [613.088705] [613.088705] Allocated by task 1155: [613.088705] kasan_kmalloc+0xbf/0xe0 [613.088705] kmem_cache_alloc+0xdc/0x220 [613.088705] __build_skb+0x2a/0x160 [613.088705] build_skb+0x14/0xc0 [613.088705] tun_sendmsg+0x4f0/0xc50 [tun] [613.088705] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.088705] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.088705] handle_tx+0xc7/0x120 [vhost_net] [613.088705] vhost_worker+0x166/0x200 [vhost] [613.088705] kthread+0x1b1/0x1d0 [613.088705] ret_from_fork+0x35/0x40 [613.088705] [613.088705] Freed by task 1155: [613.088705] __kasan_slab_free+0x12e/0x180 [613.088705] kmem_cache_free+0xa0/0x230 [613.088705] ip6_mc_input+0x40f/0x5a0 [613.088705] ipv6_rcv+0xc9/0x1e0 [613.088705] __netif_receive_skb_one_core+0xc1/0x100 [613.088705] netif_receive_skb_internal+0xc4/0x270 [613.088705] br_pass_frame_up+0x2b9/0x2e0 [613.088705] br_handle_frame_finish+0x2fb/0x7a0 [613.088705] br_handle_frame+0x30f/0x6c0 [613.088705] __netif_receive_skb_core+0x61a/0x15b0 [613.088705] __netif_receive_skb_one_core+0x8e/0x100 [613.088705] netif_receive_skb_internal+0xc4/0x270 [613.088705] tun_sendmsg+0x738/0xc50 [tun] [613.088705] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.088705] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.088705] handle_tx+0xc7/0x120 [vhost_net] [613.088705] vhost_worker+0x166/0x200 [vhost] [613.088705] kthread+0x1b1/0x1d0 [613.088705] ret_from_fork+0x35/0x40 [613.088705] [613.088705] The buggy address belongs to the object at ffff8881da9ab740 [613.088705] which belongs to the cache skbuff_head_cache of size 232 Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	Merge branch 'mlxsw-fw_load_policy'	David S. Miller
	Ido Schimmel says: ==================== mlxsw: Add 'fw_load_policy' devlink parameter Shalom says: Currently, drivers do not have the ability to control the firmware loading policy and they always use their own fixed policy. This prevents drivers from running the device with a different firmware version for testing and/or debugging purposes. For example, testing a firmware bug fix. For these situations, the new devlink generic parameter, 'fw_load_policy', gives the ability to control this option and allows drivers to run with a different firmware version than required by the driver. Patch #1 adds the new parameter to devlink. The other two patches, #2 and #3, add support for this parameter in the mlxsw driver. Example: # Query the devlink parameters supported by the device $ devlink dev param show pci/0000:03:00.0: name fw_load_policy type generic values: cmode driverinit value driver # Flash new firmware using ethtool $ ethtool -f swp1 mellanox/mlxsw_spectrum-13.1703.4.mfa2 # Toggle parameter $ devlink dev param set pci/0000:03:00.0 name fw_load_policy value flash cmode driverinit # devlink reset $ devlink dev reload pci/0000:03:00.0 # Query firmware version to show changes took affect $ ethtool -i swp1 driver: mlxsw_spectrum version: 1.0 firmware-version: 13.1703.4 expansion-rom-version: bus-info: 0000:03:00.0 supports-statistics: yes supports-test: no supports-eeprom-access: no supports-register-dump: no supports-priv-flags: no iproute2 patches available here: https://github.com/tshalom/iproute2-next v2: * Change 'fw_version_check' to 'fw_load_policy' with values 'driver' and 'flash' (Jakub) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	mlxsw: spectrum: Load firmware version based on devlink parameter	Shalom Toledo
	Load firmware version based on 'fw_load_policy' devlink parameter. The driver supports these two options: * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER (0) Default, load firmware version preferred by the driver * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH (1) Load firmware currently stored in flash The second option, 'flash', allow the device to run with different firmware version than preferred by the driver for testing and/or debugging purposes. For example, testing a firmware bug fix. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	mlxsw: core: Reset firmware after flash during driver initialization	Shalom Toledo
	After flashing new firmware during the driver initialization flow (reload or not), the driver should do a firmware reset when it gets -EAGAIN in order to load the new one. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	devlink: Add 'fw_load_policy' generic parameter	Shalom Toledo
	Many drivers load the device's firmware image during the initialization flow either from the flash or from the disk. Currently this option is not controlled by the user and the driver decides from where to load the firmware image. 'fw_load_policy' gives the ability to control this option which allows the user to choose between different loading policies supported by the driver. This parameter can be useful while testing and/or debugging the device. For example, testing a firmware bug fix. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net: 8139cp: fix a BUG triggered by changing mtu with network traffic	Su Yanjun
	When changing mtu many times with traffic, a bug is triggered: [ 1035.684037] kernel BUG at lib/dynamic_queue_limits.c:26! [ 1035.684042] invalid opcode: 0000 [#1] SMP [ 1035.684049] Modules linked in: loop binfmt_misc 8139cp(OE) macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag tcp_lp fuse uinput xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter devlink ip6_tables iptable_filter sunrpc snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep ppdev snd_seq iosf_mbi crc32_pclmul parport_pc snd_seq_device ghash_clmulni_intel parport snd_pcm aesni_intel joydev lrw snd_timer virtio_balloon sg gf128mul glue_helper ablk_helper cryptd snd soundcore i2c_piix4 pcspkr ip_tables xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif crct10dif_generic ata_generic [ 1035.684102] pata_acpi virtio_console qxl drm_kms_helper syscopyarea sysfillrect sysimgblt floppy fb_sys_fops crct10dif_pclmul crct10dif_common ttm crc32c_intel serio_raw ata_piix drm libata 8139too virtio_pci drm_panel_orientation_quirks virtio_ring virtio mii dm_mirror dm_region_hash dm_log dm_mod [last unloaded: 8139cp] [ 1035.684132] CPU: 9 PID: 25140 Comm: if-mtu-change Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.el7.x86_64 #1 [ 1035.684134] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1035.684136] task: ffff8f59b1f5a080 ti: ffff8f5a2e32c000 task.ti: ffff8f5a2e32c000 [ 1035.684149] RIP: 0010:[<ffffffffba3a40d0>] [<ffffffffba3a40d0>] dql_completed+0x180/0x190 [ 1035.684162] RSP: 0000:ffff8f5a75483e50 EFLAGS: 00010093 [ 1035.684162] RAX: 00000000000000c2 RBX: ffff8f5a6f91c000 RCX: 0000000000000000 [ 1035.684162] RDX: 0000000000000000 RSI: 0000000000000184 RDI: ffff8f599fea3ec0 [ 1035.684162] RBP: ffff8f5a75483ea8 R08: 00000000000000c2 R09: 0000000000000000 [ 1035.684162] R10: 00000000000616ef R11: ffff8f5a75483b56 R12: ffff8f599fea3e00 [ 1035.684162] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000184 [ 1035.684162] FS: 00007fa8434de740(0000) GS:ffff8f5a75480000(0000) knlGS:0000000000000000 [ 1035.684162] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1035.684162] CR2: 00000000004305d0 CR3: 000000024eb66000 CR4: 00000000001406e0 [ 1035.684162] Call Trace: [ 1035.684162] <IRQ> [ 1035.684162] [<ffffffffc08cbaf8>] ? cp_interrupt+0x478/0x580 [8139cp] [ 1035.684162] [<ffffffffba14a294>] __handle_irq_event_percpu+0x44/0x1c0 [ 1035.684162] [<ffffffffba14a442>] handle_irq_event_percpu+0x32/0x80 [ 1035.684162] [<ffffffffba14a4cc>] handle_irq_event+0x3c/0x60 [ 1035.684162] [<ffffffffba14db29>] handle_fasteoi_irq+0x59/0x110 [ 1035.684162] [<ffffffffba02e554>] handle_irq+0xe4/0x1a0 [ 1035.684162] [<ffffffffba7795dd>] do_IRQ+0x4d/0xf0 [ 1035.684162] [<ffffffffba76b362>] common_interrupt+0x162/0x162 [ 1035.684162] <EOI> [ 1035.684162] [<ffffffffba0c2ae4>] ? __wake_up_bit+0x24/0x70 [ 1035.684162] [<ffffffffba1e46f5>] ? do_set_pte+0xd5/0x120 [ 1035.684162] [<ffffffffba1b64fb>] unlock_page+0x2b/0x30 [ 1035.684162] [<ffffffffba1e4879>] do_read_fault.isra.61+0x139/0x1b0 [ 1035.684162] [<ffffffffba1e9134>] handle_pte_fault+0x2f4/0xd10 [ 1035.684162] [<ffffffffba1ebc6d>] handle_mm_fault+0x39d/0x9b0 [ 1035.684162] [<ffffffffba76f5e3>] __do_page_fault+0x203/0x500 [ 1035.684162] [<ffffffffba76f9c6>] trace_do_page_fault+0x56/0x150 [ 1035.684162] [<ffffffffba76ef42>] do_async_page_fault+0x22/0xf0 [ 1035.684162] [<ffffffffba76b788>] async_page_fault+0x28/0x30 [ 1035.684162] Code: 54 c7 47 54 ff ff ff ff 44 0f 49 ce 48 8b 35 48 2f 9c 00 48 89 77 58 e9 fe fe ff ff 0f 1f 80 00 00 00 00 41 89 d1 e9 ef fe ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 8d 42 ff 48 [ 1035.684162] RIP [<ffffffffba3a40d0>] dql_completed+0x180/0x190 [ 1035.684162] RSP <ffff8f5a75483e50> It's not the same as in 7fe0ee09 patch described. As 8139cp uses shared irq mode, other device irq will trigger cp_interrupt to execute. cp_change_mtu -> cp_close -> cp_open In cp_close routine just before free_irq(), some interrupt may occur. In my environment, cp_interrupt exectutes and IntrStatus is 0x4, exactly TxOk. That will cause cp_tx to wake device queue. As device queue is started, cp_start_xmit and cp_open will run at same time which will cause kernel BUG. For example: [#] for tx descriptor At start: [#][#][#] num_queued=3 After cp_init_hw->cp_start_hw->netdev_reset_queue: [#][#][#] num_queued=0 When 8139cp starts to work then cp_tx will check num_queued mismatchs the complete_bytes. The patch will check IntrMask before check IntrStatus in cp_interrupt. When 8139cp interrupt is disabled, just return. Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	IB/mlx5: Fix implicit ODP interrupted page fault	Artemy Kovalyov
	Since any page fault may be interrupted by a MMU invalidation and implicit leaf MR may be released during this process. The check for parent value is unreliable condition for an implicit MR. Use other condition that we can rely on to determine if MR is implicit. Fixes: b4cfe447d47b ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03	net: phy: don't allow __set_phy_supported to add unsupported modes	Heiner Kallweit
	Currently __set_phy_supported allows to add modes w/o checking whether the PHY supports them. This is wrong, it should never add modes but only remove modes we don't want to support. The commit marked as fixed didn't do anything wrong, it just copied existing functionality to the helper which is being fixed now. Fixes: f3a6bd393c2c ("phylib: Add phy_set_max_speed helper") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	net: phy: don't allow __set_phy_supported to add unsupported modes	Heiner Kallweit
	Currently __set_phy_supported allows to add modes w/o checking whether the PHY supports them. This is wrong, it should never add modes but only remove modes we don't want to support. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03	Merge tag 'arm-soc/for-4.20/devicetree-fixes' of ↵	Olof Johansson
	https://github.com/Broadcom/stblinux into fixes This pull request contains Broadcom ARM-based SoCs Device Tree fixes, please pull the following for 4.20: - Stefan fixes the polariy of the Wi-Fi reset GPIOs signals which would break on Raspberry Pi 3B and 3B+ * tag 'arm-soc/for-4.20/devicetree-fixes' of https://github.com/Broadcom/stblinux: ARM: dts: bcm2837: Fix polarity of wifi reset GPIOs Signed-off-by: Olof Johansson <olof@lixom.net>
2018-12-03	IB/hfi1: Fix an out-of-bounds access in get_hw_stats	Piotr Stankiewicz
	When running with KASAN, the following trace is produced: [ 62.535888] ================================================================== [ 62.544930] BUG: KASAN: slab-out-of-bounds in gut_hw_stats+0x122/0x230 [hfi1] [ 62.553856] Write of size 8 at addr ffff88080e8d6330 by task kworker/0:1/14 [ 62.565333] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted 4.19.0-test-build-kasan+ #8 [ 62.575087] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016 [ 62.587951] Workqueue: events work_for_cpu_fn [ 62.594050] Call Trace: [ 62.598023] dump_stack+0xc6/0x14c [ 62.603089] ? dump_stack_print_info.cold.1+0x2f/0x2f [ 62.610041] ? kmsg_dump_rewind_nolock+0x59/0x59 [ 62.616615] ? get_hw_stats+0x122/0x230 [hfi1] [ 62.622985] print_address_description+0x6c/0x23c [ 62.629744] ? get_hw_stats+0x122/0x230 [hfi1] [ 62.636108] kasan_report.cold.6+0x241/0x308 [ 62.642365] get_hw_stats+0x122/0x230 [hfi1] [ 62.648703] ? hfi1_alloc_rn+0x40/0x40 [hfi1] [ 62.655088] ? __kmalloc+0x110/0x240 [ 62.660695] ? hfi1_alloc_rn+0x40/0x40 [hfi1] [ 62.667142] setup_hw_stats+0xd8/0x430 [ib_core] [ 62.673972] ? show_hfi+0x50/0x50 [hfi1] [ 62.680026] ib_device_register_sysfs+0x165/0x180 [ib_core] [ 62.687995] ib_register_device+0x5a2/0xa10 [ib_core] [ 62.695340] ? show_hfi+0x50/0x50 [hfi1] [ 62.701421] ? ib_unregister_device+0x2e0/0x2e0 [ib_core] [ 62.709222] ? __vmalloc_node_range+0x2d0/0x380 [ 62.716131] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt] [ 62.723735] ? vmalloc_node+0x5c/0x70 [ 62.729697] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt] [ 62.737347] ? rvt_driver_mr_init+0x1f5/0x2d0 [rdmavt] [ 62.744998] ? __rvt_alloc_mr+0x110/0x110 [rdmavt] [ 62.752315] ? rvt_rc_error+0x140/0x140 [rdmavt] [ 62.759434] ? rvt_vma_open+0x30/0x30 [rdmavt] [ 62.766364] ? mutex_unlock+0x1d/0x40 [ 62.772445] ? kmem_cache_create_usercopy+0x15d/0x230 [ 62.780115] rvt_register_device+0x1f6/0x360 [rdmavt] [ 62.787823] ? rvt_get_port_immutable+0x180/0x180 [rdmavt] [ 62.796058] ? __get_txreq+0x400/0x400 [hfi1] [ 62.802969] ? memcpy+0x34/0x50 [ 62.808611] hfi1_register_ib_device+0xde6/0xeb0 [hfi1] [ 62.816601] ? hfi1_get_npkeys+0x10/0x10 [hfi1] [ 62.823760] ? hfi1_init+0x89f/0x9a0 [hfi1] [ 62.830469] ? hfi1_setup_eagerbufs+0xad0/0xad0 [hfi1] [ 62.838204] ? pcie_capability_clear_and_set_word+0xcd/0xe0 [ 62.846429] ? pcie_capability_read_word+0xd0/0xd0 [ 62.853791] ? hfi1_pcie_init+0x187/0x4b0 [hfi1] [ 62.860958] init_one+0x67f/0xae0 [hfi1] [ 62.867301] ? hfi1_init+0x9a0/0x9a0 [hfi1] [ 62.873876] ? wait_woken+0x130/0x130 [ 62.879860] ? read_word_at_a_time+0xe/0x20 [ 62.886329] ? strscpy+0x14b/0x280 [ 62.891998] ? hfi1_init+0x9a0/0x9a0 [hfi1] [ 62.898405] local_pci_probe+0x70/0xd0 [ 62.904295] ? pci_device_shutdown+0x90/0x90 [ 62.910833] work_for_cpu_fn+0x29/0x40 [ 62.916750] process_one_work+0x584/0x960 [ 62.922974] ? rcu_work_rcufn+0x40/0x40 [ 62.928991] ? __schedule+0x396/0xdc0 [ 62.934806] ? __sched_text_start+0x8/0x8 [ 62.941020] ? pick_next_task_fair+0x68b/0xc60 [ 62.947674] ? run_rebalance_domains+0x260/0x260 [ 62.954471] ? __list_add_valid+0x29/0xa0 [ 62.960607] ? move_linked_works+0x1c7/0x230 [ 62.967077] ? trace_event_raw_event_workqueue_execute_start+0x140/0x140 [ 62.976248] ? mutex_lock+0xa6/0x100 [ 62.982029] ? __mutex_lock_slowpath+0x10/0x10 [ 62.988795] ? __switch_to+0x37a/0x710 [ 62.994731] worker_thread+0x62e/0x9d0 [ 63.000602] ? max_active_store+0xf0/0xf0 [ 63.006828] ? __switch_to_asm+0x40/0x70 [ 63.012932] ? __switch_to_asm+0x34/0x70 [ 63.019013] ? __switch_to_asm+0x40/0x70 [ 63.025042] ? __switch_to_asm+0x34/0x70 [ 63.031030] ? __switch_to_asm+0x40/0x70 [ 63.037006] ? __schedule+0x396/0xdc0 [ 63.042660] ? kmem_cache_alloc_trace+0xf3/0x1f0 [ 63.049323] ? kthread+0x59/0x1d0 [ 63.054594] ? ret_from_fork+0x35/0x40 [ 63.060257] ? __sched_text_start+0x8/0x8 [ 63.066212] ? schedule+0xcf/0x250 [ 63.071529] ? __wake_up_common+0x110/0x350 [ 63.077794] ? __schedule+0xdc0/0xdc0 [ 63.083348] ? wait_woken+0x130/0x130 [ 63.088963] ? finish_task_switch+0x1f1/0x520 [ 63.095258] ? kasan_unpoison_shadow+0x30/0x40 [ 63.101792] ? __init_waitqueue_head+0xa0/0xd0 [ 63.108183] ? replenish_dl_entity.cold.60+0x18/0x18 [ 63.115151] ? _raw_spin_lock_irqsave+0x25/0x50 [ 63.121754] ? max_active_store+0xf0/0xf0 [ 63.127753] kthread+0x1ae/0x1d0 [ 63.132894] ? kthread_bind+0x30/0x30 [ 63.138422] ret_from_fork+0x35/0x40 [ 63.146973] Allocated by task 14: [ 63.152077] kasan_kmalloc+0xbf/0xe0 [ 63.157471] __kmalloc+0x110/0x240 [ 63.162804] init_cntrs+0x34d/0xdf0 [hfi1] [ 63.168883] hfi1_init_dd+0x29a3/0x2f90 [hfi1] [ 63.175244] init_one+0x551/0xae0 [hfi1] [ 63.181065] local_pci_probe+0x70/0xd0 [ 63.186759] work_for_cpu_fn+0x29/0x40 [ 63.192310] process_one_work+0x584/0x960 [ 63.198163] worker_thread+0x62e/0x9d0 [ 63.203843] kthread+0x1ae/0x1d0 [ 63.208874] ret_from_fork+0x35/0x40 [ 63.217203] Freed by task 1: [ 63.221844] __kasan_slab_free+0x12e/0x180 [ 63.227844] kfree+0x92/0x1a0 [ 63.232570] single_release+0x3a/0x60 [ 63.238024] __fput+0x1d9/0x480 [ 63.242911] task_work_run+0x139/0x190 [ 63.248440] exit_to_usermode_loop+0x191/0x1a0 [ 63.254814] do_syscall_64+0x301/0x330 [ 63.260283] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 63.270199] The buggy address belongs to the object at ffff88080e8d5500 which belongs to the cache kmalloc-4096 of size 4096 [ 63.287247] The buggy address is located 3632 bytes inside of 4096-byte region [ffff88080e8d5500, ffff88080e8d6500) [ 63.303564] The buggy address belongs to the page: [ 63.310447] page:ffffea00203a3400 count:1 mapcount:0 mapping:ffff88081380e840 index:0x0 compound_mapcount: 0 [ 63.323102] flags: 0x2fffff80008100(slab\|head) [ 63.329775] raw: 002fffff80008100 0000000000000000 0000000100000001 ffff88081380e840 [ 63.340175] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 [ 63.350564] page dumped because: kasan: bad access detected [ 63.361974] Memory state around the buggy address: [ 63.369137] ffff88080e8d6200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 63.379082] ffff88080e8d6280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 63.389032] >ffff88080e8d6300: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc [ 63.398944] ^ [ 63.406141] ffff88080e8d6380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 63.416109] ffff88080e8d6400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 63.426099] ================================================================== The trace happens because get_hw_stats() assumes there is room in the memory allocated in init_cntrs() to accommodate the driver counters. Unfortunately, that routine only allocated space for the device counters. Fix by insuring the allocation has room for the additional driver counters. Cc: <Stable@vger.kernel.org> # v4.14+ Fixes: b7481944b06e9 ("IB/hfi1: Show statistics counters under IB stats interface") Reviewed-by: Mike Marciniczyn <mike.marciniszyn@intel.com> Reviewed-by: Mike Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03	IB/hfi1: Fix a latency issue for small messages	Michael J. Ruhl
	A recent performance enhancement introduced a latency issue in the HFI message path. The new algorithm removed a forced call send for PIO messages and added a forced schedule event for messages larger than the MTU. For PIO, the schedule path can introduce thrashing that can significantly impact the throughput for small messages. If a message size is within the PIO threshold, always take the send path. Fixes: 0b79b27748cb ("IB/{hfi1, qib, rdmavt}: Schedule multi RC/UC packets instead of posting") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03	ARM: dts: realview: Fix some more duplicate regulator nodes	Rob Herring
	There's a bug in dtc in checking for duplicate node names when there's another section (e.g. "/ { };"). In this case, skeleton.dtsi provides another section. Upon removal of skeleton.dtsi, the dtb fails to build due to a duplicate node 'fixedregulator@0'. As both nodes were pretty much the same 3.3V fixed regulator, it hasn't really mattered. Fix this by renaming the nodes to something unique. In the process, drop the unit-address which shouldn't be present wtihout reg property. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Olof Johansson <olof@lixom.net>
2018-12-03	MAINTAINERS: update entry for MMP platform	Lubomir Rintel
	Move Eric Miao and Haojian Zhuang over to CREDITS, since they're AWOL for some time already. The git trees have gone away too. I'm adding myself as a reviewer. I'd like to be Cc'd on patches and will be able to test them, but I don't possess a data sheet thus there might be things I'll be unable to review. Hence the Odd-Fixes status. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Olof Johansson <olof@lixom.net>
2018-12-03	ARM: mmp/mmp2: fix cpu_is_mmp2() on mmp2-dt	Lubomir Rintel
	cpu_is_mmp2() was equivalent to cpu_is_pj4(), wouldn't be correct for multiplatform kernels. Fix it by also considering mmp_chip_id, as is done for cpu_is_pxa168() and cpu_is_pxa910() above. Moreover, it is only available with CONFIG_CPU_MMP2 and thus doesn't work on DT-based MMP2 machines. Enable it on CONFIG_MACH_MMP2_DT too. Note: CONFIG_CPU_MMP2 is only used for machines that use board files instead of DT. It should perhaps be renamed. I'm not doing it now, because I don't have a better idea. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org Signed-off-by: Olof Johansson <olof@lixom.net>