linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-10-09	nfp: bpf: remove register rename	Jakub Kicinski
	Remove the register renumbering optimization. To implement calling map and other helpers we need more strict register layout. We can't freely reassign register numbers. This will have the effect of running in 4 context/thread mode, which should be OK since we are moving towards integrating the BPF closer with FW app datapath anyway, and the target datapath itself runs in 4 context mode. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	nfp: bpf: encode all 64bit shifts	Jakub Kicinski
	Add encodings of all 64bit shift operations. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	nfp: bpf: move software reg helpers and cmd table out of translator	Jakub Kicinski
	Move the software reg helpers and some static data to nfp_asm.c. They are related to the previous patch, but move is done in a separate commit for ease of review. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	nfp: bpf: use the power of sparse to check we encode registers right	Jakub Kicinski
	Define a new __bitwise type for software representation of registers. This will allow us to catch incorrect parameter types using sparse. Accessors we define also allow us to return correct enum type and therefore ensure all switches handle all register types. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	nfp: bpf: lift the single-port limitation	Jakub Kicinski
	Limiting the eBPF offload to a single port was a workaround required for the PoC application FW which has not been released externally. It's not necessary any more. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	nfp: output control messages to trace_devlink_hwmsg()	Jakub Kicinski
	Use standard devlink trace point to allow tracing of control messages. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	Merge branch 'hns3-cleanups'	David S. Miller
	Yunsheng Lin says: ==================== A few cleanup for hns3 ethernet driver This patchset contains a few cleanup for hns3 ethernet driver. No functional change intended. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	net: hns3: Cleanup for non-static function in hns3 driver	Yunsheng Lin
	This patch fixes the following warning from sparse: warning: symbol 'hns3_set_multicast_list' was not declared. Should it be static. hns3_set_multicast_list turns out to be not used, so delete it. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	net: hns3: Cleanup for endian issue in hns3 driver	Yunsheng Lin
	This patch fixes a lot of endian issues detected by sparse. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	net: hns3: Cleanup for struct that used to send cmd to firmware	Yunsheng Lin
	The hclge_tm module has already added _cmd to the end of struct that used to send cmd to firmware. This will help us finding the endian issues. This patch adds the _cmd to the end of struct that used to send cmd to firmware in hclge_main module. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	net: hns3: Consistently using GENMASK in hns3 driver	Yunsheng Lin
	This patch uses GENMASK to generate bit mask whenever possible in hns3 driver. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	net: hns3: Cleanup indentation for Kconfig in the the hisilicon folder	Yunsheng Lin
	This patch fixes a few indentation for Kconfig file in the hisilicon folder. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	net: hns3: Add hns3_get_handle macro in hns3 driver	Yunsheng Lin
	There are many places that will need to get the handle of netdev, so add a macro to get the handle of netdev. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	net: hns3: Cleanup for shifting true in hns3 driver	Yunsheng Lin
	This patch fixes a shifting true in hclge_main module. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	Merge branch 'master' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-10-09 1) Fix some error paths of the IPsec offloading API. 2) Fix a NULL pointer dereference when IPsec is used with vti. From Alexey Kodanev. 3) Don't call xfrm_policy_cache_flush under xfrm_state_lock, it triggers several locking warnings. From Artem Savkov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	ixgbe: split Tx/Rx ring clearing for ethtool loopback test	Emil Tantilov
	Commit: fed21bcee7a5 ("ixgbe: Don't bother clearing buffer memory for descriptor rings) exposed some issues with the logic in the current implementation of ixgbe_clean_test_rings() that are being addressed in this patch: - Split the clearing of the Tx and Rx rings in separate loops. Previously both Tx and Rx rings were cleared in a rx_desc->wb.upper.length based loop which could lead to issues if for w/e reason packets were received outside of the frames transmitted for the loopback test. - Add check for IXGBE_TXD_STAT_DD to avoid clearing the rings if the transmits have not comlpeted by the time we enter ixgbe_clean_test_rings() - Exit early on ixgbe_check_lbtest_frame() failure. This change fixes a crash during ethtool diagnostic (ethtool -t). Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09	ipv4: Fix traffic triggered IPsec connections.	Steffen Klassert
	A recent patch removed the dst_free() on the allocated dst_entry in ipv4_blackhole_route(). The dst_free() marked the dst_entry as dead and added it to the gc list. I.e. it was setup for a one time usage. As a result we may now have a blackhole route cached at a socket on some IPsec scenarios. This makes the connection unusable. Fix this by marking the dst_entry directly at allocation time as 'dead', so it is used only once. Fixes: b838d5e1c5b6 ("ipv4: mark DST_NOGC and remove the operation of dst_free()") Reported-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	ipv6: Fix traffic triggered IPsec connections.	Steffen Klassert
	A recent patch removed the dst_free() on the allocated dst_entry in ipv6_blackhole_route(). The dst_free() marked the dst_entry as dead and added it to the gc list. I.e. it was setup for a one time usage. As a result we may now have a blackhole route cached at a socket on some IPsec scenarios. This makes the connection unusable. Fix this by marking the dst_entry directly at allocation time as 'dead', so it is used only once. Fixes: 587fea741134 ("ipv6: mark DST_NOGC and remove the operation of dst_free()") Reported-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09	ixgbe: add error checks when initializing the PHY	Emil Tantilov
	Ignoring errors when attempting to identify the PHY can lead to a crash. Specifically in the case of FW controlled PHYs where the PHY read/write operations are set to NULL. Removed redundant comment. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09	ixgbe: restore normal RSS after last macvlan offload is removed	Shannon Nelson
	Just like when the last VF is removed, we need to restore normal operations after the last macvlan offload is removed, else we get stuck in single queue operations. To test: ethtool -l eth1 # note the number of queues in use, ~= cpus ethtool -K eth1 l2-fwd-offload on ip link add mv1 link eth1 type macvlan mode bridge ip link set dev mv1 up ip link del mv1 ethtool -l eth1 # are we back to the same # of queues, or stuck on 1? Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09	ixgbe: declare ixgbe_mac_operations structures as const	Bhumika Goyal
	Declare ixgbe_mac_operations structures as const as they are only stored in the mac_ops field of ixgbe_info structure. This field is of type const and therefore ixgbe_mac_operations structure can be made const too. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09	ixgbe: Clear SWFW_SYNC register during init	Emil Tantilov
	Added clearing of SW resource bits in the SW/FW synchronization register to ixgbe_init_swfw_sync_X540(). Updated ixgbe_acquire_swfw_sync_X540 SW Manageability host interface resource bit error case to match the error handling of the other SW resource bits. Which is to release the SW resource bits if SW times out while attempting to acquire the resource. This allows the driver to load in cases where the semaphore bits could be stuck after a reset or a crash. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09	ixgbe: incorrect XDP ring accounting in ethtool tx_frame param	John Fastabend
	Changing the TX ring parameters with an XDP program attached may cause the XDP queues to be cleared and the TX rings to be incorrectly configured. Fix by doing correct ring accounting in setup call. Fixes: 33fdc82f0883 ("ixgbe: add support for XDP_TX action") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09	net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag	Ding Tianhong
	The ixgbe driver use the compile check to determine if it can send TLPs to Root Port with the Relaxed Ordering Attribute set, this is too inconvenient, now the new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added to the kernel and we could check the bit4 in the PCIe Device Control register to determine whether we should use the Relaxed Ordering Attributes or not, so use this new way in the ixgbe driver. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09	Revert commit 1a8b6d76dc5b ("net:add one common config...")	Ding Tianhong
	The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added to indicate that Relaxed Ordering Attributes (RO) should not be used for Transaction Layer Packets (TLP) targeted toward these affected Root Port, it will clear the bit4 in the PCIe Device Control register, so the PCIe device drivers could query PCIe configuration space to determine if it can send TLPs to Root Port with the Relaxed Ordering Attributes set. With this new flag we don't need the config ARCH_WANT_RELAX_ORDER to control the Relaxed Ordering Attributes for the ixgbe drivers just like the commit 1a8b6d76dc5b ("net:add one common config...") did, so revert this commit. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09	ixgbe: fix masking of bits read from IXGBE_VXLANCTRL register	Sabrina Dubroca
	In ixgbe_clear_udp_tunnel_port(), we read the IXGBE_VXLANCTRL register and then try to mask some bits out of the value, using the logical instead of bitwise and operator. Fixes: a21d0822ff69 ("ixgbe: add support for geneve Rx offload") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09	ixgbe: Return error when getting PHY address if PHY access is not supported	Mark D Rustad
	In cases where PHY register access is not supported, don't mislead a caller into thinking that it is supported by returning a PHY address. Instead, return -EOPNOTSUPP when PHY access is not supported. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09	netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'	Shmulik Ladkani
	Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced support for attaching an eBPF object by an fd, with the 'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each IPT_SO_SET_REPLACE call. However this breaks subsequent iptables calls: # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT # iptables -A INPUT -s 5.6.7.8 -j ACCEPT iptables: Invalid argument. Run `dmesg' for more information. That's because iptables works by loading existing rules using IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with the replacement set. However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number (from the initial "iptables -m bpf" invocation) - so when 2nd invocation occurs, userspace passes a bogus fd number, which leads to 'bpf_mt_check_v1' to fail. One suggested solution [1] was to hack iptables userspace, to perform a "entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new, process-local fd per every 'xt_bpf_info_v1' entry seen. However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects. This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given '.fd' and instead perform an in-kernel lookup for the bpf object given the provided '.path'. It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is expected to provide the path of the pinned object. Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved. References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2 [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2 Reported-by: Rafael Buchbinder <rafi@rbk.ms> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-09	netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook	Lin Zhang
	In function {ipv4,ipv6}_synproxy_hook we expect a normal tcp packet, but the real server maybe reply an icmp error packet related to the exist tcp conntrack, so we will access wrong tcp data. Fix it by checking for the protocol field and only process tcp traffic. Signed-off-by: Lin Zhang <xiaolou4617@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-08	qed: Delete redundant check on dcb_app priority	Christos Gkekas
	dcb_app priority is unsigned thus checking whether it is less than zero is redundant. Signed-off-by: Christos Gkekas <chris.gekas@gmail.com> Acked-By: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	net: ethernet: stmmac: Clean up dead code	Christos Gkekas
	Many macros in dwmac-ipq806x are unused and should be removed. Moreover gmac->id is an unsigned variable and therefore checking whether it is less than zero is redundant. Signed-off-by: Christos Gkekas <chris.gekas@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	Merge branch 'ipv6_dev_get_saddr-rcu'	David S. Miller
	Eric Dumazet says: ==================== ipv6: ipv6_dev_get_saddr() rcu works Sending IPv6 udp packets on non connected sockets is quite slow, because ipv6_dev_get_saddr() is still using an rwlock and silly references games on ifa. Tested: $ ./super_netperf 16 -H 4444::555:0786 -l 2000 -t UDP_STREAM -- -m 100 & [1] 12527 Performance is boosted from 2.02 Mpps to 4.28 Mpps Kernel profile before patches : 22.62% [kernel] [k] _raw_read_lock_bh 7.04% [kernel] [k] refcount_sub_and_test 6.56% [kernel] [k] ipv6_get_saddr_eval 5.67% [kernel] [k] _raw_read_unlock_bh 5.34% [kernel] [k] __ipv6_dev_get_saddr 4.95% [kernel] [k] refcount_inc_not_zero 4.03% [kernel] [k] __ip6addrlbl_match 3.70% [kernel] [k] _raw_spin_lock 3.44% [kernel] [k] ipv6_dev_get_saddr 3.24% [kernel] [k] ip6_pol_route 3.06% [kernel] [k] refcount_add_not_zero 2.30% [kernel] [k] __local_bh_enable_ip 1.81% [kernel] [k] mlx4_en_xmit 1.20% [kernel] [k] __ip6_append_data 1.12% [kernel] [k] __ip6_make_skb 1.11% [kernel] [k] __dev_queue_xmit 1.06% [kernel] [k] l3mdev_master_ifindex_rcu Kernel profile after patches : 11.36% [kernel] [k] ip6_pol_route 7.65% [kernel] [k] _raw_spin_lock 7.16% [kernel] [k] __ipv6_dev_get_saddr 6.49% [kernel] [k] ipv6_get_saddr_eval 6.04% [kernel] [k] refcount_add_not_zero 3.34% [kernel] [k] __ip6addrlbl_match 2.62% [kernel] [k] __dev_queue_xmit 2.37% [kernel] [k] mlx4_en_xmit 2.26% [kernel] [k] dst_release 1.89% [kernel] [k] __ip6_make_skb 1.87% [kernel] [k] __ip6_append_data 1.86% [kernel] [k] udpv6_sendmsg 1.86% [kernel] [k] ip6t_do_table 1.64% [kernel] [k] ipv6_dev_get_saddr 1.64% [kernel] [k] find_match 1.51% [kernel] [k] l3mdev_master_ifindex_rcu 1.24% [kernel] [k] ipv6_addr_label ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	ipv6: avoid cache line dirtying in ipv6_dev_get_saddr()	Eric Dumazet
	By extending the rcu section a bit, we can avoid these very expensive in6_ifa_put()/in6_ifa_hold() calls done in __ipv6_dev_get_saddr() and ipv6_dev_get_saddr() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	ipv6: __ipv6_dev_get_saddr() rcu conversion	Eric Dumazet
	Callers hold rcu_read_lock(), so we do not need the rcu_read_lock()/rcu_read_unlock() pair. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	ipv6: ipv6_chk_prefix() rcu conversion	Eric Dumazet
	Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	ipv6: ipv6_chk_custom_prefix() rcu conversion	Eric Dumazet
	Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	ipv6: ipv6_count_addresses() rcu conversion	Eric Dumazet
	Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	ipv6: prepare RCU lookups for idev->addr_list	Eric Dumazet
	inet6_ifa_finish_destroy() already uses kfree_rcu() to free inet6_ifaddr structs. We need to use proper list additions/deletions in order to allow readers to use RCU instead of idev->lock rwlock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	tipc: Unclone message at secondary destination lookup	Jon Maloy
	When a bundling message is received, the function tipc_link_input() calls function tipc_msg_extract() to unbundle all inner messages of the bundling message before adding them to input queue. The function tipc_msg_extract() just clones all inner skb for all inner messagges from the bundling skb. This means that the skb headroom of an inner message overlaps with the data part of the preceding message in the bundle. If the message in question is a name addressed message, it may be subject to a secondary destination lookup, and eventually be sent out on one of the interfaces again. But, since what is perceived as headroom by the device driver in reality is the last bytes of the preceding message in the bundle, the latter will be overwritten by the MAC addresses of the L2 header. If the preceding message has not yet been consumed by the user, it will evenually be delivered with corrupted contents. This commit fixes this by uncloning all messages passing through the function tipc_msg_lookup_dest(), hence ensuring that the headroom is always valid when the message is passed on. Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	tipc: correct initialization of skb list	Jon Maloy
	We change the initialization of the skb transmit buffer queues in the functions tipc_bcast_xmit() and tipc_rcast_xmit() to also initialize their spinlocks. This is needed because we may, during error conditions, need to call skb_queue_purge() on those queues further down the stack. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	Merge branch 'bridge-neigh-msg-proxy-and-flood-suppression-support'	David S. Miller
	Roopa Prabhu says: ==================== bridge: neigh msg proxy and flood suppression support This series implements arp and nd suppression in the bridge driver for ethernet vpns. It implements rfc7432, section 10 https://tools.ietf.org/html/rfc7432#section-10 for ethernet VPN deployments. It is similar to the existing BR_PROXYARP* flags but has a few semantic differences to conform to EVPN standard. Unlike the existing flags, this new flag suppresses flood of all neigh discovery packets (arp and nd) to tunnel ports. Supports both vlan filtering and non-vlan filtering bridges. In case of EVPN, it is mainly used to avoid flooding of arp and nd packets to tunnel ports like vxlan. v2 : rebase to latest + address some optimization feedback from Nikolay. v3 : fix kbuild reported build errors with CONFIG_INET off v4 : simplify port flag mask as suggested by stephen v5 : address some feedback from Toshiaki v6 : some v5 cleanups in nd suppress (keep it consistent with arp suppress) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports	Roopa Prabhu
	This patch avoids flooding and proxies ndisc packets for BR_NEIGH_SUPPRESS ports. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports	Roopa Prabhu
	This patch avoids flooding and proxies arp packets for BR_NEIGH_SUPPRESS ports. Moves existing br_do_proxy_arp to br_do_proxy_suppress_arp to support both proxy arp and neigh suppress. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood	Roopa Prabhu
	This patch adds a new bridge port flag BR_NEIGH_SUPPRESS to suppress arp and nd flood on bridge ports. It implements rfc7432, section 10. https://tools.ietf.org/html/rfc7432#section-10 for ethernet VPN deployments. It is similar to the existing BR_PROXYARP* flags but has a few semantic differences to conform to EVPN standard. Unlike the existing flags, this new flag suppresses flood of all neigh discovery packets (arp and nd) to tunnel ports. Supports both vlan filtering and non-vlan filtering bridges. In case of EVPN, it is mainly used to avoid flooding of arp and nd packets to tunnel ports like vxlan. This patch adds netlink and sysfs support to set this bridge port flag. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	ipv6: fix a BUG in rt6_get_pcpu_route()	Eric Dumazet
	Ido reported following splat and provided a patch. [ 122.221814] BUG: using smp_processor_id() in preemptible [00000000] code: sshd/2672 [ 122.221845] caller is debug_smp_processor_id+0x17/0x20 [ 122.221866] CPU: 0 PID: 2672 Comm: sshd Not tainted 4.14.0-rc3-idosch-next-custom #639 [ 122.221880] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ 122.221893] Call Trace: [ 122.221919] dump_stack+0xb1/0x10c [ 122.221946] ? _atomic_dec_and_lock+0x124/0x124 [ 122.221974] ? ___ratelimit+0xfe/0x240 [ 122.222020] check_preemption_disabled+0x173/0x1b0 [ 122.222060] debug_smp_processor_id+0x17/0x20 [ 122.222083] ip6_pol_route+0x1482/0x24a0 ... I believe we can simplify this code path a bit, since we no longer hold a read_lock and need to release it to avoid a dead lock. By disabling BH, we make sure we'll prevent code re-entry and rt6_get_pcpu_route()/rt6_make_pcpu_route() run on the same cpu. Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Reported-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	Merge tag 'mlx5-updates-2017-10-06' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== Mellanox, mlx5 updates 2017-10-06 This series includes some shared code updates for kernel 4.15 to both net-next and rdma-next trees. The series includes mlx5 low level flow steering updates and optimizations to support firmware command parallelism for flow steering requests from Maor Gottlieb and two other small fixes from Matan and Maor. One fix from Matan adds error handling for when the destination list of the flow steering rule is full. Maor introduced a patch to avoid NULL pointer dereference on steering cleanup. Then Some refactoring patches needed by the series for code sharing purposes. and split the Flow Table Entry (FTE) and Flow Group (FG) creation code to two parts: 1) Object allocation - allocate the steering node and initialize its resources. 2) The firmware command execution. This change will give us the ability to take write lock on the parent node (e.g. FG for FTE creating) only on the software data struct allocation and creation part of the procedure where the synchronization is really required, and will allow us to execute multiple firmware commands simultaneously and overcome the firmware bottleneck. Refactor the locking scheme of the mlx5 core flow steering as follows: 1) Replace the mutex lock with readers-writers semaphore and take the write lock only when necessary (e.g. allocating a new flow table entry index or adding a node to the parent's children list). When we try to find a suitable child in the parent's children list (e.g. search for flow group with the same match_criteria of the rule) then we only take the read lock. 2) Add versioning mechanism - each steering entity (FT, FG, FTE, DST) will have an incremental version. The version is increased when the entity is changed (e.g. when a new FTE was added to FG - the FG's version is increased). Versioning is used in order to determine if the last traverse of an entity's children is valid or a rescan under write lock is required. Last patch adds FGs and FTEs memory pool, It is useful because these objects are not small and could be allocated/deallocated many times. This support improves the insertion rate of steering rules from ~5k/sec to ~40k/sec. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	Linux 4.14-rc4v4.14-rc4	Linus Torvalds

2017-10-08	gso: fix payload length when gso_size is zero	Alexey Kodanev
	When gso_size reset to zero for the tail segment in skb_segment(), later in ipv6_gso_segment(), __skb_udp_tunnel_segment() and gre_gso_segment() we will get incorrect results (payload length, pcsum) for that segment. inet_gso_segment() already has a check for gso_size before calculating payload. The issue was found with LTP vxlan & gre tests over ixgbe NIC. Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	Merge branch 'hv_netvsc-TCP-hash-level'	David S. Miller
	Haiyang Zhang says: ==================== hv_netvsc: support changing TCP hash level The patch set simplifies the existing hash level switching code for UDP. It also adds the support for changing TCP hash level. So users can switch between L3 an L4 hash levels for TCP and UDP. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08	hv_netvsc: Update netvsc Document for TCP hash level setting	Haiyang Zhang
	Update Documentation/networking/netvsc.txt for TCP hash level setting and related info. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>