git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-10-10	net/mpls: Implement handler for strict data checking on dumps	David Ahern
	Without CONFIG_INET enabled compiles fail with: net/mpls/af_mpls.o: In function `mpls_dump_routes': af_mpls.c:(.text+0xed0): undefined reference to `ip_valid_fib_dump_req' The preference is for MPLS to use the same handler as ipv4 and ipv6 to allow consistency when doing a dump for AF_UNSPEC which walks all address families invoking the route dump handler. If INET is disabled then fallback to an MPLS version which can be tighter on the data checks. Fixes: e8ba330ac0c5 ("rtnetlink: Update fib dumps for strict data checking") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	Merge branch 'net-ipv4-fixes-for-PMTU-when-link-MTU-changes'	David S. Miller
	Sabrina Dubroca says: ==================== net: ipv4: fixes for PMTU when link MTU changes The first patch adapts the changes that commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes") did in IPv6 to IPv4: lower PMTU when the first hop's MTU drops below it, and raise PMTU when the first hop was limiting PMTU discovery and its MTU is increased. The second patch fixes bugs introduced in commit d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") that only appear once the first patch is applied. Selftests for these cases were introduced in net-next commit e44e428f59e4 ("selftests: pmtu: add basic IPv4 and IPv6 PMTU tests") v2: add cover letter, and fix a few small things in patch 1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	net: ipv4: don't let PMTU updates increase route MTU	Sabrina Dubroca
	When an MTU update with PMTU smaller than net.ipv4.route.min_pmtu is received, we must clamp its value. However, we can receive a PMTU exception with PMTU < old_mtu < ip_rt_min_pmtu, which would lead to an increase in PMTU. To fix this, take the smallest of the old MTU and ip_rt_min_pmtu. Before this patch, in case of an update, the exception's MTU would always change. Now, an exception can have only its lock flag updated, but not the MTU, so we need to add a check on locking to the following "is this exception getting updated, or close to expiring?" test. Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	net: ipv4: update fnhe_pmtu when first hop's MTU changes	Sabrina Dubroca
	Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions"), exceptions get deprecated separately from cached routes. In particular, administrative changes don't clear PMTU anymore. As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes"), the PMTU discovered before the local MTU change can become stale: - if the local MTU is now lower than the PMTU, that PMTU is now incorrect - if the local MTU was the lowest value in the path, and is increased, we might discover a higher PMTU Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those cases. If the exception was locked, the discovered PMTU was smaller than the minimal accepted PMTU. In that case, if the new local MTU is smaller than the current PMTU, let PMTU discovery figure out if locking of the exception is still needed. To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU notifier. By the time the notifier is called, dev->mtu has been changed. This patch adds the old MTU as additional information in the notifier structure, and a new call_netdevice_notifiers_u32() function. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	net/ipv6: stop leaking percpu memory in fib6 info	Mike Rapoport
	The fib6_info_alloc() function allocates percpu memory to hold per CPU pointers to rt6_info, but this memory is never freed. Fix it. Fixes: a64efe142f5e ("net/ipv6: introduce fib6_info struct and helpers") Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	Merge branch 'fore200e-DMA-cleanups-and-fixes'	David S. Miller
	Christoph Hellwig says: ==================== fore200e DMA cleanups and fixes The fore200e driver came up during some dma-related audits, so here is the fallout. Compile tested (x86 & sparc) only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	fore200e: check for dma mapping failures	Christoph Hellwig
	The driver was lacking any handling for failures from the DMA mapping routines. With an iommu or swiotlb this can be fatal. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	fore200e: don't use GFP_DMA	Christoph Hellwig
	The driver properly uses the DMA mapping API, so it should not pointlessly dip into the GFP_DMA pool, which is only 16MB on x86. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	fore200e: devirtualize dma alloc calls	Christoph Hellwig
	There is no need for an indirection before calling the dma alloc routines now that we store a struct device in struct fore200e. Also remove the pointless GFP_ATOMIC for the sbus case, and fix the up the error handling by removing the 0 dma_addr test - some iommus can return 0 as a perfectly valid bus address. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	fore200e: devirtualize dma mapping calls	Christoph Hellwig
	There is no need for an indirection before calling the dma mapping routines now that we store a struct device in struct fore200e. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	fore200e: remove the align_size field of struct chunk	Christoph Hellwig
	There is no need for this field, as the only user of it can just use the local size variable instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	fore200e: store a struct device in struct fore200e	Christoph Hellwig
	This can be used much better than the untyped void pointer containing either a PCI or platform device. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	fore200e: simplify fore200e_bus usage	Christoph Hellwig
	There is no need to have a global array of the ops, instead PCI and sbus can have their own instances assigned in *_probe. Also switch to C99 initializers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	net: tun: remove useless codes of tun_automq_select_queue	Wang Li
	Because the function __skb_get_hash_symmetric always returns non-zero. Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Wang Li <wangli39@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	virtio_net: ethtool tx napi configuration	Jason Wang
	Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers. Interrupt moderation is currently not supported, so these accept and display the default settings of 0 usec and 1 frame. Toggle tx napi through setting tx-frames. So as to not interfere with possible future interrupt moderation, value 1 means tx napi while value 0 means not. Only allow the switching when device is down for simplicity. Link: https://patchwork.ozlabs.org/patch/948149/ Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	Merge branch 'nfp-flower-speed-up-stats-update-loop'	David S. Miller
	Jakub Kicinski says: ==================== nfp: flower: speed up stats update loop This set from Pieter improves performance of processing FW stats update notifications. The FW seems to send those at relatively high rate (roughly ten per second per flow), therefore if we want to approach the million flows mark we have to be very careful about our data structures. We tried rhashtable for stat updates, but according to our experiments rhashtable lookup on a u32 takes roughly 60ns on an Xeon E5-2670 v3. Which translate to a hard limit of 16M lookups per second on this CPU, and, according to perf record jhash and memcmp account for 60% of CPU usage on the core handling the updates. Given that our statistic IDs are already array indices, and considering each statistic is only 24B in size, we decided to forego the use of hashtables and use a directly indexed array. The CPU savings are considerable. With the recent improvements in TC core and with our own bottlenecks out of the way Pieter removes the artificial limit of 128 flows, and allows the driver to install as many flows as FW supports. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	nfp: flower: use host context count provided by firmware	Pieter Jansen van Vuuren
	Read the host context count symbols provided by firmware and use it to determine the number of allocated stats ids. Previously it won't be possible to offload more than 2^17 filter even if FW was able to do so. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	nfp: flower: use stats array instead of storing stats per flow	Pieter Jansen van Vuuren
	Make use of an array stats instead of storing stats per flow which would require a hash lookup at critical times. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	nfp: flower: use rhashtable for flow caching	Pieter Jansen van Vuuren
	Make use of relativistic hash tables for tracking flows instead of fixed sized hash tables. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	isdn/hisax: amd7930_fn: Remove unnecessary parentheses	Nathan Chancellor
	Clang warns when multiple sets of parentheses are used for a single conditional statement. drivers/isdn/hisax/amd7930_fn.c:628:32: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((cs->dc.amd7930.ph_state == 8)) { ~~~~~~~~~~~~~~~~~~~~~~~~^~~~ drivers/isdn/hisax/amd7930_fn.c:628:32: note: remove extraneous parentheses around the comparison to silence this warning if ((cs->dc.amd7930.ph_state == 8)) { ~ ^ ~ drivers/isdn/hisax/amd7930_fn.c:628:32: note: use '=' to turn this equality comparison into an assignment if ((cs->dc.amd7930.ph_state == 8)) { ^~ = 1 warning generated. Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	Merge tag 'rxrpc-fixes-20181008' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix packet reception code Here are a set of patches that prepares for and fix problems in rxrpc's package reception code. There serious problems are: (A) There's a window between binding the socket and setting the data_ready hook in which packets can find their way into the UDP socket's receive queues. (B) The skb_recv_udp() will return an error (and clear the error state) if there was an error on the Tx side. rxrpc doesn't handle this. (C) The rxrpc data_ready handler doesn't fully drain the UDP receive queue. (D) The rxrpc data_ready handler assumes it is called in a non-reentrant state. The second patch fixes (A) - (C); the third patch renders (B) and (C) non-issues by using the recap_rcv hook instead of data_ready - and the final patch fixes (D). That last is the most complex. The preparatory patches are: (1) Fix some places that are doing things in the wrong net namespace. (2) Stop taking the rcu read lock as it's held by the IP input routine in the call chain. (3) Only end the Tx phase if we rotated the final packet out of the Tx buffer. (4) Don't assume that the call state won't change after dropping the call_state lock. (5) Only take receive window and MTU suze parameters from an ACK packet if it's the latest ACK packet. (6) Record connection-level abort information correctly. (7) Fix a trace line. And then there are three main patches - note that these are mixed in with the preparatory patches somewhat: (1) Fix the setup window (A), skb_recv_udp() error check (B) and packet drainage (C). (2) Switch to using the encap_rcv instead of data_ready to cut out the effects of the UDP read queues and get the packets delivered directly. (3) Add more locking into the various packet input paths to defend against re-entrance (D). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	tcp: refactor DCTCP ECN ACK handling	Yuchung Cheng
	DCTCP has two parts - a new ECN signalling mechanism and the response function to it. The first part can be used by other congestion control for DCTCP-ECN deployed networks. This patch moves that part into a separate tcp_dctcp.h to be used by other congestion control module (like how Yeah uses Vegas algorithmas). For example, BBR is experimenting such ECN signal currently https://tinyurl.com/ietf-102-iccrg-bbr2 Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	net/ipv6: Make ipv6_route_table_template static	David Ahern
	ipv6_route_table_template is exported but there are no users outside of route.c. Make it static. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	rtnetlink: Update comment in rtnl_stats_dump regarding strict data checking	David Ahern
	The NLM_F_DUMP_PROPER_HDR netlink flag was replaced by a setsockopt. Update the comment in rtnl_stats_dump. Fixes: 841891ec0c65 ("rtnetlink: Update rtnl_stats_dump for strict data checking") Reported-by: Christian Brauner <christian@brauner.io> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	rtnetlink: Move ifm in valid_fdb_dump_legacy to closer to use	David Ahern
	Move setting of local variable ifm to after the message parsing in valid_fdb_dump_legacy. Avoid potential future use of unchecked variable. Fixes: 8dfbda19a21b ("rtnetlink: Move input checking for rtnl_fdb_dump to helper") Reported-by: Christian Brauner <christian@brauner.io> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	Merge branch 'mlxsw-selftests-Few-small-updates'	David S. Miller
	Ido Schimmel says: ==================== mlxsw: selftests: Few small updates First patch fixes a typo in mlxsw. Second patch fixes a race in a recent test. Third patch makes a recent test executable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	selftests: mlxsw: qos_mc_aware: Make executable	Petr Machata
	This is a self-standing test and as such should be itself executable. Fixes: b5638d46c90a ("selftests: mlxsw: Add a test for UC behavior under MC flood") Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	selftests: forwarding: Have lldpad_app_wait_set() wait for unknown, too	Petr Machata
	Immediately after mlxsw module is probed and lldpad started, added APP entries are briefly in "unknown" state before becoming "pending". That's the state that lldpad_app_wait_set() typically sees, and since there are no pending entries at that time, it bails out. However the entries have not been pushed to the kernel yet at that point, and thus the test case fails. Fix by waiting for both unknown and pending entries to disappear before proceeding. Fixes: d159261f3662 ("selftests: mlxsw: Add test for trust-DSCP") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	mlxsw: pci: Fix a typo	Nir Dotan
	Signed-off-by: Nir Dotan <nird@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	rds: RDS (tcp) hangs on sendto() to unresponding address	Ka-Cheong Poon
	In rds_send_mprds_hash(), if the calculated hash value is non-zero and the MPRDS connections are not yet up, it will wait. But it should not wait if the send is non-blocking. In this case, it should just use the base c_path for sending the message. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11	Merge tag 'for-4.19/dm-fixes-4' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Mike writes: "device mapper fix for 4.19 final - Fix for earlier 4.19 final DM linear change that incorrectly checked for CONFIG_DM_ZONED rather than CONFIG_BLK_DEV_ZONED." * tag 'for-4.19/dm-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm linear: fix linear_end_io conditional definition
2018-10-10	net: aquantia: remove some redundant variable initializations	Colin Ian King
	There are several variables being initialized that are being set later and hence the initialization is redundant and can be removed. Remove then. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11	Merge tag 'xfs-fixes-for-4.19-rc7' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Dave writes: "xfs: fixes for 4.19-rc7 Update for 4.19-rc7 to fix numerous file clone and deduplication issues." * tag 'xfs-fixes-for-4.19-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix data corruption w/ unaligned reflink ranges xfs: fix data corruption w/ unaligned dedupe ranges xfs: update ctime and remove suid before cloning files xfs: zero posteof blocks when cloning above eof xfs: refactor clonerange preparation into a separate helper
2018-10-10	dm linear: fix linear_end_io conditional definition	Damien Le Moal
	The dm-linear target is independent of the dm-zoned target. For code requiring support for zoned block devices, use CONFIG_BLK_DEV_ZONED instead of CONFIG_DM_ZONED. While at it, similarly to dm linear, also enable the DM_TARGET_ZONED_HM feature in dm-flakey only if CONFIG_BLK_DEV_ZONED is defined. Fixes: beb9caac211c1 ("dm linear: eliminate linear_end_io call if CONFIG_DM_ZONED disabled") Fixes: 0be12c1c7fce7 ("dm linear: add support for zoned block devices") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-10	net/mlx5: WQ, fixes for fragmented WQ buffers API	Tariq Toukan
	mlx5e netdevice used to calculate fragment edges by a call to mlx5_wq_cyc_get_frag_size(). This calculation did not give the correct indication for queues smaller than a PAGE_SIZE, (broken by default on PowerPC, where PAGE_SIZE == 64KB). Here it is replaced by the correct new calls/API. Since (TX/RX) Work Queues buffers are fragmented, here we introduce changes to the API in core driver, so that it gets a stride index and returns the index of last stride on same fragment, and an additional wrapping function that returns the number of physically contiguous strides that can be written contiguously to the work queue. This obsoletes the following API functions, and their buggy usage in EN driver: * mlx5_wq_cyc_get_frag_size() * mlx5_wq_cyc_ctr2fragix() The new API improves modularity and hides the details of such calculation for mlx5e netdevice and mlx5_ib rdma drivers. New calculation is also more efficient, and improves performance as follows: Packet rate test: pktgen, UDP / IPv4, 64byte, single ring, 8K ring size. Before: 16,477,619 pps After: 17,085,793 pps 3.7% improvement Fixes: 3a2f70331226 ("net/mlx5: Use order-0 allocations for all WQ types") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10	net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault type	Huy Nguyen
	The HW spec defines only bits 24-26 of pftype_wq as the page fault type, use the required mask to ensure that. Fixes: d9aaed838765 ("{net,IB}/mlx5: Refactor page fault handling") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10	net/mlx5: Fix memory leak when setting fpga ipsec caps	Talat Batheesh
	Allocated memory for context should be freed once finished working with it. Fixes: d6c4f0298cec ("net/mlx5: Refactor accel IPSec code") Signed-off-by: Talat Batheesh <talatb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10	net/mlx5e: Do not ignore netdevice TX/RX queues number	Feras Daoud
	The current design of mlx5e driver ignores the netdevice TX/RX queues number for netdevices that RDMA IPoIB ULP creates. Instead, the queue number is initialized to the maximum number that mlx5 thinks best for performance. As a result, ULP drivers that choose to create a netdevice with queue number that is less than the maximum channels mlx5 creates, will get a memory corruption. This fix changes the mlx5e netdev logic to respect ULP netdevices TX/RX queue number and use it when creating resources instead of the maximum channel number. Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10	net/mlx5e: Use non-delayed work for update stats	Saeed Mahameed
	Convert mlx5e update stats work to a normal work structure, since it is never used delayed. Add a helper function to queue update stats work on demand which checks for some conditions and reduce code duplication to have a better abstraction. Fixes: ed56c5193ad8 ("net/mlx5e: Update NIC HW stats on demand only") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10	net/mlx5e: Initialize all netdev common structures in one place	Saeed Mahameed
	Move all mlx5e generic structures initializations to mlx5e_netdev_init. The common structure new initializer function will be used to initialize mlx5 context for netlink created netdevs such as IPoIB mlx5 accelerated child netdevs. Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com>
2018-10-10	net/mlx5e: Always initialize update stats delayed work	Feras Daoud
	mlx5e_detach_netdev cancels update_stats work which was not initialized in ipoib netdevice profile, as a result, the following assert occurs: ODEBUG: assert_init not available (active state 0) object type: timer_list hint:(null) This change moves the update stats work to be initialized for all mlx5e netdevices. Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10	net/mlx5e: Gather common netdev init/cleanup functionality in one place	Feras Daoud
	Introduce a helper init/cleanup function that initializes mlx5e generic netdev private structure, and use them from all profiles init/cleanup callbacks. This patch will also be helpful to initialize/cleanup netdevs that are not created by mlx5 driver, e.g: accelerated ipoib child netdevs. Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10	RDMA/netdev: Fix netlink support in IPoIB	Denis Drozdov
	IPoIB netlink support was broken by the below commit since integrating the rdma_netdev support relies on an allocation flow for netdevs that was controlled by the ipoib driver while netdev's rtnl_newlink implementation assumes that the netdev will be allocated by netlink. Such situation leads to crash in __ipoib_device_add, once trying to reuse netlink device. This patch fixes the kernel oops for both mlx4 and mlx5 devices triggered by the following command: Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10	RDMA/netdev: Hoist alloc_netdev_mqs out of the driver	Denis Drozdov
	netdev has several interfaces that expect to call alloc_netdev_mqs from the core code, with the driver only providing the arguments. This is incompatible with the rdma_netdev interface that returns the netdev directly. Thus re-organize the API used by ipoib so that the verbs core code calls alloc_netdev_mqs for the driver. This is done by allowing the drivers to provide the allocation parameters via a 'get_params' callback and then initializing an allocated netdev as a second step. Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10	Merge tag 'for-4.19/dm-fixes-3' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Mike writes: "device mapper fixes for 4.19 final - Fix a DM cache module init error path bug that doesn't properly cleanup a KMEM_CACHE if target registration fails. - Two stable@ fixes for DM zoned target; 4.20 will have changes that eliminate this code entirely but <= 4.19 needs these changes." * tag 'for-4.19/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm linear: eliminate linear_end_io call if CONFIG_DM_ZONED disabled dm: fix report zone remapping to account for partition offset dm cache: destroy migration_cache if cache target registration failed
2018-10-10	Merge tag 'trace-v4.19-rc5' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Steven writes: "vsprint fix: It was reported that trace_printk() was not reporting properly values that came after a dereference pointer. trace_printk() utilizes vbin_printf() and bstr_printf() to keep the overhead of tracing down. vbin_printf() does not do any conversions and just stors the string format and the raw arguments into the buffer. bstr_printf() is used to read the buffer and does the conversions to complete the printf() output. This can be troublesome with dereferenced pointers because the reference may be different from the time vbin_printf() is called to the time bstr_printf() is called. To fix this, a prior commit changed vbin_printf() to convert dereferenced pointers into strings and load the converted string into the buffer. But the change to bstr_printf() had an off-by-one error and didn't account for the nul character at the end of the string and this corrupted the rest of the values in the format that came after a dereferenced pointer." * tag 'trace-v4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: vsprintf: Fix off-by-one bug in bstr_printf() processing dereferenced pointers
2018-10-10	Merge tag 'devicetree-fixes-for-4.19-3' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Rob writes: "Devicetree fixes for 4.19, part 3: - Fix DT unittest on Oldworld MAC systems" * tag 'devicetree-fixes-for-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: unittest: Disable interrupt node tests for old world MAC systems
2018-10-10	IB/mlx5: Unmap DMA addr from HCA before IOMMU	Valentine Fatiev
	The function that puts back the MR in cache also removes the DMA address from the HCA. Therefore we need to call this function before we remove the DMA mapping from MMU. Otherwise the HCA may access a memory that is no longer DMA mapped. Call trace: NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0. CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc6+ #4 Hardware name: HP ProLiant DL360p Gen8, BIOS P71 08/20/2012 RIP: 0010:intel_idle+0x73/0x120 Code: 80 5c 01 00 0f ae 38 0f ae f0 31 d2 65 48 8b 04 25 80 5c 01 00 48 89 d1 0f 60 02 RSP: 0018:ffffffff9a403e38 EFLAGS: 00000046 RAX: 0000000000000030 RBX: 0000000000000005 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffff9a5790c0 RDI: 0000000000000000 RBP: 0000000000000030 R08: 0000000000000000 R09: 0000000000007cf9 R10: 000000000000030a R11: 0000000000000018 R12: 0000000000000000 R13: ffffffff9a5792b8 R14: ffffffff9a5790c0 R15: 0000002b48471e4d FS: 0000000000000000(0000) GS:ffff9c6caf400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f5737185000 CR3: 0000000590c0a002 CR4: 00000000000606f0 Call Trace: cpuidle_enter_state+0x7e/0x2e0 do_idle+0x1ed/0x290 cpu_startup_entry+0x6f/0x80 start_kernel+0x524/0x544 ? set_init_arg+0x55/0x55 secondary_startup_64+0xa4/0xb0 DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Read] Request device [04:00.0] fault addr b34d2000 [fault reason 06] PTE Read access is not set DMAR: [DMA Read] Request device [01:00.2] fault addr bff8b000 [fault reason 06] PTE Read access is not set Fixes: f3f134f5260a ("RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory") Signed-off-by: Valentine Fatiev <valentinef@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-10	net: make skb_partial_csum_set() more robust against overflows	Eric Dumazet
	syzbot managed to crash in skb_checksum_help() [1] : BUG_ON(offset + sizeof(__sum16) > skb_headlen(skb)); Root cause is the following check in skb_partial_csum_set() if (unlikely(start > skb_headlen(skb)) \|\| unlikely((int)start + off > skb_headlen(skb) - 2)) return false; If skb_headlen(skb) is 1, then (skb_headlen(skb) - 2) becomes 0xffffffff and the check fails to detect that ((int)start + off) is off the limit, since the compare is unsigned. When we fix that, then the first condition (start > skb_headlen(skb)) becomes obsolete. Then we should also check that (skb_headroom(skb) + start) wont overflow 16bit field. [1] kernel BUG at net/core/dev.c:2880! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 7330 Comm: syz-executor4 Not tainted 4.19.0-rc6+ #253 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_checksum_help+0x9e3/0xbb0 net/core/dev.c:2880 Code: 85 00 ff ff ff 48 c1 e8 03 42 80 3c 28 00 0f 84 09 fb ff ff 48 8b bd 00 ff ff ff e8 97 a8 b9 fb e9 f8 fa ff ff e8 2d 09 76 fb <0f> 0b 48 8b bd 28 ff ff ff e8 1f a8 b9 fb e9 b1 f6 ff ff 48 89 cf RSP: 0018:ffff8801d83a6f60 EFLAGS: 00010293 RAX: ffff8801b9834380 RBX: ffff8801b9f8d8c0 RCX: ffffffff8608c6d7 RDX: 0000000000000000 RSI: ffffffff8608cc63 RDI: 0000000000000006 RBP: ffff8801d83a7068 R08: ffff8801b9834380 R09: 0000000000000000 R10: ffff8801d83a76d8 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000010001 R14: 000000000000ffff R15: 00000000000000a8 FS: 00007f1a66db5700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7d77f091b0 CR3: 00000001ba252000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: skb_csum_hwoffload_help+0x8f/0xe0 net/core/dev.c:3269 validate_xmit_skb+0xa2a/0xf30 net/core/dev.c:3312 __dev_queue_xmit+0xc2f/0x3950 net/core/dev.c:3797 dev_queue_xmit+0x17/0x20 net/core/dev.c:3838 packet_snd net/packet/af_packet.c:2928 [inline] packet_sendmsg+0x422d/0x64c0 net/packet/af_packet.c:2953 Fixes: 5ff8dda3035d ("net: Ensure partial checksum offset is inside the skb head") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10	Merge branch 'devlink-param-type-string-fixes'	David S. Miller
	Moshe Shemesh says: ==================== devlink param type string fixes This patchset fixes devlink param infrastructure for string param type. The devlink param infrastructure doesn't handle copying the string data correctly. The first two patches fix it and the third patch adds helper function to safely copy string value without exceeding DEVLINK_PARAM_MAX_STRING_VALUE. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>