linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-08-29	drm/syncobj: Add a signal ioctl (v3)	Jason Ekstrand
	This IOCTL provides a mechanism for userspace to trigger a sync object directly. There are other ways that userspace can trigger a syncobj such as submitting a dummy batch somewhere or hanging on to a triggered sync_file and doing an import. This just provides an easy way to manually trigger the sync object without weird hacks. The motivation for this IOCTL is Vulkan fences. Vulkan lets you create a fence already in the signaled state so that you can wait on it immediatly without stalling. We could also handle this with a new create flag to ask the driver to create a syncobj that is already signaled but the IOCTL seemed a bit cleaner and more generic. v2: - Take an array of sync objects (Dave Airlie) v3: - Throw -EINVAL if pad != 0 Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-29	drm/syncobj: Add a reset ioctl (v3)	Jason Ekstrand
	This just resets the dma_fence to NULL so it looks like it's never been signaled. This will be useful once we add the new wait API for allowing wait on "submit and signal" behavior. v2: - Take an array of sync objects (Dave Airlie) v3: - Throw -EINVAL if pad != 0 Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Christian König <christian.koenig@amd.com> (v1) Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-28	Merge branch 'bnxt_en-next'	David S. Miller
	Michael Chan says: ==================== bnxt_en: Updates. Various changes including updated firmware interface, improved TX ring allocation scheme, improved out-of-memory logic in NAPI loop, reduced default rings on multi-port devices, new PCI IDs. Of particular note, CPU affinity hints from Vasundhara Volam. TC Flower eswitch support from Sathya Perla. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: add code to query TC flower offload stats	Sathya Perla
	This patch adds code to implement TC_CLSFLOWER_STATS TC-cmd and the required FW code to query the stats from the HW. Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: add TC flower offload flow_alloc/free FW cmds	Sathya Perla
	This patch adds the hwrm_cfa_flow_alloc/free() routines that are needed to issue the FW cmds needed for TC flower offload. Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: bnxt: add TC flower filter offload support	Sathya Perla
	This patch adds support for offloading TC based flow rules and actions for the 'flower' classifier in the bnxt_en driver. It includes logic to parse flow rules and actions received from the TC subsystem, store them and issue the corresponding hwrm_cfa_flow_alloc/free FW cmds. L2/IPv4/IPv6 flows and drop, redir, vlan push/pop actions are supported in this patch. In this patch the hwrm_cfa_flow_xxx routines are just stubs. The code for these routines is introduced in the next patch for easier review. Also, the code to query the TC/flower action stats will be introduced in a subsequent patch. Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: fix clearing devlink ptr from bnxt struct	Sathya Perla
	The routine bnxt_link_bp_to_dl() is used to set the devlink ptr in bnxt struct (bp) and also to set the bnxt back ptr in the devlink struct. If devlink_register() fails, bp->dl must be cleared which is not happening currently. This patch fixes bnxt_link_bp_to_dl() to clear bp->dl by passing a NULL dl ptr. Fixes: 4ab0c6a8ffd7 ("bnxt_en: add support to enable VF-representors") Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: Reduce default rings on multi-port cards.	Michael Chan
	Reduce default rings from 8 to 4 on multi-port cards to reduce memory usage. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: Improve -ENOMEM logic in NAPI poll loop.	Michael Chan
	If we cannot allocate RX buffers in the NAPI poll loop when processing an RX event, the current code does not count that event towards the NAPI budget. This can cause us to potentially loop forever in NAPI if we consistently cannot allocate new buffers. Improve it by counting -ENOMEM event as 1 towards the NAPI budget. Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt: initialize board_info values with proper enums	Scott Branden
	initialize board_info values with proper enums for defensive programming purposes. This will avoid any errors of the enums being declared not lining up with the board_info array. Signed-off-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt: Add PCIe device IDs for bcm58802/bcm58808	Ray Jui
	Add PCIe device ID for bcm58802 and bcm58808. Also add chip number update to declare bcm588xx as chip class phase 4 and later Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: assign CPU affinity hints to bnxt_en IRQs	Vasundhara Volam
	This patch provides hints to irqbalance to map bnxt_en device IRQs to specific CPU cores. cpumask_local_spread() is used, which first maps IRQs to near NUMA cores; when those cores are exhausted, IRQs are mapped to far NUMA cores. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: Improve tx ring reservation logic.	Michael Chan
	When the number of TX rings is changed (e.g. ethtool -L, enabling XDP TX rings, etc), the current code tries to reserve the new number of TX rings before closing and re-opening the NIC. If we are unable to reserve the new TX rings, we abort the operation and keep the current TX rings. The problem is that the firmware will disable the current TX rings even when it cannot reserve the new set of TX rings. We fix it as follows: 1. Instead of reserving the new set of TX rings, just ask the firmware to check if the new set of TX rings is available. There is a flag in the firmware message to do that. If not available, abort and the current TX rings will not be disabled. 2. Do the actual TX ring reservation in the path that opens the NIC. We keep the number of TX rings currently successfully reserved. If the number of TX rings is different than the reserved TX rings, we call firmware and reserve again. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: Update firmware interface spec. to 1.8.1.4.	Michael Chan
	Flow APIs are added in this firmware interface. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	Merge branch 'NCSI-vlan-filtering'	David S. Miller
	Samuel Mendoza-Jonas says: ==================== NCSI VLAN Filtering Support This series (mainly patch 2) adds VLAN filtering to the NCSI implementation. A fair amount of code already exists in the NCSI stack for VLAN filtering but none of it is actually hooked up. This goes the final mile and fixes a few bugs in the existing code found along the way (patch 1). Patch 3 adds the appropriate flag and callbacks to the ftgmac100 driver to enable filtering as it's a large consumer of NCSI (and what I've been testing on). v3: - Add comment describing change to ncsi_find_filter() - Catch NULL in clear_one_vid() from ncsi_get_filter() - Simplify state changes when kicking updated channel ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	ftgmac100: Support NCSI VLAN filtering when available	Samuel Mendoza-Jonas
	Register the ndo_vlan_rx_{add,kill}_vid callbacks and set the NETIF_F_HW_VLAN_CTAG_FILTER if NCSI is available. This allows the VLAN core to notify the NCSI driver when changes occur so that the remote NCSI channel can be properly configured to filter on the set VLAN tags. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	net/ncsi: Configure VLAN tag filter	Samuel Mendoza-Jonas
	Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI stack process new VLAN tags and configure the channel VLAN filter appropriately. Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent for each one, meaning the ncsi_dev_state_config_svf state must be repeated. An internal list of VLAN tags is maintained, and compared against the current channel's ncsi_channel_filter in order to keep track within the state. VLAN filters are removed in a similar manner, with the introduction of the ncsi_dev_state_config_clear_vids state. The maximum number of VLAN tag filters is determined by the "Get Capabilities" response from the channel. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	net/ncsi: Fix several packet definitions	Samuel Mendoza-Jonas
	Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	Merge branch '40GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2017-08-27 This series contains updates to i40e and i40evf only. Sudheer updates code comments and state variable so that adminq_subtask will have accutate information whenever it gets scheduled. Mariusz stores information about FEC modes, to be used to printing link states information, so that we do not need to call admin queue when reporting link status. Adds VF support for controlling VLAN tag stripping via ethtool. Jake provides the majority of changes in this series, starting with increasing the size of the prefix buffer so that it can hold enough characters for every possible input, which prevents snprintf truncation. Fixed other string truncation errors/warnings produced by GCC 7.x. Removed an unnecessary workaround for resetting XPS. Fixed an issue where there is a mismatched affinity mask value, so initialize the value to cpu_possible_mask and invert the logic for checking incorrect CPU vs IRQ affinity so that the exceptional case is handled at the check. Removed ULTRA latency mode due to several issues found and will be looking at better solution for small packet workloads. Akeem fixes an issue where the incorrect flag was being used to set promiscuous mode for unicast, which was enabling promiscuous mode only for multicast instead of unicast. Carolyn fixes an issue where an error return value is set, but this value can be overwritten before we actually do exit the function. So remove the error code assignment and add code comments for better understanding on why we do not need to set and return the error. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	page waitqueue: always add new entries at the end	Linus Torvalds
	Commit 3510ca20ece0 ("Minor page waitqueue cleanups") made the page queue code always add new waiters to the back of the queue, which helps upcoming patches to batch the wakeups for some horrid loads where the wait queues grow to thousands of entries. However, I forgot about the nasrt add_page_wait_queue() special case code that is only used by the cachefiles code. That one still continued to add the new wait queue entries at the beginning of the list. Fix it, because any sane batched wakeup will require that we don't suddenly start getting new entries at the beginning of the list that we already handled in a previous batch. [ The current code always does the whole list while holding the lock, so wait queue ordering doesn't matter for correctness, but even then it's better to add later entries at the end from a fairness standpoint ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-28	net-next/hinic: fix comparison of a uint16_t type with -1	Aviad Krawczyk
	Remove the search for index of constant buffer size Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com> Signed-off-by: Zhao Chen <zhaochen6@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	net-next/hinic: Fix MTU limitation	Aviad Krawczyk
	Fix the hw MTU limitation by setting max_mtu Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com> Signed-off-by: Zhao Chen <zhaochen6@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	Merge branch 'irda-move-to-staging'	David S. Miller
	Greg Kroah-Hartman says: ==================== irda: move it to drivers/staging so we can delete it The IRDA code has long been obsolete and broken. So, to keep people from trying to use it, and to prevent people from having to maintain it, let's move it to drivers/staging/ so that we can delete it entirely from the kernel in a few releases. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	staging: irda: add a TODO file.	Greg Kroah-Hartman
	The irda code will be deleted in a future kernel release, so no need to have anyone do any new work on it. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	irda: move include/net/irda into staging subdirectory	Greg Kroah-Hartman
	And finally, move the irda include files into drivers/staging/irda/include/net/irda. Yes, it's a long path, but it makes it easy for us to just add a Makefile directory path addition and all of the net and drivers code "just works". Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	irda: move drivers/net/irda to drivers/staging/irda/drivers	Greg Kroah-Hartman
	Move the irda drivers from drivers/net/irda/ to drivers/staging/irda/drivers as they will be deleted in a future kernel release. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	irda: move net/irda/ to drivers/staging/irda/net/	Greg Kroah-Hartman
	It's time to get rid of IRDA. It's long been broken, and no one seems to use it anymore. So move it to staging and after a while, we can delete it from there. To start, move the network irda core from net/irda to drivers/staging/irda/net/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29	intel_pstate: convert to use acpi_match_platform_list()	Toshi Kani
	Convert to use acpi_match_platform_list() for the platform check. There is no change in functionality. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-29	ACPI / blacklist: add acpi_match_platform_list()	Toshi Kani
	ACPI OEM ID / OEM Table ID / Revision can be used to identify a platform based on ACPI firmware info. acpi_blacklisted(), intel_pstate_platform_pwr_mgmt_exists(), and some other funcs, have been using similar check to detect a list of platforms that require special handlings. Move the platform check in acpi_blacklisted() to a new common utility function, acpi_match_platform_list(), so that other drivers do not have to implement their own version. There is no change in functionality. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-28	Merge branch 'dpaa_eth-rss'	David S. Miller
	Madalin Bucur says: ==================== Add RSS to DPAA 1.x Ethernet driver This patch set introduces Receive Side Scaling for the DPAA Ethernet driver. Documentation is updated with details related to the new feature and limitations that apply. Added also a small fix. v2: removed a C++ style comment v3: move struct fman to header file to avoid exporting a function v4: addressed compilation issues introduced in v3 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	dpaa_eth: check allocation result	Madalin Bucur
	Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	Documentation: networking: add RSS information	Madalin Bucur
	Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	dpaa_eth: add NETIF_F_RXHASH	Madalin Bucur
	Set the skb hash when then FMan Keygen hash result is available. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	dpaa_eth: enable Rx hashing control	Madalin Bucur
	Allow ethtool control of the Rx flow hashing. By default RSS is enabled, this allows to turn it off by bypassing the FMan Keygen block and sending all traffic on the default Rx frame queue. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	dpaa_eth: use multiple Rx frame queues	Madalin Bucur
	Add a block of 128 Rx frame queues per port. The FMan hardware will send traffic on one of these queues based on the FMan port Parse Classify Distribute setup. The hash computed by the FMan Keygen block will select the Rx FQ. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	fsl/fman: enable FMan Keygen	Iordache Florinel-R70177
	Add support for the FMan Keygen with a hardcoded scheme to spread incoming traffic on a FQ range based on source and destination IPs and ports. Signed-off-by: Iordache Florinel <florinel.iordache@nxp.com> Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	fsl/fman: move struct fman to header file	Madalin Bucur
	Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bridge: check for null fdb->dst before notifying switchdev drivers	Roopa Prabhu
	current switchdev drivers dont seem to support offloading fdb entries pointing to the bridge device which have fdb->dst not set to any port. This patch adds a NULL fdb->dst check in the switchdev notifier code. This patch fixes the below NULL ptr dereference: $bridge fdb add 00:02:00:00:00:33 dev br0 self [ 69.953374] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 69.954044] IP: br_switchdev_fdb_notify+0x29/0x80 [ 69.954044] PGD 66527067 [ 69.954044] P4D 66527067 [ 69.954044] PUD 7899c067 [ 69.954044] PMD 0 [ 69.954044] [ 69.954044] Oops: 0000 [#1] SMP [ 69.954044] Modules linked in: [ 69.954044] CPU: 1 PID: 3074 Comm: bridge Not tainted 4.13.0-rc6+ #1 [ 69.954044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 69.954044] task: ffff88007b827140 task.stack: ffffc90001564000 [ 69.954044] RIP: 0010:br_switchdev_fdb_notify+0x29/0x80 [ 69.954044] RSP: 0018:ffffc90001567918 EFLAGS: 00010246 [ 69.954044] RAX: 0000000000000000 RBX: ffff8800795e0880 RCX: 00000000000000c0 [ 69.954044] RDX: ffffc90001567920 RSI: 000000000000001c RDI: ffff8800795d0600 [ 69.954044] RBP: ffffc90001567938 R08: ffff8800795d0600 R09: 0000000000000000 [ 69.954044] R10: ffffc90001567a88 R11: ffff88007b849400 R12: ffff8800795e0880 [ 69.954044] R13: ffff8800795d0600 R14: ffffffff81ef8880 R15: 000000000000001c [ 69.954044] FS: 00007f93d3085700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [ 69.954044] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 69.954044] CR2: 0000000000000008 CR3: 0000000066551000 CR4: 00000000000006e0 [ 69.954044] Call Trace: [ 69.954044] fdb_notify+0x3f/0xf0 [ 69.954044] __br_fdb_add.isra.12+0x1a7/0x370 [ 69.954044] br_fdb_add+0x178/0x280 [ 69.954044] rtnl_fdb_add+0x10a/0x200 [ 69.954044] rtnetlink_rcv_msg+0x1b4/0x240 [ 69.954044] ? skb_free_head+0x21/0x40 [ 69.954044] ? rtnl_calcit.isra.18+0xf0/0xf0 [ 69.954044] netlink_rcv_skb+0xed/0x120 [ 69.954044] rtnetlink_rcv+0x15/0x20 [ 69.954044] netlink_unicast+0x180/0x200 [ 69.954044] netlink_sendmsg+0x291/0x370 [ 69.954044] ___sys_sendmsg+0x180/0x2e0 [ 69.954044] ? filemap_map_pages+0x2db/0x370 [ 69.954044] ? do_wp_page+0x11d/0x420 [ 69.954044] ? __handle_mm_fault+0x794/0xd80 [ 69.954044] ? vma_link+0xcb/0xd0 [ 69.954044] __sys_sendmsg+0x4c/0x90 [ 69.954044] SyS_sendmsg+0x12/0x20 [ 69.954044] do_syscall_64+0x63/0xe0 [ 69.954044] entry_SYSCALL64_slow_path+0x25/0x25 [ 69.954044] RIP: 0033:0x7f93d2bad690 [ 69.954044] RSP: 002b:00007ffc7217a638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 69.954044] RAX: ffffffffffffffda RBX: 00007ffc72182eac RCX: 00007f93d2bad690 [ 69.954044] RDX: 0000000000000000 RSI: 00007ffc7217a670 RDI: 0000000000000003 [ 69.954044] RBP: 0000000059a1f7f8 R08: 0000000000000006 R09: 000000000000000a [ 69.954044] R10: 00007ffc7217a400 R11: 0000000000000246 R12: 00007ffc7217a670 [ 69.954044] R13: 00007ffc72182a98 R14: 00000000006114c0 R15: 00007ffc72182aa0 [ 69.954044] Code: 1f 00 66 66 66 66 90 55 48 89 e5 48 83 ec 20 f6 47 20 04 74 0a 83 fe 1c 74 09 83 fe 1d 74 2c c9 66 90 c3 48 8b 47 10 48 8d 55 e8 <48> 8b 70 08 0f b7 47 1e 48 83 c7 18 48 89 7d f0 bf 03 00 00 00 [ 69.954044] RIP: br_switchdev_fdb_notify+0x29/0x80 RSP: ffffc90001567918 [ 69.954044] CR2: 0000000000000008 [ 69.954044] ---[ end trace 03e9eec4a82c238b ]--- Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs	Tejun Heo
	When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of @node. The assumption seems that if !NUMA, there shouldn't be more than one node and thus reporting cpu_online_mask regardless of @node is correct. However, that assumption was broken years ago to support DISCONTIGMEM and whether a system has multiple nodes or not is separately controlled by NEED_MULTIPLE_NODES. This means that, on a system with !NUMA && NEED_MULTIPLE_NODES, cpumask_of_node() will report cpu_online_mask for all possible nodes, indicating that the CPUs are associated with multiple nodes which is an impossible configuration. This bug has been around forever but doesn't look like it has caused any noticeable symptoms. However, it triggers a WARN recently added to workqueue to verify NUMA affinity configuration. Fix it by reporting empty cpumask on non-zero nodes if !NUMA. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-28	IB/rxe: Handle NETDEV_CHANGE events	Andrew Boyer
	Without this fix, ports configured on top of ixgbe miss link up notifications. ibv_query_port() will continue to return IBV_PORT_DOWN even though the port is up and working. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28	IB/rxe: Avoid ICRC errors by copying into the skb first	Andrew Boyer
	The current process is to first calculate the CRC and then copy the client data into the packet. This leaves a window in which the packet contents and CRC can get out of sync, if the client changes the data after the CRC is calculated but before the data is copied. By copying the data into the packet and then calculating the CRC directly from the packet contents we eliminate the window. This can be seen with qperf's ud_bi_bw test. This seems like very strange/reckless client behavior, but whether the client has mangled its data or not RXE should be able to transfer it reliably. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28	IB/rxe: Another fix for broken receive queue draining	Andrew Boyer
	This fixes another path in rxe_requester() that might overlook stale SKBs, preventing cleanup. Fixes: 1217197142d1 ("rxe: fix broken receive queue draining") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28	IB/rxe: Remove unneeded initialization in prepare6()	Andrew Boyer
	Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28	IB/rxe: Fix up rxe_qp_cleanup()	Andrew Boyer
	Replace sk_dst_get()/dst_release() in rxe_qp_cleanup() with sk_dst_reset(). sk_dst_get() takes a new reference on dst, so the dst_release() doesn't actually release the original reference, which was the design intent. Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28	IB/rxe: Add dst_clone() in prepare_ipv6_hdr()	Andrew Boyer
	Otherwise the reference count goes negative as IPv6 packets complete. Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28	IB/rxe: Fix destination cache for IPv6	Andrew Boyer
	To successfully match an IPv6 path, the path cookie must match. Store it in the QP so that the IPv6 path can be reused. Replace open-coded version of dst_check() with the actual call, fixing the logic. The open-coded version skips the check call if dst->obsolete is 0 (DST_OBSOLETE_NONE), proceeding to replace the route. DST_OBSOLETE_NONE means that the route may continue to be used, though. Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28	IB/rxe: Fix up the responder's find_resources() function	Andrew Boyer
	The resource array is sized by max_dest_rd_atomic, not max_rd_atomic. Iterating over max_rd_atomic entries of qp->resp.resources[] will cause incorrect behavior when the two attributes are different (or even crash if max_rd_atomic is larger). Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28	IB/rxe: Remove dangling prototype	Andrew Boyer
	Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Acked-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28	IB/rxe: Disable completion upcalls when a CQ is destroyed	Andrew Boyer
	This prevents the stack from accessing userspace objects while they are being torn down. One possible sequence of events: - Userspace program exits - ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(), ib_destroy_cq(), etc. and releasing/freeing the UCQ - The QP still has tasklets running, so it isn't destroyed yet - The CQ is referenced by the QP, so the CQ isn't destroyed yet - The UCQ is kfree()'d anyway - A send work request completes - rxe_send_complete() calls cq->ibcq.comp_handler() - ib_uverbs_comp_handler() runs and crashes; the event queue is checked for is_closed, but it has no way to check the ib_ucq_object before accessing it The reference counting on the CQ doesn't protect against this since the CQ hasn't been destroyed yet. There's no available interface to deregister the UCQ from the CQ, and it didn't appear that attempting to add reference counting to the UCQ was going to be a good way to go since this solution is much simpler. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28	IB/rxe: Move refcounting earlier in rxe_send()	Andrew Boyer
	The network stack will call nskb's destructor, rxe_skb_tx_dtor(), if the packet gets dropped by ip_local_out()/ip6_local_out(). Thus we need to add the QP ref before output to avoid extra dereferences during network congestion. This could lead to unwanted destruction of the QP. Fix up the skb_out accounting, too. Fixes: fda85ce91240 ("IB/rxe: Fix kernel panic from skb destructor") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Acked-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>