summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-04-05bnxt_en: Add basic EEE support.Michael Chan
Get EEE capability and the initial EEE settings from firmware. Add "EEE is active | not active" to link up dmesg. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05bnxt_en: Improve flow control autoneg with Firmware 1.2.1 interface.Michael Chan
Make use of the new AUTONEG_PAUSE bit in the new interface to better control autoneg flow control settings, independent of RX and TX advertisement settings. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05bnxt_en: Update to Firmware 1.2.2 spec.Michael Chan
Use new field names in API structs and stop using deprecated fields auto_link_speed and auto_duplex in phy_cfg/phy_qcfg structs. Update copyright year to 2016. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2016-04-05 This series contains updates to fm10k only. Bruce provides nearly half of the patches in the series, most of which do general cleanup of the driver. These include semantic cleanups, checkpatch.pl fixes, update driver to use BIT() kernel macro, use BUILD_BUG_ON() where appropriate and use ether_addr_copy() instead of memcpy(). Jake provides the remaining patches in the series, starting with a fix for a possible NULL pointer deference. Next delays initialization of the service timer and service task until late in probe(). If we do not wait, failures in probe do not properly cleanup the service timer or service task items which result in a kernel panic. Added better reporting during error conditions. Fixed another possible kernel panic where we were clearing the interrupt scheme before we freed the mailbox IRQ. Added helper functions for setting strings and data for ethtool stats. Fixed comment mis-spelled words. v2: Dropped patch 3 from the original submission, until a better solution can be worked up based on feedback from Joe Perches and David Miller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05fm10k: use ethtool_rxfh_indir_default for default redirection tableJacob Keller
The fm10k driver used its own code for generating a default indirection table on device load, which was not the same as the default generated by ethtool when indir_size of 0 is passed to SRXFH. Take advantage of ethtool_rxfh_indir_default() and simplify code to write the redirection table to reduce some code duplication. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: fix a minor typo in some commentsJacob Keller
s/funciton/function to resolve a typo, and cleanup grammar on a few comments regarding processing the VF mailboxes. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: correctly clean up when init_queueing_scheme failsJacob Keller
Fix a kernel panic that occurs during surprise removal. Clear the interface queue counts upon fm10k_init_msix_capability failure. This prevents further code (fm10k_update_stats etc.) from attempting to access unallocated queue vector or ring memory. [ 628.692648] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068 [ 628.692805] IP: [<ffffffffa0475caf>] fm10k_update_stats+0x7f/0x2c0 [fm10k] [ 628.693173] PGD 0 [ 628.693759] Oops: 0000 [#1] SMP [ 628.699321] CPU: 10 PID: 8164 Comm: kworker/10:0 Tainted: G OE ------------ 3.10.0-327.el7.x86_64 #1 [ 628.700096] Hardware name: Supermicro X9DAi/X9DAi, BIOS 3.2 05/09/2015 [ 628.700894] Workqueue: pciehp-1 pciehp_power_thread [ 628.701686] task: ffff88086559c500 ti: ffff8808593c0000 task.ti: ffff8808593c0000 [ 628.702493] RIP: 0010:[<ffffffffa0475caf>] [<ffffffffa0475caf>] fm10k_update_stats+0x7f/0x2c0 [fm10k] [ 628.703310] RSP: 0018:ffff8808593c3b00 EFLAGS: 00010282 [ 628.704132] RAX: 0000000000000000 RBX: ffff880860760000 RCX: 0000000000000000 [ 628.704963] RDX: ffff880860760b08 RSI: 0000000000000000 RDI: 0000000000000000 [ 628.705794] RBP: ffff8808593c3b40 R08: 0000000000000000 R09: 0000000000000000 [ 628.706604] R10: 0000000000000000 R11: ffff880860760c40 R12: 0000000000000080 [ 628.707420] R13: ffff8808607608c0 R14: ffff880860779ec0 R15: ffff880860779f40 [ 628.708238] FS: 0000000000000000(0000) GS:ffff88086f000000(0000) knlGS:0000000000000000 [ 628.709071] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 628.709923] CR2: 0000000000000068 CR3: 000000000194a000 CR4: 00000000001407e0 [ 628.710752] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 628.711596] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 628.712438] Stack: [ 628.713255] ffff880860764458 ffff8808607608c0 ffff880860760000 ffff880860760000 [ 628.714088] 0000000000000080 ffff8808607608c0 ffff880860779ec0 ffff880860779f40 [ 628.714925] ffff8808593c3b88 ffffffffa04780c5 ffff880860764458 0000000a8163cb5b [ 628.715752] Call Trace: [ 628.716560] [<ffffffffa04780c5>] fm10k_down+0x155/0x1f0 [fm10k] [ 628.717367] [<ffffffffa0479958>] fm10k_close+0x28/0xd0 [fm10k] [ 628.718184] [<ffffffff81526365>] __dev_close_many+0x85/0xd0 [ 628.718986] [<ffffffff815264d8>] dev_close_many+0x98/0x120 [ 628.719764] [<ffffffff81527ab8>] rollback_registered_many+0xa8/0x230 [ 628.720527] [<ffffffff81527c80>] rollback_registered+0x40/0x70 [ 628.721294] [<ffffffff81529198>] unregister_netdevice_queue+0x48/0x80 [ 628.722052] [<ffffffff815291ec>] unregister_netdev+0x1c/0x30 [ 628.722816] [<ffffffffa04762b8>] fm10k_remove+0xd8/0xe0 [fm10k] [ 628.723581] [<ffffffff81328c7b>] pci_device_remove+0x3b/0xb0 [ 628.724340] [<ffffffff813f5fbf>] __device_release_driver+0x7f/0xf0 [ 628.725088] [<ffffffff813f6053>] device_release_driver+0x23/0x30 [ 628.725814] [<ffffffff81321fe4>] pci_stop_bus_device+0x94/0xa0 [ 628.726535] [<ffffffff813220d2>] pci_stop_and_remove_bus_device+0x12/0x20 [ 628.727249] [<ffffffff8133de40>] pciehp_unconfigure_device+0xb0/0x1b0 [ 628.727964] [<ffffffff8133d822>] pciehp_disable_slot+0x52/0xd0 [ 628.728664] [<ffffffff8133d98a>] pciehp_power_thread+0xea/0x150 [ 628.729358] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470 [ 628.730036] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400 [ 628.730730] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400 [ 628.731385] [<ffffffff810a5aef>] kthread+0xcf/0xe0 [ 628.732036] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 [ 628.732674] [<ffffffff81645858>] ret_from_fork+0x58/0x90 [ 628.733289] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 [ 628.733883] Code: 83 e8 01 48 8d 97 40 02 00 00 45 31 c0 4c 8d 9c c7 48 02 0 [ 628.735202] RIP [<ffffffffa0475caf>] fm10k_update_stats+0x7f/0x2c0 [fm10k] [ 628.735732] RSP <ffff8808593c3b00> [ 628.736285] CR2: 0000000000000068 [ 628.736846] ---[ end trace 9156088b311aff42 ]--- Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: prevent possibly uninitialized variableBruce Allan
If 'attr_flag < (1 << (2 * FM10K_TEST_MSG_NESTED))' is ever false, err will be used uninitialized. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: add helper functions to set strings and data for ethtool statsJacob Keller
Reduce duplicate code and the amount of indentation by adding fm10k_add_stat_strings and fm10k_add_ethtool_stats functions which help add fm10k_stat structures to the ethtool stats callbacks. This helps increase ease of use for future stat additions, and increases code readability. Skip handling of the per-queue stats as these will be reworked in a following patch. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: free MBX IRQ before clearing interrupt schemeJacob Keller
During fm10k_io_error_detected we were clearing the interrupt scheme before we freed the MBX IRQ. This causes a kernel panic because the MBX IRQ are assigned after MSI-X initialization. Clearing the interrupt scheme results in removing the MSI-X entry table. Fix this by freeing the MBX IRQ before we clear the interrupt scheme, as we do elsewhere in the driver. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: print error message when stop_hw failsJacob Keller
fm10k_stop_hw_generic calls fm10k_disable_queues_generic, which may return an error code indicating that the queues were not stopped within the time limit. Notify the user by displaying a message in the kernel message ring, in a similar way to how we notify the user when reset_hw fails. There isn't much we can do to recover from this error, so currently nothing else is done. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: base queue scheme covered by RSSJacob Keller
In fm10k_set_num_queues, we previously assigned the base template. This would always be overwritten by either fm10k_set_qos_queues or fm10k_set_rss_queues. In either case, we don't need the base values, so we can just remove them. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: don't initialize service task until later in probeJacob Keller
Delay initialization of the service timer and service task until late probe. If we don't wait, failures in probe do not properly cleanup the service timer or service task items, which results in the kernel panic below, potentially freezing the whole system. In addition, ensure that the SERVICE_DISABLE bit is set before we request the MBX IRQ since the MBX interrupt attempts to schedule the service task otherwise. This prevents a similar trace from occurring after this change. We didn't notice this issue before because probe almost always completes successfully. I discovered it due to a mis-ordered mailbox handler array, which resulted in the following failure when requesting mailbox interrupt. [ 555.325619] ------------[ cut here ]------------ [ 555.325628] WARNING: CPU: 0 PID: 4941 at lib/list_debug.c:33 __list_add+0xa0/0xd0() [ 555.325631] list_add corruption. prev->next should be next (ffffffff81f46648), but was (null). (prev=ffff8807fad5d0e8). <snip> [ 555.325722] CPU: 0 PID: 4941 Comm: insmod Tainted: G OE 4.0.4-303.fc22.x86_64 #1 [ 555.325725] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.8x23.060520140825 06/05/2014 [ 555.325727] 0000000000000000 00000000b4f161b3 ffff88081a21f8e8 ffffffff81783124 [ 555.325734] 0000000000000000 ffff88081a21f940 ffff88081a21f928 ffffffff8109c66a [ 555.325740] 0000000064000000 ffff8807fad5d0e8 ffff8807fad5d0e8 ffffffff81f46648 [ 555.325746] Call Trace: [ 555.325752] [<ffffffff81783124>] dump_stack+0x45/0x57 [ 555.325757] [<ffffffff8109c66a>] warn_slowpath_common+0x8a/0xc0 [ 555.325759] [<ffffffff8109c6f5>] warn_slowpath_fmt+0x55/0x70 [ 555.325763] [<ffffffff813ba270>] __list_add+0xa0/0xd0 [ 555.325768] [<ffffffff81102d1d>] __internal_add_timer+0x9d/0x110 [ 555.325771] [<ffffffff81102dbf>] internal_add_timer+0x2f/0xc0 [ 555.325774] [<ffffffff81104e5a>] mod_timer+0x12a/0x230 [ 555.325782] [<ffffffffa03d54ca>] fm10k_probe+0x69a/0xc80 [fm10k] [ 555.325787] [<ffffffff813e8355>] local_pci_probe+0x45/0xa0 [ 555.325791] [<ffffffff8129cf42>] ? sysfs_do_create_link_sd.isra.2+0x72/0xc0 [ 555.325794] [<ffffffff813e96b9>] pci_device_probe+0xf9/0x150 [ 555.325799] [<ffffffff814d7e73>] driver_probe_device+0xa3/0x400 [ 555.325802] [<ffffffff814d82ab>] __driver_attach+0x9b/0xa0 [ 555.325805] [<ffffffff814d8210>] ? __device_attach+0x40/0x40 [ 555.325808] [<ffffffff814d5bd3>] bus_for_each_dev+0x73/0xc0 [ 555.325811] [<ffffffff814d78ce>] driver_attach+0x1e/0x20 [ 555.325815] [<ffffffff814d7480>] bus_add_driver+0x180/0x250 [ 555.325819] [<ffffffffa03b2000>] ? 0xffffffffa03b2000 [ 555.325823] [<ffffffff814d8aa4>] driver_register+0x64/0xf0 [ 555.325826] [<ffffffff813e7bec>] __pci_register_driver+0x4c/0x50 [ 555.325832] [<ffffffffa03d6ca3>] fm10k_register_pci_driver+0x23/0x30 [fm10k] [ 555.325838] [<ffffffffa03b2080>] fm10k_init_module+0x80/0x1000 [fm10k] [ 555.325843] [<ffffffff81002128>] do_one_initcall+0xb8/0x200 [ 555.325848] [<ffffffff811e10d2>] ? __vunmap+0xa2/0x100 [ 555.325852] [<ffffffff811fe239>] ? kmem_cache_alloc_trace+0x1b9/0x240 [ 555.325855] [<ffffffff8178230e>] ? do_init_module+0x28/0x1cb [ 555.325858] [<ffffffff81782346>] do_init_module+0x60/0x1cb [ 555.325862] [<ffffffff8112168e>] load_module+0x205e/0x26b0 [ 555.325866] [<ffffffff8111d110>] ? store_uevent+0x70/0x70 [ 555.325870] [<ffffffff812234b0>] ? kernel_read+0x50/0x80 [ 555.325873] [<ffffffff81121f3e>] SyS_finit_module+0xbe/0xf0 [ 555.325878] [<ffffffff81789749>] system_call_fastpath+0x12/0x17 [ 555.325880] ---[ end trace 9e0f58d071eafd2a ]--- Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: prevent null pointer dereference of msix_entries tableJacob Keller
According to the C standard dereferencing a variable before it is checked invokes undefined behavior, and thus compilers are free to assume the check for NULL isn't necessary. Prevent this by re-ordering the NULL check of msix_entries in fm10k_free_mbx_irq. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: use ether_addr_copy to copy MAC addressBruce Allan
Cleanup the remaining instances of using memcpy() instead of the preferred ether_addr_copy(). Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: cleanup SPACE_BEFORE_TAB checkpatch warningBruce Allan
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: demote BUG_ON() to WARN_ON() where appropriateBruce Allan
We don't need to crash the kernel in this instance so just warn about the condition and play on. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: cleanup remaining right-bit-shifted 1Bruce Allan
Use BIT() macro instead. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05fm10k: Move constants to the right of binary operatorsBruce Allan
The semantic patch that makes this change is available in scripts/coccinelle/misc/compare_const_fl.cocci. More information about semantic patching is available at http://coccinelle.lip6.fr/ Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e/i40evf: Bump patch from 1.4.25 to 1.5.1Catherine Sullivan
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e: Change comment to reflect correct function nameMitch Williams
Minor correction in the comment to reflect the correct function name Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40evf: Add additional check for resetMitch Williams
If the driver happens to read a register during the time in which the device is undergoing reset, it will receive a value of 0xdeadbeef instead of a valid value. Unfortunately, the driver may misinterpret this as a valid value, especially if it's just looking for individual bits. Add an explicit check for this value when we are looking for admin queue errors, and trigger reset recovery if we find it. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e: Change unknown event error msg to ignore messageShannon Nelson
There's no real error in an unknown event from the Firmware, we're just posting a useful FYI notice, so this patch simply removes the "Error" word. Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e: Added code to prevent double resetsMitch Williams
Clear the VFLR bit after reset processing, instead of before. This prevents double resets on VF init. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e: Notify VFs of all resetsMitch Williams
Notify VFs in the reset interrupt handler, instead of the actual reset initiation code. This allows the VFs to get properly notified for all resets, including resets initiated by different PFs on the same physical device. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e: Remove timer and task only if createdShannon Nelson
In some error scenarios, we may find ourselves trying to remove a non-existent timer or worktask. This causes the kernel some bit of consternation, so don't do it. Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05Merge branch 'mlxsw-next'David S. Miller
Jiri Pirko says: ==================== mlxsw: small driver update, including switchdev doc update Ido Schimmel (3): mlxsw: spectrum: Reduce number of supported 802.1D bridges switchdev: Use switch ID in suggested udev rule mlxsw: spectrum: Add support for physical port names ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05mlxsw: spectrum: Add support for physical port namesIdo Schimmel
Export to userspace the front panel name of the port, so that udev can rename the ports accordingly. The convention suggested by switchdev documentation is used: 1) Non-split: pX 2) Split: pXsY Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05switchdev: Use switch ID in suggested udev ruleIdo Schimmel
Since there can be multiple switch ASICs on the same system we should use the switch ID in order to differentiate between them and set the switch name (e.g. swX) accordingly. Also, replace the order of the "Switch ID" and "Port Netdev Naming" sections following the above change. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05mlxsw: spectrum: Reduce number of supported 802.1D bridgesIdo Schimmel
Resources allocated for these bridges at init time cannot be later used for other purposes. While current number is supported by the device, it's mostly theoretical with regards to any real use case, which leads to poor utilization of device's resources. Solve that by reducing the number. The long term plan is to make this value (along with others) user configurable via devlink and write it to NVRAM, so that it can be used during the next init. Until then we must hardcode such values. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05i40e: Assure that adminq is alive in debug modeShannon Nelson
When dropping into debug mode in a failed probe, make sure that the AdminQ is left alive for possible hand debug of driver and firmware states. Move the mutex_init calls earlier in probe so that if init fails, the admin queue interface is still available for debugging purposes. Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e: Remove MSIx only if createdShannon Nelson
When cleaning up the interrupt handling, clean up the IRQs only if we actually got them set up. There are a couple of error recovery paths that were violating this and causing the kernel a bit of indigestion. Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Reviewed-by: Williams, Mitch A <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e: Fix up return codeJesse Brandeburg
The i40e_common.c typically uses i40e_status as a return code, but got missed this one case. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e: Save off VSI resource count when updating VSIKevin Scott
When updating a VSI, save off the number of allocated and unallocated VSIs as we do when adding a VSI. Signed-off-by: Kevin Scott <kevin.c.scott@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e/i40evf: Remove I40E_MAX_USER_PRIORITY defineCatherine Sullivan
This patch removes the duplicate definition of I40E_MAX_USER_PRIORITY in i40e.h that is not needed. Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e/i40evf: Fix casting in transmit codeJesse Brandeburg
Simple cast to fix a sparse warning. Fixes: commit 5453205cd097 ("i40e/i40evf: Enable support for SKB_GSO_UDP_TUNNEL_CSUM") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e/i40evf: Add support for bulk free in Tx cleanupAlexander Duyck
This patch enables bulk Tx clean for skbs. In order to enable it we need to pass the napi_budget value as that is used to determine if we are truly running in NAPI mode or if we are simply calling the routine from netpoll with a budget of 0. In order to avoid adding too many more variables I thought it best to pass the VSI directly in a fashion similar to what we do on igb and ixgbe with the q_vector. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e/i40evf: Fix handling of boolean logic in polling routinesAlexander Duyck
In the polling routines for i40e and i40evf we were using bitwise operators to avoid the side effects of the logical operators, specifically the fact that if the first case is true with "||" we skip the second case, or if it is false with "&&" we skip the second case. This fixes an earlier patch that converted the bitwise operators over to the logical operators and instead replaces the entire thing with just an if statement since it should be more readable what we are trying to do this way. Fixes: 1a36d7fadd14 ("i40e/i40evf: use logical operators, not bitwise") Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40evf: remove dead codeAlan Cox
The only error case is when the malloc fails, in which case the clean up loop does nothing at all, so remove it Signed-off-by: Alan Cox <alan@linux.intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8KAlexander Duyck
From what I can tell the practical limitation on the size of the Tx data buffer is the fact that the Tx descriptor is limited to 14 bits. As such we cannot use 16K as is typically used on the other Intel drivers. However artificially limiting ourselves to 8K can be expensive as this means that we will consume up to 10 descriptors (1 context, 1 for header, and 9 for payload, non-8K aligned) in a single send. I propose that we can reduce this by increasing the maximum data for a 4K aligned block to 12K. We can reduce the descriptors used for a 32K aligned block by 1 by increasing the size like this. In addition we still have the 4K - 1 of space that is still unused. We can use this as a bit of extra padding when dealing with data that is not aligned to 4K. By aligning the descriptors after the first to 4K we can improve the efficiency of PCIe accesses as we can avoid using byte enables and can fetch full TLP transactions after the first fetch of the buffer. This helps to improve PCIe efficiency. Below is the results of testing before and after with this patch: Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB Before: 87380 16384 16384 10.00 33682.24 20.27 -1.00 0.592 -1.00 After: 87380 16384 16384 10.00 34204.08 20.54 -1.00 0.590 -1.00 So the net result of this patch is that we have a small gain in throughput due to a reduction in overhead for putting together the frame. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05i40e: call ndo_stop() instead of dev_close() when running offline selftestStefan Assmann
Calling dev_close() causes IFF_UP to be cleared which will remove the interfaces routes and some addresses. That's probably not what the user intended when running the offline selftest. Besides this does not happen if the interface is brought down before the test, so the current behaviour is inconsistent. Instead call the net_device_ops ndo_stop function directly and avoid touching IFF_UP at all. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-04Merge branch 'tcp-udp-misc'David S. Miller
Eric Dumazet says: ==================== net: various udp/tcp changes First round of patches for linux-4.7 Add a generic facility for sockets to be freed after an RCU grace period, if they need to. Then UDP stack is changed to no longer use SLAB_DESTROY_BY_RCU, in order to speedup rx processing for traffic encapsulated in UDP. It gives a 17 % speedup for normal UDP reception in stress conditions. Then TCP listeners are changed to use SOCK_RCU_FREE as well to avoid touching sk_refcnt in synflood case : I got up to 30 % performance increase for a mono listener. Then three patches add SK_MEMINFO_DROPS to sock_diag and add per socket rx drops accounting to TCP. Last patch adds rate limiting on ACK sent on behalf of SYN_RECV to better resist to SYNFLOOD targeting one or few flows. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04tcp: rate limit ACK sent by SYN_RECV request socketsEric Dumazet
Attackers like to use SYNFLOOD targeting one 5-tuple, as they hit a single RX queue (and cpu) on the victim. If they use random sequence numbers in their SYN, we detect they do not match the expected window and send back an ACK. This patch adds a rate limitation, so that the effect of such attacks is limited to ingress only. We roughly double our ability to absorb such attacks. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Maciej Żenczykowski <maze@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04ipv4: tcp: set SOCK_USE_WRITE_QUEUE for ip_send_unicast_reply()Eric Dumazet
TCP uses per cpu 'sockets' to send some packets : - RST packets ( tcp_v4_send_reset()) ) - ACK packets for SYN_RECV and TIMEWAIT sockets By setting SOCK_USE_WRITE_QUEUE flag, we tell sock_wfree() to not call sk_write_space() since these internal sockets do not care. This gives a small performance improvement, merely by allowing cpu to properly predict the sock_wfree() conditional branch, and avoiding one atomic operation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04tcp: increment sk_drops for listenersEric Dumazet
Goal: packets dropped by a listener are accounted for. This adds tcp_listendrop() helper, and clears sk_drops in sk_clone_lock() so that children do not inherit their parent drop count. Note that we no longer increment LINUX_MIB_LISTENDROPS counter when sending a SYNCOOKIE, since the SYN packet generated a SYNACK. We already have a separate LINUX_MIB_SYNCOOKIESSENT Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04tcp: increment sk_drops for dropped rx packetsEric Dumazet
Now ss can report sk_drops, we can instruct TCP to increment this per socket counter when it drops an incoming frame, to refine monitoring and debugging. Following patch takes care of listeners drops. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04sock_diag: add SK_MEMINFO_DROPSEric Dumazet
Reporting sk_drops to user space was available for UDP sockets using /proc interface. Add this to sock_diag, so that we can have the same information available to ss users, and we'll be able to add sk_drops indications for TCP sockets as well. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04tcp/dccp: do not touch listener sk_refcnt under synfloodEric Dumazet
When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes. By letting listeners use SOCK_RCU_FREE infrastructure, we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets, only listeners are impacted by this change. Peak performance under SYNFLOOD is increased by ~33% : On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps Most consuming functions are now skb_set_owner_w() and sock_wfree() contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04inet: reqsk_alloc() needs to take care of dead listenersEric Dumazet
We'll soon no longer take a refcount on listeners, so reqsk_alloc() can not assume a listener refcount is not zero. We need to use atomic_inc_not_zero() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04tcp/dccp: use rcu locking in inet_diag_find_one_icsk()Eric Dumazet
RX packet processing holds rcu_read_lock(), so we can remove pairs of rcu_read_lock()/rcu_read_unlock() in lookup functions if inet_diag also holds rcu before calling them. This is needed anyway as __inet_lookup_listener() and inet6_lookup_listener() will soon no longer increment refcount on the found listener. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>