summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-12-08Merge tag 'mmc-v3.19-1' of git://git.linaro.org/people/ulf.hansson/mmcLinus Torvalds
Pull MMC updates from Ulf Hansson: "MMC core: - Consolidation and cleanups. - Some improvements regarding error handling. - Increase maximum amount of block devices. - Use correct OCR mask for SDIO when restoring power. - Fix prepared requests while doing BKOPS. - Convert to modern PM ops. - Add mmc_send_tuning() API and convert some hosts to use it. MMC host: - toshsd: New Toshiba PCI SD controller driver. - sdhci: 64-bit ADMA support. - sdhci: Some regulator fixes. - sdhci: HS400 support. - sdhci: Various fixes cleanups. - atmel-mci: Modernization and cleanups. - atmel-mci: Runtime PM support. - omap_hsmmc: Modernization and cleanups. - omap_hsmmc: Fix UHS card with DDR50 support. - dw_mmc: Support for ARM64 and Exynos 7 variant. - dw_mmc: Add support for IMG Pistachio variant. - dw_mmc: Various fixes and cleanups. - mvsdio: DMA fixes. - mxs-mmc: Modernization and cleanups. - mxcmmc: Various fixes" * tag 'mmc-v3.19-1' of git://git.linaro.org/people/ulf.hansson/mmc: (126 commits) mmc: sdhci-msm: Convert to mmc_send_tuning() mmc: sdhci-esdhc-imx: Convert to mmc_send_tuning() mmc: core: Let mmc_send_tuning() to take struct mmc_host* as parameter mmc: queue: Improve error handling during allocation of bounce buffers mmc: sdhci-acpi: Add two host capabilities for Intel mmc: sdhci-pci: Add two host capabilities for BYT mmc: sdhci-acpi: Add SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC mmc: sdhci-pci: Add SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC to BYT mmc: atmel-mci: use probe deferring if dma controller is not ready yet mmc: atmel-mci: stop using specific initcall mmc: atmel-mci: remove __init/__exit attributes mmc: atmel-mci: remove useless DMA stuff for non-dt devices mmc: omap_hsmmc: Fix UHS card with DDR50 support mmc: core: add core-level function for sending tuning commands mmc: core: hold SD Clock before CMD11 during Signal mmc: mxs-mmc: Check for clk_prepare_enable() error mmc: mxs-mmc: Propagate the real error mmc: mxs-mmc: No need to do NULL check on 'iores' mmc: dw_mmc: Add support for IMG Pistachio mmc: mxs-mmc: Simplify PM hooks ...
2014-12-08Merge branch 'genet-gphy'David S. Miller
Florian Fainelli says: ==================== net: bcmgenet: support for new GPHY revision scheme These two patches update the GENET GPHY revision logic to account for some of our newer designs starting with GPHY rev G0. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net: phy: bcm7xxx: add an explicit version check for GPHY rev G0Florian Fainelli
GPHY revision G0 has its version rolled over to 0x10, introduce an explicit check for that revision and invoke the proper workaround function for it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net: bcmgenet: add support for new GENET PHY revision schemeFlorian Fainelli
Starting with GPHY revision G0, the GENET register layout has changed to use the same numbering scheme as the Starfighter 2 switch. This means that GPHY major revision is in bits 15:12, minor in bits 11:8 and patch level is in bits 7:4. Introduce a small heuristic which checks for the old scheme first, tests for the new scheme and finally attempts to catch reserved values and aborts. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2014-12-03 1) Fix a set but not used warning. From Fabian Frederick. 2) Currently we make sequence number values available to userspace only if we use ESN. Make the sequence number values also available for non ESN states. From Zhi Ding. 3) Remove socket policy hashing. We don't need it because socket policies are always looked up via a linked list. From Herbert Xu. 4) After removing socket policy hashing, we can use __xfrm_policy_link in xfrm_policy_insert. From Herbert Xu. 5) Add a lookup method for vti6 tunnels with wildcard endpoints. I forgot this when I initially implemented vti6. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08Merge branch 'sunvnet-next'David S. Miller
David L Stevens says: ==================== sunvnet: add SG, HW_CSUM, GSO, and TSO support This patch set adds everything needed for TSO support in sunvnet. On my test hardware, this increases the single-stream TCP throughput for the default 1500-byte MTU Linux-Linux from ~2Gbps to 10Gbps and Linux-Solaris from ~2Gbps to 6Gbps. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08sunvnet: add TSO supportDavid L Stevens
This patch adds TSO support for the sunvnet driver. Signed-off-by: David L Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08sunvnet: add GSO supportDavid L Stevens
This patch adds GSO support to the sunvnet driver. Signed-off-by: David L Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08sunvnet: add checksum offload supportDavid L Stevens
This patch adds support for sender-side checksum offloading. Signed-off-by: David L Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08sunvnet: add scatter/gather supportDavid L Stevens
This patch adds scatter/gather support to the sunvnet driver. Signed-off-by: David L Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08sunvnet: add VIO v1.7 and v1.8 supportDavid L Stevens
This patch adds support for VIO v1.7 (extended descriptor format) and v1.8 (receive-side checksumming) to the sunvnet driver. Signed-off-by: David L Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08sunvnet: rename vnet_port_alloc_tx_bufs and move after version negotiationDavid L Stevens
This patch changes the name of vnet_port_alloc_tx_bufs to vnet_port_alloc_tx_ring, since there are no buffer allocations after transmit zero copy support was added. This patch also moves the ring allocation to after VIO version negotiation to allow for different-sized descriptors in later VIO versions. Signed-off-by: David L Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08fib_trie: Fix /proc/net/fib_trie when CONFIG_IP_MULTIPLE_TABLES is not definedAlexander Duyck
In recent testing I had disabled CONFIG_IP_MULTIPLE_TABLES and as a result when I ran "cat /proc/net/fib_trie" the main trie was displayed multiple times. I found that the problem line of code was in the function fib_trie_seq_next. Specifically the line below caused the indexes to go in the opposite direction of our traversal: h = tb->tb_id & (FIB_TABLE_HASHSZ - 1); This issue was that the RT tables are defined such that RT_TABLE_LOCAL is ID 255, while it is located at TABLE_LOCAL_INDEX of 0, and RT_TABLE_MAIN is 254 with a TABLE_MAIN_INDEX of 1. This means that the above line will return 1 for the local table and 0 for main. The result is that fib_trie_seq_next will return NULL at the end of the local table, fib_trie_seq_start will return the start of the main table, and then fib_trie_seq_next will loop on main forever as h will always return 0. The fix for this is to reverse the ordering of the two tables. It has the advantage of making it so that the tables now print in the same order regardless of if multiple tables are enabled or not. In order to make the definition consistent with the multiple tables case I simply masked the to RT_TABLE_XXX values by (FIB_TABLE_HASHSZ - 1). This way the two table layouts should always stay consistent. Fixes: 93456b6 ("[IPV4]: Unify access to the routing tables") Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08Merge branch 'rss_hash'David S. Miller
Amir Vadai says: ==================== ethtool, net/mlx4_en: RSS hash function selection This patchset by Eyal adds support in set/get of RSS hash function. Current supported functions are Toeplitz and XOR. The API is design to enable adding new hash functions without breaking backward compatibility. Userspace patch will be sent after API is available in kernel. The patchset was applied and tested over commit cd4c910 ("netpoll: delete defconfig references to obsolete NETPOLL_TRAP") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net/mlx4_en: Support for configurable RSS hash functionEyal Perry
The ConnectX HW is capable of using one of the following hash functions: Toeplitz and an XOR hash function. This patch extends the implementation of the mlx4_en driver set/get_rxfh callbacks to support getting and setting the RSS hash function used by the device. Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08ethtool: Support for configurable RSS hash functionEyal Perry
This patch extends the set/get_rxfh ethtool-options for getting or setting the RSS hash function. It modifies drivers implementation of set/get_rxfh accordingly. This change also delegates the responsibility of checking whether a modification to a certain RX flow hash parameter is supported to the driver implementation of set_rxfh. User-kernel API is done through the new hfunc bitmask field in the ethtool_rxfh struct. A bit set in the hfunc field is corresponding to an index in the new string-set ETH_SS_RSS_HASH_FUNCS. Got approval from most of the relevant driver maintainers that their driver is using Toeplitz, and for the few that didn't answered, also assumed it is Toeplitz. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Ariel Elior <ariel.elior@qlogic.com> Cc: Prashant Sreedharan <prashant@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Sathya Perla <sathya.perla@emulex.com> Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com> Cc: Ajit Khaparde <ajit.khaparde@emulex.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Bruce Allan <bruce.w.allan@intel.com> Cc: Carolyn Wyborny <carolyn.wyborny@intel.com> Cc: Don Skidmore <donald.c.skidmore@intel.com> Cc: Greg Rose <gregory.v.rose@intel.com> Cc: Matthew Vick <matthew.vick@intel.com> Cc: John Ronciak <john.ronciak@intel.com> Cc: Mitch Williams <mitch.a.williams@intel.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com> Cc: Shradha Shah <sshah@solarflare.com> Cc: Shreyas Bhatewara <sbhatewara@vmware.com> Cc: "VMware, Inc." <pv-drivers@vmware.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net: mvneta: fix race condition in mvneta_tx()Eric Dumazet
mvneta_tx() dereferences skb to get skb->len too late, as hardware might have completed the transmit and TX completion could have freed the skb from another cpu. Fixes: 71f6d1b31fb1 ("net: mvneta: replace Tx timer with a real interrupt") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_cgroup: remove unnecessary ifJiri Pirko
since head->handle == handle (checked before), just assign handle. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_flow: remove duplicate assignmentsJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_flow: remove faulty use of list_for_each_entry_rcuJiri Pirko
rcu variant is not correct here. The code is called by updater (rtnl lock is held), not by reader (no rcu_read_lock is held). Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_bpf: remove faulty use of list_for_each_entry_rcuJiri Pirko
rcu variant is not correct here. The code is called by updater (rtnl lock is held), not by reader (no rcu_read_lock is held). Signed-off-by: Jiri Pirko <jiri@resnulli.us> ACKed-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_bpf: remove unnecessary iteration and use passed argJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_basic: remove unnecessary iteration and use passed argJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2014-12-06 This series contains updates to i40e and i40evf. Shannon provides several patches to cleanup and fix i40e. First removes an unneeded break statement in i40e_vsi_link_event(). Then removes some debug messages that really do not give any useful information and ends up getting printed every service_task loop, which fills the logfile with noise when AQ tracing is enabled. Updates the aq_cmd arguments to use %i which is much more forgiving and user friendly than the more restrictive %x, or %d. Fixes the netdev_stat macro, where the old xxx_NETDEV_STAT() macro was defined long before the newer rtnl_link_stats64 came into being, and just never got updated. Getting the pf_id from the function number had an issue when when the PF was setup in passthru mode, the PCI bus/device/function was virtualized and the number in the VM is different from the number in the bare metal. This caused HW configuration issues when the wrong pf_id was used to set up the HMC and other structures. The PF_FUNC_RID register has the real bus/device/function information as configured by the BIOS, so use that for a better number. Carolyn adds additional text description for the base pf0 and flow director generated interrupts, since these interrupts are difficult to distinguish per port on a multi-function device. Jacob resolves an issue related to images with multiple PFs per physical port. We cannot fully support 1588 PTP features, since only one port should control (i.e. write) the registers at a time. Doing so can cause interference of functionality. Anjali provides several updates to i40e, first adds the Virtual Channel OP event opcode for CONFIG_RSS, so that the Virtual Channel state machine can properly decipher status change events. Then updates the driver to add (and use) i40e_is_vf macro for future expansion when new VF MAC types get added. Adds new update VSI flow to accommodate a firmware dix with VSI loopback mode. All VSIs on a VEB should either have loopback enabled or disabled, a mixed mode is not supported for a VEB. Since our driver supports multiple VSIs per PF that need to talk to each other make sure to enable Loopback for the PF and FDIR VSI as well. Mitch provides a couple of i40e and i40evf patches. First updates i40evf init code more adept at handling when multiple VFs attempt to initialize simultaneously. Joe Perches provides a i40e patch which resolves a compile warning about about frame size being larger than 2048 bytes by reducing the stack use by using kmemdup and not using a very large struct on the stack. v2: - Dropped patch 13 & 14 while Mitch reworks the patches based on feedback from Ben Hutchings, probably the tryptophan in the turkey is to blame for the delay... - Added Joe Perches patch which resolves a compile warning about frame size ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08Merge branch 'eth_skb_pad'David S. Miller
Alexander Duyck says: ==================== net: Add helper for padding short Ethernet frames This patch series adds a pair of helpers to pad short Ethernet frames. The general idea is to clean up a number of code paths that were all writing their own versions of the same or similar function. An added advantage is that this will help to discourage introducing new bugs as in at least one case I found the skb->len had been updated, but the tail pointer update was overlooked. v2: Added skb_put_padto for cases where length is not ETH_ZLEN Updated intel drivers and emulex driver to use skb_put_padto Updated eth_skb_pad to use skb_put_padto ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08r8169: Use eth_skb_pad functionAlexander Duyck
Replace rtl_skb_pad with eth_skb_pad since they do the same thing. Cc: Realtek linux nic maintainers <nic_swsd@realtek.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08myri10ge: use eth_skb_pad helperAlexander Duyck
Update myri10ge to use eth_skb_pad helper. This also corrects a minor issue as the driver was updating length without updating the tail pointer. Cc: Hyong-Youb Kim <hykim@myri.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08niu: Use eth_skb_pad helperAlexander Duyck
Replace the standard layout for padding an ethernet frame with the eth_skb_pad call. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08emulex: Use skb_put_padto instead of skb_padto() and skb->len assignmentAlexander Duyck
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08ethernet/intel: Use eth_skb_pad and skb_put_padto helpersAlexander Duyck
Update the Intel Ethernet drivers to use eth_skb_pad() and skb_put_padto instead of doing their own implementations of the function. Also this cleans up two other spots where skb_pad was called but the length and tail pointers were being manipulated directly instead of just having the padding length added via __skb_put. Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net: Add functions for handling padding frame and adding to lengthAlexander Duyck
This patch adds two new helper functions skb_put_padto and eth_skb_pad. These functions deviate from the standard skb_pad or skb_padto in that they will also update the length and tail pointers so that they reflect the padding added to the frame. The eth_skb_pad helper is meant to be used with Ethernet devices to update either Rx or Tx frames so that they report the correct size. The skb_put_padto helper is meant to be used primarily in the transmit path for network devices that need frames to be padded up to some minimum size and don't wish to simply update the length somewhere external to the frame. The motivation behind this is that there are a number of implementations throughout the network device drivers that are all doing the same thing, but each a little bit differently and as a result several implementations contain bugs such as updating the length without updating the tail offset and other similar issues. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08Merge branch 'mlx5-next'David S. Miller
Eli Cohen says: ==================== mlx5 driver updates The following series contains some fixes to mlx5 as well as update to the list of supported devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08mlx5: Fix error flow in add_keysEli Cohen
If mlx5_core_create_mkey fails, decrease the pending counter to undo the previous increment. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08mlx5: Fix sparse warningsEli Cohen
1. Add required __acquire/__release statements to balance spinlock usage. 2. Change the index parameter of begin_wqe() to be unsigned to match supplied argument type. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net/mlx5_core: Add more supported devicesEli Cohen
Add ConnectX-4LX to the list of supported devices as well as their virtual functions. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net/mlx5_core: Clear outbox of dealloc uarMajd Dibbiny
The outbox should be cleared before executing the command. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net/mlx5_core: Print resource number on QP/SRQ async eventsEli Cohen
Useful for debugging purposes. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net/mlx5_core: Remove unused dev cap enum fieldsEli Cohen
These enumerations are not used so remove them. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net/mlx5_core: Fix command queue size enforcementEli Cohen
Command queue descriptor page size is 4KB and not the page size used by the kernel. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net/mlx5_core: Fix min vectors value in mlx5_enable_msixEli Cohen
mlx5 requires at least one interrupt vector for completions so fix the minvec argument to pci_enable_msix_range() accordingly. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net/mlx5_core: Request the mlx5 IB module on driver loadEli Cohen
Call request module on mlx5_ib so it will be available for applications requiring it, such as installers that require boot over IB. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08Merge branch 'r8169-next'David S. Miller
Chunhao Lin says: ==================== r8169:change hardware setting This patch series contains two hardware setting modification to prevent hardware become abnormal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08r8169:disable rtl8168ep cmac engineChun-Hao Lin
Cmac engine is the bridge between driver and dash firmware. Other os may not disable cmac when leave. And r8169 did not allocate any resources for cmac engine. Disable it to prevent abnormal system behavior. Signed-off-by: Chunhao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08r8169:prevent enable hardware tx/rx too earlyChun-Hao Lin
For RTL8168G/GU/H/EP and RTL8411B remove enable tx/rx from its own hw_start function. This will prevent enable tx/rx before complete hardware tx/rx setting. Tx/Rx will be enabled in the end of function rtl_hw_start_8168. Signed-off-by: Chunhao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net: mvneta: fix Tx interrupt delaywilly tarreau
The mvneta driver sets the amount of Tx coalesce packets to 16 by default. Normally that does not cause any trouble since the driver uses a much larger Tx ring size (532 packets). But some sockets might run with very small buffers, much smaller than the equivalent of 16 packets. This is what ping is doing for example, by setting SNDBUF to 324 bytes rounded up to 2kB by the kernel. The problem is that there is no documented method to force a specific packet to emit an interrupt (eg: the last of the ring) nor is it possible to make the NIC emit an interrupt after a given delay. In this case, it causes trouble, because when ping sends packets over its raw socket, the few first packets leave the system, and the first 15 packets will be emitted without an IRQ being generated, so without the skbs being freed. And since the socket's buffer is small, there's no way to reach that amount of packets, and the ping ends up with "send: no buffer available" after sending 6 packets. Running with 3 instances of ping in parallel is enough to hide the problem, because with 6 packets per instance, that's 18 packets total, which is enough to grant a Tx interrupt before all are sent. The original driver in the LSP kernel worked around this design flaw by using a software timer to clean up the Tx descriptors. This timer was slow and caused terrible network performance on some Tx-bound workloads (such as routing) but was enough to make tools like ping work correctly. Instead here, we simply set the packet counts before interrupt to 1. This ensures that each packet sent will produce an interrupt. NAPI takes care of coalescing interrupts since the interrupt is disabled once generated. No measurable performance impact nor CPU usage were observed on small nor large packets, including when saturating the link on Tx, and this fixes tools like ping which rely on too small a send buffer. If one wants to increase this value for certain workloads where it is safe to do so, "ethtool -C $dev tx-frames" will override this default setting. This fix needs to be applied to stable kernels starting with 3.10. Tested-By: Maggie Mae Roxas <maggie.mae.roxas@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08Merge branch 'tipc-next'David S. Miller
Ying Xue says: ==================== tipc: convert name table read-write lock to RCU Now TIPC name table is statically allocated and is protected with a Read-Write lock. To enhance the performance of TIPC name table lookup, we are going to involve RCU lock to protect the name table. As a consequence, it becomes lockless to concurrently look up name table on read side. However, before the conversion can be successfully made, the following two things must be first done: - change allocation way of name table from static to dynamic - fix several incorrect locking policy issues ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: convert name table read-write lock to RCUYing Xue
Convert tipc name table read-write lock to RCU. After this change, a new spin lock is used to protect name table on write side while RCU is applied on read side. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: remove unnecessary INIT_LIST_HEADYing Xue
When a list_head variable is seen as a new entry to be added to a list head, it's unnecessary to be initialized with INIT_LIST_HEAD(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: simplify relationship between name table lock and node lockYing Xue
When tipc name sequence is published, name table lock is released before name sequence buffer is delivered to remote nodes through its underlying unicast links. However, when name sequence is withdrawn, the name table lock is held until the transmission of the removal message of name sequence is finished. During the process, node lock is nested in name table lock. To prevent node lock from being nested in name table lock, while withdrawing name, we should adopt the same locking policy of publishing name sequence: name table lock should be released before message is sent. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: any name table member must be protected under name table lockYing Xue
As tipc_nametbl_lock is used to protect name_table structure, the lock must be held while all members of name_table structure are accessed. However, the lock is not obtained while a member of name_table structure - local_publ_count is read in tipc_nametbl_publish(), as a consequence, an inconsistent value of local_publ_count might be got. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>