path: root/drivers/net/ethernet/intel/igb
Age | Commit message | Author
2016-03-19 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next | Linus Torvalds
Pull networking updates from David Miller:
"Highlights:
1) Support more Realtek wireless chips, from Jes Sorenson.
2) New BPF types for per-cpu hash and array maps, from Alexei Starovoitov.
3) Make several TCP sysctls per-namespace, from Nikolay Borisov.
4) Allow the use of SO_REUSEPORT in order to do per-thread processing of incoming TCP/UDP connections. The muxing can be done using a BPF program which hashes the incoming packet. From Craig Gallek.
5) Add a multiplexer for TCP streams, to provide a message-based interface. BPF programs can be used to determine the message boundaries. From Tom Herbert.
6) Add 802.1AE MACSEC support, from Sabrina Dubroca.
7) Avoid factorial complexity when taking down an inetdev interface with lots of configured addresses. We were doing things like traversing the entire address list for each address removed, and flushing the entire netfilter conntrack table for every address as well.
8) Add and use SKB bulk free infrastructure, from Jesper Brouer.
9) Allow offloading u32 classifiers to hardware, and implement for ixgbe, from John Fastabend.
10) Allow configuring IRQ coalescing parameters on a per-queue basis, from Kan Liang.
11) Extend ethtool so that larger link mode masks can be supported. From David Decotigny.
12) Introduce devlink, which can be used to configure port link types (ethernet vs Infiniband, etc.), port splitting, and switch device level attributes as a whole. From Jiri Pirko.
13) Hardware offload support for flower classifiers, from Amir Vadai.
14) Add "Local Checksum Offload". Basically, for a tunneled packet the checksum of the outer header is 'constant' (because with the checksum field filled into the inner protocol header, the payload of the outer frame checksums to 'zero'), and we can take advantage of that in various ways. From Edward Cree"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
  bonding: fix bond_get_stats()
  net: bcmgenet: fix dma api length mismatch
  net/mlx4_core: Fix backward compatibility on VFs
  phy: mdio-thunder: Fix some Kconfig typos
  lan78xx: add ndo_get_stats64
  lan78xx: handle statistics counter rollover
  RDS: TCP: Remove unused constant
  RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
  net: smc911x: convert pxa dma to dmaengine
  team: remove duplicate set of flag IFF_MULTICAST
  bonding: remove duplicate set of flag IFF_MULTICAST
  net: fix a comment typo
  ethernet: micrel: fix some error codes
  ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
  bpf, dst: add and use dst_tclassid helper
  bpf: make skb->tc_classid also readable
  net: mvneta: bm: clarify dependencies
  cls_bpf: reset class and reuse major in da
  ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
  ldmvsw: Add ldmvsw.c driver code
  ...
2016-03-17 | mm: introduce page reference manipulation functions | Joonsoo Kim
The success of CMA allocation largely depends on the success of migration, and a key factor in migration is the page reference count. Until now, page references have been manipulated by calling atomic functions directly, so we cannot track who manipulates them or where. This makes it hard to find the actual reason for a CMA allocation failure. CMA allocation should be guaranteed to succeed, so finding the offending place is really important. In this patch, call sites where the page reference is manipulated are converted to use the newly introduced wrapper functions. This is a preparation step for adding a tracepoint to each page reference manipulation function. With this facility, we can easily find the reason for a CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. This will help a second step that renames page._count to something else and prevents later attempts to access it directly (suggested by Andrew).
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
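As a user-space illustration of the wrapper idea (not the kernel's own page reference helpers), the sketch below routes every reference count change through small functions that share one trace hook, so a single place sees who changed the count and to what. The `sim_` names, the counting "tracer" and the use of C11 atomics are assumptions made for this example; in the kernel the hook would be a tracepoint.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct page; only the reference count matters. */
struct sim_page {
	atomic_int count;
};

/* One shared hook: every modifying wrapper reports here. */
struct sim_ref_trace {
	unsigned long events;   /* number of modifications seen */
	int last_value;         /* count after the most recent modification */
};
static struct sim_ref_trace sim_trace;

static void sim_trace_page_ref(int new_value)
{
	sim_trace.events++;
	sim_trace.last_value = new_value;
}

/* Read sites go through a wrapper too, so direct field access can be banned. */
static inline int sim_page_ref_count(struct sim_page *p)
{
	return atomic_load(&p->count);
}

static inline void sim_set_page_count(struct sim_page *p, int v)
{
	atomic_store(&p->count, v);
	sim_trace_page_ref(v);
}

static inline void sim_page_ref_inc(struct sim_page *p)
{
	int v = atomic_fetch_add(&p->count, 1) + 1;
	sim_trace_page_ref(v);
}

/* Returns nonzero when the last reference was dropped. */
static inline int sim_page_ref_dec_and_test(struct sim_page *p)
{
	int v = atomic_fetch_sub(&p->count, 1) - 1;
	assert(v >= 0 && "page refcount underflow");
	sim_trace_page_ref(v);
	return v == 0;
}
```

Because all modifications funnel through `sim_trace_page_ref()`, adding caller information later means changing one function rather than every call site.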
2016-02-24 | igb: call ndo_stop() instead of dev_close() when running offline selftest | Stefan Assmann
Calling dev_close() causes IFF_UP to be cleared, which will remove the interface's routes and some addresses. That's probably not what the user intended when running the offline selftest. Besides, this does not happen if the interface is brought down before the test, so the current behaviour is inconsistent. Instead, call the net_device_ops ndo_stop function directly and avoid touching IFF_UP at all.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-24 | igb: Fix VLAN tag stripping on Intel i350 | Corinna Vinschen
Problem: When switching off VLAN offloading on an i350, the VLAN interface becomes unusable. For testing, set up a VLAN on an i350 and some remote machine, e.g.:
  $ ip link add link eth0 name eth0.42 type vlan id 42
  $ ip addr add 192.168.42.1/24 dev eth0.42
  $ ip link set dev eth0.42 up
Offloading is switched on by default:
  $ ethtool -k eth0 | grep vlan-offload
  rx-vlan-offload: on
  tx-vlan-offload: on
  $ ping -c 3 -I eth0.42 192.168.42.2
  [...works as usual...]
Now switch off VLAN offloading and try again:
  $ ethtool -K eth0 rxvlan off
  Actual changes:
  rx-vlan-offload: off
  tx-vlan-offload: off [requested on]
  $ ping -c 3 -I eth0.42 192.168.42.2
  PING 192.168.42.2 (192.168.42.2) from 192.168.42.1 eth0.42: 56(84) bytes of data.
  --- 192.168.42.2 ping statistics ---
  3 packets transmitted, 0 received, 100% packet loss, time 1999ms
I can only reproduce it on an i350; the above works fine on an 82580. While inspecting the igb source, I came across the code in igb_set_vmolr which sets the E1000_VMOLR_STRVLAN/E1000_DVMOLR_STRVLAN flags once and for all, and in all of the igb code there's no other place where STRVLAN is set or cleared. Thus, VLAN stripping is enabled in igb unconditionally, independently of the offloading setting. I compared that to the latest Intel igb-5.3.3.5 driver from http://sourceforge.net/projects/e1000/ which in fact sets and clears the STRVLAN flag independently from igb_set_vmolr in its own function igb_set_vf_vlan_strip, depending on the VLAN settings. So I included the STRVLAN handling from the igb-5.3.3.5 driver into our current igb driver and tested the above scenario again. This time ping still works after switching off VLAN offloading. Tested successfully on i350, with and without additional VFs, as well as on 82580.
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
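The per-pool toggle described above can be sketched as a read-modify-write of one bit. This is a simulation over an in-memory register array, not driver code; the enum ordering, the pool count, and the STRVLAN bit value (bit 30, which we believe matches the kernel headers) are assumptions to check against the datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: bit 30 of VMOLR/DVMOLR enables VLAN stripping. */
#define SIM_VMOLR_STRVLAN 0x40000000u
#define SIM_NUM_POOLS 8   /* hypothetical pool count */

/* Hypothetical MAC types, ordered from oldest to newest. */
enum sim_mac_type { SIM_82575, SIM_82576, SIM_82580, SIM_I350 };

/* Hypothetical register file: one VMOLR and one DVMOLR per pool. */
struct sim_hw {
	enum sim_mac_type mac;
	uint32_t vmolr[SIM_NUM_POOLS];
	uint32_t dvmolr[SIM_NUM_POOLS];
};

/* Set or clear VLAN stripping for one pool, following the offload setting
 * instead of forcing it on once at configuration time. */
static void sim_set_vf_vlan_strip(struct sim_hw *hw, int pool, bool enable)
{
	uint32_t *reg;

	assert(pool >= 0 && pool < SIM_NUM_POOLS);
	if (hw->mac < SIM_82576)   /* no per-pool control on older parts */
		return;
	/* The i350 keeps the stripping control in DVMOLR. */
	reg = (hw->mac == SIM_I350) ? &hw->dvmolr[pool] : &hw->vmolr[pool];
	if (enable)
		*reg |= SIM_VMOLR_STRVLAN;
	else
		*reg &= ~SIM_VMOLR_STRVLAN;
}
```

The point of the design is that the bit now has a clear path as well as a set path, so `ethtool -K eth0 rxvlan off` can actually take effect.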
2016-02-24 | igb: Add support for generic Tx checksums | Alexander Duyck
This patch adds support for generic Tx checksums to the igb driver. It turns out this is actually pretty easy after going over the datasheet, as we were doing a number of steps we didn't need to. In order to perform a Tx checksum for an L4 header we need to fill in the following fields in the Tx descriptor:
  MACLEN (maximum of 127), retrieved from: skb_network_offset()
  IPLEN (maximum of 511), retrieved from: skb_checksum_start_offset() - skb_network_offset()
  TUCMD.L4T indicates offset and if checksum or crc32c, based on: skb->csum_offset
The added advantage to doing this is that we can support inner checksum offloads for tunnels and MPLS while still being able to transparently insert VLAN tags. I also took the opportunity to clean up many of the feature flag configuration bits to make them a bit more consistent between drivers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
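The descriptor field arithmetic above can be sketched as follows, assuming a simplified descriptor struct and hypothetical `sim_` names. The checksum-field offsets are fixed by the TCP, UDP and SCTP header layouts (16, 6 and 8 bytes), which is why `csum_offset` alone can select the L4 type. The sketch does not attempt any extra validation or the software fallback a real driver needs.

```c
#include <assert.h>

/* Offset of the checksum field inside each L4 header. */
#define SIM_TCP_CHECK_OFFSET  16
#define SIM_UDP_CHECK_OFFSET   6
#define SIM_SCTP_CHECK_OFFSET  8

enum sim_l4t { SIM_L4T_UDP, SIM_L4T_TCP, SIM_L4T_SCTP };

/* Simplified view of the context descriptor fields named in the commit. */
struct sim_ctx_desc {
	unsigned int maclen;   /* bytes before the network header, max 127 */
	unsigned int iplen;    /* network header start to checksum start, max 511 */
	enum sim_l4t l4t;      /* which checksum (or CRC32c) to insert */
};

/* Returns 0 on success, -1 if the hardware cannot handle the request and
 * the checksum would have to be computed in software instead. */
static int sim_fill_tx_csum(unsigned int network_offset,
			    unsigned int csum_start_offset,
			    unsigned int csum_offset,
			    struct sim_ctx_desc *d)
{
	assert(d != 0);
	if (csum_start_offset < network_offset)
		return -1;
	d->maclen = network_offset;
	d->iplen = csum_start_offset - network_offset;
	if (d->maclen > 127 || d->iplen > 511)
		return -1;

	switch (csum_offset) {
	case SIM_TCP_CHECK_OFFSET:  d->l4t = SIM_L4T_TCP;  break;
	case SIM_UDP_CHECK_OFFSET:  d->l4t = SIM_L4T_UDP;  break;
	case SIM_SCTP_CHECK_OFFSET: d->l4t = SIM_L4T_SCTP; break;
	default:                    return -1;
	}
	return 0;
}
```

For a plain IPv4/TCP frame (14-byte Ethernet header, 20-byte IP header) this gives MACLEN 14 and IPLEN 20. For a tunneled frame IPLEN simply grows to cover the outer IP, UDP, tunnel and inner headers, which is how inner checksum offload falls out of the same two subtractions.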
2016-02-24 | igb: rename igb define to be more generic | Todd Fujinaka
E1000_MRQC_ENABLE_RSS_4Q enables 4 or 8 queues depending on the part, so rename it to be generic. Similarly, E1000_MRQC_ENABLE_VMDQ_RSS_2Q has no numeric meaning, so rename it to be more generic.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-24 | igb: add conditions for I210 to generate periodic clock output | Roland Hii
In the general case, the maximum supported half cycle time of the synchronized output clock is 70 msec. Half cycle times longer than 70 msec can also be programmed, as long as the output clock is synchronized to whole seconds; this is useful specifically for generating a 1 Hz clock. Permitted values for these longer half cycle times are 125,000,000, 250,000,000 and 500,000,000 decimal (equal to 125 msec, 250 msec and 500 msec respectively). Before this patch, only half cycle times less than or equal to 70 msec used the I210 clock output function. This patch adds conditions so that half cycle times equal to 125 msec, 250 msec or 500 msec also use the clock output function. Under other conditions, the interrupt-driven target time output events method is still used to generate the desired clock output.
Signed-off-by: Roland Hii <roland.king.guan.hii@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
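The selection rule reads naturally as a predicate on the requested half cycle in nanoseconds. This is a sketch of the conditions as the commit describes them, with a hypothetical function name; the driver's actual check may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the I210 clock output function can produce the requested half
 * cycle; false when the driver must fall back to interrupt-driven target
 * time events. All values are in nanoseconds. */
static bool sim_can_use_clock_output(int64_t half_cycle_ns)
{
	if (half_cycle_ns <= 0)
		return false;
	if (half_cycle_ns <= 70000000LL)   /* general case: up to 70 ms */
		return true;
	/* Longer half cycles only when they divide whole seconds evenly. */
	return half_cycle_ns == 125000000LL ||
	       half_cycle_ns == 250000000LL ||
	       half_cycle_ns == 500000000LL;
}
```

A 1 Hz output has a 500 ms half cycle, so it now takes the clock output path, while something like 100 ms still uses target time events.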
2016-02-24 | igb: enable WoL for OEM devices regardless of EEPROM setting | Todd Fujinaka
Override EEPROM settings for specific OEM devices.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-24 | igb: constify e1000_phy_operations structure | Julia Lawall
This e1000_phy_operations structure is never modified, so declare it as const. Other structures of this type are already const. Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-24 | igb: When GbE link up, wait for Remote receiver status condition | Takuma Ueba
The I210 device IPv6 autoconf test sometimes fails because the DAD NS for the link-local address is not transmitted; the packet is silently dropped. This problem is seen only in a GbE environment. igb_watchdog_task proceeds with its link-up processing as soon as link up is detected, and the following cases are observed:
  1. The PHY 1000BASE-T Status Register Remote receiver status bit is NG (not OK). The status becomes OK after about 200 - 700 ms.
  2. In this case, the transmitted packet is silently dropped.
1000BASE-T Status register:
  [Expected]: 0x3800 or 0x7800
  [Problem occurred]: 0x2800 or 0x6800
Frequency of occurrence: approx. 1/10 - 1/40 observed.
In order to avoid this problem, wait until the 1000BASE-T Status register reports "Remote receiver status OK". After applying this patch, at least 400 runs succeeded with no problems.
Signed-off-by: Takuma Ueba <t.ueba11@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb: Add workaround for VLAN tag stripping on 82576 | Alexander Duyck
There was a workaround partially implemented for the 82576 that is needed in order for VLAN tag stripping to function correctly. The original code had side effects that made the workaround active on all MACs. I have updated the code so that the workaround is enabled, but limited to the 82576, or activated if we exceed the available unicast addresses. The workaround has a side effect of mirroring all of the traffic outgoing from the VFs back to the PF. As such, it is not recommended to use the 82576 in promiscuous mode, as it will take a performance hit, though this is now consistent with the performance seen on the out-of-tree igb driver. I also limited setting all of the UTA bits to cases where the VMOLR register is enabled. This should limit the effects of the UTA register so that we don't pick up any excess traffic unless promiscuous mode has been enabled on the PF, whereas before, the PF would have ended up in something equivalent to unicast promiscuous mode with VLAN filtering.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb: Enable use of "bridge fdb add" to set unicast table entries | Alexander Duyck
This change makes it so that we can use the bridge utility to add an FDB entry for the PF to an igb port. By doing this we can enable the VFs to talk to virtual ports residing on top of the PF. In addition, this should also address issues with MACVLANs residing on top of the PF, as they would have had similar issues when added to the PF with SR-IOV enabled.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb: Drop unnecessary checks in transmit path | Alexander Duyck
This patch drops several checks that we dropped from ixgbe some time ago. It should not be possible for us to be called with either of the conditional statements returning true, so we can just drop them from the hot path.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb: Add support for VLAN promiscuous with SR-IOV and NTUPLE | Alexander Duyck
This change fixes things so that we can fully support SR-IOV or the recently added NTUPLE filtering while allowing support for VLAN promiscuous mode. By making this change we are able to support possible scenarios such as SR-IOV with the PF connected to a Linux bridge hosting other VMs.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb: Clean-up configuration of VF port VLANs | Alexander Duyck
This patch is meant to clean up the configuration of the VF port based VLAN configuration. The original logic was a bit muddled and had some undesirable side effects such as VLANs being either completely stripped from the port or VLANs being left when they shouldn't be. The idea behind this code is to avoid any events such as spurious spoof notifications when we are removing one VLAN tag and replacing it with another.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb: Merge VLVF configuration into igb_vfta_set | Alexander Duyck
This change makes it so that we can merge the configuration of the VLVF registers into the setting of the VFTA register. By doing this we simplify the logic and make use of similar functionality that we have already added for ixgbe, making it easier to maintain both drivers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb: Always enable VLAN 0 even if 8021q is not loaded | Alexander Duyck
This patch makes it so that we always add VLAN 0. This is important as we need to guarantee the PF can receive untagged frames in the case of SR-IOV being enabled but VLAN filtering not being enabled in the kernel.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb: Do not factor VLANs into RLPML calculation | Alexander Duyck
The RLPML registers already take the size of VLAN headers into account when determining the maximum packet length. This is called out in EAS documents for several parts including the 82576 and the i350. As such, we can drop the addition of the VLAN header size to the value programmed into the RLPML registers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb: Allow asymmetric configuration of MTU versus Rx frame size | Alexander Duyck
Since the igb driver is using page based receive, there is no point in limiting the Rx capabilities of the device. The driver can receive 9K jumbo frames at all times. The only changes needed due to MTU changes are updates for the FIFO sizes and flow-control watermarks. Update the maximum frame size to reflect the 9.5K limitation of the hardware, and replace all instances of max_frame_size with MAX_JUMBO_FRAME_SIZE when referring to an Rx FIFO or frame.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb: Refactor VFTA configuration | Alexander Duyck
This patch starts the clean-up process on the VFTA configuration. Specifically, in this patch I attempt to address and simplify several items while also updating the code to bring it more in line with what is already in ixgbe.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb: clean up code for setting MAC address | Alexander Duyck
Drop a bunch of hand-written byte-swapping code in favor of just doing the byte swapping ourselves. The registers are little endian registers storing a big endian value, so if we read the MAC address array as little endian, the values will already be in the proper layout for the registers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-15 | igb/igbvf: don't give up | Mitch Williams
The driver shouldn't just give up if it fails to get the hardware mailbox lock. Failing to get the lock can happen when the PF-VF communication channel is heavily loaded, and giving up then causes a complete communications failure between the PF and VF drivers. Add a counter and a delay. The driver will now retry ten times, waiting one millisecond between retries.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
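A bounded retry of this kind can be sketched as below, assuming a hypothetical `try_lock` callback in place of the hardware mailbox lock. The constants follow the commit text (ten attempts, 1 ms apart); POSIX `nanosleep()` stands in for a kernel delay primitive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define SIM_MBX_LOCK_RETRIES 10   /* "retry ten times" */

/* Hypothetical lock attempt: returns true when the mailbox lock was taken. */
typedef bool (*sim_try_lock_fn)(void *ctx);

static void sim_sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/* Try to take the PF-VF mailbox lock, retrying with a 1 ms delay instead of
 * giving up after the first failure. Returns 0 on success, -1 on timeout. */
static int sim_obtain_mbx_lock(sim_try_lock_fn try_lock, void *ctx)
{
	assert(try_lock != NULL);
	for (int attempt = 0; attempt < SIM_MBX_LOCK_RETRIES; attempt++) {
		if (try_lock(ctx))
			return 0;
		sim_sleep_ms(1);
	}
	return -1;
}

/* Scripted lock for testing: succeeds on attempt N (N == 0 means never). */
struct sim_lock_script {
	int succeed_on;
	int attempts;
};

static bool sim_script_try_lock(void *ctx)
{
	struct sim_lock_script *s = ctx;

	s->attempts++;
	return s->succeed_on != 0 && s->attempts >= s->succeed_on;
}
```

The worst case costs about 10 ms, which is a small price compared with the alternative of dropping a PF-VF message outright.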
2016-02-15 | igb: Unpair the queues when changing the number of queues | Shota Suzuki
Since commit 72ddef0506da ("igb: Fix oops caused by missing queue pairing"), the IGB_FLAG_QUEUE_PAIRS flag can be set when changing the number of queues with "ethtool -L", but it is never cleared unless the igb driver is reloaded. This patch clears it if queue pairing becomes unnecessary as a result of "ethtool -L".
Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
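The essence of the fix is that a flag recomputed on every queue-count change needs both a set and a clear branch. The threshold below (pair when more than half of the maximum RSS queues are in use, to conserve interrupt vectors) is an assumption for illustration, as are the names.

```c
#include <assert.h>

#define SIM_FLAG_QUEUE_PAIRS 0x1u   /* hypothetical stand-in for IGB_FLAG_QUEUE_PAIRS */

/* Recompute the pairing flag for a new queue count; other flag bits are
 * preserved. Without the else branch, a flag set once by "ethtool -L"
 * would stay set until the driver is reloaded. */
static unsigned int sim_update_queue_pairs(unsigned int flags,
					   unsigned int rss_queues,
					   unsigned int max_rss_queues)
{
	assert(max_rss_queues > 0);
	if (rss_queues > max_rss_queues / 2)
		flags |= SIM_FLAG_QUEUE_PAIRS;
	else
		flags &= ~SIM_FLAG_QUEUE_PAIRS;
	return flags;
}
```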
2016-02-15 | igb: Remove unnecessary flag setting in igb_set_flag_queue_pairs() | Shota Suzuki
If VFs are enabled (max_vfs >= 1), both max_rss_queues and adapter->rss_queues are set to 2 in the case of e1000_82576. In this case, IGB_FLAG_QUEUE_PAIRS is always set in the default block as a result of fall-through, thus setting it in the e1000_82576 block is not necessary.
Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-15 | sctp: Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC | Tom Herbert
The SCTP checksum is really a CRC and is very different from the standard 1's complement checksum that serves as the checksum for IP protocols. This offload interface is also very different. Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC to highlight these differences. The term CSUM should be reserved in the stack to refer to the standard 1's complement IP checksum.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
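To make the distinction concrete, here are both algorithms side by side: the RFC 1071 one's complement sum used by IP, TCP and UDP, and CRC32c (Castagnoli), which SCTP uses. These are straightforward bitwise reference versions, not the kernel's optimized implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: one's complement sum of 16-bit big-endian
 * words, carries folded back in, then complemented. */
static uint16_t sim_inet_csum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)                      /* pad an odd trailing byte with zero */
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)                 /* fold carries into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* CRC32c: reflected polynomial 0x82F63B78, initial value and final XOR
 * 0xFFFFFFFF, processed one bit at a time for clarity. */
static uint32_t sim_crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return ~crc;
}
```

One visible difference: the one's complement sum does not change when 16-bit words are reordered, while a CRC is designed to detect exactly that kind of change. That is one reason a CRC offload needs its own feature flag rather than sharing the CSUM name.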
2015-12-14 | igb: Explicitly label self-test result indices | Joe Schultz
Previously, the ethtool self-test gstrings/data arrays were accessed via hardcoded indices, which made the code difficult to follow. This patch replaces the hardcoded values with enum-based labels.
Signed-off-by: Joe Schultz <jschultz@xes-inc.com>
Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-14 | igb: Improve cable length function for I210, etc. | Joe Schultz
Previously, the PHY-specific code to get the cable length for the I210 internal and related PHYs was reporting the cable length of a single pair and reporting it as the min, max, and total cable length. Update it so that all four pairs are checked so the true min, max, and average cable lengths are reported.
Signed-off-by: Joe Schultz <jschultz@xes-inc.com>
Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-14 | igb: Don't add PHY address to PCDL address | Aaron Sierra
There is no reason to add the PHY address into the PCDL register address.
Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-14 | igb: Remove GS40G specific defines/functions | Aaron Sierra
The I210 internal PHY can be accessed just as well with the access functions shared by 82580, I350, and I354 devices. A side effect of relying on the common functions is that I210 cable length support is folded back into the common case, which effectively reverts the following commit:
  commit 59f301046b276f87483b3afa3201a4273def06a9
  Author: Carolyn Wyborny <carolyn.wyborny@intel.com>
  Date: Wed Oct 10 04:42:59 2012 +0000
      igb: Update get cable length function for i210/i211
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-12 | igb: improve handling of disconnected adapters | Jarod Wilson
Clean up array_rd32 so that it uses igb_rd32 the same as rd32, per the suggestion of Alexander Duyck, and use io_addr in more places, so that we don't need to call E1000_REMOVED (which simply checks for a NULL hw_addr) nearly as often.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-12 | igb: fix NULL derefs due to skipped SR-IOV enabling | Jan Beulich
The combined effect of commits 6423fc3416 ("igb: do not re-init SR-IOV during probe") and ceee3450b3 ("igb: make sure SR-IOV init uses the right number of queues") causes VFs to no longer be set up, leading to NULL pointer dereferences due to the adapter's ->vf_data being NULL while ->vfs_allocated_count is non-zero. The first commit not only neglected the side effect of igb_sriov_reinit() that the second commit tried to account for, but also that of setting IGB_FLAG_HAS_MSIX, without which igb_enable_sriov() is effectively a no-op. Calling igb_{,re}set_interrupt_capability() as done here seems to address this, but I'm not sure whether this is better than simply reverting the other two commits.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-12 | igb: use the correct i210 register for EEMNGCTL | Todd Fujinaka
The i210 has two EEPROM access registers that are located at non-standard offsets: EEARBC and EEMNGCTL. EEARBC was fixed previously, and EEMNGCTL should also be corrected.
Reported-by: Roman Hodek <roman.aud@siemens.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-12 | igb: don't unmap NULL hw_addr | Jarod Wilson
I've got a StarTech Thunderbolt dock someone loaned me, which among other things has the following device in it:
  08:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
This hotplugs just fine (kernel 4.2.0 plus a patch or two here):
  [ 863.020315] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
  [ 863.020316] igb: Copyright (c) 2007-2014 Intel Corporation.
  [ 863.028657] igb 0000:08:00.0: enabling device (0000 -> 0002)
  [ 863.062089] igb 0000:08:00.0: added PHC on eth0
  [ 863.062090] igb 0000:08:00.0: Intel(R) Gigabit Ethernet Network Connection
  [ 863.062091] igb 0000:08:00.0: eth0: (PCIe:2.5Gb/s:Width x1) e8:ea:6a:00:1b:2a
  [ 863.062194] igb 0000:08:00.0: eth0: PBA No: 000200-000
  [ 863.062196] igb 0000:08:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
  [ 863.064889] igb 0000:08:00.0 enp8s0: renamed from eth0
But disconnecting it is another story:
  [ 1002.807932] igb 0000:08:00.0: removed PHC on enp8s0
  [ 1002.807944] igb 0000:08:00.0 enp8s0: PCIe link lost, device now detached
  [ 1003.341141] ------------[ cut here ]------------
  [ 1003.341148] WARNING: CPU: 0 PID: 199 at lib/iomap.c:43 bad_io_access+0x38/0x40()
  [ 1003.341149] Bad IO access at port 0x0 ()
  [ 1003.342767] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi igb dca firewire_ohci firewire_core crc_itu_t rfcomm ctr ccm arc4 iwlmvm mac80211 fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter bnep dm_mirror dm_region_hash dm_log dm_mod coretemp x86_pkg_temp_thermal intel_powerclamp kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg
  [ 1003.342793]  ansi_cprng aesni_intel hp_wmi aes_x86_64 iTCO_wdt lrw iTCO_vendor_support ppdev gf128mul sparse_keymap glue_helper ablk_helper cryptd snd_hda_codec_realtek snd_hda_codec_generic microcode snd_hda_intel uvcvideo iwlwifi snd_hda_codec videobuf2_vmalloc videobuf2_memops snd_hda_core videobuf2_core snd_hwdep btusb v4l2_common btrtl snd_seq btbcm btintel videodev cfg80211 snd_seq_device rtsx_pci_ms bluetooth pcspkr input_leds i2c_i801 media parport_pc memstick rfkill sg lpc_ich snd_pcm 8250_fintek parport joydev snd_timer snd soundcore hp_accel ie31200_edac mei_me lis3lv02d edac_core input_polldev mei hp_wireless shpchp tpm_infineon sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables autofs4 xfs libcrc32c sd_mod sr_mod cdrom rtsx_pci_sdmmc mmc_core crc32c_intel serio_raw rtsx_pci
  [ 1003.342822]  nouveau ahci libahci mxm_wmi e1000e xhci_pci hwmon ptp drm_kms_helper pps_core xhci_hcd ttm wmi video ipv6
  [ 1003.342839] CPU: 0 PID: 199 Comm: kworker/0:2 Not tainted 4.2.0-2.el7_UNSUPPORTED.x86_64 #1
  [ 1003.342840] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70 Ver. 01.07 02/26/2015
  [ 1003.342843] Workqueue: pciehp-3 pciehp_power_thread
  [ 1003.342844]  ffffffff81a90655 ffff8804866d3b48 ffffffff8164763a 0000000000000000
  [ 1003.342846]  ffff8804866d3b98 ffff8804866d3b88 ffffffff8107134a ffff8804866d3b88
  [ 1003.342847]  ffff880486f46000 ffff88046c8a8000 ffff880486f46840 ffff88046c8a8098
  [ 1003.342848] Call Trace:
  [ 1003.342852]  [<ffffffff8164763a>] dump_stack+0x45/0x57
  [ 1003.342855]  [<ffffffff8107134a>] warn_slowpath_common+0x8a/0xc0
  [ 1003.342857]  [<ffffffff810713c6>] warn_slowpath_fmt+0x46/0x50
  [ 1003.342859]  [<ffffffff8133719e>] ? pci_disable_msix+0x3e/0x50
  [ 1003.342860]  [<ffffffff812f6328>] bad_io_access+0x38/0x40
  [ 1003.342861]  [<ffffffff812f6567>] pci_iounmap+0x27/0x40
  [ 1003.342865]  [<ffffffffa0b728d7>] igb_remove+0xc7/0x160 [igb]
  [ 1003.342867]  [<ffffffff8132189f>] pci_device_remove+0x3f/0xc0
  [ 1003.342869]  [<ffffffff81433426>] __device_release_driver+0x96/0x130
  [ 1003.342870]  [<ffffffff814334e3>] device_release_driver+0x23/0x30
  [ 1003.342871]  [<ffffffff8131b404>] pci_stop_bus_device+0x94/0xa0
  [ 1003.342872]  [<ffffffff8131b3ad>] pci_stop_bus_device+0x3d/0xa0
  [ 1003.342873]  [<ffffffff8131b3ad>] pci_stop_bus_device+0x3d/0xa0
  [ 1003.342874]  [<ffffffff8131b516>] pci_stop_and_remove_bus_device+0x16/0x30
  [ 1003.342876]  [<ffffffff81333f5b>] pciehp_unconfigure_device+0x9b/0x180
  [ 1003.342877]  [<ffffffff81333a73>] pciehp_disable_slot+0x43/0xb0
  [ 1003.342878]  [<ffffffff81333b6d>] pciehp_power_thread+0x8d/0xb0
  [ 1003.342885]  [<ffffffff810881b2>] process_one_work+0x152/0x3d0
  [ 1003.342886]  [<ffffffff8108854a>] worker_thread+0x11a/0x460
  [ 1003.342887]  [<ffffffff81088430>] ? process_one_work+0x3d0/0x3d0
  [ 1003.342890]  [<ffffffff8108ddd9>] kthread+0xc9/0xe0
  [ 1003.342891]  [<ffffffff8108dd10>] ? kthread_create_on_node+0x180/0x180
  [ 1003.342893]  [<ffffffff8164e29f>] ret_from_fork+0x3f/0x70
  [ 1003.342894]  [<ffffffff8108dd10>] ? kthread_create_on_node+0x180/0x180
  [ 1003.342895] ---[ end trace 65a77e06d5aa9358 ]---
Upon looking at the igb driver, I see that igb_rd32() attempted to read from hw_addr and failed, so it set hw->hw_addr to NULL and printed the message in the log output above, "PCIe link lost, device now detached". Now that hw_addr is NULL, the attempt to call pci_iounmap on it is obviously not going to go well. As suggested by Mark Rustad, do something similar to what ixgbe does, and save a copy of hw_addr as adapter->io_addr, so we can still call pci_iounmap on it on teardown. Additionally, for consistency, make the pci_iomap call assign directly to io_addr, so map and unmap match.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
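The pattern described above can be modeled in user space: the register read clears `hw_addr` when the device disappears, while a separate saved copy of the mapping is what teardown unmaps. Treating a single all-ones read as "device gone" is a simplification of the driver's detection, all names are hypothetical, and the counter stands in for `pci_iounmap()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hw and adapter structures. */
struct sim_hw {
	uint32_t *hw_addr;        /* cleared when the device is detected as gone */
};

struct sim_adapter {
	struct sim_hw hw;
	uint32_t *io_addr;        /* the mapping as returned by the map call */
};

/* Register read that detects a surprise removal: a PCIe read from a
 * vanished device returns all ones, so stop touching the mapping. */
static uint32_t sim_rd32(struct sim_hw *hw, size_t reg)
{
	uint32_t v;

	if (hw->hw_addr == NULL)   /* already detached */
		return ~0u;
	v = hw->hw_addr[reg / 4];
	if (v == ~0u)
		hw->hw_addr = NULL;
	return v;
}

static int sim_unmap_calls;   /* counts teardown unmaps for the test */

/* Teardown unmaps the saved copy, which is still valid even though
 * hw.hw_addr may already be NULL at this point. */
static void sim_remove(struct sim_adapter *a)
{
	if (a->io_addr != NULL) {
		sim_unmap_calls++;   /* pci_iounmap(pdev, a->io_addr) in the kernel */
		a->io_addr = NULL;
	}
}
```

Keeping the "is the device still there" state separate from the "what did we map" state is what lets teardown stay symmetric with probe.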
2015-12-12 | igb: add 88E1543 initialization code | Todd Fujinaka
Initialize the 88E1543 PHY.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-16 | drivers/net/intel: use napi_complete_done() | Jesse Brandeburg
As per Eric Dumazet's previous patches (see commit 24d2e4a50737, "tg3: use napi_complete_done()"), quoting verbatim:
  Using napi_complete_done() instead of napi_complete() allows us to use /sys/class/net/ethX/gro_flush_timeout
  GRO layer can aggregate more packets if the flush is delayed a bit, without having to set too big coalescing parameters that impact latencies.
  </end quote>
Tested configuration:
  low latency via ethtool -C ethx adaptive-rx off rx-usecs 10 adaptive-tx off tx-usecs 15
  workload: streaming rx using netperf TCP_MAERTS
igb:
  MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
  ...
  Interim result: 941.48 10^6bits/s over 1.000 seconds ending at 1440193171.589
  Alignment      Offset         Bytes       Bytes     Recvs    Bytes     Sends
  Local  Remote  Local  Remote  Xfered      Per                Per
  Recv   Send    Recv   Send                Recv (avg)         Send (avg)
      8       8      0       0  1176930056   1475.36   797726  16384.00  71905

  MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
  ...
  Interim result: 941.49 10^6bits/s over 0.997 seconds ending at 1440193142.763
  Alignment      Offset         Bytes       Bytes     Recvs    Bytes     Sends
  Local  Remote  Local  Remote  Xfered      Per                Per
  Recv   Send    Recv   Send                Recv (avg)         Send (avg)
      8       8      0       0  1175182320  50476.00    23282  16384.00  71816
i40e: Hard to test because the traffic is incoming so fast (24Gb/s) that GRO always receives 87kB, even at the highest interrupt rate.
Other drivers were only compile tested.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-16 | drivers/net: get rid of unnecessary initializations in .get_drvinfo() | Ivan Vecera
Many drivers needlessly initialize the n_priv_flags, n_stats, testinfo_len, eedump_len and regdump_len fields in their .get_drvinfo() ethtool op. This is unnecessary, as these fields are filled in by ethtool_get_drvinfo(). v2: removed unused variable; v3: removed another unused variable. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 | net: igb: avoid using timespec | Arnd Bergmann
We want to deprecate the use of 'struct timespec' on 32-bit architectures, as it will overflow in 2038. The igb driver uses it to read the current time, and can simply be changed to use ktime_get_real_ts64() instead. Because of hardware limitations, there is still an overflow in year 2106, which we cannot really avoid, but this documents the overflow. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: intel-wired-lan@lists.osuosl.org Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-28 | igb: assume MSI-X interrupts during initialization | Stefan Assmann
In igb_sw_init() the sequence of calls was changed from
igb_init_queue_configuration()
igb_init_interrupt_scheme()
igb_probe_vfs()
to
igb_probe_vfs()
igb_init_queue_configuration()
igb_init_interrupt_scheme()
This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not get enabled properly, and we hit a NULL pointer dereference if the max_vfs module parameter is specified (adapter->vf_data does not get allocated, so accessing the structure crashes).
[ 7.419348] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 7.419367] IP: [<ffffffffa02161c6>] igb_reset+0xe6/0x5d0 [igb]
[ 7.419370] PGD 0
[ 7.419373] Oops: 0002 [#1] SMP
[ 7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio
[ 7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153
[ 7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013
[...]
[ 7.419431] Call Trace:
[ 7.419442] [<ffffffffa0217236>] igb_probe+0x8b6/0x1340 [igb]
[ 7.419447] [<ffffffff814c7f15>] local_pci_probe+0x45/0xa0
Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling igb_probe_vfs(). The real interrupt capabilities will be checked during igb_init_interrupt_scheme(), so this is safe to do. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-08-27 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
2015-08-21 | mm: make page pfmemalloc check more robust | Michal Hocko
Commit c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") added checks for page->pfmemalloc to __skb_fill_page_desc():
  if (page->pfmemalloc && !page->mapping)
      skb->pfmemalloc = true;
It assumes page->mapping == NULL implies that page->pfmemalloc can be trusted. However, __delete_from_page_cache() can set page->mapping to NULL and leave the page->index value alone. Because the two fields share a union, a non-zero page->index will be interpreted as a true page->pfmemalloc. So the assumption is invalid if the networking code can see such a page, and it seems it can. We have encountered this with an NFS-over-loopback setup when such a page is attached to a new skbuf. No copying happens in this case, so the page confuses __skb_fill_page_desc(), which interprets the index as the pfmemalloc flag. The network stack then drops packets allocated from the reserves unless they are to be queued on sockets handling swapping; that is what happens here, and it leads to hangs when the NFS client waits for a response from the server that has been dropped and thus never arrives. The struct page is already heavily packed, so rather than finding another hole to put the flag in, let's do a trick instead: reuse the index again, but set it to an impossible value (-1UL). Since this is a page index, it should never legitimately reach a value that large. Replace all direct users of page->pfmemalloc with page_is_pfmemalloc(), which will hide this nastiness from unspoiled eyes. Obviously, the information will be lost if somebody wants to use page->index, but that was already the case before, and the original code expected the information to be persisted somewhere else if it is really needed (e.g. as SLAB and SLUB do).
[akpm@linux-foundation.org: fix blooper in slub] Fixes: c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") Signed-off-by: Michal Hocko <mhocko@suse.com> Debugged-by: Vlastimil Babka <vbabka@suse.com> Debugged-by: Jiri Bohac <jbohac@suse.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Acked-by: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> [3.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-18 | igb: make sure SR-IOV init uses the right number of queues | Todd Fujinaka
Recent changes to igb_probe_vfs() could lead to the PF holding onto all of the queues. Move igb_probe_vfs() before igb_init_queue_configuration() and add some more error checking. Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-08-18 | igb: Fix a memory leak in igb_probe | Jia-Ju Bai
In the error handling code of igb_probe(), the adapter->shadow_vfta memory allocated by kcalloc() in igb_sw_init() is not freed, so a memory leak occurs when register_netdev() or igb_init_i2c() fails. This patch adds a kfree() to fix it. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-08-18 | igb: Fix a deadlock in igb_sriov_reinit | Jia-Ju Bai
When igb_init_interrupt_scheme() fails in igb_sriov_reinit(), the lock acquired by rtnl_lock() is not released, which causes a deadlock. This patch adds rtnl_unlock() to the error handling path to fix it. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-08-18 | igb: Teardown SR-IOV before unregister_netdev() | Alex Williamson
When the .remove() callback for a PF is called, SR-IOV support for the device is disabled, which requires unbinding and removing the VFs. The VFs may be in use either by the host kernel or by userspace, for example assigned to a VM through vfio-pci. In the latter case, the VFs may be removed either by shutting down the VM or by hot-unplugging the devices from the VM. Unfortunately, in the case of a Windows 2012 R2 guest, hot-unplug is broken due to the ordering of the PF driver teardown. Disabling SR-IOV prior to unregister_netdev() avoids this issue. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-08-18 | igb: add support for 1512 PHY | Todd Fujinaka
This patch adds support for Marvell PHY 1512 (required for I354). Submitted by: Maciej Szwed <maciej.szwed@intel.com> Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-08-18 | igb: implement high frequency periodic output signals | Richard Cochran
In addition to interrupt driven target time output events, the i210 also has two programmable clock outputs. These clocks support periods between 16 nanoseconds and 140 milliseconds. This patch implements the periodic output function using the clock outputs when possible, falling back to the target time for longer periods. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-08-18 | igb: do not re-init SR-IOV during probe | Stefan Assmann
During driver probing the following code path is triggered:
igb_probe
  ->igb_sw_init
    ->igb_probe_vfs
      ->igb_pci_enable_sriov
        ->igb_sriov_reinit
Doing the SR-IOV re-init is not necessary during probing since we're starting from scratch. Here we can call igb_enable_sriov() right away. Running igb_sriov_reinit() during igb_probe() also seems to cause occasional packet loss on some onboard 82576 NICs. Reproduced on Dell and HP servers with onboard 82576 NICs. Example:
Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
Subsystem: Dell Device [1028:0481]
Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-08-18 | igb: missing rtnl_unlock in igb_sriov_reinit() | Vasily Averin
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-08-18 | igb: Fix oops caused by missing queue pairing | Shota Suzuki
When initializing the igb driver (e.g. 82576, I350), IGB_FLAG_QUEUE_PAIRS is set in igb_init_queue_configuration() if adapter->rss_queues exceeds half of max_rss_queues. On the other hand, when the number of queues is changed with "ethtool -L", igb_set_channels() does not set IGB_FLAG_QUEUE_PAIRS even if the number of queues exceeds half of max_combined. In this case, if numvecs is larger than MAX_MSIX_ENTRIES (10), the size of adapter->msix_entries[], an overflow can occur in igb_set_interrupt_capability(), which in turn leads to an oops. Fix this problem as follows:
- When changing the number of queues with "ethtool -L", set IGB_FLAG_QUEUE_PAIRS in the same way as when initializing the igb driver.
- When increasing the size of q_vector, reallocate it appropriately. (With IGB_FLAG_QUEUE_PAIRS set, the size of q_vector gets larger.)
Another possible fix is to cap the queues at their initial number, which is the number of CPUs initially online. But this is not optimal, because the number of queues could then not be increased when another CPU comes online. Note that before commit cd14ef54d25b ("igb: Change to use statically allocated array for MSIx entries"), this problem did not cause an oops but just reduced the number of queues to 1, because the driver entered msi_only mode in igb_set_interrupt_capability(). Fixes: 907b7835799f ("igb: Add ethtool support to configure number of channels") CC: stable <stable@vger.kernel.org> Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-07-23 | igb: bump version to igb-5.3.0 | Todd Fujinaka
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>