summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)Author
2017-05-01Merge tag 'mlx5-updates-2017-04-30' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2017-04-30 Or says: ================ mlx5 neigh update This series (whose code name is 'neigh update') from Hadar, enhances the mlx5 TC IP tunnel offloads to deal with changes to tunnel destination neighbours used in offloaded flows which involved encapsulation. In order to keep track on the validity state of such neighbours, we register a netevent notifier callback and act on NEIGH_UPDATE events: if a neighbour becomes valid, offload the related flows to HW (the other way around when neigh becomes invalid) and similarly when a neigh mac addresses changes. Since this traffic is offloaded from the host OS, the neighbour for the IP tunnel destination can mistakenly become STALE and deleted by the kernel since its 'used' value wasn't changed. To address that, we proactively update the neighbour 'used' value every DELAY_PROBE_TIME seconds, using time stamps generated by the existing driver code for HW flow counters. We use the DELAY_PROBE_TIME_UPDATE event to adjust the frequency of the updates. Prior to the core of the series, there's a patch from Saeed that introduces an extendable vport representor implementation scheme. It provides a separation between the eswitch to the netdev related aspects of the representors. We would like to thank Ido Schimmel and Ilya Lesokhin for their coaching && advice through the long design and review cycles while we struggled to understand and (hopefully correctly) implement the locking around the different driver flows(..) . - Or. ================= Misc Updates: From Tariq: Some small performance and trivial code optimization for mlx5 netdev driver - Optimize poll ICOSQ completion queue - Use prefetchw when a write is to follow - Use u8 as ownership type in mlx5e_get_cqe() From Eran: - Disable LRO by default on specific setups From Eli: - Small cleanup for E-Switch to avoid redundant allocation Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: Prevent warning without CONFIG_RFS_ACCELMintz, Yuval
After removing the PTP related initialization from slowpath start, the remaining PTT entry is required only in case CONFIG_RFS_ACCEL is set. Otherwise, it leads to a warning due to it being unused. Fixes: d179bd1699fc ("qed: Acquire/release ptt_ptp lock when enabling/disabling PTP") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: output the DPM status and WID countRam Amrani
Output to the RDMA driver whether DPM mode is enabled or disabled in the HW and if so what is the number of WIDs it supports Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: align DPI configuration to HW requirementsRam Amrani
When calculating doorbell BAR partitioning round up the number of CPUs to the nearest power of 2 so the size of the DPI (per user section) configured in the hardware will be stored properly and not truncated. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: verify RoCE resource bitmaps are releasedRam Amrani
Add mechanism to verify RoCE resources are released prior to freeing the bitmaps. If this is not the case, print what resources were not released. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: add error handling flow to TID deregistratin posting failureRam Amrani
If the posting of the ramrod for the purpose of TID deregistration fails, abort the deregistration operation without using the FW's return code. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: remove unused SQ error stateRam Amrani
The internal RoCE SQE QP state isn't being used. Instead we mark the QP as in regular error state. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: configure the RoCE max message sizeRam Amrani
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01benet: Use time_before_eq for time comparisonKarim Eshapa
Use time_before_eq for time comparison more safe and dealing with timer wrapping to be future-proof. Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01virtio_net: make use of extended ack message reportingJakub Kicinski
Try to carry error messages to the user via the netlink extended ack message attribute. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01nfp: make use of extended ack message reportingJakub Kicinski
Try to carry error messages to the user via the netlink extended ack message attribute. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30net: phy: Allow BCM5481x PHYs to setup internal TX/RX clock delayAbhishek Shah
This patch allows users to enable/disable internal TX and/or RX clock delay for BCM5481x series PHYs so as to satisfy RGMII timing specifications. On a particular platform, whether TX and/or RX clock delay is required depends on how PHY connected to the MAC IP. This requirement can be specified through "phy-mode" property in the platform device tree. Signed-off-by: Abhishek Shah <abhishek.shah@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30net: sunhme: fix spelling mistakes: "ParityErro" -> "ParityError"Colin Ian King
trivial fix to spelling mistakes in printk message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30bnx2x: Align RX buffersScott Wood
The bnx2x driver is not providing proper alignment on the receive buffers it passes to build_skb(), causing skb_shared_info to be misaligned. skb_shared_info contains an atomic, and while PPC normally supports unaligned accesses, it does not support unaligned atomics. Aligning the size of rx buffers will ensure that page_frag_alloc() returns aligned addresses. This can be reproduced on PPC by setting the network MTU to 1450 (or other non-multiple-of-4) and then generating sufficient inbound network traffic (one or two large "wget"s usually does it), producing the following oops: Unable to handle kernel paging request for unaligned access at address 0xc00000ffc43af656 Faulting instruction address: 0xc00000000080ef8c Oops: Kernel access of bad area, sig: 7 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: vmx_crypto powernv_rng rng_core powernv_op_panel leds_powernv led_class nfsd ip_tables x_tables autofs4 xfs lpfc bnx2x mdio libcrc32c crc_t10dif crct10dif_generic crct10dif_common CPU: 104 PID: 0 Comm: swapper/104 Not tainted 4.11.0-rc8-00088-g4c761da #2 task: c00000ffd4892400 task.stack: c00000ffd4920000 NIP: c00000000080ef8c LR: c00000000080eee8 CTR: c0000000001f8320 REGS: c00000ffffc33710 TRAP: 0600 Not tainted (4.11.0-rc8-00088-g4c761da) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24082042 XER: 00000000 CFAR: c00000000080eea0 DAR: c00000ffc43af656 DSISR: 00000000 SOFTE: 1 GPR00: c000000000907f64 c00000ffffc33990 c000000000dd3b00 c00000ffcaf22100 GPR04: c00000ffcaf22e00 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000b80008 c00000ffc43af636 c00000ffc43af656 0000000000000000 GPR12: c0000000001f6f00 c00000000fe1a000 000000000000049f 000000000000c51f GPR16: 00000000ffffef33 0000000000000000 0000000000008a43 0000000000000001 GPR20: c00000ffc58a90c0 0000000000000000 000000000000dd86 0000000000000000 GPR24: c000007fd0ed10c0 00000000ffffffff 0000000000000158 000000000000014a GPR28: c00000ffc43af010 c00000ffc9144000 c00000ffcaf22e00 c00000ffcaf22100 NIP [c00000000080ef8c] __skb_clone+0xdc/0x140 LR [c00000000080eee8] __skb_clone+0x38/0x140 Call Trace: [c00000ffffc33990] [c00000000080fb74] skb_clone+0x74/0x110 (unreliable) [c00000ffffc339c0] [c000000000907f64] packet_rcv+0x144/0x510 [c00000ffffc33a40] [c000000000827b64] __netif_receive_skb_core+0x5b4/0xd80 [c00000ffffc33b00] [c00000000082b2bc] netif_receive_skb_internal+0x2c/0xc0 [c00000ffffc33b40] [c00000000082c49c] napi_gro_receive+0x11c/0x260 [c00000ffffc33b80] [d000000066483d68] bnx2x_poll+0xcf8/0x17b0 [bnx2x] [c00000ffffc33d00] [c00000000082babc] net_rx_action+0x31c/0x480 [c00000ffffc33e10] [c0000000000d5a44] __do_softirq+0x164/0x3d0 [c00000ffffc33f00] [c0000000000d60a8] irq_exit+0x108/0x120 [c00000ffffc33f20] [c000000000015b98] __do_irq+0x98/0x200 [c00000ffffc33f90] [c000000000027f14] call_do_irq+0x14/0x24 [c00000ffd4923a90] [c000000000015d94] do_IRQ+0x94/0x110 [c00000ffd4923ae0] [c000000000008d90] hardware_interrupt_common+0x150/0x160 Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30liquidio: silence a locking static checker warningDan Carpenter
Presumably we never hit this return, but static checkers complain that we need to unlock so we may as well fix that. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30qed: Unlock on error in qed_vf_pf_acquire()Dan Carpenter
My static checker complains that we're holding a mutex on this error path. Let's goto exit instead of returning directly. Fixes: b0bccb69eba3 ("qed: Change locking scheme for VF channel") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30net: hns: support deferred probe when no mdiolipeng
In the hip06 and hip07 SoCs, phy connect to mdio bus.The mdio module is probed with module_init, and, as such, is not guaranteed to probe before the HNS driver. So we need to support deferred probe. We check for probe deferral in the mac init, so we not init DSAF when there is no mdio, and free all resource, to later learn that we need to defer the probe. Signed-off-by: lipeng <lipeng321@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Reviewed-by: Matthias Brugger <mbrugger@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30net: hns: support deferred probe when can not obtain irqlipeng
In the hip06 and hip07 SoCs, the interrupt lines from the DSAF controllers are connected to mbigen hw module. The mbigen module is probed with module_init, and, as such, is not guaranteed to probe before the HNS driver. So we need to support deferred probe. Signed-off-by: lipeng <lipeng321@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Reviewed-by: Matthias Brugger <mbrugger@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30nfp: provide 256 bytes of XDP headroom in all configurationsJakub Kicinski
For legacy reasons NFP FW may be compiled to DMA packets to a constant offset into the buffer and use the space before it for metadata. This ensures that packets data always start at a certain offset regardless of the amount of preceding metadata. If rx offset is set to 0 there may still be up to 64 bytes of metadata but metadata will start at the beginning of the buffer, instead of: data_start_offset = rx_offset - meta_len Even though we make the buffers larger to accommodate up to 64 bytes of metadata, if there is only N bytes of metadata, we will end up with N bytes of headroom and 64 - N bytes of tailroom. Therefore we can't rely on that space for XDP headroom. Make sure we always allocate full 256 bytes. This, unfortunately, means we can't fit the headroom on an u8 any more. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30nfp: don't completely refuse to work with old flashesJakub Kicinski
Right now the required Service Process ABI version is still tied to max ID of known commands. For new NSP commands we are adding we are checking if NSP version is recent enough on command-by-command basis. The driver doesn't have to force the device to have the very latest flash, anything newer than 0.8 should do. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30nfp: avoid reading TX queue indexes from the deviceJakub Kicinski
Reading TX queue indexes from the device memory on each interrupt is expensive. It's doubly expensive with XDP running since we have two TX rings to check there. If the software indexes indicate that the TX queue is completely empty, however, we don't need to look at the device completion index at all. The queuing CPU is doing a wmb() before kicking the device TX so we should be safe to assume on the CPU handling the completions will never see old value of the software copy of the index. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30nfp: do simple XDP TX buffer recyclingJakub Kicinski
On the RX path we follow the "drop if allocation of replacement buffer fails" rule. With XDP we extended that to the TX action, so if XDP prog returned TX but allocation of replacement RX buffer failed, we will drop the packet. To improve our XDP TX performance extend the idea of rings being always full to XDP TX rings. Pre-fill the XDP TX rings with RX buffers, and when XDP prog returns TX action swap the RX buffer with the next buffer from the TX ring. XDP TX complete will no longer free the buffers but let them sit on the TX ring and wait for swap with RX buffer, instead. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30nfp: drop rx_ring param from buffer allocationJakub Kicinski
We will soon allocate RX buffers for caching on XDP TX rings. The rx_ring parameter passed to nfp_net_rx_alloc_one() is not actually used, remove it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30nfp: replace -ENOTSUPP with -EOPNOTSUPPJakub Kicinski
As Or points out in commit 423b3aecf290 ("net/mlx4: Change ENOTSUPP to EOPNOTSUPP"), ENOTSUPP is NFS specific error. Replace it with EOPNOTSUPP. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30virtio-net: use netif_tx_napi_add for tx napiWillem de Bruijn
Avoid hashing the tx napi struct into napi_hash[], which is used for busy polling receive queues. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30geneve: fix incorrect setting of UDP checksum flagGirish Moodalbail
Creating a geneve link with 'udpcsum' set results in a creation of link for which UDP checksum will NOT be computed on outbound packets, as can be seen below. 11: gen0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN link/ether c2:85:27:b6:b4:15 brd ff:ff:ff:ff:ff:ff promiscuity 0 geneve id 200 remote 192.168.13.1 dstport 6081 noudpcsum Similarly, creating a link with 'noudpcsum' set results in a creation of link for which UDP checksum will be computed on outbound packets. Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30vxlan: do not output confusing error messageJiri Benc
The message "Cannot bind port X, err=Y" creates only confusion. In metadata based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and no error message should be printed. But when IPv6 tunnel was requested, such failure is fatal. The vxlan_socket_create does not know when the error is harmless and when it's not. Instead of passing such information down to vxlan_socket_create, remove the message completely. It's not useful. We propagate the error code up to the user space and the port number comes from the user space. There's nothing in the message that the process creating vxlan interface does not know. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30vxlan: correctly handle ipv6.disable module parameterJiri Benc
When IPv6 is compiled but disabled at runtime, __vxlan_sock_add returns -EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole operation of bringing up the tunnel. Ignore failure of IPv6 socket creation for metadata based tunnels caused by IPv6 not being available. Fixes: b1be00a6c39f ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30bnx2x: Get rid of useless temporary variableAndy Shevchenko
Replace pattern int status; ... status = func(...); return status; by return func(...); No functional change intented. Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30bnx2x: Reuse bnx2x_null_format_ver()Andy Shevchenko
Reuse bnx2x_null_format_ver() in functions where it's appropriated instead of open coded variant. Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30bnx2x: Replace custom scnprintf()Andy Shevchenko
Use scnprintf() when printing version instead of custom open coded variants. Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30net: macb: fix phy interrupt parsingAlexandre Belloni
Since 83a77e9ec415, the phydev irq is explicitly set to PHY_POLL when there is no pdata. It doesn't work on DT enabled platforms because the phydev irq is already set by libphy before. Fixes: 83a77e9ec415 ("net: macb: Added PCI wrapper for Platform Driver.") Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2017-04-30 This series contains updates to i40e and i40evf only. Jake provides majority of the changes in this series, starting with the renaming of a flag to avoid confusion. Then renamed a variable to a more meaningful name to clarify what is actually being done and to reduce confusion. Amortizes the wait time when initializing or disabling lots of VFs by using i40e_reset_all_vfs() and i40e_vsi_stop_rings_no_wait(). Cleaned up a unnecessary delay since pci_disable_sriov() already has its own delay, so need to add a additional delay when removing VFs. Avoid using the same name flags for both vsi->state and pf->state, to make code review easier and assist future work to use the correct state field when checking bits. Use DECLARE_BITMAP() to ensure that we always allocate enough space for flags. Replace hw_disabled_flags with the new _AUTO_DISABLED flags, which are more readable because we are not setting an *_ENABLED flag to disable the feature. Alex corrects a oversight where we were not reprogramming the ports after a reset, which was causing us to lose all of the receive tunnel offloads. Arnd Bergmann moves the declaration of a local variable to avoid a warning seen on architectures with larger pages about an unused variable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30net/mlx5: E-Switch, Avoid redundant memory allocationEli Cohen
struct esw_mc_addr is a small struct that can be part of struct mlx5_eswitch. Define it as a field and not as a pointer and save the kzalloc call and then error flow handling. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Disable HW LRO when PCI is slower than link on striding RQEran Ben Elisha
We will activate the HW LRO only on servers with PCI BW > MAX LINK BW, or when PCI BW > 16Gbps. On other cases we do not want LRO by default as LRO sessions might get timeout and add redundant software overhead. Tested: ethtool -k <ifs-name> | grep large-receive-offload On systems with and without the limitations. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Use u8 as ownership type in mlx5e_get_cqe()Tariq Toukan
CQE ownership indication is as small as a single bit. Use u8 to speedup the comparison. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Use prefetchw when a write is to followTariq Toukan
"prefetchw()" prefetches the cacheline for write. Use it for skb->data, as soon we'll be copying the packet header there. Performance: Single-stream packet-rate tested with pktgen. Packets are dropped in tc level to zoom into driver data-path. Larger gain is expected for smaller packets, as less time is spent on handling SKB fragments, making the path shorter and the improvement more significant. --------------------------------------------- packet size | before | after | gain | 64B | 4,113,306 | 4,778,720 | 16% | 1024B | 3,633,819 | 3,950,593 | 8.7% | Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Optimize poll ICOSQ completion queueTariq Toukan
UMR operations are more frequent and important. Check them first, and add a compiler branch predictor hint. According to current design, ICOSQ CQ can contain at most one pending CQE per napi. Poll function is optimized accordingly. Performance: Single-stream packet-rate tested with pktgen. Packets are dropped in tc level to zoom into driver data-path. Larger gain is expected for larger packet sizes, as BW is higher and UMR posts are more frequent. --------------------------------------------- packet size | before | after | gain | 64B | 4,092,370 | 4,113,306 | 0.5% | 1024B | 3,421,435 | 3,633,819 | 6.2% | Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Act on delay probe time updatesHadar Hen Zion
The user can change delay_first_probe_time parameter through sysctl. Listen to NETEVENT_DELAY_PROBE_TIME_UPDATE notifications and update the intervals for updating the neighbours 'used' value periodic task and for flow HW counters query periodic task. Both of the intervals will be update only in case the new delay prob time value is lower the current interval. Since the driver saves only one min interval value and not per device, the users will be able to set lower interval value for updating neighbour 'used' value periodic task but they won't be able to schedule a higher interval for this periodic task. The used interval for scheduling neighbour 'used' value periodic task is the minimal delay prob time parameter ever seen by the driver. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Update neighbour 'used' state using HW flow rules countersHadar Hen Zion
When IP tunnel encapsulation rules are offloaded, the kernel can't see the traffic of the offloaded flow. The neighbour for the IP tunnel destination of the offloaded flow can mistakenly become STALE and deleted by the kernel since its 'used' value wasn't changed. To make sure that a neighbour which is used by the HW won't become STALE, we proactively update the neighbour 'used' value every DELAY_PROBE_TIME period, when packets were matched and counted by the HW for one of the tunnel encap flows related to this neighbour. The periodic task that updates the used neighbours is scheduled when a tunnel encap rule is successfully offloaded into HW and keeps re-scheduling itself as long as the representor's neighbours list isn't empty. Add, remove, lookup and status change operations done over the representor's neighbours list or the neighbour hash entry encaps list are all serialized by RTNL lock. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Add support to neighbour update flowHadar Hen Zion
In order to offload TC encap rules, the driver does a lookup for the IP tunnel neighbour according to the output device and the destination IP given by the user. To keep tracking after the validity state of such neighbours, we keep the neighbours information (pair of device pointer and destination IP) in a hash table maintained at the relevant egress representor and register to get NETEVENT_NEIGH_UPDATE events. When getting neighbour update netevent, we search for a match among the cached neighbours entries used for encapsulation. In case the neighbour isn't valid, we can't offload the flow into the HW. We cache the flow (requested matching and actions) in the driver and offload the rule later, when the neighbour is resolved and becomes valid. When a flow is only cached in the driver and not offloaded into HW yet, we use EAGAIN return value to mark it internally, the TC ndo still returns success. Listen to kernel neighbour update netevents to trace relevant neighbours validity state: 1. If a neighbour becomes valid, offload the related rules to HW. 2. If the neighbour becomes invalid, remove the related rules from HW. 3. If the neighbour mac address was changed, update the encap header. Remove all the offloaded rules using the old encap header from the HW and insert new rules to HW with updated encap header. Access to the neighbors hash table is protected by RTNL lock of its caller or by the table's spinlock. Details of the locking/synchronization among the different actions applied on the neighbour table: Add/remove operations - protected by RTNL lock of its caller (all TC commands are protected by RTNL lock). Add and remove operations are initiated only when the user inserts/removes a TC rule into/from the driver. Lookup/remove operations - since the lookup operation is done from netevent notifier block, RTNL lock can't be used (atomic context). Use the table's spin lock to protect lookups from TC user removal operation. bh is used since netevent can be called from a softirq context. Lookup/add operations - The hash table access functions are taking care of the protection between lookup and add operations. When adding/removing encap headers and rules to/from the HW, RTNL lock is used. It can happen when: 1. The user inserts/removes a TC rule into/from the driver (TC commands are protected by RTNL lock of it's caller). 2. The driver gets neighbour notification event, which reports about neighbour validity status change. Before adding/removing encap headers and rules to/from the HW, RTNL lock is taken. A neighbour hash table entry should be freed when its encap list is empty. Since The neighbour update netevent notification schedules a neighbour update work that uses the neighbour hash entry, it can't be freed unconditionally when the encap list becomes empty during TC delete rule flow. Use reference count to protect from freeing neighbour hash table entry while it's still in use. When the user asks to unregister a netdvice used by one of the neigbours, neighbour removal notification is received. Then we take a reference on the neighbour and don't free it until the relevant encap entries (and flows) are marked as invalid (not offloaded) and removed from HW. As long as the encap entry is still valid (checked under RTNL lock) we can safely access the neighbour device saved on mlx5e_neigh struct. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Add neighbour hash table to the representorsHadar Hen Zion
Add hash table to the representors which is to be used by the next patch to save neighbours information in the driver. In order to offload IP tunnel encapsulation rules, the driver must find the tunnel dst neighbour according to the output device and the destination address given by the user. The next patch will cache the neighbors information in the driver to allow support in neigh update flow for tunnel encap rules. The neighbour entries are also saved in a list so we easily iterate over them when querying statistics in order to provide 'used' feedback to the kernel neighbour NUD core. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Read neigh parameters with proper lockingHadar Hen Zion
The nud_state and hardware address fields are protected by the neighbour lock, we should acquire it before accessing those parameters. Use this lock to avoid inconsistency between the neighbour validity state and it's hardware address. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Use flag to properly monitor a flow rule offloading stateHadar Hen Zion
Instead of relaying on the 'flow->rule' pointer value which can be valid or invalid (in case the FW returns an error while trying to offload the rule), monitor the rule state using a flag. In downstream patch which adds support to IP tunneling neigh update flow, a TC rule could be cached in the driver and not offloaded into the HW. In this case, the flow handle pointer stays NULL. Check the offloaded flag to properly deal with rules which are currently not offloaded when querying rule statistics. This patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Remove output device parameter from create encap header helpers ↵Hadar Hen Zion
definition Passing output device parameter to the helper functions that deal with creation of encapsulation headers is redundant. Output device parameter can be defined inside those helpers, no need to pass it. Refactor the code by removing the parameter from the function signature. This patch doesn't change any functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Move the encap entry structure from the eswitch headerOr Gerlitz
The encap entry structure isn't manipulated by the eswitch code, hence it can/needs to be removed from the eswitch header. Do that, and change it to have mlx5e_ prefix. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5: Remove encap entry pointer from the eswitch flow attributesOr Gerlitz
Encap wise, the tc eswitch flow attribute struct needs to have only the encap ID which is programmed later to the HW and none of the higher level encap params, fix that. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Extendable vport representor netdev private dataSaeed Mahameed
Make representor netdev private data extendable by adding new struct "mlx5e_rep_priv" and use it as the rep netdev private data struct instead of directly pointing to mlx5_eswitch_rep. Added new en_rep.h header file to contain all representor related definitions and prototypes, and moved all representor specific logic into en_rep.c. Needed for downstream patches to extend representor functionality to support neighbour update. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2017-04-30e1000e: Add Support for 38.4MHZ frequencySasha Neftin
Add support for 38.4MHz frequency is required for PTP on CannonLake. SYSTIM frequency adjustment attributes for TIMINCA are get/set dependent on the hardware clock frequency for a different types of adapters. 38.4MHz frequency supported by CannonLake and active once time synchronisation mechanism was enabled Changed abbreviation from Hz to HZ to be compliant checkpatch code style Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Raanan Avargil <raanan.avargil@intel.com> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-30e1000e: Add Support for CannonLakeSasha Neftin
The propagation of CannonLake mac type to driver functionality Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Raanan Avargil <raanan.avargil@intel.com> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>