summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)Author
2017-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix multiqueue in stmmac driver on PCI, from Andy Shevchenko. 2) cdc_ncm doesn't actually fully zero out the padding area is allocates on TX, from Jim Baxter. 3) Don't leak map addresses in BPF verifier, from Daniel Borkmann. 4) If we randomize TCP timestamps, we have to do it everywhere including SYN cookies. From Eric Dumazet. 5) Fix "ethtool -S" crash in aquantia driver, from Pavel Belous. 6) Fix allocation size for ntp filter bitmap in bnxt_en driver, from Dan Carpenter. 7) Add missing memory allocation return value check to DSA loop driver, from Christophe Jaillet. 8) Fix XDP leak on driver unload in qed driver, from Suddarsana Reddy Kalluru. 9) Don't inherit MC list from parent inet connection sockets, another syzkaller spotted gem. Fix from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits) dccp/tcp: do not inherit mc_list from parent qede: Split PF/VF ndos. qed: Correct doorbell configuration for !4Kb pages qed: Tell QM the number of tasks qed: Fix VF removal sequence qede: Fix XDP memory leak on unload net/mlx4_core: Reduce harmless SRIOV error message to debug level net/mlx4_en: Avoid adding steering rules with invalid ring net/mlx4_en: Change the error print to debug print drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison DECnet: Use container_of() for embedded struct Revert "ipv4: restore rt->fi for reference counting" net: mdio-mux: bcm-iproc: call mdiobus_free() in error path net: ethernet: ti: cpsw: adjust cpsw fifos depth for fullduplex flow control ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf net: cdc_ncm: Fix TX zero padding stmmac: pci: split out common_default_data() helper stmmac: pci: RX queue routing configuration stmmac: pci: TX and RX queue priority configuration stmmac: pci: set default number of rx and tx queues ...
2017-05-09net/mlx4_core: Reduce harmless SRIOV error message to debug levelJack Morgenstein
Under SRIOV resource management, extra counters are allocated to VFs from a free pool. If that pool is empty, the ALLOC_RES command for a counter resource fails -- and this generates a misleading error message in the message log. Under SRIOV, each VF is allocated (i.e., guaranteed) 2 counters -- one counter per port. For ETH ports, the RoCE driver requests an additional counter (above the guaranteed counters). If that request fails, the VF RoCE driver simply uses the default (i.e., guaranteed) counter for that port. Thus, failing to allocate an additional counter does not constitute a problem, and the error message on the PF when this occurs should be reduced to debug level. Finally, to identify the situation that the reason for the failure is that no resources are available to grant to the VF, we modified the error returned by mlx4_grant_resource to -EDQUOT (Quota exceeded), which more accurately describes the error. Fixes: c3abb51bdb0e ("IB/mlx4: Add RoCE/IB dedicated counters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09net/mlx4_en: Avoid adding steering rules with invalid ringTalat Batheesh
Inserting steering rules with illegal ring is an invalid operation, block it. Fixes: 820672812f82 ('net/mlx4_en: Manage flow steering rules with ethtool') Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09net/mlx4_en: Change the error print to debug printKamal Heib
The error print within mlx4_en_calc_rx_buf() should be a debug print. Fixes: 51151a16a60f ('mlx4: allow order-0 memory allocations in RX path') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08Merge tags 'for-linus' and 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "As mentioned in my first pull request, this is the subsequent pull requests I had. This is all I have, and in fact this cleans out the RDMA subsystem's entire patchworks queue of kernel changes that are ready to go (well, it did for the weekend anyway, a few new patches are in, but they'll be coming during the -rc cycle). The first tag contains a single patch that would have conflicted if taken from my tree or DaveM's tree as it needed our trees merged to come cleanly. The second tag contains the patch series from Intel plus three other stragllers that came in late last week. I took them because it allowed me to legitimately claim that the RDMA patchworks queue was, for a short time, 100% cleared of all waiting kernel patches, woohoo! :-). I have it under my for-next tag, so it did get 0day and linux- next over the end of last week, and linux-next did show one minor conflict. Summary: 'for-linus' tag: - mlx5/IPoIB fixup patch 'for-next' tag: - the hfi1 15 patch set that landed late - IPoIB get_link_ksettings which landed late because I asked for a respin - one late rxe change - one -rc worthy fix that's in early" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/mlx5: Enable IPoIB acceleration * tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: rxe: expose num_possible_cpus() cnum_comp_vectors IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type IB/hfi1: Clean up on context initialization failure IB/hfi1: Fix an assign/ordering issue with shared context IDs IB/hfi1: Clean up context initialization IB/hfi1: Correctly clear the pkey IB/hfi1: Search shared contexts on the opened device, not all devices IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit IB/hfi1: Use filedata rather than filepointer IB/hfi1: Name function prototype parameters IB/hfi1: Fix a subcontext memory leak IB/hfi1: Return an error on memory allocation failure IB/hfi1: Adjust default eager_buffer_size to 8MB IB/hfi1: Get rid of divide when setting the tx request header IB/hfi1: Fix yield logic in send engine IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line IB/hfi1: Fix checks for Offline transient state IB/ipoib: add get_link_ksettings in ethtool
2017-05-08treewide: use kv[mz]alloc* rather than opencoded variantsMichal Hocko
There are many code paths opencoding kvmalloc. Let's use the helper instead. The main difference to kvmalloc is that those users are usually not considering all the aspects of the memory allocator. E.g. allocation requests <= 32kB (with 4kB pages) are basically never failing and invoke OOM killer to satisfy the allocation. This sounds too disruptive for something that has a reasonable fallback - the vmalloc. On the other hand those requests might fallback to vmalloc even when the memory allocator would succeed after several more reclaim/compaction attempts previously. There is no guarantee something like that happens though. This patch converts many of those places to kv[mz]alloc* helpers because they are more conservative. Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390 Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim Acked-by: David Sterba <dsterba@suse.com> # btrfs Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4 Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5 Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Santosh Raspatur <santosh@chelsio.com> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Yishai Hadas <yishaih@mellanox.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: "Yan, Zheng" <zyan@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-04IB/mlx5: Enable IPoIB accelerationErez Shitrit
Enable mlx5 IPoIB acceleration by declaring mlx5_ib_{alloc,free}_rdma_netdev and assigning the mlx5 IPoIB rdma_netdev callbacks. In addition, this patch brings in sync mlx5's IPoIB parts for net and IB trees. As a precaution, we disabled IPoIB acceleration by default (in the mlx5_core Kconfig file). Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01mlxsw: spectrum_router: Simplify VRF enslavementIdo Schimmel
When a netdev is enslaved to a VRF master, its router interface (RIF) needs to be destroyed (if exists) and a new one created using the corresponding virtual router (VR). >From the driver's perspective, the above is equivalent to an inetaddr event sent for this netdev. Therefore, when a port netdev (or its uppers) are enslaved to a VRF master, call the same function that would've been called had a NETDEV_UP was sent for this netdev in the inetaddr notification chain. This patch also fixes a bug when a LAG netdev with an existing RIF is enslaved to a VRF. Before this patch, each LAG port would drop the reference on the RIF, but would re-join the same one (in the wrong VR) soon after. With this patch, the corresponding RIF is first destroyed and a new one is created using the correct VR. Fixes: 7179eb5acd59 ("mlxsw: spectrum_router: Add support for VRFs") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30net/mlx5: E-Switch, Avoid redundant memory allocationEli Cohen
struct esw_mc_addr is a small struct that can be part of struct mlx5_eswitch. Define it as a field and not as a pointer and save the kzalloc call and then error flow handling. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Disable HW LRO when PCI is slower than link on striding RQEran Ben Elisha
We will activate the HW LRO only on servers with PCI BW > MAX LINK BW, or when PCI BW > 16Gbps. On other cases we do not want LRO by default as LRO sessions might get timeout and add redundant software overhead. Tested: ethtool -k <ifs-name> | grep large-receive-offload On systems with and without the limitations. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Use u8 as ownership type in mlx5e_get_cqe()Tariq Toukan
CQE ownership indication is as small as a single bit. Use u8 to speedup the comparison. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Use prefetchw when a write is to followTariq Toukan
"prefetchw()" prefetches the cacheline for write. Use it for skb->data, as soon we'll be copying the packet header there. Performance: Single-stream packet-rate tested with pktgen. Packets are dropped in tc level to zoom into driver data-path. Larger gain is expected for smaller packets, as less time is spent on handling SKB fragments, making the path shorter and the improvement more significant. --------------------------------------------- packet size | before | after | gain | 64B | 4,113,306 | 4,778,720 | 16% | 1024B | 3,633,819 | 3,950,593 | 8.7% | Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Optimize poll ICOSQ completion queueTariq Toukan
UMR operations are more frequent and important. Check them first, and add a compiler branch predictor hint. According to current design, ICOSQ CQ can contain at most one pending CQE per napi. Poll function is optimized accordingly. Performance: Single-stream packet-rate tested with pktgen. Packets are dropped in tc level to zoom into driver data-path. Larger gain is expected for larger packet sizes, as BW is higher and UMR posts are more frequent. --------------------------------------------- packet size | before | after | gain | 64B | 4,092,370 | 4,113,306 | 0.5% | 1024B | 3,421,435 | 3,633,819 | 6.2% | Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Act on delay probe time updatesHadar Hen Zion
The user can change delay_first_probe_time parameter through sysctl. Listen to NETEVENT_DELAY_PROBE_TIME_UPDATE notifications and update the intervals for updating the neighbours 'used' value periodic task and for flow HW counters query periodic task. Both of the intervals will be update only in case the new delay prob time value is lower the current interval. Since the driver saves only one min interval value and not per device, the users will be able to set lower interval value for updating neighbour 'used' value periodic task but they won't be able to schedule a higher interval for this periodic task. The used interval for scheduling neighbour 'used' value periodic task is the minimal delay prob time parameter ever seen by the driver. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Update neighbour 'used' state using HW flow rules countersHadar Hen Zion
When IP tunnel encapsulation rules are offloaded, the kernel can't see the traffic of the offloaded flow. The neighbour for the IP tunnel destination of the offloaded flow can mistakenly become STALE and deleted by the kernel since its 'used' value wasn't changed. To make sure that a neighbour which is used by the HW won't become STALE, we proactively update the neighbour 'used' value every DELAY_PROBE_TIME period, when packets were matched and counted by the HW for one of the tunnel encap flows related to this neighbour. The periodic task that updates the used neighbours is scheduled when a tunnel encap rule is successfully offloaded into HW and keeps re-scheduling itself as long as the representor's neighbours list isn't empty. Add, remove, lookup and status change operations done over the representor's neighbours list or the neighbour hash entry encaps list are all serialized by RTNL lock. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Add support to neighbour update flowHadar Hen Zion
In order to offload TC encap rules, the driver does a lookup for the IP tunnel neighbour according to the output device and the destination IP given by the user. To keep tracking after the validity state of such neighbours, we keep the neighbours information (pair of device pointer and destination IP) in a hash table maintained at the relevant egress representor and register to get NETEVENT_NEIGH_UPDATE events. When getting neighbour update netevent, we search for a match among the cached neighbours entries used for encapsulation. In case the neighbour isn't valid, we can't offload the flow into the HW. We cache the flow (requested matching and actions) in the driver and offload the rule later, when the neighbour is resolved and becomes valid. When a flow is only cached in the driver and not offloaded into HW yet, we use EAGAIN return value to mark it internally, the TC ndo still returns success. Listen to kernel neighbour update netevents to trace relevant neighbours validity state: 1. If a neighbour becomes valid, offload the related rules to HW. 2. If the neighbour becomes invalid, remove the related rules from HW. 3. If the neighbour mac address was changed, update the encap header. Remove all the offloaded rules using the old encap header from the HW and insert new rules to HW with updated encap header. Access to the neighbors hash table is protected by RTNL lock of its caller or by the table's spinlock. Details of the locking/synchronization among the different actions applied on the neighbour table: Add/remove operations - protected by RTNL lock of its caller (all TC commands are protected by RTNL lock). Add and remove operations are initiated only when the user inserts/removes a TC rule into/from the driver. Lookup/remove operations - since the lookup operation is done from netevent notifier block, RTNL lock can't be used (atomic context). Use the table's spin lock to protect lookups from TC user removal operation. bh is used since netevent can be called from a softirq context. Lookup/add operations - The hash table access functions are taking care of the protection between lookup and add operations. When adding/removing encap headers and rules to/from the HW, RTNL lock is used. It can happen when: 1. The user inserts/removes a TC rule into/from the driver (TC commands are protected by RTNL lock of it's caller). 2. The driver gets neighbour notification event, which reports about neighbour validity status change. Before adding/removing encap headers and rules to/from the HW, RTNL lock is taken. A neighbour hash table entry should be freed when its encap list is empty. Since The neighbour update netevent notification schedules a neighbour update work that uses the neighbour hash entry, it can't be freed unconditionally when the encap list becomes empty during TC delete rule flow. Use reference count to protect from freeing neighbour hash table entry while it's still in use. When the user asks to unregister a netdvice used by one of the neigbours, neighbour removal notification is received. Then we take a reference on the neighbour and don't free it until the relevant encap entries (and flows) are marked as invalid (not offloaded) and removed from HW. As long as the encap entry is still valid (checked under RTNL lock) we can safely access the neighbour device saved on mlx5e_neigh struct. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Add neighbour hash table to the representorsHadar Hen Zion
Add hash table to the representors which is to be used by the next patch to save neighbours information in the driver. In order to offload IP tunnel encapsulation rules, the driver must find the tunnel dst neighbour according to the output device and the destination address given by the user. The next patch will cache the neighbors information in the driver to allow support in neigh update flow for tunnel encap rules. The neighbour entries are also saved in a list so we easily iterate over them when querying statistics in order to provide 'used' feedback to the kernel neighbour NUD core. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Read neigh parameters with proper lockingHadar Hen Zion
The nud_state and hardware address fields are protected by the neighbour lock, we should acquire it before accessing those parameters. Use this lock to avoid inconsistency between the neighbour validity state and it's hardware address. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Use flag to properly monitor a flow rule offloading stateHadar Hen Zion
Instead of relaying on the 'flow->rule' pointer value which can be valid or invalid (in case the FW returns an error while trying to offload the rule), monitor the rule state using a flag. In downstream patch which adds support to IP tunneling neigh update flow, a TC rule could be cached in the driver and not offloaded into the HW. In this case, the flow handle pointer stays NULL. Check the offloaded flag to properly deal with rules which are currently not offloaded when querying rule statistics. This patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Remove output device parameter from create encap header helpers ↵Hadar Hen Zion
definition Passing output device parameter to the helper functions that deal with creation of encapsulation headers is redundant. Output device parameter can be defined inside those helpers, no need to pass it. Refactor the code by removing the parameter from the function signature. This patch doesn't change any functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Move the encap entry structure from the eswitch headerOr Gerlitz
The encap entry structure isn't manipulated by the eswitch code, hence it can/needs to be removed from the eswitch header. Do that, and change it to have mlx5e_ prefix. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5: Remove encap entry pointer from the eswitch flow attributesOr Gerlitz
Encap wise, the tc eswitch flow attribute struct needs to have only the encap ID which is programmed later to the HW and none of the higher level encap params, fix that. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30net/mlx5e: Extendable vport representor netdev private dataSaeed Mahameed
Make representor netdev private data extendable by adding new struct "mlx5e_rep_priv" and use it as the rep netdev private data struct instead of directly pointing to mlx5_eswitch_rep. Added new en_rep.h header file to contain all representor related definitions and prototypes, and moved all representor specific logic into en_rep.c. Needed for downstream patches to extend representor functionality to support neighbour update. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2017-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24Merge tag 'mlx5-fixes-2017-04-22' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2017-04-22 This series contains some mlx5 fixes for net. For your convenience, the series doesn't introduce any conflict with the ongoing net-next pull request. Please pull and let me know if there's any problem. For -stable: ("net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5") kernels >= 4.10 ("net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling") kernels >= 4.8 ("net/mlx5e: Fix small packet threshold") kernels >= 4.7 ("net/mlx5: Fix driver load bad flow when having fw initializing timeout") kernels >= 4.4 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net/mlx5e: Fix race in mlx5e_sw_stats and mlx5e_vport_statsMartin KaFai Lau
We have observed a sudden spike in rx/tx_packets and rx/tx_bytes reported under /proc/net/dev. There is a race in mlx5e_update_stats() and some of the get-stats functions (the one that we hit is the mlx5e_get_stats() which is called by ndo_get_stats64()). In particular, the very first thing mlx5e_update_sw_counters() does is 'memset(s, 0, sizeof(*s))'. For example, if mlx5e_get_stats() is unlucky at one point, rx_bytes and rx_packets could be 0. One second later, a normal (and much bigger than 0) value will be reported. This patch is to use a 'struct mlx5e_sw_stats temp' to avoid a direct memset zero on priv->stats.sw. mlx5e_update_vport_counters() has a similar race. Hence, addressed together. However, memset zero is removed instead because it is not needed. I am lucky enough to catch this 0-reset in rx multicast: eth0: 41457665 76804 70 0 0 70 0 47085 15586634 87502 3 0 0 0 3 0 eth0: 41459860 76815 70 0 0 70 0 47094 15588376 87516 3 0 0 0 3 0 eth0: 41460577 76822 70 0 0 70 0 0 15589083 87521 3 0 0 0 3 0 eth0: 41463293 76838 70 0 0 70 0 47108 15595872 87538 3 0 0 0 3 0 eth0: 41463379 76839 70 0 0 70 0 47116 15596138 87539 3 0 0 0 3 0 v2: Remove memset zero from mlx5e_update_vport_counters() v1: Use temp and memcpy Fixes: 9218b44dcc05 ("net/mlx5e: Statistics handling refactoring") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-22net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handlingIlan Tayari
Handler for ETHTOOL_GRXCLSRLALL must set info->data to the size of the table, regardless of the amount of entries in it. Existing code does not do that, and this breaks all usage of ethtool -N or -n without explicit location, with this error: rmgr: Invalid RX class rules table size: Success Set info->data to the table size. Tested: ethtool -n ens8 ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 loc 55 ethtool -n ens8 ethtool -N ens8 delete 1023 ethtool -N ens8 delete 55 Fixes: f913a72aa008 ("net/mlx5e: Add support to get ethtool flow rules") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22net/mlx5e: Fix small packet thresholdEugenia Emantayev
RX packet headers are meant to be contained in SKB linear part, and chose a threshold of 128. It turns out this is not enough, i.e. for IPv6 packet over VxLAN. In this case, UDP/IPv4 needs 42 bytes, GENEVE header is 8 bytes, and 86 bytes for TCP/IPv6. In total 136 bytes that is more than current 128 bytes. In this case expand header flow is reached. The warning in skb_try_coalesce() caused by a wrong truesize was already fixed here: commit 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()"). Still, we prefer to totally avoid the expand header flow for performance reasons. Tested regular TCP_STREAM with iperf for 1 and 8 streams, no degradation was found. Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)") Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22net/mlx5: Fix UAR memory leakMaor Gottlieb
When UAR is released, we deallocate the device resource, but don't unmmap the UAR mapping memory. Fix the leak by unmapping this memory. Fixes: a6d51b68611e9 ('net/mlx5: Introduce blue flame register allocator) Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnelsOr Gerlitz
Otherwise the code that fills the ipv6 encapsulation headers could be writing beyond the allocated headers buffer. Fixes: ce99f6b97fcd ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnelsOr Gerlitz
Otherwise the code that fills the ipv4 encapsulation headers could be writing beyond the allocated headers buffer. Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5Or Gerlitz
On ConnectX5 the wqe inline mode is "none" and hence the FW reports MLX5_CAP_INLINE_MODE_NOT_REQUIRED. Fix our devlink callbacks to deal with that on get and set. Also fix the tc flow parsing code not to fail anything when inline isn't required. Fixes: bffaa916588e ('net/mlx5: E-Switch, Add control for inline mode') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22net/mlx5: Fix driver load bad flow when having fw initializing timeoutMohamad Haj Yahia
If FW is stuck in initializing state we will skip the driver load, but current error handling flow doesn't clean previously allocated command interface resources. Fixes: e3297246c2c8 ('net/mlx5_core: Wait for FW readiness on startup') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22mlx5: fix warning about missing prototypeStephen Hemminger
Fix sparse warning about missing prototypes. The rx/tx code path defines functions with prototypes in ipoib.h. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22mlx5: hide unused functionsStephen Hemminger
Fix sparse warnings in recent ipoib support. The RDMA functions are not used yet, hide behind #ifdef. Based on comment, they will eventually be local so make static. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22net/mlx5: E-Switch, Add control for encapsulationRoi Dayan
Implement the devlink e-switch encapsulation control set and get callbacks. Apply the value set by the user on the switchdev offloads mode when creating the fast FDB table where offloaded rules will be set. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev modeOr Gerlitz
Refactor the creation of the fast path FDB table that holds the offloaded rules in SRIOV switchdev mode into it's own function. This will be used in the next patch to be able and re-create the table under different settings without going through legacy mode. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-20net/mlx5e: IPoIB, Fix error handling in mlx5_rdma_netdev_alloc()Dan Carpenter
The labels were out of order, so it either could result in an Oops or a leak. Fixes: 48935bbb7ae8 ("net/mlx5e: IPoIB, Add netdevice profile skeleton") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20mlxsw: spectrum: Add FID miss trapJiri Pirko
When there is no FID set for a specific packet, the HW will drop it. However, by default these packets are useful to be delivered to CPU as it can inspect them and program HW accordingly. So add this trap. This would only ever happen when port is enslaved to an OVS master. Otherwise, packets would be dropped during VLAN / STP filtering, before FID classification. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20mlxsw: spectrum: Allow ports to work under OVS masterJiri Pirko
>From now on, a port can become a slave of OVS master. All vlans are enabled, STP state is set to "forwarding". It is up to the OVS userspace daemon to setup the flows either in kernel or in HW using TC flower offload. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20mlxsw: spectrum: Teach mlxsw_sp_port_vlan_set to accept any vlan rangeJiri Pirko
So far, mlxsw_sp_port_vlan_set range is limited by MLXSW_REG_SPVM_REC_MAX_COUNT. In spectrum_switchdev code this is wrapped up by a helper function which actually does multiple calls to FW for bigger ranges. Move the code into mlxsw_sp_port_vlan_set and use it always. That allows caller not to care about the range. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20mlxsw: spectrum_flower: Set dummy FID before forward actionJiri Pirko
HW requires the FID to be valid in order for the forward action to work. So regardless of the current FID validity, just set the dummy FID which would do the trick. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20mlxsw: spectrum: Add dummy FID initializationJiri Pirko
For forwarding using ACL action, HW needs a valid FID to be setup. It does not actually use it, so it can be any valid FID. So create a dummy FID only for this purpose. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20mlxsw: spectrum: Implement action to set FIDJiri Pirko
Implement part of multipurpose Virtual Router and Forwarding Domain Action that takes care of setting up FID. We need to use it to be able to forward packets using ACL action when no FID is associated on RX. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20mlxsw: spectrum: Fix indent in mlxsw_sp_netdevice_port_upper_eventJiri Pirko
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20net/mlx4: suppress 'may be used uninitialized' warningGreg Thelen
gcc 4.8.4 complains that mlx4_SW2HW_MPT_wrapper() uses an uninitialized 'mpt' variable: drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_SW2HW_MPT_wrapper': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2802:12: warning: 'mpt' may be used uninitialized in this function [-Wmaybe-uninitialized] mpt->mtt = mtt; I think this warning is a false complaint. mpt is only used when mr_res_start_move_to() return zero, and in all such cases it initializes mpt. But apparently gcc cannot see that. Initialize mpt to avoid the warning. Signed-off-by: Greg Thelen <gthelen@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net/mlx5e: E-switch vport manager is valid for ethernet onlySaeed Mahameed
Currently the driver support only ethernet eswitch, and we want to protect downstream IPoIB netdev from trying to access it in IB link. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net/mlx5e: IPoIB, RX handlerSaeed Mahameed
Implement IPoIB RX SKB handler. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net/mlx5e: RX handlers per netdev profileSaeed Mahameed
In order to have different RX handler per profile, fix and refactor the current code to take the rx handler directly from the netdevice profile rather than computing it on runtime as it was done with the switchdev mode representor rx handler. This will also remove the current wrong assumption in mlx5e_alloc_rq code that mlx5e_priv->ppriv is of the type vport_rep. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net/mlx5e: IPoIB, Xmit flowSaeed Mahameed
Implement mlx5e's IPoIB SKB transmit using the helper functions provided by mlx5e ethernet tx flow, the only difference in the code between mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill (UD datagram segment) in the TX descriptor (WQE) and it doesn't need to have any vlan handling. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>