summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx5/core/Makefile
AgeCommit message (Collapse)Author
2019-11-01net/mlx5: DR, Replace CRC32 implementation to use kernel libHamdan Igbaria
Use kernel function to calculate crc32 Instead of dr implementation since it has the same algorithm "slice by 8". Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities") Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-03net/mlx5: Add direct rule fs_cmd implementationMaor Gottlieb
Add support to create flow steering objects via direct rule API (SW steering). New layer is added - fs_dr, this layer translates the command that fs_core sends to the FW into direct rule API. In case that direct rule is not supported in some feature then -EOPNOTSUPP is returned. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-03net/mlx5: DR, Add CONFIG_MLX5_SW_STEERING for software steering supportAlex Vesker
Add new mlx5 Kconfig flag to allow selecting software steering support and compile all the steering files only if the flag is selected. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-02Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Merge mlx5-next patches needed for upcoming mlx5 software steering. 1) Alex adds HW bits and definitions required for SW steering 2) Ariel moves device memory management to mlx5_core (From mlx5_ib) 3) Maor, Cleanups and fixups for eswitch mode and RoCE 4) Mark, Set only stag for match untagged packets Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-01net/mlx5: Move device memory management to mlx5_coreAriel Levkovich
Move the device memory allocation and deallocation commands SW ICM memory to mlx5_core to expose this API for all mlx5_core users. This comes as preparation for supporting SW steering in kernel where it will be required to allocate and register device memory for direct rule insertion. In addition, an API to register this device memory for future remote access operations is introduced using the create_mkey commands. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-22net/mlx5e: Add mlx5e HV VHCA stats agentEran Ben Elisha
HV VHCA stats agent is responsible on running a preiodic rx/tx packets/bytes stats update. Currently the supported format is version MLX5_HV_VHCA_STATS_VERSION. Block ID 1 is dedicated for statistics data transfer from the VF to the PF. The reporter fetch the statistics data from all opened channels, fill it in a buffer and send it to mlx5_hv_vhca_write_agent. As the stats layer should include some metadata per block (sequence and offset), the HV VHCA layer shall modify the buffer before actually send it over block 1. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-22net/mlx5: Add HV VHCA infrastructureEran Ben Elisha
HV VHCA is a layer which provides PF to VF communication channel based on HyperV PCI config channel. It implements Mellanox's Inter VHCA control communication protocol. The protocol contains control block in order to pass messages between the PF and VF drivers, and data blocks in order to pass actual data. The infrastructure is agent based. Each agent will be responsible of contiguous buffer blocks in the VHCA config space. This infrastructure will bind agents to their blocks, and those agents can only access read/write the buffer blocks assigned to them. Each agent will provide three callbacks (control, invalidate, cleanup). Control will be invoked when block-0 is invalidated with a command that concerns this agent. Invalidate callback will be invoked if one of the blocks assigned to this agent was invalidated. Cleanup will be invoked before the agent is being freed in order to clean all of its open resources or deferred works. Block-0 serves as the control block. All execution commands from the PF will be written by the PF over this block. VF will ack on those by writing on block-0 as well. Its format is described by struct mlx5_hv_vhca_control_block layout. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-22net/mlx5: Add wrappers for HyperV PCIe operationsEran Ben Elisha
Add wrapper functions for HyperV PCIe read / write / block_invalidate_register operations. This will be used as an infrastructure in the downstream patch for software communication. This will be enabled by default if CONFIG_PCI_HYPERV_INTERFACE is set. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-21net/mlx5e: Add tc flower tracepointsDmytro Linkin
Implemented following tracepoints: 1. Configure flower (mlx5e_configure_flower) 2. Delete flower (mlx5e_delete_flower) 3. Stats flower (mlx5e_stats_flower) Usage example: ># cd /sys/kernel/debug/tracing ># echo mlx5:mlx5e_configure_flower >> set_event ># cat trace ... tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT Added corresponding documentation in Documentation/networking/device-driver/mellanox/mlx5.rst Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Add support to rx reporter diagnoseAya Levin
Add rx reporter, which supports diagnose call-back. Diagnostics output include: information common to all RQs: RQ type, RQ size, RQ stride size, CQ size and CQ stride size. In addition advertise information per RQ and its related icosq and attached CQ. $ devlink health diagnose pci/0000:00:0b.0 reporter rx Common config: RQ: type: 2 stride size: 2048 size: 8 CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 4308 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1032 HW status: 0 channel ix: 1 rqn: 4313 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1036 HW status: 0 channel ix: 2 rqn: 4318 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1040 HW status: 0 channel ix: 3 rqn: 4323 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1044 HW status: 0 $ devlink health diagnose pci/0000:00:0b.0 reporter rx -jp { "Common config": { "RQ": { "type": 2, "stride size": 2048, "size": 8 }, "CQ": { "stride size": 64, "size": 1024 } }, "RQs": [ { "channel ix": 0, "rqn": 4308, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1032, "HW status": 0 } },{ "channel ix": 1, "rqn": 4313, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1036, "HW status": 0 } },{ "channel ix": 2, "rqn": 4318, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1040, "HW status": 0 } },{ "channel ix": 3, "rqn": 4323, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1044, "HW status": 0 } } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Generalize tx reporter's functionalityAya Levin
Prepare for code sharing with rx reporter, which is added in the following patches in the set. Introduce a generic error_ctx for agnostic recovery despatch. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-05net/mlx5e: Add kTLS TX HW offload supportTariq Toukan
Add support for transmit side kernel-TLS acceleration. Offload the crypto encryption to HW. Per TLS connection: - Use a separate TIS to maintain the HW context. - Use a separate encryption key. - Maintain static and progress HW contexts by posting the proper WQEs at creation time, or upon resync. - Use a special DUMP opcode to replay the previous frags and sync the HW context. To make sure the SQ is able to serve an xmit request, increase SQ stop room to cover: - static params WQE, - progress params WQE, and - resync DUMP per frag. Currently supporting TLS 1.2, and key size 128bit. Tested over SimX simulator. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05net/mlx5: Add crypto library to support create/destroy encryption keyTariq Toukan
Encryption key create / destroy is done via CREATE_GENERAL_OBJECT / DESTROY_GENERAL_OBJECT commands. To be used in downstream patches by TLS API wrappers, to configure the TIS context with the encryption key. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05net/mlx5: Kconfig, Better organize compilation flagsTariq Toukan
Always contain all acceleration functions declarations in 'accel' files, independent to the flags setting. For this, introduce new flags CONFIG_FPGA_{IPSEC/TLS} and use stubs where needed. This obsoletes the need for stubs in 'fpga' files. Remove them. Also use the new flags in Makefile, to decide whether to compile TLS-specific or IPSEC-specific objects, or not. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-07-03 The following pull-request contains BPF updates for your *net-next* tree. There is a minor merge conflict in mlx5 due to 8960b38932be ("linux/dim: Rename externally used net_dim members") which has been pulled into your tree in the meantime, but resolution seems not that bad ... getting current bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed just so he's aware of the resolution below: ** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c: <<<<<<< HEAD static int mlx5e_open_cq(struct mlx5e_channel *c, struct dim_cq_moder moder, struct mlx5e_cq_param *param, struct mlx5e_cq *cq) ======= int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder, struct mlx5e_cq_param *param, struct mlx5e_cq *cq) >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497 Resolution is to take the second chunk and rename net_dim_cq_moder into dim_cq_moder. Also the signature for mlx5e_open_cq() in ... drivers/net/ethernet/mellanox/mlx5/core/en.h +977 ... and in mlx5e_open_xsk() ... drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64 ... needs the same rename from net_dim_cq_moder into dim_cq_moder. ** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c: <<<<<<< HEAD int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix)); struct dim_cq_moder icocq_moder = {0, 0}; struct net_device *netdev = priv->netdev; struct mlx5e_channel *c; unsigned int irq; ======= struct net_dim_cq_moder icocq_moder = {0, 0}; >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497 Take the second chunk and rename net_dim_cq_moder into dim_cq_moder as well. Let me know if you run into any issues. Anyway, the main changes are: 1) Long-awaited AF_XDP support for mlx5e driver, from Maxim. 2) Addition of two new per-cgroup BPF hooks for getsockopt and setsockopt along with a new sockopt program type which allows more fine-grained pass/reject settings for containers. Also add a sock_ops callback that can be selectively enabled on a per-socket basis and is executed for every RTT to help tracking TCP statistics, both features from Stanislav. 3) Follow-up fix from loops in precision tracking which was not propagating precision marks and as a result verifier assumed that some branches were not taken and therefore wrongly removed as dead code, from Alexei. 4) Fix BPF cgroup release synchronization race which could lead to a double-free if a leaf's cgroup_bpf object is released and a new BPF program is attached to the one of ancestor cgroups in parallel, from Roman. 5) Support for bulking XDP_TX on veth devices which improves performance in some cases by around 9%, from Toshiaki. 6) Allow for lookups into BPF devmap and improve feedback when calling into bpf_redirect_map() as lookup is now performed right away in the helper itself, from Toke. 7) Add support for fq's Earliest Departure Time to the Host Bandwidth Manager (HBM) sample BPF program, from Lawrence. 8) Various cleanups and minor fixes all over the place from many others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-28Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Misc updates from mlx5-next branch: 1) E-Switch vport metadata support for source vport matching 2) Convert mkey_table to XArray 3) Shared IRQs and to use single IRQ for all async EQs Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-27net/mlx5e: Add XSK zero-copy supportMaxim Mikityanskiy
This commit adds support for AF_XDP zero-copy RX and TX. We create a dedicated XSK RQ inside the channel, it means that two RQs are running simultaneously: one for non-XSK traffic and the other for XSK traffic. The regular and XSK RQs use a single ID namespace split into two halves: the lower half is regular RQs, and the upper half is XSK RQs. When any zero-copy AF_XDP socket is active, changing the number of channels is not allowed, because it would break to mapping between XSK RQ IDs and channels. XSK requires different page allocation and release routines. Such functions as mlx5e_{alloc,free}_rx_mpwqe and mlx5e_{get,put}_rx_frag are generic enough to be used for both regular and XSK RQs, and they use the mlx5e_page_{alloc,release} wrappers around the real allocation functions. Function pointers are not used to avoid losing the performance with retpolines. Wherever it's certain that the regular (non-XSK) page release function should be used, it's called directly. Only the stats that could be meaningful for XSK are exposed to the userspace. Those that don't take part in the XSK flow are not considered. Note that we don't wait for WQEs on the XSK RQ (unlike the regular RQ), because the newer xdpsock sample doesn't provide any Fill Ring entries at the setup stage. We create a dedicated XSK SQ in the channel. This separation has its advantages: 1. When the UMEM is closed, the XSK SQ can also be closed and stop receiving completions. If an existing SQ was used for XSK, it would continue receiving completions for the packets of the closed socket. If a new UMEM was opened at that point, it would start getting completions that don't belong to it. 2. Calculating statistics separately. When the userspace kicks the TX, the driver triggers a hardware interrupt by posting a NOP to a dedicated XSK ICO (internal control operations) SQ, in order to trigger NAPI on the right CPU core. This XSK ICO SQ is protected by a spinlock, as the userspace application may kick the TX from any core. Store the pointers to the UMEMs in the net device private context, independently from the kernel. This way the driver can distinguish between the zero-copy and non-zero-copy UMEMs. The kernel function xdp_get_umem_from_qid does not care about this difference, but the driver is only interested in zero-copy UMEMs, particularly, on the cleanup it determines whether to close the XSK RQ and SQ or not by looking at the presence of the UMEM. Use state_lock to protect the access to this area of UMEM pointers. LRO isn't compatible with XDP, but there may be active UMEMs while XDP is off. If this is the case, don't allow LRO to ensure XDP can be reenabled at any time. The validation of XSK parameters typically happens when XSK queues open. However, when the interface is down or the XDP program isn't set, it's still possible to have active AF_XDP sockets and even to open new, but the XSK queues will be closed. To cover these cases, perform the validation also in these flows: 1. A new UMEM is registered, but the XSK queues aren't going to be created due to missing XDP program or interface being down. 2. MTU changes while there are UMEMs registered. Having this early check prevents mlx5e_open_channels from failing at a later stage, where recovery is impossible and the application has no chance to handle the error, because it got the successful return value for an MTU change or XSK open operation. The performance testing was performed on a machine with the following configuration: - 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz - Mellanox ConnectX-5 Ex with 100 Gbit/s link The results with retpoline disabled, single stream: txonly: 33.3 Mpps (21.5 Mpps with queue and app pinned to the same CPU) rxdrop: 12.2 Mpps l2fwd: 9.4 Mpps The results with retpoline enabled, single stream: txonly: 21.3 Mpps (14.1 Mpps with queue and app pinned to the same CPU) rxdrop: 9.9 Mpps l2fwd: 6.8 Mpps Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-13net/mlx5: Add Crdump supportAlex Vesker
Crdump allows the driver to retrieve a dump of the FW PCI crspace. This is useful in case of catastrophic issues which may require FW reset. The crspace dump can be used for later debug. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Add Vendor Specific Capability access gatewayAlex Vesker
The Vendor Specific Capability (VSC) is used to activate a gateway interfacing with the device. The gateway is used to read or write device configurations, which are organized in different domains (spaces). A configuration access may result in multiple actions, reads, writes. Example usages are accessing the Crspace domain to read the crspace or locking a device semaphore using the Semaphore domain. The configuration access use pci_cfg_access to prevent parallel access to the VSC space by the driver and userspace calls. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Move all devlink related functions calls to devlink.cEran Ben Elisha
Centralize all devlink related callbacks in one file. In the downstream patch, some more functionality will be added, this patch is preparing the driver infrastructure for it. Currently, move devlink un/register functions calls into this file. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Move all IRQ logic to pci_irq.cYuval Avnery
Finalize IRQ separation and expose irq interface. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-31net/mlx5e: Use termination table for VLAN push actionsOz Shlomo
HW does not support push VLAN action in the RX direction (packets arriving from the wire). The FW works around this limitation by haripining the packet. The hairpin workaround applies only when the push VLAN action is specified in a termination table, assuring that there are no actions following the haripin. Instantiate termination table for push VLAN actions. Re-use identical terminating tables for increased HW cache efficiency. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-31net/mlx5e: Geneve, Add support for encap/decap flows offloadYevgeny Kliteynik
Add HW offloading support for flows with Geneve encap/decap. Notes about decap flows with Geneve TLV Options: - Support offloading of 32-bit options data only - At any given time, only one combination of class/type parameters can be offloaded, but the same class/type combination can have many different flows offloaded with different 32-bit option data - Options with value of 0 can't be offloaded Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-31net/mlx5e: Rearrange tc tunnel code in a modular wayYevgeny Kliteynik
Rearrange tc tunnel code so that it would be easy to add future tunnels: - Define tc tunnel object with the fields and callbacks that any tunnel must implement. - Define tc UDP tunnel object for UDP tunnels, such as VXLAN - Move each tunnel code (GRE, VXLAN) to its own separate file - Rewrite tc tunnel implementation in a general way - using only the objects and their callbacks. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-31net/mlx5: Geneve, Manage Geneve TLV optionsYevgeny Kliteynik
Use Geneve TLV Options object to manage the flex parser matching on the 32-bit options data. When the first flow with a certain class/type values is requested to be offloaded, create a FW object with FW command (Geneve TLV Options general object) and start counting the number of flows using this object. During this time, any request with a different class/type values will fail to be offloaded. Once the refcount reaches 0, destroy the TLV options general object, and can now offload a flow with any class/type parameters. Geneve TLV Options object is added to core device. It is currently used to manage Geneve TLV options general object allocation in FW and its reference counting only. In the future it will also be used for managing geneve ports by registering callbacks for ndo_udp_tunnel_add/del. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5e: remove meaningless CFLAGS_tracepoint.oMasahiro Yamada
CFLAGS_tracepoint.o specifies CFLAGS for compiling tracepoint.c but it does not exist under drivers/net/ethernet/mellanox/mlx5/core/. CFLAGS_tracepoint.o is unused. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux This merge commit includes some misc shared code updates from mlx5-next branch needed for net-next. 1) From Aya: Enable general events on all physical link types and restrict general event handling of subtype DELAY_DROP_TIMEOUT in mlx5 rdma driver to ethernet links only as it was intended. 2) From Eli: Introduce low level bits for prio tag mode 3) From Maor: Low level steering updates to support RDMA RX flow steering and enables RoCE loopback traffic when switchdev is enabled. 4) From Vu and Parav: Two small mlx5 core cleanups 5) From Yevgeny add HW definitions of geneve offloads Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-29net/mlx5: Eswitch, enable RoCE loopback trafficMaor Gottlieb
When in switchdev mode, we would like to treat loopback RoCE traffic (on eswitch manager) as RDMA and not as regular Ethernet traffic In order to enable it we add flow steering rule that forward RoCE loopback traffic to the HW RoCE filter (by adding allow rule). In addition we add RoCE address in GID index 0, which will be set in the RoCE loopback packet. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23net/mlx5e: Move parameter calculation functions to en/params.cMaxim Mikityanskiy
This commit moves the parameter calculation functions to a separate file for better modularity and code sharing with future features. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5: Add multipath modeRoi Dayan
In order to offload ecmp-on-host scheme where next-hop routes are used, we will make use of HW LAG. Add accessor function to let upper layers in the driver to realize if the lag acts in multi-path mode. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-22net/mlx5e: Fix GRE key by controlling port tunnel entropy calculationEli Britstein
Flow entropy is calculated on the inner packet headers and used for flow distribution in processing, routing etc. For GRE-type encapsulations the entropy value is placed in the eight LSB of the key field in the GRE header as defined in NVGRE RFC 7637. For UDP based encapsulations the entropy value is placed in the source port of the UDP header. The hardware may support entropy calculation specifically for GRE and for all tunneling protocols. With commit df2ef3bff193 ("net/mlx5e: Add GRE protocol offloading") GRE is offloaded, but the hardware is configured by default to calculate flow entropy so packets transmitted on the wire have a wrong key. To support UDP based tunnels (i.e VXLAN), GRE (i.e. no flow entropy) and NVGRE (i.e. with flow entropy) the hardware behaviour must be controlled by the driver. Ensure port entropy calculation is enabled for offloaded VXLAN tunnels and disable port entropy calculation in the presence of offloaded GRE tunnels by monitoring the presence of entropy enabling tunnels (i.e VXLAN) and entropy disabing tunnels (i.e GRE). Fixes: df2ef3bff193 ("net/mlx5e: Add GRE protocol offloading") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Merge mlx5-next shared branched into net-next, From Bodong Wang: 1) Introduction of ECPF (Embedded CPU Physical Function), and low level bits for mlx5 SmartNic capabilities support. 2) Vport enumeration refactoring that affect mlx5_ib and mlx5_core From Aya Levin, 3) Add support for 50Gbps per lane link modes in the Port Type and Speed register (PTYS) 4) Refactor low level query functions for PTYS register 5) Add support for 50Gbps per lane link modes to mlx5_ib Note: due to a change in API in mlx5/core and a later patch from net-next, a fixup was squashed with this merge commit that replaces FDB_UPLINK_VPORT with MLX5_VPORT_UPLINK which exists only in upstream net-next. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14net/mlx5: Introduce Mellanox SmartNIC and modify page management logicBodong Wang
Mellanox's SmartNIC combines embedded CPU(e.g, ARM) processing power with advanced network offloads to accelerate a multitude of security, networking and storage applications. With the introduction of the SmartNIC, there is a new PCI function called Embedded CPU Physical Function(ECPF). And it's possible for a PF to get its ICM pages from the ECPF PCI function. Driver shall identify if it is running on such a function by reading a bit in the initialization segment. When firmware asks for pages, it would issue a page request event specifying how many pages it requests and for which function. That driver responds with a manage_pages command providing the requested pages along with an indication for which function it is providing these pages. The encoding before this patch was as follows: function_id == 0: pages are requested for the function receiving the EQE. function_id != 0: pages are requested for VF identified by the function_id value A new one bit field in the EQE identifies that pages are requested for the ECPF. The notion of page_supplier can be introduced here and to support that, manage pages and query pages were modified so firmware can distinguish the following cases: 1. Function provides pages for itself 2. PF provides pages for its VF 3. ECPF provides pages to itself 4. ECPF provides pages for another function This distinction is possible through the introduction of the bit "embedded_cpu_function" in query_pages, manage_pages and page request EQE. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-07net/mlx5e: Add tx reporter supportEran Ben Elisha
Add mlx5e tx reporter to devlink health reporters. This reporter will be responsible for diagnosing, reporting and recovering of tx errors. This patch declares the TX reporter operations and creates it using the devlink health API. Currently, this reporter supports reporting and recovering from send error CQE only. In addition, it adds diagnose information for the open SQs. For a local SQ recover (due to driver error report), in case of SQ recover failure, the recover operation will be considered as a failure. For a full tx recover, an attempt to close and open the channels will be done. If this one passed successfully, it will be considered as a successful recover. The SQ recover from error CQE flow is not a new feature in the driver, this patch re-organize the functions and adapt them for the devlink health API. For this purpose, move code from en_main.c to a new file named reporter_tx.c. Diagnose output: $devlink health diagnose pci/0000:00:09.0 reporter tx -j -p { "SQs": [ { "sqn": 138, "HW state": 1, "stopped": false },{ "sqn": 142, "HW state": 1, "stopped": false } ] } $devlink health diagnose pci/0000:00:09.0 reporter tx SQs: sqn: 138 HW state: 1 stopped: false sqn: 142 HW state: 1 stopped: false Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25net: Revert devlink health changes.David S. Miller
This reverts the devlink health changes from 9/17/2019, Jiri wants things to be designed differently and it was agreed that the easiest way to do this is start from the beginning again. Commits reverted: cb5ccfbe73b389470e1dc11061bb185ef4bc9aec 880ee82f0313453ec5a6cb122866ac057263066b c7af343b4e33578b7de91786a3f639c8cfa0d97b ff253fedab961b22117a73ab808fcfa9e6852b50 6f9d56132eb6d2603d4273cfc65bed914ec47acb fcd852c69d776c0f46c8f79e8e431e5cc6ddc7b7 8a66704a13d9713593342e29b4f0c19762f5746b 12bd0dcefe88782ac1c9fff632958dd1b71d27e5 aba25279c10094c5c97d09c3491ca86d00b4ad5e ce019faa70f81555fa17ebc1d5a03651f2e7e15a b8c45a033acc607201588f7665ba84207e5149e0 And the follow-on build fix: o33a0efa4baecd689da9474ce0e8b673eb6931c60 Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18net/mlx5e: Add TX reporter supportEran Ben Elisha
Add mlx5e tx reporter to devlink health reporters. This reporter will be responsible for diagnosing, reporting and recovering of TX errors. This patch declares the TX reporter operations and allocate it using the devlink health API. Currently, this reporter supports reporting and recovering from send error CQE only. In addition, it adds diagnose information for the open SQs. For a local SQ recover (due to driver error report), in case of SQ recover failure, the recover operation will be considered as a failure. For a full TX recover, an attempt to close and open the channels will be done. If this one passed successfully, it will be considered as a successful recover. The SQ recover from error CQE flow is not a new feature in the driver, this patch re-organize the functions and adapt them for the devlink health API. For this purpose, move code from en_main.c to a new file named reporter_tx.c. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15RDMA/mad: Reduce MAD scope to mlx5_ib onlyLeon Romanovsky
Management Datagram Interface (MAD) is applicable only when physical port is Infiniband. It makes MAD command logic to be completely unrelated to eth/core parts of mlx5. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-14net/mlx5: Introduce inter-device communication mechanismAviv Heller
This introduces devcom, a generic mechanism for performing operations on both physical functions of the same Connect-X card. The first user of this API is merged eswitch, which will be introduced in subsequent patches. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11net/mlx5e: Monitor counters commands supportEyal Davidovich
new file monitor_stats.c for the new API. add arm_monitor_counter new command support. add set_monitor_counter new command support. The device can monitor specific counters and provide an event to notify when these counters are changed. The monitoring is done in best effort manner where the minimum notification period is 200 ms, however when the device is loaded, the notification might be delayed. To configure the required counters to be monitored, the SET_MONITOR_COUNTER command shall be used with a list of counters to be monitored. The device firmware can monitor up to HCA_CAP.max_num_of_monitor_counters. The configuration is done based on counter type (such as ppcnt, q counter, etc) and additional param according to the type of counter selected. Upon monitor counter change, the device will generate Monitor_Counter_Change event. The device will not generate new events unless the driver re-arms the monitoring functionality, using the ARM_MONITOR_COUNTER command. Signed-off-by: Eyal Davidovich <eyald@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10net/mlx5e: Move TC tunnel offloading code to separate source fileOz Shlomo
Move tunnel offloading related code to a separate source file for better code maintainability. Code refactoring with no functional change. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-04RDMA/mlx5: Initialize SRQ tables on mlx5_ibLeon Romanovsky
Transfer initialization and cleanup from mlx5_priv struct of mlx5_core_dev to be part of mlx5_ib_dev. This completes removal of SRQ from mlx5_core. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26net/mlx5: Device events, Use async events chainSaeed Mahameed
Move all the generic async events handling into new specific events handling file events.c to keep eq.c file clean from concrete event logic handling. Use new API to register for NOTIFY_ANY to handle generic events and dispatch allowed events to mlx5_core consumers (mlx5_ib and mlx5e) Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-08-13net/mlx5: Reorganize the makefileSaeed Mahameed
Reorganize the Makefile and group files together according to their functionality and importance. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-08-13net/mlx5e: clock.c depends on CONFIG_PTP_1588_CLOCKMoshe Shemesh
lib/clock.c includes clock related functions which require ptp support. Thus compile out lib/clock.c and add the needed function stubs in case kconfig CONFIG_PTP_1588_CLOCK is off. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-08-13net/mlx5e: vxlan.c depends on CONFIG_VXLANSaeed Mahameed
When vxlan is not enabled by kernel, no need to enable it in mlx5. Compile out lib/vxlan.c if CONFIG_VXLAN is not selected. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
2018-08-13net/mlx5e: Add CONFIG_MLX5_EN_ARFS for accelerated flow steering supportSaeed Mahameed
Add new mlx5 Kconfig flag to allow selecting accelerated flow steering support, and compile out en_arfs.c if not selected. Move arfs declarations and definitions to en/fs.h header file. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
2018-08-13net/mlx5e: Add CONFIG_MLX5_EN_RXNFC for ethtool rx nfcSaeed Mahameed
Add new mlx5 Kconfig flag to allow selecting ethtool rx nfc support, and compile out en_fs_ehtool.c if not selected. Add en/fs.h header file to host all steering declarations and definitions. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
2018-07-27net/mlx5e: Vxlan, move vxlan logic to core driverSaeed Mahameed
Move vxlan logic and objects to mlx5 core dirver. Since it going to be used from different mlx5 interfaces. e.g. mlx5e PF NIC netdev and mlx5e E-Switch representors. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2018-07-26net/mlx5e: Move XDP related code into new XDP filesTariq Toukan
Take XDP code out of the general EN header and RX file into new XDP files. Currently, XDP-SQ resides only within an RQ and used from a single flow (XDP_TX) triggered upon RX completions. In a downstream patch, additional type of XDP-SQ instances will be presented and used for the XDP_REDIRECT flow, totally unrelated to the RX context. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23net/mlx5e: Use PARTIAL_GSO for UDP segmentationBoris Pismenny
This patch removes the splitting of UDP_GSO_L4 packets in the driver, and exposes UDP_GSO_L4 as a PARTIAL_GSO feature. Thus, the network stack is not responsible for splitting the packet into two. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>