summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
AgeCommit message (Collapse)Author
2021-10-29net/mlx5e: IPsec: Refactor checksum code in tx data pathRaed Salem
Part of code that is related solely to IPsec is always compiled in the driver code regardless if the IPsec functionality is enabled or disabled in the driver code, this will add unnecessary branch in case IPsec is disabled at Tx data path. Move IPsec related code to IPsec related file such that in case of IPsec is disabled and because of unlikely macro the compiler should be able to optimize and omit the checksum IPsec code all together from Tx data path Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-20net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flagsEmeel Hakim
Current Work Queue Entry (WQE) checksum (csum) flags in the ethernet segment (eseg) in case of IPsec crypto offload datapath are not aligned with PRM/HW expectations. Currently the driver always sets the l3_inner_csum flag in case of IPsec because of the wrong usage of skb->encapsulation as indicator for inner IPsec header since skb->encapsulation is always ON for IPsec packets since IPsec itself is an encapsulation protocol. The above forced a failing attempts of calculating csum of non-existing segments (like in the IP|ESP|TCP packet case which does not have an l3_inner) which led to lots of packet drops hence the low throughput. Fix by using xo->inner_ipproto as indicator for inner IPsec header instead of skb->encapsulation in addition to setting the csum flags as following: * Tunnel Mode: * Pkt: MAC IP ESP IP L4 * CSUM: l3_cs | l3_inner_cs | l4_inner_cs * * Transport Mode: * Pkt: MAC IP ESP L4 * CSUM: l3_cs [ | l4_cs (checksum partial case)] * * Tunnel(VXLAN TCP/UDP) over Transport Mode * Pkt: MAC IP ESP UDP VXLAN IP L4 * CSUM: l3_cs | l3_inner_cs | l4_inner_cs Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Trivial conflicts in net/can/isotp.c and tools/testing/selftests/net/mptcp/mptcp_connect.sh scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c to include/linux/ptp_clock_kernel.h in -next so re-apply the fix there. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-09net/mlx5e: Fix select queue to consider SKBTX_HW_TSTAMPAya Levin
Steering packets to PTP-SQ should be done only if the SKB has SKBTX_HW_TSTAMP set in the tx_flags. While here, take the function into a header and inline it. Set the whole condition to select the PTP-SQ to unlikely. Fixes: 24c22dd0918b ("net/mlx5e: Add states to PTP channel") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-03net/mlx5e: Remove unreachable code in mlx5e_xmit()Vladyslav Tarasiuk
After some commits, mlx5e_txwqe_build_eseg() lost its ability to return boolean value and became effectively void. Change its return type to void and remove unreachable branches. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-29net/mlx5e: Add states to PTP channelAya Levin
Add PTP TX state to PTP channel, which indicates the corresponding SQ is available. Further patches in the set extend PTP channel to include RQ. The PTP channel state will be used for separation and coexistence of RX and TX PTP. Enhance conditions to verify the TX PTP state is set. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-25net/mlx5e: Generalize PTP implementationAya Levin
Following patches in the set add support for RX PTP. Rename PTP prefix from %s/port_ptp/ptp/g to include RX PTP too. In addition rename indication (used in statistics context) that PTP-SQ was opened: %s/port_ptp_opened/tx_ptp_opened/g. This will simplify adding indication that PTP-RQ was opened. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapathMaxim Mikityanskiy
Commit e20f0dbf204f ("net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES") switched to using net_prefetchw at all places in mlx5e. In the same time frame, commit 5af75c747e2a ("net/mlx5e: Enhanced TX MPWQE for SKBs") added one more usage of prefetchw. When these two changes were merged, this new occurrence of prefetchw wasn't replaced with net_prefetchw. This commit fixes this last occurrence of prefetchw in mlx5e_tx_mpwqe_session_start, making the same change that was done in mlx5e_xdp_mpwqe_session_start. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-16Merge branch 'mlx5-next' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== pull-request: mlx5-next 2021-02-16 The patches in this pr are already submitted and reviewed through the netdev and rdma mailing lists. The series includes mlx5 HW bits and definitions for mlx5 real time clock translation and handling in the mlx5 driver clock module to enable and support such mode [1] [1] https://patchwork.kernel.org/project/netdevbpf/patch/20210212223042.449816-7-saeed@kernel.org/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16net/mlx5: Add cyc2time HW translation mode supportAya Levin
Device timestamp can be in real time mode (cycles to time translation is offloaded into the Hardware). With real time mode, HW provides timestamp which is already translated into nanoseconds. With this mode, driver adjusts both the HW and timecounter (to keep clock_info_page updated) using callbacks: adjfreq, adjtime and settime. HW clock modifications are done via MTUTC access reg commands. Driver is allowed to modify HW real time clock only if MCAM ptpcyc2realtime_modify capability is set. Add MTUTC set function to be used for configuring the HW real time clock. Modify existing code to support both internal timer (with conversion via timecounter_cyc2time() and real time (no conversions). Align the signatures of the helpers converting from timestamp to nanoseconds. With that, when allocating a queue assign the corresponding callback with respect to the capability. Adjust 1PPS timestamp calculation flows based on the timestamp mode. Cyc2time offload brings two major advantages: - Improve MTAE (Max Time Absolute Error) for HW TS by up to 160 ns over a 100% loaded CPU. - Faster data-path timestamp to nanoseconds, as translation is lock-less and done in HW. On real time mode, timestamp format is 32 high bits of seconds and 32 low bits of nanoseconds. On some flows, driver shall convert this format into nanoseconds wall-clock with REAL_TIME_TO_NS macro. HW supports a single clock, and it is shared by all functions on a device. In case real time clock is used, it is recommended to use a single GM to all device's functions. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-22net/mlx5e: Support HTB offloadMaxim Mikityanskiy
This commit adds support for HTB offload in the mlx5e driver. Performance: NIC: Mellanox ConnectX-6 Dx CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (24 cores with HT) 100 Gbit/s line rate, 500 UDP streams @ ~200 Mbit/s each 48 traffic classes, flower used for steering No shaping (rate limits set to 4 Gbit/s per TC) - checking for max throughput. Baseline: 98.7 Gbps, 8.25 Mpps HTB: 6.7 Gbps, 0.56 Mpps HTB offload: 95.6 Gbps, 8.00 Mpps Limitations: 1. 256 leaf nodes, 3 levels of depth. 2. Granularity for ceil is 1 Mbit/s. Rates are converted to weights, and the bandwidth is split among the siblings according to these weights. Other parameters for classes are not supported. Ethtool statistics support for QoS SQs are also added. The counters are called qos_txN_*, where N is the QoS queue number (starting from 0, the numeration is separate from the normal SQs), and * is the counter name (the counters are the same as for the normal SQs). Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-13net/mlx5e: IPsec, Enclose csum logic under ipsec configTariq Toukan
All IPsec logic should be wrapped under the compile flag, including its checksum logic. Introduce an inline function in ipsec datapath header, with a corresponding stub. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Huy Nguyen <huyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net/mlx5e: Fix SWP offsets when vlan inserted by driverMoshe Shemesh
In case WQE includes inline header the vlan is inserted by driver even if vlan offload is set. On geneve over vlan interface where software parser is used the SWP offsets should be updated according to the added vlan. Fixes: e3cfc7e6b7bd ("net/mlx5e: TX, Add geneve tunnel stateless offload support") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-12-08net/mlx5e: Add TX port timestamp supportEran Ben Elisha
Transmitted packet timestamping accuracy can be improved when using timestamp from the port, instead of packet CQE creation timestamp, as it better reflects the actual time of a packet's transmit. TX port timestamping is supported starting from ConnectX6-DX hardware. Although at the original completion, only CQE timestamp can be attached, we are able to get TX port timestamping via an additional completion over a special CQ associated with the SQ (in addition to the regular CQ). Driver to ignore the original packet completion timestamp, and report back the timestamp of the special CQ completion. If the absolute timestamp diff between the two completions is greater than 1 / 128 second, ignore the TX port timestamp as it has a jitter which is too big. No skb will be generate out of the extra completion. Allocate additional CQ per ptpsq, to receive the TX port timestamp. Driver to hold an skb FIFO in order to map between transmitted skb to the two expected completions. When using ptpsq, hold double refcount on the skb, to gaurantee it will not get released before both completions arrive. Expose dedicated counters of the ptp additional CQ and connect it to the TX health reporter. This patch improves TX Hardware timestamping offset to be less than 40ns at a 100Gbps line rate, compared to 600ns before. With that, making our HW compliant with G.8273.2 class C, and allow Linux systems to be deployed in the 5G telco edge, where this standard is a must. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-12-08net/mlx5e: Add TX PTP port object supportEran Ben Elisha
Add TX PTP port object support for better TX timestamping accuracy. Currently, driver supports CQE based TX port timestamp. Device also offers TX port timestamp, which has less jitter and better reflects the actual time of a packet's transmit. Define new driver layout called ptpsq, on which driver will create SQs that will support TX port timestamp for their transmitted packets. Driver to identify PTP TX skbs and steer them to these dedicated SQs as part of the select queue ndo. Driver to hold ptpsq per TC and report them at netif_set_real_num_tx_queues(). Add support for all needed functionality in order to xmit and poll completions received via ptpsq. Add ptpsq to the TX reporter recover, diagnose and dump methods. Creation of ptpsqs is disabled by default, and can be enabled via tx_port_ts private flag. This patch steer all timestamp related packets to a ptpsq, but it does not open the port timestamp support for it. The support will be added in the following patch. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-12-08net/mlx5e: Change skb fifo push/pop API to be used without SQEran Ben Elisha
The skb fifo push/pop API used pre-defined attributes within the mlx5e_txqsq. In order to share the skb fifo API with other non-SQ use cases, change the API input to get newly defined mlx5e_skb_fifo struct. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-12-08net/mlx5e: Allow CQ outside of channel contextAya Levin
In order to be able to create a CQ outside of a channel context, remove cq->channel direct pointer. This requires adding a direct pointer to channel statistics, netdevice, priv and to mlx5_core in order to support CQs that are a part of mlx5e_channel. In addition, parameters the were previously derived from the channel like napi, NUMA node, channel stats and index are now assembled in struct mlx5e_create_cq_param which is given to mlx5e_open_cq() instead of channel pointer. Generalizing mlx5e_open_cq() allows opening CQ outside of channel context which will be used in following patches in the patch-set. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-12-03net/mlx5e: kTLS, Enforce HW TX csum offload with kTLSTariq Toukan
Checksum calculation cannot be done in SW for TX kTLS HW offloaded packets. Offload it to the device, disregard the declared state of the TX csum offload feature. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.Huy Nguyen
The IP's checksum partial still requires L4 csum flag on Ethernet WQE. Make the IPsec WAs only for the IP's non checksum partial case (for example icmd packet) Fixes: 5be019040cb7 ("net/mlx5e: IPsec: Add Connect-X IPsec Tx data path offload") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Alaa Hleihel <alaa@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-12net/mlx5e: IPsec: Add Connect-X IPsec Tx data path offloadRaed Salem
In the TX data path, spot packets with xfrm stack IPsec offload indication. Fill Software-Parser segment in TX descriptor so that the hardware may parse the ESP protocol, and perform TX checksum offload on the inner payload. Support GSO, by providing the trailer data and ICV placeholder so HW can fill it post encryption operation. Padding alignment cannot be performed in HW (ConnectX-6Dx) due to a bug. Software can overcome this limitation by adding NETIF_F_HW_ESP to the gso_partial_features field in netdev so the packets being aligned by the stack. l4_inner_checksum cannot be offloaded by HW for IPsec tunnel type packet. Note that for GSO SKBs, the stack does not include an ESP trailer, unlike the non-GSO case. Below is the iperf3 performance report on two server of 24 cores Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz with ConnectX6-DX. All the bandwidth test uses iperf3 TCP traffic with packet size 128KB. Each tunnel uses one iperf3 stream with one thread (option -P1). TX crypto offload shows improvements on both bandwidth and CPU utilization. ---------------------------------------------------------------------- Mode | Num tunnel | BW | Send CPU util | Recv CPU util | | (Gbps) | (Average %) | (Average %) ---------------------------------------------------------------------- Cryto offload | | | | (RX only) | 1 | 4.7 | 4.2 | 3.5 ---------------------------------------------------------------------- Cryto offload | | | | (RX only) | 24 | 15.6 | 20 | 10 ---------------------------------------------------------------------- Non-offload | 1 | 4.6 | 4 | 5 ---------------------------------------------------------------------- Non-offload | 24 | 11.9 | 16 | 12 ---------------------------------------------------------------------- Cryto offload | | | | (TX & RX) | 1 | 11.9 | 2.1 | 5.9 ---------------------------------------------------------------------- Cryto offload | | | | (TX & RX) | 24 | 38 | 9.5 | 27.5 ---------------------------------------------------------------------- Cryto offload | | | | (TX only) | 1 | 4.7 | 0.7 | 5 ---------------------------------------------------------------------- Cryto offload | | | | (TX only) | 24 | 14.5 | 6 | 20 Regression tests show no degradation on non-ipsec and non-offload-ipsec traffics. The packet rate test uses pktgen UDP to transmit on single CPU, the instructions and cycles are measured on the transmit CPU. before: ---------------------------------------------------------------------- Non-offload | 1 | 4.7 | 4.2 | 5.1 ---------------------------------------------------------------------- Non-offload | 24 | 11.2 | 14 | 15 ---------------------------------------------------------------------- Non-ipsec | 1 | 28 | 4 | 5.7 ---------------------------------------------------------------------- Non-ipsec | 24 | 68.3 | 17.8 | 39.7 ---------------------------------------------------------------------- Non-ipsec packet rate(BURST=1000 BC=5 NCPUS=1 SIZE=60) 13.56Mpps, 456 instructions/pkt, 191 cycles/pkt after: ---------------------------------------------------------------------- Non-offload | 1 | 4.69 | 4.2 | 5 ---------------------------------------------------------------------- Non-offload | 24 | 11.9 | 13.5 | 15.1 ---------------------------------------------------------------------- Non-ipsec | 1 | 29 | 3.2 | 5.5 ---------------------------------------------------------------------- Non-ipsec | 24 | 68.2 | 18.5 | 39.8 ---------------------------------------------------------------------- Non-ipsec packet rate: 13.56Mpps, 472 instructions/pkt, 191 cycles/pkt Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21net/mlx5e: Enhanced TX MPWQE for SKBsMaxim Mikityanskiy
This commit adds support for Enhanced TX MPWQE feature in the regular (SKB) data path. A MPWQE (multi-packet work queue element) can serve multiple packets, reducing the PCI bandwidth on control traffic. Two new stats (tx*_mpwqe_blks and tx*_mpwqe_pkts) are added. The feature is on by default and controlled by the skb_tx_mpwqe private flag. In a MPWQE, eseg is shared among all packets, so eseg-based offloads (IPSEC, GENEVE, checksum) run on a separate eseg that is compared to the eseg of the current MPWQE session to decide if the new packet can be added to the same session. MPWQE is not compatible with certain offloads and features, such as TLS offload, TSO, nonlinear SKBs. If such incompatible features are in use, the driver gracefully falls back to non-MPWQE. This change has no performance impact in TCP single stream test and XDP_TX single stream test. UDP pktgen, 64-byte packets, single stream, MPWQE off: Packet rate: 16.96 Mpps (±0.12 Mpps) -> 17.01 Mpps (±0.20 Mpps) Instructions per packet: 421 -> 429 Cycles per packet: 156 -> 161 Instructions per cycle: 2.70 -> 2.67 UDP pktgen, 64-byte packets, single stream, MPWQE on: Packet rate: 16.96 Mpps (±0.12 Mpps) -> 20.94 Mpps (±0.33 Mpps) Instructions per packet: 421 -> 329 Cycles per packet: 156 -> 123 Instructions per cycle: 2.70 -> 2.67 Enabling MPWQE can reduce PCI bandwidth: PCI Gen2, pktgen at fixed rate of 36864000 pps on 24 CPU cores: Inbound PCI utilization with MPWQE off: 80.3% Inbound PCI utilization with MPWQE on: 59.0% PCI Gen3, pktgen at fixed rate of 56064000 pps on 24 CPU cores: Inbound PCI utilization with MPWQE off: 65.4% Inbound PCI utilization with MPWQE on: 49.3% Enabling MPWQE can also reduce CPU load, increasing the packet rate in case of CPU bottleneck: PCI Gen2, pktgen at full rate on 24 CPU cores: Packet rate with MPWQE off: 37.5 Mpps Packet rate with MPWQE on: 49.0 Mpps PCI Gen3, pktgen at full rate on 24 CPU cores: Packet rate with MPWQE off: 57.0 Mpps Packet rate with MPWQE on: 66.8 Mpps Burst size in all pktgen tests is 32. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64) NIC: Mellanox ConnectX-6 Dx GCC 10.2.0 Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21net/mlx5e: Move TX code into functions to be used by MPWQEMaxim Mikityanskiy
mlx5e_txwqe_complete performs some actions that can be taken to separate functions: 1. Update the flags needed for hardware timestamping. 2. Stop the TX queue if it's full. Take these actions into separate functions to be reused by the MPWQE code in the following commit and to maintain clear responsibilities of functions. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21net/mlx5e: Support multiple SKBs in a TX WQEMaxim Mikityanskiy
TX MPWQE support for SKBs is coming in one of the following patches, and a single MPWQE can send multiple SKBs. This commit prepares the TX path code to handle such cases: 1. An additional FIFO for SKBs is added, just like the FIFO for DMA chunks. 2. struct mlx5e_tx_wqe_info will contain num_fifo_pkts. If a given WQE contains only one packet, num_fifo_pkts will be zero, and the SKB will be stored in mlx5e_tx_wqe_info, as usual. If num_fifo_pkts > 0, the SKB pointer will be NULL, and the SKBs will be stored in the FIFO. This change has no performance impact in TCP single stream test and XDP_TX single stream test. When compiled with a recent GCC, this change shows no visible performance impact on UDP pktgen (burst 32) single stream test either: Packet rate: 16.95 Mpps (±0.15 Mpps) -> 16.96 Mpps (±0.12 Mpps) Instructions per packet: 429 -> 421 Cycles per packet: 160 -> 156 Instructions per cycle: 2.69 -> 2.70 CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64) NIC: Mellanox ConnectX-6 Dx GCC 10.2.0 Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21net/mlx5e: Move the TLS resync check out of the functionMaxim Mikityanskiy
Before this patch, mlx5e_ktls_tx_handle_resync_dump_comp checked for resync_dump_frag_page. It happened for all WQEs without an SKB, including padding WQEs, and required a function call. Normally, padding WQEs happen more often than TLS resyncs. Take this check out of the function and put it to an inline function to save a call on all padding WQEs. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21net/mlx5e: Unify constants for WQE_EMPTY_DS_COUNTMaxim Mikityanskiy
A constant for the number of DS in an empty WQE (i.e. a WQE without data segments) is needed in multiple places (normal TX data path, MPWQE in XDP), but currently we have a constant for XDP and an inline formula in normal TX. This patch introduces a common constant. Additionally, mlx5e_xdp_mpwqe_session_start is converted to use struct assignment, because the code nearby is touched. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21net/mlx5e: Refactor xmit functionsMaxim Mikityanskiy
A huge function mlx5e_sq_xmit was split into several to achieve multiple goals: 1. Reuse the code in IPoIB. 2. Better intergrate with TLS, IPSEC, GENEVE and checksum offloads. Now it's possible to reserve space in the WQ before running eseg-based offloads, so: 2.1. It's not needed to copy cseg and eseg after mlx5e_fill_sq_frag_edge anymore. 2.2. mlx5e_txqsq_get_next_pi will be used instead of the legacy mlx5e_fill_sq_frag_edge for better code maintainability and reuse. 3. Prepare for the upcoming TX MPWQE for SKBs. It will intervene after mlx5e_sq_calc_wqe_attr to check if it's possible to use MPWQE, and the code flow will split into two paths: MPWQE and non-MPWQE. Two high-level functions are provided to send packets: * mlx5e_xmit is called by the networking stack, runs offloads and sends the packet. In one of the following patches, MPWQE support will be added to this flow. * mlx5e_sq_xmit_simple is called by the TLS offload, runs only the checksum offload and sends the packet. This change has no performance impact in TCP single stream test and XDP_TX single stream test. When compiled with a recent GCC, this change shows no visible performance impact on UDP pktgen (burst 32) single stream test either: Packet rate: 16.86 Mpps (±0.15 Mpps) -> 16.95 Mpps (±0.15 Mpps) Instructions per packet: 434 -> 429 Cycles per packet: 158 -> 160 Instructions per cycle: 2.75 -> 2.69 CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64) NIC: Mellanox ConnectX-6 Dx GCC 10.2.0 Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21net/mlx5e: Move mlx5e_tx_wqe_inline_mode to en_tx.cMaxim Mikityanskiy
Move mlx5e_tx_wqe_inline_mode from en/txrx.h to en_tx.c as it's only used there. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21net/mlx5e: Use struct assignment to initialize mlx5e_tx_wqe_infoMaxim Mikityanskiy
Struct assignment guarantees that all fields of the structure are initialized (those that are not mentioned are zeroed). It makes code mode robust and reduces chances for unpredictable behavior when one forgets to reset some field and it holds an old value from previous iterations of using the structure. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21net/mlx5e: Refactor inline header size calculation in the TX pathMaxim Mikityanskiy
As preparation for the next patch, don't increase ihs to calculate ds_cnt and then decrease it, but rather calculate the intermediate value temporarily. This code has the same amount of arithmetic operations, but now allows to split out ds_cnt calculation, which will be performed in the next patch. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-06-27net/mlx5e: kTLS, Improve TLS feature modularityTariq Toukan
Better separate the code into c/h files, so that kTLS internals are exposed to the corresponding non-accel flow as follows: - Necessary datapath functions are exposed via ktls_txrx.h. - Necessary caps and configuration functions are exposed via ktls.h, which became very small. In addition, kTLS internal code sharing is done via ktls_utils.h, which is not exposed to any non-accel file. Add explicit WQE structures for the TLS static and progress params, breaking the union of the static with UMR, and the progress with PSV. Generalize the API as a preparation for TLS RX offload support. Move kTLS TX-specific code to the proper file. Remove the inline tag for function in C files, let the compiler decide. Use kzalloc/kfree for the priv_tx context. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
2020-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22net/mlx5e: Update netdev txq on completions during closureMoshe Shemesh
On sq closure when we free its descriptors, we should also update netdev txq on completions which would not arrive. Otherwise if we reopen sqs and attach them back, for example on fw fatal recovery flow, we may get tx timeout. Fixes: 29429f3300a3 ("net/mlx5e: Timeout if SQ doesn't flush during close") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-09net/mlx5e: Split TX acceleration offloads into two phasesMaxim Mikityanskiy
After previous modifications, the offloads are no longer called one by one, the pi is calculated and the wqe is cleared on between of TLS and IPSEC offloads, which doesn't quite fit mlx5e_accel_handle_tx's purpose. This patch splits mlx5e_accel_handle_tx into two functions that correspond to two logical phases of running offloads: 1. Before fetching a WQE. Here runs the code that can post WQEs on its own, before the main WQE is fetched. It's the main part of TLS offload. 2. After fetching a WQE. Here runs the code that updates the WQE's fields, but can't post other WQEs any more. It's a minor part of TLS offload that sets the tisn field in the cseg, and eseg-based offloads (currently IPSEC, and later patches will move GENEVE and checksum offloads there, too). It allows to make mlx5e_xmit take care of all actions needed to transmit a packet in the right order, improve the structure of the code and reduce unnecessary operations. The structure will be further improved in the following patches (all eseg-based offloads will be moved to a single place, and reserving space for the main WQE will happen between phase 1 and phase 2 of offloads to eliminate unneeded data movements). Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-09net/mlx5e: Return void from mlx5e_sq_xmit and mlx5i_sq_xmitMaxim Mikityanskiy
mlx5e_sq_xmit and mlx5i_sq_xmit always return NETDEV_TX_OK. Drop the return value to simplify the code. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-09net/mlx5e: Return bool from TLS and IPSEC offloadsMaxim Mikityanskiy
TLS and IPSEC offloads currently return struct sk_buff *, but the value is either NULL or the same skb that was passed as a parameter. Return bool instead to provide stronger guarantees to the calling code (it won't need to support handling a different SKB that could be potentially returned before this change) and to simplify restructuring this code in the following commits. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30net/mlx5e: Unify reserving space for WQEsMaxim Mikityanskiy
In our fast-path design, a WQE (Work Queue Element) must not cross the page boundary. To enforce that, for WQEs consisting of more than one BB (Basic Block), the driver checks the available contiguous space in the WQ in advance, and if it's not enough, it pads it with NOPs. This patch modifies the code that calculates the position of next WQE, considering the padding, and prepares the WQE. This code is common for all SQ types. In this patch it's reorganized in a way that makes the usage pattern unified for all SQ types, and makes the implementations self-contained and look almost the same, preparing the repeating code to further attempts to deduplicate it. One place is left as is: mlx5e_sq_xmit and mlx5e_fill_sq_frag_edge call inside, because it is special in a way that it may also copy WQE's cseg and eseg when reserving space. This will be eliminated in one of the following patches, and this place will be converted to the new approach, too. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30net/mlx5e: Fetch WQE: reuse code and enforce typingMaxim Mikityanskiy
There are multiple functions mlx5{e,i}_*_fetch_wqe that contain the same code, that is repeated, because they operate on different SQ struct types. mlx5e_sq_fetch_wqe also returns void *, instead of the concrete WQE type. This commit generalizes the fetch WQE operation by putting this code into a single function. To simplify calls of the generic function in concrete use cases, macros are provided that substitute the right WQE size and cast the return type. Before this patch, fetch_wqe used to calculate pi itself, but the value was often known to the caller. This calculation is moved outside to eliminate this unnecessary step and prepare for the fill_frag_edge refactoring in the next patch. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30net/mlx5e: TX, Generalise code and usage of error CQE dumpTariq Toukan
Error CQE was dumped only for TXQ SQs. Generalise the function, and add usage for error completions on ICO SQs and XDP SQs. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25net/mlx5e: Define one flow for TXQ selection when TCs are configuredEran Ben Elisha
We shall always extract channel index out of the txq, regardless of the relation between txq_ix and num channels. The extraction is always valid, as if txq is smaller than number of channels, txq_ix == priv->txq2sq[txq_ix]->ch_ix. By doing so, we can remove an if clause from the select queue method, and have one flow for all packets. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-06net/mlx5e: TX, Error completion is for last WQE in batchTariq Toukan
For a cyclic work queue, when not requesting a completion per WQE, a single CQE might indicate the completion of several WQEs. However, in case some WQE in the batch causes an error, then an error completion is issued, breaking the batch, and pointing to the offending WQE in the wqe_counter field. Hence, WQE-specific error CQE handling (like printing, breaking, etc...) should be performed only for the last WQE in batch. Fixes: 130c7b46c93d ("net/mlx5e: TX, Dump WQs wqe descriptors on CQE with error events") Fixes: fd9b4be8002c ("net/mlx5e: RX, Support multiple outstanding UMR posts") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix TXQ indices to be sequentialEran Ben Elisha
Cited patch changed (channel index, tc) => (TXQ index) mapping to be a static one, in order to keep indices consistent when changing number of channels or TCs. For 32 channels (OOB) and 8 TCs, real num of TXQs is 256. When reducing the amount of channels to 8, the real num of TXQs will be changed to 64. This indices method is buggy: - Channel #0, TC 3, the TXQ index is 96. - Index 8 is not valid, as there is no such TXQ from driver perspective (As it represents channel #8, TC 0, which is not valid with the above configuration). As part of driver's select queue, it calls netdev_pick_tx which returns an index in the range of real number of TXQs. Depends on the return value, with the examples above, driver could have returned index larger than the real number of tx queues, or crash the kernel as it tries to read invalid address of SQ which was not allocated. Fix that by allocating sequential TXQ indices, and hold a new mapping between (channel index, tc) => (real TXQ index). This mapping will be updated as part of priv channels activation, and is used in mlx5e_select_queue to find the selected queue index. The existing indices mapping (channel_tc2txq) is no longer needed, as it is used only for statistics structures and can be calculated on run time. Delete its definintion and updates. Fixes: 8bfaf07f7806 ("net/mlx5e: Present SW stats when state is not opened") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-03Merge tag 'mlx5-updates-2019-11-01' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-11-01 Misc updates for mlx5 netdev and core driver 1) Steering Core: Replace CRC32 internal implementation with standard kernel lib. 2) Steering Core: Support IPv4 and IPv6 mixed matcher. 3) Steering Core: Lockless FTE read lookups 4) TC: Bit sized fields rewrite support. 5) FPGA: Standalone FPGA support. 6) SRIOV: Reset VF parameters configurations on SRIOV disable. 7) netdev: Dump WQs wqe descriptors on CQE with error events. 8) MISC Cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01net/mlx5e: TX, Dump WQs wqe descriptors on CQE with error eventsSaeed Mahameed
Dump the Work Queue's TX WQE descriptor when a completion with error is received. Example: [5.331832] mlx5_core 0000:00:04.0 enp0s4: Error cqe on cqn 0xa, ci 0x1, TXQ-SQ qpn 0xe, opcode 0xd, syndrome 0x2, vendor syndrome 0x0 [5.333127] 00000000: 55 65 02 75 31 fe c2 d2 6b 6c 62 1e f9 e1 d8 5c [5.333837] 00000010: d3 b2 6c b8 89 e4 84 20 0b f4 3c e0 f3 75 41 ca [5.334568] 00000020: 46 00 00 00 cd 70 a0 92 18 3a 01 de 00 00 00 00 [5.335313] 00000030: 7d bc 05 89 b2 e9 00 02 1e 00 00 0e 00 00 30 d2 [5.335972] WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0x0, len: 64 [5.336710] 00000000: 00 00 00 1e 00 00 0e 04 00 00 00 08 00 00 00 00 [5.337524] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 12 33 33 [5.338151] 00000020: 00 00 00 16 52 54 00 00 00 01 86 dd 60 00 00 00 [5.338740] 00000030: 00 00 00 48 00 00 00 00 00 00 00 00 66 ba 58 14 Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: TX, Fix consumer index of error cqe dumpTariq Toukan
The completion queue consumer index increments upon a call to mlx5_cqwq_pop(). When dumping an error CQE, the index is already incremented. Decrease one for the print command. Fixes: 16cc14d81733 ("net/mlx5e: Dump xmit error completions") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: kTLS, Release reference on DUMPed fragments in shutdown flowTariq Toukan
A call to kTLS completion handler was missing in the TXQSQ release flow. Add it. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18net/mlx5e: Tx, Fix assumption of single WQEBB of NOP in cleanup flowTariq Toukan
Cited patch removed the assumption only in datapath. Here we remove it also form control/cleanup flow. Fixes: 9ab0233728ca ("net/mlx5e: Tx, Don't implicitly assume SKB-less wqe has one WQEBB") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01net/mlx5e: Tx, Soften inline mode VLAN dependenciesTariq Toukan
If capable, use zero inline mode in TX WQE for non-VLAN packets. For VLAN ones, keep the enforcement of at least L2 inline mode, unless the WQE VLAN insertion offload cap is on. Performance: Tested single core packet rate of 64Bytes. NIC: ConnectX-5 CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz pktgen: Before: 12.46 Mpps After: 14.65 Mpps (+17.5%) XDP_TX: The MPWQE flow is not affected, as it already has this optimization. So we test with priv-flag xdp_tx_mpwqe: off. Before: 9.90 Mpps After: 10.20 Mpps (+3%) Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Tested-by: Noam Stolero <noams@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-22net: Use skb accessors in network driversMatthew Wilcox (Oracle)
In preparation for unifying the skb_frag and bio_vec, use the fine accessors which already exist and use skb_frag_t instead of struct skb_frag_struct. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05net/mlx5e: Add kTLS TX HW offload supportTariq Toukan
Add support for transmit side kernel-TLS acceleration. Offload the crypto encryption to HW. Per TLS connection: - Use a separate TIS to maintain the HW context. - Use a separate encryption key. - Maintain static and progress HW contexts by posting the proper WQEs at creation time, or upon resync. - Use a special DUMP opcode to replay the previous frags and sync the HW context. To make sure the SQ is able to serve an xmit request, increase SQ stop room to cover: - static params WQE, - progress params WQE, and - resync DUMP per frag. Currently supporting TLS 1.2, and key size 128bit. Tested over SimX simulator. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05net/mlx5e: Tx, Unconstify SQ stop roomTariq Toukan
Use an SQ field for stop_room, and use the larger value only if TLS is supported. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>