diff options
author | Eran Ben Elisha <eranbe@mellanox.com> | 2017-12-26 16:02:24 +0200 |
---|---|---|
committer | Saeed Mahameed <saeedm@mellanox.com> | 2018-03-27 17:29:28 -0700 |
commit | db75373c91b0cfb6a68ad6ae88721e4e21ae6261 (patch) | |
tree | 5724e4db736f47e6bce86154b3018b6881b30a95 /drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | |
parent | 16cc14d817338fc297970d2d9d146c88ec87474d (diff) |
net/mlx5e: Recover Send Queue (SQ) from error state
An error TX completion (CQE) which arrived on a specific SQ indicates
that this SQ got moved by the hardware to error state, which means all
pending and incoming TX requests are dropped or will be dropped and no
further "Good" CQEs will be generated for that SQ.
Before this patch TX completions (CQEs) were not monitored and were
handled as a regular CQE. This caused the SQ to stay in an error state,
making it useless for xmiting new packets.
Mitigation plan:
In case of an error completion, schedule a recovery work which would do
the following:
- Mark the TXQ as DRV_XOFF to disable new packets to arrive from the
stack
- NAPI to flush all pending SQ WQEs (via flush_in_error_en bit) to
release SW and HW resources(SKB, DMA, etc) and have the SQ and CQ
consumer/producer indices synced.
- Modify the SQ state ERR -> RST -> RDY (restart the SQ).
- Reactivate the SQ and reset SQ cc and pc
If we identify two consecutive requests for SQ recover in less than
500 msecs, drop the recover request to avoid CPU overload, as this
scenario most likely happened due to a severe repeated bug.
In addition, add SQ recover SW counter to monitor successful recoveries.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Diffstat (limited to 'drivers/net/ethernet/mellanox/mlx5/core/en_stats.h')
-rw-r--r-- | drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h index 43dc808684c9..53111a2df587 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h @@ -79,6 +79,7 @@ struct mlx5e_sw_stats { u64 tx_queue_dropped; u64 tx_xmit_more; u64 tx_cqe_err; + u64 tx_recover; u64 rx_wqe_err; u64 rx_mpwqe_filler; u64 rx_buff_alloc_err; @@ -199,6 +200,7 @@ struct mlx5e_sq_stats { u64 wake; u64 dropped; u64 cqe_err; + u64 recover; }; struct mlx5e_ch_stats { |