summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
AgeCommit message (Collapse)Author
2021-03-12net/mlx5: Use order-0 allocations for EQsTariq Toukan
Currently we are allocating high-order page for EQs. In case of fragmented system, VF hot remove/add in VMs for example, there isn't enough contiguous memory for EQs allocation, which results in crashing of the VM. Therefore, use order-0 fragments for the EQ allocations instead. Performance tests: ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz Performance tests show no sensible degradation. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-26net/mlx5e: free page before returnPan Bian
Instead of directly return, goto the error handling label to free allocated page. Fixes: 5f29458b77d5 ("net/mlx5e: Support dump callback in TX reporter") Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-12-08net/mlx5e: Allow SQ outside of channel contextEran Ben Elisha
In order to be able to create an SQ outside of a channel context, remove sq->channel direct pointer. This requires adding a direct pointer to: netdevice, priv and mlx5_core in order to support SQs that are part of mlx5e_channel. Use channel_stats from the corresponding CQ. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-12-08net/mlx5e: Allow RQ outside of channel contextAya Levin
In order to be able to create an RQ outside of a channel context, remove rq->channel direct pointer. This requires adding a direct pointer to: ICOSQ and priv in order to support RQs that are part of mlx5e_channel. Use channel_stats from the corresponding CQ. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-12-08net/mlx5e: Allow CQ outside of channel contextAya Levin
In order to be able to create a CQ outside of a channel context, remove cq->channel direct pointer. This requires adding a direct pointer to channel statistics, netdevice, priv and to mlx5_core in order to support CQs that are a part of mlx5e_channel. In addition, parameters the were previously derived from the channel like napi, NUMA node, channel stats and index are now assembled in struct mlx5e_create_cq_param which is given to mlx5e_open_cq() instead of channel pointer. Generalizing mlx5e_open_cq() allows opening CQ outside of channel context which will be used in following patches in the patch-set. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-15net/mlx5: Fix uninitialized variable warningMoshe Tal
Add variable initialization to eliminate the warning "variable may be used uninitialized". Fixes: 5f29458b77d5 ("net/mlx5e: Support dump callback in TX reporter") Signed-off-by: Moshe Tal <moshet@mellanox.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-07-02net/mlx5e: Add EQ info to TX/RX reporter's diagnoseAya Levin
Enhance TX/RX reporter's diagnose to include info about the corresponding EQ. $ devlink health diagnose pci/0000:00:0b.0 reporter rx Common config: RQ: type: 2 stride size: 2048 size: 8 CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 1713 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1032 HW status: 0 ci: 0 size: 1024 EQ: eqn: 7 irqn: 42 vecidx: 1 ci: 93 size: 2048 channel ix: 1 rqn: 1718 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1036 HW status: 0 ci: 0 size: 1024 EQ: eqn: 8 irqn: 43 vecidx: 2 ci: 2 size: 2048 $ devlink health diagnose pci/0000:00:0b.0 reporter tx Common Config: SQ: stride size: 64 size: 1024 CQ: stride size: 64 size: 1024 SQs: channel ix: 0 tc: 0 txq ix: 0 sqn: 1712 HW state: 1 stopped: false cc: 91 pc: 91 CQ: cqn: 1030 HW status: 0 ci: 91 size: 1024 EQ: eqn: 7 irqn: 42 vecidx: 1 ci: 93 size: 2048 channel ix: 1 tc: 0 txq ix: 1 sqn: 1717 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1034 HW status: 0 ci: 0 size: 1024 EQ: eqn: 8 irqn: 43 vecidx: 2 ci: 2 size: 2048 Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-02net/mlx5e: Enhance CQ data on diagnose outputAya Levin
Add CQ's consumer index and size to the CQ's diagnose output retruved on RX/TX reporter diadgnose. $ devlink health diagnose pci/0000:00:0b.0 reporter rx Common config: RQ: type: 2 stride size: 2048 size: 8 CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 2413 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1032 HW status: 0 ci: 0 size: 1024 channel ix: 1 rqn: 2418 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1036 HW status: 0 ci: 0 size: 1024 $ devlink health diagnose pci/0000:00:0b.0 reporter tx Common Config: SQ: stride size: 64 size: 1024 CQ: stride size: 64 size: 1024 SQs: channel ix: 0 tc: 0 txq ix: 0 sqn: 2412 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1030 HW status: 0 ci: 0 size: 1024 channel ix: 1 tc: 0 txq ix: 1 sqn: 2417 HW state: 1 stopped: false cc: 5 pc: 5 CQ: cqn: 1034 HW status: 0 ci: 5 size: 1024 Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-02net/mlx5e: Rename reporter's helpersAya Levin
Change prefix to match resident file: %s/mlx5e_reporter_cq_diagnose/mlx5e_health_cq_diag_fmsg %s/mlx5e_reporter_cq_common_diagnose/mlx5e_health_cq_common_diag_fmsg %s/mlx5e_reporter_named_obj_nest_start/mlx5e_health_fmsg_named_obj_nest_start %s/mlx5e_reporter_named_obj_nest_end/mlx5e_health_fmsg_named_obj_nest_end Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-02net/mlx5e: Change reporters create functions to return voidEran Ben Elisha
Creation of devlink health reporters is not fatal for mlx5e instance load. In case of error in reporter's creation, the return value is ignored. Change all reporters creation functions to return void. In addition, with this change, a failure in creating a reporter, will not prevent the driver from trying to create the next reporter in the list. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-23net/mlx5: Update cq.c to new cmd interfaceLeon Romanovsky
Do mass update of cq.c to reuse newly introduced mlx5_cmd_exec_in*() interfaces. Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Conflict resolution of ice_virtchnl_pf.c based upon work by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18mlx5: Use proper logging and tracing line terminationsJoe Perches
netdev_err should use newline termination but mlx5_health_report is used in a trace output function devlink_health_report where no newline should be used. Remove the newlines from a couple formats and add a format string of "%s\n" to the netdev_err call to not directly output the logging string. Also use snprintf to avoid any possible output string overrun. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-18net/mlx5e: Support dump callback in TX reporterAya Levin
Add support for SQ's FW dump on TX reporter's events. Use Resource dump API to retrieve the relevant data: SX slice, SQ dump and SQ buffer. Wrap it in formatted messages and store the binary output in devlink core. Example: $ devlink health dump show pci/0000:00:0b.0 reporter tx SX Slice: data: 00 00 00 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 02 01 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 20 40 90 81 88 ff ff 00 00 00 00 00 00 00 00 15 00 15 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 81 ae 41 06 00 ea ff ff SQs: SQ: index: 1511 data: 00 00 00 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 02 01 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 20 40 90 81 88 ff ff 00 00 00 00 00 00 00 00 15 00 15 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 81 ae 41 06 00 ea ff ff SQ: index: 1516 data: 00 00 00 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 02 01 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 20 40 90 81 88 ff ff 00 00 00 00 00 00 00 00 15 00 15 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 81 ae 41 06 00 ea ff ff $ devlink health dump show pci/0000:00:0b.0 reporter tx -jp { "SX Slice": { "data": [ 0,0,0,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,32,64,144,129,136,255,255,0,0,0,0,0,0,0,0,21,0,21,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,129,174,65,6,0,234,255,255], }, "SQs": [ { "SQ": { "index": 1511, "data": [ 0,0,0,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,32,64,144,129,136,255,255,0,0,0,0,0,0,0,0,21,0,21,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,129,174,65,6,0,234,255,255] } },{ "SQ": { "index": 1516, "data": [ 0,0,0,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,32,64,144,129,136,255,255,0,0,0,0,0,0,0,0,21,0,21,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,129,174,65,6,0,234,255,255] } } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-18net/mlx5e: Fix crash in recovery flow without devlink reporterAya Levin
When health reporters are not supported, recovery function is invoked directly, not via devlink health reporters. In this direct flow, the recover function input parameter was passed incorrectly and is causing a kernel oops. This patch is fixing the input parameter. Following call trace is observed on rx error health reporting. Internal error: Oops: 96000007 [#1] PREEMPT SMP Process kworker/u16:4 (pid: 4584, stack limit = 0x00000000c9e45703) Call trace: mlx5e_rx_reporter_err_rq_cqe_recover+0x30/0x164 [mlx5_core] mlx5e_health_report+0x60/0x6c [mlx5_core] mlx5e_reporter_rq_cqe_err+0x6c/0x90 [mlx5_core] mlx5e_rq_err_cqe_work+0x20/0x2c [mlx5_core] process_one_work+0x168/0x3d0 worker_thread+0x58/0x3d0 kthread+0x108/0x134 Fixes: c50de4af1d63 ("net/mlx5e: Generalize tx reporter's functionality") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06net/mlx5e: Always print health reporter message to dmesgEran Ben Elisha
In case a reporter exists, error message is logged only to the devlink tracer. The devlink tracer is a visibility utility only, which user can choose not to monitor. After cited patch, 3rd party monitoring tools that tracks these error message will no longer find them in dmesg, causing a regression. With this patch, error messages are also logged into the dmesg. Fixes: c50de4af1d63 ("net/mlx5e: Generalize tx reporter's functionality") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Add support to rx reporter diagnoseAya Levin
Add rx reporter, which supports diagnose call-back. Diagnostics output include: information common to all RQs: RQ type, RQ size, RQ stride size, CQ size and CQ stride size. In addition advertise information per RQ and its related icosq and attached CQ. $ devlink health diagnose pci/0000:00:0b.0 reporter rx Common config: RQ: type: 2 stride size: 2048 size: 8 CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 4308 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1032 HW status: 0 channel ix: 1 rqn: 4313 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1036 HW status: 0 channel ix: 2 rqn: 4318 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1040 HW status: 0 channel ix: 3 rqn: 4323 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1044 HW status: 0 $ devlink health diagnose pci/0000:00:0b.0 reporter rx -jp { "Common config": { "RQ": { "type": 2, "stride size": 2048, "size": 8 }, "CQ": { "stride size": 64, "size": 1024 } }, "RQs": [ { "channel ix": 0, "rqn": 4308, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1032, "HW status": 0 } },{ "channel ix": 1, "rqn": 4313, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1036, "HW status": 0 } },{ "channel ix": 2, "rqn": 4318, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1040, "HW status": 0 } },{ "channel ix": 3, "rqn": 4323, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1044, "HW status": 0 } } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Add helper functions for reporter's basicsAya Levin
Introduce helper functions for create and destroy reporters and update channels. In the following patch, rx reporter is added and it will use these helpers too. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Add cq info to tx reporter diagnoseAya Levin
Add cq information to general diagnose output: CQ size and stride size. Per SQ add information about the related CQ: cqn and CQ's HW status. $ devlink health diagnose pci/0000:00:0b.0 reporter tx Common Config: SQ: stride size: 64 size: 1024 CQ: stride size: 64 size: 1024 SQs: channel ix: 0 tc: 0 txq ix: 0 sqn: 4307 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1030 HW status: 0 channel ix: 1 tc: 0 txq ix: 1 sqn: 4312 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1034 HW status: 0 channel ix: 2 tc: 0 txq ix: 2 sqn: 4317 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1038 HW status: 0 channel ix: 3 tc: 0 txq ix: 3 sqn: 4322 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1042 HW status: 0 $ devlink health diagnose pci/0000:00:0b.0 reporter tx -jp { "Common Config": { "SQ": { "stride size": 64, "size": 1024 }, "CQ": { "stride size": 64, "size": 1024 } }, "SQs": [ { "channel ix": 0, "tc": 0, "txq ix": 0, "sqn": 4307, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1030, "HW status": 0 } },{ "channel ix": 1, "tc": 0, "txq ix": 1, "sqn": 4312, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1034, "HW status": 0 } },{ "channel ix": 2, "tc": 0, "txq ix": 2, "sqn": 4317, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1038, "HW status": 0 } },{ "channel ix": 3, "tc": 0, "txq ix": 3, "sqn": 4322, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1042, "HW status": 0 } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Extend tx reporter diagnostics outputAya Levin
Enhance tx reporter's diagnostics output to include: information common to all SQs: SQ size, SQ stride size. In addition add channel ix, tc, txq ix, cc and pc. $ devlink health diagnose pci/0000:00:0b.0 reporter tx Common config: SQ: stride size: 64 size: 1024 SQs: channel ix: 0 tc: 0 txq ix: 0 sqn: 4307 HW state: 1 stopped: false cc: 0 pc: 0 channel ix: 1 tc: 0 txq ix: 1 sqn: 4312 HW state: 1 stopped: false cc: 0 pc: 0 channel ix: 2 tc: 0 txq ix: 2 sqn: 4317 HW state: 1 stopped: false cc: 0 pc: 0 channel ix: 3 tc: 0 txq ix: 3 sqn: 4322 HW state: 1 stopped: false cc: 0 pc: 0 $ devlink health diagnose pci/0000:00:0b.0 reporter tx -jp { "Common config": { "SQ": { "stride size": 64, "size": 1024 } }, "SQs": [ { "channel ix": 0, "tc": 0, "txq ix": 0, "sqn": 4307, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 },{ "channel ix": 1, "tc": 0, "txq ix": 1, "sqn": 4312, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 },{ "channel ix": 2, "tc": 0, "txq ix": 2, "sqn": 4317, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 },{ "channel ix": 3, "tc": 0, "txq ix": 3, "sqn": 4322, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Generalize tx reporter's functionalityAya Levin
Prepare for code sharing with rx reporter, which is added in the following patches in the set. Introduce a generic error_ctx for agnostic recovery despatch. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>