From 1f507e80c700e31e358bf4213dc7e4dd614c7c72 Mon Sep 17 00:00:00 2001 From: Adham Faris Date: Mon, 7 Aug 2023 11:05:07 -0700 Subject: net/mlx5: Expose NIC temperature via hardware monitoring kernel API MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Expose NIC temperature by implementing hwmon kernel API, which turns current thermal zone kernel API to redundant. For each one of the supported and exposed thermal diode sensors, expose the following attributes: 1) Input temperature. 2) Highest temperature. 3) Temperature label: Depends on the firmware capability, if firmware doesn't support sensors naming, the fallback naming convention would be: "sensorX", where X is the HW spec (MTMP register) sensor index. 4) Temperature critical max value: refers to the high threshold of Warning Event. Will be exposed as `tempY_crit` hwmon attribute (RO attribute). For example for ConnectX5 HCA's this temperature value will be 105 Celsius, 10 degrees lower than the HW shutdown temperature). 5) Temperature reset history: resets highest temperature. For example, for dualport ConnectX5 NIC with a single IC thermal diode sensor will have 2 hwmon directories (one for each PCI function) under "/sys/class/hwmon/hwmon[X,Y]". Listing one of the directories above (hwmonX/Y) generates the corresponding output below: $ grep -H -d skip . /sys/class/hwmon/hwmon0/* Output ======================================================================= /sys/class/hwmon/hwmon0/name:mlx5 /sys/class/hwmon/hwmon0/temp1_crit:105000 /sys/class/hwmon/hwmon0/temp1_highest:48000 /sys/class/hwmon/hwmon0/temp1_input:46000 /sys/class/hwmon/hwmon0/temp1_label:asic grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied In addition, displaying the sensors data via lm_sensors generates the corresponding output below: $ sensors Output ======================================================================= mlx5-pci-0800 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) mlx5-pci-0801 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) CC: Jean Delvare Signed-off-by: Adham Faris Reviewed-by: Tariq Toukan Reviewed-by: Gal Pressman Signed-off-by: Saeed Mahameed Acked-by: Guenter Roeck Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mellanox/mlx5/core/hwmon.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/hwmon.h (limited to 'drivers/net/ethernet/mellanox/mlx5/core/hwmon.h') diff --git a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h new file mode 100644 index 000000000000..999654a9b9da --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved + */ +#ifndef __MLX5_HWMON_H__ +#define __MLX5_HWMON_H__ + +#include + +#if IS_ENABLED(CONFIG_HWMON) + +int mlx5_hwmon_dev_register(struct mlx5_core_dev *mdev); +void mlx5_hwmon_dev_unregister(struct mlx5_core_dev *mdev); + +#else +static inline int mlx5_hwmon_dev_register(struct mlx5_core_dev *mdev) +{ + return 0; +} + +static inline void mlx5_hwmon_dev_unregister(struct mlx5_core_dev *mdev) {} + +#endif + +#endif /* __MLX5_HWMON_H__ */ -- cgit