summaryrefslogtreecommitdiff
path: root/drivers/thermal
AgeCommit message (Collapse)Author
2021-01-19thermal/core: Make cooling device state change privateDaniel Lezcano
The change of the cooling device state should be used by the governor or at least by the core code, not by the drivers themselves. Remove the API usage and move the function declaration to the internal headers. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20210118173824.9970-1-daniel.lezcano@linaro.org
2021-01-19thermal: intel: pch: Fix unexpected shutdown at critical temperatureKai-Heng Feng
Like previous patch, the intel_pch_thermal device is not in ACPI ThermalZone namespace, so a critical trip doesn't mean shutdown. Override the default .critical callback to prevent surprising thermal shutdoown. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201221172345.36976-2-kai.heng.feng@canonical.com
2021-01-19thermal: int340x: Fix unexpected shutdown at critical temperatureKai-Heng Feng
We are seeing thermal shutdown on Intel based mobile workstations, the shutdown happens during the first trip handle in thermal_zone_device_register(): kernel: thermal thermal_zone15: critical temperature reached (101 C), shutting down However, we shouldn't do a thermal shutdown here, since 1) We may want to use a dedicated daemon, Intel's thermald in this case, to handle thermal shutdown. 2) For ACPI based system, _CRT doesn't mean shutdown unless it's inside ThermalZone namespace. ACPI Spec, 11.4.4 _CRT (Critical Temperature): "... If this object it present under a device, the device’s driver evaluates this object to determine the device’s critical cooling temperature trip point. This value may then be used by the device’s driver to program an internal device temperature sensor trip point." So a "critical trip" here merely means we should take a more aggressive cooling method. As int340x device isn't present under ACPI ThermalZone, override the default .critical callback to prevent surprising thermal shutdown. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201221172345.36976-1-kai.heng.feng@canonical.com
2021-01-19thermal/core: Remove pointless thermal_zone_device_reset() functionDaniel Lezcano
The function thermal_zone_device_reset() is called in the thermal_zone_device_register() which allocates and initialize the structure. The passive field is already zero-ed by the allocation, the function is useless. Call directly thermal_zone_device_init() instead and thermal_zone_device_reset(). Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201222181110.1231977-1-daniel.lezcano@linaro.org
2021-01-19thermal/core: Remove ms based delay fieldsDaniel Lezcano
The code does no longer use the ms unit based fields to set the delays as they are replaced by the jiffies. Remove them and replace their user to use the jiffies version instead. Cc: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Peter Kästle <peter@piie.net> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20201216220337.839878-3-daniel.lezcano@linaro.org
2021-01-19thermal/core: Use precomputed jiffies for the pollingDaniel Lezcano
The delays are also stored in jiffies based unit. Use them instead of the ms. Cc: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Link: https://lore.kernel.org/r/20201216220337.839878-2-daniel.lezcano@linaro.org
2021-01-19thermal/core: Precompute the delays from msecs to jiffiesDaniel Lezcano
The delays are stored in ms units and when the polling function is called this delay is converted into jiffies at each call. Instead of doing the conversion again and again, compute the jiffies at init time and use the value directly when setting the polling. Cc: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Link: https://lore.kernel.org/r/20201216220337.839878-1-daniel.lezcano@linaro.org
2021-01-19thermal/core: Remove THERMAL_TRIPS_NONE testDaniel Lezcano
The last site calling the thermal_zone_bind_cooling_device() function with the THERMAL_TRIPS_NONE parameter was removed. We can get rid of this test as no user of this function is calling this function with this parameter. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Link: https://lore.kernel.org/r/20201214233811.485669-5-daniel.lezcano@linaro.org
2021-01-19thermal/core: Remove pointless test with the THERMAL_TRIPS_NONE macroDaniel Lezcano
The THERMAL_TRIPS_NONE is equal to -1, it is pointless to do a conversion in this function. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Link: https://lore.kernel.org/r/20201214233811.485669-3-daniel.lezcano@linaro.org
2021-01-19thermal/core: Remove unused functions rebind/unbind exceptionDaniel Lezcano
The functions thermal_zone_device_rebind_exception and thermal_zone_device_unbind_exception are not used from anywhere. Remove that code. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Link: https://lore.kernel.org/r/20201214233811.485669-2-daniel.lezcano@linaro.org
2021-01-19thermal/core: Remove the 'forced_passive' optionDaniel Lezcano
The code was reorganized in 2012 with the commit 0c01ebbfd3caf1. The main change is a loop on the trip points array and a unconditional call to the throttle() ops of the governors for each of them even if the trip temperature is not reached yet. With this change, the 'forced_passive' is no longer checked in the thermal_zone_device_update() function but in the step wise governor's throttle() callback. As the force_passive does no belong to the trip point array, the thermal_zone_device_update() can not compare with the specified passive temperature, thus does not detect the passive limit has been crossed. Consequently, throttle() is never called and the 'forced_passive' branch is unreached. In addition, the default processor cooling device is not automatically bound to the thermal zone if there is not passive trip point, thus the 'forced_passive' can not operate. If there is an active trip point, then the throttle function will be called to mitigate at this temperature and the 'forced_passive' will override the mitigation of the active trip point in this case but with the default cooling device bound to the thermal zone, so usually a fan, and that is not a passive cooling effect. Given the regression exists since more than 8 years, nobody complained and at the best of my knowledge there is no bug open in https://bugzilla.kernel.org, it is reasonable to say it is unused. Remove the 'forced_passive' related code. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Link: https://lore.kernel.org/r/20201214233811.485669-1-daniel.lezcano@linaro.org
2021-01-14thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platformsViresh Kumar
Several parts of the kernel are already using the effective CPU utilization (as seen by the scheduler) to get the current load on the CPU, do the same here instead of depending on the idle time of the CPU, which isn't that accurate comparatively. This is also the right thing to do as it makes the cpufreq governor (schedutil) align better with the cpufreq_cooling driver, as the power requested by cpufreq_cooling governor will exactly match the next frequency requested by the schedutil governor since they are both using the same metric to calculate load. This was tested on ARM Hikey6220 platform with hackbench, sysbench and schbench. None of them showed any regression or significant improvements. Schbench is the most important ones out of these as it creates the scenario where the utilization numbers provide a better estimate of the future. Scenario 1: The CPUs were mostly idle in the previous polling window of the IPA governor as the tasks were sleeping and here are the details from traces (load is in %): Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339 New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960 Here, the "Old" line gives the load and requested_power (dynamic_power here) numbers calculated using the idle time based implementation, while "New" is based on the CPU utilization from scheduler. As can be clearly seen, the load and requested_power numbers are simply incorrect in the idle time based approach and the numbers collected from CPU's utilization are much closer to the reality. Scenario 2: The CPUs were busy in the previous polling window of the IPA governor: Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280 New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672 As can be seen, the idle time based load is 100% for all the CPUs as it took only the last window into account, but in reality the CPUs aren't that loaded as shown by the utilization numbers. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://lkml.kernel.org/r/9c255c83d78d58451abc06848001faef94c87a12.1607400596.git.viresh.kumar@linaro.org
2021-01-07thermal/core: Remove notify opsDaniel Lezcano
With the removal of the notifys user in a previous patches, the ops is no longer needed, remove it. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://lore.kernel.org/r/20201210121514.25760-5-daniel.lezcano@linaro.org
2020-12-18Merge tag 'thermal-v5.11-2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal fixlet from Daniel Lezcano: "A trivial change which fell through the cracks: Add Alder Lake support ACPI ids (Srinivas Pandruvada)" * tag 'thermal-v5.11-2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: thermal: int340x: Support Alder Lake
2020-12-17thermal: int340x: Support Alder LakeSrinivas Pandruvada
Add ACPI IDs for thermal drivers for Alder Lake support. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201117194802.503337-1-srinivas.pandruvada@linux.intel.com
2020-12-15Merge tag 'thermal-v5.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal updates from Daniel Lezcano: - Add upper and lower limits clamps for the cooling device state in the power allocator governor (Michael Kao) - Add upper and lower limits support for the power allocator governor (Lukasz Luba) - Optimize conditions testing for the trip points (Bernard Zhao) - Replace spin_lock_irqsave by spin_lock in hard IRQ on the rcar driver (Tian Tao) - Add MT8516 dt-bindings and device reset optional support (Fabien Parent) - Add a quiescent period to cool down the PCH when entering S0iX (Sumeet Pawnikar) - Use bitmap API instead of re-inventing the wheel on sun8i (Yangtao Li) - Remove useless NULL check in the hwmon driver (Bernard Zhao) - Update the current state in the cpufreq cooling device only if the frequency change is effective (Zhuguangqing) - Improve the schema validation for the rcar DT bindings (Geert Uytterhoeven) - Fix the user time unit in the documentation (Viresh Kumar) - Add PCI ids for Lewisburg PCH (Andres Freund) - Add hwmon support on amlogic (Martin Blumenstingl) - Fix build failure for PCH entering on in S0iX (Randy Dunlap) - Improve the k_* coefficient for the power allocator governor (Lukasz Luba) - Fix missing const on a sysfs attribute (Rikard Falkeborn) - Remove broken interrupt support on rcar to be replaced by a new one (Niklas Söderlund) - Improve the error code handling at init time on imx8mm (Fabio Estevam) - Compute interval validity once instead at each temperature reading iteration on acerhdf (Daniel Lezcano) - Add r8a779a0 support (Niklas Söderlund) - Add PCI ids for AlderLake PCH and mmio refactoring (Srinivas Pandruvada) - Add RFIM and mailbox support on int340x (Srinivas Pandruvada) - Use macro for temperature calculation on PCH (Sumeet Pawnikar) - Simplify return conditions at probe time on Broadcom (Zheng Yongjun) - Fix workload name on PCH (Srinivas Pandruvada) - Migrate the devfreq cooling device code to the energy model API (Lukasz Luba) - Emit a warning if the thermal_zone_device_update is called without the .get_temp() ops (Daniel Lezcano) - Add critical and hot ops for the thermal zone (Daniel Lezcano) - Remove notification usage when critical is reached on rcar (Daniel Lezcano) - Fix devfreq build when ENERGY_MODEL is not set (Lukasz Luba) * tag 'thermal-v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (45 commits) thermal/drivers/devfreq_cooling: Fix the build when !ENERGY_MODEL thermal/drivers/rcar: Remove notification usage thermal/core: Add critical and hot ops thermal/core: Emit a warning if the thermal zone is updated without ops drm/panfrost: Register devfreq cooling and attempt to add Energy Model thermal: devfreq_cooling: remove old power model and use EM thermal: devfreq_cooling: add new registration functions with Energy Model thermal: devfreq_cooling: use a copy of device status thermal: devfreq_cooling: change tracing function and arguments thermal: int340x: processor_thermal: Correct workload type name thermal: broadcom: simplify the return expression of bcm2711_thermal_probe() thermal: intel: pch: use macro for temperature calculation thermal: int340x: processor_thermal: Add mailbox driver thermal: int340x: processor_thermal: Add RFIM driver thermal: int340x: processor_thermal: Add AlderLake PCI device id thermal: int340x: processor_thermal: Refactor MMIO interface thermal: rcar_gen3_thermal: Add r8a779a0 support dt-bindings: thermal: rcar-gen3-thermal: Add r8a779a0 support platform/x86/drivers/acerhdf: Check the interval value when it is set platform/x86/drivers/acerhdf: Use module_param_cb to set/get polling interval ...
2020-12-15thermal/drivers/devfreq_cooling: Fix the build when !ENERGY_MODELLukasz Luba
Prevent build failure if the option CONFIG_ENERGY_MODEL is not set. The devfreq cooling is able to operate without the Energy Model. Don't use dev->em_pd directly and use local pointer. Fixes: 615510fe13bd2 ("thermal: devfreq_cooling: remove old power model and use EM") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201215154221.8828-1-lukasz.luba@arm.com
2020-12-15thermal/drivers/rcar: Remove notification usageDaniel Lezcano
The ops is only showing a trace telling a critical trip point is crossed. The same information is given by the thermal framework. This is redundant, remove the code. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/20201210121514.25760-4-daniel.lezcano@linaro.org
2020-12-11thermal/core: Add critical and hot opsDaniel Lezcano
Currently there is no way to the sensors to directly call an ops in interrupt mode without calling thermal_zone_device_update assuming all the trip points are defined. A sensor may want to do something special if a trip point is hot or critical. This patch adds the critical and hot ops to the thermal zone device, so a sensor can directly invoke them or let the thermal framework to call the sensor specific ones. Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://lore.kernel.org/r/20201210121514.25760-2-daniel.lezcano@linaro.org
2020-12-11thermal/core: Emit a warning if the thermal zone is updated without opsDaniel Lezcano
The actual code is silently ignoring a thermal zone update when a driver is requesting it without a get_temp ops set. That looks not correct, as the caller should not have called this function if the thermal zone is unable to read the temperature. That makes the code less robust as the check won't detect the driver is inconsistently using the thermal API and that does not help to improve the framework as these circumvolutions hide the problem at the source. In order to detect the situation when it happens, let's add a warning when the update is requested without the get_temp() ops set. Any warning emitted will have to be fixed at the source of the problem: the caller must not call thermal_zone_device_update if there is not get_temp callback set. Cc: Thara Gopinath <thara.gopinath@linaro.org> Cc: Amit Kucheria <amitk@kernel.org> Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://lore.kernel.org/r/20201210121514.25760-1-daniel.lezcano@linaro.org
2020-12-11thermal: devfreq_cooling: remove old power model and use EMLukasz Luba
Remove old power model and use new Energy Model to calculate the power budget. It drops static + dynamic power calculations and power table in order to use Energy Model performance domain data. This model should be easy to use and could find more users. It is also less complicated to setup the needed structures. Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201210143014.24685-5-lukasz.luba@arm.com
2020-12-11thermal: devfreq_cooling: add new registration functions with Energy ModelLukasz Luba
The Energy Model (EM) framework supports devices such as Devfreq. Create new registration function which automatically register EM for the thermal devfreq_cooling devices. This patch prepares the code for coming changes which are going to replace old power model with the new EM. Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201210143014.24685-4-lukasz.luba@arm.com
2020-12-11thermal: devfreq_cooling: use a copy of device statusLukasz Luba
Devfreq cooling needs to now the correct status of the device in order to operate. Devfreq framework can change the device status in the background. To mitigate issues make a copy of the status structure and use it for internal calculations. In addition this patch adds normalization function, which also makes sure that whatever data comes from the device, the load will be in range from 1 to 1024. Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201210143014.24685-3-lukasz.luba@arm.com
2020-12-11thermal: devfreq_cooling: change tracing function and argumentsLukasz Luba
Prepare for deleting the static and dynamic power calculation and clean the trace function. These two fields are going to be removed in the next changes. Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> # for tracing code Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201210143014.24685-2-lukasz.luba@arm.com
2020-12-10thermal: int340x: processor_thermal: Correct workload type nameSrinivas Pandruvada
Change "Burusty" to "bursty". Reported-by: Michael Larabel <Michael@phoronix.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201210213324.2113041-1-srinivas.pandruvada@linux.intel.com
2020-12-10thermal: broadcom: simplify the return expression of bcm2711_thermal_probe()Zheng Yongjun
Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201210135432.1249-1-zhengyongjun3@huawei.com
2020-12-10thermal: intel: pch: use macro for temperature calculationSumeet Pawnikar
Use macro for temperature calculation Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201210124801.13850-1-sumeet.r.pawnikar@intel.com
2020-12-10thermal: int340x: processor_thermal: Add mailbox driverSrinivas Pandruvada
Added processor thermal device mail box interface for workload hints setting. These hints will give indication to hardware to better manage power and thermals. The supported hints are: idle semi_active burusty sustained battery_life For example when the system is on battery, the hardware can be less aggressive in power ramp up. This will create an attribute group at /sys/bus/pci/devices/0000:00:04.0/workload_request This folder contains two attributes: workload_available_types : (RO): This shows available workload types workload_type: (RW) : Allows to set and get current workload type setting Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201126171829.945969-4-srinivas.pandruvada@linux.intel.com
2020-12-10thermal: int340x: processor_thermal: Add RFIM driverSrinivas Pandruvada
Add support for RFIM (Radio Frequency Interference Mitigation) support via processor thermal PCI device. This drivers allows adjustment of FIVR (Fully Integrated Voltage Regulator) and DDR (Double Data Rate) frequencies to avoid RF interference with WiFi and 5G. Switching voltage regulators (VR) generate radiated EMI or RFI at the fundamental frequency and its harmonics. Some harmonics may interfere with very sensitive wireless receivers such as Wi-Fi and cellular that are integrated into host systems like notebook PCs. One of mitigation methods is requesting SOC integrated VR (IVR) switching frequency to a small % and shift away the switching noise harmonic interference from radio channels. OEM or ODMs can use the driver to control SOC IVR operation within the range where it does not impact IVR performance. DRAM devices of DDR IO interface and their power plane can generate EMI at the data rates. Similar to IVR control mechanism, Intel offers a mechanism by which DDR data rates can be changed if several conditions are met: there is strong RFI interference because of DDR; CPU power management has no other restriction in changing DDR data rates; PC ODMs enable this feature (real time DDR RFI Mitigation referred to as DDR-RFIM) for Wi-Fi from BIOS. This change exports two folders under /sys/bus/pci/devices/0000:00:04.0. One folder "fivr" contains all attributes exposed for controling FIVR features. The other folder "dvfs" contains all attributes for DDR features. Changes done to implement: - New module for rfim interfaces - Two new per processor features for DDR and FIVR - Enable feature for Tiger Lake (FIVR only) and Alder Lake The attributes exposed and explanation: FIVR attributes vco_ref_code_lo (RW): The VCO reference code is an 11-bit field and controls the FIVR switching frequency. This is the 3-bit LSB field. vco_ref_code_hi (RW): The VCO reference code is an 11-bit field and controls the FIVR switching frequency. This is the 8-bit MSB field. spread_spectrum_pct (RW): Set the FIVR spread spectrum clocking percentage spread_spectrum_clk_enable (RW): Enable/disable of the FIVR spread spectrum clocking feature rfi_vco_ref_code (RW): This field is a read only status register which reflects the current FIVR switching frequency fivr_fffc_rev (RW): This field indicated the revision of the FIVR HW. DVFS attributes rfi_restriction_run_busy (RW): Request the restriction of specific DDR data rate and set this value 1. Self reset to 0 after operation. rfi_restriction_err_code (RW): Values: 0 :Request is accepted, 1:Feature disabled, 2: the request restricts more points than it is allowed rfi_restriction_data_rate_Delta (RW): Restricted DDR data rate for RFI protection: Lower Limit rfi_restriction_data_rate_Base (RW): Restricted DDR data rate for RFI protection: Upper Limit ddr_data_rate_point_0 (RO): DDR data rate selection 1st point ddr_data_rate_point_1 (RO): DDR data rate selection 2nd point ddr_data_rate_point_2 (RO): DDR data rate selection 3rd point ddr_data_rate_point_3 (RO): DDR data rate selection 4th point rfi_disable (RW): Disable DDR rate change feature Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201126171829.945969-3-srinivas.pandruvada@linux.intel.com
2020-12-10thermal: int340x: processor_thermal: Add AlderLake PCI device idSrinivas Pandruvada
Added AlderLake PCI device id to support processor thermal driver. Reuse the feature set (just includes RAPL) from previous generations. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201126171829.945969-2-srinivas.pandruvada@linux.intel.com
2020-12-10thermal: int340x: processor_thermal: Refactor MMIO interfaceSrinivas Pandruvada
The Processor Thermal PCI device supports multiple features. Currently we export only RAPL. But we need more features from this device exposed for Tiger Lake and Alder Lake based platforms. So re-structure the current MMIO interface, so that more features can be added cleanly. No functional changes are expected with this change. Changes done in this patch: - Using PCI_DEVICE_DATA(), hence names of defines changed - Move RAPL MMIO code to its own module - Move the RAPL MMIO offsets to RAPL MMIO module - Adjust Kconfig dependency of PROC_THERMAL_MMIO_RAPL - Per processor driver data now contains the supported features - Moved all the common data structures and defines to a common header file - This new header file contains all the processor_thermal_* interfaces - Based on the features supported the module interface is called - Each module atleast provides one add and one remove function Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201126171829.945969-1-srinivas.pandruvada@linux.intel.com
2020-12-08thermal: rcar_gen3_thermal: Add r8a779a0 supportNiklas Söderlund
Add support for R-Car V3U. The new THCODE values are taken from the example in the datasheet. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201126223028.3119044-4-niklas.soderlund+renesas@ragnatech.se
2020-12-04thermal: imx8mm: Disable the clock on probe failureFabio Estevam
Prior to returning an error in probe, disable the previously enabled clock. Signed-off-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201202232448.2692-2-festevam@gmail.com
2020-12-04thermal: imx8mm: Print the correct error codeFabio Estevam
Currently the error message does not print the correct error code. Fix it by initializing 'ret' to the proper error code. Signed-off-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201202232448.2692-1-festevam@gmail.com
2020-11-30thermal: rcar_gen3_thermal: Do not use interrupts for normal operationNiklas Söderlund
Remove the usage of interrupts for the normal temperature operation and depend on the polling performed by the thermal core. This is done to prepare to use the interrupts as they are intended to trigger once specific trip points are passed and not to react to temperature changes in the normal operational range. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201126220923.3107213-1-niklas.soderlund+renesas@ragnatech.se
2020-11-30thermal: core: Constify static attribute_group structsRikard Falkeborn
The only usage of these structs is to assign their address to the thermal_zone_attribute_groups array, which consists of pointers to const, so make them const to allow the compiler to put them in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201128234342.36684-1-rikard.falkeborn@gmail.com
2020-11-26thermal: power allocator: change the 'k_*' always in estimate_pid_constants()Lukasz Luba
The PID coefficients should be estimated again when there was a change to sustainable power value made by user. This change removes unused argument 'force' and makes the function ready for such updates. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201124161025.27694-4-lukasz.luba@arm.com
2020-11-26thermal: power allocator: refactor sustainable power estimationLukasz Luba
The sustainable power value might come from the Device Tree or can be estimated in run time. The sustainable power might be updated by the user via sysfs interface, which should trigger new estimation of PID coefficients. There is no need to estimate it every time when the governor is called and temperature is high. Instead, store the estimated value and make it available via standard sysfs interface, so it can be checked from the user-space. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201124161025.27694-3-lukasz.luba@arm.com
2020-11-26thermal: power allocator: change the 'k_i' coefficient estimationLukasz Luba
Intelligent Power Allocation (IPA) is built around the PID controller concept. The initialization code tries to setup the environment based on the information available in DT or estimate the value based on minimum power reported by each of the cooling device. The estimation will have an impact on the PID controller behaviour via the related 'k_po', 'k_pu', 'k_i' coefficients and also on the power budget calculation. This change prevents the situation when 'k_i' is relatively big compared to 'k_po' and 'k_pu' values. This might happen when the estimation for 'sustainable_power' returned small value, thus 'k_po' and 'k_pu' are small. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201124161025.27694-2-lukasz.luba@arm.com
2020-11-17thermal: intel_pch_thermal: fix build for ACPI not enabledRandy Dunlap
The reference to acpi_gbl_FADT causes a build error when ACPI is not enabled. Fix by making that conditional on CONFIG_ACPI. ../drivers/thermal/intel/intel_pch_thermal.c: In function 'pch_wpt_suspend': ../drivers/thermal/intel/intel_pch_thermal.c:217:8: error: 'acpi_gbl_FADT' undeclared (first use in this function); did you mean 'acpi_get_type'? if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) ^~~~~~~~~~~~~ Fixes: ef63b043ac86 ("thermal: intel: pch: fix S0ix failure due to PCH temperature above threshold") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Amit Kucheria <amitk@kernel.org> Cc: linux-pm@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201117023807.8266-1-rdunlap@infradead.org
2020-11-16thermal: amlogic: Add hwmon supportMartin Blumenstingl
Many monitoring tools read the CPU temperature using the hwmon interface. Expose the thermal sensors on Amlogic boards as hwmon devices. Without this lm_sensors' "sensors" tool does not find any temperature sensors. Now it prints: cpu_thermal-virtual-0 Adapter: Virtual device temp1: +44.7 C (crit = +110.0 C) ddr_thermal-virtual-0 Adapter: Virtual device temp1: +45.9 C (crit = +110.0 C) Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201115190658.631578-1-martin.blumenstingl@googlemail.com
2020-11-14thermal: intel_pch_thermal: Add PCI ids for Lewisburg PCH.Andres Freund
I noticed that I couldn't read the PCH temperature on my workstation (C620 series chipset, w/ 2x Xeon Gold 5215 CPUs) directly, but had to go through IPMI. Looking at the data sheet, it looks to me like the existing intel PCH thermal driver should work without changes for Lewisburg. I suspect there's some other PCI IDs missing. But I hope somebody at Intel would have an easier time figuring that out than I... Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Tushar Dave <tushar.n.dave@intel.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/lkml/20200115184415.1726953-1-andres@anarazel.de/ Signed-off-by: Andres Freund <andres@anarazel.de> Reviewed-by: Pandruvada, Srinivas <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201113204916.1144907-1-andres@anarazel.de
2020-11-12thermal: ti-soc-thermal: Disable the CPU PM notifier for OMAP4430Peter Ujfalusi
It has been observed that on OMAP4430 (ES2.0, ES2.1 and ES2.3) the enabled notifier causes errors on the DTEMP readout values: ti-soc-thermal 4a002260.bandgap: in range ADC val: 52 ti-soc-thermal 4a002260.bandgap: in range ADC val: 64 ti-soc-thermal 4a002260.bandgap: in range ADC val: 64 ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0 thermal thermal_zone0: failed to read out thermal zone (-5) ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0 thermal thermal_zone0: failed to read out thermal zone (-5) ti-soc-thermal 4a002260.bandgap: out of range ADC val: 4 thermal thermal_zone0: failed to read out thermal zone (-5) ti-soc-thermal 4a002260.bandgap: in range ADC val: 100 raw 100 translates to 133 Celsius on omap4-sdp, triggering shutdown due to critical temperature. When the notifier is disable for OMAP4430 the DTEMP values are stable: ti-soc-thermal 4a002260.bandgap: in range ADC val: 56 ti-soc-thermal 4a002260.bandgap: in range ADC val: 56 ti-soc-thermal 4a002260.bandgap: in range ADC val: 57 ti-soc-thermal 4a002260.bandgap: in range ADC val: 57 ti-soc-thermal 4a002260.bandgap: in range ADC val: 56 Fixes: 5093402e5b44 ("thermal: ti-soc-thermal: Enable addition power management") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Tested-by: Tony Lindgren <tony@atomide.com> Acked-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201029100335.27665-1-peter.ujfalusi@ti.com
2020-11-12thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changedZhuguangqing
If state has not changed successfully and we updated cpufreq_state, next time when the new state is equal to cpufreq_state (not changed successfully last time), we will return directly and miss a freq_qos_update_request() that should have been. Fixes: 5130802ddbb1 ("thermal: cpu_cooling: Switch to QoS requests for freq limits") Cc: v5.4+ <stable@vger.kernel.org> # v5.4+ Signed-off-by: Zhuguangqing <zhuguangqing@xiaomi.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201106092243.15574-1-zhuguangqing83@gmail.com
2020-11-12thermal/drivers/hwmon: Cleanup coding style a bitBernard Zhao
Function thermal_add_hwmon_sysfs, hwmon will be NULL when new_hwmon_device = 0, so there is no need to check, kfree will handle NULL point. Signed-off-by: Bernard Zhao <bernard@vivo.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201102023121.3312-1-bernard@vivo.com
2020-11-12thermal: sun8i: Use bitmap API instead of open codeYangtao Li
The bitmap_* API is the standard way to access data in the bitfield. So convert irq_ack to return an unsigned long, and make things to use bitmap API. Signed-off-by: Yangtao Li <frank@allwinnertech.com> Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201109114624.23035-1-frank@allwinnertech.com
2020-11-07thermal: intel: pch: fix S0ix failure due to PCH temperature above thresholdSumeet Pawnikar
When system tries to enter S0ix suspend state, just after active load scenarios, it fails due to PCH current temperature is higher than set threshold. This patch introduces delay loop mechanism that allows PCH temperature to go down below threshold during suspend so it won't fail to enter S0ix. Add delay loop timeout and count as module parameters for user to tune it, if required based on system design. This change notifies the different warning messages like when PCH temperature above the threshold and executing delay loop. Also, notify the messages when it success or failure for S0ix entry. Previously out of 1000 runs around 3 to 5 times it might fail to enter S0ix just after heavy workload. With this change, S0ix failures reduced as PCH cools down below threshold. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201106170633.20838-1-sumeet.r.pawnikar@intel.com
2020-10-27thermal: mtk_thermal: make device_reset optionalFabien Parent
MT8516 does not support thermal reset. Use device_reset_optional instead of device_reset. Signed-off-by: Fabien Parent <fparent@baylibre.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201021164231.3029956-3-fparent@baylibre.com
2020-10-27thermal/drivers/rcar: Replace spin_lock_irqsave by spin_lock in hard IRQTian Tao
On RT or even on mainline with 'threadirqs' on the command line all interrupts which are not explicitly requested with IRQF_NO_THREAD run their handlers in thread context. The same applies to soft interrupts. That means they are subject to the normal scheduler rules and no other code is going to acquire that lock from hard interrupt context either, so the irqsave() here is pointless in all cases. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1603760790-37748-1-git-send-email-tiantao6@hisilicon.com
2020-10-27drivers/thermal/core: Optimize trip points checkBernard Zhao
The trip points are checked one by one with multiple condition branches where one condition is enough to disable the trip point. Merge all these conditions in a single 'OR' statement. Signed-off-by: Bernard Zhao <bernard@vivo.com> Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20201027013743.62392-1-bernard@vivo.com [dlezcano] Changed patch description