summaryrefslogtreecommitdiff
path: root/drivers/thermal/step_wise.c
AgeCommit message (Collapse)Author
2017-10-31thermal/drivers/step_wise: Fix temperature regulation misbehaviorDaniel Lezcano
There is a particular situation when the cooling device is cpufreq and the heat dissipation is not efficient enough where the temperature increases little by little until reaching the critical threshold and leading to a SoC reset. The behavior is reproducible on a hikey6220 with bad heat dissipation (eg. stacked with other boards). Running a simple C program doing while(1); for each CPU of the SoC makes the temperature to reach the passive regulation trip point and ends up to the maximum allowed temperature followed by a reset. This issue has been also reported by running the libhugetlbfs test suite. What is observed is a ping pong between two cpu frequencies, 1.2GHz and 900MHz while the temperature continues to grow. It appears the step wise governor calls get_target_state() the first time with the throttle set to true and the trend to 'raising'. The code selects logically the next state, so the cpu frequency decreases from 1.2GHz to 900MHz, so far so good. The temperature decreases immediately but still stays greater than the trip point, then get_target_state() is called again, this time with the throttle set to true *and* the trend to 'dropping'. From there the algorithm assumes we have to step down the state and the cpu frequency jumps back to 1.2GHz. But the temperature is still higher than the trip point, so get_target_state() is called with throttle=1 and trend='raising' again, we jump to 900MHz, then get_target_state() is called with throttle=1 and trend='dropping', we jump to 1.2GHz, etc ... but the temperature does not stabilizes and continues to increase. [ 237.922654] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 237.922678] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 237.922690] thermal cooling_device0: cur_state=0 [ 237.922701] thermal cooling_device0: old_target=0, target=1 [ 238.026656] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 238.026680] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=1 [ 238.026694] thermal cooling_device0: cur_state=1 [ 238.026707] thermal cooling_device0: old_target=1, target=0 [ 238.134647] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 238.134667] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 238.134679] thermal cooling_device0: cur_state=0 [ 238.134690] thermal cooling_device0: old_target=0, target=1 In this situation the temperature continues to increase while the trend is oscillating between 'dropping' and 'raising'. We need to keep the current state untouched if the throttle is set, so the temperature can decrease or a higher state could be selected, thus preventing this oscillation. Keeping the next_target untouched when 'throttle' is true at 'dropping' time fixes the issue. The following traces show the governor does not change the next state if trend==2 (dropping) and throttle==1. [ 2306.127987] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 2306.128009] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 2306.128021] thermal cooling_device0: cur_state=0 [ 2306.128031] thermal cooling_device0: old_target=0, target=1 [ 2306.231991] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 2306.232016] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=1 [ 2306.232030] thermal cooling_device0: cur_state=1 [ 2306.232042] thermal cooling_device0: old_target=1, target=1 [ 2306.335982] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1 [ 2306.336006] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=1 [ 2306.336021] thermal cooling_device0: cur_state=1 [ 2306.336034] thermal cooling_device0: old_target=1, target=1 [ 2306.439984] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 2306.440008] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=0 [ 2306.440022] thermal cooling_device0: cur_state=1 [ 2306.440034] thermal cooling_device0: old_target=1, target=0 [ ... ] After a while, if the temperature continues to increase, the next state becomes 2 which is 720MHz on the hikey. That results in the temperature stabilizing around the trip point. [ 2455.831982] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 2455.832006] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=0 [ 2455.832019] thermal cooling_device0: cur_state=1 [ 2455.832032] thermal cooling_device0: old_target=1, target=1 [ 2455.935985] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1 [ 2455.936013] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=0 [ 2455.936027] thermal cooling_device0: cur_state=1 [ 2455.936040] thermal cooling_device0: old_target=1, target=1 [ 2456.043984] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1 [ 2456.044009] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=0 [ 2456.044023] thermal cooling_device0: cur_state=1 [ 2456.044036] thermal cooling_device0: old_target=1, target=1 [ 2456.148001] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 2456.148028] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 2456.148042] thermal cooling_device0: cur_state=1 [ 2456.148055] thermal cooling_device0: old_target=1, target=2 [ 2456.252009] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 2456.252041] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=0 [ 2456.252058] thermal cooling_device0: cur_state=2 [ 2456.252075] thermal cooling_device0: old_target=2, target=1 IOW, this change is needed to keep the state for a cooling device if the temperature trend is oscillating while the temperature increases slightly. Without this change, the situation above leads to a catastrophic crash by a hardware reset on hikey. This issue has been reported to happen on an OMAP dra7xx also. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Keerthy <j-keerthy@ti.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Tested-by: Keerthy <j-keerthy@ti.com> Reviewed-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-06-29thermal: fix source code documentation for parametersWilly WOLFF
Some parameters are not documented, or not present at all, in thermal governors code. Signed-off-by: Willy Wolff <willy.mh.wolff@gmail.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2016-08-08thermal: fix race condition when updating cooling deviceMichele Di Giorgio
When multiple thermal zones are bound to the same cooling device, multiple kernel threads may want to update the cooling device state by calling thermal_cdev_update(). Having cdev not protected by a mutex can lead to a race condition. Consider the following situation with two kernel threads k1 and k2: Thread k1 Thread k2 || || call thermal_cdev_update() || ... || set_cur_state(cdev, target); call power_actor_set_power() || ... || instance->target = state; || cdev->updated = false; || || cdev->updated = true; || // completes execution call thermal_cdev_update() || // cdev->updated == true || return; || \/ time k2 has already looped through the thermal instances looking for the deepest cooling device state and is preempted right before setting cdev->updated to true. Now, k1 runs, modifies the thermal instance state and sets cdev->updated to false. Then, k1 is preempted and k2 continues the execution by setting cdev->updated to true, therefore preventing k1 from performing the update. Notice that this is not an issue if k2 looks at the instance->target modified by k1 "after" it is assigned by k1. In fact, in this case the update will happen anyway and k1 can safely return immediately from thermal_cdev_update(). This may lead to a situation where a thermal governor never updates the cooling device. For example, this is the case for the step_wise governor: when calling the function thermal_zone_trip_update(), the governor may always get a new state equal to the old one (which, however, wasn't notified to the cooling device) and will therefore skip the update. CC: Zhang Rui <rui.zhang@intel.com> CC: Eduardo Valentin <edubezval@gmail.com> CC: Peter Feuerer <peter@piie.net> Reported-by: Toby Huang <toby.huang@arm.com> Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2015-12-29Thermal: initialize thermal zone device correctlyZhang Rui
After thermal zone device registered, as we have not read any temperature before, thus tz->temperature should not be 0, which actually means 0C, and thermal trend is not available. In this case, we need specially handling for the first thermal_zone_device_update(). Both thermal core framework and step_wise governor is enhanced to handle this. And since the step_wise governor is the only one that uses trends, so it's the only thermal governor that needs to be updated. CC: <stable@vger.kernel.org> #3.18+ Tested-by: Manuel Krause <manuelkrause@netscape.net> Tested-by: szegad <szegadlo@poczta.onet.pl> Tested-by: prash <prash.n.rao@gmail.com> Tested-by: amish <ammdispose-arch@yahoo.com> Tested-by: Matthias <morpheusxyz123@yahoo.de> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2015-08-03thermal: consistently use int for temperaturesSascha Hauer
The thermal code uses int, long and unsigned long for temperatures in different places. Using an unsigned type limits the thermal framework to positive temperatures without need. Also several drivers currently will report temperatures near UINT_MAX for temperatures below 0°C. This will probably immediately shut the machine down due to overtemperature if started below 0°C. 'long' is 64bit on several architectures. This is not needed since INT_MAX °mC is above the melting point of all known materials. Consistently use a plain 'int' for temperatures throughout the thermal code and the drivers. This only changes the places in the drivers where the temperature is passed around as pointer, when drivers internally use another type this is not changed. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Lukasz Majewski <l.majewski@samsung.com> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Peter Feuerer <peter@piie.net> Cc: Punit Agrawal <punit.agrawal@arm.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Eduardo Valentin <edubezval@gmail.com> Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Jean Delvare <jdelvare@suse.de> Cc: Peter Feuerer <peter@piie.net> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Lukasz Majewski <l.majewski@samsung.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: linux-acpi@vger.kernel.org Cc: platform-driver-x86@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-omap@vger.kernel.org Cc: linux-samsung-soc@vger.kernel.org Cc: Guenter Roeck <linux@roeck-us.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Cc: Darren Hart <dvhart@infradead.org> Cc: lm-sensors@lm-sensors.org Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2015-02-06thermal: step_wise: spelling fixesBrian Norris
Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2014-10-11Merge branch 'thermal-core-fix' of .git into nextZhang Rui
2014-10-09thermal: step_wise: fix: Prevent from binary overflow when trend is droppingLukasz Majewski
It turns out that some boards can have instance->lower greater than 0 and when thermal trend is dropping it results with next_target equal to -1. Since the next_target is defined as unsigned long it is interpreted as 0xFFFFFFFF and larger than instance->upper. As a result the next_target is set to instance->upper which ramps up to maximal cooling device target when the temperature is steadily decreasing. Signed-off-by: Lukasz Majewski <l.majewski@samsung.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2014-07-29thermal: trace: Trace when temperature is above a trip pointPunit Agrawal
Create a new event to trace when the temperature is above a trip point. Use the trace-point when handling non-critical and critical trip pionts. Cc: Zhang Rui <rui.zhang@intel.com> Cc: Eduardo Valentin <edubezval@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2014-01-02thermal: debug: add debug statement for core and step_wiseAaron Lu
To ease debugging thermal problem, add these dynamic debug statements so that user do not need rebuild kernel to see these info. Based on a patch from Zhang Rui for debugging on bugzilla: https://bugzilla.kernel.org/attachment.cgi?id=98671 A sample output after we turn on dynamic debug with the following cmd: # echo 'module thermal_sys +fp' > /sys/kernel/debug/dynamic_debug/control is like: [ 355.147627] update_temperature: thermal thermal_zone0: last_temperature=52000, current_temperature=55000 [ 355.147636] thermal_zone_trip_update: thermal thermal_zone0: Trip1[type=1,temp=79000]:trend=2,throttle=0 [ 355.147644] get_target_state: thermal cooling_device8: cur_state=0 [ 355.147647] thermal_zone_trip_update: thermal cooling_device8: old_target=-1, target=-1 [ 355.147652] get_target_state: thermal cooling_device7: cur_state=0 [ 355.147655] thermal_zone_trip_update: thermal cooling_device7: old_target=-1, target=-1 [ 355.147660] get_target_state: thermal cooling_device6: cur_state=0 [ 355.147663] thermal_zone_trip_update: thermal cooling_device6: old_target=-1, target=-1 [ 355.147668] get_target_state: thermal cooling_device5: cur_state=0 [ 355.147671] thermal_zone_trip_update: thermal cooling_device5: old_target=-1, target=-1 [ 355.147678] thermal_zone_trip_update: thermal thermal_zone0: Trip2[type=0,temp=90000]:trend=1,throttle=0 [ 355.147776] get_target_state: thermal cooling_device0: cur_state=0 [ 355.147783] thermal_zone_trip_update: thermal cooling_device0: old_target=-1, target=-1 [ 355.147792] thermal_zone_trip_update: thermal thermal_zone0: Trip3[type=0,temp=80000]:trend=1,throttle=0 [ 355.147845] get_target_state: thermal cooling_device1: cur_state=0 [ 355.147849] thermal_zone_trip_update: thermal cooling_device1: old_target=-1, target=-1 [ 355.147856] thermal_zone_trip_update: thermal thermal_zone0: Trip4[type=0,temp=70000]:trend=1,throttle=0 [ 355.147904] get_target_state: thermal cooling_device2: cur_state=0 [ 355.147908] thermal_zone_trip_update: thermal cooling_device2: old_target=-1, target=-1 [ 355.147915] thermal_zone_trip_update: thermal thermal_zone0: Trip5[type=0,temp=60000]:trend=1,throttle=0 [ 355.147963] get_target_state: thermal cooling_device3: cur_state=0 [ 355.147967] thermal_zone_trip_update: thermal cooling_device3: old_target=-1, target=-1 [ 355.147973] thermal_zone_trip_update: thermal thermal_zone0: Trip6[type=0,temp=55000]:trend=1,throttle=1 [ 355.148022] get_target_state: thermal cooling_device4: cur_state=0 [ 355.148025] thermal_zone_trip_update: thermal cooling_device4: old_target=-1, target=1 [ 355.148036] thermal_cdev_update: thermal cooling_device4: zone0->target=1 [ 355.169279] thermal_cdev_update: thermal cooling_device4: set to state 1 Signed-off-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Eduardo Valentin <eduardo.valentin@ti.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-08-15thermal: step_wise: return instance->target by defaultEduardo Valentin
In case the trend is not changing or when there is no request for throttling, it is expected that the instance would not change its requested target. This patch improves the code implementation to cover for this expected behavior. With current implementation, the instance will always reset to cdev.cur_state, even in not expected cases, like those mentioned above. This patch changes the step_wise governor implementation of get_target so that we accomplish: (a) - default value will be current instance->target, so we do not change the thermal instance target unnecessarily. (b) - the code now it is clear about what is the intention. There is a clear statement of what are the expected outcomes (c) - removal of hardcoded constants, now it is put in use the THERMAL_NO_TARGET macro. (d) - variable names are also improved so that reader can clearly understand the difference between instance cur target, next target and cdev cur_state. Cc: Zhang Rui <rui.zhang@intel.com> Cc: Durgadoss R <durgadoss.r@intel.com> Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reported-by: Ruslan Ruslichenko <ruslan.ruslichenko@ti.com> Signed-of-by: Eduardo Valentin <eduardo.valentin@ti.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-08-15thermal: step_wise: cdev only needs update on a new target stateShawn Guo
The cooling device only needs update on a new target state. Since we already check old target in thermal_zone_trip_update(), we can do one more check to see if it's a new target state. If not, we can reasonably save some uncecesary code execution. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Eduardo Valentin <eduardo.valentin@ti.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-04-14Thermal: build thermal governors into thermal_sys moduleZhang Rui
The thermal governors are part of the thermal framework, rather than a seperate feature/module. Because the generic thermal layer can not work without thermal governors, and it must load the thermal governors during its initialization. Build them into one module in this patch. This also fix a problem that the generic thermal layer does not work when CONFIG_THERMAL=m and CONFIG_THERMAL_GOV_XXX=y. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Eduardo Valentin <eduardo.valentin@ti.com> Acked-by: Durgadoss R <durgadoss.r@intel.com>
2013-04-12thermal: step_wise: set throttle target within thermal instance limitsAndrew Bresticker
When selecting a target cooling state in get_target_state(), make sure that the state is at least as high as the minimum when the temperature is rising and at least as low as the maximum when the temperature is falling. This is necessary because, in the THREAML_TREND_RAISING and THERMAL_TREND_DROPPING cases, the current state may only be incremented or decremented by one even if it is outside the bounds of the thermal instance. This might occur, for example, if the CPU is heating up and hits a thermal trip point for the first time when it's frequency is much higher than the range specified by the thermal instance corresponding to the trip point. Signed-off-by: Andrew Bresticker <abrestic@chromium.org> Acked-by: Eduardo Valentin <eduardo.valentin@ti.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-01-04step_wise: Unify the code for both throttle and dethrottleZhang Rui
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-01-04Introduce THERMAL_TREND_RAISE/DROP_FULL support for step_wise governorZhang Rui
step_wise governor should set the device cooling state to upper/lower limit directly when THERMAL_TREND_RAISE/DROP_FULL. Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2012-12-12Thermal: Fix DEFAULT_THERMAL_GOVERNORZhang Rui
Fix DEFAULT_THERMAL_GOVERNOR to be consistant with the default governor selected in kernel config file. Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2012-11-05thermal: step_wise: Add missing static storage class specifiersSachin Kamat
Fixes the following sparse warnings: drivers/thermal/step_wise.c:153:5: warning: symbol 'step_wise_throttle' was not declared. Should it be static? drivers/thermal/step_wise.c:172:25: warning: symbol 'thermal_gov_step_wise' was not declared. Should it be static? Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Durgadoss R <durgadoss.r@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2012-11-05Thermal: Introduce a step_wise thermal governorDurgadoss R
This patch adds a simple step_wise governor to the generic thermal layer. This algorithm throttles the cooling devices in a linear fashion. If the 'trend' is heating, it throttles by one step. And if the thermal trend is cooling it de-throttles by one step. This actually moves the throttling logic from thermal_sys.c and puts inside step_wise.c, without any change. Signed-off-by: Durgadoss R <durgadoss.r@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>