summaryrefslogtreecommitdiff
path: root/drivers/thermal/thermal_core.c
AgeCommit message (Collapse)Author
2024-08-22thermal: core: Introduce .should_bind() thermal zone callbackRafael J. Wysocki
The current design of the code binding cooling devices to trip points in thermal zones is convoluted and hard to follow. Namely, a driver that registers a thermal zone can provide .bind() and .unbind() operations for it, which are required to call either thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip(), respectively, or thermal_zone_bind_cooling_device() and thermal_zone_unbind_cooling_device(), respectively, for every relevant trip point and the given cooling device. Moreover, if .bind() is provided and .unbind() is not, the cleanup necessary during the removal of a thermal zone or a cooling device may not be carried out. In other words, the core relies on the thermal zone owners to do the right thing, which is error prone and far from obvious, even though all of that is not really necessary. Specifically, if the core could ask the thermal zone owner, through a special thermal zone callback, whether or not a given cooling device should be bound to a given trip point in the given thermal zone, it might as well carry out all of the binding and unbinding by itself. In particular, the unbinding can be done automatically without involving the thermal zone owner at all because all of the thermal instances associated with a thermal zone or cooling device going away must be deleted regardless. Accordingly, introduce a new thermal zone operation, .should_bind(), that can be invoked by the thermal core for a given thermal zone, trip point and cooling device combination in order to check whether or not the cooling device should be bound to the trip point at hand. It takes an additional cooling_spec argument allowing the thermal zone owner to specify the highest and lowest cooling states of the cooling device and its weight for the given trip point binding. Make the thermal core use this operation, if present, in the absence of .bind() and .unbind(). Note that .should_bind() will be called under the thermal zone lock. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Huisong Li <lihuisong@huawei.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/9334403.CDJkKcVGEf@rjwysocki.net
2024-08-22thermal: core: Move thermal zone locking out of bind/unbind functionsRafael J. Wysocki
Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip() acquire the thermal zone lock, the locking rules for their callers get complicated. In particular, the thermal zone lock cannot be acquired in any code path leading to one of these functions even though it might be useful to do so. To address this, remove the thermal zone locking from both these functions, add lockdep assertions for the thermal zone lock to both of them and make their callers acquire the thermal zone lock instead. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/3837835.kQq0lBPeGt@rjwysocki.net
2024-08-22thermal: core: Drop redundant thermal instance checksRafael J. Wysocki
Because the trip and cdev pointers are sufficient to identify a thermal instance holding them unambiguously, drop the additional thermal zone checks from two loops walking the list of thermal instances in a thermal zone. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/10527734.nUPlyArG6x@rjwysocki.net
2024-08-22thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()Rafael J. Wysocki
It is not necessary to look up the thermal zone and the cooling device in the respective global lists to check whether or not they are registered. It is sufficient to check whether or not their respective list nodes are empty for this purpose. Use the above observation to simplify thermal_bind_cdev_to_trip(). In addition, eliminate an unnecessary ternary operator from it. Moreover, add lockdep_assert_held() for thermal_list_lock to it because that lock must be held by its callers when it is running. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/3324214.44csPzL39Z@rjwysocki.net
2024-08-22thermal: core: Fold two functions into their respective callersRafael J. Wysocki
Fold bind_cdev() into __thermal_cooling_device_register() and bind_tz() into thermal_zone_device_register_with_trips() to reduce code bloat and make it somewhat easier to follow the code flow. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/2962184.e9J7NaK4W3@rjwysocki.net
2024-08-19thermal/core: Compute low and high boundaries in thermal_zone_device_update()Daniel Lezcano
In order to set the scene for the thresholds support which have to manipulate the low and high temperature boundaries for the interrupt support, we must pass the low and high values to the incoming thresholds routine. The variables are set from the thermal_zone_set_trips() where the function loops the thermal trips to figure out the next and the previous temperatures to set the interrupt to be triggered when they are crossed. These variables will be needed by the function in charge of handling the thresholds in the incoming changes but they are local to the aforementioned function thermal_zone_set_trips(). Move the low and high boundaries computation out of the function in thermal_zone_device_update() so they are accessible from there. The positive side effect is they are computed in the same loop as handle_thermal_trip(), so we remove one loop. Co-developed-by: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20240816081241.1925221-2-daniel.lezcano@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-16thermal: gov_bang_bang: Use governor_data to reduce overheadRafael J. Wysocki
After running once, the for_each_trip_desc() loop in bang_bang_manage() is pure needless overhead because it is not going to make any changes unless a new cooling device has been bound to one of the trips in the thermal zone or the system is resuming from sleep. For this reason, make bang_bang_manage() set governor_data for the thermal zone and check it upfront to decide whether or not it needs to do anything. However, governor_data needs to be reset in some cases to let bang_bang_manage() know that it should walk the trips again, so add an .update_tz() callback to the governor and make the core additionally invoke it during system resume. To avoid affecting the other users of that callback unnecessarily, add a special notification reason for system resume, THERMAL_TZ_RESUME, and also pass it to __thermal_zone_device_update() called during system resume for consistency. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Kästle <peter@piie.net> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Cc: 6.10+ <stable@vger.kernel.org> # 6.10+ Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net
2024-07-24thermal: core: Back off when polling thermal zones on errorsRafael J. Wysocki
Commit a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid") introduced a polling mechanism by which the thermal core attampts to get a valid temperature value for thermal zones where the .get_temp() callback returns errors to start with (for example, due to initialization ordering woes). However, this polling is carried out periodically ad infinitum and every iteration of it causes a message to be printed to the kernel log which means a lot of log noise on systems where there are thermal zones that never get ready for some reason. It is also not really useful to continuously poll thermal zones that never respond. To address this, modify the thermal core to increase the delay between consecutive thermal zone temperature checks after every check that fails until it reaches a certain maximum value. At that point, the thermal zone in question will be disabled, but user space will be able to reenable it if it believes that the failure is transient. Also change the code to print messages regarding failed temperature checks to the kernel log only twice, once when the thermal zone's .get_temp() callback returns an error for the first time and once when disabling the given thermal zone. In addition, a dev_crit() message will be printed at that point if the given thermal zone contains a critical trip point to notify the system operator about the situation. Fixes: a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid") Link: https://lore.kernel.org/linux-acpi/CAGnHSE=RyPK++UG0-wAtVKgeJxe0uzFYgLxm+RUOKKoQquW=Ow@mail.gmail.com/ Reported-by: Tom Yan <tom.ty89@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/2962033.e9J7NaK4W3@rjwysocki.net
2024-07-23thermal: trip: Split thermal_zone_device_set_mode()Rafael J. Wysocki
Pull a wrapper around thermal zone .change_mode() callback out of thermal_zone_device_set_mode() because it will be used elsewhere subsequently. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/2206793.irdbgypaU6@rjwysocki.net
2024-07-18thermal: core: Allow thermal zones to tell the core to ignore themRafael J. Wysocki
The iwlwifi wireless driver registers a thermal zone that is only needed when the network interface handled by it is up and it wants that thermal zone to be effectively ignored by the core otherwise. Before commit a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid") that could be achieved by returning an error code from the thermal zone's .get_temp() callback because the core did not really handle errors returned by it almost at all. However, commit a8a261774466 made the core attempt to recover from the situation in which the temperature of a thermal zone cannot be determined due to errors returned by its .get_temp() and is always invalid from the core's perspective. That was done because there are thermal zones in which .get_temp() returns errors to start with due to some difficulties related to the initialization ordering, but then it will start to produce valid temperature values at one point. Unfortunately, the simple approach taken by commit a8a261774466, which is to poll the thermal zone periodically until its .get_temp() callback starts to return valid temperature values, is at odds with the special thermal zone in iwlwifi in which .get_temp() may always return an error because its network interface may always be down. If that happens, every attempt to invoke the thermal zone's .get_temp() callback resulting in an error causes the thermal core to print a dev_warn() message to the kernel log which is super-noisy. To address this problem, make the core handle the case in which .get_temp() returns 0, but the temperature value returned by it is not actually valid, in a special way. Namely, make the core completely ignore the invalid temperature value coming from .get_temp() in that case, which requires folding in update_temperature() into its caller and a few related changes. On the iwlwifi side, modify iwl_mvm_tzone_get_temp() to return 0 and put THERMAL_TEMP_INVALID into the temperature return memory location instead of returning an error when the firmware is not running or it is not of the right type. Also, to clearly separate the handling of invalid temperature values from the thermal zone initialization, introduce a special THERMAL_TEMP_INIT value specifically for the latter purpose. Fixes: a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid") Closes: https://lore.kernel.org/linux-pm/20240715044527.GA1544@sol.localdomain/ Reported-by: Eric Biggers <ebiggers@kernel.org> Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201761 Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Cc: 6.10+ <stable@vger.kernel.org> # 6.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/4950004.31r3eYUQgx@rjwysocki.net [ rjw: Rebased on top of the current mainline ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-15Merge branch 'thermal-core'Rafael J. Wysocki
Merge updates related to the thermal core for 6.11-rc1: - Redesign the .set_trip_temp() thermal zone callback to take a trip pointer instead of a trip ID and update its users (Rafael Wysocki). - Avoid using invalid combinations of polling_delay and passive_delay thermal zone parameters (Rafael Wysocki). - Update a cooling device registration function to take a const argument (Krzysztof Kozlowski). - Make the uniphier thermal driver use thermal_zone_for_each_trip() for walking trip points (Rafael Wysocki). * thermal-core: thermal: core: Add sanity checks for polling_delay and passive_delay thermal: trip: Fold __thermal_zone_get_trip() into its caller thermal: trip: Pass trip pointer to .set_trip_temp() thermal zone callback thermal: imx: Drop critical trip check from imx_set_trip_temp() thermal: trip: Add conversion macros for thermal trip priv field thermal: helpers: Introduce thermal_trip_is_bound_to_cdev() thermal: core: Change passive_delay and polling_delay data type thermal: core: constify 'type' in devm_thermal_of_cooling_device_register() thermal: uniphier: Use thermal_zone_for_each_trip() for walking trip points
2024-07-12thermal: core: Add sanity checks for polling_delay and passive_delayRafael J. Wysocki
If polling_delay is nonzero and passive_delay is greater than polling_delay, the thermal zone temperature will be updated less often when tz->passive is nonzero, which is not as expected. Make the thermal zone registration fail with -EINVAL in that case as this is a clear thermal zone configuration mistake. If polling_delay is nonzero and passive_delay is 0, which is regarded as a valid thermal zone configuration, the thermal zone will use polling except when tz->passive is nonzero. However, the expected behavior in that case is to continue temperature polling with the same delay value regardless of tz->passive, so set passive_delay to the polling_delay value then. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/5802156.DvuYhMxLoT@rjwysocki.net
2024-07-10Merge back thermal control material for 6.11.Rafael J. Wysocki
2024-07-09thermal: core: Change passive_delay and polling_delay data typeRafael J. Wysocki
It is better to use unsigned int as the data type for the passive_delay and polling_delay arguments of thermal_zone_device_register_with_trips() because they are implicitly cast to unsigned int anyway in thermal_set_delay_jiffies() and if they happen to be negative at that point, the resulting behavior may not be as desired. Update the thermal_zone_device_register_with_trips() definition accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/5803791.DvuYhMxLoT@rjwysocki.net Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-08thermal: core: Fix list sorting in __thermal_zone_device_update()Rafael J. Wysocki
The order in which lists are sorted in __thermal_zone_device_update() is reverse with respect to what it should be due to a mistake in thermal_trip_notify_cmp(). Fix it and observe that it is not necessary to sort the lists in different orders. They can both be sorted in ascending order if way_down_list is walked in reverse order which allows the code to be slightly more straightforward (and less prone to silly mistakes). Fixes: 7454f2c42cce ("thermal: core: Sort trip point crossing notifications by temperature") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/12481676.O9o76ZdvQC@rjwysocki.net
2024-07-04thermal: core: Call monitor_thermal_zone() if zone temperature is invalidRafael J. Wysocki
Commit 202aa0d4bb53 ("thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid") caused __thermal_zone_device_update() to return early if the current thermal zone temperature was invalid. This was done to avoid running handle_thermal_trip() and governor callbacks in that case which led to confusion. However, it went too far because monitor_thermal_zone() still needs to be called even when the zone temperature is invalid to ensure that it will be updated eventually in case thermal polling is enabled and the driver has no other means to notify the core of zone temperature changes (for example, it does not register an interrupt handler or ACPI notifier). Also if the .set_trips() zone callback is expected to set up monitoring interrupts for a thermal zone, it has to be provided with valid boundaries and that can only happen if the zone temperature is known. Accordingly, to ensure that __thermal_zone_device_update() will run again after a failing zone temperature check, make it call monitor_thermal_zone() regardless of whether or not the zone temperature is valid and make the latter schedule a thermal zone temperature update if the zone temperature is invalid even if polling is not enabled for the thermal zone. Fixes: 202aa0d4bb53 ("thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid") Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/2764814.mvXUDI8C0e@rjwysocki.net [ rjw: Changed THERMAL_RECHECK_DELAY_MS to 250 ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-04thermal: core: constify 'type' in devm_thermal_of_cooling_device_register()Krzysztof Kozlowski
The 'type' string passed to thermal_of_cooling_device_register() is a 'const char *', so do the same in the devm interface. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20240703083141.96013-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-25Merge branch 'thermal-core'Rafael J. Wysocki
Merge thermal core changes for v6.11: - Fix and clean up several minor shortcomings in thermal debug (Rafael Wysocki). - Rename __thermal_zone_set_trips() to thermal_zone_set_trips() and make it use trip thresholds (Rafael Wysocki). - Use READ_ONCE() for lockless access to trip temperature and hysteresis (Rafael Wysocki). - Drop unnecessary cooling device target state checks from the Bang-Bang thermal governor (Rafael Wysocki). - Avoid invoking thermal governor .trip_crossed() callback for critical and hot trip points (Rafael Wysocki). * thermal-core: thermal: core: Avoid calling .trip_crossed() for critical and hot trips thermal: gov_bang_bang: Drop unnecessary cooling device target state checks thermal: trip: Use READ_ONCE() for lockless access to trip properties thermal: trip: Make thermal_zone_set_trips() use trip thresholds thermal: trip: Rename __thermal_zone_set_trips() to thermal_zone_set_trips() thermal: trip: Use common set of trip type names thermal/debugfs: Move some statements from under thermal_dbg->lock thermal/debugfs: Compute maximum temperature for mitigation episode as a whole thermal/debugfs: Adjust check for trips without statistics in tze_seq_show() thermal/debugfs: Fix up units in "mitigations" files thermal/debugfs: Print mitigation timestamp value in milliseconds thermal/debugfs: Do not extend mitigation episodes beyond system resume thermal/debugfs: Use helper to update trip point overstepping duration
2024-06-21Merge tag 'thermal-6.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "These fix the Mediatek lvts_thermal driver, the Intel int340x driver, and the thermal core (two issues related to system suspend). Specifics: - Remove the filtered mode for mt8188 from lvts_thermal as it is not supported on this platform and fail the lvts_thermal initialization when the golden temperature is zero as that means the efuse data is not correctly set (Julien Panis) - Update the processor_thermal part of the Intel int340x driver to support shared interrupts as the processor thermal device interrupt may in fact be shared with PCI devices (Srinivas Pandruvada) - Synchronize the suspend-prepare and post-suspend actions of the thermal PM notifier to avoid a destructive race condition and change the priority of that notifier to the minimum to avoid interference between the work items spawned by it and the other PM notifiers during system resume (Rafael Wysocki)" * tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: int340x: processor_thermal: Support shared interrupts thermal: core: Change PM notifier priority to the minimum thermal: core: Synchronize suspend-prepare and post-suspend actions thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188
2024-06-21Merge back thermal control material for v6.11.Rafael J. Wysocki
2024-06-14thermal: core: Change PM notifier priority to the minimumRafael J. Wysocki
It is reported that commit 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously") causes battery data in sysfs on Thinkpad P1 Gen2 to become invalid after a resume from S3 (and it is necessary to reboot the machine to restore correct battery data). Some investigation into the problem indicated that it happened because, after the commit in question, the ACPI battery PM notifier ran in parallel with thermal_zone_device_resume() for one of the thermal zones which apparently confused the platform firmware on the affected system. While the exact reason for the firmware confusion remains unclear, it is arguably not particularly relevant, and the expected behavior of the affected system can be restored by making the thermal PM notifier run at the lowest priority which avoids interference between work items spawned by it and the other PM notifiers (that will run before those work items now). Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218881 Reported-by: fhortner@yahoo.de Tested-by: fhortner@yahoo.de Cc: 6.8+ <stable@vger.kernel.org> # 6.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-14thermal: core: Synchronize suspend-prepare and post-suspend actionsRafael J. Wysocki
After commit 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously") it is theoretically possible that, if a system suspend starts immediately after a system resume, thermal_zone_device_resume() spawned by the thermal PM notifier for one of the thermal zones at the end of the system resume will run after the PM thermal notifier for the suspend-prepare action. If that happens, tz->suspended set by the latter will be reset by the former which may lead to unexpected consequences. To avoid that race, synchronize thermal_zone_device_resume() with the suspend-prepare thermal PM notifier with the help of additional bool field and completion in struct thermal_zone_device. Note that this also ensures running __thermal_zone_device_update() at least once for each thermal zone between system resume and the following system suspend in case it is needed to start thermal mitigation. Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-11thermal: core: Avoid calling .trip_crossed() for critical and hot tripsRafael J. Wysocki
Invoking the governor .trip_crossed() callback for critical and hot trips is pointless because they are handled directly by the core, so make thermal_governor_trip_crossed() avoid doing that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-11thermal: trip: Make thermal_zone_set_trips() use trip thresholdsRafael J. Wysocki
Modify thermal_zone_set_trips() to use trip thresholds instead of computing the low temperature for each trip to avoid deriving both the high and low temperature levels from the same trip (which may happen if the zone temperature falls into the hysteresis range of one trip). Accordingly, make __thermal_zone_device_update() call thermal_zone_set_trips() later, when threshold values have been updated for all trips. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal: trip: Rename __thermal_zone_set_trips() to thermal_zone_set_trips()Rafael J. Wysocki
Drop the pointless double underline prefix from the function name as per the subject. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal/debugfs: Do not extend mitigation episodes beyond system resumeRafael J. Wysocki
Because thermal zone handling by the thermal core is started from scratch during resume from system-wide suspend, prevent the debug code from extending mitigation episodes beyond that point by ending the mitigation episode currently in progress, if any, for each thermal zone. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-07thermal: core: Do not fail cdev registration because of invalid initial stateRafael J. Wysocki
It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()") causes the ACPI fan driver to fail probing on some systems which turns out to be due to the _FST control method returning an invalid value until _FSL is first evaluated for the given fan. If this happens, the .get_cur_state() cooling device callback returns an error and __thermal_cooling_device_register() fails as uses that callback after commit 31a0fa0019b0. Arguably, _FST should not return an invalid value even if it is evaluated before _FSL, so this may be regarded as a platform firmware issue, but at the same time it is not a good enough reason for failing the cooling device registration where the initial cooling device state is only needed to initialize a thermal debug facility. Accordingly, modify __thermal_cooling_device_register() to avoid calling thermal_debug_cdev_add() instead of returning an error if the initial .get_cur_state() callback invocation fails. Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()") Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@collabora.com Reported-by: Laura Nao <laura.nao@collabora.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Laura Nao <laura.nao@collabora.com>
2024-05-27thermal: trip: Trigger trip down notifications when trips involved in ↵Rafael J. Wysocki
mitigation become invalid When a trip point becomes invalid after being crossed on the way up, it is involved in a mitigation episode that needs to be adjusted to compensate for the trip going away. For this reason, introduce thermal_zone_trip_down() as a wrapper around thermal_trip_crossed() and make thermal_zone_set_trip_temp() call it if the new temperature of the trip at hand is equal to THERMAL_TEMP_INVALID and it has been crossed on the way up to trigger all of the necessary adjustments in user space, the thermal debug code and the zone governor. Fixes: 8c69a777e480 ("thermal: core: Fix the handling of invalid trip points") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27thermal: core: Introduce thermal_trip_crossed()Rafael J. Wysocki
Add a helper function called thermal_trip_crossed() to be invoked by __thermal_zone_device_update() in order to notify user space, the thermal debug code and the zone governor about trip crossing. Subsequently, this will also be used in the case when a trip point becomes invalid after being crossed on the way up. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-17thermal: core: Fix the handling of invalid trip pointsRafael J. Wysocki
Commit 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed") overlooked the case when a trip point that has started as invalid is set to a valid temperature later. Namely, the initial threshold value for all trips is zero, so if a previously invalid trip becomes valid and its (new) low temperature is above the zone temperature, a spurious trip crossing notification will occur and it may trigger the WARN_ON() in handle_thermal_trip(). To address this, set the initial threshold for all trips to INT_MAX. There is also the case when a valid writable trip becomes invalid that requires special handling. First, in accordance with the change mentioned above, the trip's threshold needs to be set to INT_MAX to avoid the same issue. Second, if the trip in question is passive and it has been crossed by the thermal zone temperature on the way up, the zone's passive count has been incremented and it is in the passive polling mode, so its passive count needs to be adjusted to allow the passive polling to be turned off eventually. Fixes: 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed") Fixes: 042a3d80f118 ("thermal: core: Move passive polling management to the core") Reported-by: Zhang Rui <zhang.rui@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Wendy Wang <wendy.wang@intel.com>
2024-04-30thermal: core: Move passive polling management to the coreRafael J. Wysocki
Passive polling is enabled by setting the 'passive' field in struct thermal_zone_device to a positive value so long as the 'passive_delay_jiffies' field is greater than zero. It causes the thermal core to actively check the thermal zone temperature periodically which in theory should be done after crossing a passive trip point on the way up in order to allow governors to react more rapidly to temperature changes and adjust mitigation more precisely. However, the 'passive' field in struct thermal_zone_device is currently managed by governors which is quite problematic. First of all, only two governors, Step-Wise and Power Allocator, update that field at all, so the other governors do not benefit from passive polling, although in principle they should. Moreover, if the zone governor is changed from, say, Step-Wise to Fair-Share after 'passive' has been incremented by the former, it is not going to be reset back to zero by the latter even if the zone temperature falls down below all passive trip points. For this reason, make handle_thermal_trip() increment 'passive' to enable passive polling for the given thermal zone whenever a passive trip point is crossed on the way up and decrement it whenever a passive trip point is crossed on the way down. Also remove the 'passive' field updates from governors and additionally clear it in thermal_zone_device_init() to prevent passive polling from being enabled after a system resume just beacuse it was enabled before suspending the system. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-30thermal: core: Do not call handle_thermal_trip() if zone temperature is invalidRafael J. Wysocki
Make __thermal_zone_device_update() bail out if update_temperature() fails to update the zone temperature because __thermal_zone_get_temp() has returned an error and the current zone temperature is THERMAL_TEMP_INVALID (user space receiving netlink thermal messages, thermal debug code and thermal governors may get confused otherwise). Fixes: 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()Rafael J. Wysocki
If cdev_dt_seq_show() runs before the first state transition of a cooling device, it will not print any state residency information for it, even though it might be reasonably expected to print residency information for the initial state of the cooling device. For this reason, rearrange the code to get the initial state of a cooling device at the registration time and pass it to thermal_debug_cdev_add(), so that the latter can create a duration record for that state which will allow cdev_dt_seq_show() to print its residency information. Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information") Reported-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-24thermal: core: Introduce thermal_governor_trip_crossed()Rafael J. Wysocki
Add a wrapper around the .trip_crossed() governor callback invocation to reduce code duplications slightly and improve the code layout in __thermal_zone_device_update(). No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-24thermal/debugfs: Rename thermal_debug_update_temp() to ↵Rafael J. Wysocki
thermal_debug_update_trip_stats() Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats() which is a better match for the purpose of the function. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24thermal/debugfs: Avoid excessive updates of trip point statisticsRafael J. Wysocki
Since thermal_debug_update_temp() is called before invoking thermal_debug_tz_trip_down() for the trips that were crossed by the zone temperature on the way up, it updates the statistics for them as though the current zone temperature was above the low temperature of each of them. However, if a given trip has just been crossed on the way down, the zone temperature is in fact below its low temperature, but this is handled by thermal_debug_tz_trip_down() running after the update of the trip statistics. The remedy is to call thermal_debug_update_temp() after thermal_debug_tz_trip_down() has been invoked for all of the trips in question, but then thermal_debug_tz_trip_up() needs to be adjusted, so it does not update the statistics for the trips that has just been crossed on the way up, as that will be taken care of by thermal_debug_update_temp() down the road. Modify the code accordingly. Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24thermal: core: Relocate critical and hot trip handlingRafael J. Wysocki
Modify handle_thermal_trip() to call handle_critical_trips() only after finding that the trip temperature has been crossed on the way up and remove the redundant temperature check from the latter. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24thermal: core: Drop the .throttle() governor callbackRafael J. Wysocki
Since all of the governors in the tree have been switched over to using the new callbacks, either .trip_crossed() or .manage(), the .throttle() governor callback is not used any more, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-23thermal: core: Introduce .manage() callback for thermal governorsRafael J. Wysocki
Introduce a new thermal governor callback called .manage() that will be invoked once per thermal zone update after processing all of the trip points in the core. This will allow governors that look at multiple trip points together to check all of them in a consistent configuration, so they don't need to play tricks with skipping .throttle() invocations that they are not interested in and they can avoid carrying out the same computations for multiple times in one cycle. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-19thermal: core: Introduce .trip_crossed() callback for thermal governorsRafael J. Wysocki
Introduce a new thermal governor callback called .trip_crossed() that will be invoked whenever a trip point is crossed by the zone temperature, either on the way up or on the way down. The trip crossing direction information will be passed to it and if multiple trips are crossed in the same direction during one thermal zone update, the new callback will be invoked for them in temperature order, either ascending or descending, depending on the trip crossing direction. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-08thermal: core: Sort trip point crossing notifications by temperatureRafael J. Wysocki
If multiple trip points are crossed in one go and the trips table in the thermal zone device object is not sorted, the corresponding trip point crossing notifications sent to user space will not be ordered either. Moreover, if the trips table is sorted by trip temperature in ascending order, the trip crossing notifications on the way up will be sent in that order too, but the trip crossing notifications on the way down will be sent in the reverse order. This is generally confusing and it is better to make the kernel send the notifications in the order of growing (on the way up) or falling (on the way down) trip temperature. To achieve that, instead of sending a trip crossing notification and recording a trip crossing event in the statistics right away from handle_thermal_trip(), put the trip in question on a list that will be sorted by __thermal_zone_device_update() after processing all of the trips and before sending the notifications and recording trip crossing events. Link: https://lore.kernel.org/linux-pm/20240306085428.88011-1-daniel.lezcano@linaro.org/ Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-08thermal: core: Send trip crossing notifications at init time if neededRafael J. Wysocki
If a trip point is already exceeded by the zone temperature at the initialization time, no trip crossing notification is send regarding this even though mitigation should be started then. Address this by rearranging the code in handle_thermal_trip() to send a trip crossing notification for trip points already exceeded by the zone temperature initially which also allows to reduce its size by using the observation that the initialization and regular trip crossing on the way up become the same case then. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-08thermal: core: Rewrite comments in handle_thermal_trip()Rafael J. Wysocki
Make the comments regarding trip crossing and threshold updates in handle_thermal_trip() slightly more clear. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-08thermal: core: Move threshold out of struct thermal_tripRafael J. Wysocki
The threshold field in struct thermal_trip is only used internally by the thermal core and it is better to prevent drivers from misusing it. It also takes some space unnecessarily in the trip tables passed by drivers to the core during thermal zone registration. For this reason, introduce struct thermal_trip_desc as a wrapper around struct thermal_trip, move the threshold field directly into it and make the thermal core store struct thermal_trip_desc objects in the internal thermal zone trip tables. Adjust all of the code using trip tables in the thermal core accordingly. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-03-05thermal: core: Remove excess empty line from a commentFlavio Suligoi
The first and the third lines of the kerneldoc comment for: thermal_zone_device_set_polling() belong to the same sentences, so join them together. Signed-off-by: Flavio Suligoi <f.suligoi@asem.it> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-27thermal: core: Eliminate writable trip points masksRafael J. Wysocki
All of the thermal_zone_device_register_with_trips() callers pass zero writable trip points masks to it, so drop the mask argument from that function and update all of its callers accordingly. This also removes the artificial trip points per zone limit of 32, related to using writable trip points masks. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-02-27thermal: core: Add flags to struct thermal_tripRafael J. Wysocki
In order to allow thermal zone creators to specify the writability of trip point temperature and hysteresis on a per-trip basis, add a flags field to struct thermal_trip and define flags to represent the desired trip properties. Also make thermal_zone_device_register_with_trips() set the THERMAL_TRIP_FLAG_RW_TEMP flag for all trips covered by the writable trips mask passed to it and modify the thermal sysfs code to look at the trip flags instead of using the writable trips mask directly or checking the presence of the .set_trip_hyst() zone callback. Additionally, make trip_point_temp_store() and trip_point_hyst_store() fail with an error code if the trip passed to one of them has THERMAL_TRIP_FLAG_RW_TEMP or THERMAL_TRIP_FLAG_RW_HYST, respectively, clear in its flags. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-27thermal: core: Move initial num_trips assignment before memcpy()Nathan Chancellor
When booting a CONFIG_FORTIFY_SOURCE=y kernel compiled with a toolchain that supports __counted_by() (such as clang-18 and newer), there is a panic on boot: [ 2.913770] memcpy: detected buffer overflow: 72 byte write of buffer size 0 [ 2.920834] WARNING: CPU: 2 PID: 1 at lib/string_helpers.c:1027 __fortify_report+0x5c/0x74 ... [ 3.039208] Call trace: [ 3.041643] __fortify_report+0x5c/0x74 [ 3.045469] __fortify_panic+0x18/0x20 [ 3.049209] thermal_zone_device_register_with_trips+0x4c8/0x4f8 This panic occurs because trips is counted by num_trips but num_trips is assigned after the call to memcpy(), so the fortify checks think the buffer size is zero because tz was allocated with kzalloc(). Move the num_trips assignment before the memcpy() to resolve the panic and ensure that the fortify checks work properly. Fixes: 9b0a62758665 ("thermal: core: Store zone trips table in struct thermal_zone_device") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-23thermal: core: Store zone ops in struct thermal_zone_deviceRafael J. Wysocki
The current code requires thermal zone creators to pass pointers to writable ops structures to thermal_zone_device_register_with_trips() which needs to modify the target struct thermal_zone_device_ops object if the "critical" operation in it is NULL. Moreover, the callers of thermal_zone_device_register_with_trips() are required to hold on to the struct thermal_zone_device_ops object passed to it until the given thermal zone is unregistered. Both of these requirements are quite inconvenient, so modify struct thermal_zone_device to contain struct thermal_zone_device_ops as field and make thermal_zone_device_register_with_trips() copy the contents of the struct thermal_zone_device_ops passed to it via a pointer (which can be const now) to that field. Also adjust the code using thermal zone ops accordingly and modify thermal_of_zone_register() to use a local ops variable during thermal zone registration so ops do not need to be freed in thermal_of_zone_unregister() any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-02-23thermal: core: Store zone trips table in struct thermal_zone_deviceRafael J. Wysocki
The current code expects thermal zone creators to pass a pointer to a writable trips table to thermal_zone_device_register_with_trips() and that trips table is then used by the thermal core going forward. Consequently, the callers of thermal_zone_device_register_with_trips() are required to hold on to the trips table passed to it until the given thermal zone is unregistered, at which point the trips table can be freed, but at the same time they are not expected to access that table directly. This is both error prone and confusing. To address it, turn the trips table pointer in struct thermal_zone_device into a flex array (counted by its num_trips field), allocate it during thermal zone device allocation and copy the contents of the trips table supplied by the zone creator (which can be const now) into it, which will allow the callers of thermal_zone_device_register_with_trips() to drop their trip tables right after the zone registration. This requires the imx thermal driver to be adjusted to store the new temperature in its internal trips table in imx_set_trip_temp(), because it will be separate from the core's trips table now and it has to be explicitly kept in sync with the latter. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>