summaryrefslogtreecommitdiff
path: root/drivers/thermal
AgeCommit message (Collapse)Author
2024-07-04thermal: uniphier: Use thermal_zone_for_each_trip() for walking trip pointsRafael J. Wysocki
It is generally inefficient to iterate over trip indices and call thermal_zone_get_trip() every time to get the struct thermal_trip corresponding to the given trip index, so modify the uniphier thermal driver to use thermal_zone_for_each_trip() for walking trips. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Link: https://patch.msgid.link/2148114.bB369e8A3T@kreacher [ rjw: Add missing return statement, remove unused local variable ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-27Merge back thermal control material for v6.11.Rafael J. Wysocki
2024-06-25Merge branch 'thermal-core'Rafael J. Wysocki
Merge thermal core changes for v6.11: - Fix and clean up several minor shortcomings in thermal debug (Rafael Wysocki). - Rename __thermal_zone_set_trips() to thermal_zone_set_trips() and make it use trip thresholds (Rafael Wysocki). - Use READ_ONCE() for lockless access to trip temperature and hysteresis (Rafael Wysocki). - Drop unnecessary cooling device target state checks from the Bang-Bang thermal governor (Rafael Wysocki). - Avoid invoking thermal governor .trip_crossed() callback for critical and hot trip points (Rafael Wysocki). * thermal-core: thermal: core: Avoid calling .trip_crossed() for critical and hot trips thermal: gov_bang_bang: Drop unnecessary cooling device target state checks thermal: trip: Use READ_ONCE() for lockless access to trip properties thermal: trip: Make thermal_zone_set_trips() use trip thresholds thermal: trip: Rename __thermal_zone_set_trips() to thermal_zone_set_trips() thermal: trip: Use common set of trip type names thermal/debugfs: Move some statements from under thermal_dbg->lock thermal/debugfs: Compute maximum temperature for mitigation episode as a whole thermal/debugfs: Adjust check for trips without statistics in tze_seq_show() thermal/debugfs: Fix up units in "mitigations" files thermal/debugfs: Print mitigation timestamp value in milliseconds thermal/debugfs: Do not extend mitigation episodes beyond system resume thermal/debugfs: Use helper to update trip point overstepping duration
2024-06-25thermal: gov_step_wise: Go straight to instance->lower when mitigation is overRafael J. Wysocki
Commit b6846826982b ("thermal: gov_step_wise: Restore passive polling management") attempted to fix a Step-Wise thermal governor issue introduced by commit 042a3d80f118 ("thermal: core: Move passive polling management to the core"), which caused the governor to leave cooling devices in high states, by partially reverting that commit. However, this turns out to be insufficient on some systems due to interactions between the governor code restored by commit b6846826982b and the passive polling management in the thermal core. For this reason, revert commit b6846826982b and make the governor set the target cooling device state to the "lower" one as soon as the zone temperature falls below the threshold of the trip point corresponding to the given thermal instance, which means that thermal mitigation is not necessary any more. Before this change the "lower" cooling device state would be reached in steps through the passive polling mechanism which was questionable for three reasons: (1) cooling device were kept in high states when that was not necessary (and it could adversely impact performance), (2) it only worked for thermal zones with nonzero passive_delay_jiffies value, and (3) passive polling belongs to the core and should not be hijacked by governors for their internal purposes. Fixes: b6846826982b ("thermal: gov_step_wise: Restore passive polling management") Closes: https://lore.kernel.org/linux-pm/6759ce9f-281d-4fcd-bb4c-b784a1cc5f6e@oldschoolsolutions.biz Reported-by: Jens Glathe <jens.glathe@oldschoolsolutions.biz> Tested-by: Jens Glathe <jens.glathe@oldschoolsolutions.biz> Link: https://patch.msgid.link/12464461.O9o76ZdvQC@rjwysocki.net Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Steev Klimaszewski <steev@kali.org> Tested-by: Johan Hovold <johan+linaro@kernel.org>
2024-06-21Merge tag 'thermal-6.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "These fix the Mediatek lvts_thermal driver, the Intel int340x driver, and the thermal core (two issues related to system suspend). Specifics: - Remove the filtered mode for mt8188 from lvts_thermal as it is not supported on this platform and fail the lvts_thermal initialization when the golden temperature is zero as that means the efuse data is not correctly set (Julien Panis) - Update the processor_thermal part of the Intel int340x driver to support shared interrupts as the processor thermal device interrupt may in fact be shared with PCI devices (Srinivas Pandruvada) - Synchronize the suspend-prepare and post-suspend actions of the thermal PM notifier to avoid a destructive race condition and change the priority of that notifier to the minimum to avoid interference between the work items spawned by it and the other PM notifiers during system resume (Rafael Wysocki)" * tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: int340x: processor_thermal: Support shared interrupts thermal: core: Change PM notifier priority to the minimum thermal: core: Synchronize suspend-prepare and post-suspend actions thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188
2024-06-21thermal: intel: int340x: Enable WLT and power floor support for Lunar LakeSrinivas Pandruvada
Add feature flgas for WLT and power floor for Lunar Lake. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619172109.497639-4-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: int340x: Support MSI interrupt for Lunar LakeSrinivas Pandruvada
The legacy PCI interrupt is no longer supported for processor thermal device on Lunar Lake. The support is via MSI. Add feature PROC_THERMAL_FEATURE_MSI_SUPPORT to support MSI feature per generation. Define this feature for Lunar Lake processors. There are 4 MSI sources: 0 - Package thermal 1 - DDR Thermal 2 - Power floor interrupt 3 - Workload type hint On interrupt, check the source and call the corresponding handler. Here don't need to call proc_thermal_check_wt_intr() and proc_thermal_check_power_floor_intr() to check if the interrupt is for those sources as there is a dedicated MSI interrupt. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619172109.497639-3-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: int340x: Remove unnecessary calls to free irqSrinivas Pandruvada
Remove calls to devm_free_irq() and pci_free_irq_vectors(). They will be called on driver release anyway. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619172109.497639-2-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: int340x: Add DLVR support for Lunar LakeSrinivas Pandruvada
Add support for DLVR (Digital Linear Voltage Regulator) for Lunar Lake. There are no new sysfs attributes or difference in operation compared to prior generations. MMIO offset and bit positions are changed compared to Meteor Lake processors. Also for two attributes dlvr_frequency_mhz and dlvr_frequency_select, the value presented or accepted by the firmware is not raw frequency value but an index. For example: RFI_FREQ_SELECT and RFI_FREQ : 0 DLVR freq point 2227.2 MHz : 1 DLVR freq point 2140 MHz Hence create a mapping table for Lunar Lake to map user space values to the firmware accepted values. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619124600.491168-4-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: int340x: Capability to map user space to firmware valuesSrinivas Pandruvada
To ensure compatibility between user inputs and firmware requirements, a conversion mechanism is necessary for certain attributes. For instance, on some platforms, the DLVR frequency must be translated into a predefined index before being communicated to the firmware. On Lunar Lake platform: RFI_FREQ_SELECT and RFI_FREQ: Index 0 corresponds to a DLVR frequency of 2227.2 MHz Index 1 corresponds to a DLVR frequency of 2140 MHz Introduce a feature that enables the conversion of values between user space inputs and firmware-accepted formats. This feature would also facilitate the reverse process, converting firmware values back into user friendly display values. To support this functionality, a model-specific mapping table will be utilized. When available, this table will provide the necessary translations between user space values and firmware values, ensuring seamless communication and accurate settings. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619124600.491168-3-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: int340x: Cleanup of DLVR sysfs on driver removeSrinivas Pandruvada
When only DLVR enabled without DVFS, during driver remove, proc_thermal_rfim_remove() is not called. Hence the DLVR sysfs is not deleted. On Lunar Lake DLVR is enabled without DVFS, hence this issue can be reproduced. Check also PROC_THERMAL_FEATURE_DLVR to call proc_thermal_rfim_remove(). Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619124600.491168-2-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: intel_tcc_cooling: Use a model-specific bitmask for TCC offsetRicardo Neri
The TCC offset field in the register MSR_TEMPERATURE_TARGET is not architectural. The TCC library provides a model-specific bitmask. Use it to determine the maximum TCC offset. Suggested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://patch.msgid.link/20240614211606.5896-3-ricardo.neri-calderon@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: intel_tcc: Add model checks for temperature registersRicardo Neri
The register MSR_TEMPERATURE_TARGET is not architectural. Its fields may be defined differently for each processor model. TCC_OFFSET is an example of such case. Despite being specified as architectural, the registers IA32_[PACKAGE]_ THERM_STATUS have become model-specific: in recent processors, the digital temperature readout uses bits [23:16] whereas the Intel Software Developer's manual specifies bits [22:16]. Create an array of processor models and their bitmasks for TCC_OFFSET and the digital temperature readout fields. Do not include recent processors. Instead, use the bitmasks of these recent processors as default. Use these model-specific bitmasks when reading TCC_OFFSET or the temperature sensors. Initialize a model-specific data structure during subsys_initcall() to have it ready when thermal drivers are loaded. Expose the new interface intel_tcc_get_offset_mask(). The intel_tcc_cooling driver will use it. Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://patch.msgid.link/20240614211606.5896-2-ricardo.neri-calderon@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21Merge back thermal control changes related to Intel platforms for v6.11Rafael J. Wysocki
2024-06-21Merge back thermal control material for v6.11.Rafael J. Wysocki
2024-06-19thermal: int340x: processor_thermal: Support shared interruptsSrinivas Pandruvada
On some systems the processor thermal device interrupt is shared with other PCI devices. In this case return IRQ_NONE from the interrupt handler when the interrupt is not for the processor thermal device. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Fixes: f0658708e863 ("thermal: int340x: processor_thermal: Use non MSI interrupts by default") Cc: 6.7+ <stable@vger.kernel.org> # 6.7+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-17Merge tag 'thermal-v6.10-rc4' of ↵Rafael J. Wysocki
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux Merge thermal driver fixes for 6.10-rc5 from Daniel Lezcano: "- Remove the filtered mode for mt8188 as it is not supported on this platform (Julien Panis) - Fail in case the golden temperature is zero as that means the efuse data is not correctly set (Julien Panis)" * tag 'thermal-v6.10-rc4' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux: thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188
2024-06-14thermal: core: Change PM notifier priority to the minimumRafael J. Wysocki
It is reported that commit 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously") causes battery data in sysfs on Thinkpad P1 Gen2 to become invalid after a resume from S3 (and it is necessary to reboot the machine to restore correct battery data). Some investigation into the problem indicated that it happened because, after the commit in question, the ACPI battery PM notifier ran in parallel with thermal_zone_device_resume() for one of the thermal zones which apparently confused the platform firmware on the affected system. While the exact reason for the firmware confusion remains unclear, it is arguably not particularly relevant, and the expected behavior of the affected system can be restored by making the thermal PM notifier run at the lowest priority which avoids interference between work items spawned by it and the other PM notifiers (that will run before those work items now). Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218881 Reported-by: fhortner@yahoo.de Tested-by: fhortner@yahoo.de Cc: 6.8+ <stable@vger.kernel.org> # 6.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-14thermal: core: Synchronize suspend-prepare and post-suspend actionsRafael J. Wysocki
After commit 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously") it is theoretically possible that, if a system suspend starts immediately after a system resume, thermal_zone_device_resume() spawned by the thermal PM notifier for one of the thermal zones at the end of the system resume will run after the PM thermal notifier for the suspend-prepare action. If that happens, tz->suspended set by the latter will be reset by the former which may lead to unexpected consequences. To avoid that race, synchronize thermal_zone_device_resume() with the suspend-prepare thermal PM notifier with the help of additional bool field and completion in struct thermal_zone_device. Note that this also ensures running __thermal_zone_device_update() at least once for each thermal zone between system resume and the following system suspend in case it is needed to start thermal mitigation. Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-12thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse ↵Julien Panis
data This patch prevents from registering thermal entries and letting the driver misbehave if efuse data is invalid. A device is not properly calibrated if the golden temperature is zero. Fixes: f5f633b18234 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver") Signed-off-by: Julien Panis <jpanis@baylibre.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240604-mtk-thermal-calib-check-v2-1-8f258254051d@baylibre.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal: core: Avoid calling .trip_crossed() for critical and hot tripsRafael J. Wysocki
Invoking the governor .trip_crossed() callback for critical and hot trips is pointless because they are handled directly by the core, so make thermal_governor_trip_crossed() avoid doing that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-11thermal: gov_bang_bang: Drop unnecessary cooling device target state checksRafael J. Wysocki
Some cooling device target state checks in bang_bang_control() done before setting the new target state are not necessary after recent changes, so drop them. Also avoid updating the target state before checking it for unexpected values. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-11thermal: trip: Use READ_ONCE() for lockless access to trip propertiesRafael J. Wysocki
When accessing trip temperature and hysteresis without locking, it is better to use READ_ONCE() to prevent compiler optimizations possibly affecting the read from being applied. Of course, for the READ_ONCE() to be effective, WRITE_ONCE() needs to be used when updating their values. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-11thermal: trip: Make thermal_zone_set_trips() use trip thresholdsRafael J. Wysocki
Modify thermal_zone_set_trips() to use trip thresholds instead of computing the low temperature for each trip to avoid deriving both the high and low temperature levels from the same trip (which may happen if the zone temperature falls into the hysteresis range of one trip). Accordingly, make __thermal_zone_device_update() call thermal_zone_set_trips() later, when threshold values have been updated for all trips. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal: trip: Rename __thermal_zone_set_trips() to thermal_zone_set_trips()Rafael J. Wysocki
Drop the pointless double underline prefix from the function name as per the subject. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal: trip: Use common set of trip type namesRafael J. Wysocki
Use the same set of trip type names in sysfs and in the thermal debug code output. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal/debugfs: Move some statements from under thermal_dbg->lockRafael J. Wysocki
The tz_dbg local variable assignments in thermal_debug_tz_trip_up(), thermal_debug_tz_trip_down(), and thermal_debug_update_trip_stats() need not be carried out under thermal_dbg->lock, so move them from under that lock (to avoid possible future confusion). While at it, reorder local variable definitions in thermal_debug_tz_trip_up() for more clarity. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal/debugfs: Compute maximum temperature for mitigation episode as a wholeRafael J. Wysocki
Notice that the maximum temperature above the trip point must be the same for all of the trip points involved in a given mitigation episode, so it need not be computerd for each of them separately. It is sufficient to compute the maximum temperature for the mitigation episode as a whole and print it accordingly, so do that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal/debugfs: Adjust check for trips without statistics in tze_seq_show()Rafael J. Wysocki
Initialize the trip_temp field in struct trip_stats to THERMAL_TEMP_INVALID and adjust the check for trips without statistics in tze_seq_show() to look at that field instead of comparing min and max. This will mostly be useful to simplify subsequent changes. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal/debugfs: Fix up units in "mitigations" filesRafael J. Wysocki
Print temperature units as m°C rather than °mC (the meaning of which is unclear) and add time unit to the duration column. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal/debugfs: Print mitigation timestamp value in millisecondsRafael J. Wysocki
Because mitigation episode duration is printed in milliseconds, there is no reason to print timestamp information for mitigation episodes in smaller units which also makes it somewhat harder to interpret the numbers. Print it in milliseconds for consistency. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal/debugfs: Do not extend mitigation episodes beyond system resumeRafael J. Wysocki
Because thermal zone handling by the thermal core is started from scratch during resume from system-wide suspend, prevent the debug code from extending mitigation episodes beyond that point by ending the mitigation episode currently in progress, if any, for each thermal zone. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal/debugfs: Use helper to update trip point overstepping durationRafael J. Wysocki
Add a helper for updating trip point overstepping duration to be called from thermal_debug_tz_trip_down(). Subsequently, it will also be used during resume from system-wide suspend. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11thermal: gov_step_wise: Restore passive polling managementRafael J. Wysocki
Consider a thermal zone with one passive trip point, a cooling device with 3 states (0, 1, 2) bound to it, passive polling enabled (nonzero passive_delay_jiffies) and no regular polling (polling_delay_jiffies equal to 0) that is managed by the Step-Wise governor. Suppose that the initial state of the cooling device is 0 and the zone temperature is below the trip point to start with. When the trip point is crossed, tz->passive is incremented by the thermal core and the governor's .manage() callback is invoked. It sets 'throttle' to 'true' for the trip in question and get_target_state() returns 1 for the instance corresponding to the cooling device (say that 'upper' and 'lower' are set to 2 and 0 for it, respectively), so its state changes to 1. Passive polling is still active for the zone, so next time the temperature is updated, the governor's .manage() callback will be invoked again. If the temperature is still rising, it will change the state of the cooling device to 2. Now suppose that next time the zone temperature is updated, it falls below the trip point, so tz->passive is decremented for the zone (say it becomes 0 then) and the governor's .manage() callbacks runs. It finds that the temperature trend for the zone is 'falling' and 'throttle' will be set to 'false' for the trip in question, so the cooling device's state will be changed to 1. However, because tz->polling is 0 for the zone, the governor's .manage() callback may not be invoked again for a long time and the cooling device's state will not be reset back to 0. This can happen because commit 042a3d80f118 ("thermal: core: Move passive polling management to the core") removed passive polling management from the Step-Wise governor. Before that change, thermal_zone_trip_update() would bump up tz->passive when changing the target state for a thermal instance from "no target" to a specific value and it would drop tz->passive when changing it back to "no target" which would cause passive polling to be active for the zone until the governor has reset the states of all cooling devices. In particular, in the example above tz->passive would be incremented when changing the state of the cooling device from 0 to 1 and then it would be still nonzero when the state of the cooling device was changed from 2 to 1. To prevent this problem from occurring, restore the passive polling management in the Step-Wise governor by partially reverting the commit in question and update the comment in the restored code to explain its role more clearly. Fixes: 042a3d80f118 ("thermal: core: Move passive polling management to the core") Closes: https://lore.kernel.org/linux-pm/ZmVfcEOxmjUHZTSX@hovoldconsulting.com Reported-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07thermal: intel: intel_pch: Improve cooling logZhang Rui
The intel_pch_thermal cooling mechanism currently only provides one of the following final conclusions: 1. intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [48C] 2. intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [49C] after 30700 ms delay 3. intel_pch_thermal 0000:00:12.0: CPU-PCH is hot [60C] after 60000 ms delay. S0ix might fail 4. intel_pch_thermal 0000:00:12.0: Wakeup event detected, abort cooling This does not provide sufficient context about what is happening, especially for case 4. Add one line log to indicate when PCH overheats and the cooling delay has started. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07thermal: int3403: remove unused struct 'int3403_performance_state'Dr. David Alan Gilbert
'int3403_performance_state' has never been used since the original commit 4384b8fe162d ("Thermal: introduce int3403 thermal driver"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07thermal: int3400: Use sizeof(*pointer) instead of sizeof(type)Erick Archer
It is preferred to use sizeof(*pointer) instead of sizeof(type) due to the type of the variable can change and one needs not change the former (unlike the latter). This patch has no effect on runtime behavior. Signed-off-by: Erick Archer <erick.archer@outlook.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07thermal: intel: intel_soc_dts_thermal: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07thermal: intel: intel_tcc_cooling: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07thermal: core: Do not fail cdev registration because of invalid initial stateRafael J. Wysocki
It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()") causes the ACPI fan driver to fail probing on some systems which turns out to be due to the _FST control method returning an invalid value until _FSL is first evaluated for the given fan. If this happens, the .get_cur_state() cooling device callback returns an error and __thermal_cooling_device_register() fails as uses that callback after commit 31a0fa0019b0. Arguably, _FST should not return an invalid value even if it is evaluated before _FSL, so this may be regarded as a platform firmware issue, but at the same time it is not a good enough reason for failing the cooling device registration where the initial cooling device state is only needed to initialize a thermal debug facility. Accordingly, modify __thermal_cooling_device_register() to avoid calling thermal_debug_cdev_add() instead of returning an error if the initial .get_cur_state() callback invocation fails. Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()") Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@collabora.com Reported-by: Laura Nao <laura.nao@collabora.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Laura Nao <laura.nao@collabora.com>
2024-06-04thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188Julien Panis
Filtered mode is not supported on mt8188 SoC and is the source of bad results. Move to immediate mode which provides good temperatures. Fixes: f4745f546e60 ("thermal/drivers/mediatek/lvts_thermal: Add MT8188 support") Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Julien Panis <jpanis@baylibre.com> Link: https://lore.kernel.org/r/20240516-mtk-thermal-mt8188-mode-fix-v2-1-40a317442c62@baylibre.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-05-27thermal: trip: Trigger trip down notifications when trips involved in ↵Rafael J. Wysocki
mitigation become invalid When a trip point becomes invalid after being crossed on the way up, it is involved in a mitigation episode that needs to be adjusted to compensate for the trip going away. For this reason, introduce thermal_zone_trip_down() as a wrapper around thermal_trip_crossed() and make thermal_zone_set_trip_temp() call it if the new temperature of the trip at hand is equal to THERMAL_TEMP_INVALID and it has been crossed on the way up to trigger all of the necessary adjustments in user space, the thermal debug code and the zone governor. Fixes: 8c69a777e480 ("thermal: core: Fix the handling of invalid trip points") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27thermal: core: Introduce thermal_trip_crossed()Rafael J. Wysocki
Add a helper function called thermal_trip_crossed() to be invoked by __thermal_zone_device_update() in order to notify user space, the thermal debug code and the zone governor about trip crossing. Subsequently, this will also be used in the case when a trip point becomes invalid after being crossed on the way up. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27thermal/debugfs: Allow tze_seq_show() to print statistics for invalid tripsRafael J. Wysocki
Commit a6258fde8de3 ("thermal/debugfs: Make tze_seq_show() skip invalid trips and trips with no stats") modified tze_seq_show() to skip invalid trips, but it overlooked the fact that a trip may become invalid during a mitigation eposide involving it, in which case its statistics should still be reported. For this reason, remove the invalid trip temperature check from the main loop in tze_seq_show(). The trips that have never been valid will still be skipped after this change because there are no statistics to report for them. Fixes: a6258fde8de3 ("thermal/debugfs: Make tze_seq_show() skip invalid trips and trips with no stats") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27thermal/debugfs: Print initial trip temperature and hysteresis in tze_seq_show()Rafael J. Wysocki
The temperature and hysteresis of a trip point may change during a mitigation episode it is involved in (it may even become invalid altogether), so in order to avoid possible confusion related to that, store the temperature and hysteresis of trip points at the time they are crossed on the way up and print those values instead of their current temperature and hysteresis. Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-22tracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)
With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-05-22Merge tag 'driver-core-6.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the small set of driver core and kernfs changes for 6.10-rc1. Nothing major here at all, just a small set of changes for some driver core apis, and minor fixups. Included in here are: - sysfs_bin_attr_simple_read() helper added and used - device_show_string() helper added and used All usages of these were acked by the various maintainers. Also in here are: - kernfs minor cleanup - removed unused functions - typo fix in documentation - pay attention to sysfs_create_link() failures in module.c finally All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: device property: Fix a typo in the description of device_get_child_node_count() kernfs: mount: Remove unnecessary ‘NULL’ values from knparent scsi: Use device_show_string() helper for sysfs attributes platform/x86: Use device_show_string() helper for sysfs attributes perf: Use device_show_string() helper for sysfs attributes IB/qib: Use device_show_string() helper for sysfs attributes hwmon: Use device_show_string() helper for sysfs attributes driver core: Add device_show_string() helper for sysfs attributes treewide: Use sysfs_bin_attr_simple_read() helper sysfs: Add sysfs_bin_attr_simple_read() helper module: don't ignore sysfs_create_link() failures driver core: Remove unused platform_notify, platform_notify_remove
2024-05-21Merge tag 'thermal-6.10-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "These fix the MediaTek lvts_thermal driver and the handling of trip points that start as invalid and are adjusted later by user space via sysfs. Specifics: - Fix and clean up the MediaTek lvts_thermal driver (Julien Panis) - Prevent invalid trip point handling from triggering spurious trip point crossing events and allow passive polling to stop when a passive trip point involved in it becomes invalid (Rafael Wysocki)" * tag 'thermal-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: core: Fix the handling of invalid trip points thermal/drivers/mediatek/lvts_thermal: Fix wrong lvts_ctrl index thermal/drivers/mediatek/lvts_thermal: Remove unused members from struct lvts_ctrl_data thermal/drivers/mediatek/lvts_thermal: Check NULL ptr on lvts_data
2024-05-17thermal: core: Fix the handling of invalid trip pointsRafael J. Wysocki
Commit 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed") overlooked the case when a trip point that has started as invalid is set to a valid temperature later. Namely, the initial threshold value for all trips is zero, so if a previously invalid trip becomes valid and its (new) low temperature is above the zone temperature, a spurious trip crossing notification will occur and it may trigger the WARN_ON() in handle_thermal_trip(). To address this, set the initial threshold for all trips to INT_MAX. There is also the case when a valid writable trip becomes invalid that requires special handling. First, in accordance with the change mentioned above, the trip's threshold needs to be set to INT_MAX to avoid the same issue. Second, if the trip in question is passive and it has been crossed by the thermal zone temperature on the way up, the zone's passive count has been incremented and it is in the passive polling mode, so its passive count needs to be adjusted to allow the passive polling to be turned off eventually. Fixes: 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed") Fixes: 042a3d80f118 ("thermal: core: Move passive polling management to the core") Reported-by: Zhang Rui <zhang.rui@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Wendy Wang <wendy.wang@intel.com>
2024-05-15Merge tag 'thermal-v6.10-rc1-2' of ↵Rafael J. Wysocki
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux Merge thermal driver fixes for 6.10-rc1 from Daniel Lezcano: "- Check for a NULL pointer before using it in the probe routine of the Mediatek LVTS driver (Julien Panis) - Remove the num_lvts_sensor and cal_offset fields of the lvts_ctrl_data as they are not used. These are not functional fixes but slight memory usage fix of the Mediatek LVTS driver (Julien Panis) - Fix wrong lvts_ctrl index leading to a NULL pointer dereference in the Mediatek LVTS driver (Julien Panis)" * tag 'thermal-v6.10-rc1-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux: thermal/drivers/mediatek/lvts_thermal: Fix wrong lvts_ctrl index thermal/drivers/mediatek/lvts_thermal: Remove unused members from struct lvts_ctrl_data thermal/drivers/mediatek/lvts_thermal: Check NULL ptr on lvts_data