summaryrefslogtreecommitdiff
path: root/kernel/power
AgeCommit message (Collapse)Author
2024-05-27Merge tag 'vfs-6.10-rc2.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix io_uring based write-through after converting cifs to use the netfs library - Fix aio error handling when doing write-through via netfs library - Fix performance regression in iomap when used with non-large folio mappings - Fix signalfd error code - Remove obsolete comment in signalfd code - Fix async request indication in netfs_perform_write() by raising BDP_ASYNC when IOCB_NOWAIT is set - Yield swap device immediately to prevent spurious EBUSY errors - Don't cross a .backup mountpoint from backup volumes in afs to avoid infinite loops - Fix a race between umount and async request completion in 9p after 9p was converted to use the netfs library * tag 'vfs-6.10-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: netfs, 9p: Fix race between umount and async request completion afs: Don't cross .backup mountpoint from backup volume swap: yield device immediately netfs: Fix setting of BDP_ASYNC from iocb flags signalfd: drop an obsolete comment signalfd: fix error return code iomap: fault in smaller chunks for non-large folio mappings filemap: add helper mapping_max_folio_size() netfs: Fix AIO error handling when doing write-through netfs: Fix io_uring based write-through
2024-05-24swap: yield device immediatelyChristian Brauner
Otherwise we can cause spurious EBUSY issues when trying to mount the rootfs later on. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218845 Reported-by: Petri Kaukasoina <petri.kaukasoina@tuni.fi> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-21Merge tag 'pull-set_blocksize' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs blocksize updates from Al Viro: "This gets rid of bogus set_blocksize() uses, switches it over to be based on a 'struct file *' and verifies that the caller has the device opened exclusively" * tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make set_blocksize() fail unless block device is opened exclusive set_blocksize(): switch to passing struct file * btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens swsusp: don't bother with setting block size zram: don't bother with reopening - just use O_EXCL for open swapon(2): open swap with O_EXCL swapon(2)/swapoff(2): don't bother with block size pktcdvd: sort set_blocksize() calls out bcache_register(): don't bother with set_blocksize()
2024-05-15Merge tag 'cgroup-for-6.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - The locking around cpuset hotplug processing has always been a bit of mess which was worked around by making hotplug processing asynchronous. The asynchronity isn't great and led to other issues. We tried to make the behavior synchronous a while ago but that led to lockdep splats. Waiman took another stab at cleaning up and making it synchronous. The patch has been in -next for well over a month and there haven't been any complaints, so fingers crossed. - Tracepoints added to help understanding rstat lock contentions. - A bunch of minor changes - doc updates, code cleanups and selftests. * tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits) cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints selftests/cgroup: Drop define _GNU_SOURCE docs: cgroup-v1: Update page cache removal functions selftests/cgroup: fix uninitialized variables in test_zswap.c selftests/cgroup: cpu_hogger init: use {} instead of {NULL} selftests/cgroup: fix clang warnings: uninitialized fd variable selftests/cgroup: fix clang build failures for abs() calls cgroup/cpuset: Remove outdated comment in sched_partition_write() cgroup/cpuset: Fix incorrect top_cpuset flags cgroup/cpuset: Avoid clearing CS_SCHED_LOAD_BALANCE twice cgroup/cpuset: Statically initialize more members of top_cpuset cgroup: Avoid unnecessary looping in cgroup_no_v1() cgroup, legacy_freezer: update comment for freezer_css_offline() docs, cgroup: add entries for pids to cgroup-v2.rst cgroup: don't call cgroup1_pidlist_destroy_all() for v2 cgroup_freezer: update comment for freezer_css_online() cgroup/rstat: desc member cgrp in cgroup_rstat_flush_release cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints cgroup/pids: Remove superfluous zeroing docs: cgroup-v1: Fix description for css_online ...
2024-05-13Merge branches 'pm-em' and 'pm-docs'Rafael J. Wysocki
Merge Enery Model update and a power management documentation update for 6.10: - Make the Samsung exynos-asv driver update the Energy Model after adjusting voltage on top of some preliminary changes of the OPP and Enery Model generic code (Lukasz Luba). - Remove a reference to a function that has been dropped from the power management documentation (Bjorn Helgaas). * pm-em: soc: samsung: exynos-asv: Update Energy Model after adjusting voltage PM: EM: Add em_dev_update_chip_binning() PM: EM: Refactor em_adjust_new_capacity() OPP: OF: Export dev_opp_pm_calc_power() for usage from EM * pm-docs: Documentation: PM: Update platform_pci_wakeup_init() reference
2024-05-02swsusp: don't bother with setting block sizeAl Viro
same as with the swap... Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-30PM: hibernate: replace deprecated strncpy() with strscpy()Justin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. This kernel config option is simply assigned with the resume_file buffer. It should be NUL-terminated but not necessarily NUL-padded as per its further usage with other string apis: | static int __init find_resume_device(void) | { | if (!strlen(resume_file)) | return -ENOENT; | | pm_pr_dbg("Checking hibernation image partition %s\n", resume_file); Use strscpy() [2] as it guarantees NUL-termination on the destination buffer. Specifically, use the new 2-argument version of strscpy() introduced in Commit e6584c3964f2f ("string: Allow 2-argument strscpy()"). Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Dhruva Gole <d-gole@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-08cgroup/cpuset: Make cpuset hotplug processing synchronousWaiman Long
Since commit 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()"), cpuset hotplug was done asynchronously via a work function. This is to avoid recursive locking of cgroup_mutex. Since then, the cgroup locking scheme has changed quite a bit. A cpuset_mutex was introduced to protect cpuset specific operations. The cpuset_mutex is then replaced by a cpuset_rwsem. With commit d74b27d63a8b ("cgroup/cpuset: Change cpuset_rwsem and hotplug lock order"), cpu_hotplug_lock is acquired before cpuset_rwsem. Later on, cpuset_rwsem is reverted back to cpuset_mutex. All these locking changes allow the hotplug code to call into cpuset core directly. The following commits were also merged due to the asynchronous nature of cpuset hotplug processing. - commit b22afcdf04c9 ("cpu/hotplug: Cure the cpusets trainwreck") - commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs") - commit 28b89b9e6f7b ("cpuset: handle race between CPU hotplug and cpuset_hotplug_work") Clean up all these bandages by making cpuset hotplug processing synchronous again with the exception that the call to cgroup_transfer_tasks() to transfer tasks out of an empty cgroup v1 cpuset, if necessary, will still be done via a work function due to the existing cgroup_mutex -> cpu_hotplug_lock dependency. It is possible to reverse that dependency, but that will require updating a number of different cgroup controllers. This special hotplug code path should be rarely taken anyway. As all the cpuset states will be updated by the end of the hotplug operation, we can revert most the above commits except commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs") which is partially reverted. Also removing some cpus_read_lock trylock attempts in the cpuset partition code as they are no longer necessary since the cpu_hotplug_lock is now held for the whole duration of the cpuset hotplug code path. Signed-off-by: Waiman Long <longman@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-08PM: EM: Add em_dev_update_chip_binning()Lukasz Luba
Add a function which allows to modify easily the EM after the new voltage information is available. The device drivers for the chip can adjust the voltage values after setup. The voltage for the same frequency in OPP can be different due to chip binning. The voltage impacts the power usage and the EM power values can be updated to reflect that. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-08PM: EM: Refactor em_adjust_new_capacity()Lukasz Luba
Extract em_table_dup() and em_recalc_and_update() from em_adjust_new_capacity(). Both functions will be later reused by the 'update EM due to chip binning' functionality. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-08PM: s2idle: Make sure CPUs will wakeup directly on resumeAnna-Maria Behnsen
s2idle works like a regular suspend with freezing processes and freezing devices. All CPUs except the control CPU go into idle. Once this is completed the control CPU kicks all other CPUs out of idle, so that they reenter the idle loop and then enter s2idle state. The control CPU then issues an swait() on the suspend state and therefore enters the idle loop as well. Due to being kicked out of idle, the other CPUs leave their NOHZ states, which means the tick is active and the corresponding hrtimer is programmed to the next jiffie. On entering s2idle the CPUs shut down their local clockevent device to prevent wakeups. The last CPU which enters s2idle shuts down its local clockevent and freezes timekeeping. On resume, one of the CPUs receives the wakeup interrupt, unfreezes timekeeping and its local clockevent and starts the resume process. At that point all other CPUs are still in s2idle with their clockevents switched off. They only resume when they are kicked by another CPU or after resuming devices and then receiving a device interrupt. That means there is no guarantee that all CPUs will wakeup directly on resume. As a consequence there is no guarantee that timers which are queued on those CPUs and should expire directly after resume, are handled. Also timer list timers which are remotely queued to one of those CPUs after resume will not result in a reprogramming IPI as the tick is active. Queueing a hrtimer will also not result in a reprogramming IPI because the first hrtimer event is already in the past. The recent introduction of the timer pull model (7ee988770326 ("timers: Implement the hierarchical pull model")) amplifies this problem, if the current migrator is one of the non woken up CPUs. When a non pinned timer list timer is queued and the queuing CPU goes idle, it relies on the still suspended migrator CPU to expire the timer which will happen by chance. The problem exists since commit 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path"). There the cpuidle_pause() call which in turn invoked a wakeup for all idle CPUs was moved to a later point in the resume process. This might not be reached or reached very late because it waits on a timer of a still suspended CPU. Address this by kicking all CPUs out of idle after the control CPU returns from swait() so that they resume their timers and restore consistent system state. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218641 Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path") Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Cc: 5.16+ <stable@kernel.org> # 5.16+ Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-21Merge tag 'rtc-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "Subsytem: - rtc_class is now const Drivers: - ds1511: cleanup, set date and time range and alarm offset limit - max31335: fix interrupt handler - pcf8523: improve suspend support" * tag 'rtc-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (28 commits) MAINTAINER: Include linux-arm-msm for Qualcomm RTC patches dt-bindings: rtc: zynqmp: Add support for Versal/Versal NET SoCs rtc: class: make rtc_class constant dt-bindings: rtc: abx80x: Improve checks on trickle charger constraints MAINTAINERS: adjust file entry in ARM/Mediatek RTC DRIVER rtc: nct3018y: fix possible NULL dereference rtc: max31335: fix interrupt status reg rtc: mt6397: select IRQ_DOMAIN instead of depending on it dt-bindings: rtc: abx80x: convert to yaml rtc: m41t80: Use the unified property API get the wakeup-source property dt-bindings: at91rm9260-rtt: add sam9x7 compatible dt-bindings: rtc: convert MT7622 RTC to the json-schema dt-bindings: rtc: convert MT2717 RTC to the json-schema rtc: pcf8523: add suspend handlers for alarm IRQ rtc: ds1511: set alarm offset limit rtc: ds1511: set range rtc: ds1511: drop inline/noinline hints rtc: ds1511: rename pdata rtc: ds1511: implement ds1511_rtc_read_alarm properly rtc: ds1511: remove partial alarm support ...
2024-03-13PM: EM: Force device drivers to provide power in uWLukasz Luba
The EM only supports power in uW. Make sure that it is not possible to register some downstream driver which doesn't provide power in uW. The only exception is artificial EM, but that EM is ignored by the rest of kernel frameworks (thermal, powercap, etc). Reported-by: PoShao Chen <poshao.chen@mediatek.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-13Merge tag 'pm-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "From the functional perspective, the most significant change here is the addition of support for Energy Models that can be updated dynamically at run time. There is also the addition of LZ4 compression support for hibernation, the new preferred core support in amd-pstate, new platforms support in the Intel RAPL driver, new model-specific EPP handling in intel_pstate and more. Apart from that, the cpufreq default transition delay is reduced from 10 ms to 2 ms (along with some related adjustments), the system suspend statistics code undergoes a significant rework and there is a usual bunch of fixes and code cleanups all over. Specifics: - Allow the Energy Model to be updated dynamically (Lukasz Luba) - Add support for LZ4 compression algorithm to the hibernation image creation and loading code (Nikhil V) - Fix and clean up system suspend statistics collection (Rafael Wysocki) - Simplify device suspend and resume handling in the power management core code (Rafael Wysocki) - Fix PCI hibernation support description (Yiwei Lin) - Make hibernation take set_memory_ro() return values into account as appropriate (Christophe Leroy) - Set mem_sleep_current during kernel command line setup to avoid an ordering issue with handling it (Maulik Shah) - Fix wake IRQs handling when pm_runtime_force_suspend() is used as a driver's system suspend callback (Qingliang Li) - Simplify pm_runtime_get_if_active() usage and add a replacement for pm_runtime_put_autosuspend() (Sakari Ailus) - Add a tracepoint for runtime_status changes tracking (Vilas Bhat) - Fix section title markdown in the runtime PM documentation (Yiwei Lin) - Enable preferred core support in the amd-pstate cpufreq driver (Meng Li) - Fix min_perf assignment in amd_pstate_adjust_perf() and make the min/max limit perf values in amd-pstate always stay within the (highest perf, lowest perf) range (Tor Vic, Meng Li) - Allow intel_pstate to assign model-specific values to strings used in the EPP sysfs interface and make it do so on Meteor Lake (Srinivas Pandruvada) - Drop long-unused cpudata::prev_cummulative_iowait from the intel_pstate cpufreq driver (Jiri Slaby) - Prevent scaling_cur_freq from exceeding scaling_max_freq when the latter is an inefficient frequency (Shivnandan Kumar) - Change default transition delay in cpufreq to 2ms (Qais Yousef) - Remove references to 10ms minimum sampling rate from comments in the cpufreq code (Pierre Gondois) - Honour transition_latency over transition_delay_us in cpufreq (Qais Yousef) - Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar) - General enhancements / cleanups to ARM cpufreq drivers (tianyu2, Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia Belova) - Update cpufreq-dt-platdev to block/approve devices (Richard Acayan) - Make the SCMI cpufreq driver get a transition delay value from firmware (Pierre Gondois) - Prevent the haltpoll cpuidle governor from shrinking guest poll_limit_ns below grow_start (Parshuram Sangle) - Avoid potential overflow in integer multiplication when computing cpuidle state parameters (C Cheng) - Adjust MWAIT hint target C-state computation in the ACPI cpuidle driver and in intel_idle to return a correct value for C0 (He Rongguang) - Address multiple issues in the TPMI RAPL driver and add support for new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui) - Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel Lezcano) - Fix kernel-doc for dtpm_create_hierarchy() (Yang Li) - Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth Norway Ananda) - Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil) - Fix a couple of warnings in the OPP core code related to W=1 builds (Viresh Kumar) - Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh Kumar) - Extend dev_pm_opp_data with turbo support (Sibi Sankar) - dt-bindings: drop maxItems from inner items (David Heidelberg)" * tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (95 commits) dt-bindings: opp: drop maxItems from inner items OPP: debugfs: Fix warning around icc_get_name() OPP: debugfs: Fix warning with W=1 builds cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h OPP: Extend dev_pm_opp_data with turbo support Fix cpupower-frequency-info.1 man page typo cpufreq: scmi: Set transition_delay_us firmware: arm_scmi: Populate fast channel rate_limit firmware: arm_scmi: Populate perf commands rate_limit cpuidle: ACPI/intel: fix MWAIT hint target C-state computation PM: sleep: wakeirq: fix wake irq warning in system suspend powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function cpufreq: Don't unregister cpufreq cooling on CPU hotplug PM: suspend: Set mem_sleep_current during kernel command line setup cpufreq: Honour transition_latency over transition_delay_us cpufreq: Limit resolving a frequency to policy min/max Documentation: PM: Fix runtime_pm.rst markdown syntax cpufreq: amd-pstate: adjust min/max limit perf cpufreq: Remove references to 10ms min sampling rate cpufreq: intel_pstate: Update default EPPs for Meteor Lake ...
2024-03-11Merge branch 'pm-em'Rafael J. Wysocki
Merge Enery Model changes for 6.9-rc1: - Allow the Energy Model to be updated dynamically (Lukasz Luba). * pm-em: (24 commits) PM: EM: Fix nr_states warnings in static checks Documentation: EM: Update with runtime modification design PM: EM: Add em_dev_compute_costs() PM: EM: Remove old table PM: EM: Change debugfs configuration to use runtime EM table data drivers/thermal/devfreq_cooling: Use new Energy Model interface drivers/thermal/cpufreq_cooling: Use new Energy Model interface powercap/dtpm_devfreq: Use new Energy Model interface to get table powercap/dtpm_cpu: Use new Energy Model interface to get table PM: EM: Optimize em_cpu_energy() and remove division PM: EM: Support late CPUs booting and capacity adjustment PM: EM: Add performance field to struct em_perf_state and optimize PM: EM: Add em_perf_state_from_pd() to get performance states table PM: EM: Introduce em_dev_update_perf_domain() for EM updates PM: EM: Add functions for memory allocations for new EM tables PM: EM: Use runtime modified EM for CPUs energy estimation in EAS PM: EM: Introduce runtime modifiable table PM: EM: Split the allocation and initialization of the EM table PM: EM: Check if the get_cost() callback is present in em_compute_costs() PM: EM: Introduce em_compute_costs() ...
2024-03-08rtc: class: make rtc_class constantRicardo B. Marliere
Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the rtc_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Link: https://lore.kernel.org/r/20240305-class_cleanup-abelloni-v1-1-944c026137c8@marliere.net Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-02-29PM: suspend: Set mem_sleep_current during kernel command line setupMaulik Shah
psci_init_system_suspend() invokes suspend_set_ops() very early during bootup even before kernel command line for mem_sleep_default is setup. This leads to kernel command line mem_sleep_default=s2idle not working as mem_sleep_current gets changed to deep via suspend_set_ops() and never changes back to s2idle. Set mem_sleep_current along with mem_sleep_default during kernel command line setup as default suspend mode. Fixes: faf7ec4a92c0 ("drivers: firmware: psci: add system suspend support") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-25power: port block device access to fileChristian Brauner
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-6-adbd023e19cc@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-22PM: EM: Fix nr_states warnings in static checksLukasz Luba
During the static checks nr_states has been mentioned by the kernel test robot. Fix the warning in those 2 places. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-22PM: hibernate: Don't ignore return from set_memory_ro()Christophe Leroy
set_memory_ro() and set_memory_rw() can fail, leaving memory unprotected. Take the returned value into account and abort in case of failure. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-22PM: hibernate: Support to select compression algorithmNikhil V
Currently the default compression algorithm is selected based on compile time options. Introduce a module parameter "hibernate.compressor" to override this behaviour. Different compression algorithms have different characteristics and hibernation may benefit when it uses any of these algorithms, especially when a secondary algorithm(LZ4) offers better decompression speeds over a default algorithm(LZO), which in turn reduces hibernation image restore time. Users can override the default algorithm in two ways: 1) Passing "hibernate.compressor" as kernel command line parameter. Usage: LZO: hibernate.compressor=lzo LZ4: hibernate.compressor=lz4 2) Specifying the algorithm at runtime. Usage: LZO: echo lzo > /sys/module/hibernate/parameters/compressor LZ4: echo lz4 > /sys/module/hibernate/parameters/compressor Currently LZO and LZ4 are the supported algorithms. LZO is the default compression algorithm used with hibernation. Signed-off-by: Nikhil V <quic_nprakash@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Add em_dev_compute_costs()Lukasz Luba
The device drivers can modify EM at runtime by providing a new EM table. The EM is used by the EAS and the em_perf_state::cost stores pre-calculated value to avoid overhead. This patch provides the API for device drivers to calculate the cost values properly (and not duplicate the same code). Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Remove old tableLukasz Luba
Remove the old EM table which wasn't able to modify the data. Clean the unneeded function and refactor the code a bit. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Change debugfs configuration to use runtime EM table dataLukasz Luba
Dump the runtime EM table values which can be modified in time. In order to do that allocate chunk of debug memory which can be later freed automatically thanks to devm_kcalloc(). This design can handle the fact that the EM table memory can change after EM update, so debug code cannot use the pointer from initialization phase. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Optimize em_cpu_energy() and remove divisionLukasz Luba
The Energy Model (EM) can be modified at runtime which brings new possibilities. The em_cpu_energy() is called by the Energy Aware Scheduler (EAS) in its hot path. The energy calculation uses power value for a given performance state (ps) and the CPU busy time as percentage for that given frequency. It is possible to avoid the division by 'scale_cpu' at runtime, because EM is updated whenever new max capacity CPU is set in the system. Use that feature and do the needed division during the calculation of the coefficient 'ps->cost'. That enhanced 'ps->cost' value can be then just multiplied simply by utilization: pd_nrg = ps->cost * \Sum cpu_util to get the needed energy for whole Performance Domain (PD). With this optimization and earlier removal of map_util_freq(), the em_cpu_energy() should run faster on the Big CPU by 1.43x and on the Little CPU by 1.69x (RockPi 4B board). Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Support late CPUs booting and capacity adjustmentLukasz Luba
The patch adds needed infrastructure to handle the late CPUs boot, which might change the previous CPUs capacity values. With this changes the new CPUs which try to register EM will trigger the needed re-calculations for other CPUs EMs. Thanks to that the em_per_state::performance values will be aligned with the CPU capacity information after all CPUs finish the boot and EM registrations. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Add performance field to struct em_perf_state and optimizeLukasz Luba
The performance doesn't scale linearly with the frequency. Also, it may be different in different workloads. Some CPUs are designed to be particularly good at some applications e.g. images or video processing and other CPUs in different. When those different types of CPUs are combined in one SoC they should be properly modeled to get max of the HW in Energy Aware Scheduler (EAS). The Energy Model (EM) provides the power vs. performance curves to the EAS, but assumes the CPUs capacity is fixed and scales linearly with the frequency. This patch allows to adjust the curve on the 'performance' axis as well. Code speed optimization: Removing map_util_freq() allows to avoid one division and one multiplication operations from the EAS hot code path. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Introduce em_dev_update_perf_domain() for EM updatesLukasz Luba
Add API function em_dev_update_perf_domain() which allows the EM to be changed safely. Concurrent updaters are serialized with a mutex and the removal of memory that will not be used any more is carried out with the help of RCU. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Add functions for memory allocations for new EM tablesLukasz Luba
The runtime modified EM table can be provided from drivers. Create mechanism which allows safely allocate and free the table for device drivers. The same table can be used by the EAS in task scheduler code paths, so make sure the memory is not freed when the device driver module is unloaded. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Introduce runtime modifiable tableLukasz Luba
The new runtime table can be populated with a new power data to better reflect the actual efficiency of the device e.g. CPU. The power can vary over time e.g. due to the SoC temperature change. Higher temperature can increase power values. For longer running scenarios, such as game or camera, when also other devices are used (e.g. GPU, ISP) the CPU power can change. The new EM framework is able to addresses this issue and change the EM data at runtime safely. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Split the allocation and initialization of the EM tableLukasz Luba
Split the process of allocation and data initialization for the EM table. The upcoming changes for modifiable EM will use it. This change is not expected to alter the general functionality. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Check if the get_cost() callback is present in em_compute_costs()Lukasz Luba
Subsequent changes will introduce a case in which 'cb->get_cost' may not be set in em_compute_costs(), so add a check to ensure that it is not NULL before attempting to dereference it. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Introduce em_compute_costs()Lukasz Luba
Move the EM costs computation code into a new dedicated function, em_compute_costs(), that can be reused in other places in the future. This change is not expected to alter the general functionality. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Find first CPU active while updating OPP efficiencyLukasz Luba
The Energy Model might be updated at runtime and the energy efficiency for each OPP may change. Thus, there is a need to update also the cpufreq framework and make it aligned to the new values. In order to do that, use a first active CPU from the Performance Domain. This is needed since the first CPU in the cpumask might be offline when we run this code path. Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Extend em_cpufreq_update_efficiencies() argument listLukasz Luba
In order to prepare the code for the modifiable EM perf_state table, make em_cpufreq_update_efficiencies() take a pointer to the EM table as its second argument and modify it to use that new argument instead of the 'table' member of dev->em_pd. No functional impact. Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Add missing newline for the message logLukasz Luba
Fix missing newline for the string long in the error code path. Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-05PM: hibernate: Add support for LZ4 compression for hibernationNikhil V
Extend the support for LZ4 compression to be used with hibernation. The main idea is that different compression algorithms have different characteristics and hibernation may benefit when it uses any of these algorithms: a default algorithm, having higher compression rate but is slower(compression/decompression) and a secondary algorithm, that is faster(compression/decompression) but has lower compression rate. LZ4 algorithm has better decompression speeds over LZO. This reduces the hibernation image restore time. As per test results: LZO LZ4 Size before Compression(bytes) 682696704 682393600 Size after Compression(bytes) 146502402 155993547 Decompression Rate 335.02 MB/s 501.05 MB/s Restore time 4.4s 3.8s LZO is the default compression algorithm used for hibernation. Enable CONFIG_HIBERNATION_COMP_LZ4 to set the default compressor as LZ4. Signed-off-by: Nikhil V <quic_nprakash@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-05PM: hibernate: Move to crypto APIs for LZO compressionNikhil V
Currently for hibernation, LZO is the only compression algorithm available and uses the existing LZO library calls. However, there is no flexibility to switch to other algorithms which provides better results. The main idea is that different compression algorithms have different characteristics and hibernation may benefit when it uses alternate algorithms. By moving to crypto based APIs, it lays a foundation to use other compression algorithms for hibernation. There are no functional changes introduced by this approach. Signed-off-by: Nikhil V <quic_nprakash@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-05PM: hibernate: Rename lzo* to make it genericNikhil V
Renaming lzo* to generic names, except for lzo_xxx() APIs. This is used in the next patch where we move to crypto based APIs for compression. There are no functional changes introduced by this approach. Signed-off-by: Nikhil V <quic_nprakash@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-05PM: sleep: stats: Use locking in dpm_save_failed_dev()Rafael J. Wysocki
Because dpm_save_failed_dev() may be called simultaneously by multiple failing device PM functions, the state of the suspend_stats fields updated by it may become inconsistent. Prevent that from happening by using a lock in dpm_save_failed_dev(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-02-05PM: sleep: stats: Define suspend_stats next to the code using itRafael J. Wysocki
It is not necessary to define struct suspend_stats in a header file and the suspend_stats variable in the core device system-wide PM code. They both can be defined in kernel/power/main.c, next to the sysfs and debugfs code accessing suspend_stats, which can be static. Modify the code in question in accordance with the above observation and replace the static inline functions manipulating suspend_stats with regular ones defined in kernel/power/main.c. While at it, move the enum suspend_stat_step to the end of suspend.h which is a more suitable place for it. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-02-05PM: sleep: stats: Use unsigned int for success and failure countersRafael J. Wysocki
Change the type of the "success" and "fail" fields in struct suspend_stats to unsigned int, because they cannot be negative. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-02-05PM: sleep: stats: Use an array of step failure countersRafael J. Wysocki
Instead of using a set of individual struct suspend_stats fields representing suspend step failure counters, use an array of counters indexed by enum suspend_stat_step for this purpose, which allows dpm_save_failed_step() to increment the appropriate counter automatically, so that its callers don't need to do that directly. It also allows suspend_stats_show() to carry out a loop over the counters array to print their values. Because the counters cannot become negative, use unsigned int for representing them. The only user-observable impact of this change is a different ordering of entries in the suspend_stats debugfs file which is not expected to matter. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-02-05PM: sleep: stats: Use array of suspend step namesRafael J. Wysocki
Replace suspend_step_name() in the suspend statistics code with an array of suspend step names which has fewer lines of code and less overhead. While at it, remove two unnecessary line breaks in suspend_stats_show() and adjust some white space in there to the kernel coding style for a more consistent code layout. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-12-20PM: hibernate: Repair excess function parameter description warningRandy Dunlap
Function swsusp_close() does not have any parameters, so remove the description of parameter @exclusive to prevent this warning. swap.c:1573: warning: Excess function parameter 'exclusive' description in 'swsusp_close' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-20PM: sleep: Remove obsolete comment from unlock_system_sleep()Kevin Hao
With the freezer changes introduced by commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"), the comment in unlock_system_sleep() has become obsolete, there is no need to retain it. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-19PM: hibernate: Use kmap_local_page() in copy_data_page()Chen Haonan
kmap_atomic() has been deprecated in favor of kmap_local_page(). kmap_atomic() disables page-faults and preemption (the latter only for !PREEMPT_RT kernels).The code between the mapping and un-mapping in this patch does not depend on the above-mentioned side effects.So simply replaced kmap_atomic() with kmap_local_page(). Signed-off-by: Chen Haonan <chen.haonan2@zte.com.cn> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-15PM: hibernate: Enforce ordering during image compression/decompressionHongchen Zhang
An S4 (suspend to disk) test on the LoongArch 3A6000 platform sometimes fails with the following error messaged in the dmesg log: Invalid LZO compressed length That happens because when compressing/decompressing the image, the synchronization between the control thread and the compress/decompress/crc thread is based on a relaxed ordering interface, which is unreliable, and the following situation may occur: CPU 0 CPU 1 save_image_lzo lzo_compress_threadfn atomic_set(&d->stop, 1); atomic_read(&data[thr].stop) data[thr].cmp = data[thr].cmp_len; WRITE data[thr].cmp_len Then CPU0 gets a stale cmp_len and writes it to disk. During resume from S4, wrong cmp_len is loaded. To maintain data consistency between the two threads, use the acquire/release variants of atomic set and read operations. Fixes: 081a9d043c98 ("PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> Co-developed-by: Weihao Li <liweihao@loongson.cn> Signed-off-by: Weihao Li <liweihao@loongson.cn> [ rjw: Subject rewrite and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-15PM: hibernate: Avoid missing wakeup events during hibernationChris Feng
Wakeup events that occur in the hibernation process's hibernation_platform_enter() cannot wake up the system. Although the current hibernation framework will execute part of the recovery process after a wakeup event occurs, it ultimately performs a shutdown operation because the system does not check the return value of hibernation_platform_enter(). In short, if a wakeup event occurs before putting the system into the final low-power state, it will be missed. To solve this problem, check the return value of hibernation_platform_enter(). When it returns -EAGAIN or -EBUSY (indicate the occurrence of a wakeup event), execute the hibernation recovery process, discard the previously saved image, and ultimately return to the working state. Signed-off-by: Chris Feng <chris.feng@mediatek.com> [ rjw: Rephrase the message printed when going back to the working state ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-11PM: hibernate: Do not initialize error in snapshot_write_next()Li zeming
The error variable in snapshot_write_next() gets a value before it is used, so don't initialize it to 0 upfront. Signed-off-by: Li zeming <zeming@nfschina.com> [ rjw: Subject and changelog rewrite ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>