linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-03-25	Merge tag 'pm-6.15-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These are dominated by cpufreq updates which in turn are dominated by updates related to boost support in the core and drivers and amd-pstate driver optimizations. Apart from the above, there are some cpuidle updates including a rework of the most recent idle intervals handling in the venerable menu governor that leads to significant improvements in some performance benchmarks, as the governor is now more likely to predict a shorter idle duration in some cases, and there are updates of the core device power management code, mostly related to system suspend and resume, that should help to avoid potential issues arising when the drivers of devices depending on one another want to use different optimizations. There is also a usual collection of assorted fixes and cleanups, including removal of some unused code. Specifics: - Manage sysfs attributes and boost frequencies efficiently from cpufreq core to reduce boilerplate code in drivers (Viresh Kumar) - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider, Dhananjay Ugwekar, Imran Shaik, zuoqian) - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky Bai) - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski) - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng) - Optimize the amd-pstate driver to avoid cases where call paths end up calling the same writes multiple times and needlessly caching variables through code reorganization, locking overhaul and tracing adjustments (Mario Limonciello, Dhananjay Ugwekar) - Make it possible to avoid enabling capacity-aware scheduling (CAS) in the intel_pstate driver and relocate a check for out-of-band (OOB) platform handling in it to make it detect OOB before checking HWP availability (Rafael Wysocki) - Fix dbs_update() to avoid inadvertent conversions of negative integer values to unsigned int which causes CPU frequency selection to be inaccurate in some cases when the "conservative" cpufreq governor is in use (Jie Zhan) - Update the handling of the most recent idle intervals in the menu cpuidle governor to prevent useful information from being discarded by it in some cases and improve the prediction accuracy (Rafael Wysocki) - Make it possible to tell the intel_idle driver to ignore its built-in table of idle states for the given processor, clean up the handling of auto-demotion disabling on Baytrail and Cherrytrail chips in it, and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy, Rafael Wysocki) - Make some cpuidle drivers use for_each_present_cpu() instead of for_each_possible_cpu() during initialization to avoid issues occurring when nosmp or maxcpus=0 are used (Jacky Bai) - Clean up the Energy Model handling code somewhat (Rafael Wysocki) - Use kfree_rcu() to simplify the handling of runtime Energy Model updates (Li RongQing) - Add an entry for the Energy Model framework to MAINTAINERS as properly maintained (Lukasz Luba) - Address RCU-related sparse warnings in the Energy Model code (Rafael Wysocki) - Remove ENERGY_MODEL dependency on SMP and allow it to be selected when DEVFREQ is set without CPUFREQ so it can be used on a wider range of systems (Jeson Gao) - Unify error handling during runtime suspend and runtime resume in the core to help drivers to implement more consistent runtime PM error handling (Rafael Wysocki) - Drop a redundant check from pm_runtime_force_resume() and rearrange documentation related to __pm_runtime_disable() (Rafael Wysocki) - Rework the handling of the "smart suspend" driver flag in the PM core to avoid issues hat may occur when drivers using it depend on some other drivers and clean up the related PM core code (Rafael Wysocki, Colin Ian King) - Fix the handling of devices with the power.direct_complete flag set if device_suspend() returns an error for at least one device to avoid situations in which some of them may not be resumed (Rafael Wysocki) - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a possible deadlock that may occur if the "compressor" hibernation module parameter is accessed during the registration of a new ieee80211 device (Lizhi Xu) - Suppress sleeping parent warning in device_pm_add() in the case when new children are added under a device with the power.direct_complete set after it has been processed by device_resume() (Xu Yang) - Remove needless return in three void functions related to system wakeup (Zijun Hu) - Replace deprecated kmap_atomic() with kmap_local_page() in the hibernation core code (David Reaver) - Remove unused helper functions related to system sleep (David Alan Gilbert) - Clean up s2idle_enter() so it does not lock and unlock CPU offline in vain and update comments in it (Ulf Hansson) - Clean up broken white space in dpm_wait_for_children() (Geert Uytterhoeven) - Update the cpupower utility to fix lib version-ing in it and memory leaks in error legs, remove hard-coded values, and implement CPU physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah Khan, Yiwei Lin, Zhongqiu Han)" * tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits) PM: sleep: Fix bit masking operation dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650 dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1 dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible cpufreq: Init cpufreq only for present CPUs PM: sleep: Fix handling devices with direct_complete set on errors cpuidle: Init cpuidle only for present CPUs PM: clk: Remove unused pm_clk_remove() PM: sleep: core: Fix indentation in dpm_wait_for_children() PM: s2idle: Extend comment in s2idle_enter() PM: s2idle: Drop redundant locks when entering s2idle PM: sleep: Remove unused pm_generic_ wrappers cpufreq: tegra186: Share policy per cluster cpupower: Make lib versioning scheme more obvious and fix version link PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL PM: EM: Address RCU-related sparse warnings cpupower: Implement CPU physical core querying pm: cpupower: remove hard-coded topology depth values pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology ...
2025-02-27	cpufreq: intel_pstate: Avoid SMP calls to get cpu-type	Pawan Gupta
	Intel pstate driver relies on SMP calls to get the cpu-type of a given CPU. Remove the SMP calls and instead use the cached value of cpu-type which is more efficient. [ mingo: Forward ported it. ] Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20241211-add-cpu-type-v5-2-2ae010f50370@linux.intel.com
2025-02-21	cpufreq: intel_pstate: Relocate platform preference check	Rafael J. Wysocki
	Move the invocation of intel_pstate_platform_pwr_mgmt_exists() before checking whether or not HWP is enabled because it does not depend on any code running before it except for the vendor check and if CPU performance scaling is going to be carried out by the platform, all of the code that runs before that function (again, except for the vendor check) is redundant. This is not expected to alter any functionality except for the ordering of messages printed by intel_pstate_init() when it is going to return an error before attempting to register the driver. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/2776745.mvXUDI8C0e@rjwysocki.net
2025-02-18	cpufreq: intel_pstate: Make it possible to avoid enabling CAS	Rafael J. Wysocki
	Capacity-aware scheduling (CAS) is enabled by default by intel_pstate on hybrid systems without SMT, but in some usage scenarios it may be more attractive to place tasks for maximum CPU performance regardless of the extra cost in terms of energy, which is the case on such systems when CAS is not enabled, so introduce a command line option to forbid intel_pstate to enable CAS. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by:Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/2781262.mvXUDI8C0e@rjwysocki.net
2024-12-18	cpufreq: intel_pstate: Use CPUFREQ_POLICY_UNKNOWN	Christian Loehle
	epp_policy uses the same values as cpufreq_policy.policy and resets to CPUFREQ_POLICY_UNKNOWN during offlining. Be consistent about it and initialize to CPUFREQ_POLICY_UNKNOWN instead of 0, too. No functional change intended. Signed-off-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/20241211122605.3048503-3-christian.loehle@arm.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-12-10	cpufreq: intel_pstate: Drop Arrow Lake from "scaling factor" list	Rafael J. Wysocki
	Since HYBRID_SCALING_FACTOR_MTL is not going to be suitable for Arrow Lake in general, drop it from the "known hybrid scaling factors" list of platforms, so the scaling factor for it will be determined with the help of information provided by the platform firmware via CPPC. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/2307515.iZASKD2KPV@rjwysocki.net
2024-12-10	cpufreq: intel_pstate: Use CPPC to get scaling factors	Rafael J. Wysocki
	The perf-to-frequency scaling factors are used by intel_pstate on hybrid platforms to cast performance levels to frequency on different types of CPUs which is needed because the generic cpufreq sysfs interface works in the frequency domain. For some hybrid platforms already in the field, the scaling factors are known, but for others (including some upcoming ones) they most likely will be different and the only way to get them that scales is to use information provided by the platform firmware. In this particular case, the requisite information can be obtained via CPPC. If the P-core hybrid scaling factor for the given processor model is not known, use CPPC to compute hybrid scaling factors for all CPUs. Since the current default hybrid scaling factor is only suitable for a few early hybrid platforms, add intel_hybrid_scaling_factor[] entries for them and initialize the scaling factor to zero ("unknown") by default. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/8476313.T7Z3S40VBb@rjwysocki.net
2024-11-14	Merge back cpufreq material for 6.13	Rafael J. Wysocki

2024-11-13	cpufreq: intel_pstate: Update Balance-performance EPP for Granite Rapids	Srinivas Pandruvada
	Update EPP default for balance_performance to 32. This will give better performance out of the box using Intel P-State powersave governor while still offering power savings compared to performance governor. This is in line with what has already been done for Emerald Rapids and Sapphire Rapids. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20241112235946.368082-1-srinivas.pandruvada@linux.intel.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-11	cpufreq: intel_pstate: Rearrange locking in hybrid_init_cpu_capacity_scaling()	Rafael J. Wysocki
	Notice that hybrid_init_cpu_capacity_scaling() only needs to hold hybrid_capacity_lock around __hybrid_init_cpu_capacity_scaling() calls, so introduce a "locked" wrapper around the latter and call it from the former. This allows to drop a local variable and a label that are not needed any more. Also, rename __hybrid_init_cpu_capacity_scaling() to __hybrid_refresh_cpu_capacity_scaling() for consistency. Interestingly enough, this fixes a locking issue introduced by commit 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems") that put an arch_enable_hybrid_capacity_scale() call under hybrid_capacity_lock, which was a mistake because the latter is acquired in CPU hotplug paths and so it cannot be held around cpus_read_lock() calls. Link: https://lore.kernel.org/linux-pm/SJ1PR11MB6129EDBF22F8A90FC3A3EDC8B9582@SJ1PR11MB6129.namprd11.prod.outlook.com/ Fixes: 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com> Link: https://patch.msgid.link/12554508.O9o76ZdvQC@rjwysocki.net [ rjw: Changelog update ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-04	cpufreq: intel_pstate: Update asym capacity for CPUs that were offline initially	Rafael J. Wysocki
	Commit 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems") overlooked a corner case in which some CPUs may be offline to start with and brought back online later, after the intel_pstate driver has been registered, so their asymmetric capacity will not be set. Address this by calling hybrid_update_capacity() in the CPU initialization path that is executed instead of the online path for those CPUs. Note that this asymmetric capacity update will be skipped during driver initialization and mode switches because hybrid_max_perf_cpu is NULL in those cases. Fixes: 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/1913414.tdWV9SEqCh@rjwysocki.net
2024-11-04	cpufreq: intel_pstate: Clear hybrid_max_perf_cpu before driver registration	Rafael J. Wysocki
	Modify intel_pstate_register_driver() to clear hybrid_max_perf_cpu before calling cpufreq_register_driver(), so that asymmetric CPU capacity scaling is not updated until hybrid_init_cpu_capacity_scaling() runs down the road. This is done in preparation for a subsequent change adding asymmetric CPU capacity computation to the CPU init path to handle CPUs that are initially offline. The information on whether or not hybrid_max_perf_cpu was NULL before it has been cleared is passed to hybrid_init_cpu_capacity_scaling(), so full initialization of CPU capacity scaling can be skipped if it has been carried out already. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/4616631.LvFx2qVVIh@rjwysocki.net
2024-10-01	cpufreq: intel_pstate: Make hwp_notify_lock a raw spinlock	Uwe Kleine-König
	notify_hwp_interrupt() is called via sysvec_thermal() -> smp_thermal_vector() -> intel_thermal_interrupt() in hard irq context. For this reason it must not use a simple spin_lock that sleeps with PREEMPT_RT enabled. So convert it to a raw spinlock. Reported-by: xiao sheng wen <atzlinux@sina.com> Link: https://bugs.debian.org/1076483 Signed-off-by: Uwe Kleine-König <ukleinek@debian.org> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: xiao sheng wen <atzlinux@sina.com> Link: https://patch.msgid.link/20240919081121.10784-2-ukleinek@debian.org Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-09-04	cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems	Rafael J. Wysocki
	Make intel_pstate use the HWP_HIGHEST_PERF values from MSR_HWP_CAPABILITIES to set asymmetric CPU capacity information via the previously introduced arch_set_cpu_capacity() on hybrid systems without SMT. Setting asymmetric CPU capacity is generally necessary to allow the scheduler to compute task sizes in a consistent way across all CPUs in a system where they differ by capacity. That, in turn, should help to improve scheduling decisions. It is also necessary for the schedutil cpufreq governor to operate as expected on hybrid systems where tasks migrate between CPUs of different capacities. The underlying observation is that intel_pstate already uses MSR_HWP_CAPABILITIES to get CPU performance information which is exposed by it via sysfs and CPU performance scaling is based on it. Thus using this information for setting asymmetric CPU capacity is consistent with what the driver has been doing already. Moreover, HWP_HIGHEST_PERF reflects the maximum capacity of a given CPU including both the instructions-per-cycle (IPC) factor and the maximum turbo frequency and the units in which that value is expressed are the same for all CPUs in the system, so the maximum capacity ratio between two CPUs can be obtained by computing the ratio of their HWP_HIGHEST_PERF values. Of course, in principle that capacity ratio need not be directly applicable at lower frequencies, so using it for providing the asymmetric CPU capacity information to the scheduler is a rough approximation, but it is as good as it gets. Also, measurements indicate that this approximation is not too bad in practice. If the given system is hybrid and non-SMT, the new code disables ITMT support in the scheduler (because it may get in the way of asymmetric CPU capacity code in the scheduler that automatically gets enabled by setting asymmetric CPU capacity) after initializing all online CPUs and finds the one with the maximum HWP_HIGHEST_PERF value. Next, it computes the capacity number for each (online) CPU by dividing the product of its HWP_HIGHEST_PERF and SCHED_CAPACITY_SCALE by the maximum HWP_HIGHEST_PERF. When a CPU goes offline, its capacity is reset to SCHED_CAPACITY_SCALE and if it is the one with the maximum HWP_HIGHEST_PERF value, the capacity numbers for all of the other online CPUs are recomputed. This also takes care of a cleanup during driver operation mode changes. Analogously, when a new CPU goes online, its capacity number is updated and if its HWP_HIGHEST_PERF value is greater than the current maximum one, the capacity numbers for all of the other online CPUs are recomputed. The case when the driver is notified of a CPU capacity change, either through the HWP interrupt or through an ACPI notification, is handled similarly to the CPU online case above, except that if the target CPU is the current highest-capacity one and its capacity is reduced, the capacity numbers for all of the other online CPUs need to be recomputed either. If the driver's "no_trubo" sysfs attribute is updated, all of the CPU capacity information is computed from scratch to reflect the new turbo status. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> # scale invariance Link: https://patch.msgid.link/1979653.PYKUYFuaPT@rjwysocki.net [ rjw: Fixed a typo in the changelog ] [ rjw: Renamed 3 new functions and added a comment ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-19	cpufreq: intel_pstate: Support Granite Rapids and Sierra Forest OOB mode	Srinivas Pandruvada
	Prevent intel_pstate from loading when OOB (Out Of Band) P-states mode is enabled. The OOB identifying bits are same as for the prior generation CPUs like Emerald Rapids servers. Add Granite Rapids and Sierra Forest CPU models to intel_pstate_cpu_oob_ids[]. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240802184839.1909091-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-02	cpufreq: intel_pstate: Update Balance performance EPP for Emerald Rapids	Pedro Henrique Kopper
	On Intel Emerald Rapids machines, we ship the Energy Performance Preference (EPP) default for balance_performance as 128. However, during an internal investigation together with Intel, we have determined that 32 is a more suitable value. This leads to significant improvements in both performance and energy: POV-Ray: 32% faster \| 12% less energy OpenSSL: 12% faster \| energy within 1% Build Linux Kernel: 29% faster \| 18% less energy Therefore, we should move the default EPP for balance_performance to 32. This is in line with what has already been done for Sapphire Rapids. Signed-off-by: Pedro Henrique Kopper <pedro.kopper@canonical.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/Zqu6zjVMoiXwROBI@capivara Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-09	Merge tag 'cpufreq-arm-updates-6.11' of ↵	Rafael J. Wysocki
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Merge ARM cpufreq updates for 6.11 from Viresh Kumar: "- cpufreq: Add Loongson-3 CPUFreq driver support (Huacai Chen). - Make exit() callback return void (Lizhe and Viresh Kumar). - Minor cleanups and fixes in several drivers (Bryan Brattlof, Javier Carrasco, Jagadeesh Kona, Jeff Johnson, Nícolas F. R. A. Prado, Primoz Fiser, Raphael Gallais-Pou, and Riwen Lu)." * tag 'cpufreq-arm-updates-6.11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (21 commits) cpufreq: sti: fix build warning cpufreq: mediatek: Use dev_err_probe in every error path in probe cpufreq: Add Loongson-3 CPUFreq driver support cpufreq: Make cpufreq_driver->exit() return void cpufreq: pcc: Remove empty exit() callback cpufreq: loongson2: Remove empty exit() callback cpufreq: nforce2: Remove empty exit() callback cpufreq: sti: add missing MODULE_DEVICE_TABLE entry for stih418 cpufreq: ti: update OPP table for AM62Px SoCs cpufreq: ti: update OPP table for AM62Ax SoCs cpufreq: sun50i: add Allwinner H700 speed bin cpufreq/cppc: Don't compare desired_perf in target() OPP: ti: Fix ti_opp_supply_probe wrong return values cpufreq: ti-cpufreq: Handle deferred probe with dev_err_probe() cpufreq: dt-platdev: add missing MODULE_DESCRIPTION() macro cpufreq: longhaul: Fix kernel-doc param for longhaul_setstate cpufreq: qcom-nvmem: eliminate uses of of_node_put() cpufreq: qcom-nvmem: fix memory leaks in probe error paths cpufreq: scmi: Avoid overflow of target_freq in fast switch cpufreq: sun50i: replace of_node_put() with automatic cleanup handler ...
2024-07-09	cpufreq: Make cpufreq_driver->exit() return void	Lizhe
	The cpufreq core doesn't check the return type of the exit() callback and there is not much the core can do on failures at that point. Just drop the returned value and make it return void. Signed-off-by: Lizhe <sensor1010@163.com> [ Viresh: Reworked the patches to fix all missing changes together. ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> # Mediatek Acked-by: Sudeep Holla <sudeep.holla@arm.com> # scpi, scmi, vexpress Acked-by: Mario Limonciello <mario.limonciello@amd.com> # amd Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> # bmips Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Kevin Hilman <khilman@baylibre.com> # omap
2024-06-28	cpufreq: intel_pstate: Support highest performance change interrupt	Srinivas Pandruvada
	On some systems, the HWP (Hardware P-states) highest performance level can change from the value set at boot-up. This behavior can lead to two issues: - The 'cpuinfo_max_freq' within the 'cpufreq' sysfs will not reflect the CPU's highest achievable performance. - Even if the CPU's highest performance level is increased after booting, the CPU may not reach the full expected performance. The availability of this feature is indicated by the CPUID instruction: if CPUID[6].EAX[15] is set to 1, the feature is supported. When supported, setting bit 2 of the MSR_HWP_INTERRUPT register enables notifications of the highest performance level changes. Therefore, as part of enabling the HWP interrupt, bit 2 of the MSR_HWP_INTERRUPT should also be set when this feature is supported. Upon a change in the highest performance level, a new HWP interrupt is generated, with bit 3 of the MSR_HWP_STATUS register set, and the MSR_HWP_CAPABILITIES register is updated with the new highest performance limit. The processing of the interrupt is the same as the guaranteed performance change. Notify change to cpufreq core and update MSR_HWP_REQUEST with new performance limits. The current driver implementation already takes care of the highest performance change as part of: commit dfeeedc1bf57 ("cpufreq: intel_pstate: Update cpuinfo.max_freq on HWP_CAP changes") For example: Before highest performance change interrupt: cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 3700000 cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 3700000 After highest performance changes interrupt: cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 3900000 cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 3900000 Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240624161109.1427640-3-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-27	Merge back cpufreq material for v6.11.	Rafael J. Wysocki

2024-06-25	cpufreq: intel_pstate: Replace boot_cpu_has()	Srinivas Pandruvada
	Replace boot_cpu_has() with cpu_feature_enabled(). Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240624162714.1431182-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-24	cpufreq: intel_pstate: Use HWP to initialize ITMT if CPPC is missing	Rafael J. Wysocki
	It is reported that single-thread performance on some hybrid systems dropped significantly after commit 7feec7430edd ("ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked") which prevented _CPC from being used if the support for it had not been confirmed by the platform firmware. The problem is that if the platform firmware does not confirm CPPC v2 support, cppc_get_perf_caps() returns an error which prevents the intel_pstate driver from enabling ITMT. Consequently, the scheduler does not get any hints on CPU performance differences, so in a hybrid system some tasks may run on CPUs with lower capacity even though they should be running on high-capacity CPUs. To address this, modify intel_pstate to use the information from MSR_HWP_CAPABILITIES to enable ITMT if CPPC is not available (which is done already if the highest performance number coming from CPPC is not realistic). Fixes: 7feec7430edd ("ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked") Closes: https://lore.kernel.org/linux-acpi/d01b0a1f-bd33-47fe-ab41-43843d8a374f@kfocus.org Link: https://lore.kernel.org/linux-acpi/ZnD22b3Br1ng7alf@kf-XE Reported-by: Aaron Rainbolt <arainbolt@kfocus.org> Tested-by: Aaron Rainbolt <arainbolt@kfocus.org> Cc: 5.19+ <stable@vger.kernel.org> # 5.19+ Link: https://patch.msgid.link/12460110.O9o76ZdvQC@rjwysocki.net Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
2024-06-19	cpufreq: intel_pstate: Update Lunar Lake hybrid scaling factor	Srinivas Pandruvada
	Change hybrid scaling factor for Lunar Lake. Scaling factor is 1.15 for P-cores compared to E-cores. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240618055221.446108-2-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-19	cpufreq: intel_pstate: Update Arrow Lake hybrid scaling factor	Srinivas Pandruvada
	Arrow Lake uses the same scaling factor as Meteor Lake, so reuse the same scaling factor. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240618055221.446108-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-14	Merge back new cpufreq material for v6.11.	Rafael J. Wysocki

2024-06-12	cpufreq: intel_pstate: Check turbo_is_disabled() in store_no_turbo()	Rafael J. Wysocki
	After recent changes in intel_pstate, global.turbo_disabled is only set at the initialization time and never changed. However, it turns out that on some systems the "turbo disabled" bit in MSR_IA32_MISC_ENABLE, the initial state of which is reflected by global.turbo_disabled, can be flipped later and there should be a way to take that into account (other than checking that MSR every time the driver runs which is costly and useless overhead on the vast majority of systems). For this purpose, notice that before the changes in question, store_no_turbo() contained a turbo_is_disabled() check that was used for updating global.turbo_disabled if the "turbo disabled" bit in MSR_IA32_MISC_ENABLE had been flipped and that functionality can be restored. Then, users will be able to reset global.turbo_disabled by writing 0 to no_turbo which used to work before on systems with flipping "turbo disabled" bit. This guarantees the driver state to remain in sync, but READ_ONCE() annotations need to be added in two places where global.turbo_disabled is accessed locklessly, so modify the driver to make that happen. Fixes: 0940f1a8011f ("cpufreq: intel_pstate: Do not update global.turbo_disabled after initialization") Closes: https://lore.kernel.org/linux-pm/bf3ebf1571a4788e97daf861eb493c12d42639a3.camel@xry111.site Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reported-by: Xi Ruoyao <xry111@xry111.site> Tested-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07	cpufreq: intel_pstate: Support Emerald Rapids OOB mode	Srinivas Pandruvada
	Prevent intel_pstate from loading when OOB (Out Of Band) P-states mode is enabled in Emerald Rapids. The OOB identifying bits are same as for the prior generation CPUs like Sapphire Rapids servers, so also add Emerald Rapids to the intel_pstate_cpu_oob_ids[] list. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07	cpufreq: intel_pstate: Use Meteor Lake EPPs for Arrow Lake	Srinivas Pandruvada
	Use the same default EPPs as Meteor Lake generation. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07	cpufreq: intel_pstate: Update Meteor Lake EPPs	Srinivas Pandruvada
	Update the default balance_performance EPP to 64. This gives better performance and also perf/watt compared to current value of 115. For example: Speedometer 2.1 score: +19% Perf/watt: +5.25% Webxprt 4 score score: +12% Perf/watt: +6.12% 3DMark Wildlife extreme unlimited score score: +3.2% Perf/watt: +11.5% Geekbench6 MT score: +2.14% Perf/watt: +0.32% Also update balance_power EPP default to 179. With this change: Video Playback power is reduced by 52% Team video conference power is reduced by 35% With Power profile daemon now sets balance_power EPP on DC instead of balance_performance, updating balance_power EPP will help to extend battery life. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07	cpufreq: intel_pstate: Switch to new Intel CPU model defines	Tony Luck
	New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-03	cpufreq: intel_pstate: Fix unchecked HWP MSR access	Srinivas Pandruvada
	Fix unchecked MSR access error for processors with no HWP support. On such processors, maximum frequency can be changed by the system firmware using ACPI event ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED. This results in accessing HWP MSR 0x771. Call Trace: <TASK> generic_exec_single+0x58/0x120 smp_call_function_single+0xbf/0x110 rdmsrl_on_cpu+0x46/0x60 intel_pstate_get_hwp_cap+0x1b/0x70 intel_pstate_update_limits+0x2a/0x60 acpi_processor_notify+0xb7/0x140 acpi_ev_notify_dispatch+0x3b/0x60 HWP MSR 0x771 can be only read on a CPU which supports HWP and enabled. Hence intel_pstate_get_hwp_cap() can only be called when hwp_active is true. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Closes: https://lore.kernel.org/linux-pm/20240529155740.Hq2Hw7be@linutronix.de/ Fixes: e8217b4bece3 ("cpufreq: intel_pstate: Update the maximum CPU frequency consistently") Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-07	cpufreq: intel_pstate: fix struct cpudata::epp_cached kernel-doc	Jeff Johnson
	make C=1 currently gives the following warning: drivers/cpufreq/intel_pstate.c:262: warning: Function parameter or struct member 'epp_cached' not described in 'cpudata' Add the missing ":" to fix the trivial kernel-doc syntax error. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-03	cpufreq: intel_pstate: hide unused intel_pstate_cpu_oob_ids[]	Arnd Bergmann
	The reference to this variable is hidden in an #ifdef: drivers/cpufreq/intel_pstate.c:2440:32: error: 'intel_pstate_cpu_oob_ids' defined but not used [-Werror=unused-const-variable=] Use the same check around the definition. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02	cpufreq: intel_pstate: Update the maximum CPU frequency consistently	Rafael J. Wysocki
	There are 3 places at which the maximum CPU frequency may change, store_no_turbo(), intel_pstate_update_limits() (when called by the cpufreq core) and intel_pstate_notify_work() (when handling a HWP change notification). Currently, cpuinfo.max_freq is only updated by store_no_turbo() and intel_pstate_notify_work(), although it principle it may be necessary to update it in intel_pstate_update_limits() either. Make all of them mutually consistent. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02	cpufreq: intel_pstate: Replace three global.turbo_disabled checks	Rafael J. Wysocki
	Replace the global.turbo_disabled in __intel_pstate_update_max_freq() with a global.no_turbo one to make store_no_turbo() actually update the maximum CPU frequency on the trubo preference changes, which needs to be consistent with arch_set_max_freq_ratio() called from there. For more consistency, replace the global.turbo_disabled checks in __intel_pstate_cpu_init() and intel_cpufreq_adjust_perf() with global.no_turbo checks either. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02	cpufreq: intel_pstate: Read global.no_turbo under READ_ONCE()	Rafael J. Wysocki
	Because global.no_turbo is generally not read under intel_pstate_driver_lock make store_no_turbo() use WRITE_ONCE() for updating it (this is the only place at which it is updated except for the initialization) and make the majority of places reading it use READ_ONCE(). Also remove redundant global.turbo_disabled checks from places that depend on the 'true' value of global.no_turbo because it can only be 'true' if global.turbo_disabled is also 'true'. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02	cpufreq: intel_pstate: Rearrange show_no_turbo() and store_no_turbo()	Rafael J. Wysocki
	Now that global.turbo_disabled can only change at the cpufreq driver registration time, initialize global.no_turbo at that time too so they are in sync to start with (if the former is set, the latter cannot be updated later anyway). That allows show_no_turbo() to be simlified because it does not need to check global.turbo_disabled and store_no_turbo() can be rearranged to avoid doing anything if the new value of global.no_turbo is equal to the current one and only return an error on attempts to clear global.no_turbo when global.turbo_disabled. While at it, eliminate the redundant ret variable from store_no_turbo(). No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02	cpufreq: intel_pstate: Do not update global.turbo_disabled after initialization	Rafael J. Wysocki
	The global.turbo_disabled is updated quite often, especially in the passive mode in which case it is updated every time the scheduler calls into the driver. However, this is generally not necessary and it adds MSR read overhead to scheduler code paths (and that particular MSR is slow to read). For this reason, make the driver read MSR_IA32_MISC_ENABLE_TURBO_DISABLE just once at the cpufreq driver registration time and remove all of the in-flight updates of global.turbo_disabled. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02	cpufreq: intel_pstate: Fold intel_pstate_max_within_limits() into caller	Rafael J. Wysocki
	Fold intel_pstate_max_within_limits() into its only caller. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02	cpufreq: intel_pstate: Use __ro_after_init for three variables	Rafael J. Wysocki
	There are at least 3 variables in intel_pstate that do not get updated after they have been initialized, so annotate them with __ro_after_init. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02	cpufreq: intel_pstate: Get rid of unnecessary READ_ONCE() annotations	Rafael J. Wysocki
	Drop two redundant checks involving READ_ONCE() from notify_hwp_interrupt() and make it check hwp_active without READ_ONCE() which is not necessary, because that variable is only set once during the early initialization of the driver. In order to make that clear, annotate hwp_active with __ro_after_init. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02	cpufreq: intel_pstate: Wait for canceled delayed work to complete	Rafael J. Wysocki
	Make intel_pstate_disable_hwp_interrupt() wait for canceled delayed work to complete to avoid leftover work items running when it returns which may be during driver unregistration and may confuse things going forward. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02	cpufreq: intel_pstate: Simplify spinlock locking	Rafael J. Wysocki
	Because intel_pstate_enable/disable_hwp_interrupt() are only called from thread context, they need not save the IRQ flags when using a spinlock as interrupts are guaranteed to be enabled when they run, so make them use spin_lock/unlock_irq(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02	cpufreq: intel_pstate: Drop redundant locking from intel_pstate_driver_cleanup()	Rafael J. Wysocki
	Remove the spinlock locking from intel_pstate_driver_cleanup() as it is not necessary because no other code accessing all_cpu_data[] can run in parallel with that function. Had the locking been necessary, though, it would have been incorrect because the lock in question is acquired from a hardirq handler and it cannot be acquired from thread context without disabling interrupts. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-11	Merge back cpufreq material for 6.9-rc1.	Rafael J. Wysocki

2024-02-24	cpufreq: intel_pstate: Update default EPPs for Meteor Lake	Srinivas Pandruvada
	Update default balanced_performance EPP to 115 and performance EPP to 16. Changing the balanced_performance EPP has better performance/watt compared to default powerup EPP value of 128. Changing the performance EPP to 0x10 shows reduced power for similar performance as EPP 0. On small form factor devices this is beneficial as lower power results in lower CPU and skin temperature. This results in reduced thermal throttling and higher performance. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24	cpufreq: intel_pstate: Allow model specific EPPs	Srinivas Pandruvada
	The current implementation allows model specific EPP override for balanced_performance. Add feature to allow model specific EPP for all predefined EPP strings. For example for some CPU models, even changing performance EPP has benefits Use a mask of EPPs as driver_data instead of just balanced_performance. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24	cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back	Doug Smythies
	There is a loophole in pstate limit clamping for the intel_cpufreq CPU frequency scaling driver (intel_pstate in passive mode), schedutil CPU frequency scaling governor, HWP (HardWare Pstate) control enabled, when the adjust_perf call back path is used. Fix it. Fixes: a365ab6b9dfb cpufreq: intel_pstate: Implement the ->adjust_perf() callback Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13	cpufreq: intel_pstate: remove cpudata::prev_cummulative_iowait	Jiri Slaby (SUSE)
	Commit 09c448d3c61f ("cpufreq: intel_pstate: Use IOWAIT flag in Atom algorithm") removed the last user of cpudata::prev_cummulative_iowait. Remove the member too. Found by https://github.com/jirislaby/clang-struct. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-22	cpufreq: intel_pstate: Refine computation of P-state for given frequency	Rafael J. Wysocki
	On systems using HWP, if a given frequency is equal to the maximum turbo frequency or the maximum non-turbo frequency, the HWP performance level corresponding to it is already known and can be used directly without any computation. Accordingly, adjust the code to use the known HWP performance levels in the cases mentioned above. This also helps to avoid limiting CPU capacity artificially in some cases when the BIOS produces the HWP_CAP numbers using a different E-core-to-P-core performance scaling factor than expected by the kernel. Fixes: f5c8cf2a4992 ("cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores") Cc: 6.1+ <stable@vger.kernel.org> # 6.1+ Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>