path: root/drivers/cpufreq
Age    Commit message    Author
2014-08-07cpufreq: speedstep-smi: fix decimal printf specifiersHans Wennborg
The prefix suggests the number should be printed in hex, so use the %x specifier to do that. Also, these are 32-bit values, so drop the l characters. Signed-off-by: Hans Wennborg <hans@hanshq.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
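A minimal sketch of the kind of change described, with an illustrative variable name rather than anything taken from the driver:

    u32 val = 0xf1e2d3c4;
    /* before (buggy): pr_debug("signature: 0x%.8lu\n", val); */
    /* after: print the 32-bit value in hex, without the 'l' */
    pr_debug("signature: 0x%.8x\n", val);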
2014-08-07Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc new goodies for 3.17. The short story: The biggest bit is Michael removing all of pre-POWER4 processor support from the 64-bit kernel. POWER3 and rs64. This gets rid of a ton of old cruft that has been bitrotting in a long while. It was broken for quite a few versions already and nobody noticed. Nobody uses those machines anymore. While at it, he cleaned up a bunch of old dusty cabinets, getting rid of a skeletton or two. Then, we have some base VFIO support for KVM, which allows assigning of PCI devices to KVM guests, support for large 64-bit BARs on "powernv" platforms, support for HMI (Hardware Management Interrupts) on those same platforms, some sparse-vmemmap improvements (for memory hotplug), There is the usual batch of Freescale embedded updates (summary in the merge commit) and fixes here or there, I think that's it for the highlights" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (102 commits) powerpc/eeh: Export eeh_iommu_group_to_pe() powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API powerpc: Reduce scariness of interrupt frames in stack traces powerpc: start loop at section start of start in vmemmap_populated() powerpc: implement vmemmap_free() powerpc: implement vmemmap_remove_mapping() for BOOK3S powerpc: implement vmemmap_list_free() powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE powerpc/book3s: Fix endianess issue for HMI handling on napping cpus. powerpc/book3s: handle HMIs for cpus in nap mode. powerpc/powernv: Invoke opal call to handle hmi. powerpc/book3s: Add basic infrastructure to handle HMI in Linux. powerpc/iommu: Fix comments with it_page_shift powerpc/powernv: Handle compound PE in config accessors powerpc/powernv: Handle compound PE for EEH powerpc/powernv: Handle compound PE powerpc/powernv: Split ioda_eeh_get_state() powerpc/powernv: Allow to freeze PE powerpc/powernv: Enable M64 aperatus for PHB3 powerpc/eeh: Aux PE data for error log ...
2014-08-07Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds
Pull MIPS updates from Ralf Baechle: "This is the main pull request for 3.17. It contains: - misc Cavium Octeon, BCM47xx, BCM63xx and Alchemy updates - MIPS ptrace updates and cleanups - various fixes that will also go to -stable - a number of cleanups and small non-critical fixes. - NUMA support for the Loongson 3. - more support for MSA - support for MAAR - various FP enhancements and fixes" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (139 commits) MIPS: jz4740: remove unnecessary null test before debugfs_remove MIPS: Octeon: remove unnecessary null test before debugfs_remove_recursive MIPS: ZBOOT: implement stack protector in compressed boot phase MIPS: mipsreg: remove duplicate MIPS_CONF4_FTLBSETS_SHIFT MIPS: Bonito64: remove a duplicate define MIPS: Malta: initialise MAARs MIPS: Initialise MAARs MIPS: detect presence of MAARs MIPS: define MAAR register accessors & bits MIPS: mark MSA experimental MIPS: Don't build MSA support unless it can be used MIPS: consistently clear MSA flags when starting & copying threads MIPS: 16 byte align MSA vector context MIPS: disable preemption whilst initialising MSA MIPS: ensure MSA gets disabled during boot MIPS: fix read_msa_* & write_msa_* functions on non-MSA toolchains MIPS: fix MSA context for tasks which don't use FP first MIPS: init upper 64b of vector registers when MSA is first used MIPS: save/disable MSA in lose_fpu MIPS: preserve scalar FP CSR when switching vector context ...
2014-08-07cpufreq: OPP: Avoid sleeping while atomicStephen Boyd
We allocate the cpufreq table after calling rcu_read_lock(), which disables preemption. This causes scheduling while atomic warnings. Use GFP_ATOMIC instead of GFP_KERNEL and update for kcalloc while we're here. BUG: sleeping function called from invalid context at mm/slub.c:1246 in_atomic(): 0, irqs_disabled(): 0, pid: 80, name: modprobe 5 locks held by modprobe/80: #0: (&dev->mutex){......}, at: [<c050d484>] __driver_attach+0x48/0x98 #1: (&dev->mutex){......}, at: [<c050d494>] __driver_attach+0x58/0x98 #2: (subsys mutex#5){+.+.+.}, at: [<c050c114>] subsys_interface_register+0x38/0xc8 #3: (cpufreq_rwsem){.+.+.+}, at: [<c05a9c8c>] __cpufreq_add_dev.isra.22+0x84/0x92c #4: (rcu_read_lock){......}, at: [<c05ab24c>] dev_pm_opp_init_cpufreq_table+0x18/0x10c Preemption disabled at:[< (null)>] (null) CPU: 2 PID: 80 Comm: modprobe Not tainted 3.16.0-rc3-next-20140701-00035-g286857f216aa-dirty #217 [<c0214da8>] (unwind_backtrace) from [<c02123f8>] (show_stack+0x10/0x14) [<c02123f8>] (show_stack) from [<c070141c>] (dump_stack+0x70/0xbc) [<c070141c>] (dump_stack) from [<c02f4cb0>] (__kmalloc+0x124/0x250) [<c02f4cb0>] (__kmalloc) from [<c05ab270>] (dev_pm_opp_init_cpufreq_table+0x3c/0x10c) [<c05ab270>] (dev_pm_opp_init_cpufreq_table) from [<bf000508>] (cpufreq_init+0x48/0x378 [cpufreq_generic]) [<bf000508>] (cpufreq_init [cpufreq_generic]) from [<c05a9e08>] (__cpufreq_add_dev.isra.22+0x200/0x92c) [<c05a9e08>] (__cpufreq_add_dev.isra.22) from [<c050c160>] (subsys_interface_register+0x84/0xc8) [<c050c160>] (subsys_interface_register) from [<c05a9494>] (cpufreq_register_driver+0x108/0x2d8) [<c05a9494>] (cpufreq_register_driver) from [<bf000888>] (generic_cpufreq_probe+0x50/0x74 [cpufreq_generic]) [<bf000888>] (generic_cpufreq_probe [cpufreq_generic]) from [<c050e994>] (platform_drv_probe+0x18/0x48) [<c050e994>] (platform_drv_probe) from [<c050d1f4>] (driver_probe_device+0x128/0x370) [<c050d1f4>] (driver_probe_device) from [<c050d4d0>] (__driver_attach+0x94/0x98) [<c050d4d0>] (__driver_attach) from [<c050b778>] (bus_for_each_dev+0x54/0x88) [<c050b778>] (bus_for_each_dev) from [<c050c894>] (bus_add_driver+0xe8/0x204) [<c050c894>] (bus_add_driver) from [<c050dd48>] (driver_register+0x78/0xf4) [<c050dd48>] (driver_register) from [<c0208870>] (do_one_initcall+0xac/0x1d8) [<c0208870>] (do_one_initcall) from [<c028b6b4>] (load_module+0x190c/0x21e8) [<c028b6b4>] (load_module) from [<c028c034>] (SyS_init_module+0xa4/0x110) [<c028c034>] (SyS_init_module) from [<c020f0c0>] (ret_fast_syscall+0x0/0x48) Fixes: a0dd7b79657b (PM / OPP: Move cpufreq specific OPP functions out of generic OPP library) Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 3.16+ <stable@vger.kernel.org> # 3.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
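A minimal sketch of the constraint (illustrative names, not the exact OPP code): an allocation made inside an RCU read-side critical section must not sleep, hence GFP_ATOMIC:

    struct cpufreq_frequency_table *table;

    rcu_read_lock();            /* may disable preemption: no sleeping below */
    table = kcalloc(num_opps + 1, sizeof(*table), GFP_ATOMIC);
    if (!table) {
        rcu_read_unlock();
        return -ENOMEM;
    }
    /* ... fill the table from the OPP list, then ... */
    rcu_read_unlock();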
2014-08-07cpufreq: cpu0: Do not print error message when deferringMarkus Pargmann
-EPROBE_DEFER is not a real error; we are just waiting until the necessary components are ready. The driver core infrastructure will also print an appropriate info message. This patch changes the error message to a debug message. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
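An illustrative sketch of the change (message text and device pointer name are assumptions, not the driver's exact strings):

    if (ret == -EPROBE_DEFER)
        dev_dbg(cpu_dev, "resources not ready, deferring probe\n");
    else
        dev_err(cpu_dev, "failed to get resources: %d\n", ret);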
2014-08-07cpufreq: integrator: Use set_cpus_allowed_ptrHimangi Saraogi
Several years ago there was an effort to convert all uses of set_cpus_allowed to use set_cpus_allowed_ptr with the goal of eventually removing the current definition of set_cpus_allowed and renaming set_cpus_allowed_ptr as set_cpus_allowed (https://lkml.org/lkml/2010/3/26/59). This is another step in this direction. The Coccinelle semantic patch that makes this change is as follows: // <smpl> @@ expression E1,E2; @@ - set_cpus_allowed(E1, cpumask_of_cpu(E2)) + set_cpus_allowed_ptr(E1, cpumask_of(E2)) @@ expression E; identifier I; @@ - set_cpus_allowed(E, I) + set_cpus_allowed_ptr(E, &I) // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-08-05powerpc/cpufreq: Add pr_warn() on OPAL firmware failuresVaidyanathan Srinivasan
Cpufreq depends on platform firmware to implement P-states. In case of a platform firmware failure, cpufreq should not panic the host kernel with BUG_ON(); a less severe pr_warn() will suffice. Add a firmware_has_feature(FW_FEATURE_OPALv3) check to skip probing the device tree on non-powernv platforms. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> CC: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-30MIPS: Loongson: Modify ChipConfig register definitionHuacai Chen
This patch is prepared for Multi-chip interconnection. Since each chip has a ChipConfig register, LOONGSON_CHIPCFG should be an array. Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: John Crispin <john@phrozen.org> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/7185/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-07-26Merge tag 's5pv210-dt' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/soc Merge "Samsung S5PV210 DT support for v3.17" from Kukjin Kim: - support common clock framework for s5pv210 clock - add generic PHY driver on s5pv210 to support it via DT - add dt support for s5pv210-goni, smdkc110, smdkv210 and torbreck boards - remove board files from mach-s5pv210 and unused codes - enable multiplatform for s5pv210 * tag 's5pv210-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: clk: samsung: s5pv210: Remove legacy board support ARM: SAMSUNG: Remove remaining legacy code gpio: samsung: Remove legacy support of S5PV210 ARM: S5PV210: Enable multi-platform build support cpufreq: s5pv210: Make the driver multiplatform aware ARM: S5PV210: Register cpufreq platform device ARM: S5PV210: move debug-macro.S into the common space ARM: S5PV210: Untie PM support from legacy code ARM: S5PV210: Remove support for board files ARM: dts: Add Device tree for s5pc110/s5pv210 boards ARM: dts: Add Device tree for s5pv210 SoC ARM: S5PV210: Add board file for boot using Device Tree phy: Add support for S5PV210 to the Exynos USB 2.0 PHY driver clk: samsung: Add S5PV210 Audio Subsystem clock driver ARM: SAMSUNG: Remove legacy clock code serial: samsung: Remove support for legacy clock code cpufreq: s3c24xx: Remove some dead code ARM: S5PV210: Migrate clock handling to Common Clock Framework clk: samsung: Add clock driver for S5PV210 and compatible SoCs Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2014-07-21cpufreq: move policy kobj to update_policy_cpu()Viresh Kumar
We are currently calling kobject_move() from two separate places, and both of them share another routine, update_policy_cpu(), which handles everything around updating policy->cpu. Moving ownership of policy->kobj also falls under the role of update_policy_cpu() and should be handled there. So, let's move kobject_move() into update_policy_cpu() and get rid of cpufreq_nominate_new_policy_cpu(), as it doesn't have anything significant left. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: propagate error returned by kobject_move()Viresh Kumar
We are returning -EINVAL instead of the error returned from kobject_move() when it fails. Propagate the actual error number. Also add a meaningful print when sysfs_create_link() fails. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: don't restore policy->cpus on failure to move kobjViresh Kumar
While hot-unplugging policy->cpu, we call cpufreq_nominate_new_policy_cpu() to nominate the next owner of the policy, i.e. the new policy->cpu. If we fail to move the policy kobject under the new policy->cpu, we try to restore the old policy->cpu in policy->cpus. This would only have been required if the old CPU had been removed from policy->cpus in the first place, but that is not done before calling cpufreq_nominate_new_policy_cpu(); it happens during the POST_DEAD notification, which comes quite late in the hot-unplugging path. So this is just some useless code hanging around; get rid of it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21powernow-k6: support 350MHz CPUMikulas Patocka
There exists 350MHz K6-2E+ CPU, so add it to the usual frequency table. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: ondemand: Eliminate the deadband effectStratos Karafotis
Currently, ondemand calculates the target frequency proportional to load using the formula: Target frequency = C * load where C = policy->cpuinfo.max_freq / 100 Though, in many cases, the minimum available frequency is pretty high and the above calculation introduces a dead band from load 0 to 100 * policy->cpuinfo.min_freq / policy->cpuinfo.max_freq where the target frequency is always calculated to less than policy->cpuinfo.min_freq and the minimum frequency is selected. For example: on Intel i7-3770 @ 3.4GHz the policy->cpuinfo.min_freq = 1600000 and the policy->cpuinfo.max_freq = 3400000 (without turbo). Thus, the CPU starts to scale up at a load above 47. On quad core 1500MHz Krait the policy->cpuinfo.min_freq = 384000 and the policy->cpuinfo.max_freq = 1512000. Thus, the CPU starts to scale at load above 25. Change the calculation of target frequency to eliminate the above effect using the formula: Target frequency = A + B * load where A = policy->cpuinfo.min_freq and B = (policy->cpuinfo.max_freq - policy->cpuinfo->min_freq) / 100 This will map load values 0 to 100 linearly to cpuinfo.min_freq to cpuinfo.max_freq. Also, use the CPUFREQ_RELATION_C in __cpufreq_driver_target to select the closest frequency in frequency_table. This is necessary to avoid selection of minimum frequency only when load equals to 0. It will also help for selection of frequencies using a more 'fair' criterion. Tables below show the difference in selected frequency for specific values of load without and with this patch. On Intel i7-3770 @ 3.40GHz: Without With Load Target Selected Target Selected 0 0 1600000 1600000 1600000 5 170050 1600000 1690050 1700000 10 340100 1600000 1780100 1700000 15 510150 1600000 1870150 1900000 20 680200 1600000 1960200 2000000 25 850250 1600000 2050250 2100000 30 1020300 1600000 2140300 2100000 35 1190350 1600000 2230350 2200000 40 1360400 1600000 2320400 2400000 45 1530450 1600000 2410450 2400000 50 1700500 1900000 2500500 2500000 55 1870550 1900000 2590550 2600000 60 2040600 2100000 2680600 2600000 65 2210650 2400000 2770650 2800000 70 2380700 2400000 2860700 2800000 75 2550750 2600000 2950750 3000000 80 2720800 2800000 3040800 3000000 85 2890850 2900000 3130850 3100000 90 3060900 3100000 3220900 3300000 95 3230950 3300000 3310950 3300000 100 3401000 3401000 3401000 3401000 On ARM quad core 1500MHz Krait: Without With Load Target Selected Target Selected 0 0 384000 384000 384000 5 75600 384000 440400 486000 10 151200 384000 496800 486000 15 226800 384000 553200 594000 20 302400 384000 609600 594000 25 378000 384000 666000 702000 30 453600 486000 722400 702000 35 529200 594000 778800 810000 40 604800 702000 835200 810000 45 680400 702000 891600 918000 50 756000 810000 948000 918000 55 831600 918000 1004400 1026000 60 907200 918000 1060800 1026000 65 982800 1026000 1117200 1134000 70 1058400 1134000 1173600 1134000 75 1134000 1134000 1230000 1242000 80 1209600 1242000 1286400 1242000 85 1285200 1350000 1342800 1350000 90 1360800 1458000 1399200 1350000 95 1436400 1458000 1455600 1458000 100 1512000 1512000 1512000 1512000 Tested on Intel i7-3770 CPU @ 3.40GHz and on ARM quad core 1500MHz Krait (Android smartphone). Benchmarks on Intel i7 shows a performance improvement on low and medium work loads with lower power consumption. 
Specifics: Phoronix Linux Kernel Compilation 3.1: Time: -0.40%, energy: -0.07% Phoronix Apache: Time: -4.98%, energy: -2.35% Phoronix FFMPEG: Time: -6.29%, energy: -4.02% Also, running mp3 decoding (very low load) shows no differences with and without this patch. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
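A worked example of the new mapping, as an illustrative helper using round numbers close to the i7-3770 case above (not the governor code itself):

    /* load is 0..100; frequencies in kHz */
    static unsigned int od_target(unsigned int min_f, unsigned int max_f,
                                  unsigned int load)
    {
        return min_f + (max_f - min_f) * load / 100;
    }
    /* od_target(1600000, 3400000, 50) == 2500000: no dead band at low loads */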
2014-07-21cpufreq: Introduce new relation for freq selectionStratos Karafotis
Introduce CPUFREQ_RELATION_C for frequency selection. It selects the frequency closest to the target (minimum absolute distance). In case of equal distance between two frequencies, it selects the greater one. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
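A sketch of the selection rule (not the in-tree implementation), assuming the first table entry is valid: keep the entry closest to the target and prefer the higher frequency on a tie:

    static unsigned int fdiff(unsigned int a, unsigned int b)
    {
        return a > b ? a - b : b - a;
    }

    static unsigned int pick_relation_c(struct cpufreq_frequency_table *table,
                                        unsigned int target)
    {
        unsigned int best = table[0].frequency;
        struct cpufreq_frequency_table *pos;

        for (pos = table; pos->frequency != CPUFREQ_TABLE_END; pos++) {
            unsigned int f = pos->frequency;

            if (f == CPUFREQ_ENTRY_INVALID)
                continue;
            if (fdiff(f, target) < fdiff(best, target) ||
                (fdiff(f, target) == fdiff(best, target) && f > best))
                best = f;
        }
        return best;
    }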
2014-07-21cpufreq: imx6: remove pu regulator dependency for SOCs with no PU regulatorAnson Huang
The PU regulator is not required by cpufreq and not all i.MX6 SoCs have one. Only when a SoC does have a PU regulator must its voltage be kept equal to the SOC regulator's voltage, so remove the hard dependency in order to support i.MX6SX, which has no PU regulator. Signed-off-by: Anson Huang <b20788@freescale.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Shawn Guo <shawn.guo@freescale.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Remove core_pct roundingStratos Karafotis
The specific rounding conditionally adds only 1/256 to the fractional part of core_pct. We can safely remove it without any noticeable impact on the calculations. Also use div64_u64 instead of div_u64 to avoid a possible overflow of sample->mperf used as the divisor. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Simplify P state adjustment logic.Stratos Karafotis
Simplify the code by removing the inline functions pstate_increase and pstate_decrease and calling intel_pstate_set_pstate() directly. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Keep values in aperf/mperf in full precisionStratos Karafotis
Currently we shift the aperf and mperf variables right by FRAC_BITS to prevent an overflow when we convert them to fixed-point numbers (shift left by FRAC_BITS). But this is not necessary, because we actually use the aperf and mperf deltas, which are much smaller than the raw APERF and MPERF values. So, use the unmodified APERF and MPERF values in the calculation. This also adds 8 bits of precision, although the gain is insignificant. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
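A rough sketch of the idea (FRAC_BITS matches the driver's 8-bit fixed-point convention; the helper name is illustrative):

    #define FRAC_BITS 8
    #define int_tofp(X) ((u64)(X) << FRAC_BITS)

    /* the old approach pre-shifted the raw counters right by FRAC_BITS to
     * avoid overflow, losing 8 bits; the deltas are small enough to skip that: */
    static u64 core_pct(u64 delta_aperf, u64 delta_mperf)
    {
        return div64_u64(int_tofp(delta_aperf) * 100, int_tofp(delta_mperf));
    }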
2014-07-21cpufreq: intel_pstate: Disable interrupts during MSRs readingStratos Karafotis
According to Intel 64 and IA-32 Architectures SDM, Volume 3, Chapter 14.2, "Software needs to exercise care to avoid delays between the two RDMSRs (for example interrupts)". So, disable interrupts while reading the IA32_APERF and IA32_MPERF MSRs. This should increase the accuracy of the calculations. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
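A minimal sketch of the pattern (the surrounding sample bookkeeping is omitted):

    unsigned long flags;
    u64 aperf, mperf;

    local_irq_save(flags);
    rdmsrl(MSR_IA32_APERF, aperf);
    rdmsrl(MSR_IA32_MPERF, mperf);
    local_irq_restore(flags);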
2014-07-21cpufreq: intel_pstate: Align multiple lines to open parenthesisStratos Karafotis
Suppress checkpatch.pl --strict warnings: CHECK: Alignment should match open parenthesis Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Remove unnecessary intermediate variable sample_timeStratos Karafotis
Remove the unnecessary intermediate assignment and use the pid_params.sample_rate_ms variable directly. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Cleanup parenthesesStratos Karafotis
Remove unnecessary parentheses. Also, add parentheses in one case for better readability. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Fit code in a single line where possibleStratos Karafotis
We can fit these lines in a single one, under the 80 characters limit. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Add missing blank lines after declarationsStratos Karafotis
Also, remove unnecessary blank lines. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Remove unnecessary type casting in div_s64() callStratos Karafotis
div_s64() accepts the divisor parameter as s32. Helper div_fp() also accepts divisor as int32_t. So, remove the unnecessary int64_t type casting. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
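An illustrative before/after, assuming the usual shape of the div_fp() helper in this driver:

    /* before */
    static inline int32_t div_fp(int32_t x, int32_t y)
    {
        return div_s64((int64_t)x << FRAC_BITS, (int64_t)y);
    }

    /* after: div_s64() takes an s32 divisor, so the cast on y is unnecessary */
    static inline int32_t div_fp(int32_t x, int32_t y)
    {
        return div_s64((int64_t)x << FRAC_BITS, y);
    }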
2014-07-21cpufreq: intel_pstate: Make intel_pstate_kobject and debugfs_parent localsStratos Karafotis
Since we never remove the sysfs entry and debugfs files, we can make intel_pstate_kobject and debugfs_parent local variables. Also, annotate intel_pstate_sysfs_expose_params() and intel_pstate_debug_expose_params() with __init so that they can be freed after bootstrap. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-19cpufreq: s5pv210: Make the driver multiplatform awareTomasz Figa
Signed-off-by: Tomasz Figa <t.figa@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2014-07-19cpufreq: s3c24xx: Remove some dead codeTomasz Figa
There is no use for the .resume_clocks() callback now and in fact all the provided functions are empty, so this patch just removes it in preparation for further patches. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Tomasz Figa <t.figa@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2014-07-17cpufreq: move policy kobj to policy->cpu at resumeViresh Kumar
This is only relevant to implementations with multiple clusters, where the clusters have separate clock lines but all CPUs within a cluster share one. Consider a dual-cluster platform with 2 cores per cluster. During suspend we start hot-unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj is moved to CPU3, and when CPU3 goes down we don't free the policy or its kobj because we want to retain permissions/values/etc. Now on resume, we get CPU2 before CPU3 and call __cpufreq_add_dev(). We recover the old policy and update policy->cpu from 3 to 2 in update_policy_cpu(). But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a link for CPU2, but would try that for CPU3 while bringing it online, which reports errors because CPU3 already has a kobj assigned to it. This bug was introduced by commit 42f921a, which overlooked this scenario. To fix this, let's move the kobj to the new policy->cpu while bringing the first CPU of a cluster back. Also do a WARN_ON() if kobject_move() fails, as we would reach this point only for the first CPU of a non-boot cluster, and we can't recover from this situation if kobject_move() fails. Fixes: 42f921a6f10c (cpufreq: remove sysfs files for CPUs which failed to come back after resume) Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Reported-and-tested-by: Bu Yitian <ybu@qti.qualcomm.com> Reported-by: Saravana Kannan <skannan@codeaurora.org> Reviewed-by: Srivatsa S. Bhat <srivatsa@mit.edu> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-16cpufreq: cpu0: OPPs can be populated at runtimeViresh Kumar
OPPs can be populated statically via DT, or added at run time with dev_pm_opp_add(). While this driver handles the first case correctly, it would fail to populate OPPs added at runtime, because the call to of_init_opp_table() fails when there are no OPPs in DT and probe returns early. To fix this, remove the error checking and call dev_pm_opp_init_cpufreq_table() unconditionally. Update the bindings as well. Suggested-by: Stephen Boyd <sboyd@codeaurora.org> Tested-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
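An illustrative example of the runtime path (the frequency and voltage values are made up):

    /* platform code can register OPPs at run time instead of via DT: */
    dev_pm_opp_add(cpu_dev, 996000000, 1100000);    /* 996 MHz @ 1.1 V */
    dev_pm_opp_add(cpu_dev, 1200000000, 1275000);   /* 1.2 GHz @ 1.275 V */

    /* the driver then builds its table unconditionally: */
    ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table);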
2014-07-16cpufreq: kirkwood: Reinstate cpufreq driver for ARCH_KIRKWOODQuentin Armitage
Commit ff1f0018cf66080d8e6f59791e552615648a033a ("drivers: Enable building of Kirkwood drivers for mach-mvebu") added Kirkwood into mach-mvebu, adding MACH_KIRKWOOD to ARCH_KIRKWOOD in the Kconfig files. The change for ARM_KIRKWOOD_CPUFREQ replaced ARCH_KIRKWOOD with MACH_KIRKWOOD, whereas all the other changes were ARCH_KIRKWOOD || MACH_KIRKWOOD. As a consequence of this change, the cpufreq driver is no longer enabled for ARCH_KIRKWOOD. This patch reinstates ARM_KIRKWOOD_CPUFREQ for ARCH_KIRKWOOD. Signed-off-by: Quentin Armitage <quentin@armitage.org.uk> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-16cpufreq: imx6q: Select PM_OPPNicolas Del Piano
PM_OPP is a library used by several of the existing cpufreq drivers. ARM IMX6Q cpufreq driver uses this library for its functionality. Thus, it should be selected in Kconfig. Reported-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Signed-off-by: Nicolas Del Piano <ndel314@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-16cpufreq: sa1110: set memory type for h3600Linus Walleij
The Compaq iPAQ h3600 also has the K4S281632b-1H memory type. Verified by prying apart a broken board. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-09cpufreq: Makefile: fix compilation for davinci platformPrabhakar Lad
Commit 8a7b1227e303 (cpufreq: davinci: move cpufreq driver to drivers/cpufreq) added a dependency only on CONFIG_ARCH_DAVINCI_DA850, whereas the davinci_cpufreq_init() call is used by all davinci platforms. This patch fixes the following build error: arch/arm/mach-davinci/built-in.o: In function `davinci_init_late': :(.init.text+0x928): undefined reference to `davinci_cpufreq_init' make: *** [vmlinux] Error 1 Fixes: 8a7b1227e303 (cpufreq: davinci: move cpufreq driver to drivers/cpufreq) Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-07intel_pstate: Set CPU number before accessing MSRsVincent Minet
Ensure that cpu->cpu is set before writing MSR_IA32_PERF_CTL during CPU initialization. Otherwise only cpu0 has its P-state set and all other cores are left with their values unchanged. In most cases, this is not too serious because the P-states will be set correctly when the timer function is run. But when the default governor is set to performance, the per-CPU current_pstate stays the same forever and no attempts are made to write the MSRs again. Signed-off-by: Vincent Minet <vincent@vincent-minet.net> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-07intel_pstate: don't touch turbo bit if turbo disabled or unavailable.Dirk Brandewie
If turbo is disabled in the BIOS, bit 38 should be set in the MSR_IA32_MISC_ENABLE register, per section 14.3.2.1 of the SDM Vol 3 (document 325384-050US, Feb 2014). If this bit is set, do *not* attempt to disable turbo via the MSR_IA32_PERF_CTL register. On some systems, trying to disable turbo via MSR_IA32_PERF_CTL will cause subsequent writes to MSR_IA32_PERF_CTL not to take effect; in fact, reading MSR_IA32_PERF_CTL will not show the IDA/Turbo DISENGAGE bit (32) as set. A write of bit 32 to zero returns to normal operation. Also deal with the case where the processor does not support turbo and the BIOS does not report the fact in MSR_IA32_MISC_ENABLE but does report the max and turbo P states as the same value. Link: https://bugzilla.kernel.org/show_bug.cgi?id=64251 Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
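A rough sketch of the check (the bit position comes from the commit text; variable names are illustrative):

    u64 misc_en;
    bool turbo_disabled;

    rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
    turbo_disabled = !!(misc_en & BIT_ULL(38));  /* bit 38: BIOS turbo disable */
    /* only when !turbo_disabled is it safe to toggle the turbo-disengage
     * bit (32) in MSR_IA32_PERF_CTL */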
2014-07-07intel_pstate: Fix setting VIDDirk Brandewie
Commit 21855ff5 (intel_pstate: Set turbo VID for BayTrail) introduced setting the turbo VID, which is required to prevent a machine check on some Baytrail SKUs under heavy graphics-based workloads. The documentation update that brought the requirement to light also changed the bit mask used for enumerating P state and VID values from 0x7f to 0x3f. This change returns the mask value to 0x7f. Tested with the Intel NUC DN2820FYK, BIOS version FYBYT10H.86A.0034.2014.0513.1413 with v3.16-rc1 and v3.14.8 kernel versions. Fixes: 21855ff5 (intel_pstate: Set turbo VID for BayTrail) Link: https://bugzilla.kernel.org/show_bug.cgi?id=77951 Reported-and-tested-by: Rune Reterson <rune@megahurts.dk> Reported-and-tested-by: Eric Eickmeyer <erich@ericheickmeyer.com> Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-26of: Migrate of_find_node_by_name() users to for_each_node_by_name()Grant Likely
There are a bunch of users open coding the for_each_node_by_name() by calling of_find_node_by_name() directly instead of using the macro. This is getting in the way of some cleanups, and the possibility of removing of_find_node_by_name() entirely. Clean it up so that all the users are consistent. Signed-off-by: Grant Likely <grant.likely@linaro.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Takashi Iwai <tiwai@suse.de>
2014-06-18cpufreq: unlock when failing cpufreq_update_policy()Aaron Plattner
Commit bd0fa9bb455d introduced a failure path to cpufreq_update_policy() if cpufreq_driver->get(cpu) returns NULL. However, it jumps to the 'no_policy' label, which exits without unlocking any of the locks the function acquired earlier. This causes later calls into cpufreq to hang. Fix this by creating a new 'unlock' label and jumping to that instead. Fixes: bd0fa9bb455d ("cpufreq: Return error if ->get() failed in cpufreq_update_policy()") Link: https://devtalk.nvidia.com/default/topic/751903/kernel-3-15-and-nv-drivers-337-340-failed-to-initialize-the-nvidia-kernel-module-gtx-550-ti-/ Signed-off-by: Aaron Plattner <aplattner@nvidia.com> Cc: 3.15+ <stable@vger.kernel.org> # 3.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
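Roughly the shape of the fix, as a sketch rather than the exact diff:

    new_policy.cur = cpufreq_driver->get(cpu);
    if (WARN_ON(!new_policy.cur)) {
        ret = -EIO;
        goto unlock;        /* previously jumped past the unlock */
    }
    /* ... */
unlock:
    up_write(&policy->rwsem);
    cpufreq_cpu_put(policy);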
2014-06-17intel_pstate: Correct rounding in busy calculationDoug Smythies
There was a mistake in the actual rounding portion of the previous patch, f0fe3cd7e12d (intel_pstate: Correct rounding in busy calculation), such that the rounding was asymmetric and incorrect. Severity: not very serious, but it can increase the target pstate by one extra value. For real-world workflows the issue should self-correct (but I have no proof). It is the equivalent of different PID gains for positive and negative numbers. Examples: -3.000000 used to round to -4, rounds to -3 with this patch. -3.503906 used to round to -5, rounds to -4 with this patch. Fixes: f0fe3cd7e12d (intel_pstate: Correct rounding in busy calculation) Signed-off-by: Doug Smythies <dsmythies@telus.net> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
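One way to get symmetric round-half-away-from-zero behaviour for an 8.8 fixed-point value, consistent with the examples above (a sketch, not necessarily the exact in-tree code):

    #define FRAC_BITS 8

    static int fp_round(int32_t v)
    {
        if (v < 0)
            return -((-v + (1 << (FRAC_BITS - 1))) >> FRAC_BITS);
        return (v + (1 << (FRAC_BITS - 1))) >> FRAC_BITS;
    }
    /* fp_round(-768) == -3 (-3.0); fp_round(-897) == -4 (-3.503906) */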
2014-06-16cpufreq: cpufreq-cpu0: fix CPU_THERMAL dependencyArnd Bergmann
5fbfbcd3e842d ("cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR") was a little too quick in completely removing the dependency on the THERMAL driver. The problem is that while there are inline wrappers to turn the thermal API calls into empty functions, those do not help if the cpu-thermal driver is a loadable module and cpufreq-cpu0 is builtin. Since CONFIG_CPU_THERMAL is a bool option that decides whether the cpu code is built into the thermal module or not, we have to use a dependency on the thermal driver itself. However, if CPU_THERMAL is disabled, we don't need the dependency, hence the strange '!CPU_THERMAL || THERMAL' construct. Fixes: 5fbfbcd3e842d ("cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-12Merge branch 'pm-cpufreq'Rafael J. Wysocki
* pm-cpufreq: cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR cpufreq: tegra: update comment for clarity cpufreq: intel_pstate: Remove duplicate CPU ID check cpufreq: Mark CPU0 driver with CPUFREQ_NEED_INITIAL_FREQ_CHECK flag cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info' cpufreq: governor: Be friendly towards latency-sensitive bursty workloads cpufreq: ppc-corenet-cpu-freq: do_div use quotient Revert "cpufreq: Enable big.LITTLE cpufreq driver on arm64" cpufreq: Tegra: implement intermediate frequency callbacks cpufreq: add support for intermediate (stable) frequencies
2014-06-10cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATORViresh Kumar
cpufreq-cpu0 uses thermal framework to register a cooling device, but doesn't depend on it as there are dummy calls provided by thermal layer when CONFIG_THERMAL=n. And when these calls fail, the driver is still usable. Similar explanation is valid for regulators as well. We do have dummy calls available for regulator APIs and the driver can work even when those calls fail. So, we don't really need to mention thermal and regulators as a dependency for cpufreq-cpu0 in Kconfig as platforms without support for thermal/regulator can also use this driver. Remove this dependency. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-10cpufreq: tegra: update comment for clarityViresh Kumar
Tegra's driver got updated a bit (00917dd cpufreq: Tegra: implement intermediate frequency callbacks) and implements new 'intermediate freq' infrastructure of core. Above commit updated comments about when to call clk_prepare_enable(pll_x_clk) and Doug wasn't satisfied with those comments and said this: > The "Though when target-freq is intermediate freq, we don't need to > take this reference." makes me think that this function is actually > called when target-freq is intermediate freq.  I don't think it is, > right? For better clarity just make that comment more explicit about when we call tegra_target_intermediate(). Reviewed-by: Stephen Warren <swarren@nvidia.com> Reported-and-reviewed-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-10cpufreq: intel_pstate: Remove duplicate CPU ID checkStratos Karafotis
We check the CPU ID during driver init. There is no need to do it again per logical CPU initialization. So, remove the duplicate check. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-09cpufreq: Mark CPU0 driver with CPUFREQ_NEED_INITIAL_FREQ_CHECK flagViresh Kumar
Sometimes boot loaders set the CPU frequency to a value outside of the frequency table present with the cpufreq core. In such cases the CPU might be unstable if it has to run at that frequency for a long duration, so it's better to set it to a frequency that is specified in the frequency table. Sachin recently found this problem with the cpufreq-cpu0 driver when he was testing it for Exynos. Set this flag for the cpufreq-cpu0 driver. Reported-and-tested-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
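Setting the flag in the driver definition looks roughly like this (any other flags the driver already sets are omitted):

    static struct cpufreq_driver cpu0_cpufreq_driver = {
        .flags = CPUFREQ_NEED_INITIAL_FREQ_CHECK,
        /* ... callbacks ... */
    };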
2014-06-09cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info'Viresh Kumar
'copy_prev_load' was recently added by commit: 18b46ab (cpufreq: governor: Be friendly towards latency-sensitive bursty workloads). It actually is a bit redundant as we also have 'prev_load' which can store any integer value and can be used instead of 'copy_prev_load' by setting it zero. True load can also turn out to be zero during long idle intervals (and hence the actual value of 'prev_load' and the overloaded value can clash). However this is not a problem because, if the true load was really zero in the previous interval, it makes sense to evaluate the load afresh for the current interval rather than copying the previous load. So, drop 'copy_prev_load' and use 'prev_load' instead. Update comments as well to make it more clear. There is another change here which was probably missed by Srivatsa during the last version of updates he made. The unlikely in the 'if' statement was covering only half of the condition and the whole line should actually come under it. Also checkpatch is made more silent as it was reporting this (--strict option): CHECK: Alignment should match open parenthesis + if (unlikely(wall_time > (2 * sampling_rate) && + j_cdbs->prev_load)) { Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
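A sketch of the resulting logic, using the names quoted in the checkpatch output above:

    if (unlikely(wall_time > (2 * sampling_rate) &&
                 j_cdbs->prev_load)) {
        load = j_cdbs->prev_load;
        /* use the carried-over load only once per wakeup-from-idle */
        j_cdbs->prev_load = 0;
    } else {
        load = 100 * (wall_time - idle_time) / wall_time;
        j_cdbs->prev_load = load;
    }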
2014-06-07cpufreq: governor: Be friendly towards latency-sensitive bursty workloadsSrivatsa S. Bhat
Cpufreq governors like the ondemand governor calculate the load on the CPU periodically by employing deferrable timers. A deferrable timer won't fire if the CPU is completely idle (and there are no other timers to be run), in order to avoid unnecessary wakeups and thus save CPU power. However, the load calculation logic is agnostic to all this, and this can lead to the problem described below. Time (ms) CPU 1 100 Task-A running 110 Governor's timer fires, finds load as 100% in the last 10ms interval and increases the CPU frequency. 110.5 Task-A running 120 Governor's timer fires, finds load as 100% in the last 10ms interval and increases the CPU frequency. 125 Task-A went to sleep. With nothing else to do, CPU 1 went completely idle. 200 Task-A woke up and started running again. 200.5 Governor's deferred timer (which was originally programmed to fire at time 130) fires now. It calculates load for the time period 120 to 200.5, and finds the load is almost zero. Hence it decreases the CPU frequency to the minimum. 210 Governor's timer fires, finds load as 100% in the last 10ms interval and increases the CPU frequency. So, after the workload woke up and started running, the frequency was suddenly dropped to absolute minimum, and after that, there was an unnecessary delay of 10ms (sampling period) to increase the CPU frequency back to a reasonable value. And this pattern repeats for every wake-up-from-cpu-idle for that workload. This can be quite undesirable for latency- or response-time sensitive bursty workloads. So we need to fix the governor's logic to detect such wake-up-from- cpu-idle scenarios and start the workload at a reasonably high CPU frequency. One extreme solution would be to fake a load of 100% in such scenarios. But that might lead to undesirable side-effects such as frequency spikes (which might also need voltage changes) especially if the previous frequency happened to be very low. We just want to avoid the stupidity of dropping down the frequency to a minimum and then enduring a needless (and long) delay before ramping it up back again. So, let us simply carry forward the previous load - that is, let us just pretend that the 'load' for the current time-window is the same as the load for the previous window. That way, the frequency and voltage will continue to be set to whatever values they were set at previously. This means that bursty workloads will get a chance to influence the CPU frequency at which they wake up from cpu-idle, based on their past execution history. Thus, they might be able to avoid suffering from slow wakeups and long response-times. However, we should take care not to over-do this. For example, such a "copy previous load" logic will benefit cases like this: (where # represents busy and . represents idle) ##########.........#########.........###########...........##########........ but it will be detrimental in cases like the one shown below, because it will retain the high frequency (copied from the previous interval) even in a mostly idle system: ##########.........#.................#.....................#............... (i.e., the workload finished and the remaining tasks are such that their busy periods are smaller than the sampling interval, which causes the timer to always get deferred. So, this will make the copy-previous-load logic copy the initial high load to subsequent idle periods over and over again, thus keeping the frequency high unnecessarily). 
So, we modify this copy-previous-load logic such that it is used only once upon every wakeup-from-idle. Thus if we have 2 consecutive idle periods, the previous load won't get blindly copied over; cpufreq will freshly evaluate the load in the second idle interval, thus ensuring that the system comes back to its normal state. [ The right way to solve this whole problem is to teach the CPU frequency governors to also track load on a per-task basis, not just a per-CPU basis, and then use both the data sources intelligently to set the appropriate frequency on the CPUs. But that involves redesigning the cpufreq subsystem, so this patch should make the situation bearable until then. ] Experimental results: +-------------------+ I ran a modified version of ebizzy (called 'sleeping-ebizzy') that sleeps in between its execution such that its total utilization can be a user-defined value, say 10% or 20% (higher the utilization specified, lesser the amount of sleeps injected). This ebizzy was run with a single-thread, tied to CPU 8. Behavior observed with tracing (sample taken from 40% utilization runs): ------------------------------------------------------------------------ Without patch: ~~~~~~~~~~~~~~ kworker/8:2-12137 416.335742: cpu_frequency: state=2061000 cpu_id=8 kworker/8:2-12137 416.335744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40753 416.345741: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-12137 416.345744: cpu_frequency: state=4123000 cpu_id=8 kworker/8:2-12137 416.345746: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40753 416.355738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 <snip> --------------------------------------------------------------------- <snip> <...>-40753 416.402202: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8 <idle>-0 416.502130: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy <...>-40753 416.505738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-12137 416.505739: cpu_frequency: state=2061000 cpu_id=8 kworker/8:2-12137 416.505741: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40753 416.515739: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-12137 416.515742: cpu_frequency: state=4123000 cpu_id=8 kworker/8:2-12137 416.515744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy Observation: Ebizzy went idle at 416.402202, and started running again at 416.502130. But cpufreq noticed the long idle period, and dropped the frequency at 416.505739, only to increase it back again at 416.515742, realizing that the workload is in-fact CPU bound. Thus ebizzy needlessly ran at the lowest frequency for almost 13 milliseconds (almost 1 full sample period), and this pattern repeats on every sleep-wakeup. This could hurt latency-sensitive workloads quite a lot. 
With patch: ~~~~~~~~~~~ kworker/8:2-29802 464.832535: cpu_frequency: state=2061000 cpu_id=8 <snip> --------------------------------------------------------------------- <snip> kworker/8:2-29802 464.962538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40738 464.972533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-29802 464.972536: cpu_frequency: state=4123000 cpu_id=8 kworker/8:2-29802 464.972538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40738 464.982531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 <snip> --------------------------------------------------------------------- <snip> kworker/8:2-29802 465.022533: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40738 465.032531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-29802 465.032532: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40738 465.035797: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8 <idle>-0 465.240178: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy <...>-40738 465.242533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-29802 465.242535: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40738 465.252531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 Observation: Ebizzy went idle at 465.035797, and started running again at 465.240178. Since ebizzy was the only real workload running on this CPU, cpufreq retained the frequency at 4.1Ghz throughout the run of ebizzy, no matter how many times ebizzy slept and woke-up in-between. Thus, ebizzy got the 10ms worth of 4.1 Ghz benefit during every sleep-wakeup (as compared to the run without the patch) and this boost gave a modest improvement in total throughput, as shown below. Sleeping-ebizzy records-per-second: ----------------------------------- Utilization Without patch With patch Difference (Absolute and % values) 10% 274767 277046 + 2279 (+0.829%) 20% 543429 553484 + 10055 (+1.850%) 40% 1090744 1107959 + 17215 (+1.578%) 60% 1634908 1662018 + 27110 (+1.658%) A rudimentary and somewhat approximately latency-sensitive workload such as sleeping-ebizzy itself showed a consistent, noticeable performance improvement with this patch. Hence, workloads that are truly latency-sensitive will benefit quite a bit from this change. Moreover, this is an overall win-win since this patch does not hurt power-savings at all (because, this patch does not reduce the idle time or idle residency; and the high frequency of the CPU when it goes to cpu-idle does not affect/hurt the power-savings of deep idle states). Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-06cpufreq: ppc-corenet-cpu-freq: do_div use quotientEd Swarthout
Commit 6712d2931933 (cpufreq: ppc-corenet-cpufreq: Fix __udivdi3 modpost error) used the remainder from do_div instead of the quotient. Fix that and add one to ensure minimum is met. Fixes: 6712d2931933 (cpufreq: ppc-corenet-cpufreq: Fix __udivdi3 modpost error) Signed-off-by: Ed Swarthout <Ed.Swarthout@freescale.com> Cc: 3.15+ <stable@vger.kernel.org> # 3.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
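For reference, do_div() divides its 64-bit first argument in place and returns the 32-bit remainder, so the quotient must be read back from that argument afterwards (a generic example, not the driver code):

    u64 val = 1999999999ULL, min_freq;
    u32 rem;

    rem = do_div(val, 100000);  /* val now holds the quotient 19999, rem = 99999 */
    min_freq = val + 1;         /* "add one to ensure minimum is met" */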