summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-05-05perf/arm: Special-case hetereogeneous CPUsMark Rutland
Commit: 26657848502b7847 ("perf/core: Verify we have a single perf_hw_context PMU") forcefully prevents multiple PMUs from sharing perf_hw_context, as this generally doesn't make sense. It is a common bug for uncore PMUs to use perf_hw_context rather than perf_invalid_context, which this detects. However, systems exist with heterogeneous CPUs (and hence heterogeneous HW PMUs), for which sharing perf_hw_context is necessary, and possible in some limited cases. To make this work we have to perform some gymnastics, as we did in these commits: 66eb579e66ecfea5 ("perf: allow for PMU-specific event filtering") c904e32a69b7c779 ("arm: perf: filter unschedulable events") To allow those systems to work, we must allow PMUs for heterogeneous CPUs to share perf_hw_context, though we must still disallow sharing otherwise to detect the common misuse of perf_hw_context. This patch adds a new PERF_PMU_CAP_HETEROGENEOUS_CPUS for this, updates the core logic to account for this, and makes use of it in the arm_pmu code that is used for systems with heterogeneous CPUs. Comments are added to make the rationale clear and hopefully avoid accidental abuse. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20160426103346.GA20836@leverpostej Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05perf/core: Let userspace know if the PMU supports address filtersAlexander Shishkin
Export an additional common attribute for PMUs that support address range filtering to let the perf userspace identify such PMUs in a uniform way. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1461771888-10409-8-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05perf/x86/intel/pt: Add support for address range filtering in PTAlexander Shishkin
Newer versions of Intel PT support address ranges, which can be used to define IP address range-based filters or TraceSTOP regions. Number of ranges in enumerated via cpuid. This patch implements PMU callbacks and related low-level code to allow filter validation, configuration and programming into the hardware. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1461771888-10409-7-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05perf/core: Introduce address range filteringAlexander Shishkin
Many instruction tracing PMUs out there support address range-based filtering, which would, for example, generate trace data only for a given range of instruction addresses, which is useful for tracing individual functions, modules or libraries. Other PMUs may also utilize this functionality to allow filtering to or filtering out code at certain address ranges. This patch introduces the interface for userspace to specify these filters and for the PMU drivers to apply these filters to hardware configuration. The user interface is an ASCII string that is passed via an ioctl() and specifies (in the form of an ASCII string) address ranges within certain object files or within kernel. There is no special treatment for kernel modules yet, but it might be a worthy pursuit. The PMU driver interface basically adds two extra callbacks to the PMU driver structure, one of which validates the filter configuration proposed by the user against what the hardware is actually capable of doing and the other one translates hardware-independent filter configuration into something that can be programmed into the hardware. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1461771888-10409-6-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05perf/core: Extend perf_event_aux_ctx() to optionally iterate through more eventsAlexander Shishkin
Trace filtering code needs an iterator that can go through all events in a context, including inactive and filtered, to be able to update their filters' address ranges based on mmap or exec events. This patch changes perf_event_aux_ctx() to optionally do this. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1461771888-10409-5-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05perf/x86/intel/pt: Add IP filtering register/CPUID bitsAlexander Shishkin
New versions of Intel PT support address range-based filtering. Add the new registers, bit definitions and relevant CPUID bits. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1461771888-10409-4-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05perf/x86/intel/pt: Move PT specific MSR bit definitions to a private headerAlexander Shishkin
Nothing outside of the Intel PT driver should ever care about its MSR bits, so there is no reason to keep them in msr-index.h. This patch moves them to a pt-local header. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1461771888-10409-3-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05perf/core: Move set_filter() out of CONFIG_EVENT_TRACINGAlexander Shishkin
For instruction trace filtering, namely, for communicating filter definitions from userspace, I'd like to re-use the SET_FILTER code that the tracepoints are using currently. To that end, move the relevant code out from behind the CONFIG_EVENT_TRACING dependency. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1461771888-10409-2-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05perf/x86/amd/iommu: Do not register a task ctx for uncore like PMUsPeter Zijlstra
The new sanity check introduced by: 26657848502b ("perf/core: Verify we have a single perf_hw_context PMU") ... triggered on the AMD IOMMU driver. IOMMUs are not per logical CPU, they cannot have per-task counters. Fix it. Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jroedel@suse.de Cc: suravee.suthikulpanit@amd.com Link: http://lkml.kernel.org/r/20160423224255.GB3430@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05locking/atomics: Flip atomic_fetch_or() argumentsPeter Zijlstra
All the atomic operations have their arguments the wrong way around; make atomic_fetch_or() consistent and flip them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05locking/pvqspinlock: Robustify init_qspinlock_stat()Davidlohr Bueso
Specifically around the debugfs file creation calls, I have no idea if they could ever possibly fail, but this is core code (debug aside) so lets at least check the return value and inform anything fishy. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <Waiman.Long@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160420041725.GC3472@linux-uzut.site Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05locking/pvqspinlock: Avoid double resetting of statsDavidlohr Bueso
... remove the redundant second iteration, this is most likely a copy/past buglet. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: waiman.long@hpe.com Link: http://lkml.kernel.org/r/1460961103-24953-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05Merge tag 'v4.6-rc6' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05Merge branch 'x86/urgent' into x86/platform, to resolve conflictIngo Molnar
Conflicts: arch/x86/kernel/apic/x2apic_uv_x.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_initAlex Thorlton
A while back the following commit: d394f2d9d8e1 ("x86/platform/UV: Remove EFI memmap quirk for UV2+") changed uv_system_init() to only call map_low_mmrs() on older UV1 hardware, which requires EFI_OLD_MEMMAP to be set in order to boot. The recent changes to the EFI memory mapping code in: d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping") exposed some issues with the fact that we were relying on the EFI memory mapping mechanisms to map in our MMRs for us, after commit d394f2d9d8e1. Rather than revert the entire commit and go back to forcing EFI_OLD_MEMMAP on all UVs, we're going to add the call to map_low_mmrs() back into uv_system_init(), and then fix up our EFI runtime calls to use the appropriate page table. For now, UV2+ will still need efi=old_map to boot, but there will be other changes soon that should eliminate the need for this. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Russ Anderson <rja@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/1462401592-120735-1-git-send-email-athorlton@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05sched/fair: Fix comment in calculate_imbalance()Dietmar Eggemann
The comment in calculate_imbalance() was introduced in commit: 2dd73a4f09be ("[PATCH] sched: implement smpnice") which described the logic as it was then, but a later commit: b18855500fc4 ("sched/balancing: Fix 'local->avg_load > sds->avg_load' case in calculate_imbalance()") .. complicated this logic some more so that the comment does not match anymore. Update the comment to match the code. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1461958364-675-3-git-send-email-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05sched/fair: Remove stale power aware scheduling commentsDietmar Eggemann
Commit 8e7fbcbc22c1 ("sched: Remove stale power aware scheduling remnants and dysfunctional knobs") deleted the power aware scheduling support. This patch gets rid of the remaining power aware scheduling related comments in the code as well. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1461958364-675-2-git-send-email-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05sched/fair: Update rq clock before updating nohz CPU loadMatt Fleming
If we're accessing rq_clock() (e.g. in sched_avg_update()) we should update the rq clock before calling cpu_load_update(), otherwise any time calculations will be stale. All other paths currently call update_rq_clock(). Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Galbraith <efault@gmx.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1462304814-11715-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05sched/debug: Print out idle balance values even on !CONFIG_SCHEDSTATS kernelsWanpeng Li
The max_idle_balance_cost and avg_idle values which are tracked and ar used to capture short idle incidents, are not associated with schedstats, however the information of these two values isn't printed out on !CONFIG_SCHEDSTATS kernels. Fix this by moving the value printout out of the CONFIG_SCHEDSTATS section. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1462250305-4523-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05sched/fair: Optimize sum computation with a lookup tableYuyang Du
__compute_runnable_contrib() uses a loop to compute sum, whereas a table lookup can do it faster in a constant amount of time. The program to generate the constants is located at: Documentation/scheduler/sched-avg.txt Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Morten Rasmussen <morten.rasmussen@arm.com> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@arm.com Cc: pjt@google.com Link: http://lkml.kernel.org/r/1462226078-31904-2-git-send-email-yuyang.du@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05sched/fair: Add detailed description to the sched load avg metricsYuyang Du
These sched metrics have become complex enough, so describe them in detail at their definition. Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Fixed the text to improve its spelling and typography. ] Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1459829551-21625-4-git-send-email-yuyang.du@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05sched/fair: Rename SCHED_LOAD_SHIFT to NICE_0_LOAD_SHIFT and remove ↵Yuyang Du
SCHED_LOAD_SCALE After cleaning up the sched metrics, there are two definitions that are ambiguous and confusing: SCHED_LOAD_SHIFT and SCHED_LOAD_SHIFT. Resolve this: - Rename SCHED_LOAD_SHIFT to NICE_0_LOAD_SHIFT, which better reflects what it is. - Replace SCHED_LOAD_SCALE use with SCHED_CAPACITY_SCALE and remove SCHED_LOAD_SCALE. Suggested-by: Ben Segall <bsegall@google.com> Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1459829551-21625-3-git-send-email-yuyang.du@intel.com [ Rewrote the changelog and fixed the build on 32-bit kernels. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05perf/x86: Add model numbers for Kabylake CPUsAndi Kleen
Everything the same as Skylake, just new model numbers. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1461977748-17616-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05sched/fair: Generalize the load/util averages resolution definitionYuyang Du
Integer metric needs fixed point arithmetic. In sched/fair, a few metrics, e.g., weight, load, load_avg, util_avg, freq, and capacity, may have different fixed point ranges, which makes their update and usage error-prone. In order to avoid the errors relating to the fixed point range, we definie a basic fixed point range, and then formalize all metrics to base on the basic range. The basic range is 1024 or (1 << 10). Further, one can recursively apply the basic range to have larger range. Pointed out by Ben Segall, weight (visible to user, e.g., NICE-0 has 1024) and load (e.g., NICE_0_LOAD) have independent ranges, but they must be well calibrated. Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1459829551-21625-2-git-send-email-yuyang.du@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05sched/core: Enable increased load resolution on 64-bit kernelsPeter Zijlstra
Mike ran into the low load resolution limitation on his big machine. So reenable these bits; nobody could ever reproduce/analyze the reported power usage claim and Google has been running with this for years as well. Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05locking/lockdep, sched/core: Implement a better lock pinning schemePeter Zijlstra
The problem with the existing lock pinning is that each pin is of value 1; this mean you can simply unpin if you know its pinned, without having any extra information. This scheme generates a random (16 bit) cookie for each pin and requires this same cookie to unpin. This means you have to keep the cookie in context. No objsize difference for !LOCKDEP kernels. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05sched/core: Introduce 'struct rq_flags'Peter Zijlstra
In order to be able to pass around more than just the IRQ flags in the future, add a rq_flags structure. No difference in code generation for the x86_64-defconfig build I tested. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05sched/core: Move task_rq_lock() out of linePeter Zijlstra
Its a rather large function, inline doesn't seems to make much sense: $ size defconfig-build/kernel/sched/core.o{.orig,} text data bss dec hex filename 56533 21037 2320 79890 13812 defconfig-build/kernel/sched/core.o.orig 55733 21037 2320 79090 134f2 defconfig-build/kernel/sched/core.o The 'perf bench sched messaging' micro-benchmark shows a visible improvement of 4-5%: $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i ; done $ perf stat --null --repeat 25 -- perf bench sched messaging -g 40 -l 5000 pre: 4.582798193 seconds time elapsed ( +- 1.41% ) 4.733374877 seconds time elapsed ( +- 2.10% ) 4.560955136 seconds time elapsed ( +- 1.43% ) 4.631062303 seconds time elapsed ( +- 1.40% ) post: 4.364765213 seconds time elapsed ( +- 0.91% ) 4.454442734 seconds time elapsed ( +- 1.18% ) 4.448893817 seconds time elapsed ( +- 1.41% ) 4.424346872 seconds time elapsed ( +- 0.97% ) Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05Merge branch 'sched/urgent' into sched/core, to pick up fixes before ↵Ingo Molnar
applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05x86/entry/32: Remove asmlinkage_protect()Brian Gerst
Now that syscalls are called from C code, which copies the args to new stack slots instead of overlaying pt_regs, asmlinkage_protect() is no longer needed. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1462416278-11974-4-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05x86/entry/32: Remove GET_THREAD_INFO() from entry codeBrian Gerst
The entry code used to cache the thread_info pointer in the EBP register, but all the code that used it has been moved to C. Remove the unused code to get the pointer. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1462416278-11974-3-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05x86/entry, sched/x86: Don't save/restore EFLAGS on task switchBrian Gerst
Now that NT is filtered by the SYSENTER entry code, it is safe to skip saving and restoring flags on task switch. Also remove a leftover reset of flags on 64-bit fork. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1462416278-11974-2-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05crypto: testmgr - Add a flag allowing the self-tests to be disabled at runtime.Richard W.M. Jones
Running self-tests for a short-lived KVM VM takes 28ms on my laptop. This commit adds a flag 'cryptomgr.notests' which allows them to be disabled. However if fips=1 as well, we ignore this flag as FIPS mode mandates that the self-tests are run. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-05Merge tag 'v4.6-rc6' into x86/asm, to refresh the treeIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05crypto: rsa - select crypto mgr dependencyTadeusz Struk
The pkcs1pad template needs CRYPTO_MANAGER so it needs to be explicitly selected by CRYPTO_RSA. Reported-by: Jamie Heilman <jamie@audible.transient.net> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-05crypto: hash - Fix page length clamping in hash walkHerbert Xu
The crypto hash walk code is broken when supplied with an offset greater than or equal to PAGE_SIZE. This patch fixes it by adjusting walk->pg and walk->offset when this happens. Cc: <stable@vger.kernel.org> Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-04MAINTAINERS: Cleanup Intel Wired LAN maintainers listJeff Kirsher
With the recent "retirements" and other changes, make the maintainers list a lot less confusing and a bit more straight forward. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Shannon Nelson <sln@onemain.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04tcp: two more missing bh disableEric Dumazet
percpu_counter only have protection against preemption. TCP stack uses them possibly from BH, so we need BH protection in contexts that could be run in process context Fixes: c10d9310edf5 ("tcp: do not assume TCP code is non preemptible") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-05Merge tag 'drm-intel-fixes-2016-05-02' of ↵Dave Airlie
git://anongit.freedesktop.org/drm-intel into drm-fixes i915 fixes for 4.6. A bit more than I'd like at this stage, but OTOH they're all stable material. * tag 'drm-intel-fixes-2016-05-02' of git://anongit.freedesktop.org/drm-intel: drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW drm/i915: Fake HDMI live status drm/i915: Fix eDP low vswing for Broadwell drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume drm/i915: Fix system resume if PCI device remained enabled drm/i915: Avoid stalling on pending flips for legacy cursor updates
2016-05-05Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes two fixes for hw lockups and one for a double free * 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: make sure vertical front porch is at least 1 drm/radeon: make sure vertical front porch is at least 1 drm/amdgpu: set metadata pointer to NULL after freeing.
2016-05-05gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloadingPhilipp Zabel
If of_node is set before calling platform_device_add, the driver core will try to use of: modalias matching, which fails because the device tree nodes don't have a compatible property set. This patch fixes imx-ipuv3-crtc module autoloading by setting the of_node property only after the platform modalias is set. Fixes: 304e6be652e2 ("gpu: ipu-v3: Assign of_node of child platform devices to corresponding ports") Reported-by: Dennis Gilmore <dennis@ausil.us> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-By: Dennis Gilmore <dennis@ausil.us> Cc: stable@vger.kernel.org # 4.4+ Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-05cpufreq: intel_pstate: Ignore _PPC processing under HWPSrinivas Pandruvada
When HWP (hardware P states) feature is active, the ACPI _PSS and _PPC is not used. So ignore processing for _PPC limits. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05PM / OPP: Remove useless checkViresh Kumar
Regulators are optional for devices using OPPs and the OPP core shouldn't be printing any errors for such missing regulators. It was fine before the commit 0c717d0f9cb4, but that failed to update this part of the code to remove an 'always true' check and an extra unwanted print message. Fix that now. Fixes: 0c717d0f9cb4 (PM / OPP: Initialize regulator pointer to an error value) Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_tableSudeep Holla
Currently when performing random CPU hot-plugs and suspend-to-ram(S2R) on systems using arm_big_little cpufreq driver, we get warnings similar to something like below: cpu cpu1: _opp_add: duplicate OPPs detected. Existing: freq: 600000000, volt: 800000, enabled: 1. New: freq: 600000000, volt: 800000, enabled: 1 This is mainly because the OPPs for the shared cpus are not set. We can just use dev_pm_opp_of_cpumask_add_table in case the OPPs are obtained from DT(arm_big_little_dt.c) or use dev_pm_opp_set_sharing_cpus if the OPPs are obtained by other means like firmware(e.g. scpi-cpufreq.c) Also now that the generic dev_pm_opp{,_of}_cpumask_remove_table can handle removal of opp table and entries for all associated CPUs, we can re-use dev_pm_opp{,_of}_cpumask_remove_table as free_opp_table in cpufreq_arm_bL_ops. This patch makes necessary changes to reuse the generic OPP functions for {init,free}_opp_table and thereby eliminating the warnings. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05Merge branch 'pm-opp' into pm-cpufreqRafael J. Wysocki
2016-05-05PM / OPP: add non-OF versions of dev_pm_opp_{cpumask_, }remove_tableSudeep Holla
Functions dev_pm_opp_of_{cpumask_,}remove_table removes/frees all the static OPP entries associated with the device and/or all cpus(in case of cpumask) that are created from DT. However the OPP entries are populated reading from the firmware or some different method using dev_pm_opp_add are marked dynamic and can't be removed using above functions. This patch adds non DT/OF versions of dev_pm_opp_{cpumask_,}remove_table to support the above mentioned usecase. This is in preparation to make use of the same in scpi-cpufreq.c Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05cpufreq: tango: Use generic platdev driverMarc Gonzalez
Add tango4 compatible string to the list. Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05PM / OPP: pass cpumask by referenceArnd Bergmann
The new use of dev_pm_opp_set_sharing_cpus resulted in a harmless compiler warning with CONFIG_CPUMASK_OFFSTACK=y: drivers/cpufreq/mvebu-cpufreq.c: In function 'armada_xp_pmsu_cpufreq_init': include/linux/cpumask.h:550:25: error: passing argument 2 of 'dev_pm_opp_set_sharing_cpus' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] The problem here is that cpumask_var_t gets passed by reference, but by declaring a 'const cpumask_var_t' argument, only the pointer is constant, not the actual mask. This is harmless because the function does not actually modify the mask. This patch changes the function prototypes for all of the related functions to pass a 'struct cpumask *' instead of 'cpumask_var_t', matching what most other such functions do in the kernel. This lets us mark all the other similar functions as taking a 'const' mask where possible, and it avoids the warning without any change in object code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 947bd567f7a5 (mvebu: Use dev_pm_opp_set_sharing_cpus() to mark OPP tables as shared) Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05cpufreq: Fix GOV_LIMITS handling for the userspace governorSai Gurrappadi
Currently, the userspace governor only updates frequency on GOV_LIMITS if policy->cur falls outside policy->{min/max}. However, it is also necessary to update current frequency on GOV_LIMITS to match the user requested value if it can be achieved within the new policy->{max/min}. This was previously the behaviour in the governor until commit d1922f0 ("cpufreq: Simplify userspace governor") which incorrectly assumed that policy->cur == user requested frequency via scaling_setspeed. This won't be true if the user requested frequency falls outside policy->{min/max}. Ex: a temporary thermal cap throttled the user requested frequency. Fix this by storing the user requested frequency in a seperate variable. The governor will then try to achieve this request on every GOV_LIMITS change. Fixes: d1922f02562f (cpufreq: Simplify userspace governor) Signed-off-by: Sai Gurrappadi <sgurrappadi@nvidia.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>