summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-03-21ALSA: hda - Fix missing ELD update at unpluggingTakashi Iwai
i915 get_eld ops may return an error when no encoder is connected, and currently we regard the error as fatal and skip the whole ELD handling. This ended up with the missing ELD update at unplugging. This patch fixes the issue by treating the error as the unplugged state, instead of skipping the rest. Reported-by: Libin Yang <libin.yang@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.5 Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-03-21drivers/xen: make platform-pci.c explicitly non-modularPaul Gortmaker
The Kconfig currently controlling compilation of this code is: arch/x86/xen/Kconfig:config XEN_PVHVM arch/x86/xen/Kconfig: def_bool y ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. Also note that MODULE_DEVICE_TABLE is a no-op for non-modular code. We also delete the MODULE_LICENSE tag etc. since all that information was (or is now) contained at the top of the file in the comments. In removing "module" from the init fcn name, we observe a namespace collision with the probe function, so we use "probe" in the name of the probe function, and "init" in the registration fcn, as per standard convention, as suggested by Stefano. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-03-21drivers/xen: make sys-hypervisor.c explicitly non-modularPaul Gortmaker
The Kconfig currently controlling compilation of this code is: config XEN_SYS_HYPERVISOR bool "Create xen entries under /sys/hypervisor" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. However one could argue that fs_initcall() might make more sense here. This change means that the one line function xen_properties_destroy() has only one user left, and since that is inside an #ifdef, we just manually inline it there vs. adding more ifdeffery around the function to avoid compile warnings about "defined but not used". In order to be consistent we also manually inline the other _destroy functions that are also just one line sysfs functions calls with only one call site remaing, even though they wouldn't need #ifdeffery. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-03-21drivers/xen: make xenbus_dev_[front/back]end explicitly non-modularPaul Gortmaker
The Makefile / Kconfig currently controlling compilation here is: obj-y += xenbus_dev_frontend.o [...] obj-$(CONFIG_XEN_BACKEND) += xenbus_dev_backend.o ...with: drivers/xen/Kconfig:config XEN_BACKEND drivers/xen/Kconfig: bool "Backend driver support" ...meaning that they currently are not being built as modules by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We also delete the MODULE_LICENSE tag since all that information is already contained at the top of the file in the comments. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-03-21drivers/xen: make [xen-]ballon explicitly non-modularPaul Gortmaker
The Makefile / Kconfig currently controlling compilation here is: obj-y += grant-table.o features.o balloon.o manage.o preempt.o time.o [...] obj-$(CONFIG_XEN_BALLOON) += xen-balloon.o ...with: drivers/xen/Kconfig:config XEN_BALLOON drivers/xen/Kconfig: bool "Xen memory balloon driver" ...meaning that they currently are not being built as modules by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. In doing so we uncover two implict includes that were obtained by module.h having such a wide include scope itself: In file included from drivers/xen/xen-balloon.c:41:0: include/xen/balloon.h:26:51: warning: ‘struct page’ declared inside parameter list [enabled by default] int alloc_xenballooned_pages(int nr_pages, struct page **pages); ^ include/xen/balloon.h: In function ‘register_xen_selfballooning’: include/xen/balloon.h:35:10: error: ‘ENOSYS’ undeclared (first use in this function) return -ENOSYS; ^ This is fixed by adding mm-types.h and errno.h to the list. We also delete the MODULE_LICENSE tags since all that information is already contained at the top of the file in the comments. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-03-21xen: audit usages of module.h ; remove unnecessary instancesPaul Gortmaker
Code that uses no modular facilities whatsoever should not be sourcing module.h at all, since that header drags in a bunch of other headers with it. Similarly, code that is not explicitly using modular facilities like module_init() but only is declaring module_param setup variables should be using moduleparam.h and not the larger module.h file for that. In making this change, we also uncover an implicit use of BUG() in inline fcns within arch/arm/include/asm/xen/hypercall.h so we explicitly source <linux/bug.h> for that file now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-03-21mailbox: Introduce TI message manager driverNishanth Menon
Support for TI Message Manager Module. This hardware block manages a bunch of hardware queues meant for communication between processor entities. Clients sitting on top of this would manage the required protocol for communicating with the counterpart entities. For more details on TI Message Manager hardware block, see documentation that will is available here: http://www.ti.com/lit/ug/spruhy8/spruhy8.pdf Chapter 8.1(Message Manager) Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2016-03-21Documentation: dt: mailbox: Add TI Message ManagerNishanth Menon
Message Manager is a hardware block used to communicate with various processor systems within certain Texas Instrument's Keystone generation SoCs. This hardware engine is used to transfer messages from various compute entities(or processors) within the SoC. It is designed to be self contained without needing software initialization for operation. Signed-off-by: Nishanth Menon <nm@ti.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2016-03-21btrfs: make sure we stay inside the bvec during __btrfs_lookup_bio_sumsChris Mason
Commit c40a3d38aff4e1c (Btrfs: Compute and look up csums based on sectorsized blocks) changes around how we walk the bios while looking up crcs. There's an inner loop that is jumping to the next bvec based on sectors and before it derefs the next bvec, it needs to make sure we're still in the bio. In this case, the outer loop would have decided to stop moving forward too, and the bvec deref is never actually used for anything. But CONFIG_DEBUG_PAGEALLOC catches it because we're outside our bio. Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: David Sterba <dsterba@suse.com>
2016-03-21Merge branches 'arm/rockchip', 'arm/exynos', 'arm/smmu', 'arm/mediatek', ↵Joerg Roedel
'arm/io-pgtable', 'arm/renesas' and 'core' into next
2016-03-21kvm: arm64: Disable compiler instrumentation for hypervisor codeCatalin Marinas
With the recent rewrite of the arm64 KVM hypervisor code in C, enabling certain options like KASAN would allow the compiler to generate memory accesses or function calls to addresses not mapped at EL2. This patch disables the compiler instrumentation on the arm64 hypervisor code for gcov-based profiling (GCOV_KERNEL), undefined behaviour sanity checker (UBSAN) and kernel address sanitizer (KASAN). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: <stable@vger.kernel.org> # 4.5+ Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-21PCI: Restore inclusion of pci/hotplug KconfigTero Roponen
Commit e7e127e3c767 ("PCI: Include pci/hotplug Kconfig directly from pci/Kconfig") added one line to pci/Kconfig. However, for some mysterious reason it isn't there now, even though there are no traces of removing it in the git log. I detected this issue when 'make oldconfig' removed all the options that depended on HOTPLUG_PCI. [bhelgaas: I botched the cfeb8139a1fb ("Merge branch 'pci/host-hv' into next") merge. "git diff cfeb8139a1fb^ cfeb8139a1fb" shows a conflict in drivers/pci/Kconfig, and I mistakenly dropped the hotplug/Kconfig piece.] Signed-off-by: Tero Roponen <tero.roponen@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-21drivers/perf: arm_pmu: avoid NULL dereference when not using devicetreeWill Deacon
Commit c6b90653f1f7 ("drivers/perf: arm_pmu: make info messages more verbose") breaks booting on systems where the PMU is probed without devicetree (e.g by inspecting the MIDR of the current CPU). In this case, pdev->dev.of_node is NULL and we shouldn't try to access its ->fullname field when printing probe error messages. This patch fixes the probing code to use of_node_full_name, which safely handles NULL nodes and removes the "Error %i" part of the string, since it's not terribly useful. Reported-by: Guenter Roeck <private@roeck-us.net> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-21irqchip/mbigen: Handle multiple device nodes in a mbigen moduleMaJun
Each mbigen device is represented as a independent platform device. If the devices belong to the same mbigen hardware module, then the register space for these devices is the same. That leads to a resource conflict. The solution for this is to represent the mbigen module as a platform device and make the mbigen devices subdevices of that. The register space is associated to the mbigen module and therefor the resource conflict is avoided. [ tglx: Massaged changelog, cleaned up the code and removed the silly printk ] Signed-off-by: Ma Jun <majun258@huawei.com> Cc: mark.rutland@arm.com Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: Catalin.Marinas@arm.com Cc: guohanjun@huawei.com Cc: Will.Deacon@arm.com Cc: huxinwei@huawei.com Cc: lizefan@huawei.com Cc: dingtianhong@huawei.com Cc: zhaojunhua@hisilicon.com Cc: liguozhu@hisilicon.com Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1458203641-17172-3-git-send-email-majun258@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-21irqchip/mbigen: Adjust DT bindings to handle multiple devices in a moduleMaJun
A mbigen hardware module can contain more than one device node. These device nodes contain the same register definition. mbigen_dev1:intc_dev1 { ... reg = <0x0 0xc0080000 0x0 0x10000>; ... }; mbigen_dev2:intc_dev2 { ... reg = <0x0 0xc0080000 0x0 0x10000>; ... }; In this case both devices try to request the same resource resulting in a resource conflict. To address this problem the devices need to be subnodes of the mbigen hardware module, which then contains the unique register space. [ tglx: Massaged changelog ] Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ma Jun <majun258@huawei.com> Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: Catalin.Marinas@arm.com Cc: guohanjun@huawei.com Cc: Will.Deacon@arm.com Cc: huxinwei@huawei.com Cc: lizefan@huawei.com Cc: dingtianhong@huawei.com Cc: zhaojunhua@hisilicon.com Cc: liguozhu@hisilicon.com Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20160203111602.GA1234@leverpostej Link: http://lkml.kernel.org/r/1458203641-17172-2-git-send-email-majun258@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-21perf/x86/intel/rapl: Add missing Broadwell modelsSrinivas Pandruvada
Added Broadwell-H and Broadwell-Server. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/1458517938-25308-1-git-send-email-srinivas.pandruvada@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/intel/uncore: Remove ev_sel_ext bit support for PCUKan Liang
The ev_sel_ext in PCU_MSR_PMON_CTL is locked on some CPU models, so despite it being documented in the SDM, if we write 1 to that bit then we can get a #GP fault. Which #GP the perf fuzzer happily triggered in Peter Zijlstra's testing. Also, there are no public events which use that bit, so remove ev_sel_ext bit support for PCU. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1458500301-3594-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21sched/cpuacct: Simplify the cpuacct codeZhao Lei
- Use for() instead of while() loop in some functions to make the code simpler. - Use this_cpu_ptr() instead of per_cpu_ptr() to make the code cleaner and a bit faster. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tejun Heo <htejun@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/d8a7ef9592f55224630cb26dea239f05b6398a4e.1458187654.git.zhaolei@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21sched/cpuacct: Rename parameter in cpuusage_write() for readabilityDongsheng Yang
The name of the 'reset' parameter to cpuusage_write() is quite confusing, because the only valid value we allow is '0', so !reset is actually the case that resets ... Rename it to 'val' and explain it in a comment that we only allow 0. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: cgroups@vger.kernel.org Cc: tj@kernel.org Link: http://lkml.kernel.org/r/1450696483-2864-1-git-send-email-yangds.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21sched/fair: Add comments to explain select_idle_sibling()Matt Fleming
It's not entirely obvious how the main loop in select_idle_sibling() works on first glance. Sprinkle a few comments to explain the design and intention behind the loop based on some conversations with Mike and Peter. Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.com> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1457535548-15329-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21bitops: Do not default to __clear_bit() for __clear_bit_unlock()Peter Zijlstra
__clear_bit_unlock() is a special little snowflake. While it carries the non-atomic '__' prefix, it is specifically documented to pair with test_and_set_bit() and therefore should be 'somewhat' atomic. Therefore the generic implementation of __clear_bit_unlock() cannot use the fully non-atomic __clear_bit() as a default. If an arch is able to do better; is must provide an implementation of __clear_bit_unlock() itself. Specifically, this came up as a result of hackbench livelock'ing in slab_lock() on ARC with SMP + SLUB + !LLSC. The issue was incorrect pairing of atomic ops. slab_lock() -> bit_spin_lock() -> test_and_set_bit() slab_unlock() -> __bit_spin_unlock() -> __clear_bit() The non serializing __clear_bit() was getting "lost" 80543b8e: ld_s r2,[r13,0] <--- (A) Finds PG_locked is set 80543b90: or r3,r2,1 <--- (B) other core unlocks right here 80543b94: st_s r3,[r13,0] <--- (C) sets PG_locked (overwrites unlock) Fixes ARC STAR 9000817404 (and probably more). Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Tested-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Noam Camus <noamc@ezchip.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160309114054.GJ6356@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21sched/fair: Fix fairness issue on migrationPeter Zijlstra
Pavan reported that in the presence of very light tasks (or cgroups) the placement of migrated tasks can cause severe fairness issues. The problem is that enqueue_entity() places the task before it updates time, thereby it can place the task far in the past (remember that light tasks will shoot virtual time forward at a high speed, so in relation to the pre-existing light task, we can land far in the past). This is done because update_curr() needs the current task, and we might be placing the current task. The obvious solution is to differentiate between the current and any other task; placing the current before we update time, and placing any other task after, such that !curr tasks end up at the current moment in time, and not in the past. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Tested-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: byungchul.park@lge.com Link: http://lkml.kernel.org/r/20160309120403.GK6344@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21sched/cgroup: Fix/cleanup cgroup teardown/initPeter Zijlstra
The CPU controller hasn't kept up with the various changes in the whole cgroup initialization / destruction sequence, and commit: 2e91fa7f6d45 ("cgroup: keep zombies associated with their original cgroups") caused it to explode. The reason for this is that zombies do not inhibit css_offline() from being called, but do stall css_released(). Now we tear down the cfs_rq structures on css_offline() but zombies can run after that, leading to use-after-free issues. The solution is to move the tear-down to css_released(), which guarantees nobody (including no zombies) is still using our cgroup. Furthermore, a few simple cleanups are possible too. There doesn't appear to be any point to us using css_online() (anymore?) so fold that in css_alloc(). And since cgroup code guarantees an RCU grace period between css_released() and css_free() we can forgo using call_rcu() and free the stuff immediately. Suggested-by: Tejun Heo <tj@kernel.org> Reported-by: Kazuki Yamaguchi <k@rhe.jp> Reported-by: Niklas Cassel <niklas.cassel@axis.com> Tested-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 2e91fa7f6d45 ("cgroup: keep zombies associated with their original cgroups") Link: http://lkml.kernel.org/r/20160316152245.GY6344@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21Merge branch 'linus' into sched/urgent, to pick up dependenciesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21arm64: KVM: Turn kvm_ksym_ref into a NOP on VHEMarc Zyngier
When running with VHE, there is no need to translate kernel pointers to the EL2 memory space, since we're already there (and we have a much saner memory map to start with). Unfortunately, kvm_ksym_ref is getting in the way, and the first call into the "hypervisor" section is going to end up in fireworks, since we're now branching into nowhereland. Meh. A potential solution is to test if VHE is engaged or not, and only perform the translation in the negative case. With this in place, VHE is able to run again. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-21KVM: arm/arm64: disable preemption when calling smp_call_function_manyEric Auger
Preemption must be disabled when calling smp_call_function_many Reported-by: bartosz.wawrzyniak@tieto.com Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-21perf/x86/amd/power: Add AMD accumulated power reporting mechanismHuang Rui
Introduce an AMD accumlated power reporting mechanism for the Family 15h, Model 60h processor that can be used to calculate the average power consumed by a processor during a measurement interval. The feature support is indicated by CPUID Fn8000_0007_EDX[12]. This feature will be implemented both in hwmon and perf. The current design provides one event to report per package/processor power consumption by counting each compute unit power value. Here the gory details of how the computation is done: * Tsample: compute unit power accumulator sample period * Tref: the PTSC counter period (PTSC: performance timestamp counter) * N: the ratio of compute unit power accumulator sample period to the PTSC period * Jmax: max compute unit accumulated power which is indicated by MSR_C001007b[MaxCpuSwPwrAcc] * Jx/Jy: compute unit accumulated power which is indicated by MSR_C001007a[CpuSwPwrAcc] * Tx/Ty: the value of performance timestamp counter which is indicated by CU_PTSC MSR_C0010280[PTSC] * PwrCPUave: CPU average power i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007. N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]]. ii. Read the full range of the cumulative energy value from the new MSR MaxCpuSwPwrAcc. Jmax = value returned. iii. At time x, software reads CpuSwPwrAcc and samples the PTSC. Jx = value read from CpuSwPwrAcc and Tx = value read from PTSC. iv. At time y, software reads CpuSwPwrAcc and samples the PTSC. Jy = value read from CpuSwPwrAcc and Ty = value read from PTSC. v. Calculate the average power consumption for a compute unit over time period (y-x). Unit of result is uWatt: if (Jy < Jx) // Rollover has occurred Jdelta = (Jy + Jmax) - Jx else Jdelta = Jy - Jx PwrCPUave = N * Jdelta * 1000 / (Ty - Tx) Simple example: root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' make -j4 CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CHK include/generated/timeconst.h CHK include/generated/bounds.h CHK include/generated/asm-offsets.h CALL scripts/checksyscalls.sh CHK include/generated/compile.h SKIPPED include/generated/compile.h Building modules, stage 2. Kernel: arch/x86/boot/bzImage is ready (#40) MODPOST 4225 modules Performance counter stats for 'system wide': 183.44 mWatts power/power-pkg/ 341.837270111 seconds time elapsed root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' sleep 10 Performance counter stats for 'system wide': 0.18 mWatts power/power-pkg/ 10.012551815 seconds time elapsed Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Ingo Molnar <mingo@kernel.org> Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jacob.w.shin@gmail.com Link: http://lkml.kernel.org/r/1457502306-2559-1-git-send-email-ray.huang@amd.com [ Fixed the modular build. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21x86/cpufeature, perf/x86: Add AMD Accumulated Power Mechanism feature flagHuang Rui
AMD CPU family 15h model 0x60 introduces a mechanism for measuring accumulated power. It is used to report the processor power consumption and support for it is indicated by CPUID Fn8000_0007_EDX[12]. Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wan Zongshun <Vincent.Wan@amd.com> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/1452739808-11871-4-git-send-email-ray.huang@amd.com [ Resolved conflict and moved the synthetic CPUID slot to 19. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/core: Document some hotplug bitsPeter Zijlstra
Document some of the hotplug notifier usage. Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/amd: Add support for new IOMMU performance eventsSuravee Suthikulpanit
This patch adds new IOMMU performance event based on the information in table 74 of the AMD I/O Virtualization Technology (IOMMU) Specification (Document Id: 4882, Rev 2.62, Feb 2015) Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Joerg Roedel <jroedel@suse.de> Acked-by: Joerg Roedel <jroedel@suse.de> Cc: <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://support.amd.com/TechDocs/48882_IOMMU.pdf Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/amd: Move nodes_per_socket into bsp_init_amd()Huang Rui
nodes_per_socket is static and it needn't be initialized many times during every CPU core init. So move its initialization into bsp_init_amd(). Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/1452739808-11871-2-git-send-email-ray.huang@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/cqm: Factor out some common codePeter Zijlstra
Having the same code twice (and once quite ugly) is fragile. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/mbm: Add support for MBM counter overflow handlingVikas Shivappa
This patch adds a per package timer which periodically updates the memory bandwidth counters for the events that are currently active. Current patch has a periodic timer every 1s since the SDM guarantees that the counter will not overflow in 1s but this time can be definitely improved by calibrating on the system. The overflow is really a function of the max memory b/w that the socket can support, max counter value and scaling factor. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/013b756c5006b1c4ca411f3ecf43ed52f19fbf87.1457723885.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/mbm: Implement RMID recyclingVikas Shivappa
RMID could be allocated or deallocated as part of RMID recycling. When an RMID is allocated for MBM event, the MBM counter needs to be initialized because next time we read the counter we need the previous value to account for total bytes that went to the memory controller. Similarly, when RMID is deallocated we need to update the ->count variable. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1457652732-4499-6-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/mbm: Add memory bandwidth monitoring event managementTony Luck
Includes all the core infrastructure to measure the total_bytes and bandwidth. We have per socket counters for both total system wide L3 external bytes and local socket memory-controller bytes. The OS does MSR writes to MSR_IA32_QM_EVTSEL and MSR_IA32_QM_CTR to read the counters and uses the IA32_PQR_ASSOC_MSR to associate the RMID with the task. The tasks have a common RMID for CQM (cache quality of service monitoring) and MBM. Hence most of the scheduling code is reused from CQM. Signed-off-by: Tony Luck <tony.luck@intel.com> [ Restructured rmid_read to not have an obvious hole, removed MBM_CNTR_MAX as its unused. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/abd7aac9a18d93b95b985b931cf258df0164746d.1457723885.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and initVikas Shivappa
The MBM init patch enumerates the Intel MBM (Memory b/w monitoring) and initializes the perf events and datastructures for monitoring the memory b/w. Its based on original patch series by Tony Luck and Kanaka Juvva. Memory bandwidth monitoring (MBM) provides OS/VMM a way to monitor bandwidth from one level of cache to another. The current patches support L3 external bandwidth monitoring. It supports both 'local bandwidth' and 'total bandwidth' monitoring for the socket. Local bandwidth measures the amount of data sent through the memory controller on the socket and total b/w measures the total system bandwidth. Extending the cache quality of service monitoring (CQM) we add two more events to the perf infrastructure: intel_cqm_llc/local_bytes - bytes sent through local socket memory controller intel_cqm_llc/total_bytes - total L3 external bytes sent The tasks are associated with a Resouce Monitoring ID (RMID) just like in CQM and OS uses a MSR write to indicate the RMID of the task during scheduling. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1457652732-4499-4-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/cqm: Fix CQM memory leak and notifier leakVikas Shivappa
Fixes the hotcpu notifier leak and other global variable memory leaks during CQM (cache quality of service monitoring) initialization. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1457652732-4499-3-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/cqm: Fix CQM handling of grouping events into a cache_groupVikas Shivappa
Currently CQM (cache quality of service monitoring) is grouping all events belonging to same PID to use one RMID. However its not counting all of these different events. Hence we end up with a count of zero for all events other than the group leader. The patch tries to address the issue by keeping a flag in the perf_event.hw which has other CQM related fields. The field is updated at event creation and during grouping. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> [peterz: Changed hw_perf_event::is_group_event to an int] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1457652732-4499-2-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/core: Fix Undefined behaviour in rb_alloc()Peter Zijlstra
Sasha reported: [ 3494.030114] UBSAN: Undefined behaviour in kernel/events/ring_buffer.c:685:22 [ 3494.030647] shift exponent -1 is negative Andrey spotted that this is because: It happens if nr_pages = 0: rb->page_order = ilog2(nr_pages); Fix it by making both assignments conditional on nr_pages; since otherwise they should both be 0 anyway, and will be because of the kzalloc() used to allocate the structure. Reported-by: Sasha Levin <sasha.levin@oracle.com> Reported-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20160129141751.GA407@worktop Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/BTS: Fix RCU usagePeter Zijlstra
This splat reminds us: [ 8166.045595] [ INFO: suspicious RCU usage. ] [ 8166.168972] [<ffffffff81127837>] lockdep_rcu_suspicious+0xe7/0x120 [ 8166.175966] [<ffffffff811e0bae>] perf_callchain+0x23e/0x250 [ 8166.182280] [<ffffffff811dda3d>] perf_prepare_sample+0x27d/0x350 [ 8166.189082] [<ffffffff8100f503>] intel_pmu_drain_bts_buffer+0x133/0x200 ... that as the core code does, one should hold rcu_read_lock() over that entire BTS event-output generation sequence as well. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/core: Fix dynamic interrupt throttlePeter Zijlstra
There were two problems with the dynamic interrupt throttle mechanism, both triggered by the same action. When you (or perf_fuzzer) write a huge value into /proc/sys/kernel/perf_event_max_sample_rate the computed perf_sample_allowed_ns becomes 0. This effectively disables the whole dynamic throttle. This is fixed by ensuring update_perf_cpu_limits() never sets the value to 0. However, we allow disabling of the dynamic throttle by writing 100 to /proc/sys/kernel/perf_cpu_time_max_percent. This will generate a warning in dmesg. The second problem is that by setting the max_sample_rate to a huge number, the adaptive process can take a few tries, since it halfs the limit each time. Change that to directly compute a new value based on the observed duration. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/ibs: Add IBS interrupt to the dynamic throttlePeter Zijlstra
Interrupt throttling is normally only done against sysctl_perf_event_sample_rate. This means that if that number is too high (for whatever reason) you can lock up your machine. We have, however, a dynamic throttling scheme too, but for that to work, we need to add a callback to the interrupt handler, IBS did not have this, so add it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/ibs: Fix race with IBS_STARTING statePeter Zijlstra
While tracing the IBS bits I saw the NMI hitting between clearing IBS_STARTING and the actual MSR writes to disable the counter. Since IBS_STARTING was cleared, the handler assumed these were spurious NMIs and because STOPPING wasn't set yet either, insta-triggered an "Unknown NMI". Cure this by clearing IBS_STARTING after disabling the hardware. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/ibs: Fix IBS throttlePeter Zijlstra
When the IBS IRQ handler get a !0 return from perf_event_overflow; meaning it should throttle the event, it only disables it, it doesn't call perf_ibs_stop(). This confuses the state machine, as we'll use pmu::start() -> perf_ibs_start() to unthrottle. Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: dvyukov@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Link: http://lkml.kernel.org/r/20160311142346.GE6344@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/core: Fix the unthrottle logicPeter Zijlstra
Its possible to IOC_PERIOD while the event is throttled, this would re-start the event and the next tick would then try to unthrottle it, and find the event wasn't actually stopped anymore. This would tickle a WARN in the x86-pmu code which isn't expecting to start a !stopped event. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: dvyukov@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160310143924.GR6356@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21drm/amdgpu: release_pages requires linux/pagemap.hStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Christian König <christian.koenig@amd.com. Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-20Merge branch 'mm-pkeys-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 protection key support from Ingo Molnar: "This tree adds support for a new memory protection hardware feature that is available in upcoming Intel CPUs: 'protection keys' (pkeys). There's a background article at LWN.net: https://lwn.net/Articles/643797/ The gist is that protection keys allow the encoding of user-controllable permission masks in the pte. So instead of having a fixed protection mask in the pte (which needs a system call to change and works on a per page basis), the user can map a (handful of) protection mask variants and can change the masks runtime relatively cheaply, without having to change every single page in the affected virtual memory range. This allows the dynamic switching of the protection bits of large amounts of virtual memory, via user-space instructions. It also allows more precise control of MMU permission bits: for example the executable bit is separate from the read bit (see more about that below). This tree adds the MM infrastructure and low level x86 glue needed for that, plus it adds a high level API to make use of protection keys - if a user-space application calls: mmap(..., PROT_EXEC); or mprotect(ptr, sz, PROT_EXEC); (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice this special case, and will set a special protection key on this memory range. It also sets the appropriate bits in the Protection Keys User Rights (PKRU) register so that the memory becomes unreadable and unwritable. So using protection keys the kernel is able to implement 'true' PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies PROT_READ as well. Unreadable executable mappings have security advantages: they cannot be read via information leaks to figure out ASLR details, nor can they be scanned for ROP gadgets - and they cannot be used by exploits for data purposes either. We know about no user-space code that relies on pure PROT_EXEC mappings today, but binary loaders could start making use of this new feature to map binaries and libraries in a more secure fashion. There is other pending pkeys work that offers more high level system call APIs to manage protection keys - but those are not part of this pull request. Right now there's a Kconfig that controls this feature (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled (like most x86 CPU feature enablement code that has no runtime overhead), but it's not user-configurable at the moment. If there's any serious problem with this then we can make it configurable and/or flip the default" * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) x86/mm/pkeys: Fix mismerge of protection keys CPUID bits mm/pkeys: Fix siginfo ABI breakage caused by new u64 field x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA mm/core, x86/mm/pkeys: Add execute-only protection keys support x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags x86/mm/pkeys: Allow kernel to modify user pkey rights register x86/fpu: Allow setting of XSAVE state x86/mm: Factor out LDT init from context init mm/core, x86/mm/pkeys: Add arch_validate_pkey() mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits() x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU x86/mm/pkeys: Add Kconfig prompt to existing config option x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps x86/mm/pkeys: Dump PKRU with other kernel registers mm/core, x86/mm/pkeys: Differentiate instruction fetches x86/mm/pkeys: Optimize fault handling in access_error() mm/core: Do not enforce PKEY permissions on remote mm access um, pkeys: Add UML arch_*_access_permitted() methods mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys x86/mm/gup: Simplify get_user_pages() PTE bit handling ...
2016-03-20Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes are: - Use separate EFI page tables when executing EFI firmware code. This isolates the EFI context from the rest of the kernel, which has security and general robustness advantages. (Matt Fleming) - Run regular UEFI firmware with interrupts enabled. This is already the status quo under other OSs. (Ard Biesheuvel) - Various x86 EFI enhancements, such as the use of non-executable attributes for EFI memory mappings. (Sai Praneeth Prakhya) - Various arm64 UEFI enhancements. (Ard Biesheuvel) - ... various fixes and cleanups. The separate EFI page tables feature got delayed twice already, because it's an intrusive change and we didn't feel confident about it - third time's the charm we hope!" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) x86/mm/pat: Fix boot crash when 1GB pages are not supported by the CPU x86/efi: Only map kernel text for EFI mixed mode x86/efi: Map EFI_MEMORY_{XP,RO} memory region bits to EFI page tables x86/mm/pat: Don't implicitly allow _PAGE_RW in kernel_map_pages_in_pgd() efi/arm*: Perform hardware compatibility check efi/arm64: Check for h/w support before booting a >4 KB granular kernel efi/arm: Check for LPAE support before booting a LPAE kernel efi/arm-init: Use read-only early mappings efi/efistub: Prevent __init annotations from being used arm64/vmlinux.lds.S: Handle .init.rodata.xxx and .init.bss sections efi/arm64: Drop __init annotation from handle_kernel_image() x86/mm/pat: Use _PAGE_GLOBAL bit for EFI page table mappings efi/runtime-wrappers: Run UEFI Runtime Services with interrupts enabled efi: Reformat GUID tables to follow the format in UEFI spec efi: Add Persistent Memory type name efi: Add NV memory attribute x86/efi: Show actual ending addresses in efi_print_memmap x86/efi/bgrt: Don't ignore the BGRT if the 'valid' bit is 0 efivars: Use to_efivar_entry efi: Runtime-wrapper: Get rid of the rtc_lock spinlock ...
2016-03-20Merge branch 'core-objtool-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull 'objtool' stack frame validation from Ingo Molnar: "This tree adds a new kernel build-time object file validation feature (ONFIG_STACK_VALIDATION=y): kernel stack frame correctness validation. It was written by and is maintained by Josh Poimboeuf. The motivation: there's a category of hard to find kernel bugs, most of them in assembly code (but also occasionally in C code), that degrades the quality of kernel stack dumps/backtraces. These bugs are hard to detect at the source code level. Such bugs result in incorrect/incomplete backtraces most of time - but can also in some rare cases result in crashes or other undefined behavior. The build time correctness checking is done via the new 'objtool' user-space utility that was written for this purpose and which is hosted in the kernel repository in tools/objtool/. The tool's (very simple) UI and source code design is shaped after Git and perf and shares quite a bit of infrastructure with tools/perf (which tooling infrastructure sharing effort got merged via perf and is already upstream). Objtool follows the well-known kernel coding style. Objtool does not try to check .c or .S files, it instead analyzes the resulting .o generated machine code from first principles: it decodes the instruction stream and interprets it. (Right now objtool supports the x86-64 architecture.) From tools/objtool/Documentation/stack-validation.txt: "The kernel CONFIG_STACK_VALIDATION option enables a host tool named objtool which runs at compile time. It has a "check" subcommand which analyzes every .o file and ensures the validity of its stack metadata. It enforces a set of rules on asm code and C inline assembly code so that stack traces can be reliable. Currently it only checks frame pointer usage, but there are plans to add CFI validation for C files and CFI generation for asm files. For each function, it recursively follows all possible code paths and validates the correct frame pointer state at each instruction. It also follows code paths involving special sections, like .altinstructions, __jump_table, and __ex_table, which can add alternative execution paths to a given instruction (or set of instructions). Similarly, it knows how to follow switch statements, for which gcc sometimes uses jump tables." When this new kernel option is enabled (it's disabled by default), the tool, if it finds any suspicious assembly code pattern, outputs warnings in compiler warning format: warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch warning: objtool: cik_tiling_mode_table_init()+0x6ce: call without frame pointer save/setup warning: objtool:__schedule()+0x3c0: duplicate frame pointer save warning: objtool:__schedule()+0x3fd: sibling call from callable instruction with changed frame pointer ... so that scripts that pick up compiler warnings will notice them. All known warnings triggered by the tool are fixed by the tree, most of the commits in fact prepare the kernel to be warning-free. Most of them are bugfixes or cleanups that stand on their own, but there are also some annotations of 'special' stack frames for justified cases such entries to JIT-ed code (BPF) or really special boot time code. There are two other long-term motivations behind this tool as well: - To improve the quality and reliability of kernel stack frames, so that they can be used for optimized live patching. - To create independent infrastructure to check the correctness of CFI stack frames at build time. CFI debuginfo is notoriously unreliable and we cannot use it in the kernel as-is without extra checking done both on the kernel side and on the build side. The quality of kernel stack frames matters to debuggability as well, so IMO we can merge this without having to consider the live patching or CFI debuginfo angle" * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) objtool: Only print one warning per function objtool: Add several performance improvements tools: Copy hashtable.h into tools directory objtool: Fix false positive warnings for functions with multiple switch statements objtool: Rename some variables and functions objtool: Remove superflous INIT_LIST_HEAD objtool: Add helper macros for traversing instructions objtool: Fix false positive warnings related to sibling calls objtool: Compile with debugging symbols objtool: Detect infinite recursion objtool: Prevent infinite recursion in noreturn detection objtool: Detect and warn if libelf is missing and don't break the build tools: Support relative directory path for 'O=' objtool: Support CROSS_COMPILE x86/asm/decoder: Use explicitly signed chars objtool: Enable stack metadata validation on 64-bit x86 objtool: Add CONFIG_STACK_VALIDATION option objtool: Add tool to perform compile-time stack metadata validation x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard sched: Always inline context_switch() ...
2016-03-20Merge tag 'armsoc-drivers' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver updates from Arnd Bergmann: "Driver updates for ARM SoCs, these contain various things that touch the drivers/ directory but got merged through arm-soc for practical reasons: - Rockchip rk3368 gains power domain support - Small updates for the ARM spmi driver - The Atmel PMC driver saw a larger rework, touching both arch/arm/mach-at91 and drivers/clk/at91 - All reset controller driver changes alway get merged through arm-soc, though this time the largest change is the addition of a MIPS pistachio reset driver - One bugfix for the NXP (formerly Freescale) i.MX weim bus driver" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (43 commits) bus: imx-weim: Take the 'status' property value into account clk: at91: remove useless includes clk: at91: pmc: remove useless capacities handling clk: at91: pmc: drop at91_pmc_base usb: gadget: atmel: access the PMC using regmap ARM: at91: remove useless includes and function prototypes ARM: at91: pm: move idle functions to pm.c ARM: at91: pm: find and remap the pmc ARM: at91: pm: simply call at91_pm_init clk: at91: pmc: move pmc structures to C file clk: at91: pmc: merge at91_pmc_init in atmel_pmc_probe clk: at91: remove IRQ handling and use polling clk: at91: make use of syscon/regmap internally clk: at91: make use of syscon to share PMC registers in several drivers hwmon: (scpi) add energy meter support firmware: arm_scpi: add support for 64-bit sensor values firmware: arm_scpi: decrease Tx timeout to 20ms firmware: arm_scpi: fix send_message and sensor_get_value for big-endian reset: sti: Make reset_control_ops const reset: zynq: Make reset_control_ops const ...