summaryrefslogtreecommitdiff
path: root/arch/x86/events/intel/core.c
AgeCommit message (Collapse)Author
2019-04-16perf/x86: Fix incorrect PEBS_REGSKan Liang
PEBS_REGS used as mask for the supported registers for large PEBS. However, the mask cannot filter the sample_regs_user/sample_regs_intr correctly. (1ULL << PERF_REG_X86_*) should be used to replace PERF_REG_X86_*, which is only the index. Rename PEBS_REGS to PEBS_GP_REGS, because the mask is only for general purpose registers. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Fixes: 2fe1bc1f501d ("perf/x86: Enable free running PEBS for REGS_USER/INTR") Link: https://lkml.kernel.org/r/20190402194509.2832-2-kan.liang@linux.intel.com [ Renamed it to PEBS_GP_REGS - as 'GPRS' is used elsewhere ;-) ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03perf/x86/intel: Initialize TFA MSRPeter Zijlstra
Stephane reported that the TFA MSR is not initialized by the kernel, but the TFA bit could set by firmware or as a leftover from a kexec, which makes the state inconsistent. Reported-by: Stephane Eranian <eranian@google.com> Tested-by: Nelson DSouza <nelson.dsouza@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: tonyj@suse.com Link: https://lkml.kernel.org/r/20190321123849.GN6521@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBSStephane Eranian
When an event is programmed with attr.wakeup_events=N (N>0), it means the caller is interested in getting a user level notification after N samples have been recorded in the kernel sampling buffer. With precise events on Intel processors, the kernel uses PEBS. The kernel tries minimize sampling overhead by verifying if the event configuration is compatible with multi-entry PEBS mode. If so, the kernel is notified only when the buffer has reached its threshold. Other PEBS operates in single-entry mode, the kenrel is notified for each PEBS sample. The problem is that the current implementation look at frequency mode and event sample_type but ignores the wakeup_events field. Thus, it may not be possible to receive a notification after each precise event. This patch fixes this problem by disabling multi-entry PEBS if wakeup_events is non-zero. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: https://lkml.kernel.org/r/20190306195048.189514-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-03-17perf/x86/intel: Make dev_attr_allow_tsx_force_abort statickbuild test robot
Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort") Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: kbuild-all@01.org Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190313184243.GA10820@lkp-sb-ep06
2019-03-15perf/x86/intel: Fix memory corruptionPeter Zijlstra
Through: validate_event() x86_pmu.get_event_constraints(.idx=-1) tfa_get_event_constraints() dyn_constraint() cpuc->constraint_list[-1] is used, which is an obvious out-of-bound access. In this case, simply skip the TFA constraint code, there is no event constraint with just PMC3, therefore the code will never result in the empty set. Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort") Reported-by: Tony Jones <tonyj@suse.com> Reported-by: "DSouza, Nelson" <nelson.dsouza@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tony Jones <tonyj@suse.com> Tested-by: "DSouza, Nelson" <nelson.dsouza@intel.com> Cc: eranian@google.com Cc: jolsa@redhat.com Cc: stable@kernel.org Link: https://lkml.kernel.org/r/20190314130705.441549378@infradead.org
2019-03-12Merge branch 'x86-tsx-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tsx fixes from Thomas Gleixner: "This update provides kernel side handling for the TSX erratum of Intel Skylake (and later) CPUs. On these CPUs Intel Transactional Synchronization Extensions (TSX) functions can result in unpredictable system behavior under certain circumstances. The issue is mitigated with an microcode update which utilizes Performance Monitoring Counter (PMC) 3 when TSX functions are in use. This mitigation is enabled unconditionally by the updated microcode. As a consequence the usage of TSX functions can cause corrupted performance monitoring results for events which utilize PMC3. The corruption is silent on kernels which have no update for this issue. This update makes the kernel aware of the PMC3 utilization by the microcode: The microcode offers a possibility to enforce TSX abort which prevents the malfunction and frees up PMC3. The enforced TSX abort requires the TSX using application to have a software fallback path implemented; abort handlers which solely retry the transaction will fail over and over. The enforced TSX abort request is issued by the kernel when: - enforced TSX abort is enabled (PMU attribute) - A performance monitoring request needs PMC3 When PMC3 is not longer used by the kernel the TSX force abort request is cleared. The enforced TSX abort mechanism is enabled by default and can be controlled by the administrator via the new PMU attribute 'allow_tsx_force_abort'. This attribute is only visible when updated microcode is detected on affected systems. Writing '0' disables the enforced TSX abort mechanism, '1' enables it. As a result of disabling the enforced TSX abort mechanism, PMC3 is permanentely unavailable for performance monitoring which can cause performance monitoring requests to fail or switch to multiplexing mode" * branch 'x86-tsx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Implement support for TSX Force Abort x86: Add TSX Force Abort CPUID/MSR perf/x86/intel: Generalize dynamic constraint creation perf/x86/intel: Make cpuc allocations consistent
2019-03-07Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Various cleanups and simplifications, none of them really stands out, they are all over the place" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/uaccess: Remove unused __addr_ok() macro x86/smpboot: Remove unused phys_id variable x86/mm/dump_pagetables: Remove the unused prev_pud variable x86/fpu: Move init_xstate_size() to __init section x86/cpu_entry_area: Move percpu_setup_debug_store() to __init section x86/mtrr: Remove unused variable x86/boot/compressed/64: Explain paging_prepare()'s return value x86/resctrl: Remove duplicate MSR_MISC_FEATURE_CONTROL definition x86/asm/suspend: Drop ENTRY from local data x86/hw_breakpoints, kprobes: Remove kprobes ifdeffery x86/boot: Save several bytes in decompressor x86/trap: Remove useless declaration x86/mm/tlb: Remove unused cpu variable x86/events: Mark expected switch-case fall-throughs x86/asm-prototypes: Remove duplicate include <asm/page.h> x86/kernel: Mark expected switch-case fall-throughs x86/insn-eval: Mark expected switch-case fall-through x86/platform/UV: Replace kmalloc() and memset() with k[cz]alloc() calls x86/e820: Replace kmalloc() + memcpy() with kmemdup()
2019-03-06perf/x86/intel: Implement support for TSX Force AbortPeter Zijlstra (Intel)
Skylake (and later) will receive a microcode update to address a TSX errata. This microcode will, on execution of a TSX instruction (speculative or not) use (clobber) PMC3. This update will also provide a new MSR to change this behaviour along with a CPUID bit to enumerate the presence of this new MSR. When the MSR gets set; the microcode will no longer use PMC3 but will Force Abort every TSX transaction (upon executing COMMIT). When TSX Force Abort (TFA) is allowed (default); the MSR gets set when PMC3 gets scheduled and cleared when, after scheduling, PMC3 is unused. When TFA is not allowed; clear PMC3 from all constraints such that it will not get used. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-03-06perf/x86/intel: Generalize dynamic constraint creationPeter Zijlstra (Intel)
Such that we can re-use it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-03-06perf/x86/intel: Make cpuc allocations consistentPeter Zijlstra (Intel)
The cpuc data structure allocation is different between fake and real cpuc's; use the same code to init/free both. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-28Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11perf/x86: Add check_period PMU callbackJiri Olsa
Vince (and later on Ravi) reported crashes in the BTS code during fuzzing with the following backtrace: general protection fault: 0000 [#1] SMP PTI ... RIP: 0010:perf_prepare_sample+0x8f/0x510 ... Call Trace: <IRQ> ? intel_pmu_drain_bts_buffer+0x194/0x230 intel_pmu_drain_bts_buffer+0x160/0x230 ? tick_nohz_irq_exit+0x31/0x40 ? smp_call_function_single_interrupt+0x48/0xe0 ? call_function_single_interrupt+0xf/0x20 ? call_function_single_interrupt+0xa/0x20 ? x86_schedule_events+0x1a0/0x2f0 ? x86_pmu_commit_txn+0xb4/0x100 ? find_busiest_group+0x47/0x5d0 ? perf_event_set_state.part.42+0x12/0x50 ? perf_mux_hrtimer_restart+0x40/0xb0 intel_pmu_disable_event+0xae/0x100 ? intel_pmu_disable_event+0xae/0x100 x86_pmu_stop+0x7a/0xb0 x86_pmu_del+0x57/0x120 event_sched_out.isra.101+0x83/0x180 group_sched_out.part.103+0x57/0xe0 ctx_sched_out+0x188/0x240 ctx_resched+0xa8/0xd0 __perf_event_enable+0x193/0x1e0 event_function+0x8e/0xc0 remote_function+0x41/0x50 flush_smp_call_function_queue+0x68/0x100 generic_smp_call_function_single_interrupt+0x13/0x30 smp_call_function_single_interrupt+0x3e/0xe0 call_function_single_interrupt+0xf/0x20 </IRQ> The reason is that while event init code does several checks for BTS events and prevents several unwanted config bits for BTS event (like precise_ip), the PERF_EVENT_IOC_PERIOD allows to create BTS event without those checks being done. Following sequence will cause the crash: If we create an 'almost' BTS event with precise_ip and callchains, and it into a BTS event it will crash the perf_prepare_sample() function because precise_ip events are expected to come in with callchain data initialized, but that's not the case for intel_pmu_drain_bts_buffer() caller. Adding a check_period callback to be called before the period is changed via PERF_EVENT_IOC_PERIOD. It will deny the change if the event would become BTS. Plus adding also the limit_period check as well. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11perf/x86/intel: Add counter freezing quirk for GoldmontKan Liang
A microcode patch is also needed for Goldmont while counter freezing feature is enabled. Otherwise, there will be some issues, e.g. PMI lost. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1549319013-4522-5-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11perf/x86/intel: Clean up counter freezing quirkKan Liang
Clean up counter freezing quirk to use the new facility to check for min microcode revisions. Rename the counter freezing quirk related functions. Because other platforms, e.g. Goldmont, also needs to call the quirk. Only check the boot CPU, assuming models and features are consistent over all CPUs. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1549319013-4522-4-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11perf/x86/intel: Clean up SNB PEBS quirkKan Liang
Clean up SNB PEBS quirk to use the new facility to check for min microcode revisions. Only check the boot CPU, assuming models and features are consistent over all CPUs. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1549319013-4522-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11perf/x86/kvm: Avoid unnecessary work in guest filteringAndi Kleen
KVM added a workaround for PEBS events leaking into guests with commit: 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.") This uses the VT entry/exit list to add an extra disable of the PEBS_ENABLE MSR. Intel also added a fix for this issue to microcode updates on Haswell/Broadwell/Skylake. It turns out using the MSR entry/exit list makes VM exits significantly slower. The list is only needed for disabling PEBS, because the GLOBAL_CTRL change gets optimized by KVM into changing the VMCS. Check for the microcode updates that have the microcode fix for leaking PEBS, and disable the extra entry/exit list entry for PEBS_ENABLE. In addition we always clear the GLOBAL_CTRL for the PEBS counter while running in the guest, which is enough to make them never fire at the wrong side of the host/guest transition. The overhead for VM exits with the filtering active with the patch is reduced from 8% to 4%. The microcode patch has already been merged into future platforms. This patch is one-off thing. The quirks is used here. For other old platforms which doesn't have microcode patch and quirks, extra disable of the PEBS_ENABLE MSR is still required. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1549319013-4522-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu()Peter Zijlstra
intel_pmu_cpu_prepare() allocated memory for ->shared_regs among other members of struct cpu_hw_events. This memory is released in intel_pmu_cpu_dying() which is wrong. The counterpart of the intel_pmu_cpu_prepare() callback is x86_pmu_dead_cpu(). Otherwise if the CPU fails on the UP path between CPUHP_PERF_X86_PREPARE and CPUHP_AP_PERF_X86_STARTING then it won't release the memory but allocate new memory on the next attempt to online the CPU (leaking the old memory). Also, if the CPU down path fails between CPUHP_AP_PERF_X86_STARTING and CPUHP_PERF_X86_PREPARE then the CPU will go back online but never allocate the memory that was released in x86_pmu_dying_cpu(). Make the memory allocation/free symmetrical in regard to the CPU hotplug notifier by moving the deallocation to intel_pmu_cpu_dead(). This started in commit: a7e3ed1e47011 ("perf: Add support for supplementary event registers"). In principle the bug was introduced in v2.6.39 (!), but it will almost certainly not backport cleanly across the big CPU hotplug rewrite between v4.7-v4.15... [ bigeasy: Added patch description. ] [ mingo: Added backporting guidance. ] Reported-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With developer hat on Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With maintainer hat on Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: bp@alien8.de Cc: hpa@zytor.com Cc: jolsa@kernel.org Cc: kan.liang@linux.intel.com Cc: namhyung@kernel.org Cc: <stable@vger.kernel.org> Fixes: a7e3ed1e47011 ("perf: Add support for supplementary event registers"). Link: https://lkml.kernel.org/r/20181219165350.6s3jvyxbibpvlhtq@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-29x86/events: Mark expected switch-case fall-throughsGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough by default, mark switch-case statements where fall-through is intentional, explicitly in order to fix a couple of -Wimplicit-fallthrough warnings. Warning level 3 was used: -Wimplicit-fallthrough=3. [ bp: Massasge and trim commit message. ] Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jacek Tomaka <jacek.tomaka@poczta.fm> Cc: Jia Zhang <qianyue.zj@alibaba-inc.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190125184917.GA7289@embeddedor
2018-12-03x86: Fix various typos in commentsIngo Molnar
Go over arch/x86/ and fix common typos in comments, and a typo in an actual function argument name. No change in functionality intended. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22perf/x86/intel: Disallow precise_ip on BTS eventsJiri Olsa
Vince reported a crash in the BTS flush code when touching the callchain data, which was supposed to be initialized as an 'early' callchain, but intel_pmu_drain_bts_buffer() does not do that: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 ... Call Trace: <IRQ> intel_pmu_drain_bts_buffer+0x151/0x220 ? intel_get_event_constraints+0x219/0x360 ? perf_assign_events+0xe2/0x2a0 ? select_idle_sibling+0x22/0x3a0 ? __update_load_avg_se+0x1ec/0x270 ? enqueue_task_fair+0x377/0xdd0 ? cpumask_next_and+0x19/0x20 ? load_balance+0x134/0x950 ? check_preempt_curr+0x7a/0x90 ? ttwu_do_wakeup+0x19/0x140 x86_pmu_stop+0x3b/0x90 x86_pmu_del+0x57/0x160 event_sched_out.isra.106+0x81/0x170 group_sched_out.part.108+0x51/0xc0 __perf_event_disable+0x7f/0x160 event_function+0x8c/0xd0 remote_function+0x3c/0x50 flush_smp_call_function_queue+0x35/0xe0 smp_call_function_single_interrupt+0x3a/0xd0 call_function_single_interrupt+0xf/0x20 </IRQ> It was triggered by fuzzer but can be easily reproduced by: # perf record -e cpu/branch-instructions/pu -g -c 1 Peter suggested not to allow branch tracing for precise events: > Now arguably, this is really stupid behaviour. Who in his right mind > wants callchain output on BTS entries. And even if they do, BTS + > precise_ip is nonsensical. > > So in my mind disallowing precise_ip on BTS would be the simplest fix. Suggested-by: Peter Zijlstra <peterz@infradead.org> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6cbc304f2f36 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)") Link: http://lkml.kernel.org/r/20181121101612.16272-3-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts()Jiri Olsa
Currently we check the branch tracing only by checking for the PERF_COUNT_HW_BRANCH_INSTRUCTIONS event of PERF_TYPE_HARDWARE type. But we can define the same event with the PERF_TYPE_RAW type. Changing the intel_pmu_has_bts() code to check on event's final hw config value, so both HW types are covered. Adding unlikely to intel_pmu_has_bts() condition calls, because it was used in the original code in intel_bts_constraints. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20181121101612.16272-2-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22perf/x86/intel: Move branch tracing setup to the Intel-specific source fileJiri Olsa
Moving branch tracing setup to Intel core object into separate intel_pmu_bts_config function, because it's Intel specific. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20181121101612.16272-1-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20perf/x86/intel: Fix regression by default disabling perfmon v4 interrupt ↵Peter Zijlstra
handling Kyle Huey reported that 'rr', a replay debugger, broke due to the following commit: af3bdb991a5c ("perf/x86/intel: Add a separate Arch Perfmon v4 PMI handler") Rework the 'disable_counter_freezing' __setup() parameter such that we can explicitly enable/disable it and switch to default disabled. To this purpose, rename the parameter to "perf_v4_pmi=" which is a much better description and allows requiring a bool argument. [ mingo: Improved the changelog some more. ] Reported-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/20181120170842.GZ2131@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-29x86: Clean up 'sizeof x' => 'sizeof(x)'Jordan Borgner
"sizeof(x)" is the canonical coding style used in arch/x86 most of the time. Fix the few places that didn't follow the convention. (Also do some whitespace cleanups in a few places while at it.) [ mingo: Rewrote the changelog. ] Signed-off-by: Jordan Borgner <mail@jordan-borgner.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181028125828.7rgammkgzep2wpam@JordanDesktop Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-16perf/x86/intel: Export mem events only if there's PEBS supportJiri Olsa
Memory events depends on PEBS support and access to LDLAT MSR, but we display them in /sys/devices/cpu/events even if the CPU does not provide those, like for KVM guests. That brings the false assumption that those events should be available, while they fail event to open. Separating the mem-* events attributes and merging them with cpu_events only if there's PEBS support detected. We could also check if LDLAT MSR is available, but the PEBS check seems to cover the need now. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20180906135748.GC9577@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02perf/x86/intel: Add quirk for Goldmont PlusKan Liang
A ucode patch is needed for Goldmont Plus while counter freezing feature is enabled. Otherwise, there will be some issues, e.g. PMI flood with some events. Add a quirk to check microcode version. If the system starts with the wrong ucode, leave the counter-freezing feature permanently disabled. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/1533712328-2834-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02x86/cpu: Sanitize FAM6_ATOM namingPeter Zijlstra
Going primarily by: https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors with additional information gleaned from other related pages; notably: - Bonnell shrink was called Saltwell - Moorefield is the Merriefield refresh which makes it Airmont The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE for i in `git grep -l FAM6_ATOM` ; do sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \ -e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \ -e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \ -e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \ -e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \ -e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \ -e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \ -e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \ -e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \ -e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \ -e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: dave.hansen@linux.intel.com Cc: len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02perf/x86/intel: Add a separate Arch Perfmon v4 PMI handlerAndi Kleen
Implements counter freezing for Arch Perfmon v4 (Skylake and newer). This allows to speed up the PMI handler by avoiding unnecessary MSR writes and make it more accurate. The Arch Perfmon v4 PMI handler is substantially different than the older PMI handler. Differences to the old handler: - It relies on counter freezing, which eliminates several MSR writes from the PMI handler and lowers the overhead significantly. It makes the PMI handler more accurate, as all counters get frozen atomically as soon as any counter overflows. So there is much less counting of the PMI handler itself. With the freezing we don't need to disable or enable counters or PEBS. Only BTS which does not support auto-freezing still needs to be explicitly managed. - The PMU acking is done at the end, not the beginning. This makes it possible to avoid manual enabling/disabling of the PMU, instead we just rely on the freezing/acking. - The APIC is acked before reenabling the PMU, which avoids problems with LBRs occasionally not getting unfreezed on Skylake. - Looping is only needed to workaround a corner case which several PMIs are very close to each other. For common cases, the counters are freezed during PMI handler. It doesn't need to do re-check. This patch: - Adds code to enable v4 counter freezing - Fork <=v3 and >=v4 PMI handlers into separate functions. - Add kernel parameter to disable counter freezing. It took some time to debug counter freezing, so in case there are new problems we added an option to turn it off. Would not expect this to be used until there are new bugs. - Only for big core. The patch for small core will be posted later separately. Performance: When profiling a kernel build on Kabylake with different perf options, measuring the length of all NMI handlers using the nmi handler trace point: V3 is without counter freezing. V4 is with counter freezing. The value is the average cost of the PMI handler. (lower is better) perf options ` V3(ns) V4(ns) delta -c 100000 1088 894 -18% -g -c 100000 1862 1646 -12% --call-graph lbr -c 100000 3649 3367 -8% --c.g. dwarf -c 100000 2248 1982 -12% Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/1533712328-2834-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02perf/x86/intel: Factor out common code of PMI handlerKan Liang
The Arch Perfmon v4 PMI handler is substantially different than the older PMI handler. Instead of adding more and more ifs cleanly fork the new handler into a new function, with the main common code factored out into a common function. Fix complaint from checkpatch.pl by removing "false" from "static bool warned". No functional change. Based-on-code-from: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/1533712328-2834-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25perf/x86/intel: Support Extended PEBS for Goldmont PlusKan Liang
Enable the extended PEBS for Goldmont Plus. There is no specific PEBS constrains for Goldmont Plus. Removing the pebs_constraints for Goldmont Plus. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/20180309021542.11374-4-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25perf/x86/intel/ds: Handle PEBS overflow for fixed countersKan Liang
The pebs_drain() need to support fixed counters. The DS Save Area now include "counter reset value" fields for each fixed counters. Extend the related variables (e.g. mask, counters, error) to support fixed counters. There is no extended PEBS in PEBS v2 and earlier PEBS format. Only need to change the code for PEBS v3 and later PEBS format. Extend the pebs_event_reset[] logic to support new "counter reset value" fields. Increase the reserve space for fixed counters. Based-on-code-from: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/20180309021542.11374-3-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25perf/x86/intel: Support PEBS on fixed countersKan Liang
The Extended PEBS feature supports PEBS on fixed-function performance counters as well as all four general purpose counters. It has to change the order of PEBS and fixed counter enabling to make sure PEBS is enabled for the fixed counters. The change of the order doesn't impact the behavior of current code on other platforms which don't support extended PEBS. Because there is no dependency among those enable/disable functions. Don't enable IRQ generation (0x8) for MSR_ARCH_PERFMON_FIXED_CTR_CTRL. The PEBS ucode will handle the interrupt generation. Based-on-code-from: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/20180309021542.11374-2-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)Peter Zijlstra
Vince reported the perf_fuzzer giving various unwinder warnings and Josh reported: > Deja vu. Most of these are related to perf PEBS, similar to the > following issue: > > b8000586c90b ("perf/x86/intel: Cure bogus unwind from PEBS entries") > > This is basically the ORC version of that. setup_pebs_sample_data() is > assembling a franken-pt_regs which ORC isn't happy about. RIP is > inconsistent with some of the other registers (like RSP and RBP). And where the previous unwinder only needed BP,SP ORC also requires IP. But we cannot spoof IP because then the sample will get displaced, entirely negating the point of PEBS. So cure the whole thing differently by doing the unwind early; this does however require a means to communicate we did the unwind early. We (ab)use an unused sample_type bit for this, which we set on events that fill out the data->callchain before the normal perf_prepare_sample(). Debugged-by: Josh Poimboeuf <jpoimboe@redhat.com> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-25perf/x86/intel: Don't enable freeze-on-smi for PerfMon V1Kan Liang
The SMM freeze feature was introduced since PerfMon V2. But the current code unconditionally enables the feature for all platforms. It can generate #GP exception, if the related FREEZE_WHILE_SMM bit is set for the machine with PerfMon V1. To disable the feature for PerfMon V1, perf needs to - Remove the freeze_on_smi sysfs entry by moving intel_pmu_attrs to intel_pmu, which is only applied to PerfMon V2 and later. - Check the PerfMon version before flipping the SMM bit when starting CPU Fixes: 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ak@linux.intel.com Cc: eranian@google.com Cc: acme@redhat.com Link: https://lkml.kernel.org/r/1524682637-63219-1-git-send-email-kan.liang@linux.intel.com
2018-03-24Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
With the cherry-picked perf/urgent commit merged separately we can now merge all the fixes without conflicts. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20perf/x86/intel: Rename confusing 'freerunning PEBS' API and implementation ↵Kan Liang
to 'large PEBS' The 'freerunning PEBS' and 'large PEBS' are the same thing. Both of these names appear in the code and in the API, which causes confusion. Rename 'freerunning PEBS' to 'large PEBS' to unify the code, which eliminates the confusion. No functional change. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1520865937-22910-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period()Dan Carpenter
We intended to clear the lowest 6 bits but because of a type bug we clear the high 32 bits as well. Andi says that periods are rarely more than U32_MAX so this bug probably doesn't have a huge runtime impact. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 294fe0f52a44 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds") Link: http://lkml.kernel.org/r/20180317115216.GB4035@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09perf/x86/intel: Fix PMU read for auto-reloadKan Liang
Auto-reload events needs to be specially handled in event count read. Auto-reload is only available for intel_pmu. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)") Link: http://lkml.kernel.org/r/1518474035-21006-5-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09perf/x86/intel: Properly save/restore the PMU state in the NMI handlerKan Liang
The PMU is disabled in intel_pmu_handle_irq(), but cpuc->enabled is not updated accordingly. This is fine in current usage because no-one checks it - but fix it for future code: for example, the drain_pebs() will be modified to fix an auto-reload bug. Properly save/restore the old PMU state. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: kernel test robot <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/6f44ee84-56f8-79f1-559b-08e371eaeb78@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09perf/x86/intel: Fix large period handling on Broadwell CPUsKan Liang
Large fixed period values could be truncated on Broadwell, for example: perf record -e cycles -c 10000000000 Here the fixed period is 0x2540BE400, but the period which finally applied is 0x540BE400 - which is wrong. The reason is that x86_pmu::limit_period() uses an u32 parameter, so the high 32 bits of 'period' get truncated. This bug was introduced in: commit 294fe0f52a44 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds") It's safe to use u64 instead of u32: - Although the 'left' is s64, the value of 'left' must be positive when calling limit_period(). - bdw_limit_period() only modifies the lowest 6 bits, it doesn't touch the higher 32 bits. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 294fe0f52a44 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds") Link: http://lkml.kernel.org/r/1519926894-3520-1-git-send-email-kan.liang@linux.intel.com [ Rewrote unacceptably bad changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-15x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_steppingJia Zhang
x86_mask is a confusing name which is hard to associate with the processor's stepping. Additionally, correct an indent issue in lib/cpu.c. Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com> [ Updated it to more recent kernels. ] Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-27perf/x86/intel: Plug memory leak in intel_pmu_init()Thomas Gleixner
A recent commit introduced an extra merge_attr() call in the skylake branch, which causes a memory leak. Store the pointer to the extra allocated memory and free it at the end of the function. Fixes: a5df70c354c2 ("perf/x86: Only show format attributes when supported") Reported-by: Tommi Rantala <tommi.t.rantala@nokia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@linux.intel.com>
2017-11-26Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: two PMU driver fixes and a memory leak fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix memory leak triggered by perf --namespace perf/x86/intel/uncore: Add event constraint for BDX PCU perf/x86/intel: Hide TSX events when RTM is not supported
2017-11-13perf/x86/intel: Hide TSX events when RTM is not supportedAndi Kleen
0day testing reported a perf test regression on Haswell systems without RTM. Commit a5df70c35 hides the in_tx/in_tx_cp attributes when RTM is not available, but the TSX events are still available in sysfs. Due to the missing attributes the event parser fails on those files. Don't show the TSX events in sysfs when RTM is not available on Haswell/Broadwell/Skylake. Fixes: a5df70c354c2 (perf/x86: Only show format attributes when supported) Reported-by: kernel test robot <xiaolong.ye@intel.com> Tested-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20171109000718.14137-1-andi@firstfloor.org
2017-10-20Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29perf/x86: Enable free running PEBS for REGS_USER/INTRAndi Kleen
Currently free running PEBS is disabled when user or interrupt registers are requested. Most of the registers are actually available in the PEBS record and can be supported. So we just need to check for the supported registers and then allow it: it is all except for the segment register. For user registers this only works when the counter is limited to ring 3 only, so this also needs to be checked. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170831214630.21892-1-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14perf/x86/intel, watchdog/core: Sanitize PMU HT bug workaroundPeter Zijlstra
The lockup_detector_suspend/resume() interface is broken in several ways especially as it results in recursive locking of the CPU hotplug lock. Use the new stop/restart interface in the perf NMI watchdog to temporarily disable and reenable the already active watchdog events. That's enough to handle it. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Don Zickus <dzickus@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Ulrich Obergfell <uobergfe@redhat.com> Link: http://lkml.kernel.org/r/20170912194146.247141871@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/x86: Fix caps/ for !IntelPeter Zijlstra
Move the 'max_precise' capability into generic x86 code where it belongs. This fixes a sysfs splat on !Intel systems where we fail to set x86_pmu_caps_group.atts. Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Fixes: 22688d1c20f5 ("x86/perf: Export some PMU attributes in caps/ directory") Link: http://lkml.kernel.org/r/20170828104650.2u3rsim4jafyjzv2@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25perf/x86: Export some PMU attributes in caps/ directoryAndi Kleen
It can be difficult to figure out for user programs what features the x86 CPU PMU driver actually supports. Currently it requires grepping in dmesg, but dmesg is not always available. This adds a caps directory to /sys/bus/event_source/devices/cpu/, similar to the caps already used on intel_pt, which can be used to discover the available capabilities cleanly. Three capabilities are defined: - pmu_name: Underlying CPU name known to the driver - max_precise: Max precise level supported - branches: Known depth of LBR. Example: % grep . /sys/bus/event_source/devices/cpu/caps/* /sys/bus/event_source/devices/cpu/caps/branches:32 /sys/bus/event_source/devices/cpu/caps/max_precise:3 /sys/bus/event_source/devices/cpu/caps/pmu_name:skylake Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170822185201.9261-3-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25perf/x86: Only show format attributes when supportedAndi Kleen
Only show the Intel format attributes in sysfs when the feature is actually supported with the current model numbers. This allows programs to probe what format attributes are available, and give a sensible error message to users if they are not. This handles near all cases for intel attributes since Nehalem, except the (obscure) case when the model number if known, but PEBS is disabled in PERF_CAPABILITIES. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170822185201.9261-2-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>