summaryrefslogtreecommitdiff
path: root/arch/x86/events
AgeCommit message (Collapse)Author
2017-08-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Another pile of small fixes and updates for x86: - Plug a hole in the SMAP implementation which misses to clear AC on NMI entry - Fix the norandmaps/ADDR_NO_RANDOMIZE logic so the command line parameter works correctly again - Use the proper accessor in the startup64 code for next_early_pgt to prevent accessing of invalid addresses and faulting in the early boot code. - Prevent CPU hotplug lock recursion in the MTRR code - Unbreak CPU0 hotplugging - Rename overly long CPUID bits which got introduced in this cycle - Two commits which mark data 'const' and restrict the scope of data and functions to file scope by making them 'static'" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Constify attribute_group structures x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt' x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks x86: Fix norandmaps/ADDR_NO_RANDOMIZE x86/mtrr: Prevent CPU hotplug lock recursion x86: Mark various structures and functions as 'static' x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag x86/smpboot: Unbreak CPU0 hotplug x86/asm/64: Clear AC on NMI entries
2017-08-18x86: Constify attribute_group structuresArvind Yadav
attribute_groups are not supposed to change at runtime and none of the groups is modified. Mark the non-const structs as const. [ tglx: Folded into one big patch ] Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: tony.luck@intel.com Cc: bp@alien8.de Link: http://lkml.kernel.org/r/1500550238-15655-2-git-send-email-arvind.yadav.cs@gmail.com
2017-08-11x86: Mark various structures and functions as 'static'Colin Ian King
Mark a couple of structures and functions as 'static', pointed out by Sparse: warning: symbol 'bts_pmu' was not declared. Should it be static? warning: symbol 'p4_event_aliases' was not declared. Should it be static? warning: symbol 'rapl_attr_groups' was not declared. Should it be static? symbol 'process_uv2_message' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Andrew Banman <abanman@hpe.com> # for the UV change Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170810155709.7094-1-colin.king@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10perf/x86: Fix RDPMC vs. mm_struct trackingPeter Zijlstra
Vince reported the following rdpmc() testcase failure: > Failing test case: > > fd=perf_event_open(); > addr=mmap(fd); > exec() // without closing or unmapping the event > fd=perf_event_open(); > addr=mmap(fd); > rdpmc() // GPFs due to rdpmc being disabled The problem is of course that exec() plays tricks with what is current->mm, only destroying the old mappings after having installed the new mm. Fix this confusion by passing along vma->vm_mm instead of relying on current->mm. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 1e0fb9ec679c ("perf: Add pmu callbacks to track event mapping and unmapping") Link: http://lkml.kernel.org/r/20170802173930.cstykcqefmqt7jau@hirez.programming.kicks-ass.net [ Minor cleanups. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-24perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regsStephane Eranian
This skx_uncore_cha_extra_regs array was missing an end-marker. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1499967350-10385-7-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-24perf/x86/intel/uncore: Fix SKX CHA event extra regsStephane Eranian
This patch adds two missing event extra regs for Skylake Server CHA PMU: - TOR_INSERTS - TOR_OCCUPANCY Were missing support for all the filters, including opcode matchers. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1499967350-10385-6-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-24perf/x86/intel/uncore: Remove invalid Skylake server CHA filter fieldKan Liang
There is no field c6 and link for CHA BOX FILTER. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1499967350-10385-5-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-24perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umaskKan Liang
Correct the umask for LLC_LOOKUP.LOCAL and LLC_LOOKUP.REMOTE events Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1499967350-10385-4-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-24perf/x86/intel/uncore: Fix Skylake server PCU PMU event formatKan Liang
PCU event format for SKX are different from snbep. Introduce a new format group for SKX PCU. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1499967350-10385-3-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-24perf/x86/intel/uncore: Fix Skylake UPI PMU event masksStephane Eranian
This patch fixes the event_mask and event_ext_mask for the Intel Skylake Server UPI PMU. Bit 21 is not used as a filter. The extended umask is from bit 32 to bit 55. Correct both umasks. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1499967350-10385-2-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-21Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Half of the fixes are for various build time warnings triggered by randconfig builds. Most (but not all...) were harmless. There's also: - ACPI boundary condition fixes - UV platform fixes - defconfig updates - an AMD K6 CPU init fix - a %pOF printk format related preparatory change - .. and a warning fix related to the tlb/PCID changes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/devicetree: Convert to using %pOF instead of ->full_name x86/platform/uv/BAU: Disable BAU on single hub configurations x86/platform/intel-mid: Fix a format string overflow warning x86/platform: Add PCI dependency for PUNIT_ATOM_DEBUG x86/build: Silence the build with "make -s" x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl x86/fpu/math-emu: Avoid bogus -Wint-in-bool-context warning x86/fpu/math-emu: Fix possible uninitialized variable use perf/x86: Shut up false-positive -Wmaybe-uninitialized warning x86/defconfig: Remove stale, old Kconfig options x86/ioapic: Pass the correct data to unmask_ioapic_irq() x86/acpi: Prevent out of bound access caused by broken ACPI tables x86/mm, KVM: Fix warning when !CONFIG_PREEMPT_COUNT x86/platform/uv/BAU: Fix congested_response_us not taking effect x86/cpu: Use indirect call to measure performance in init_amd_k6()
2017-07-21perf/x86/intel: Add proper condition to run sched_task callbacksJiri Olsa
We have 2 functions using the same sched_task callback: - PEBS drain for free running counters - LBR save/store Both of them are called from intel_pmu_sched_task() and either of them can be unwillingly triggered when the other one is configured to run. Let's say there's PEBS drain configured in sched_task callback for the event, but in the callback itself (intel_pmu_sched_task()) we will also run the code for LBR save/restore, which we did not ask for, but the code in intel_pmu_sched_task() does not check for that. This can lead to extra cycles in some perf monitoring, like when we monitor PEBS event without LBR data. # perf record --no-timestamp -c 10000 -e cycles:p ./perf bench sched pipe -l 1000000 (We need PEBS, non freq/non timestamp event to enable the sched_task callback) The perf stat of cycles and msr:write_msr for above command before the change: ... Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \ ./perf bench sched pipe -l 1000000' (5 runs): 18,519,557,441 cycles:k 91,195,527 msr:write_msr 29.334476406 seconds time elapsed And after the change: ... Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \ ./perf bench sched pipe -l 1000000' (5 runs): 18,704,973,540 cycles:k 27,184,720 msr:write_msr 16.977875900 seconds time elapsed There's no affect on cycles:k because the sched_task happens with events switched off, however the msr:write_msr tracepoint counter together with almost 50% of time speedup show the improvement. Monitoring LBR event and having extra PEBS drain processing in sched_task callback showed just a little speedup, because the drain function does not do much extra work in case there is no PEBS data. Adding conditions to recognize the configured work that needs to be done in the x86_pmu's sched_task callback. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170719075247.GA27506@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-20perf/x86: Shut up false-positive -Wmaybe-uninitialized warningArnd Bergmann
The intialization function checks for various failure scenarios, but unfortunately the compiler gets a little confused about the possible combinations, leading to a false-positive build warning when -Wmaybe-uninitialized is set: arch/x86/events/core.c: In function ‘init_hw_perf_events’: arch/x86/events/core.c:264:3: warning: ‘reg_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized] arch/x86/events/core.c:264:3: warning: ‘val_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized] pr_err(FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x is %Lx)\n", We can't actually run into this case, so this shuts up the warning by initializing the variables to a known-invalid state. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170719125310.2487451-2-arnd@arndb.de Link: https://patchwork.kernel.org/patch/9392595/ Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18perf/x86/intel: Fix debug_store reset field for freq eventsJiri Olsa
There's a bug in PEBs event enabling code, that prevents PEBS freq events to work properly after non freq PEBS event was run. freq events - perf_event_attr::freq set -F <freq> option of perf record PEBS events - perf_event_attr::precise_ip > 0 default for perf record Like in following example with CPU 0 busy, we expect ~10000 samples for following perf tool run: # perf record -F 10000 -C 0 sleep 1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.640 MB perf.data (10031 samples) ] Everything's fine, but once we run non freq PEBS event like: # perf record -c 10000 -C 0 sleep 1 [ perf record: Woken up 4 times to write data ] [ perf record: Captured and wrote 1.053 MB perf.data (20061 samples) ] the freq events start to fail like this: # perf record -F 10000 -C 0 sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.185 MB perf.data (40 samples) ] The issue is in non freq PEBs event initialization of debug_store reset field, which value is used to auto-reload the counter value after PEBS event drain. This value is not being used for PEBS freq events, but once we run non freq event it stays in debug_store data and screws the sample_freq counting for PEBS freq events. Setting the reset field to 0 for freq events. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170714163551.19459-1-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18perf/x86/intel: Add Goldmont Plus CPU PMU supportKan Liang
Add perf core PMU support for Intel Goldmont Plus CPU cores: - The init code is based on Goldmont. - There is a new cache event list, based on the Goldmont cache event list. - All four general-purpose performance counters support PEBS. - The first general-purpose performance counter is for reduced skid PEBS mechanism. Using :ppp to indicate the event which want to do reduced skid PEBS. - Goldmont Plus has 4-wide pipeline for Topdown Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/20170712134423.17766-1-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18perf/x86/intel: Enable C-state residency events for Apollo LakeHarry Pan
Goldmont microarchitecture supports C1/C3/C6, PC2/PC3/PC6/PC10 state residency counters, the patch enables them for Apollo Lake platform. The MSR information is based on Intel Software Developers' Manual, Vol. 4, Order No. 335592, Table 2-6 and 2-12. Signed-off-by: Harry Pan <harry.pan@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@suse.de Cc: davidcc@google.com Cc: gs0622@gmail.com Cc: lukasz.odzioba@intel.com Cc: piotr.luc@intel.com Cc: srinivas.pandruvada@linux.intel.com Link: http://lkml.kernel.org/r/20170717103749.24337-1-harry.pan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-03Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug updates from Thomas Gleixner: "This update is primarily a cleanup of the CPU hotplug locking code. The hotplug locking mechanism is an open coded RWSEM, which allows recursive locking. The main problem with that is the recursive nature as it evades the full lockdep coverage and hides potential deadlocks. The rework replaces the open coded RWSEM with a percpu RWSEM and establishes full lockdep coverage that way. The bulk of the changes fix up recursive locking issues and address the now fully reported potential deadlocks all over the place. Some of these deadlocks have been observed in the RT tree, but on mainline the probability was low enough to hide them away." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) cpu/hotplug: Constify attribute_group structures powerpc: Only obtain cpu_hotplug_lock if called by rtasd ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init cpu/hotplug: Remove unused check_for_tasks() function perf/core: Don't release cred_guard_mutex if not taken cpuhotplug: Link lock stacks for hotplug callbacks acpi/processor: Prevent cpu hotplug deadlock sched: Provide is_percpu_thread() helper cpu/hotplug: Convert hotplug locking to percpu rwsem s390: Prevent hotplug rwsem recursion arm: Prevent hotplug rwsem recursion arm64: Prevent cpu hotplug rwsem recursion kprobes: Cure hotplug lock ordering issues jump_label: Reorder hotplug lock and jump_label_lock perf/tracing/cpuhotplug: Fix locking order ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus() PCI: Replace the racy recursion prevention PCI: Use cpu_hotplug_disable() instead of get_online_cpus() perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode() x86/perf: Drop EXPORT of perf_check_microcode ...
2017-07-03Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main changes in this cycle were: - Continued work to add support for 5-level paging provided by future Intel CPUs. In particular we switch the x86 GUP code to the generic implementation. (Kirill A. Shutemov) - Continued work to add PCID CPU support to native kernels as well. In this round most of the focus is on reworking/refreshing the TLB flush infrastructure for the upcoming PCID changes. (Andy Lutomirski)" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/mm: Delete a big outdated comment about TLB flushing x86/mm: Don't reenter flush_tlb_func_common() x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging x86/ftrace: Exclude functions in head64.c from function-tracing x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap x86/mm: Remove reset_lazy_tlbstate() x86/ldt: Simplify the LDT switching logic x86/boot/64: Put __startup_64() into .head.text x86/mm: Add support for 5-level paging for KASLR x86/mm: Make kernel_physical_mapping_init() support 5-level paging x86/mm: Add sync_global_pgds() for configuration with 5-level paging x86/boot/64: Add support of additional page table level during early boot x86/boot/64: Rename init_level4_pgt and early_level4_pgt x86/boot/64: Rewrite startup_64() in C x86/boot/compressed: Enable 5-level paging during decompression stage x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations x86/boot/efi: Cleanup initialization of GDT entries x86/asm: Fix comment in return_from_SYSCALL_64() x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation ...
2017-07-03Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Add the SYSTEM_SCHEDULING bootup state to move various scheduler debug checks earlier into the bootup. This turns silent and sporadically deadly bugs into nice, deterministic splats. Fix some of the splats that triggered. (Thomas Gleixner) - A round of restructuring and refactoring of the load-balancing and topology code (Peter Zijlstra) - Another round of consolidating ~20 of incremental scheduler code history: this time in terms of wait-queue nomenclature. (I didn't get much feedback on these renaming patches, and we can still easily change any names I might have misplaced, so if anyone hates a new name, please holler and I'll fix it.) (Ingo Molnar) - sched/numa improvements, fixes and updates (Rik van Riel) - Another round of x86/tsc scheduler clock code improvements, in hope of making it more robust (Peter Zijlstra) - Improve NOHZ behavior (Frederic Weisbecker) - Deadline scheduler improvements and fixes (Luca Abeni, Daniel Bristot de Oliveira) - Simplify and optimize the topology setup code (Lauro Ramos Venancio) - Debloat and decouple scheduler code some more (Nicolas Pitre) - Simplify code by making better use of llist primitives (Byungchul Park) - ... plus other fixes and improvements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) sched/cputime: Refactor the cputime_adjust() code sched/debug: Expose the number of RT/DL tasks that can migrate sched/numa: Hide numa_wake_affine() from UP build sched/fair: Remove effective_load() sched/numa: Implement NUMA node level wake_affine() sched/fair: Simplify wake_affine() for the single socket case sched/numa: Override part of migrate_degrades_locality() when idle balancing sched/rt: Move RT related code from sched/core.c to sched/rt.c sched/deadline: Move DL related code from sched/core.c to sched/deadline.c sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled sched/fair: Spare idle load balancing on nohz_full CPUs nohz: Move idle balancer registration to the idle path sched/loadavg: Generalize "_idle" naming to "_nohz" sched/core: Drop the unused try_get_task_struct() helper function sched/fair: WARN() and refuse to set buddy when !se->on_rq sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h> ...
2017-07-03Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Most of the changes are for tooling, the main changes in this cycle were: - Improve Intel-PT hardware tracing support, both on the kernel and on the tooling side: PTWRITE instruction support, power events for C-state tracing, etc. (Adrian Hunter) - Add support to measure SMI cost to the x86 architecture, with tooling support in 'perf stat' (Kan Liang) - Support function filtering in 'perf ftrace', plus related improvements (Namhyung Kim) - Allow adding and removing fields to the default 'perf script' columns, using + or - as field prefixes to do so (Andi Kleen) - Allow resolving the DSO name with 'perf script -F brstack{sym,off},dso' (Mark Santaniello) - Add perf tooling unwind support for PowerPC (Paolo Bonzini) - ... and various other improvements as well" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits) perf auxtrace: Add CPU filter support perf intel-pt: Do not use TSC packets for calculating CPU cycles to TSC perf intel-pt: Update documentation to include new ptwrite and power events perf intel-pt: Add example script for power events and PTWRITE perf intel-pt: Synthesize new power and "ptwrite" events perf intel-pt: Move code in intel_pt_synth_events() to simplify attr setting perf intel-pt: Factor out intel_pt_set_event_name() perf intel-pt: Tidy messages into called function intel_pt_synth_event() perf intel-pt: Tidy Intel PT evsel lookup into separate function perf intel-pt: Join needlessly wrapped lines perf intel-pt: Remove unused instructions_sample_period perf intel-pt: Factor out common code synthesizing event samples perf script: Add synthesized Intel PT power and ptwrite events perf/x86/intel: Constify the 'lbr_desc[]' array and make a function static perf script: Add 'synth' field for synthesized event payloads perf auxtrace: Add itrace option to output power events perf auxtrace: Add itrace option to output ptwrite events tools include: Add byte-swapping macros to kernel.h perf script: Add 'synth' event type for synthesized events x86/insn: perf tools: Add new ptwrite instruction ...
2017-06-30perf/x86/intel: Constify the 'lbr_desc[]' array and make a function staticColin Ian King
A few minor clean-ups: constify the lbr_desc[] array and make local function lbr_from_signext_quirk_rd() static to fix a sparse warning: "symbol 'lbr_from_signext_quirk_rd' was not declared. Should it be static?" Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170629091406.9870-1-colin.king@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-29perf/x86/intel/uncore: Fix wrong box pointer checkKan Liang
Should not init a NULL box. It will cause system crash. The issue looks like caused by a typo. This was not noticed because there is no NULL box. Also, for most boxes, they are enabled by default. The init code is not critical. Fixes: fff4b87e594a ("perf/x86/intel/uncore: Make package handling more robust") Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170629190926.2456-1-kan.liang@intel.com
2017-06-22perf/x86/intel: Add 1G DTLB load/store miss support for SKLKan Liang
Current DTLB load/store miss events (0x608/0x649) only counts 4K,2M and 4M page size. Need to extend the events to support any page size (4K/2M/4M/1G). The complete DTLB load/store miss events are: DTLB_LOAD_MISSES.WALK_COMPLETED 0xe08 DTLB_STORE_MISSES.WALK_COMPLETED 0xe49 Signed-off-by: Kan Liang <Kan.liang@intel.com> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/20170619142609.11058-1-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-08x86/ldt: Rename ldt_struct::size to ::nr_entriesBorislav Petkov
... because this is exactly what it is: the number of entries in the LDT. Calling it "size" is simply confusing and it is actually begging to be called "nr_entries" or somesuch, especially if you see constructs like: alloc_size = size * LDT_ENTRY_SIZE; since LDT_ENTRY_SIZE is the size of a single entry. There should be no functionality change resulting from this patch, as the before/after output from tools/testing/selftests/x86/ldt_gdt.c shows. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170606173116.13977-1-bp@alien8.de [ Renamed 'n_entries' to 'nr_entries' ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05x86/mm: Rework lazy TLB to track the actual loaded mmAndy Lutomirski
Lazy TLB state is currently managed in a rather baroque manner. AFAICT, there are three possible states: - Non-lazy. This means that we're running a user thread or a kernel thread that has called use_mm(). current->mm == current->active_mm == cpu_tlbstate.active_mm and cpu_tlbstate.state == TLBSTATE_OK. - Lazy with user mm. We're running a kernel thread without an mm and we're borrowing an mm_struct. We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0). The current cpu is set in mm_cpumask(current->active_mm). CR3 points to current->active_mm->pgd. The TLB is up to date. - Lazy with init_mm. This happens when we call leave_mm(). We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, but that mm is only relelvant insofar as the scheduler is tracking it for refcounting. cpu_tlbstate.state != TLBSTATE_OK. The current cpu is clear in mm_cpumask(current->active_mm). CR3 points to swapper_pg_dir, i.e. init_mm->pgd. This patch simplifies the situation. Other than perf, x86 stops caring about current->active_mm at all. We have cpu_tlbstate.loaded_mm pointing to the mm that CR3 references. The TLB is always up to date for that mm. leave_mm() just switches us to init_mm. There are no longer any special cases for mm_cpumask, and switch_mm() switches mms without worrying about laziness. After this patch, cpu_tlbstate.state serves only to tell the TLB flush code whether it may switch to init_mm instead of doing a normal flush. This makes fairly extensive changes to xen_exit_mmap(), which used to look a bit like black magic. Perf is unchanged. With or without this change, perf may behave a bit erratically if it tries to read user memory in kernel thread context. We should build on this patch to teach perf to never look at user memory when cpu_tlbstate.loaded_mm != current->mm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-26perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()Sebastian Andrzej Siewior
If intel_snb_check_microcode() is invoked via microcode_init -> perf_check_microcode -> intel_snb_check_microcode then get_online_cpus() is invoked nested. This works with the current implementation of get_online_cpus() but prevents converting it to a percpu rwsem. intel_snb_check_microcode() is also invoked from intel_sandybridge_quirk() unprotected. Drop get_online_cpus() from intel_snb_check_microcode() and add it to intel_sandybridge_quirk() so both call sites are protected. Convert *_online_cpus() to the new interfaces while at it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov <bp@suse.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20170524081548.594862191@linutronix.de
2017-05-26x86/perf: Drop EXPORT of perf_check_microcodeThomas Gleixner
The only caller is the microcode update, which cannot be modular. Drop the export. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov <bp@suse.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20170524081548.515204988@linutronix.de
2017-05-26perf/x86/intel/cqm: Use cpuhp_setup_state_cpuslocked()Sebastian Andrzej Siewior
intel_cqm_init() holds get_online_cpus() while registerring the hotplug callbacks. cpuhp_setup_state() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup_state_cpuslocked() to avoid the nested call. Convert *_online_cpus() to the new interfaces while at it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170524081548.075604046@linutronix.de
2017-05-23perf/x86: Add sysfs entry to freeze counters on SMIKan Liang
Currently, the SMIs are visible to all performance counters, because many users want to measure everything including SMIs. But in some cases, the SMI cycles should not be counted - for example, to calculate the cost of an SMI itself. So a knob is needed. When setting FREEZE_WHILE_SMM bit in IA32_DEBUGCTL, all performance counters will be effected. There is no way to do per-counter freeze on SMI. So it should not use the per-event interface (e.g. ioctl or event attribute) to set FREEZE_WHILE_SMM bit. Adds sysfs entry /sys/device/cpu/freeze_on_smi to set FREEZE_WHILE_SMM bit in IA32_DEBUGCTL. When set, freezes perfmon and trace messages while in SMM. Value has to be 0 or 1. It will be applied to all processors. Also serialize the entire setting so we don't get multiple concurrent threads trying to update to different values. Signed-off-by: Kan Liang <Kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: bp@alien8.de Cc: jolsa@kernel.org Link: http://lkml.kernel.org/r/1494600673-244667-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-15x86/tsc: Remodel cyc2ns to use seqcount_latch()Peter Zijlstra
Replace the custom multi-value scheme with the more regular seqcount_latch() scheme. Along with scrapping a lot of lines, the latch scheme is better documented and used in more places. The immediate benefit however is not being limited on the update side. The current code has a limit where the writers block which is hit by future changes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-03Merge tag 'perf-core-for-mingo-4.12-20170503' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: Fixes: - Support setting probes in versioned user space symbols, such as pthread_create@@GLIBC_2.1, picking the default one, more work needed to make it possible to set it on the other versions, as the 'perf probe' syntax already uses @ for other purposes. (Paul Clarke) - Do not special case address zero as an error for routines that return addresses (symbol lookup), instead use the return as the success/error indication and pass a pointer to return the address, fixing 'perf test vmlinux' (the one that compares address between vmlinux and kallsyms) on s/390, where the '_text' address is equal to zero (Arnaldo Carvalho de Melo) Infrastructure changes: - More header sanitization, moving stuff out of util.h into more appropriate headers and objects and sometimes creating new ones (Arnaldo Carvalho de Melo) - Refactor a duplicated code for obtaining config file name (Taeung Song) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-03perf/x86: Fix Broadwell-EP DRAM RAPL eventsVince Weaver
It appears as though the Broadwell-EP DRAM units share the special units quirk with Haswell-EP/KNL. Without this patch, you get really high results (a single DRAM using 20W of power). The powercap driver in drivers/powercap/intel_rapl.c already has this change. Signed-off-by: Vince Weaver <vincent.weaver@maine.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14perf/x86: Fix spurious NMI with PEBS Load Latency eventKan Liang
Spurious NMIs will be observed with the following command: while :; do perf record -bae "cpu/umask=0x01,event=0xcd,ldlat=0x80/pp" -e "cpu/umask=0x03,event=0x0/" -e "cpu/umask=0x02,event=0x0/" -e cycles,branches,cache-misses -e cache-references -- sleep 10 done The bug was introduced by commit: 8077eca079a2 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+") That commit clears the status bits for the counters used for PEBS events, by masking the whole 64 bits pebs_enabled. However, only the low 32 bits of both status and pebs_enabled are reserved for PEBS-able counters. For status bits 32-34 are fixed counter overflow bits. For pebs_enabled bits 32-34 are for PEBS Load Latency. In the test case, the PEBS Load Latency event and fixed counter event could overflow at the same time. The fixed counter overflow bit will be cleared by mistake. Once it is cleared, the fixed counter overflow never be processed, which finally trigger spurious NMI. Correct the PEBS enabled mask by ignoring the non-PEBS bits. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 8077eca079a2 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+") Link: http://lkml.kernel.org/r/1491333246-3965-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()Peter Zijlstra
When the perf_branch_entry::{in_tx,abort,cycles} fields were added, intel_pmu_lbr_read_32() wasn't updated to initialize them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> Fixes: 135c5612c460 ("perf/x86/intel: Support Haswell/v4 LBR format") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11perf/amd/uncore: Fix pr_fmt() prefixBorislav Petkov
Make it "perf/amd/uncore: ", i.e., something more specific than "perf: ". Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20170410122047.3026-4-bp@alien8.de [ Changed it to perf/amd/uncore/ ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11perf/amd/uncore: Clean up per-family setupBorislav Petkov
Fam16h is the same as the default one, remove it. Turn the switch-case into a simple if-else. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20170410122047.3026-3-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11perf/amd/uncore: Do feature check first, before assignmentsBorislav Petkov
... and save some unnecessary work. Remove now unused label while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20170410122047.3026-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11Merge tag 'v4.11-rc6' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30x86/events/amd/iommu: Enable support for multiple IOMMUsSuravee Suthikulpanit
Add support for multiple IOMMUs to perf by exposing an AMD IOMMU PMU for each IOMMU found in the system via: /bus/event_source/devices/amd_iommu_x where x is the IOMMU index. This allows users to specify different events to be programmed into the performance counters of each IOMMU. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [ Improve readability, shorten names. ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1490166162-10002-11-git-send-email-Suravee.Suthikulpanit@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30x86/events/amd/iommu: Add IOMMU-specific hw_perf_event structSuravee Suthikulpanit
Current AMD IOMMU perf PMU inappropriately uses the hardware struct inside the union in struct hw_perf_event, extra_reg in particular. Instead, introduce an AMD IOMMU-specific struct with required parameters to be programmed into the IOMMU performance counter control register. Update the pasid field from 16 to 20 bits while at it. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [ Fixup macros, shorten get_next_avail_iommu_bnk_cntr() local vars, massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1487926102-13073-10-git-send-email-Suravee.Suthikulpanit@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30x86/events/amd/iommu: Fix sysfs perf attribute groupsSuravee Suthikulpanit
Introduce static amd_iommu_attr_groups to simplify the sysfs attributes initialization code. Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1487926102-13073-9-git-send-email-Suravee.Suthikulpanit@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30x86/events, drivers/amd/iommu: Prepare for multiple IOMMUs supportSuravee Suthikulpanit
Currently, amd_iommu_pc_get_set_reg_val() cannot support multiple IOMMUs. Modify it to allow callers to specify an IOMMU. This is in preparation for supporting multiple IOMMUs. Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1487926102-13073-8-git-send-email-Suravee.Suthikulpanit@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30x86/events/amd/iommu.c: Modify functions to query max banks and countersSuravee Suthikulpanit
Currently, amd_iommu_pc_get_max_[banks|counters]() use end-point device ID to locate an IOMMU and check the reported max banks/counters. The logic assumes that the IOMMU_BASE_DEVID belongs to the first IOMMU, and uses it to acquire a reference to the first IOMMU, which does not work on certain systems. Instead, modify the function to take an IOMMU index, and use it to query the corresponding AMD IOMMU instance. Currently, hardcode the IOMMU index to 0 since the current AMD IOMMU perf implementation supports only a single IOMMU. A subsequent patch will add support for multiple IOMMUs, and will use a proper IOMMU index. Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1487926102-13073-7-git-send-email-Suravee.Suthikulpanit@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30x86/events, drivers/iommu/amd: Introduce amd_iommu_get_num_iommus()Suravee Suthikulpanit
Introduce amd_iommu_get_num_iommus(), which returns the value of amd_iommus_present. The function is used to replace direct access to the variable, which is now declared as static. This function will also be used by AMD IOMMU perf driver. Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1487926102-13073-6-git-send-email-Suravee.Suthikulpanit@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30x86/events/amd/iommu: Clean up perf_iommu_read()Suravee Suthikulpanit
Fix coding style and use GENMASK_ULL(). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1487926102-13073-4-git-send-email-Suravee.Suthikulpanit@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30x86/events/amd/iommu: Clean up bitwise operationsSuravee Suthikulpanit
Clean up register initialization and make use of BIT_ULL(x) where appropriate. This should not affect logic and functionality. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1487926102-13073-3-git-send-email-Suravee.Suthikulpanit@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30x86/events/amd/iommu: Declare pr_fmt() formatSuravee Suthikulpanit
Declare pr_fmt() format for perf/amd_iommu and remove unnecessary pr_debug() calls. Also check return value when _init_events_attrs() fails and issue an error message. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1487926102-13073-2-git-send-email-Suravee.Suthikulpanit@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30perf/x86/intel/pt: Allow the disabling of branch tracingAlexander Shishkin
Now that Intel PT supports more types of trace content than just branch tracing, it may be useful to allow the user to disable branch tracing when it is not needed. The special case is BDW, where not setting BranchEn is not supported. This is slightly trickier than necessary, because up to this moment the driver has been setting BranchEn automatically and the userspace assumes as much. Instead of reversing the semantics of BranchEn, we introduce a 'passthrough' bit, which will forego the default and allow the user to set BranchEn to their heart's content. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170206144140.14402-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>