summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu/perf_event_amd.c
AgeCommit message (Collapse)Author
2012-07-05perf/x86/amd: Unify AMD's generic and family 15h pmusRobert Richter
There is no need for keeping separate pmu structs. We can enable amd_{get,put}_event_constraints() functions also for family 15h event. The advantage is that there is only a single pmu struct for all AMD cpus. This patch introduces functions to setup the pmu to enabe core performance counters or counter constraints. Also, cpuid checks are used instead of family checks where possible. Thus, it enables the code independently of cpu families if the feature flag is set. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-4-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-23Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: - Leftover AMD PMU driver fix fix from the end of the v3.4 stabilization cycle. - Late tools/perf/ changes that missed the first round: * endianness fixes * event parsing improvements * libtraceevent fixes factored out from trace-cmd * perl scripting engine fixes related to libtraceevent, * testcase improvements * perf inject / pipe mode fixes * plus a kernel side fix * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Update event scheduling constraints for AMD family 15h models * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "sched, perf: Use a single callback into the scheduler" perf evlist: Show event attribute details perf tools: Bump default sample freq to 4 kHz perf buildid-list: Work better with pipe mode perf tools: Fix piped mode read code perf inject: Fix broken perf inject -b perf tools: rename HEADER_TRACE_INFO to HEADER_TRACING_DATA perf tools: Add union u64_swap type for swapping u64 data perf tools: Carry perf_event_attr bitfield throught different endians perf record: Fix documentation for branch stack sampling perf target: Add cpu flag to sample_type if target has cpu perf tools: Always try to build libtraceevent perf tools: Rename libparsevent to libtraceevent in Makefile perf script: Rename struct event to struct event_format in perl engine perf script: Explicitly handle known default print arg type perf tools: Add hardcoded name term for pmu events perf tools: Separate 'mem:' event scanner bits perf tools: Use allocated list for each parsed event perf tools: Add support for displaying event parser debug info perf test: Move parse event automated tests to separated object
2012-05-18perf/x86: Update event scheduling constraints for AMD family 15h modelsRobert Richter
This update is for newer family 15h cpu models from 0x02 to 0x1f. Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: stable@vger.kernel.org # v2.6.39+ Link: http://lkml.kernel.org/r/1337337642-1621-1-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09perf/x86-ibs: Precise event sampling with IBS for AMD CPUsRobert Richter
This patch adds support for precise event sampling with IBS. There are two counting modes to count either cycles or micro-ops. If the corresponding performance counter events (hw events) are setup with the precise flag set, the request is redirected to the ibs pmu: perf record -a -e cpu-cycles:p ... # use ibs op counting cycle count perf record -a -e r076:p ... # same as -e cpu-cycles:p perf record -a -e r0C1:p ... # use ibs op counting micro-ops Each ibs sample contains a linear address that points to the instruction that was causing the sample to trigger. With ibs we have skid 0. Thus, ibs supports precise levels 1 and 2. Samples are marked with the PERF_EFLAGS_EXACT flag set. In rare cases the rip is invalid when IBS was not able to record the rip correctly. Then the PERF_EFLAGS_EXACT flag is cleared and the rip is taken from pt_regs. V2: * don't drop samples in precise level 2 if rip is invalid, instead support the PERF_EFLAGS_EXACT flag Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120502103309.GP18810@erda.amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-26perf/x86: Fix cmpxchg() usage in amd_put_event_constraints()Robert Richter
Now the return value of cmpxchg() is used to match an event. The change removes the duplicate event comparison and traverses the list until an event was removed. This also fixes the following warning: arch/x86/kernel/cpu/perf_event_amd.c:170: warning: value computed is not used Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1333643084-26776-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-16perf: Adding sysfs group format attribute for pmu deviceJiri Olsa
Adding sysfs group 'format' attribute for pmu device that contains a syntax description on how to construct raw events. The event configuration is described in following struct pefr_event_attr attributes: config config1 config2 Each sysfs attribute within the format attribute group, describes mapping of name and bitfield definition within one of above attributes. eg: "/sys/...<dev>/format/event" contains "config:0-7" "/sys/...<dev>/format/umask" contains "config:8-15" "/sys/...<dev>/format/usr" contains "config:16" the attribute value syntax is: line: config ':' bits config: 'config' | 'config1' | 'config2" bits: bits ',' bit_term | bit_term bit_term: VALUE '-' VALUE | VALUE Adding format attribute definitions for x86 cpu pmus. Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/n/tip-vhdk5y2hyype9j63prymty36@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-05perf: Disable PERF_SAMPLE_BRANCH_* when not supportedStephane Eranian
PERF_SAMPLE_BRANCH_* is disabled for: - SW events (sw counters, tracepoints) - HW breakpoints - ALL but Intel x86 architecture - AMD64 processors Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1328826068-11713-10-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-02perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM disabledJoerg Roedel
It turned out that a performance counter on AMD does not count at all when the GO or HO bit is set in the control register and SVM is disabled in EFER. This patch works around this issue by masking out the HO bit in the performance counter control register when SVM is not enabled. The GO bit is not touched because it is only set when the user wants to count in guest-mode only. So when SVM is disabled the counter should not run at all and the not-counting is the intended behaviour. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Avi Kivity <avi@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Robert Richter <robert.richter@amd.com> Cc: stable@vger.kernel.org # v3.2 Link: http://lkml.kernel.org/r/1330523852-19566-1-git-send-email-joerg.roedel@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Fix event scheduler for constraints with overlapping countersRobert Richter
The current x86 event scheduler fails to resolve scheduling problems of certain combinations of events and constraints. This happens if the counter mask of such an event is not a subset of any other counter mask of a constraint with an equal or higher weight, e.g. constraints of the AMD family 15h pmu: counter mask weight amd_f15_PMC30 0x09 2 <--- overlapping counters amd_f15_PMC20 0x07 3 amd_f15_PMC53 0x38 3 The scheduler does not find then an existing solution. Here is an example: event code counter failure possible solution 0x02E PMC[3,0] 0 3 0x043 PMC[2:0] 1 0 0x045 PMC[2:0] 2 1 0x046 PMC[2:0] FAIL 2 The event scheduler may not select the correct counter in the first cycle because it needs to know which subsequent events will be scheduled. It may fail to schedule the events then. To solve this, we now save the scheduler state of events with overlapping counter counstraints. If we fail to schedule the events we rollback to those states and try to use another free counter. Constraints with overlapping counters are marked with a new introduced overlap flag. We set the overlap flag for such constraints to give the scheduler a hint which events to select for counter rescheduling. The EVENT_CONSTRAINT_OVERLAP() macro can be used for this. Care must be taken as the rescheduling algorithm is O(n!) which will increase scheduling cycles for an over-commited system dramatically. The number of such EVENT_CONSTRAINT_OVERLAP() macros and its counter masks must be kept at a minimum. Thus, the current stack is limited to 2 states to limit the number of loops the algorithm takes in the worst case. On systems with no overlapping-counter constraints, this implementation does not increase the loop count compared to the previous algorithm. V2: * Renamed redo -> overlap. * Reimplementation using perf scheduling helper functions. V3: * Added WARN_ON_ONCE() if out of save states. * Changed function interface of perf_sched_restore_state() to use bool as return value. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10perf, x86: Share IBS macros between perf and oprofileRobert Richter
Moving IBS macros from oprofile to <asm/perf_event.h> to make it available to perf. No additional changes. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1316597423-25723-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-06perf, amd: Use GO/HO bits in perf-ctrJoerg Roedel
The AMD perf-counters support counting in guest or host-mode only. Make use of that feature when user-space specified guest/host-mode only counting. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317816084-18026-3-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-27x86: Perf_event_amd.c needs <asm/apicdef.h>Randy Dunlap
Fix (rare) build error by adding <asm/apicdef.h> header file: arch/x86/kernel/cpu/perf_event_amd.c:350:2: error: 'BAD_APICID' undeclared (first use in this function) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Robert Richter <robert.richter@amd.com> Cc: Andre Przywara <andre.przywara@amd.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/4E820138.90301@xenotime.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-26x86, perf: Clean up perf_event cpu codeKevin Winchester
The CPU support for perf events on x86 was implemented via included C files with #ifdefs. Clean this up by creating a new header file and compiling the vendor-specific files as needed. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314747665-2090-1-git-send-email-kjwinchester@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-14perf, x86: Avoid kfree() in CPU_STARTINGPeter Zijlstra
On -rt kfree() can schedule, but CPU_STARTING is before the CPU is fully up and running. These are contradictory, so avoid it. Instead push the kfree() to CPU_ONLINE where we're free to schedule. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-kwd4j6ayld5thrscvaxgjquv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01perf, arch: Add generic NODE cache eventsPeter Zijlstra
Add a NODE level to the generic cache events which is used to measure local vs remote memory accesses. Like all other cache events, an ACCESS is HIT+MISS, if there is no way to distinguish between reads and writes do reads only etc.. The below needs filling out for !x86 (which I filled out with unsupported events). I'm fairly sure ARM can leave it like that since it doesn't strike me as an architecture that even has NUMA support. SH might have something since it does appear to have some NUMA bits. Sparc64, PowerPC and MIPS certainly want a good look there since they clearly are NUMA capable. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Miller <davem@davemloft.net> Cc: Anton Blanchard <anton@samba.org> Cc: David Daney <ddaney@caviumnetworks.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-29perf, x86: Add new stalled cycles events for Intel and AMD CPUsIngo Molnar
Extend the Intel and AMD event definitions with generic front-end and back-end stall events. ( These are only approximations - suggestions are welcome for better events. ) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-7y40wib8n001io7hjpn1dsrm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-19perf, x86: Fix AMD family 15h FPU event constraintsRobert Richter
Depending on the unit mask settings some FPU events may be scheduled only on cpu counter #3. This patch fixes this. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@googlemail.com> Link: http://lkml.kernel.org/r/1302913676-14352-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-19perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpusAndre Przywara
With AMD cpu family 15h a unit mask was introduced for the Data Cache Miss event (0x041/L1-dcache-load-misses). We need to enable bit 0 (first data cache miss or streaming store to a 64 B cache line) of this mask to proper count data cache misses. Now we set this bit for all families and models. In case a PMU does not implement a unit mask for event 0x041 the bit is ignored. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1302913676-14352-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-16perf, x86: Add support for AMD family 15h core countersRobert Richter
This patch adds support for AMD family 15h core counters. There are major changes compared to family 10h. First, there is a new perfctr msr range for up to 6 counters. Northbridge counters are separate now. This patch only adds support for core counters. Second, certain events may only be scheduled on certain counters. For this we need to extend the event scheduling and constraints. We use cpu feature flags to calculate family 15h msr address offsets. This way we later can implement a faster ALTERNATIVE() version for this. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110215135210.GB5874@erda.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08perf, amd: Remove the nb lockPeter Zijlstra
Since all the hotplug stuff is serialized by the hotplug mutex, do away with the amd_nb_lock. Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10perf, amd: Use kmalloc_node(,__GFP_ZERO) for northbridge structure allocationPeter Zijlstra
Jasper suggested we use the zeroing capability of the allocators instead of calling memset ourselves. Add node affinity while we're at it. Reported-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18perf_events: Fix bogus AMD64 generic TLB eventsStephane Eranian
PERF_COUNT_HW_CACHE_DTLB:READ:MISS had a bogus umask value of 0 which counts nothing. Needed to be 0x7 (to count all possibilities). PERF_COUNT_HW_CACHE_ITLB:READ:MISS had a bogus umask value of 0 which counts nothing. Needed to be 0x3 (to count all possibilities). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: <stable@kernel.org> # as far back as it applies LKML-Reference: <4cb85478.41e9d80a.44e2.3f00@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-03perf, x86: Fix incorrect branches event on AMD CPUsVince Weaver
While doing some performance counter validation tests on some assembly language programs I noticed that the "branches:u" count was very wrong on AMD machines. It looks like the wrong event was selected. Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: <stable@kernel.org> LKML-Reference: <alpine.DEB.2.00.1007011526010.23160@cl320.eecs.utk.edu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02perf, x86: Fix __initconst vs constPeter Zijlstra
All variables that have __initconst should also be const. Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02perf, x86: Fix up the ANY flag stuffPeter Zijlstra
Stephane noticed that the ANY flag was in generic arch code, and Cyrill reported that it broke the P4 code. Solve this by merging x86_pmu::raw_event into x86_pmu::hw_config and provide intel_pmu and amd_pmu specific versions of this callback. The intel_pmu one deals with the ANY flag, the amd_pmu adds the few extra event bits AMD64 has. Reported-by: Stephane Eranian <eranian@google.com> Reported-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Robert Richter <robert.richter@amd.com> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1269968113.5258.442.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02perf, x86: implement ARCH_PERFMON_EVENTSEL bit masksRobert Richter
ARCH_PERFMON_EVENTSEL bit masks are often used in the kernel. This patch adds macros for the bit masks and removes local defines. The function intel_pmu_raw_event() becomes x86_pmu_raw_event() which is generic for x86 models and same also for p6. Duplicate code is removed. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100330092821.GH11907@erda.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02perf, x86: Undo some some *_counter* -> *_event* renamesRobert Richter
The big rename: cdd6c48 perf: Do the big rename: Performance Counters -> Performance Events accidentally renamed some members of stucts that were named after registers in the spec. To avoid confusion this patch reverts some changes. The related specs are MSR descriptions in AMD's BKDGs and the ARCHITECTURAL PERFORMANCE MONITORING section in the Intel 64 and IA-32 Architectures Software Developer's Manuals. This patch does: $ sed -i -e 's:num_events:num_counters:g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event_amd.c \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_p6.c \ arch/x86/kernel/cpu/perf_event_p4.c \ arch/x86/oprofile/op_model_ppro.c $ sed -i -e 's:event_bits:cntval_bits:g' -e 's:event_mask:cntval_mask:g' \ arch/x86/kernel/cpu/perf_event_amd.c \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_p6.c \ arch/x86/kernel/cpu/perf_event_p4.c Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1269880612-25800-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02Merge branch 'perf/urgent' into perf/coreIngo Molnar
Conflicts: arch/x86/kernel/cpu/perf_event.c Merge reason: Resolve the conflict, pick up fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02perf, x86: Fix AMD hotplug & constraint initializationPeter Zijlstra
Commit 3f6da39 ("perf: Rework and fix the arch CPU-hotplug hooks") moved the amd northbridge allocation from CPUS_ONLINE to CPUS_PREPARE_UP however amd_nb_id() doesn't work yet on prepare so it would simply bail basically reverting to a state where we do not properly track node wide constraints - causing weird perf results. Fix up the AMD NorthBridge initialization code by allocating from CPU_UP_PREPARE and installing it from CPU_STARTING once we have the proper nb_id. It also properly deals with the allocation failing. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> [ robustify using amd_has_nb() ] Signed-off-by: Stephane Eranian <eranian@google.com> LKML-Reference: <1269353485.5109.48.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-22x86 / perf: Fix suspend to RAM on HP nx6325Rafael J. Wysocki
Commit 3f6da3905398826d85731247e7fbcf53400c18bd (perf: Rework and fix the arch CPU-hotplug hooks) broke suspend to RAM on my HP nx6325 (and most likely on other AMD-based boxes too) by allowing amd_pmu_cpu_offline() to be executed for CPUs that are going offline as part of the suspend process. The problem is that cpuhw->amd_nb may be NULL already, so the function should make sure it's not NULL before accessing the object pointed to by it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-11perf, x86: Implement initial P4 PMU driverCyrill Gorcunov
The netburst PMU is way different from the "architectural perfomance monitoring" specification that current CPUs use. P4 uses a tuple of ESCR+CCCR+COUNTER MSR registers to handle perfomance monitoring events. A few implementational details: 1) We need a separate x86_pmu::hw_config helper in struct x86_pmu since register bit-fields are quite different from P6, Core and later cpu series. 2) For the same reason is a x86_pmu::schedule_events helper introduced. 3) hw_perf_event::config consists of packed ESCR+CCCR values. It's allowed since in reality both registers only use a half of their size. Of course before making a real write into a particular MSR we need to unpack the value and extend it to a proper size. 4) The tuple of packed ESCR+CCCR in hw_perf_event::config doesn't describe the memory address of ESCR MSR register so that we need to keep a mapping between these tuples used and available ESCR (various P4 events may use same ESCRs but not simultaneously), for this sake every active event has a per-cpu map of hw_perf_event::idx <--> ESCR addresses. 5) Since hw_perf_event::idx is an offset to counter/control register we need to lift X86_PMC_MAX_GENERIC up, otherwise kernel strips it down to 8 registers and event armed may never be turned off (ie the bit in active_mask is set but the loop never reaches this index to check), thanks to Peter Zijlstra Restrictions: - No cascaded counters support (do we ever need them?) - No dependent events support (so PERF_COUNT_HW_INSTRUCTIONS doesn't work for now) - There are events with same counters which can't work simultaneously (need to use intersected ones due to broken counter 1) - No PERF_COUNT_HW_CACHE_ events yet Todo: - Implement dependent events - Need proper hashing for event opcodes (no linear search, good for debugging stage but not in real loads) - Some events counted during a clock cycle -- need to set threshold for them and count every clock cycle just to get summary statistics (ie to behave the same way as other PMUs do) - Need to swicth to use event_constraints - To support RAW events we need to encode a global list of P4 events into p4_templates - Cache events need to be added Event support status matrix: Event status ----------------------------- cycles works cache-references works cache-misses works branch-misses works bus-cycles partially (does not work on 64bit cpu with HT enabled) instruction doesnt work (needs dependent event [mop tagging]) branches doesnt work Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100311165439.GB5129@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-10perf, x86: Use unlocked bitopsPeter Zijlstra
There is no concurrency on these variables, so don't use LOCK'ed ops. As to the intel_pmu_handle_irq() status bit clean, nobody uses that so remove it all together. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com Cc: Arnaldo Carvalho de Melo <acme@infradead.org> LKML-Reference: <20100304140100.240023029@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-10perf: Rework and fix the arch CPU-hotplug hooksPeter Zijlstra
Remove the hw_perf_event_*() hotplug hooks in favour of per PMU hotplug notifiers. This has the advantage of reducing the static weak interface as well as exposing all hotplug actions to the PMU. Use this to fix x86 hotplug usage where we did things in ONLINE which should have been done in UP_PREPARE or STARTING. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mundt <lethal@linux-sh.org> Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com Cc: Arnaldo Carvalho de Melo <acme@infradead.org> LKML-Reference: <20100305154128.736225361@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-26perf_event, amd: Fix spinlock initializationPeter Zijlstra
Avoid kernels from exploding on AMD machines when they have any lock debugging bits enabled. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-26perf_events, x86: Split PMU definitions into separate filesPeter Zijlstra
Split amd,p6,intel into separate files so that we can easily deal with CONFIG_CPU_SUP_* things, needed to make things build now that perf_event.c relies on symbols from amd.c Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>