summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-03-09perf report: Enable TUI in branch view modeStephane Eranian
This patch updates perf report to support TUI mode when the perf.data file contains samples with branch stacks. For each row in the report, it is possible to annotate either the source or target of each branch. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: asharma@fb.com Cc: ravitillo@lbl.gov Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1331246868-19905-5-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09perf report: Auto-detect branch stack sampling modeStephane Eranian
This patch enhances perf report to auto-detect when the perf.data file contains samples with branch stacks. That way it is not necessary to use the -b option. To force branch view mode to off, simply use --no-branch-stack. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: asharma@fb.com Cc: ravitillo@lbl.gov Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1331246868-19905-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09perf record: Add HEADER_BRANCH_STACK tagStephane Eranian
This patch adds a new feature bit, namely, HEADER_BRANCH_STACK. When present, it indicates that sample records may contain branch stack. This could be useful to a viewer to switch to branch mode without having to parse all the samples or without a specific cmdline option. This will be used in a subsequent patch to enhance perf report with branch stacks. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: asharma@fb.com Cc: ravitillo@lbl.gov Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1331246868-19905-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09perf record: Provide default branch stack sampling mode optionStephane Eranian
This patch chanegs the logic of the -b, --branch-stack options of perf record. Based on users' request, the patch provides a default filter mode with the -b (or --branch-any) option. With the option, any type of taken branches is sampled. With -j (or --branch-filter), the user can specify any valid combination of branch types and privilege levels if supported by the underlying hardware. The -b (--branch any) is a shortcut for: --branch-filter any. $ perf record -b foo or: $ perf record --branch-filter any foo For more specific filtering: $ perf record --branch-filter ind_call,u foo Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: asharma@fb.com Cc: ravitillo@lbl.gov Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1331246868-19905-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09perf tools: Make perf able to read files from older ABIsStephane Eranian
This patches provides a way to handle legacy perf.data files. Legacy files are those using the older PERFFILE signature. For those, it is still necessary to detect endianness but without comparing their header->attr_size with the tool's own version as it may be different. Instead, we use a reference table for all known sizes from the legacy era. We try all the combinations for sizes and endianness. If we find a match, we proceed, otherwise we return: "incompatible file format". This is also done for the pipe-mode file format. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: robert.richter@amd.com Cc: ming.m.lin@intel.com Cc: andi@firstfloor.org Cc: asharma@fb.com Cc: ravitillo@lbl.gov Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1328826068-11713-19-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09perf tools: Fix ABI compatibility bug in print_event_desc()Stephane Eranian
This patches cleans up local variable types for msz and ret. They need to be size_t and ssize_t respectively. It also fixes a bug whereby perf would not read attr struct with a different size than what it knows about. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: robert.richter@amd.com Cc: ming.m.lin@intel.com Cc: andi@firstfloor.org Cc: asharma@fb.com Cc: ravitillo@lbl.gov Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1328826068-11713-18-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09perf tools: Enable reading of perf.data files from different ABI revStephane Eranian
This patch allows perf to process perf.data files generated using an ABI that has a different perf_event_attr struct size, i.e., a different ABI version. The perf_event_attr can be extended, yet perf needs to cope with older perf.data files. Similarly, perf must be able to cope with a perf.data file which is using a newer version of the ABI than what it knows about. This patch adds read_attr(), a routine that reads a perf_event_attr struct from a file incrementally based on its advertised size. If the on-file struct is smaller than what perf knows, then the extra fields are zeroed. If the on-file struct is bigger, then perf only uses what it knows about, the rest is skipped. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: robert.richter@amd.com Cc: ming.m.lin@intel.com Cc: andi@firstfloor.org Cc: asharma@fb.com Cc: ravitillo@lbl.gov Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1328826068-11713-17-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09perf: Add ABI reference sizesStephane Eranian
This patch adds reference sizes for revision 1 and 2 of the perf_event ABI, i.e., the size of the perf_event_attr struct. With Rev1: config2 was added = +8 bytes With Rev2: branch_sample_type was added = +8 bytes Adds the definition for Rev1, Rev2. This is useful for tools trying to decode the revision numbers based on the size of the struct. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: robert.richter@amd.com Cc: ming.m.lin@intel.com Cc: andi@firstfloor.org Cc: asharma@fb.com Cc: ravitillo@lbl.gov Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1328826068-11713-16-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09perf report: Add support for taken branch samplingRoberto Agostino Vitillo
This patch adds support for taken branch sampling, i.e, the PERF_SAMPLE_BRANCH_STACK feature to perf report. In other words, to display histograms based on taken branches rather than executed instructions addresses. The new option is called -b and it takes no argument. To generate meaningful output, the perf.data must have been obtained using perf record -b xxx ... where xxx is a branch filter option. The output shows symbols, modules, sorted by 'who branches where' the most often. The percentages reported in the first column refer to the total number of branches captured and not the usual number of samples. Here is a quick example. Here branchy is simple test program which looks as follows: void f2(void) {} void f3(void) {} void f1(unsigned long n) { if (n & 1UL) f2(); else f3(); } int main(void) { unsigned long i; for (i=0; i < N; i++) f1(i); return 0; } Here is the output captured on Nehalem, if we are only interested in user level function calls. $ perf record -b any_call,u -e cycles:u branchy $ perf report -b --sort=symbol 52.34% [.] main [.] f1 24.04% [.] f1 [.] f3 23.60% [.] f1 [.] f2 0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow 0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn 0.01% [k] _IO_vfprintf_internal [k] strchrnul 0.01% [k] __printf [k] _IO_vfprintf_internal 0.01% [k] main [k] __printf About half (52%) of the call branches captured are from main() -> f1(). The second half (24%+23%) is split in two equal shares between f1() -> f2(), f1() ->f3(). The output is as expected given the code. It should be noted, that using -b in perf record does not eliminate information in the perf.data file. Consequently, a typical profile can also be obtained by perf report by simply not using its -b option. It is possible to sort on branch related columns: - dso_from, symbol_from - dso_to, symbol_to - mispredict Signed-off-by: Roberto Agostino Vitillo <ravitillo@lbl.gov> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: robert.richter@amd.com Cc: ming.m.lin@intel.com Cc: andi@firstfloor.org Cc: asharma@fb.com Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1328826068-11713-14-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09perf record: Add support for sampling taken branchRoberto Agostino Vitillo
This patch adds a new option to enable taken branch stack sampling, i.e., leverage the PERF_SAMPLE_BRANCH_STACK feature of perf_events. There is a new option to active this mode: -b. It is possible to pass a set of filters to select the type of branches to sample. The following filters are available: - any : any type of branches - any_call : any function call or system call - any_ret : any function return or system call return - any_ind : any indirect branch - u: only when the branch target is at the user level - k: only when the branch target is in the kernel - hv: only when the branch target is in the hypervisor Filters can be combined by passing a comma separated list to the option: $ perf record -b any_call,u -e cycles:u branchy Signed-off-by: Roberto Agostino Vitillo <ravitillo@lbl.gov> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: robert.richter@amd.com Cc: ming.m.lin@intel.com Cc: andi@firstfloor.org Cc: asharma@fb.com Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1328826068-11713-13-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09perf tools: Add code to support PERF_SAMPLE_BRANCH_STACKRoberto Agostino Vitillo
This patch adds: - ability to parse samples with PERF_SAMPLE_BRANCH_STACK - sort on branches (dso_from, symbol_from, dso_to, symbol_to, mispredict) - build histograms on branches Signed-off-by: Roberto Agostino Vitillo <ravitillo@lbl.gov> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: robert.richter@amd.com Cc: ming.m.lin@intel.com Cc: andi@firstfloor.org Cc: asharma@fb.com Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1328826068-11713-12-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09sh-sci / PM: Avoid deadlocking runtime PMRafael J. Wysocki
The runtime PM of sh-sci devices is enabled when sci_probe() returns, so the pm_runtime_put_sync() executed by driver_probe_device() attempts to suspend the device. Then, in some situations, a diagnostic message is printed to the console by one of the runtime suspend routines handling the sh-sci device, which causes synchronous runtime resume to be started from the device's own runtime suspend callback. This causes rpm_resume() to be run eventually, which sees the RPM_SUSPENDING status set by rpm_suspend() and waits for it to change. However, the device's runtime PM status cannot change at that point, because the routine that has set it waits for the rpm_suspend() to return. A deadlock occurs as a result. To avoid that make sci_init_single() increment the device's runtime PM usage counter, so that it cannot be suspended by driver_probe_device(). That counter has to be decremented eventually, so make sci_startup() do that before starting to actually use the device and make sci_shutdown() increment it again before returning to balance the incrementation carried out by sci_startup(). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-03-09powerpc: Rework lazy-interrupt handlingBenjamin Herrenschmidt
The current implementation of lazy interrupts handling has some issues that this tries to address. We don't do the various workarounds we need to do when re-enabling interrupts in some cases such as when returning from an interrupt and thus we may still lose or get delayed decrementer or doorbell interrupts. The current scheme also makes it much harder to handle the external "edge" interrupts provided by some BookE processors when using the EPR facility (External Proxy) and the Freescale Hypervisor. Additionally, we tend to keep interrupts hard disabled in a number of cases, such as decrementer interrupts, external interrupts, or when a masked decrementer interrupt is pending. This is sub-optimal. This is an attempt at fixing it all in one go by reworking the way we do the lazy interrupt disabling from the ground up. The base idea is to replace the "hard_enabled" field with a "irq_happened" field in which we store a bit mask of what interrupt occurred while soft-disabled. When re-enabling, either via arch_local_irq_restore() or when returning from an interrupt, we can now decide what to do by testing bits in that field. We then implement replaying of the missed interrupts either by re-using the existing exception frame (in exception exit case) or via the creation of a new one from an assembly trampoline (in the arch_local_irq_enable case). This removes the need to play with the decrementer to try to create fake interrupts, among others. In addition, this adds a few refinements: - We no longer hard disable decrementer interrupts that occur while soft-disabled. We now simply bump the decrementer back to max (on BookS) or leave it stopped (on BookE) and continue with hard interrupts enabled, which means that we'll potentially get better sample quality from performance monitor interrupts. - Timer, decrementer and doorbell interrupts now hard-enable shortly after removing the source of the interrupt, which means they no longer run entirely hard disabled. Again, this will improve perf sample quality. - On Book3E 64-bit, we now make the performance monitor interrupt act as an NMI like Book3S (the necessary C code for that to work appear to already be present in the FSL perf code, notably calling nmi_enter instead of irq_enter). (This also fixes a bug where BookE perfmon interrupts could clobber r14 ... oops) - We could make "masked" decrementer interrupts act as NMIs when doing timer-based perf sampling to improve the sample quality. Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: - Add hard-enable to decrementer, timer and doorbells - Fix CR clobber in masked irq handling on BookE - Make embedded perf interrupt act as an NMI - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want to retrigger an interrupt without preventing hard-enable v3: - Fix or vs. ori bug on Book3E - Fix enabling of interrupts for some exceptions on Book3E v4: - Fix resend of doorbells on return from interrupt on Book3E v5: - Rebased on top of my latest series, which involves some significant rework of some aspects of the patch. v6: - 32-bit compile fix - more compile fixes with various .config combos - factor out the asm code to soft-disable interrupts - remove the C wrapper around preempt_schedule_irq v7: - Fix a bug with hard irq state tracking on native power7
2012-03-08vfs: use 'unsigned long' accesses for dcache name comparison and hashingLinus Torvalds
Ok, this is hacky, and only works on little-endian machines with goo unaligned handling. And even then only with CONFIG_DEBUG_PAGEALLOC disabled, since it can access up to 7 bytes after the pathname. But it runs like a bat out of hell. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-08Merge tag 'fixes-urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull last minute fixes from Olof Johansson: "One samsung build fix due to a mis-applied patch, and a small set of OMAP fixes. This should be the last from arm-soc for 3.3, hopefully." * tag 'fixes-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: S3C2440: Fixed build error for s3c244x ARM: OMAP2+: Fix module build errors with CONFIG_OMAP4_ERRATA_I688 ARM: OMAP: id: Add missing break statement in omap3xxx_check_revision ARM: OMAP2+: Remove apply_uV constraints for fixed regulator ARM: OMAP: irqs: Fix NR_IRQS value to handle PRCM interrupts
2012-03-08Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "Another small, clear fix in a specific driver." * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: tps65910: Configure correct value for VDDCTRL vout reg
2012-03-08Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds
Pull minor devicetree bug fixes and documentation updates from Grant Likely: "Fixes up a duplicate #include, adds an empty implementation of of_find_compatible_node() and make git ignore .dtb files. And fix up bus name on OF described PHYs. Nothing exciting here." * tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6: doc: dt: Fix broken reference in gpio-leds documentation of/mdio: fix fixed link bus name of/fdt.c: asm/setup.h included twice of: add picochip vendor prefix dt: add empty of_find_compatible_node function ARM: devicetree: Add .dtb files to arch/arm/boot/.gitignore
2012-03-08Merge tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds
Pull SPI section mismatch bug fix for v3.3-rc3 from Grant Likely: "Minor fix for pl022_dma_probe() function which was put in the wrong section." * tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6: Fix section mismatch in spi-pl022.c
2012-03-08Merge tag 'hwmon-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull four hwmon patches from Guenter Roeck * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (jc42) Add support for AT30TS00, TS3000GB2, TSE2002GB2, and MCP9804 hwmon: (zl6100) Maintain delay parameter in driver instance data hwmon: (pmbus_core) Fix maximum number of POUT alarm attributes hwmon: (jc42) Add support for ST Microelectronics STTS2002 and STTS3000
2012-03-08Merge tag 'dm-3.3-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull device-mapper fixes for 3.3 from Alasdair Kergon Eight small device-mapper bug fixes. * tag 'dm-3.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: dm raid: fix flush support dm raid: set MD_CHANGE_DEVS when rebuilding dm thin metadata: decrement counter after removing mapped block dm thin metadata: unlock superblock in init_pmd error path dm thin metadata: remove incorrect close_device on creation error paths dm flakey: fix crash on read when corrupt_bio_byte not set dm io: fix discard support dm ioctl: do not leak argv if target message only contains whitespace
2012-03-09powerpc/eeh: pseries platform config space access in EEHGavin Shan
With the original EEH implementation, the access to config space of the corresponding PCI device is done by RTAS sensitive function. That depends on pci_dn heavily. That would limit EEH extension to other platforms like powernv because other platforms might have different ways to access PCI config space. The patch splits those functions used to access PCI config space and implement them in platform related EEH component. It would be helpful to support EEH on multiple platforms simutaneously in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: Introduce struct eeh_stats for EEHGavin Shan
With the original EEH implementation, the EEH global statistics are maintained by individual global variables. That makes the code a little hard to maintain. The patch introduces extra struct eeh_stats for the EEH global statistics so that it can be maintained in collective fashion. It's the rework on the corresponding v5 patch. According to the comments from David Laight, the EEH global statistics have been changed for a litte bit so that they have fixed-type of "u64". Also, the format used to print them has been changed to "%llu" based on David's suggestion. Also, the output format of EEH global statistics should be kept as intacted according to Michael's suggestion that there might be tools parsing them. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: Replace pci_dn with eeh_dev for EEH on pSeriesGavin Shan
The pci_dn has been replaced with eeh_dev. In order to comply with the rule, the EEH platform implementation on pSeries should also be adjusted for a little bit so that it will depend on eeh_dev instead of pci_dn. The patch replaces pci_dn with eeh_dev. The corresponding information will be retrieved from eeh_dev instead of pci_dn. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: Replace pci_dn with eeh_dev for EEH aux componentsGavin Shan
The original EEH implementation is heavily depending on struct pci_dn. We have to put EEH related information to pci_dn. Actually, we could split struct pci_dn so that the EEH sensitive information to form an individual struct, then EEH looks more independent. The patch replaces pci_dn with eeh_dev for EEH aux components like event and driver. Also, the eeh_event struct has been adjusted for a little bit since eeh_dev has linked the associated FDT (Flat Device Tree) node and PCI device. It's not necessary for eeh_event struct to trace FDT node and PCI device. We can just simply to trace eeh_dev in eeh_event. The patch also renames function pcid_name() to eeh_pcid_name(), which should be missed in the previous patch where the EEH aux components have been cleaned up. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: Replace pci_dn with eeh_dev for EEH coreGavin Shan
The original EEH implementation is heavily depending on struct pci_dn. We have to put EEH related information to pci_dn. Actually, we could split struct pci_dn so that the EEH sensitive information to form an individual struct, then EEH looks more independent. The patch replaces pci_dn with eeh_dev for EEH core. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: Replace pci_dn with eeh_dev for EEH address cacheGavin Shan
With original EEH implementation, struct pci_dn is used while building PCI I/O address cache, which helps on searching the corresponding PCI device according to the given physical I/O address. Besides, pci_dn is associated with the corresponding PCI device while building its I/O cache. The patch replaces struct pci_dn with struct eeh_dev so that EEH address cache won't depend on struct pci_dn. That will help EEH to become an independent module in future. Besides, the binding of eeh_dev and PCI device is done while building PCI device I/O cache. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: Replace pci_dn with eeh_dev for EEH sysfsGavin Shan
With original EEH implementation, all EEH related statistics have been put into struct pci_dn. We've introduced struct eeh_dev to replace struct pci_dn in EEH core components, including EEH sysfs component. The patch shows EEH statistics from struct eeh_dev instead of struct pci_dn in EEH sysfs component. Besides, it also fixed the EEH device retrieval from PCI device, which was introduced by the previous patch in the series of patch. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: Introduce EEH deviceGavin Shan
Original EEH implementation depends on struct pci_dn heavily. However, EEH shouldn't depend on that actually because EEH needn't share much information with other PCI components. That's to say, EEH should have worked independently. The patch introduces struct eeh_dev so that EEH core components needn't be working based on struct pci_dn in future. Also, struct pci_dn, struct eeh_dev instances are created in dynamic fasion and the binding with EEH device, OF node, PCI device is implemented as well. The EEH devices are created after PHBs are detected and initialized, but PCI emunation hasn't started yet. Apart from that, PHB might be created dynamically through DLPAR component and the EEH devices should be creatd as well. Another case might be OF node is created dynamically by DR (Dynamic Reconfiguration), which has been defined by PAPR. For those OF nodes created by DR, EEH devices should be also created accordingly. The binding between EEH device and OF node is done while the EEH device is initially created. The binding between EEH device and PCI device should be done after PCI emunation is done. Besides, PCI hotplug also needs the binding so that the EEH devices could be traced from the newly coming PCI buses or PCI devices. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: Cleanup function names in EEH aux componentsGavin Shan
The patch does some cleanup on the function names of EEH aux components. Currently, only couple of function names from eeh_cache have been adjusted so that: * The function name has prefix "eeh_addr_cache". * Move around pci_addr_cache_build() in the header file to reflect function call sequence. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/pseries: Cleanup comments in EEH aux componentsGavin Shan
There're several EEH aux components and the patch does some cleanup for them so that they look more clean. * Duplicated comments have been removed from the header file. * Comments have been reorganized so that it looks more clean. * The leading comments of functions are adjusted for a little bit so that the result of "make pdfdocs" would be more unified. * Function calls "xxx ()" has been replaced by "xxx()". Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: pseries platform EEH configure bridgeGavin Shan
In order to enable particular PCI device, which has been included in the parent PE. The involved PCI bridges should be enabled explicitly if there has. On pSeries platform, there're dedicated RTAS calls to fulfil the purpose. The patch implements the function of configuring PCI bridges through the dedicated RTAS calls. Besides, the function has been abstracted by struct eeh_ops::configure_bridge so that the EEH core components could support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: pseries platform EEH error log retrievalGavin Shan
On RTAS compliant pSeries platform, one dedicated RTAS call has been introduced to retrieve EEH temporary or permanent error log. The patch implements the function of retriving EEH error log through RTAS call. Besides, it has been abstracted by struct eeh_ops::get_log so that EEH core components could support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: pseries platform EEH reset PEGavin Shan
On RTAS compliant pSeries platform, there is a dedicated RTAS call (ibm,set-slot-reset) to reset the specified PE. Furthermore, two types of resets are supported: hot and fundamental. the type of reset is to be used actually depends on the included PCI device's requirements. The patch implements resetting PE on pSeries platform through RTAS call. Besides, it has been abstracted through struct eeh_ops::reset so that EEH core components could support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: pseries platform EEH wait PE stateGavin Shan
On pSeries platform, the PE state might be temporarily unavailable. In that case, the firmware will return the corresponding wait time. That means the kernel has to wait for appropriate time in order to get the PE state. The patch does the implementation for that. Besides, the function has been abstracted through struct eeh_ops::wait_state so that EEH core components could support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: pseries platform PE state retrievalGavin Shan
On pSeries platform, there're 2 dedicated RTAS calls introduced to retrieve the corresponding PE's state: ibm,read-slot-reset-state and ibm,read-slot-reset-state2. The patch implements the retrieval of PE's state according to the given PE address. Besides, the implementation has been abstracted by struct eeh_ops::get_state so that EEH core components could support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: pseries platform EEH PE address retrievalGavin Shan
There're 2 types of addresses used for EEH operations. The first one would be BDF (Bus/Device/Function) address which is retrieved from the reg property of the corresponding FDT node. Another one is PE address that should be enquired from firmware through RTAS call on pSeries platform. When issuing EEH operation, the PE address has precedence over BDF address. The patch implements retrieving PE address according to the given BDF address on pSeries platform. Also, the struct eeh_early_enable_info has been removed since the information can be figured out from dn->pdn->phb->buid directly and that simplifies the code. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: pseries platform EEH operationsGavin Shan
There're 4 EEH operations that are covered by the dedicated RTAS call <ibm,set-eeh-option>: enable or disable EEH, enable MMIO and enable DMA. At early stage of system boot, the EEH would be tried to enable on PCI device related device node. MMIO and DMA for particular PE should be enabled when doing recovery on EEH errors so that the PE could function properly again. The patch implements it and abstract that through struct eeh_ops::set_eeh. It would be help for EEH to support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: pseries platform EEH initializationGavin Shan
The platform specific EEH operations have been abstracted by struct eeh_ops. The individual platroms, including pSeries, needs doing necessary initialization before the platform dependent EEH operations work properly. The patch is addressing that and do necessary platform initialization for pSeries platform. More specificly, it will figure out the tokens of EEH related RTAS calls. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: Platform dependent EEH operationsGavin Shan
EEH has been implemented on RTAS-compliant pSeries platform. That's to say, the EEH operations will be implemented through RTAS calls eventually. The situation limited feasible extension on EEH. In order to support EEH on multiple platforms like pseries and powernv simutaneously. We have to split the platform dependent EEH options up out of current implementation. The patch addresses supporting EEH on multiple platforms. The pseries platform dependent EEH operations will be abstracted by struct eeh_ops. EEH core components will be built based on the registered EEH operations. With the mechanism, what the individual platform needs to do is implement platform dependent EEH operations. For now, the pseries platform is covered under the mechanism. That means we have to think about other platforms to support EEH, like powernv. Besides, we only have framework for the mechanism and we have to implement it for pseries platform later. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: Cleanup function names in the EEH coreGavin Shan
The EEH has been implemented on pSeries platform. The original code looks a little bit nasty. The patch does cleanup on the current EEH implementation so that it looks more clean. * Try adding prefix "eeh" for functions. * Some function names have been adjusted so that they looks shorter and meaningful. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/eeh: Cleanup comments in the EEH coreGavin Shan
The EEH has been implemented on pSeries platform. The original code looks a little bit nasty. The patch does cleanup on the current EEH implementation so that it looks more clean. * Duplicated comments have been removed from the corresponding header files. * Comments have been reorganized so that it looks more clean. * The leading comments of functions are adjusted for a little bit so that the result of "make pdfdocs" would be more unified. * Function definitions and calls have unified format as "xxx()". That means the format "xxx ()" has been replaced by "xxx()". * There're multiple functions implemented for resetting PE. The position of those functions have been move around so that they are adjacent to each other to reflect their relationship. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc: Replace mfmsr instructions with load from PACA kernel_msr fieldBenjamin Herrenschmidt
On 64-bit, the mfmsr instruction can be quite slow, slower than loading a field from the cache-hot PACA, which happens to already contain the value we want in most cases. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc: Fix 64-bit BookE FP unavailable exceptionsBenjamin Herrenschmidt
We were using CR0.EQ after EXCEPTION_COMMON, hoping it still contained whether we came from userspace or kernel space. However, under some circumstances, EXCEPTION_COMMON will call C code and clobber non-volatile registers, so we really need to re-load the previous MSR from the stackframe and re-test. While there, invert the condition to make the fast path more obvious and remove the BUG_OPCODE which was a debugging leftover and call .ret_from_except as we should. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc: Fix register clobbering when accumulating stolen timeBenjamin Herrenschmidt
When running under a hypervisor that supports stolen time accounting, we may call C code from the macro EXCEPTION_PROLOG_COMMON in the exception entry path, which clobbers CR0. However, the FPU and vector traps rely on CR0 indicating whether we are coming from userspace or kernel to decide what to do. So we need to restore that value after the C call Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc/xmon: Add display of soft & hard irq statesBenjamin Herrenschmidt
Also use local_paca instead of get_paca() to avoid getting into the smp_processor_id() debugging code from the debugger Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc: Add support for page fault retry and fatal signalsBenjamin Herrenschmidt
Other architectures such as x86 and ARM have been growing new support for features like retrying page faults after dropping the mm semaphore to break contention, or being able to return from a stuck page fault when a SIGKILL is pending. This refactors our implementation of do_page_fault() to move the error handling out of line in a way similar to x86 and adds support for those two features. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc: Disable interrupts in 64-bit kernel FP and vector faultsBenjamin Herrenschmidt
If we get a floating point, altivec or vsx unavaible interrupt in kernel, we trigger a kernel error. There is no point preserving the interrupt state, in fact, that can even make debugging harder as the processor state might change (we may even preempt) between taking the exception and landing in a debugger. So just make those 3 disable interrupts unconditionally. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: On BookE only disable when hitting the kernel unavailable path, otherwise it will fail to restore softe as fast_exception_return doesn't do it.
2012-03-09powerpc: Call do_page_fault() with interrupts offBenjamin Herrenschmidt
We currently turn interrupts back to their previous state before calling do_page_fault(). This can be annoying when debugging as a bad fault will potentially have lost some processor state before getting into the debugger. We also end up calling some generic code with interrupts enabled such as notify_page_fault() with interrupts enabled, which could be unexpected. This changes our code to behave more like other architectures, and make the assembly entry code call into do_page_faults() with interrupts disabled. They are conditionally re-enabled from within do_page_fault() in the same spot x86 does it. While there, add the might_sleep() test in the case of a successful trylock of the mmap semaphore, again like x86. Also fix a bug in the existing assembly where r12 (_MSR) could get clobbered by C calls (the DTL accounting in the exception common macro and DISABLE_INTS) in some cases. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2. Add the r12 clobber fix
2012-03-09powerpc: Improve behaviour of irq tracing on 64-bit exception entryBenjamin Herrenschmidt
Some exceptions would unconditionally disable interrupts on entry, which is fine, but calling lockdep every time not only adds more overhead than strictly needed, but also means we get quite a few "redudant" disable logged, which makes it hard to spot the really bad ones. So instead, split the macro used by the exception code into a normal one and a separate one used when CONFIG_TRACE_IRQFLAGS is enabled, and make the later skip th tracing if interrupts were already disabled. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc: Improve 64-bit syscall entry/exitBenjamin Herrenschmidt
We unconditionally hard enable interrupts. This is unnecessary as syscalls are expected to always be called with interrupts enabled. While at it, we add a WARN_ON if that is not the case and CONFIG_TRACE_IRQFLAGS is enabled (we don't want to add overhead to the fast path when this is not set though). Thus let's remove the enabling (and associated irq tracing) from the syscall entry path. Also on Book3S, replace a few mfmsr instructions with loads of PACAMSR from the PACA, which should be faster & schedule better. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>