summaryrefslogtreecommitdiff
path: root/Documentation/trace/coresight
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/trace/coresight')
-rw-r--r--Documentation/trace/coresight/coresight-cpu-debug.rst192
-rw-r--r--Documentation/trace/coresight/coresight-etm4x-reference.rst798
-rw-r--r--Documentation/trace/coresight/coresight.rst498
-rw-r--r--Documentation/trace/coresight/index.rst9
4 files changed, 1497 insertions, 0 deletions
diff --git a/Documentation/trace/coresight/coresight-cpu-debug.rst b/Documentation/trace/coresight/coresight-cpu-debug.rst
new file mode 100644
index 000000000000..993dd294b81b
--- /dev/null
+++ b/Documentation/trace/coresight/coresight-cpu-debug.rst
@@ -0,0 +1,192 @@
+==========================
+Coresight CPU Debug Module
+==========================
+
+ :Author: Leo Yan <leo.yan@linaro.org>
+ :Date: April 5th, 2017
+
+Introduction
+------------
+
+Coresight CPU debug module is defined in ARMv8-a architecture reference manual
+(ARM DDI 0487A.k) Chapter 'Part H: External debug', the CPU can integrate
+debug module and it is mainly used for two modes: self-hosted debug and
+external debug. Usually the external debug mode is well known as the external
+debugger connects with SoC from JTAG port; on the other hand the program can
+explore debugging method which rely on self-hosted debug mode, this document
+is to focus on this part.
+
+The debug module provides sample-based profiling extension, which can be used
+to sample CPU program counter, secure state and exception level, etc; usually
+every CPU has one dedicated debug module to be connected. Based on self-hosted
+debug mechanism, Linux kernel can access these related registers from mmio
+region when the kernel panic happens. The callback notifier for kernel panic
+will dump related registers for every CPU; finally this is good for assistant
+analysis for panic.
+
+
+Implementation
+--------------
+
+- During driver registration, it uses EDDEVID and EDDEVID1 - two device ID
+ registers to decide if sample-based profiling is implemented or not. On some
+ platforms this hardware feature is fully or partially implemented; and if
+ this feature is not supported then registration will fail.
+
+- At the time this documentation was written, the debug driver mainly relies on
+ information gathered by the kernel panic callback notifier from three
+ sampling registers: EDPCSR, EDVIDSR and EDCIDSR: from EDPCSR we can get
+ program counter; EDVIDSR has information for secure state, exception level,
+ bit width, etc; EDCIDSR is context ID value which contains the sampled value
+ of CONTEXTIDR_EL1.
+
+- The driver supports a CPU running in either AArch64 or AArch32 mode. The
+ registers naming convention is a bit different between them, AArch64 uses
+ 'ED' for register prefix (ARM DDI 0487A.k, chapter H9.1) and AArch32 uses
+ 'DBG' as prefix (ARM DDI 0487A.k, chapter G5.1). The driver is unified to
+ use AArch64 naming convention.
+
+- ARMv8-a (ARM DDI 0487A.k) and ARMv7-a (ARM DDI 0406C.b) have different
+ register bits definition. So the driver consolidates two difference:
+
+ If PCSROffset=0b0000, on ARMv8-a the feature of EDPCSR is not implemented;
+ but ARMv7-a defines "PCSR samples are offset by a value that depends on the
+ instruction set state". For ARMv7-a, the driver checks furthermore if CPU
+ runs with ARM or thumb instruction set and calibrate PCSR value, the
+ detailed description for offset is in ARMv7-a ARM (ARM DDI 0406C.b) chapter
+ C11.11.34 "DBGPCSR, Program Counter Sampling Register".
+
+ If PCSROffset=0b0010, ARMv8-a defines "EDPCSR implemented, and samples have
+ no offset applied and do not sample the instruction set state in AArch32
+ state". So on ARMv8 if EDDEVID1.PCSROffset is 0b0010 and the CPU operates
+ in AArch32 state, EDPCSR is not sampled; when the CPU operates in AArch64
+ state EDPCSR is sampled and no offset are applied.
+
+
+Clock and power domain
+----------------------
+
+Before accessing debug registers, we should ensure the clock and power domain
+have been enabled properly. In ARMv8-a ARM (ARM DDI 0487A.k) chapter 'H9.1
+Debug registers', the debug registers are spread into two domains: the debug
+domain and the CPU domain.
+::
+
+ +---------------+
+ | |
+ | |
+ +----------+--+ |
+ dbg_clock -->| |**| |<-- cpu_clock
+ | Debug |**| CPU |
+ dbg_power_domain -->| |**| |<-- cpu_power_domain
+ +----------+--+ |
+ | |
+ | |
+ +---------------+
+
+For debug domain, the user uses DT binding "clocks" and "power-domains" to
+specify the corresponding clock source and power supply for the debug logic.
+The driver calls the pm_runtime_{put|get} operations as needed to handle the
+debug power domain.
+
+For CPU domain, the different SoC designs have different power management
+schemes and finally this heavily impacts external debug module. So we can
+divide into below cases:
+
+- On systems with a sane power controller which can behave correctly with
+ respect to CPU power domain, the CPU power domain can be controlled by
+ register EDPRCR in driver. The driver firstly writes bit EDPRCR.COREPURQ
+ to power up the CPU, and then writes bit EDPRCR.CORENPDRQ for emulation
+ of CPU power down. As result, this can ensure the CPU power domain is
+ powered on properly during the period when access debug related registers;
+
+- Some designs will power down an entire cluster if all CPUs on the cluster
+ are powered down - including the parts of the debug registers that should
+ remain powered in the debug power domain. The bits in EDPRCR are not
+ respected in these cases, so these designs do not support debug over
+ power down in the way that the CoreSight / Debug designers anticipated.
+ This means that even checking EDPRSR has the potential to cause a bus hang
+ if the target register is unpowered.
+
+ In this case, accessing to the debug registers while they are not powered
+ is a recipe for disaster; so we need preventing CPU low power states at boot
+ time or when user enable module at the run time. Please see chapter
+ "How to use the module" for detailed usage info for this.
+
+
+Device Tree Bindings
+--------------------
+
+See Documentation/devicetree/bindings/arm/coresight-cpu-debug.txt for details.
+
+
+How to use the module
+---------------------
+
+If you want to enable debugging functionality at boot time, you can add
+"coresight_cpu_debug.enable=1" to the kernel command line parameter.
+
+The driver also can work as module, so can enable the debugging when insmod
+module::
+
+ # insmod coresight_cpu_debug.ko debug=1
+
+When boot time or insmod module you have not enabled the debugging, the driver
+uses the debugfs file system to provide a knob to dynamically enable or disable
+debugging:
+
+To enable it, write a '1' into /sys/kernel/debug/coresight_cpu_debug/enable::
+
+ # echo 1 > /sys/kernel/debug/coresight_cpu_debug/enable
+
+To disable it, write a '0' into /sys/kernel/debug/coresight_cpu_debug/enable::
+
+ # echo 0 > /sys/kernel/debug/coresight_cpu_debug/enable
+
+As explained in chapter "Clock and power domain", if you are working on one
+platform which has idle states to power off debug logic and the power
+controller cannot work well for the request from EDPRCR, then you should
+firstly constraint CPU idle states before enable CPU debugging feature; so can
+ensure the accessing to debug logic.
+
+If you want to limit idle states at boot time, you can use "nohlt" or
+"cpuidle.off=1" in the kernel command line.
+
+At the runtime you can disable idle states with below methods:
+
+It is possible to disable CPU idle states by way of the PM QoS
+subsystem, more specifically by using the "/dev/cpu_dma_latency"
+interface (see Documentation/power/pm_qos_interface.rst for more
+details). As specified in the PM QoS documentation the requested
+parameter will stay in effect until the file descriptor is released.
+For example::
+
+ # exec 3<> /dev/cpu_dma_latency; echo 0 >&3
+ ...
+ Do some work...
+ ...
+ # exec 3<>-
+
+The same can also be done from an application program.
+
+Disable specific CPU's specific idle state from cpuidle sysfs (see
+Documentation/admin-guide/pm/cpuidle.rst)::
+
+ # echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable
+
+Output format
+-------------
+
+Here is an example of the debugging output format::
+
+ ARM external debug module:
+ coresight-cpu-debug 850000.debug: CPU[0]:
+ coresight-cpu-debug 850000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock)
+ coresight-cpu-debug 850000.debug: EDPCSR: handle_IPI+0x174/0x1d8
+ coresight-cpu-debug 850000.debug: EDCIDSR: 00000000
+ coresight-cpu-debug 850000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0)
+ coresight-cpu-debug 852000.debug: CPU[1]:
+ coresight-cpu-debug 852000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock)
+ coresight-cpu-debug 852000.debug: EDPCSR: debug_notifier_call+0x23c/0x358
+ coresight-cpu-debug 852000.debug: EDCIDSR: 00000000
+ coresight-cpu-debug 852000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0)
diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst
new file mode 100644
index 000000000000..b64d9a9c79df
--- /dev/null
+++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst
@@ -0,0 +1,798 @@
+===============================================
+ETMv4 sysfs linux driver programming reference.
+===============================================
+
+ :Author: Mike Leach <mike.leach@linaro.org>
+ :Date: October 11th, 2019
+
+Supplement to existing ETMv4 driver documentation.
+
+Sysfs files and directories
+---------------------------
+
+Root: ``/sys/bus/coresight/devices/etm<N>``
+
+
+The following paragraphs explain the association between sysfs files and the
+ETMv4 registers that they effect. Note the register names are given without
+the ‘TRC’ prefix.
+
+----
+
+:File: ``mode`` (rw)
+:Trace Registers: {CONFIGR + others}
+:Notes:
+ Bit select trace features. See ‘mode’ section below. Bits
+ in this will cause equivalent programming of trace config and
+ other registers to enable the features requested.
+
+:Syntax & eg:
+ ``echo bitfield > mode``
+
+ bitfield up to 32 bits setting trace features.
+
+:Example:
+ ``$> echo 0x012 > mode``
+
+----
+
+:File: ``reset`` (wo)
+:Trace Registers: All
+:Notes:
+ Reset all programming to trace nothing / no logic programmed.
+
+:Syntax:
+ ``echo 1 > reset``
+
+----
+
+:File: ``enable_source`` (wo)
+:Trace Registers: PRGCTLR, All hardware regs.
+:Notes:
+ - > 0 : Programs up the hardware with the current values held in the driver
+ and enables trace.
+
+ - = 0 : disable trace hardware.
+
+:Syntax:
+ ``echo 1 > enable_source``
+
+----
+
+:File: ``cpu`` (ro)
+:Trace Registers: None.
+:Notes:
+ CPU ID that this ETM is attached to.
+
+:Example:
+ ``$> cat cpu``
+
+ ``$> 0``
+
+----
+
+:File: ``addr_idx`` (rw)
+:Trace Registers: None.
+:Notes:
+ Virtual register to index address comparator and range
+ features. Set index for first of the pair in a range.
+
+:Syntax:
+ ``echo idx > addr_idx``
+
+ Where idx < nr_addr_cmp x 2
+
+----
+
+:File: ``addr_range`` (rw)
+:Trace Registers: ACVR[idx, idx+1], VIIECTLR
+:Notes:
+ Pair of addresses for a range selected by addr_idx. Include
+ / exclude according to the optional parameter, or if omitted
+ uses the current ‘mode’ setting. Select comparator range in
+ control register. Error if index is odd value.
+
+:Depends: ``mode, addr_idx``
+:Syntax:
+ ``echo addr1 addr2 [exclude] > addr_range``
+
+ Where addr1 and addr2 define the range and addr1 < addr2.
+
+ Optional exclude value:-
+
+ - 0 for include
+ - 1 for exclude.
+:Example:
+ ``$> echo 0x0000 0x2000 0 > addr_range``
+
+----
+
+:File: ``addr_single`` (rw)
+:Trace Registers: ACVR[idx]
+:Notes:
+ Set a single address comparator according to addr_idx. This
+ is used if the address comparator is used as part of event
+ generation logic etc.
+
+:Depends: ``addr_idx``
+:Syntax:
+ ``echo addr1 > addr_single``
+
+----
+
+:File: ``addr_start`` (rw)
+:Trace Registers: ACVR[idx], VISSCTLR
+:Notes:
+ Set a trace start address comparator according to addr_idx.
+ Select comparator in control register.
+
+:Depends: ``addr_idx``
+:Syntax:
+ ``echo addr1 > addr_start``
+
+----
+
+:File: ``addr_stop`` (rw)
+:Trace Registers: ACVR[idx], VISSCTLR
+:Notes:
+ Set a trace stop address comparator according to addr_idx.
+ Select comparator in control register.
+
+:Depends: ``addr_idx``
+:Syntax:
+ ``echo addr1 > addr_stop``
+
+----
+
+:File: ``addr_context`` (rw)
+:Trace Registers: ACATR[idx,{6:4}]
+:Notes:
+ Link context ID comparator to address comparator addr_idx
+
+:Depends: ``addr_idx``
+:Syntax:
+ ``echo ctxt_idx > addr_context``
+
+ Where ctxt_idx is the index of the linked context id / vmid
+ comparator.
+
+----
+
+:File: ``addr_ctxtype`` (rw)
+:Trace Registers: ACATR[idx,{3:2}]
+:Notes:
+ Input value string. Set type for linked context ID comparator
+
+:Depends: ``addr_idx``
+:Syntax:
+ ``echo type > addr_ctxtype``
+
+ Type one of {all, vmid, ctxid, none}
+:Example:
+ ``$> echo ctxid > addr_ctxtype``
+
+----
+
+:File: ``addr_exlevel_s_ns`` (rw)
+:Trace Registers: ACATR[idx,{14:8}]
+:Notes:
+ Set the ELx secure and non-secure matching bits for the
+ selected address comparator
+
+:Depends: ``addr_idx``
+:Syntax:
+ ``echo val > addr_exlevel_s_ns``
+
+ val is a 7 bit value for exception levels to exclude. Input
+ value shifted to correct bits in register.
+:Example:
+ ``$> echo 0x4F > addr_exlevel_s_ns``
+
+----
+
+:File: ``addr_instdatatype`` (rw)
+:Trace Registers: ACATR[idx,{1:0}]
+:Notes:
+ Set the comparator address type for matching. Driver only
+ supports setting instruction address type.
+
+:Depends: ``addr_idx``
+
+----
+
+:File: ``addr_cmp_view`` (ro)
+:Trace Registers: ACVR[idx, idx+1], ACATR[idx], VIIECTLR
+:Notes:
+ Read the currently selected address comparator. If part of
+ address range then display both addresses.
+
+:Depends: ``addr_idx``
+:Syntax:
+ ``cat addr_cmp_view``
+:Example:
+ ``$> cat addr_cmp_view``
+
+ ``addr_cmp[0] range 0x0 0xffffffffffffffff include ctrl(0x4b00)``
+
+----
+
+:File: ``nr_addr_cmp`` (ro)
+:Trace Registers: From IDR4
+:Notes:
+ Number of address comparator pairs
+
+----
+
+:File: ``sshot_idx`` (rw)
+:Trace Registers: None
+:Notes:
+ Select single shot register set.
+
+----
+
+:File: ``sshot_ctrl`` (rw)
+:Trace Registers: SSCCR[idx]
+:Notes:
+ Access a single shot comparator control register.
+
+:Depends: ``sshot_idx``
+:Syntax:
+ ``echo val > sshot_ctrl``
+
+ Writes val into the selected control register.
+
+----
+
+:File: ``sshot_status`` (ro)
+:Trace Registers: SSCSR[idx]
+:Notes:
+ Read a single shot comparator status register
+
+:Depends: ``sshot_idx``
+:Syntax:
+ ``cat sshot_status``
+
+ Read status.
+:Example:
+ ``$> cat sshot_status``
+
+ ``0x1``
+
+----
+
+:File: ``sshot_pe_ctrl`` (rw)
+:Trace Registers: SSPCICR[idx]
+:Notes:
+ Access a single shot PE comparator input control register.
+
+:Depends: ``sshot_idx``
+:Syntax:
+ ``echo val > sshot_pe_ctrl``
+
+ Writes val into the selected control register.
+
+----
+
+:File: ``ns_exlevel_vinst`` (rw)
+:Trace Registers: VICTLR{23:20}
+:Notes:
+ Program non-secure exception level filters. Set / clear NS
+ exception filter bits. Setting ‘1’ excludes trace from the
+ exception level.
+
+:Syntax:
+ ``echo bitfield > ns_exlevel_viinst``
+
+ Where bitfield contains bits to set clear for EL0 to EL2
+:Example:
+ ``%> echo 0x4 > ns_exlevel_viinst``
+
+ Excludes EL2 NS trace.
+
+----
+
+:File: ``vinst_pe_cmp_start_stop`` (rw)
+:Trace Registers: VIPCSSCTLR
+:Notes:
+ Access PE start stop comparator input control registers
+
+----
+
+:File: ``bb_ctrl`` (rw)
+:Trace Registers: BBCTLR
+:Notes:
+ Define ranges that Branch Broadcast will operate in.
+ Default (0x0) is all addresses.
+
+:Depends: BB enabled.
+
+----
+
+:File: ``cyc_threshold`` (rw)
+:Trace Registers: CCCTLR
+:Notes:
+ Set the threshold for which cycle counts will be emitted.
+ Error if attempt to set below minimum defined in IDR3, masked
+ to width of valid bits.
+
+:Depends: CC enabled.
+
+----
+
+:File: ``syncfreq`` (rw)
+:Trace Registers: SYNCPR
+:Notes:
+ Set trace synchronisation period. Power of 2 value, 0 (off)
+ or 8-20. Driver defaults to 12 (every 4096 bytes).
+
+----
+
+:File: ``cntr_idx`` (rw)
+:Trace Registers: none
+:Notes:
+ Select the counter to access
+
+:Syntax:
+ ``echo idx > cntr_idx``
+
+ Where idx < nr_cntr
+
+----
+
+:File: ``cntr_ctrl`` (rw)
+:Trace Registers: CNTCTLR[idx]
+:Notes:
+ Set counter control value.
+
+:Depends: ``cntr_idx``
+:Syntax:
+ ``echo val > cntr_ctrl``
+
+ Where val is per ETMv4 spec.
+
+----
+
+:File: ``cntrldvr`` (rw)
+:Trace Registers: CNTRLDVR[idx]
+:Notes:
+ Set counter reload value.
+
+:Depends: ``cntr_idx``
+:Syntax:
+ ``echo val > cntrldvr``
+
+ Where val is per ETMv4 spec.
+
+----
+
+:File: ``nr_cntr`` (ro)
+:Trace Registers: From IDR5
+
+:Notes:
+ Number of counters implemented.
+
+----
+
+:File: ``ctxid_idx`` (rw)
+:Trace Registers: None
+:Notes:
+ Select the context ID comparator to access
+
+:Syntax:
+ ``echo idx > ctxid_idx``
+
+ Where idx < numcidc
+
+----
+
+:File: ``ctxid_pid`` (rw)
+:Trace Registers: CIDCVR[idx]
+:Notes:
+ Set the context ID comparator value
+
+:Depends: ``ctxid_idx``
+
+----
+
+:File: ``ctxid_masks`` (rw)
+:Trace Registers: CIDCCTLR0, CIDCCTLR1, CIDCVR<0-7>
+:Notes:
+ Pair of values to set the byte masks for 1-8 context ID
+ comparators. Automatically clears masked bytes to 0 in CID
+ value registers.
+
+:Syntax:
+ ``echo m3m2m1m0 [m7m6m5m4] > ctxid_masks``
+
+ 32 bit values made up of mask bytes, where mN represents a
+ byte mask value for Context ID comparator N.
+
+ Second value not required on systems that have fewer than 4
+ context ID comparators
+
+----
+
+:File: ``numcidc`` (ro)
+:Trace Registers: From IDR4
+:Notes:
+ Number of Context ID comparators
+
+----
+
+:File: ``vmid_idx`` (rw)
+:Trace Registers: None
+:Notes:
+ Select the VM ID comparator to access.
+
+:Syntax:
+ ``echo idx > vmid_idx``
+
+ Where idx <  numvmidc
+
+----
+
+:File: ``vmid_val`` (rw)
+:Trace Registers: VMIDCVR[idx]
+:Notes:
+ Set the VM ID comparator value
+
+:Depends: ``vmid_idx``
+
+----
+
+:File: ``vmid_masks`` (rw)
+:Trace Registers: VMIDCCTLR0, VMIDCCTLR1, VMIDCVR<0-7>
+:Notes:
+ Pair of values to set the byte masks for 1-8 VM ID comparators.
+ Automatically clears masked bytes to 0 in VMID value registers.
+
+:Syntax:
+ ``echo m3m2m1m0 [m7m6m5m4] > vmid_masks``
+
+ Where mN represents a byte mask value for VMID comparator N.
+ Second value not required on systems that have fewer than 4
+ VMID comparators.
+
+----
+
+:File: ``numvmidc`` (ro)
+:Trace Registers: From IDR4
+:Notes:
+ Number of VMID comparators
+
+----
+
+:File: ``res_idx`` (rw)
+:Trace Registers: None.
+:Notes:
+ Select the resource selector control to access. Must be 2 or
+ higher as selectors 0 and 1 are hardwired.
+
+:Syntax:
+ ``echo idx > res_idx``
+
+ Where 2 <= idx < nr_resource x 2
+
+----
+
+:File: ``res_ctrl`` (rw)
+:Trace Registers: RSCTLR[idx]
+:Notes:
+ Set resource selector control value. Value per ETMv4 spec.
+
+:Depends: ``res_idx``
+:Syntax:
+ ``echo val > res_cntr``
+
+ Where val is per ETMv4 spec.
+
+----
+
+:File: ``nr_resource`` (ro)
+:Trace Registers: From IDR4
+:Notes:
+ Number of resource selector pairs
+
+----
+
+:File: ``event`` (rw)
+:Trace Registers: EVENTCTRL0R
+:Notes:
+ Set up to 4 implemented event fields.
+
+:Syntax:
+ ``echo ev3ev2ev1ev0 > event``
+
+ Where evN is an 8 bit event field. Up to 4 event fields make up the
+ 32-bit input value. Number of valid fields is implementation dependent,
+ defined in IDR0.
+
+----
+
+:File: ``event_instren`` (rw)
+:Trace Registers: EVENTCTRL1R
+:Notes:
+ Choose events which insert event packets into trace stream.
+
+:Depends: EVENTCTRL0R
+:Syntax:
+ ``echo bitfield > event_instren``
+
+ Where bitfield is up to 4 bits according to number of event fields.
+
+----
+
+:File: ``event_ts`` (rw)
+:Trace Registers: TSCTLR
+:Notes:
+ Set the event that will generate timestamp requests.
+
+:Depends: ``TS activated``
+:Syntax:
+ ``echo evfield > event_ts``
+
+ Where evfield is an 8 bit event selector.
+
+----
+
+:File: ``seq_idx`` (rw)
+:Trace Registers: None
+:Notes:
+ Sequencer event register select - 0 to 2
+
+----
+
+:File: ``seq_state`` (rw)
+:Trace Registers: SEQSTR
+:Notes:
+ Sequencer current state - 0 to 3.
+
+----
+
+:File: ``seq_event`` (rw)
+:Trace Registers: SEQEVR[idx]
+:Notes:
+ State transition event registers
+
+:Depends: ``seq_idx``
+:Syntax:
+ ``echo evBevF > seq_event``
+
+ Where evBevF is a 16 bit value made up of two event selectors,
+
+ - evB : back
+ - evF : forwards.
+
+----
+
+:File: ``seq_reset_event`` (rw)
+:Trace Registers: SEQRSTEVR
+:Notes:
+ Sequencer reset event
+
+:Syntax:
+ ``echo evfield > seq_reset_event``
+
+ Where evfield is an 8 bit event selector.
+
+----
+
+:File: ``nrseqstate`` (ro)
+:Trace Registers: From IDR5
+:Notes:
+ Number of sequencer states (0 or 4)
+
+----
+
+:File: ``nr_pe_cmp`` (ro)
+:Trace Registers: From IDR4
+:Notes:
+ Number of PE comparator inputs
+
+----
+
+:File: ``nr_ext_inp`` (ro)
+:Trace Registers: From IDR5
+:Notes:
+ Number of external inputs
+
+----
+
+:File: ``nr_ss_cmp`` (ro)
+:Trace Registers: From IDR4
+:Notes:
+ Number of Single Shot control registers
+
+----
+
+*Note:* When programming any address comparator the driver will tag the
+comparator with a type used - i.e. RANGE, SINGLE, START, STOP. Once this tag
+is set, then only the values can be changed using the same sysfs file / type
+used to program it.
+
+Thus::
+
+ % echo 0 > addr_idx ; select address comparator 0
+ % echo 0x1000 0x5000 0 > addr_range ; set address range on comparators 0, 1.
+ % echo 0x2000 > addr_start ; error as comparator 0 is a range comparator
+ % echo 2 > addr_idx ; select address comparator 2
+ % echo 0x2000 > addr_start ; this is OK as comparator 2 is unused.
+ % echo 0x3000 > addr_stop ; error as comparator 2 set as start address.
+ % echo 2 > addr_idx ; select address comparator 3
+ % echo 0x3000 > addr_stop ; this is OK
+
+To remove programming on all the comparators (and all the other hardware) use
+the reset parameter::
+
+ % echo 1 > reset
+
+
+
+The ‘mode’ sysfs parameter.
+---------------------------
+
+This is a bitfield selection parameter that sets the overall trace mode for the
+ETM. The table below describes the bits, using the defines from the driver
+source file, along with a description of the feature these represent. Many
+features are optional and therefore dependent on implementation in the
+hardware.
+
+Bit assignments shown below:-
+
+----
+
+**bit (0):**
+ ETM_MODE_EXCLUDE
+
+**description:**
+ This is the default value for the include / exclude function when
+ setting address ranges. Set 1 for exclude range. When the mode
+ parameter is set this value is applied to the currently indexed
+ address range.
+
+
+**bit (4):**
+ ETM_MODE_BB
+
+**description:**
+ Set to enable branch broadcast if supported in hardware [IDR0].
+
+
+**bit (5):**
+ ETMv4_MODE_CYCACC
+
+**description:**
+ Set to enable cycle accurate trace if supported [IDR0].
+
+
+**bit (6):**
+ ETMv4_MODE_CTXID
+
+**description:**
+ Set to enable context ID tracing if supported in hardware [IDR2].
+
+
+**bit (7):**
+ ETM_MODE_VMID
+
+**description:**
+ Set to enable virtual machine ID tracing if supported [IDR2].
+
+
+**bit (11):**
+ ETMv4_MODE_TIMESTAMP
+
+**description:**
+ Set to enable timestamp generation if supported [IDR0].
+
+
+**bit (12):**
+ ETM_MODE_RETURNSTACK
+**description:**
+ Set to enable trace return stack use if supported [IDR0].
+
+
+**bit (13-14):**
+ ETM_MODE_QELEM(val)
+
+**description:**
+ ‘val’ determines level of Q element support enabled if
+ implemented by the ETM [IDR0]
+
+
+**bit (19):**
+ ETM_MODE_ATB_TRIGGER
+
+**description:**
+ Set to enable the ATBTRIGGER bit in the event control register
+ [EVENTCTLR1] if supported [IDR5].
+
+
+**bit (20):**
+ ETM_MODE_LPOVERRIDE
+
+**description:**
+ Set to enable the LPOVERRIDE bit in the event control register
+ [EVENTCTLR1], if supported [IDR5].
+
+
+**bit (21):**
+ ETM_MODE_ISTALL_EN
+
+**description:**
+ Set to enable the ISTALL bit in the stall control register
+ [STALLCTLR]
+
+
+**bit (23):**
+ ETM_MODE_INSTPRIO
+
+**description:**
+ Set to enable the INSTPRIORITY bit in the stall control register
+ [STALLCTLR] , if supported [IDR0].
+
+
+**bit (24):**
+ ETM_MODE_NOOVERFLOW
+
+**description:**
+ Set to enable the NOOVERFLOW bit in the stall control register
+ [STALLCTLR], if supported [IDR3].
+
+
+**bit (25):**
+ ETM_MODE_TRACE_RESET
+
+**description:**
+ Set to enable the TRCRESET bit in the viewinst control register
+ [VICTLR] , if supported [IDR3].
+
+
+**bit (26):**
+ ETM_MODE_TRACE_ERR
+
+**description:**
+ Set to enable the TRCCTRL bit in the viewinst control register
+ [VICTLR].
+
+
+**bit (27):**
+ ETM_MODE_VIEWINST_STARTSTOP
+
+**description:**
+ Set the initial state value of the ViewInst start / stop logic
+ in the viewinst control register [VICTLR]
+
+
+**bit (30):**
+ ETM_MODE_EXCL_KERN
+
+**description:**
+ Set default trace setup to exclude kernel mode trace (see note a)
+
+
+**bit (31):**
+ ETM_MODE_EXCL_USER
+
+**description:**
+ Set default trace setup to exclude user space trace (see note a)
+
+----
+
+*Note a)* On startup the ETM is programmed to trace the complete address space
+using address range comparator 0. ‘mode’ bits 30 / 31 modify this setting to
+set EL exclude bits for NS state in either user space (EL0) or kernel space
+(EL1) in the address range comparator. (the default setting excludes all
+secure EL, and NS EL2)
+
+Once the reset parameter has been used, and/or custom programming has been
+implemented - using these bits will result in the EL bits for address
+comparator 0 being set in the same way.
+
+*Note b)* Bits 2-3, 8-10, 15-16, 18, 22, control features that only work with
+data trace. As A-profile data trace is architecturally prohibited in ETMv4,
+these have been omitted here. Possible uses could be where a kernel has
+support for control of R or M profile infrastructure as part of a heterogeneous
+system.
+
+Bits 17, 28-29 are unused.
diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst
new file mode 100644
index 000000000000..a566719f8e7e
--- /dev/null
+++ b/Documentation/trace/coresight/coresight.rst
@@ -0,0 +1,498 @@
+======================================
+Coresight - HW Assisted Tracing on ARM
+======================================
+
+ :Author: Mathieu Poirier <mathieu.poirier@linaro.org>
+ :Date: September 11th, 2014
+
+Introduction
+------------
+
+Coresight is an umbrella of technologies allowing for the debugging of ARM
+based SoC. It includes solutions for JTAG and HW assisted tracing. This
+document is concerned with the latter.
+
+HW assisted tracing is becoming increasingly useful when dealing with systems
+that have many SoCs and other components like GPU and DMA engines. ARM has
+developed a HW assisted tracing solution by means of different components, each
+being added to a design at synthesis time to cater to specific tracing needs.
+Components are generally categorised as source, link and sinks and are
+(usually) discovered using the AMBA bus.
+
+"Sources" generate a compressed stream representing the processor instruction
+path based on tracing scenarios as configured by users. From there the stream
+flows through the coresight system (via ATB bus) using links that are connecting
+the emanating source to a sink(s). Sinks serve as endpoints to the coresight
+implementation, either storing the compressed stream in a memory buffer or
+creating an interface to the outside world where data can be transferred to a
+host without fear of filling up the onboard coresight memory buffer.
+
+At typical coresight system would look like this::
+
+ *****************************************************************
+ **************************** AMBA AXI ****************************===||
+ ***************************************************************** ||
+ ^ ^ | ||
+ | | * **
+ 0000000 ::::: 0000000 ::::: ::::: @@@@@@@ ||||||||||||
+ 0 CPU 0<-->: C : 0 CPU 0<-->: C : : C : @ STM @ || System ||
+ |->0000000 : T : |->0000000 : T : : T :<--->@@@@@ || Memory ||
+ | #######<-->: I : | #######<-->: I : : I : @@@<-| ||||||||||||
+ | # ETM # ::::: | # PTM # ::::: ::::: @ |
+ | ##### ^ ^ | ##### ^ ! ^ ! . | |||||||||
+ | |->### | ! | |->### | ! | ! . | || DAP ||
+ | | # | ! | | # | ! | ! . | |||||||||
+ | | . | ! | | . | ! | ! . | | |
+ | | . | ! | | . | ! | ! . | | *
+ | | . | ! | | . | ! | ! . | | SWD/
+ | | . | ! | | . | ! | ! . | | JTAG
+ *****************************************************************<-|
+ *************************** AMBA Debug APB ************************
+ *****************************************************************
+ | . ! . ! ! . |
+ | . * . * * . |
+ *****************************************************************
+ ******************** Cross Trigger Matrix (CTM) *******************
+ *****************************************************************
+ | . ^ . . |
+ | * ! * * |
+ *****************************************************************
+ ****************** AMBA Advanced Trace Bus (ATB) ******************
+ *****************************************************************
+ | ! =============== |
+ | * ===== F =====<---------|
+ | ::::::::: ==== U ====
+ |-->:: CTI ::<!! === N ===
+ | ::::::::: ! == N ==
+ | ^ * == E ==
+ | ! &&&&&&&&& IIIIIII == L ==
+ |------>&& ETB &&<......II I =======
+ | ! &&&&&&&&& II I .
+ | ! I I .
+ | ! I REP I<..........
+ | ! I I
+ | !!>&&&&&&&&& II I *Source: ARM ltd.
+ |------>& TPIU &<......II I DAP = Debug Access Port
+ &&&&&&&&& IIIIIII ETM = Embedded Trace Macrocell
+ ; PTM = Program Trace Macrocell
+ ; CTI = Cross Trigger Interface
+ * ETB = Embedded Trace Buffer
+ To trace port TPIU= Trace Port Interface Unit
+ SWD = Serial Wire Debug
+
+While on target configuration of the components is done via the APB bus,
+all trace data are carried out-of-band on the ATB bus. The CTM provides
+a way to aggregate and distribute signals between CoreSight components.
+
+The coresight framework provides a central point to represent, configure and
+manage coresight devices on a platform. This first implementation centers on
+the basic tracing functionality, enabling components such ETM/PTM, funnel,
+replicator, TMC, TPIU and ETB. Future work will enable more
+intricate IP blocks such as STM and CTI.
+
+
+Acronyms and Classification
+---------------------------
+
+Acronyms:
+
+PTM:
+ Program Trace Macrocell
+ETM:
+ Embedded Trace Macrocell
+STM:
+ System trace Macrocell
+ETB:
+ Embedded Trace Buffer
+ITM:
+ Instrumentation Trace Macrocell
+TPIU:
+ Trace Port Interface Unit
+TMC-ETR:
+ Trace Memory Controller, configured as Embedded Trace Router
+TMC-ETF:
+ Trace Memory Controller, configured as Embedded Trace FIFO
+CTI:
+ Cross Trigger Interface
+
+Classification:
+
+Source:
+ ETMv3.x ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
+Link:
+ Funnel, replicator (intelligent or not), TMC-ETR
+Sinks:
+ ETBv1.0, ETB1.1, TPIU, TMC-ETF
+Misc:
+ CTI
+
+
+Device Tree Bindings
+--------------------
+
+See Documentation/devicetree/bindings/arm/coresight.txt for details.
+
+As of this writing drivers for ITM, STMs and CTIs are not provided but are
+expected to be added as the solution matures.
+
+
+Framework and implementation
+----------------------------
+
+The coresight framework provides a central point to represent, configure and
+manage coresight devices on a platform. Any coresight compliant device can
+register with the framework for as long as they use the right APIs:
+
+.. c:function:: struct coresight_device *coresight_register(struct coresight_desc *desc);
+.. c:function:: void coresight_unregister(struct coresight_device *csdev);
+
+The registering function is taking a ``struct coresight_desc *desc`` and
+register the device with the core framework. The unregister function takes
+a reference to a ``struct coresight_device *csdev`` obtained at registration time.
+
+If everything goes well during the registration process the new devices will
+show up under /sys/bus/coresight/devices, as showns here for a TC2 platform::
+
+ root:~# ls /sys/bus/coresight/devices/
+ replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm
+ 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm
+ root:~#
+
+The functions take a ``struct coresight_device``, which looks like this::
+
+ struct coresight_desc {
+ enum coresight_dev_type type;
+ struct coresight_dev_subtype subtype;
+ const struct coresight_ops *ops;
+ struct coresight_platform_data *pdata;
+ struct device *dev;
+ const struct attribute_group **groups;
+ };
+
+
+The "coresight_dev_type" identifies what the device is, i.e, source link or
+sink while the "coresight_dev_subtype" will characterise that type further.
+
+The ``struct coresight_ops`` is mandatory and will tell the framework how to
+perform base operations related to the components, each component having
+a different set of requirement. For that ``struct coresight_ops_sink``,
+``struct coresight_ops_link`` and ``struct coresight_ops_source`` have been
+provided.
+
+The next field ``struct coresight_platform_data *pdata`` is acquired by calling
+``of_get_coresight_platform_data()``, as part of the driver's _probe routine and
+``struct device *dev`` gets the device reference embedded in the ``amba_device``::
+
+ static int etm_probe(struct amba_device *adev, const struct amba_id *id)
+ {
+ ...
+ ...
+ drvdata->dev = &adev->dev;
+ ...
+ }
+
+Specific class of device (source, link, or sink) have generic operations
+that can be performed on them (see ``struct coresight_ops``). The ``**groups``
+is a list of sysfs entries pertaining to operations
+specific to that component only. "Implementation defined" customisations are
+expected to be accessed and controlled using those entries.
+
+Device Naming scheme
+--------------------
+
+The devices that appear on the "coresight" bus were named the same as their
+parent devices, i.e, the real devices that appears on AMBA bus or the platform bus.
+Thus the names were based on the Linux Open Firmware layer naming convention,
+which follows the base physical address of the device followed by the device
+type. e.g::
+
+ root:~# ls /sys/bus/coresight/devices/
+ 20010000.etf 20040000.funnel 20100000.stm 22040000.etm
+ 22140000.etm 230c0000.funnel 23240000.etm 20030000.tpiu
+ 20070000.etr 20120000.replicator 220c0000.funnel
+ 23040000.etm 23140000.etm 23340000.etm
+
+However, with the introduction of ACPI support, the names of the real
+devices are a bit cryptic and non-obvious. Thus, a new naming scheme was
+introduced to use more generic names based on the type of the device. The
+following rules apply::
+
+ 1) Devices that are bound to CPUs, are named based on the CPU logical
+ number.
+
+ e.g, ETM bound to CPU0 is named "etm0"
+
+ 2) All other devices follow a pattern, "<device_type_prefix>N", where :
+
+ <device_type_prefix> - A prefix specific to the type of the device
+ N - a sequential number assigned based on the order
+ of probing.
+
+ e.g, tmc_etf0, tmc_etr0, funnel0, funnel1
+
+Thus, with the new scheme the devices could appear as ::
+
+ root:~# ls /sys/bus/coresight/devices/
+ etm0 etm1 etm2 etm3 etm4 etm5 funnel0
+ funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0
+
+Some of the examples below might refer to old naming scheme and some
+to the newer scheme, to give a confirmation that what you see on your
+system is not unexpected. One must use the "names" as they appear on
+the system under specified locations.
+
+How to use the tracer modules
+-----------------------------
+
+There are two ways to use the Coresight framework:
+
+1. using the perf cmd line tools.
+2. interacting directly with the Coresight devices using the sysFS interface.
+
+Preference is given to the former as using the sysFS interface
+requires a deep understanding of the Coresight HW. The following sections
+provide details on using both methods.
+
+1) Using the sysFS interface:
+
+Before trace collection can start, a coresight sink needs to be identified.
+There is no limit on the amount of sinks (nor sources) that can be enabled at
+any given moment. As a generic operation, all device pertaining to the sink
+class will have an "active" entry in sysfs::
+
+ root:/sys/bus/coresight/devices# ls
+ replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm
+ 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm
+ root:/sys/bus/coresight/devices# ls 20010000.etb
+ enable_sink status trigger_cntr
+ root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
+ root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
+ 1
+ root:/sys/bus/coresight/devices#
+
+At boot time the current etm3x driver will configure the first address
+comparator with "_stext" and "_etext", essentially tracing any instruction
+that falls within that range. As such "enabling" a source will immediately
+trigger a trace capture::
+
+ root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
+ root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
+ 1
+ root:/sys/bus/coresight/devices# cat 20010000.etb/status
+ Depth: 0x2000
+ Status: 0x1
+ RAM read ptr: 0x0
+ RAM wrt ptr: 0x19d3 <----- The write pointer is moving
+ Trigger cnt: 0x0
+ Control: 0x1
+ Flush status: 0x0
+ Flush ctrl: 0x2001
+ root:/sys/bus/coresight/devices#
+
+Trace collection is stopped the same way::
+
+ root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
+ root:/sys/bus/coresight/devices#
+
+The content of the ETB buffer can be harvested directly from /dev::
+
+ root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
+ of=~/cstrace.bin
+ 64+0 records in
+ 64+0 records out
+ 32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
+ root:/sys/bus/coresight/devices#
+
+The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.
+
+Following is a DS-5 output of an experimental loop that increments a variable up
+to a certain value. The example is simple and yet provides a glimpse of the
+wealth of possibilities that coresight provides.
+::
+
+ Info Tracing enabled
+ Instruction 106378866 0x8026B53C E52DE004 false PUSH {lr}
+ Instruction 0 0x8026B540 E24DD00C false SUB sp,sp,#0xc
+ Instruction 0 0x8026B544 E3A03000 false MOV r3,#0
+ Instruction 0 0x8026B548 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Timestamp Timestamp: 17106715833
+ Instruction 319 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Instruction 9 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Instruction 10 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Instruction 6 0x8026B560 EE1D3F30 false MRC p15,#0x0,r3,c13,c0,#1
+ Instruction 0 0x8026B564 E1A0100D false MOV r1,sp
+ Instruction 0 0x8026B568 E3C12D7F false BIC r2,r1,#0x1fc0
+ Instruction 0 0x8026B56C E3C2203F false BIC r2,r2,#0x3f
+ Instruction 0 0x8026B570 E59D1004 false LDR r1,[sp,#4]
+ Instruction 0 0x8026B574 E59F0010 false LDR r0,[pc,#16] ; [0x8026B58C] = 0x80550368
+ Instruction 0 0x8026B578 E592200C false LDR r2,[r2,#0xc]
+ Instruction 0 0x8026B57C E59221D0 false LDR r2,[r2,#0x1d0]
+ Instruction 0 0x8026B580 EB07A4CF true BL {pc}+0x1e9344 ; 0x804548c4
+ Info Tracing enabled
+ Instruction 13570831 0x8026B584 E28DD00C false ADD sp,sp,#0xc
+ Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc}
+ Timestamp Timestamp: 17107041535
+
+2) Using perf framework:
+
+Coresight tracers are represented using the Perf framework's Performance
+Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of
+controlling when tracing gets enabled based on when the process of interest is
+scheduled. When configured in a system, Coresight PMUs will be listed when
+queried by the perf command line tool:
+
+ linaro@linaro-nano:~$ ./perf list pmu
+
+ List of pre-defined events (to be used in -e):
+
+ cs_etm// [Kernel PMU event]
+
+ linaro@linaro-nano:~$
+
+Regardless of the number of tracers available in a system (usually equal to the
+amount of processor cores), the "cs_etm" PMU will be listed only once.
+
+A Coresight PMU works the same way as any other PMU, i.e the name of the PMU is
+listed along with configuration options within forward slashes '/'. Since a
+Coresight system will typically have more than one sink, the name of the sink to
+work with needs to be specified as an event option.
+On newer kernels the available sinks are listed in sysFS under
+($SYSFS)/bus/event_source/devices/cs_etm/sinks/::
+
+ root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls
+ tmc_etf0 tmc_etr0 tpiu0
+
+On older kernels, this may need to be found from the list of coresight devices,
+available under ($SYSFS)/bus/coresight/devices/::
+
+ root:~# ls /sys/bus/coresight/devices/
+ etm0 etm1 etm2 etm3 etm4 etm5 funnel0
+ funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0
+ root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program
+
+As mentioned above in section "Device Naming scheme", the names of the devices could
+look different from what is used in the example above. One must use the device names
+as it appears under the sysFS.
+
+The syntax within the forward slashes '/' is important. The '@' character
+tells the parser that a sink is about to be specified and that this is the sink
+to use for the trace session.
+
+More information on the above and other example on how to use Coresight with
+the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub
+repository [#third]_.
+
+2.1) AutoFDO analysis using the perf tools:
+
+perf can be used to record and analyze trace of programs.
+
+Execution can be recorded using 'perf record' with the cs_etm event,
+specifying the name of the sink to record to, e.g::
+
+ perf record -e cs_etm/@tmc_etr0/u --per-thread
+
+The 'perf report' and 'perf script' commands can be used to analyze execution,
+synthesizing instruction and branch events from the instruction trace.
+'perf inject' can be used to replace the trace data with the synthesized events.
+The --itrace option controls the type and frequency of synthesized events
+(see perf documentation).
+
+Note that only 64-bit programs are currently supported - further work is
+required to support instruction decode of 32-bit Arm programs.
+
+
+Generating coverage files for Feedback Directed Optimization: AutoFDO
+---------------------------------------------------------------------
+
+'perf inject' accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+::
+
+ perf inject --itrace --strip -i perf.data -o perf.data.new
+
+Below is an example of using ARM ETM for autoFDO. It requires autofdo
+(https://github.com/google/autofdo) and gcc version 5. The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
+::
+
+ $ gcc-5 -O3 sort.c -o sort
+ $ taskset -c 2 ./sort
+ Bubble sorting array of 30000 elements
+ 5910 ms
+
+ $ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort
+ Bubble sorting array of 30000 elements
+ 12543 ms
+ [ perf record: Woken up 35 times to write data ]
+ [ perf record: Captured and wrote 69.640 MB perf.data ]
+
+ $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
+ $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
+ $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+ $ taskset -c 2 ./sort_autofdo
+ Bubble sorting array of 30000 elements
+ 5806 ms
+
+
+How to use the STM module
+-------------------------
+
+Using the System Trace Macrocell module is the same as the tracers - the only
+difference is that clients are driving the trace capture rather
+than the program flow through the code.
+
+As with any other CoreSight component, specifics about the STM tracer can be
+found in sysfs with more information on each entry being found in [#first]_::
+
+ root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0
+ enable_source hwevent_select port_enable subsystem uevent
+ hwevent_enable mgmt port_select traceid
+ root@genericarmv8:~#
+
+Like any other source a sink needs to be identified and the STM enabled before
+being used::
+
+ root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink
+ root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source
+
+From there user space applications can request and use channels using the devfs
+interface provided for that purpose by the generic STM API::
+
+ root@genericarmv8:~# ls -l /dev/stm0
+ crw------- 1 root root 10, 61 Jan 3 18:11 /dev/stm0
+ root@genericarmv8:~#
+
+Details on how to use the generic STM API can be found here:- :doc:`../stm` [#second]_.
+
+.. [#first] Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
+
+.. [#second] Documentation/trace/stm.rst
+
+.. [#third] https://github.com/Linaro/perf-opencsd
diff --git a/Documentation/trace/coresight/index.rst b/Documentation/trace/coresight/index.rst
new file mode 100644
index 000000000000..8d31b155a87c
--- /dev/null
+++ b/Documentation/trace/coresight/index.rst
@@ -0,0 +1,9 @@
+==============================
+CoreSight - ARM Hardware Trace
+==============================
+
+.. toctree::
+ :maxdepth: 2
+ :glob:
+
+ *