19 files changed, 1041 insertions, 87 deletions
diff --git a/Documentation/trace/coresight/coresight-perf.rst b/Documentation/trace/coresight/coresight-perf.rst
index d087aae7d492..30be89320621 100644
--- a/Documentation/trace/coresight/coresight-perf.rst
+++ b/Documentation/trace/coresight/coresight-perf.rst
@@ -78,6 +78,37 @@ enabled like::
 
 Please refer to the kernel configuration help for more information.
 
+Fine-grained tracing with AUX pause and resume
+----------------------------------------------
+
+Arm CoreSight may generate a large amount of hardware trace data, which
+will lead to overhead in recording and distract users when reviewing
+profiling result. To mitigate the issue of excessive trace data, Perf
+provides AUX pause and resume functionality for fine-grained tracing.
+
+The AUX pause and resume can be triggered by associated events. These
+events can be ftrace tracepoints (including static and dynamic
+tracepoints) or PMU events (e.g. CPU PMU cycle event). To create a perf
+session with AUX pause / resume, three configuration terms are
+introduced:
+
+- "aux-action=start-paused": it is specified for the cs_etm PMU event to
+  launch in a paused state.
+- "aux-action=pause": an associated event is specified with this term
+  to pause AUX trace.
+- "aux-action=resume": an associated event is specified with this term
+  to resume AUX trace.
+
+Example for triggering AUX pause and resume with ftrace tracepoints::
+
+  perf record -e cs_etm/aux-action=start-paused/k,syscalls:sys_enter_openat/aux-action=resume/,syscalls:sys_exit_openat/aux-action=pause/ ls
+
+Example for triggering AUX pause and resume with PMU event::
+
+  perf record -a -e cs_etm/aux-action=start-paused/k \
+        -e cycles/aux-action=pause,period=10000000/ \
+        -e cycles/aux-action=resume,period=1050000/ -- sleep 1
+
 Perf test - Verify kernel and userspace perf CoreSight work
 -----------------------------------------------------------
 
diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst
index d4f93d6a2d63..806699871b80 100644
--- a/Documentation/trace/coresight/coresight.rst
+++ b/Documentation/trace/coresight/coresight.rst
@@ -462,44 +462,35 @@ queried by the perf command line tool:
 
 		cs_etm//                                    [Kernel PMU event]
 
-	linaro@linaro-nano:~$
-
 Regardless of the number of tracers available in a system (usually equal to the
 amount of processor cores), the "cs_etm" PMU will be listed only once.
 
 A Coresight PMU works the same way as any other PMU, i.e the name of the PMU is
-listed along with configuration options within forward slashes '/'.  Since a
-Coresight system will typically have more than one sink, the name of the sink to
-work with needs to be specified as an event option.
-On newer kernels the available sinks are listed in sysFS under
+provided along with configuration options within forward slashes '/' (see
+`Config option formats`_).
+
+Advanced Perf framework usage
+-----------------------------
+
+Sink selection
+~~~~~~~~~~~~~~
+
+An appropriate sink will be selected automatically for use with Perf, but since
+there will typically be more than one sink, the name of the sink to use may be
+specified as a special config option prefixed with '@'.
+
+The available sinks are listed in sysFS under
 ($SYSFS)/bus/event_source/devices/cs_etm/sinks/::
 
 	root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls
 	tmc_etf0  tmc_etr0  tpiu0
 
-On older kernels, this may need to be found from the list of coresight devices,
-available under ($SYSFS)/bus/coresight/devices/::
-
-	root:~# ls /sys/bus/coresight/devices/
-	 etm0     etm1     etm2         etm3  etm4      etm5      funnel0
-	 funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0
 	root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program
 
-As mentioned above in section "Device Naming scheme", the names of the devices could
-look different from what is used in the example above. One must use the device names
-as it appears under the sysFS.
-
-The syntax within the forward slashes '/' is important.  The '@' character
-tells the parser that a sink is about to be specified and that this is the sink
-to use for the trace session.
-
 More information on the above and other example on how to use Coresight with
 the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub
 repository [#third]_.
 
-Advanced perf framework usage
------------------------------
-
 AutoFDO analysis using the perf tools
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -508,7 +499,7 @@ perf can be used to record and analyze trace of programs.
 Execution can be recorded using 'perf record' with the cs_etm event,
 specifying the name of the sink to record to, e.g::
 
-    perf record -e cs_etm/@tmc_etr0/u --per-thread
+    perf record -e cs_etm//u --per-thread
 
 The 'perf report' and 'perf script' commands can be used to analyze execution,
 synthesizing instruction and branch events from the instruction trace.
@@ -572,7 +563,7 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto
 	Bubble sorting array of 30000 elements
 	5910 ms
 
-	$ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort
+	$ perf record -e cs_etm//u --per-thread taskset -c 2 ./sort
 	Bubble sorting array of 30000 elements
 	12543 ms
 	[ perf record: Woken up 35 times to write data ]
diff --git a/Documentation/trace/coresight/panic.rst b/Documentation/trace/coresight/panic.rst
new file mode 100644
index 000000000000..6e4bde953cae
--- /dev/null
+++ b/Documentation/trace/coresight/panic.rst
@@ -0,0 +1,362 @@
+===================================================
+Using Coresight for Kernel panic and Watchdog reset
+===================================================
+
+Introduction
+------------
+This documentation is about using Linux coresight trace support to
+debug kernel panic and watchdog reset scenarios.
+
+Coresight trace during Kernel panic
+-----------------------------------
+From the coresight driver point of view, addressing the kernel panic
+situation has four main requirements.
+
+a. Support for allocation of trace buffer pages from reserved memory area.
+   Platform can advertise this using a new device tree property added to
+   relevant coresight nodes.
+
+b. Support for stopping coresight blocks at the time of panic
+
+c. Saving required metadata in the specified format
+
+d. Support for reading trace data captured at the time of panic
+
+Allocation of trace buffer pages from reserved RAM
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A new optional device tree property "memory-region" is added to the
+Coresight TMC device nodes, that would give the base address and size of trace
+buffer.
+
+Static allocation of trace buffers would ensure that both IOMMU enabled
+and disabled cases are handled. Also, platforms that support persistent
+RAM will allow users to read trace data in the subsequent boot without
+booting the crashdump kernel.
+
+Note:
+For ETR sink devices, this reserved region will be used for both trace
+capture and trace data retrieval.
+For ETF sink devices, internal SRAM would be used for trace capture,
+and they would be synced to reserved region for retrieval.
+
+
+Disabling coresight blocks at the time of panic
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In order to avoid the situation of losing relevant trace data after a
+kernel panic, it would be desirable to stop the coresight blocks at the
+time of panic.
+
+This can be achieved by configuring the comparator, CTI and sink
+devices as below::
+
+           Trigger on panic
+    Comparator --->External out --->CTI -->External In---->ETR/ETF stop
+
+Saving metadata at the time of kernel panic
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Coresight metadata involves all additional data that are required for a
+successful trace decode in addition to the trace data. This involves
+ETR/ETF/ETB register snapshot etc.
+
+A new optional device property "memory-region" is added to
+the ETR/ETF/ETB device nodes for this.
+
+Reading trace data captured at the time of panic
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Trace data captured at the time of panic, can be read from rebooted kernel
+or from crashdump kernel using a special device file /dev/crash_tmc_xxx.
+This device file is created only when there is a valid crashdata available.
+
+General flow of trace capture and decode in case of kernel panic
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+1. Enable source and sink on all the cores using the sysfs interface.
+   ETR sinks should have trace buffers allocated from reserved memory,
+   by selecting "resrv" buffer mode from sysfs.
+
+2. Run relevant tests.
+
+3. On a kernel panic, all coresight blocks are disabled, necessary
+   metadata is synced by kernel panic handler.
+
+   System would eventually reboot or boot a crashdump kernel.
+
+4. For  platforms that supports crashdump kernel, raw trace data can be
+   dumped using the coresight sysfs interface from the crashdump kernel
+   itself. Persistent RAM is not a requirement in this case.
+
+5. For platforms that supports persistent RAM, trace data can be dumped
+   using the coresight sysfs interface in the subsequent Linux boot.
+   Crashdump kernel is not a requirement in this case. Persistent RAM
+   ensures that trace data is intact across reboot.
+
+Coresight trace during Watchdog reset
+-------------------------------------
+The main difference between addressing the watchdog reset and kernel panic
+case are below,
+
+a. Saving coresight metadata need to be taken care by the
+   SCP(system control processor) firmware in the specified format,
+   instead of kernel.
+
+b. Reserved memory region given by firmware for trace buffer and metadata
+   has to be in persistent RAM.
+   Note: This is a requirement for watchdog reset case but optional
+   in kernel panic case.
+
+Watchdog reset can be supported only on platforms that meet the above
+two requirements.
+
+Sample commands for testing a Kernel panic case with ETR sink
+-------------------------------------------------------------
+
+1. Boot Linux kernel with "crash_kexec_post_notifiers" added to the kernel
+   bootargs. This is mandatory if the user would like to read the tracedata
+   from the crashdump kernel.
+
+2. Enable the preloaded ETM configuration::
+
+    #echo 1 > /sys/kernel/config/cs-syscfg/configurations/panicstop/enable
+
+3. Configure CTI using sysfs interface::
+
+    #./cti_setup.sh
+
+    #cat cti_setup.sh
+
+
+    cd /sys/bus/coresight/devices/
+
+    ap_cti_config () {
+      #ETM trig out[0] trigger to Channel 0
+      echo 0 4 > channels/trigin_attach
+    }
+
+    etf_cti_config () {
+      #ETF Flush in trigger from Channel 0
+      echo 0 1 > channels/trigout_attach
+      echo 1 > channels/trig_filter_enable
+    }
+
+    etr_cti_config () {
+      #ETR Flush in from Channel 0
+      echo 0 1 > channels/trigout_attach
+      echo 1 > channels/trig_filter_enable
+    }
+
+    ctidevs=`find . -name "cti*"`
+
+    for i in $ctidevs
+    do
+            cd $i
+
+            connection=`find . -name "ete*"`
+            if [ ! -z "$connection" ]
+            then
+                    echo "AP CTI config for $i"
+                    ap_cti_config
+            fi
+
+            connection=`find . -name "tmc_etf*"`
+            if [ ! -z "$connection" ]
+            then
+                    echo "ETF CTI config for $i"
+                    etf_cti_config
+            fi
+
+            connection=`find . -name "tmc_etr*"`
+            if [ ! -z "$connection" ]
+            then
+                    echo "ETR CTI config for $i"
+                    etr_cti_config
+            fi
+
+            cd ..
+    done
+
+Note: CTI connections are SOC specific and hence the above script is
+added just for reference.
+
+4. Choose reserved buffer mode for ETR buffer::
+
+    #echo "resrv" > /sys/bus/coresight/devices/tmc_etr0/buf_mode_preferred
+
+5. Enable stop on flush trigger configuration::
+
+    #echo 1 > /sys/bus/coresight/devices/tmc_etr0/stop_on_flush
+
+6. Start Coresight tracing on cores 1 and 2 using sysfs interface
+
+7. Run some application on core 1::
+
+    #taskset -c 1 dd if=/dev/urandom of=/dev/null &
+
+8. Invoke kernel panic on core 2::
+
+    #echo 1 > /proc/sys/kernel/panic
+    #taskset -c 2 echo c > /proc/sysrq-trigger
+
+9. From rebooted kernel or crashdump kernel, read crashdata::
+
+    #dd if=/dev/crash_tmc_etr0 of=/trace/cstrace.bin
+
+10. Run opencsd decoder tools/scripts to generate the instruction trace.
+
+Sample instruction trace dump
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Core1 dump::
+
+    A                                  etm4_enable_hw: ffff800008ae1dd4
+    CONTEXT EL2                        etm4_enable_hw: ffff800008ae1dd4
+    I                                  etm4_enable_hw: ffff800008ae1dd4:
+    d503201f   nop
+    I                                  etm4_enable_hw: ffff800008ae1dd8:
+    d503201f   nop
+    I                                  etm4_enable_hw: ffff800008ae1ddc:
+    d503201f   nop
+    I                                  etm4_enable_hw: ffff800008ae1de0:
+    d503201f   nop
+    I                                  etm4_enable_hw: ffff800008ae1de4:
+    d503201f   nop
+    I                                  etm4_enable_hw: ffff800008ae1de8:
+    d503233f   paciasp
+    I                                  etm4_enable_hw: ffff800008ae1dec:
+    a9be7bfd   stp     x29, x30, [sp, #-32]!
+    I                                  etm4_enable_hw: ffff800008ae1df0:
+    910003fd   mov     x29, sp
+    I                                  etm4_enable_hw: ffff800008ae1df4:
+    a90153f3   stp     x19, x20, [sp, #16]
+    I                                  etm4_enable_hw: ffff800008ae1df8:
+    2a0003f4   mov     w20, w0
+    I                                  etm4_enable_hw: ffff800008ae1dfc:
+    900085b3   adrp    x19, ffff800009b95000 <reserved_mem+0xc48>
+    I                                  etm4_enable_hw: ffff800008ae1e00:
+    910f4273   add     x19, x19, #0x3d0
+    I                                  etm4_enable_hw: ffff800008ae1e04:
+    f8747a60   ldr     x0, [x19, x20, lsl #3]
+    E                                  etm4_enable_hw: ffff800008ae1e08:
+    b4000140   cbz     x0, ffff800008ae1e30 <etm4_starting_cpu+0x50>
+    I    149.039572921                 etm4_enable_hw: ffff800008ae1e30:
+    a94153f3   ldp     x19, x20, [sp, #16]
+    I    149.039572921                 etm4_enable_hw: ffff800008ae1e34:
+    52800000   mov     w0, #0x0                        // #0
+    I    149.039572921                 etm4_enable_hw: ffff800008ae1e38:
+    a8c27bfd   ldp     x29, x30, [sp], #32
+
+    ..snip
+
+        149.052324811           chacha_block_generic: ffff800008642d80:
+    9100a3e0   add     x0,
+    I    149.052324811           chacha_block_generic: ffff800008642d84:
+    b86178a2   ldr     w2, [x5, x1, lsl #2]
+    I    149.052324811           chacha_block_generic: ffff800008642d88:
+    8b010803   add     x3, x0, x1, lsl #2
+    I    149.052324811           chacha_block_generic: ffff800008642d8c:
+    b85fc063   ldur    w3, [x3, #-4]
+    I    149.052324811           chacha_block_generic: ffff800008642d90:
+    0b030042   add     w2, w2, w3
+    I    149.052324811           chacha_block_generic: ffff800008642d94:
+    b8217882   str     w2, [x4, x1, lsl #2]
+    I    149.052324811           chacha_block_generic: ffff800008642d98:
+    91000421   add     x1, x1, #0x1
+    I    149.052324811           chacha_block_generic: ffff800008642d9c:
+    f100443f   cmp     x1, #0x11
+
+
+Core 2 dump::
+
+    A                                  etm4_enable_hw: ffff800008ae1dd4
+    CONTEXT EL2                        etm4_enable_hw: ffff800008ae1dd4
+    I                                  etm4_enable_hw: ffff800008ae1dd4:
+    d503201f   nop
+    I                                  etm4_enable_hw: ffff800008ae1dd8:
+    d503201f   nop
+    I                                  etm4_enable_hw: ffff800008ae1ddc:
+    d503201f   nop
+    I                                  etm4_enable_hw: ffff800008ae1de0:
+    d503201f   nop
+    I                                  etm4_enable_hw: ffff800008ae1de4:
+    d503201f   nop
+    I                                  etm4_enable_hw: ffff800008ae1de8:
+    d503233f   paciasp
+    I                                  etm4_enable_hw: ffff800008ae1dec:
+    a9be7bfd   stp     x29, x30, [sp, #-32]!
+    I                                  etm4_enable_hw: ffff800008ae1df0:
+    910003fd   mov     x29, sp
+    I                                  etm4_enable_hw: ffff800008ae1df4:
+    a90153f3   stp     x19, x20, [sp, #16]
+    I                                  etm4_enable_hw: ffff800008ae1df8:
+    2a0003f4   mov     w20, w0
+    I                                  etm4_enable_hw: ffff800008ae1dfc:
+    900085b3   adrp    x19, ffff800009b95000 <reserved_mem+0xc48>
+    I                                  etm4_enable_hw: ffff800008ae1e00:
+    910f4273   add     x19, x19, #0x3d0
+    I                                  etm4_enable_hw: ffff800008ae1e04:
+    f8747a60   ldr     x0, [x19, x20, lsl #3]
+    E                                  etm4_enable_hw: ffff800008ae1e08:
+    b4000140   cbz     x0, ffff800008ae1e30 <etm4_starting_cpu+0x50>
+    I    149.046243445                 etm4_enable_hw: ffff800008ae1e30:
+    a94153f3   ldp     x19, x20, [sp, #16]
+    I    149.046243445                 etm4_enable_hw: ffff800008ae1e34:
+    52800000   mov     w0, #0x0                        // #0
+    I    149.046243445                 etm4_enable_hw: ffff800008ae1e38:
+    a8c27bfd   ldp     x29, x30, [sp], #32
+    I    149.046243445                 etm4_enable_hw: ffff800008ae1e3c:
+    d50323bf   autiasp
+    E    149.046243445                 etm4_enable_hw: ffff800008ae1e40:
+    d65f03c0   ret
+    A                                ete_sysreg_write: ffff800008adfa18
+
+    ..snip
+
+    I     149.05422547                          panic: ffff800008096300:
+    a90363f7   stp     x23, x24, [sp, #48]
+    I     149.05422547                          panic: ffff800008096304:
+    6b00003f   cmp     w1, w0
+    I     149.05422547                          panic: ffff800008096308:
+    3a411804   ccmn    w0, #0x1, #0x4, ne  // ne = any
+    N     149.05422547                          panic: ffff80000809630c:
+    540001e0   b.eq    ffff800008096348 <panic+0xe0>  // b.none
+    I     149.05422547                          panic: ffff800008096310:
+    f90023f9   str     x25, [sp, #64]
+    E     149.05422547                          panic: ffff800008096314:
+    97fe44ef   bl      ffff8000080276d0 <panic_smp_self_stop>
+    A                                           panic: ffff80000809634c
+    I     149.05422547                          panic: ffff80000809634c:
+    910102d5   add     x21, x22, #0x40
+    I     149.05422547                          panic: ffff800008096350:
+    52800020   mov     w0, #0x1                        // #1
+    E     149.05422547                          panic: ffff800008096354:
+    94166b8b   bl      ffff800008631180 <bust_spinlocks>
+    N    149.054225518                 bust_spinlocks: ffff800008631180:
+    340000c0   cbz     w0, ffff800008631198 <bust_spinlocks+0x18>
+    I    149.054225518                 bust_spinlocks: ffff800008631184:
+    f000a321   adrp    x1, ffff800009a98000 <pbufs.0+0xbb8>
+    I    149.054225518                 bust_spinlocks: ffff800008631188:
+    b9405c20   ldr     w0, [x1, #92]
+    I    149.054225518                 bust_spinlocks: ffff80000863118c:
+    11000400   add     w0, w0, #0x1
+    I    149.054225518                 bust_spinlocks: ffff800008631190:
+    b9005c20   str     w0, [x1, #92]
+    E    149.054225518                 bust_spinlocks: ffff800008631194:
+    d65f03c0   ret
+    A                                           panic: ffff800008096358
+
+Perf based testing
+------------------
+
+Starting perf session
+~~~~~~~~~~~~~~~~~~~~~
+ETF::
+
+    perf record -e cs_etm/panicstop,@tmc_etf1/ -C 1
+    perf record -e cs_etm/panicstop,@tmc_etf2/ -C 2
+
+ETR::
+
+    perf record -e cs_etm/panicstop,@tmc_etr0/ -C 1,2
+
+Reading trace data after panic
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Same sysfs based method explained above can be used to retrieve and
+decode the trace data after the reboot on kernel panic.
diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
new file mode 100644
index 000000000000..d54bc500af80
--- /dev/null
+++ b/Documentation/trace/debugging.rst
@@ -0,0 +1,161 @@
+==============================
+Using the tracer for debugging
+==============================
+
+Copyright 2024 Google LLC.
+
+:Author:   Steven Rostedt <rostedt@goodmis.org>
+:License:  The GNU Free Documentation License, Version 1.2
+          (dual licensed under the GPL v2)
+
+- Written for: 6.12
+
+Introduction
+------------
+The tracing infrastructure can be very useful for debugging the Linux
+kernel. This document is a place to add various methods of using the tracer
+for debugging.
+
+First, make sure that the tracefs file system is mounted::
+
+ $ sudo mount -t tracefs tracefs /sys/kernel/tracing
+
+
+Using trace_printk()
+--------------------
+
+trace_printk() is a very lightweight utility that can be used in any context
+inside the kernel, with the exception of "noinstr" sections. It can be used
+in normal, softirq, interrupt and even NMI context. The trace data is
+written to the tracing ring buffer in a lockless way. To make it even
+lighter weight, when possible, it will only record the pointer to the format
+string, and save the raw arguments into the buffer. The format and the
+arguments will be post processed when the ring buffer is read. This way the
+trace_printk() format conversions are not done during the hot path, where
+the trace is being recorded.
+
+trace_printk() is meant only for debugging, and should never be added into
+a subsystem of the kernel. If you need debugging traces, add trace events
+instead. If a trace_printk() is found in the kernel, the following will
+appear in the dmesg::
+
+  **********************************************************
+  **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
+  **                                                      **
+  ** trace_printk() being used. Allocating extra memory.  **
+  **                                                      **
+  ** This means that this is a DEBUG kernel and it is     **
+  ** unsafe for production use.                           **
+  **                                                      **
+  ** If you see this message and you are not debugging    **
+  ** the kernel, report this immediately to your vendor!  **
+  **                                                      **
+  **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
+  **********************************************************
+
+Debugging kernel crashes
+------------------------
+There is various methods of acquiring the state of the system when a kernel
+crash occurs. This could be from the oops message in printk, or one could
+use kexec/kdump. But these just show what happened at the time of the crash.
+It can be very useful in knowing what happened up to the point of the crash.
+The tracing ring buffer, by default, is a circular buffer than will
+overwrite older events with newer ones. When a crash happens, the content of
+the ring buffer will be all the events that lead up to the crash.
+
+There are several kernel command line parameters that can be used to help in
+this. The first is "ftrace_dump_on_oops". This will dump the tracing ring
+buffer when a oops occurs to the console. This can be useful if the console
+is being logged somewhere. If a serial console is used, it may be prudent to
+make sure the ring buffer is relatively small, otherwise the dumping of the
+ring buffer may take several minutes to hours to finish. Here's an example
+of the kernel command line::
+
+  ftrace_dump_on_oops trace_buf_size=50K
+
+Note, the tracing buffer is made up of per CPU buffers where each of these
+buffers is broken up into sub-buffers that are by default PAGE_SIZE. The
+above trace_buf_size option above sets each of the per CPU buffers to 50K,
+so, on a machine with 8 CPUs, that's actually 400K total.
+
+Persistent buffers across boots
+-------------------------------
+If the system memory allows it, the tracing ring buffer can be specified at
+a specific location in memory. If the location is the same across boots and
+the memory is not modified, the tracing buffer can be retrieved from the
+following boot. There's two ways to reserve memory for the use of the ring
+buffer.
+
+The more reliable way (on x86) is to reserve memory with the "memmap" kernel
+command line option and then use that memory for the trace_instance. This
+requires a bit of knowledge of the physical memory layout of the system. The
+advantage of using this method, is that the memory for the ring buffer will
+always be the same::
+
+  memmap==12M$0x284500000 trace_instance=boot_map@0x284500000:12M
+
+The memmap above reserves 12 megabytes of memory at the physical memory
+location 0x284500000. Then the trace_instance option will create a trace
+instance "boot_map" at that same location with the same amount of memory
+reserved. As the ring buffer is broke up into per CPU buffers, the 12
+megabytes will be broken up evenly between those CPUs. If you have 8 CPUs,
+each per CPU ring buffer will be 1.5 megabytes in size. Note, that also
+includes meta data, so the amount of memory actually used by the ring buffer
+will be slightly smaller.
+
+Another more generic but less robust way to allocate a ring buffer mapping
+at boot is with the "reserve_mem" option::
+
+  reserve_mem=12M:4096:trace trace_instance=boot_map@trace
+
+The reserve_mem option above will find 12 megabytes that are available at
+boot up, and align it by 4096 bytes. It will label this memory as "trace"
+that can be used by later command line options.
+
+The trace_instance option creates a "boot_map" instance and will use the
+memory reserved by reserve_mem that was labeled as "trace". This method is
+more generic but may not be as reliable. Due to KASLR, the memory reserved
+by reserve_mem may not be located at the same location. If this happens,
+then the ring buffer will not be from the previous boot and will be reset.
+
+Sometimes, by using a larger alignment, it can keep KASLR from moving things
+around in such a way that it will move the location of the reserve_mem. By
+using a larger alignment, you may find better that the buffer is more
+consistent to where it is placed::
+
+  reserve_mem=12M:0x2000000:trace trace_instance=boot_map@trace
+
+On boot up, the memory reserved for the ring buffer is validated. It will go
+through a series of tests to make sure that the ring buffer contains valid
+data. If it is, it will then set it up to be available to read from the
+instance. If it fails any of the tests, it will clear the entire ring buffer
+and initialize it as new.
+
+The layout of this mapped memory may not be consistent from kernel to
+kernel, so only the same kernel is guaranteed to work if the mapping is
+preserved. Switching to a different kernel version may find a different
+layout and mark the buffer as invalid.
+
+NB: Both the mapped address and size must be page aligned for the architecture.
+
+Using trace_printk() in the boot instance
+-----------------------------------------
+By default, the content of trace_printk() goes into the top level tracing
+instance. But this instance is never preserved across boots. To have the
+trace_printk() content, and some other internal tracing go to the preserved
+buffer (like dump stacks), either set the instance to be the trace_printk()
+destination from the kernel command line, or set it after boot up via the
+trace_printk_dest option.
+
+After boot up::
+
+  echo 1 > /sys/kernel/tracing/instances/boot_map/options/trace_printk_dest
+
+From the kernel command line::
+
+  reserve_mem=12M:4096:trace trace_instance=boot_map^traceprintk^traceoff@trace
+
+If setting it from the kernel command line, it is recommended to also
+disable tracing with the "traceoff" flag, and enable tracing after boot up.
+Otherwise the trace from the most recent boot will be mixed with the trace
+from the previous boot, and may make it confusing to read.
diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index 759907c20e75..2d88a2acacc0 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -55,6 +55,30 @@ command::
 
 	# echo 'irq:*' > /sys/kernel/tracing/set_event
 
+The set_event file may also be used to enable events associated to only
+a specific module::
+
+	# echo ':mod:<module>' > /sys/kernel/tracing/set_event
+
+Will enable all events in the module ``<module>``.  If the module is not yet
+loaded, the string will be saved and when a module is that matches ``<module>``
+is loaded, then it will apply the enabling of events then.
+
+The text before ``:mod:`` will be parsed to specify specific events that the
+module creates::
+
+	# echo '<match>:mod:<module>' > /sys/kernel/tracing/set_event
+
+The above will enable any system or event that ``<match>`` matches. If
+``<match>`` is ``"*"`` then it will match all events.
+
+To enable only a specific event within a system::
+
+	# echo '<system>:<event>:mod:<module>' > /sys/kernel/tracing/set_event
+
+If ``<event>`` is ``"*"`` then it will match all events within the system
+for a given module.
+
 2.2 Via the 'enable' toggle
 ---------------------------
 
diff --git a/Documentation/trace/fprobe.rst b/Documentation/trace/fprobe.rst
index 196f52386aaa..71cd40472d36 100644
--- a/Documentation/trace/fprobe.rst
+++ b/Documentation/trace/fprobe.rst
@@ -9,9 +9,10 @@ Fprobe - Function entry/exit probe
 Introduction
 ============
 
-Fprobe is a function entry/exit probe mechanism based on ftrace.
-Instead of using ftrace full feature, if you only want to attach callbacks
-on function entry and exit, similar to the kprobes and kretprobes, you can
+Fprobe is a function entry/exit probe based on the function-graph tracing
+feature in ftrace.
+Instead of tracing all functions, if you want to attach callbacks on specific
+function entry and exit, similar to the kprobes and kretprobes, you can
 use fprobe. Compared with kprobes and kretprobes, fprobe gives faster
 instrumentation for multiple functions with single handler. This document
 describes how to use fprobe.
@@ -91,12 +92,14 @@ The prototype of the entry/exit callback function are as follows:
 
 .. code-block:: c
 
- int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct pt_regs *regs, void *entry_data);
+ int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data);
 
- void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct pt_regs *regs, void *entry_data);
+ void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data);
 
-Note that the @entry_ip is saved at function entry and passed to exit handler.
-If the entry callback function returns !0, the corresponding exit callback will be cancelled.
+Note that the @entry_ip is saved at function entry and passed to exit
+handler.
+If the entry callback function returns !0, the corresponding exit callback
+will be cancelled.
 
 @fp
         This is the address of `fprobe` data structure related to this handler.
@@ -112,12 +115,10 @@ If the entry callback function returns !0, the corresponding exit callback will
         This is the return address that the traced function will return to,
         somewhere in the caller. This can be used at both entry and exit.
 
-@regs
-        This is the `pt_regs` data structure at the entry and exit. Note that
-        the instruction pointer of @regs may be different from the @entry_ip
-        in the entry_handler. If you need traced instruction pointer, you need
-        to use @entry_ip. On the other hand, in the exit_handler, the instruction
-        pointer of @regs is set to the current return address.
+@fregs
+        This is the `ftrace_regs` data structure at the entry and exit. This
+        includes the function parameters, or the return values. So user can
+        access thos values via appropriate `ftrace_regs_*` APIs.
 
 @entry_data
         This is a local storage to share the data between entry and exit handlers.
@@ -125,6 +126,17 @@ If the entry callback function returns !0, the corresponding exit callback will
         and `entry_data_size` field when registering the fprobe, the storage is
         allocated and passed to both `entry_handler` and `exit_handler`.
 
+Entry data size and exit handlers on the same function
+======================================================
+
+Since the entry data is passed via per-task stack and it has limited size,
+the entry data size per probe is limited to `15 * sizeof(long)`. You also need
+to take care that the different fprobes are probing on the same function, this
+limit becomes smaller. The entry data size is aligned to `sizeof(long)` and
+each fprobe which has exit handler uses a `sizeof(long)` space on the stack,
+you should keep the number of fprobes on the same function as small as
+possible.
+
 Share the callbacks with kprobes
 ================================
 
@@ -165,8 +177,8 @@ This counter counts up when;
  - fprobe fails to take ftrace_recursion lock. This usually means that a function
    which is traced by other ftrace users is called from the entry_handler.
 
- - fprobe fails to setup the function exit because of the shortage of rethook
-   (the shadow stack for hooking the function return.)
+ - fprobe fails to setup the function exit because of failing to allocate the
+   data buffer from the per-task shadow stack.
 
 The `fprobe::nmissed` field counts up in both cases. Therefore, the former
 skips both of entry and exit callback and the latter skips the exit
diff --git a/Documentation/trace/ftrace-design.rst b/Documentation/trace/ftrace-design.rst
index 6893399157f0..dc82d64b3a44 100644
--- a/Documentation/trace/ftrace-design.rst
+++ b/Documentation/trace/ftrace-design.rst
@@ -217,18 +217,6 @@ along to ftrace_push_return_trace() instead of a stub value of 0.
 
 Similarly, when you call ftrace_return_to_handler(), pass it the frame pointer.
 
-HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
---------------------------------
-
-An arch may pass in a pointer to the return address on the stack.  This
-prevents potential stack unwinding issues where the unwinder gets out of
-sync with ret_stack and the wrong addresses are reported by
-ftrace_graph_ret_addr().
-
-Adding support for it is easy: just define the macro in asm/ftrace.h and
-pass the return address pointer as the 'retp' argument to
-ftrace_push_return_trace().
-
 HAVE_SYSCALL_TRACEPOINTS
 ------------------------
 
diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index 5aba74872ba7..af66a05e18cc 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -810,6 +810,12 @@ Here is the list of current tracers that may be configured.
 	to draw a graph of function calls similar to C code
 	source.
 
+	Note that the function graph calculates the timings of when the
+	function starts and returns internally and for each instance. If
+	there are two instances that run function graph tracer and traces
+	the same functions, the length of the timings may be slightly off as
+	each read the timestamp separately and not at the same time.
+
   "blk"
 
 	The block tracer. The tracer used by the blktrace user
@@ -1031,14 +1037,15 @@ explains which is which.
   CPU#: The CPU which the process was running on.
 
   irqs-off: 'd' interrupts are disabled. '.' otherwise.
-	.. caution:: If the architecture does not support a way to
-		read the irq flags variable, an 'X' will always
-		be printed here.
 
   need-resched:
+	- 'B' all, TIF_NEED_RESCHED, PREEMPT_NEED_RESCHED and TIF_RESCHED_LAZY is set,
 	- 'N' both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED is set,
 	- 'n' only TIF_NEED_RESCHED is set,
 	- 'p' only PREEMPT_NEED_RESCHED is set,
+	- 'L' both PREEMPT_NEED_RESCHED and TIF_RESCHED_LAZY is set,
+	- 'b' both TIF_NEED_RESCHED and TIF_RESCHED_LAZY is set,
+	- 'l' only TIF_RESCHED_LAZY is set
 	- '.' otherwise.
 
   hardirq/softirq:
@@ -1186,6 +1193,31 @@ Here are the available options:
   trace_printk
 	Can disable trace_printk() from writing into the buffer.
 
+  trace_printk_dest
+	Set to have trace_printk() and similar internal tracing functions
+	write into this instance. Note, only one trace instance can have
+	this set. By setting this flag, it clears the trace_printk_dest flag
+	of the instance that had it set previously. By default, the top
+	level trace has this set, and will get it set again if another
+	instance has it set then clears it.
+
+	This flag cannot be cleared by the top level instance, as it is the
+	default instance. The only way the top level instance has this flag
+	cleared, is by it being set in another instance.
+
+  copy_trace_marker
+	If there are applications that hard code writing into the top level
+	trace_marker file (/sys/kernel/tracing/trace_marker or trace_marker_raw),
+	and the tooling would like it to go into an instance, this option can
+	be used. Create an instance and set this option, and then all writes
+	into the top level trace_marker file will also be redirected into this
+	instance.
+
+	Note, by default this option is set for the top level instance. If it
+	is disabled, then writes to the trace_marker or trace_marker_raw files
+	will not be written into the top level file. If no instance has this
+	option set, then a write will error with the errno of ENODEV.
+
   annotate
 	It is sometimes confusing when the CPU buffers are full
 	and one CPU buffer had a lot of events recently, thus
@@ -3058,7 +3090,7 @@ Notice that we lost the sys_nanosleep.
   # cat set_ftrace_filter
   hrtimer_run_queues
   hrtimer_run_pending
-  hrtimer_init
+  hrtimer_setup
   hrtimer_cancel
   hrtimer_try_to_cancel
   hrtimer_forward
@@ -3096,7 +3128,7 @@ Again, now we want to append.
   # cat set_ftrace_filter
   hrtimer_run_queues
   hrtimer_run_pending
-  hrtimer_init
+  hrtimer_setup
   hrtimer_cancel
   hrtimer_try_to_cancel
   hrtimer_forward
diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst
index 989255eb5622..6eef28ebb0c7 100644
--- a/Documentation/trace/hisi-ptt.rst
+++ b/Documentation/trace/hisi-ptt.rst
@@ -40,7 +40,7 @@ IO dies (SICL, Super I/O Cluster), where there's one PCIe Root
 Complex for each SICL.
 ::
 
-    /sys/devices/hisi_ptt<sicl_id>_<core_id>
+    /sys/bus/event_source/devices/hisi_ptt<sicl_id>_<core_id>
 
 Tune
 ====
@@ -53,7 +53,7 @@ Each event is presented as a file under $(PTT PMU dir)/tune, and
 a simple open/read/write/close cycle will be used to tune the event.
 ::
 
-    $ cd /sys/devices/hisi_ptt<sicl_id>_<core_id>/tune
+    $ cd /sys/bus/event_source/devices/hisi_ptt<sicl_id>_<core_id>/tune
     $ ls
     qos_tx_cpl    qos_tx_np    qos_tx_p
     tx_path_rx_req_alloc_buf_level
diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index 3c9b263de9c2..0aada18c38c6 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -81,7 +81,7 @@ Documentation written by Tom Zanussi
 	.usecs         display a common_timestamp in microseconds
         .percent       display a number of percentage value
         .graph         display a bar-graph of a value
-	.stacktrace    display as a stacktrace (must by a long[] type)
+	.stacktrace    display as a stacktrace (must be a long[] type)
 	=============  =================================================
 
   Note that in general the semantics of a given field aren't
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 5092d6c13af5..cc1dc5a087e8 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -1,37 +1,103 @@
-==========================
-Linux Tracing Technologies
-==========================
+================================
+Linux Tracing Technologies Guide
+================================
+
+Tracing in the Linux kernel is a powerful mechanism that allows
+developers and system administrators to analyze and debug system
+behavior. This guide provides documentation on various tracing
+frameworks and tools available in the Linux kernel.
+
+Introduction to Tracing
+-----------------------
+
+This section provides an overview of Linux tracing mechanisms
+and debugging approaches.
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
 
-   ftrace-design
+   debugging
+   tracepoints
    tracepoint-analysis
+   ring-buffer-map
+
+Core Tracing Frameworks
+-----------------------
+
+The following are the primary tracing frameworks integrated into
+the Linux kernel.
+
+.. toctree::
+   :maxdepth: 1
+
    ftrace
+   ftrace-design
    ftrace-uses
-   fprobe
    kprobes
    kprobetrace
-   uprobetracer
    fprobetrace
-   tracepoints
+   fprobe
+   ring-buffer-design
+
+Event Tracing and Analysis
+--------------------------
+
+A detailed explanation of event tracing mechanisms and their
+applications.
+
+.. toctree::
+   :maxdepth: 1
+
    events
    events-kmem
    events-power
    events-nmi
    events-msr
-   mmiotrace
+   boottime-trace
    histogram
    histogram-design
-   boottime-trace
-   hwlat_detector
-   osnoise-tracer
-   timerlat-tracer
+
+Hardware and Performance Tracing
+--------------------------------
+
+This section covers tracing features that monitor hardware
+interactions and system performance.
+
+.. toctree::
+   :maxdepth: 1
+
    intel_th
-   ring-buffer-design
    stm
    sys-t
    coresight/index
-   user_events
    rv/index
    hisi-ptt
+   mmiotrace
+   hwlat_detector
+   osnoise-tracer
+   timerlat-tracer
+
+User-Space Tracing
+------------------
+
+These tools allow tracing user-space applications and
+interactions.
+
+.. toctree::
+   :maxdepth: 1
+
+   user_events
+   uprobetracer
+
+Additional Resources
+--------------------
+
+For more details, refer to the respective documentation of each
+tracing tool and framework.
+
+.. only:: subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 69cb7776ae99..3b6791c17e9b 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -58,8 +58,9 @@ Synopsis of kprobe_events
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
 		  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
-		  (x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr"
-                  and bitfield are supported.
+		  (x8/x16/x32/x64), VFS layer common type(%pd/%pD), "char",
+                  "string", "ustring", "symbol", "symstr" and bitfield are
+                  supported.
 
   (\*1) only for the probe on function entry (offs == 0). Note, this argument access
         is best effort, because depending on the argument type, it may be passed on
@@ -122,6 +123,9 @@ With 'symstr' type, you can filter the event with wildcard pattern of the
 symbols, and you don't need to solve symbol name by yourself.
 For $comm, the default type is "string"; any other type is invalid.
 
+VFS layer common type(%pd/%pD) is a special type, which fetches dentry's or
+file's name from struct dentry's address or struct file's address.
+
 .. _user_mem_access:
 
 User Memory Access
diff --git a/Documentation/trace/osnoise-tracer.rst b/Documentation/trace/osnoise-tracer.rst
index 140ef2533d26..a520adbd3476 100644
--- a/Documentation/trace/osnoise-tracer.rst
+++ b/Documentation/trace/osnoise-tracer.rst
@@ -108,7 +108,7 @@ The tracer has a set of options inside the osnoise directory, they are:
    option.
  - tracing_threshold: the minimum delta between two time() reads to be
    considered as noise, in us. When set to 0, the default value will
-   be used, which is currently 5 us.
+   be used, which is currently 1 us.
  - osnoise/options: a set of on/off options that can be enabled by
    writing the option name to the file or disabled by writing the option
    name preceded with the 'NO\_' prefix. For example, writing
diff --git a/Documentation/trace/postprocess/decode_msr.py b/Documentation/trace/postprocess/decode_msr.py
index aa9cc7abd5c2..f5609b16f589 100644
--- a/Documentation/trace/postprocess/decode_msr.py
+++ b/Documentation/trace/postprocess/decode_msr.py
@@ -32,6 +32,6 @@ for j in sys.stdin:
 					break
 		if r:
 			j = j.replace(" " + m.group(2), " " + r + "(" + m.group(2) + ")")
-	print j,
+	print(j)
 
 
diff --git a/Documentation/trace/ring-buffer-map.rst b/Documentation/trace/ring-buffer-map.rst
new file mode 100644
index 000000000000..8e296bcc0d7f
--- /dev/null
+++ b/Documentation/trace/ring-buffer-map.rst
@@ -0,0 +1,106 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Tracefs ring-buffer memory mapping
+==================================
+
+:Author: Vincent Donnefort <vdonnefort@google.com>
+
+Overview
+========
+Tracefs ring-buffer memory map provides an efficient method to stream data
+as no memory copy is necessary. The application mapping the ring-buffer becomes
+then a consumer for that ring-buffer, in a similar fashion to trace_pipe.
+
+Memory mapping setup
+====================
+The mapping works with a mmap() of the trace_pipe_raw interface.
+
+The first system page of the mapping contains ring-buffer statistics and
+description. It is referred to as the meta-page. One of the most important
+fields of the meta-page is the reader. It contains the sub-buffer ID which can
+be safely read by the mapper (see ring-buffer-design.rst).
+
+The meta-page is followed by all the sub-buffers, ordered by ascending ID. It is
+therefore effortless to know where the reader starts in the mapping:
+
+.. code-block:: c
+
+        reader_id = meta->reader->id;
+        reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;
+
+When the application is done with the current reader, it can get a new one using
+the trace_pipe_raw ioctl() TRACE_MMAP_IOCTL_GET_READER. This ioctl also updates
+the meta-page fields.
+
+Limitations
+===========
+When a mapping is in place on a Tracefs ring-buffer, it is not possible to
+either resize it (either by increasing the entire size of the ring-buffer or
+each subbuf). It is also not possible to use snapshot and causes splice to copy
+the ring buffer data instead of using the copyless swap from the ring buffer.
+
+Concurrent readers (either another application mapping that ring-buffer or the
+kernel with trace_pipe) are allowed but not recommended. They will compete for
+the ring-buffer and the output is unpredictable, just like concurrent readers on
+trace_pipe would be.
+
+Example
+=======
+
+.. code-block:: c
+
+        #include <fcntl.h>
+        #include <stdio.h>
+        #include <stdlib.h>
+        #include <unistd.h>
+
+        #include <linux/trace_mmap.h>
+
+        #include <sys/mman.h>
+        #include <sys/ioctl.h>
+
+        #define TRACE_PIPE_RAW "/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw"
+
+        int main(void)
+        {
+                int page_size = getpagesize(), fd, reader_id;
+                unsigned long meta_len, data_len;
+                struct trace_buffer_meta *meta;
+                void *map, *reader, *data;
+
+                fd = open(TRACE_PIPE_RAW, O_RDONLY | O_NONBLOCK);
+                if (fd < 0)
+                        exit(EXIT_FAILURE);
+
+                map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
+                if (map == MAP_FAILED)
+                        exit(EXIT_FAILURE);
+
+                meta = (struct trace_buffer_meta *)map;
+                meta_len = meta->meta_page_size;
+
+                printf("entries:        %llu\n", meta->entries);
+                printf("overrun:        %llu\n", meta->overrun);
+                printf("read:           %llu\n", meta->read);
+                printf("nr_subbufs:     %u\n", meta->nr_subbufs);
+
+                data_len = meta->subbuf_size * meta->nr_subbufs;
+                data = mmap(NULL, data_len, PROT_READ, MAP_SHARED, fd, meta_len);
+                if (data == MAP_FAILED)
+                        exit(EXIT_FAILURE);
+
+                if (ioctl(fd, TRACE_MMAP_IOCTL_GET_READER) < 0)
+                        exit(EXIT_FAILURE);
+
+                reader_id = meta->reader.id;
+                reader = data + meta->subbuf_size * reader_id;
+
+                printf("Current reader address: %p\n", reader);
+
+                munmap(data, data_len);
+                munmap(meta, meta_len);
+                close (fd);
+
+                return 0;
+        }
diff --git a/Documentation/trace/rv/index.rst b/Documentation/trace/rv/index.rst
index 15fa966102c0..e80e0057feb4 100644
--- a/Documentation/trace/rv/index.rst
+++ b/Documentation/trace/rv/index.rst
@@ -12,3 +12,4 @@ Runtime Verification
    da_monitor_instrumentation.rst
    monitor_wip.rst
    monitor_wwnr.rst
+   monitor_sched.rst
diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
new file mode 100644
index 000000000000..24b2c62a3bc2
--- /dev/null
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -0,0 +1,171 @@
+Scheduler monitors
+==================
+
+- Name: sched
+- Type: container for multiple monitors
+- Author: Gabriele Monaco <gmonaco@redhat.com>, Daniel Bristot de Oliveira <bristot@kernel.org>
+
+Description
+-----------
+
+Monitors describing complex systems, such as the scheduler, can easily grow to
+the point where they are just hard to understand because of the many possible
+state transitions.
+Often it is possible to break such descriptions into smaller monitors,
+sharing some or all events. Enabling those smaller monitors concurrently is,
+in fact, testing the system as if we had one single larger monitor.
+Splitting models into multiple specification is not only easier to
+understand, but gives some more clues when we see errors.
+
+The sched monitor is a set of specifications to describe the scheduler behaviour.
+It includes several per-cpu and per-task monitors that work independently to verify
+different specifications the scheduler should follow.
+
+To make this system as straightforward as possible, sched specifications are *nested*
+monitors, whereas sched itself is a *container*.
+From the interface perspective, sched includes other monitors as sub-directories,
+enabling/disabling or setting reactors to sched, propagates the change to all monitors,
+however single monitors can be used independently as well.
+
+It is important that future modules are built after their container (sched, in
+this case), otherwise the linker would not respect the order and the nesting
+wouldn't work as expected.
+To do so, simply add them after sched in the Makefile.
+
+Specifications
+--------------
+
+The specifications included in sched are currently a work in progress, adapting the ones
+defined in by Daniel Bristot in [1].
+
+Currently we included the following:
+
+Monitor tss
+~~~~~~~~~~~
+
+The task switch while scheduling (tss) monitor ensures a task switch happens
+only in scheduling context, that is inside a call to `__schedule`::
+
+                     |
+                     |
+                     v
+                   +-----------------+
+                   |     thread      | <+
+                   +-----------------+  |
+                     |                  |
+                     | schedule_entry   | schedule_exit
+                     v                  |
+    sched_switch                        |
+  +---------------                      |
+  |                       sched         |
+  +-------------->                     -+
+
+Monitor sco
+~~~~~~~~~~~
+
+The scheduling context operations (sco) monitor ensures changes in a task state
+happen only in thread context::
+
+
+                        |
+                        |
+                        v
+    sched_set_state   +------------------+
+  +------------------ |                  |
+  |                   |  thread_context  |
+  +-----------------> |                  | <+
+                      +------------------+  |
+                        |                   |
+                        | schedule_entry    | schedule_exit
+                        v                   |
+                                            |
+                       scheduling_context  -+
+
+Monitor snroc
+~~~~~~~~~~~~~
+
+The set non runnable on its own context (snroc) monitor ensures changes in a
+task state happens only in the respective task's context. This is a per-task
+monitor::
+
+                        |
+                        |
+                        v
+                      +------------------+
+                      |  other_context   | <+
+                      +------------------+  |
+                        |                   |
+                        | sched_switch_in   | sched_switch_out
+                        v                   |
+    sched_set_state                         |
+  +------------------                       |
+  |                       own_context       |
+  +----------------->                      -+
+
+Monitor scpd
+~~~~~~~~~~~~
+
+The schedule called with preemption disabled (scpd) monitor ensures schedule is
+called with preemption disabled::
+
+                       |
+                       |
+                       v
+                     +------------------+
+                     |    cant_sched    | <+
+                     +------------------+  |
+                       |                   |
+                       | preempt_disable   | preempt_enable
+                       v                   |
+    schedule_entry                         |
+    schedule_exit                          |
+  +-----------------      can_sched        |
+  |                                        |
+  +---------------->                      -+
+
+Monitor snep
+~~~~~~~~~~~~
+
+The schedule does not enable preempt (snep) monitor ensures a schedule call
+does not enable preemption::
+
+                        |
+                        |
+                        v
+    preempt_disable   +------------------------+
+    preempt_enable    |                        |
+  +------------------ | non_scheduling_context |
+  |                   |                        |
+  +-----------------> |                        | <+
+                      +------------------------+  |
+                        |                         |
+                        | schedule_entry          | schedule_exit
+                        v                         |
+                                                  |
+                          scheduling_contex      -+
+
+Monitor sncid
+~~~~~~~~~~~~~
+
+The schedule not called with interrupt disabled (sncid) monitor ensures
+schedule is not called with interrupt disabled::
+
+                       |
+                       |
+                       v
+    schedule_entry   +--------------+
+    schedule_exit    |              |
+  +----------------- |  can_sched   |
+  |                  |              |
+  +----------------> |              | <+
+                     +--------------+  |
+                       |               |
+                       | irq_disable   | irq_enable
+                       v               |
+                                       |
+                        cant_sched    -+
+
+References
+----------
+
+[1] - https://bristot.me/linux-task-model
diff --git a/Documentation/trace/rv/runtime-verification.rst b/Documentation/trace/rv/runtime-verification.rst
index dae78dfa7cdc..c700dde9259c 100644
--- a/Documentation/trace/rv/runtime-verification.rst
+++ b/Documentation/trace/rv/runtime-verification.rst
@@ -8,14 +8,14 @@ checking* and *theorem proving*) with a more practical approach for complex
 systems.
 
 Instead of relying on a fine-grained model of a system (e.g., a
-re-implementation a instruction level), RV works by analyzing the trace of the
+re-implementation at instruction level), RV works by analyzing the trace of the
 system's actual execution, comparing it against a formal specification of
 the system behavior.
 
 The main advantage is that RV can give precise information on the runtime
 behavior of the monitored system, without the pitfalls of developing models
 that require a re-implementation of the entire system in a modeling language.
-Moreover, given an efficient monitoring method, it is possible execute an
+Moreover, given an efficient monitoring method, it is possible to execute an
 *online* verification of a system, enabling the *reaction* for unexpected
 events, avoiding, for example, the propagation of a failure on safety-critical
 systems.
diff --git a/Documentation/trace/tracepoints.rst b/Documentation/trace/tracepoints.rst
index decabcc77b56..b35c40e3abbe 100644
--- a/Documentation/trace/tracepoints.rst
+++ b/Documentation/trace/tracepoints.rst
@@ -71,7 +71,7 @@ In subsys/file.c (where the tracing statement must be added)::
 	void somefct(void)
 	{
 		...
-		trace_subsys_eventname(arg, task);
+		trace_subsys_eventname_tp(arg, task);
 		...
 	}
 
@@ -129,12 +129,12 @@ within an if statement with the following::
 		for (i = 0; i < count; i++)
 			tot += calculate_nuggets();
 
-		trace_foo_bar(tot);
+		trace_foo_bar_tp(tot);
 	}
 
-All trace_<tracepoint>() calls have a matching trace_<tracepoint>_enabled()
+All trace_<tracepoint>_tp() calls have a matching trace_<tracepoint>_enabled()
 function defined that returns true if the tracepoint is enabled and
-false otherwise. The trace_<tracepoint>() should always be within the
+false otherwise. The trace_<tracepoint>_tp() should always be within the
 block of the if (trace_<tracepoint>_enabled()) to prevent races between
 the tracepoint being enabled and the check being seen.
 
@@ -143,7 +143,10 @@ the static_key of the tracepoint to allow the if statement to be implemented
 with jump labels and avoid conditional branches.
 
 .. note:: The convenience macro TRACE_EVENT provides an alternative way to
-      define tracepoints. Check http://lwn.net/Articles/379903,
+      define tracepoints. Note, DECLARE_TRACE(foo) creates a function
+      "trace_foo_tp()" whereas TRACE_EVENT(foo) creates a function
+      "trace_foo()", and also exposes the tracepoint as a trace event in
+      /sys/kernel/tracing/events directory.  Check http://lwn.net/Articles/379903,
       http://lwn.net/Articles/381064 and http://lwn.net/Articles/383362
       for a series of articles with more details.
 
@@ -159,7 +162,9 @@ In a C file::
 
 	void do_trace_foo_bar_wrapper(args)
 	{
-		trace_foo_bar(args);
+		trace_foo_bar_tp(args); // for tracepoints created via DECLARE_TRACE
+					//   or
+		trace_foo_bar(args);    // for tracepoints created via TRACE_EVENT
 	}
 
 In the header file::