diff options
Diffstat (limited to 'Documentation/trace')
-rw-r--r-- | Documentation/trace/coresight/coresight-perf.rst | 31 | ||||
-rw-r--r-- | Documentation/trace/coresight/coresight.rst | 41 | ||||
-rw-r--r-- | Documentation/trace/coresight/panic.rst | 362 | ||||
-rw-r--r-- | Documentation/trace/debugging.rst | 161 | ||||
-rw-r--r-- | Documentation/trace/events.rst | 24 | ||||
-rw-r--r-- | Documentation/trace/fprobe.rst | 42 | ||||
-rw-r--r-- | Documentation/trace/ftrace-design.rst | 12 | ||||
-rw-r--r-- | Documentation/trace/ftrace.rst | 42 | ||||
-rw-r--r-- | Documentation/trace/hisi-ptt.rst | 4 | ||||
-rw-r--r-- | Documentation/trace/histogram.rst | 2 | ||||
-rw-r--r-- | Documentation/trace/index.rst | 96 | ||||
-rw-r--r-- | Documentation/trace/kprobetrace.rst | 8 | ||||
-rw-r--r-- | Documentation/trace/osnoise-tracer.rst | 2 | ||||
-rw-r--r-- | Documentation/trace/postprocess/decode_msr.py | 2 | ||||
-rw-r--r-- | Documentation/trace/ring-buffer-map.rst | 106 | ||||
-rw-r--r-- | Documentation/trace/rv/index.rst | 1 | ||||
-rw-r--r-- | Documentation/trace/rv/monitor_sched.rst | 171 | ||||
-rw-r--r-- | Documentation/trace/rv/runtime-verification.rst | 4 | ||||
-rw-r--r-- | Documentation/trace/tracepoints.rst | 17 |
19 files changed, 1041 insertions, 87 deletions
diff --git a/Documentation/trace/coresight/coresight-perf.rst b/Documentation/trace/coresight/coresight-perf.rst index d087aae7d492..30be89320621 100644 --- a/Documentation/trace/coresight/coresight-perf.rst +++ b/Documentation/trace/coresight/coresight-perf.rst @@ -78,6 +78,37 @@ enabled like:: Please refer to the kernel configuration help for more information. +Fine-grained tracing with AUX pause and resume +---------------------------------------------- + +Arm CoreSight may generate a large amount of hardware trace data, which +will lead to overhead in recording and distract users when reviewing +profiling result. To mitigate the issue of excessive trace data, Perf +provides AUX pause and resume functionality for fine-grained tracing. + +The AUX pause and resume can be triggered by associated events. These +events can be ftrace tracepoints (including static and dynamic +tracepoints) or PMU events (e.g. CPU PMU cycle event). To create a perf +session with AUX pause / resume, three configuration terms are +introduced: + +- "aux-action=start-paused": it is specified for the cs_etm PMU event to + launch in a paused state. +- "aux-action=pause": an associated event is specified with this term + to pause AUX trace. +- "aux-action=resume": an associated event is specified with this term + to resume AUX trace. + +Example for triggering AUX pause and resume with ftrace tracepoints:: + + perf record -e cs_etm/aux-action=start-paused/k,syscalls:sys_enter_openat/aux-action=resume/,syscalls:sys_exit_openat/aux-action=pause/ ls + +Example for triggering AUX pause and resume with PMU event:: + + perf record -a -e cs_etm/aux-action=start-paused/k \ + -e cycles/aux-action=pause,period=10000000/ \ + -e cycles/aux-action=resume,period=1050000/ -- sleep 1 + Perf test - Verify kernel and userspace perf CoreSight work ----------------------------------------------------------- diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst index d4f93d6a2d63..806699871b80 100644 --- a/Documentation/trace/coresight/coresight.rst +++ b/Documentation/trace/coresight/coresight.rst @@ -462,44 +462,35 @@ queried by the perf command line tool: cs_etm// [Kernel PMU event] - linaro@linaro-nano:~$ - Regardless of the number of tracers available in a system (usually equal to the amount of processor cores), the "cs_etm" PMU will be listed only once. A Coresight PMU works the same way as any other PMU, i.e the name of the PMU is -listed along with configuration options within forward slashes '/'. Since a -Coresight system will typically have more than one sink, the name of the sink to -work with needs to be specified as an event option. -On newer kernels the available sinks are listed in sysFS under +provided along with configuration options within forward slashes '/' (see +`Config option formats`_). + +Advanced Perf framework usage +----------------------------- + +Sink selection +~~~~~~~~~~~~~~ + +An appropriate sink will be selected automatically for use with Perf, but since +there will typically be more than one sink, the name of the sink to use may be +specified as a special config option prefixed with '@'. + +The available sinks are listed in sysFS under ($SYSFS)/bus/event_source/devices/cs_etm/sinks/:: root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls tmc_etf0 tmc_etr0 tpiu0 -On older kernels, this may need to be found from the list of coresight devices, -available under ($SYSFS)/bus/coresight/devices/:: - - root:~# ls /sys/bus/coresight/devices/ - etm0 etm1 etm2 etm3 etm4 etm5 funnel0 - funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0 root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program -As mentioned above in section "Device Naming scheme", the names of the devices could -look different from what is used in the example above. One must use the device names -as it appears under the sysFS. - -The syntax within the forward slashes '/' is important. The '@' character -tells the parser that a sink is about to be specified and that this is the sink -to use for the trace session. - More information on the above and other example on how to use Coresight with the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub repository [#third]_. -Advanced perf framework usage ------------------------------ - AutoFDO analysis using the perf tools ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -508,7 +499,7 @@ perf can be used to record and analyze trace of programs. Execution can be recorded using 'perf record' with the cs_etm event, specifying the name of the sink to record to, e.g:: - perf record -e cs_etm/@tmc_etr0/u --per-thread + perf record -e cs_etm//u --per-thread The 'perf report' and 'perf script' commands can be used to analyze execution, synthesizing instruction and branch events from the instruction trace. @@ -572,7 +563,7 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto Bubble sorting array of 30000 elements 5910 ms - $ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort + $ perf record -e cs_etm//u --per-thread taskset -c 2 ./sort Bubble sorting array of 30000 elements 12543 ms [ perf record: Woken up 35 times to write data ] diff --git a/Documentation/trace/coresight/panic.rst b/Documentation/trace/coresight/panic.rst new file mode 100644 index 000000000000..6e4bde953cae --- /dev/null +++ b/Documentation/trace/coresight/panic.rst @@ -0,0 +1,362 @@ +=================================================== +Using Coresight for Kernel panic and Watchdog reset +=================================================== + +Introduction +------------ +This documentation is about using Linux coresight trace support to +debug kernel panic and watchdog reset scenarios. + +Coresight trace during Kernel panic +----------------------------------- +From the coresight driver point of view, addressing the kernel panic +situation has four main requirements. + +a. Support for allocation of trace buffer pages from reserved memory area. + Platform can advertise this using a new device tree property added to + relevant coresight nodes. + +b. Support for stopping coresight blocks at the time of panic + +c. Saving required metadata in the specified format + +d. Support for reading trace data captured at the time of panic + +Allocation of trace buffer pages from reserved RAM +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A new optional device tree property "memory-region" is added to the +Coresight TMC device nodes, that would give the base address and size of trace +buffer. + +Static allocation of trace buffers would ensure that both IOMMU enabled +and disabled cases are handled. Also, platforms that support persistent +RAM will allow users to read trace data in the subsequent boot without +booting the crashdump kernel. + +Note: +For ETR sink devices, this reserved region will be used for both trace +capture and trace data retrieval. +For ETF sink devices, internal SRAM would be used for trace capture, +and they would be synced to reserved region for retrieval. + + +Disabling coresight blocks at the time of panic +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In order to avoid the situation of losing relevant trace data after a +kernel panic, it would be desirable to stop the coresight blocks at the +time of panic. + +This can be achieved by configuring the comparator, CTI and sink +devices as below:: + + Trigger on panic + Comparator --->External out --->CTI -->External In---->ETR/ETF stop + +Saving metadata at the time of kernel panic +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Coresight metadata involves all additional data that are required for a +successful trace decode in addition to the trace data. This involves +ETR/ETF/ETB register snapshot etc. + +A new optional device property "memory-region" is added to +the ETR/ETF/ETB device nodes for this. + +Reading trace data captured at the time of panic +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Trace data captured at the time of panic, can be read from rebooted kernel +or from crashdump kernel using a special device file /dev/crash_tmc_xxx. +This device file is created only when there is a valid crashdata available. + +General flow of trace capture and decode in case of kernel panic +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +1. Enable source and sink on all the cores using the sysfs interface. + ETR sinks should have trace buffers allocated from reserved memory, + by selecting "resrv" buffer mode from sysfs. + +2. Run relevant tests. + +3. On a kernel panic, all coresight blocks are disabled, necessary + metadata is synced by kernel panic handler. + + System would eventually reboot or boot a crashdump kernel. + +4. For platforms that supports crashdump kernel, raw trace data can be + dumped using the coresight sysfs interface from the crashdump kernel + itself. Persistent RAM is not a requirement in this case. + +5. For platforms that supports persistent RAM, trace data can be dumped + using the coresight sysfs interface in the subsequent Linux boot. + Crashdump kernel is not a requirement in this case. Persistent RAM + ensures that trace data is intact across reboot. + +Coresight trace during Watchdog reset +------------------------------------- +The main difference between addressing the watchdog reset and kernel panic +case are below, + +a. Saving coresight metadata need to be taken care by the + SCP(system control processor) firmware in the specified format, + instead of kernel. + +b. Reserved memory region given by firmware for trace buffer and metadata + has to be in persistent RAM. + Note: This is a requirement for watchdog reset case but optional + in kernel panic case. + +Watchdog reset can be supported only on platforms that meet the above +two requirements. + +Sample commands for testing a Kernel panic case with ETR sink +------------------------------------------------------------- + +1. Boot Linux kernel with "crash_kexec_post_notifiers" added to the kernel + bootargs. This is mandatory if the user would like to read the tracedata + from the crashdump kernel. + +2. Enable the preloaded ETM configuration:: + + #echo 1 > /sys/kernel/config/cs-syscfg/configurations/panicstop/enable + +3. Configure CTI using sysfs interface:: + + #./cti_setup.sh + + #cat cti_setup.sh + + + cd /sys/bus/coresight/devices/ + + ap_cti_config () { + #ETM trig out[0] trigger to Channel 0 + echo 0 4 > channels/trigin_attach + } + + etf_cti_config () { + #ETF Flush in trigger from Channel 0 + echo 0 1 > channels/trigout_attach + echo 1 > channels/trig_filter_enable + } + + etr_cti_config () { + #ETR Flush in from Channel 0 + echo 0 1 > channels/trigout_attach + echo 1 > channels/trig_filter_enable + } + + ctidevs=`find . -name "cti*"` + + for i in $ctidevs + do + cd $i + + connection=`find . -name "ete*"` + if [ ! -z "$connection" ] + then + echo "AP CTI config for $i" + ap_cti_config + fi + + connection=`find . -name "tmc_etf*"` + if [ ! -z "$connection" ] + then + echo "ETF CTI config for $i" + etf_cti_config + fi + + connection=`find . -name "tmc_etr*"` + if [ ! -z "$connection" ] + then + echo "ETR CTI config for $i" + etr_cti_config + fi + + cd .. + done + +Note: CTI connections are SOC specific and hence the above script is +added just for reference. + +4. Choose reserved buffer mode for ETR buffer:: + + #echo "resrv" > /sys/bus/coresight/devices/tmc_etr0/buf_mode_preferred + +5. Enable stop on flush trigger configuration:: + + #echo 1 > /sys/bus/coresight/devices/tmc_etr0/stop_on_flush + +6. Start Coresight tracing on cores 1 and 2 using sysfs interface + +7. Run some application on core 1:: + + #taskset -c 1 dd if=/dev/urandom of=/dev/null & + +8. Invoke kernel panic on core 2:: + + #echo 1 > /proc/sys/kernel/panic + #taskset -c 2 echo c > /proc/sysrq-trigger + +9. From rebooted kernel or crashdump kernel, read crashdata:: + + #dd if=/dev/crash_tmc_etr0 of=/trace/cstrace.bin + +10. Run opencsd decoder tools/scripts to generate the instruction trace. + +Sample instruction trace dump +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Core1 dump:: + + A etm4_enable_hw: ffff800008ae1dd4 + CONTEXT EL2 etm4_enable_hw: ffff800008ae1dd4 + I etm4_enable_hw: ffff800008ae1dd4: + d503201f nop + I etm4_enable_hw: ffff800008ae1dd8: + d503201f nop + I etm4_enable_hw: ffff800008ae1ddc: + d503201f nop + I etm4_enable_hw: ffff800008ae1de0: + d503201f nop + I etm4_enable_hw: ffff800008ae1de4: + d503201f nop + I etm4_enable_hw: ffff800008ae1de8: + d503233f paciasp + I etm4_enable_hw: ffff800008ae1dec: + a9be7bfd stp x29, x30, [sp, #-32]! + I etm4_enable_hw: ffff800008ae1df0: + 910003fd mov x29, sp + I etm4_enable_hw: ffff800008ae1df4: + a90153f3 stp x19, x20, [sp, #16] + I etm4_enable_hw: ffff800008ae1df8: + 2a0003f4 mov w20, w0 + I etm4_enable_hw: ffff800008ae1dfc: + 900085b3 adrp x19, ffff800009b95000 <reserved_mem+0xc48> + I etm4_enable_hw: ffff800008ae1e00: + 910f4273 add x19, x19, #0x3d0 + I etm4_enable_hw: ffff800008ae1e04: + f8747a60 ldr x0, [x19, x20, lsl #3] + E etm4_enable_hw: ffff800008ae1e08: + b4000140 cbz x0, ffff800008ae1e30 <etm4_starting_cpu+0x50> + I 149.039572921 etm4_enable_hw: ffff800008ae1e30: + a94153f3 ldp x19, x20, [sp, #16] + I 149.039572921 etm4_enable_hw: ffff800008ae1e34: + 52800000 mov w0, #0x0 // #0 + I 149.039572921 etm4_enable_hw: ffff800008ae1e38: + a8c27bfd ldp x29, x30, [sp], #32 + + ..snip + + 149.052324811 chacha_block_generic: ffff800008642d80: + 9100a3e0 add x0, + I 149.052324811 chacha_block_generic: ffff800008642d84: + b86178a2 ldr w2, [x5, x1, lsl #2] + I 149.052324811 chacha_block_generic: ffff800008642d88: + 8b010803 add x3, x0, x1, lsl #2 + I 149.052324811 chacha_block_generic: ffff800008642d8c: + b85fc063 ldur w3, [x3, #-4] + I 149.052324811 chacha_block_generic: ffff800008642d90: + 0b030042 add w2, w2, w3 + I 149.052324811 chacha_block_generic: ffff800008642d94: + b8217882 str w2, [x4, x1, lsl #2] + I 149.052324811 chacha_block_generic: ffff800008642d98: + 91000421 add x1, x1, #0x1 + I 149.052324811 chacha_block_generic: ffff800008642d9c: + f100443f cmp x1, #0x11 + + +Core 2 dump:: + + A etm4_enable_hw: ffff800008ae1dd4 + CONTEXT EL2 etm4_enable_hw: ffff800008ae1dd4 + I etm4_enable_hw: ffff800008ae1dd4: + d503201f nop + I etm4_enable_hw: ffff800008ae1dd8: + d503201f nop + I etm4_enable_hw: ffff800008ae1ddc: + d503201f nop + I etm4_enable_hw: ffff800008ae1de0: + d503201f nop + I etm4_enable_hw: ffff800008ae1de4: + d503201f nop + I etm4_enable_hw: ffff800008ae1de8: + d503233f paciasp + I etm4_enable_hw: ffff800008ae1dec: + a9be7bfd stp x29, x30, [sp, #-32]! + I etm4_enable_hw: ffff800008ae1df0: + 910003fd mov x29, sp + I etm4_enable_hw: ffff800008ae1df4: + a90153f3 stp x19, x20, [sp, #16] + I etm4_enable_hw: ffff800008ae1df8: + 2a0003f4 mov w20, w0 + I etm4_enable_hw: ffff800008ae1dfc: + 900085b3 adrp x19, ffff800009b95000 <reserved_mem+0xc48> + I etm4_enable_hw: ffff800008ae1e00: + 910f4273 add x19, x19, #0x3d0 + I etm4_enable_hw: ffff800008ae1e04: + f8747a60 ldr x0, [x19, x20, lsl #3] + E etm4_enable_hw: ffff800008ae1e08: + b4000140 cbz x0, ffff800008ae1e30 <etm4_starting_cpu+0x50> + I 149.046243445 etm4_enable_hw: ffff800008ae1e30: + a94153f3 ldp x19, x20, [sp, #16] + I 149.046243445 etm4_enable_hw: ffff800008ae1e34: + 52800000 mov w0, #0x0 // #0 + I 149.046243445 etm4_enable_hw: ffff800008ae1e38: + a8c27bfd ldp x29, x30, [sp], #32 + I 149.046243445 etm4_enable_hw: ffff800008ae1e3c: + d50323bf autiasp + E 149.046243445 etm4_enable_hw: ffff800008ae1e40: + d65f03c0 ret + A ete_sysreg_write: ffff800008adfa18 + + ..snip + + I 149.05422547 panic: ffff800008096300: + a90363f7 stp x23, x24, [sp, #48] + I 149.05422547 panic: ffff800008096304: + 6b00003f cmp w1, w0 + I 149.05422547 panic: ffff800008096308: + 3a411804 ccmn w0, #0x1, #0x4, ne // ne = any + N 149.05422547 panic: ffff80000809630c: + 540001e0 b.eq ffff800008096348 <panic+0xe0> // b.none + I 149.05422547 panic: ffff800008096310: + f90023f9 str x25, [sp, #64] + E 149.05422547 panic: ffff800008096314: + 97fe44ef bl ffff8000080276d0 <panic_smp_self_stop> + A panic: ffff80000809634c + I 149.05422547 panic: ffff80000809634c: + 910102d5 add x21, x22, #0x40 + I 149.05422547 panic: ffff800008096350: + 52800020 mov w0, #0x1 // #1 + E 149.05422547 panic: ffff800008096354: + 94166b8b bl ffff800008631180 <bust_spinlocks> + N 149.054225518 bust_spinlocks: ffff800008631180: + 340000c0 cbz w0, ffff800008631198 <bust_spinlocks+0x18> + I 149.054225518 bust_spinlocks: ffff800008631184: + f000a321 adrp x1, ffff800009a98000 <pbufs.0+0xbb8> + I 149.054225518 bust_spinlocks: ffff800008631188: + b9405c20 ldr w0, [x1, #92] + I 149.054225518 bust_spinlocks: ffff80000863118c: + 11000400 add w0, w0, #0x1 + I 149.054225518 bust_spinlocks: ffff800008631190: + b9005c20 str w0, [x1, #92] + E 149.054225518 bust_spinlocks: ffff800008631194: + d65f03c0 ret + A panic: ffff800008096358 + +Perf based testing +------------------ + +Starting perf session +~~~~~~~~~~~~~~~~~~~~~ +ETF:: + + perf record -e cs_etm/panicstop,@tmc_etf1/ -C 1 + perf record -e cs_etm/panicstop,@tmc_etf2/ -C 2 + +ETR:: + + perf record -e cs_etm/panicstop,@tmc_etr0/ -C 1,2 + +Reading trace data after panic +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Same sysfs based method explained above can be used to retrieve and +decode the trace data after the reboot on kernel panic. diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst new file mode 100644 index 000000000000..d54bc500af80 --- /dev/null +++ b/Documentation/trace/debugging.rst @@ -0,0 +1,161 @@ +============================== +Using the tracer for debugging +============================== + +Copyright 2024 Google LLC. + +:Author: Steven Rostedt <rostedt@goodmis.org> +:License: The GNU Free Documentation License, Version 1.2 + (dual licensed under the GPL v2) + +- Written for: 6.12 + +Introduction +------------ +The tracing infrastructure can be very useful for debugging the Linux +kernel. This document is a place to add various methods of using the tracer +for debugging. + +First, make sure that the tracefs file system is mounted:: + + $ sudo mount -t tracefs tracefs /sys/kernel/tracing + + +Using trace_printk() +-------------------- + +trace_printk() is a very lightweight utility that can be used in any context +inside the kernel, with the exception of "noinstr" sections. It can be used +in normal, softirq, interrupt and even NMI context. The trace data is +written to the tracing ring buffer in a lockless way. To make it even +lighter weight, when possible, it will only record the pointer to the format +string, and save the raw arguments into the buffer. The format and the +arguments will be post processed when the ring buffer is read. This way the +trace_printk() format conversions are not done during the hot path, where +the trace is being recorded. + +trace_printk() is meant only for debugging, and should never be added into +a subsystem of the kernel. If you need debugging traces, add trace events +instead. If a trace_printk() is found in the kernel, the following will +appear in the dmesg:: + + ********************************************************** + ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE ** + ** ** + ** trace_printk() being used. Allocating extra memory. ** + ** ** + ** This means that this is a DEBUG kernel and it is ** + ** unsafe for production use. ** + ** ** + ** If you see this message and you are not debugging ** + ** the kernel, report this immediately to your vendor! ** + ** ** + ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE ** + ********************************************************** + +Debugging kernel crashes +------------------------ +There is various methods of acquiring the state of the system when a kernel +crash occurs. This could be from the oops message in printk, or one could +use kexec/kdump. But these just show what happened at the time of the crash. +It can be very useful in knowing what happened up to the point of the crash. +The tracing ring buffer, by default, is a circular buffer than will +overwrite older events with newer ones. When a crash happens, the content of +the ring buffer will be all the events that lead up to the crash. + +There are several kernel command line parameters that can be used to help in +this. The first is "ftrace_dump_on_oops". This will dump the tracing ring +buffer when a oops occurs to the console. This can be useful if the console +is being logged somewhere. If a serial console is used, it may be prudent to +make sure the ring buffer is relatively small, otherwise the dumping of the +ring buffer may take several minutes to hours to finish. Here's an example +of the kernel command line:: + + ftrace_dump_on_oops trace_buf_size=50K + +Note, the tracing buffer is made up of per CPU buffers where each of these +buffers is broken up into sub-buffers that are by default PAGE_SIZE. The +above trace_buf_size option above sets each of the per CPU buffers to 50K, +so, on a machine with 8 CPUs, that's actually 400K total. + +Persistent buffers across boots +------------------------------- +If the system memory allows it, the tracing ring buffer can be specified at +a specific location in memory. If the location is the same across boots and +the memory is not modified, the tracing buffer can be retrieved from the +following boot. There's two ways to reserve memory for the use of the ring +buffer. + +The more reliable way (on x86) is to reserve memory with the "memmap" kernel +command line option and then use that memory for the trace_instance. This +requires a bit of knowledge of the physical memory layout of the system. The +advantage of using this method, is that the memory for the ring buffer will +always be the same:: + + memmap==12M$0x284500000 trace_instance=boot_map@0x284500000:12M + +The memmap above reserves 12 megabytes of memory at the physical memory +location 0x284500000. Then the trace_instance option will create a trace +instance "boot_map" at that same location with the same amount of memory +reserved. As the ring buffer is broke up into per CPU buffers, the 12 +megabytes will be broken up evenly between those CPUs. If you have 8 CPUs, +each per CPU ring buffer will be 1.5 megabytes in size. Note, that also +includes meta data, so the amount of memory actually used by the ring buffer +will be slightly smaller. + +Another more generic but less robust way to allocate a ring buffer mapping +at boot is with the "reserve_mem" option:: + + reserve_mem=12M:4096:trace trace_instance=boot_map@trace + +The reserve_mem option above will find 12 megabytes that are available at +boot up, and align it by 4096 bytes. It will label this memory as "trace" +that can be used by later command line options. + +The trace_instance option creates a "boot_map" instance and will use the +memory reserved by reserve_mem that was labeled as "trace". This method is +more generic but may not be as reliable. Due to KASLR, the memory reserved +by reserve_mem may not be located at the same location. If this happens, +then the ring buffer will not be from the previous boot and will be reset. + +Sometimes, by using a larger alignment, it can keep KASLR from moving things +around in such a way that it will move the location of the reserve_mem. By +using a larger alignment, you may find better that the buffer is more +consistent to where it is placed:: + + reserve_mem=12M:0x2000000:trace trace_instance=boot_map@trace + +On boot up, the memory reserved for the ring buffer is validated. It will go +through a series of tests to make sure that the ring buffer contains valid +data. If it is, it will then set it up to be available to read from the +instance. If it fails any of the tests, it will clear the entire ring buffer +and initialize it as new. + +The layout of this mapped memory may not be consistent from kernel to +kernel, so only the same kernel is guaranteed to work if the mapping is +preserved. Switching to a different kernel version may find a different +layout and mark the buffer as invalid. + +NB: Both the mapped address and size must be page aligned for the architecture. + +Using trace_printk() in the boot instance +----------------------------------------- +By default, the content of trace_printk() goes into the top level tracing +instance. But this instance is never preserved across boots. To have the +trace_printk() content, and some other internal tracing go to the preserved +buffer (like dump stacks), either set the instance to be the trace_printk() +destination from the kernel command line, or set it after boot up via the +trace_printk_dest option. + +After boot up:: + + echo 1 > /sys/kernel/tracing/instances/boot_map/options/trace_printk_dest + +From the kernel command line:: + + reserve_mem=12M:4096:trace trace_instance=boot_map^traceprintk^traceoff@trace + +If setting it from the kernel command line, it is recommended to also +disable tracing with the "traceoff" flag, and enable tracing after boot up. +Otherwise the trace from the most recent boot will be mixed with the trace +from the previous boot, and may make it confusing to read. diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst index 759907c20e75..2d88a2acacc0 100644 --- a/Documentation/trace/events.rst +++ b/Documentation/trace/events.rst @@ -55,6 +55,30 @@ command:: # echo 'irq:*' > /sys/kernel/tracing/set_event +The set_event file may also be used to enable events associated to only +a specific module:: + + # echo ':mod:<module>' > /sys/kernel/tracing/set_event + +Will enable all events in the module ``<module>``. If the module is not yet +loaded, the string will be saved and when a module is that matches ``<module>`` +is loaded, then it will apply the enabling of events then. + +The text before ``:mod:`` will be parsed to specify specific events that the +module creates:: + + # echo '<match>:mod:<module>' > /sys/kernel/tracing/set_event + +The above will enable any system or event that ``<match>`` matches. If +``<match>`` is ``"*"`` then it will match all events. + +To enable only a specific event within a system:: + + # echo '<system>:<event>:mod:<module>' > /sys/kernel/tracing/set_event + +If ``<event>`` is ``"*"`` then it will match all events within the system +for a given module. + 2.2 Via the 'enable' toggle --------------------------- diff --git a/Documentation/trace/fprobe.rst b/Documentation/trace/fprobe.rst index 196f52386aaa..71cd40472d36 100644 --- a/Documentation/trace/fprobe.rst +++ b/Documentation/trace/fprobe.rst @@ -9,9 +9,10 @@ Fprobe - Function entry/exit probe Introduction ============ -Fprobe is a function entry/exit probe mechanism based on ftrace. -Instead of using ftrace full feature, if you only want to attach callbacks -on function entry and exit, similar to the kprobes and kretprobes, you can +Fprobe is a function entry/exit probe based on the function-graph tracing +feature in ftrace. +Instead of tracing all functions, if you want to attach callbacks on specific +function entry and exit, similar to the kprobes and kretprobes, you can use fprobe. Compared with kprobes and kretprobes, fprobe gives faster instrumentation for multiple functions with single handler. This document describes how to use fprobe. @@ -91,12 +92,14 @@ The prototype of the entry/exit callback function are as follows: .. code-block:: c - int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct pt_regs *regs, void *entry_data); + int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data); - void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct pt_regs *regs, void *entry_data); + void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data); -Note that the @entry_ip is saved at function entry and passed to exit handler. -If the entry callback function returns !0, the corresponding exit callback will be cancelled. +Note that the @entry_ip is saved at function entry and passed to exit +handler. +If the entry callback function returns !0, the corresponding exit callback +will be cancelled. @fp This is the address of `fprobe` data structure related to this handler. @@ -112,12 +115,10 @@ If the entry callback function returns !0, the corresponding exit callback will This is the return address that the traced function will return to, somewhere in the caller. This can be used at both entry and exit. -@regs - This is the `pt_regs` data structure at the entry and exit. Note that - the instruction pointer of @regs may be different from the @entry_ip - in the entry_handler. If you need traced instruction pointer, you need - to use @entry_ip. On the other hand, in the exit_handler, the instruction - pointer of @regs is set to the current return address. +@fregs + This is the `ftrace_regs` data structure at the entry and exit. This + includes the function parameters, or the return values. So user can + access thos values via appropriate `ftrace_regs_*` APIs. @entry_data This is a local storage to share the data between entry and exit handlers. @@ -125,6 +126,17 @@ If the entry callback function returns !0, the corresponding exit callback will and `entry_data_size` field when registering the fprobe, the storage is allocated and passed to both `entry_handler` and `exit_handler`. +Entry data size and exit handlers on the same function +====================================================== + +Since the entry data is passed via per-task stack and it has limited size, +the entry data size per probe is limited to `15 * sizeof(long)`. You also need +to take care that the different fprobes are probing on the same function, this +limit becomes smaller. The entry data size is aligned to `sizeof(long)` and +each fprobe which has exit handler uses a `sizeof(long)` space on the stack, +you should keep the number of fprobes on the same function as small as +possible. + Share the callbacks with kprobes ================================ @@ -165,8 +177,8 @@ This counter counts up when; - fprobe fails to take ftrace_recursion lock. This usually means that a function which is traced by other ftrace users is called from the entry_handler. - - fprobe fails to setup the function exit because of the shortage of rethook - (the shadow stack for hooking the function return.) + - fprobe fails to setup the function exit because of failing to allocate the + data buffer from the per-task shadow stack. The `fprobe::nmissed` field counts up in both cases. Therefore, the former skips both of entry and exit callback and the latter skips the exit diff --git a/Documentation/trace/ftrace-design.rst b/Documentation/trace/ftrace-design.rst index 6893399157f0..dc82d64b3a44 100644 --- a/Documentation/trace/ftrace-design.rst +++ b/Documentation/trace/ftrace-design.rst @@ -217,18 +217,6 @@ along to ftrace_push_return_trace() instead of a stub value of 0. Similarly, when you call ftrace_return_to_handler(), pass it the frame pointer. -HAVE_FUNCTION_GRAPH_RET_ADDR_PTR --------------------------------- - -An arch may pass in a pointer to the return address on the stack. This -prevents potential stack unwinding issues where the unwinder gets out of -sync with ret_stack and the wrong addresses are reported by -ftrace_graph_ret_addr(). - -Adding support for it is easy: just define the macro in asm/ftrace.h and -pass the return address pointer as the 'retp' argument to -ftrace_push_return_trace(). - HAVE_SYSCALL_TRACEPOINTS ------------------------ diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst index 5aba74872ba7..af66a05e18cc 100644 --- a/Documentation/trace/ftrace.rst +++ b/Documentation/trace/ftrace.rst @@ -810,6 +810,12 @@ Here is the list of current tracers that may be configured. to draw a graph of function calls similar to C code source. + Note that the function graph calculates the timings of when the + function starts and returns internally and for each instance. If + there are two instances that run function graph tracer and traces + the same functions, the length of the timings may be slightly off as + each read the timestamp separately and not at the same time. + "blk" The block tracer. The tracer used by the blktrace user @@ -1031,14 +1037,15 @@ explains which is which. CPU#: The CPU which the process was running on. irqs-off: 'd' interrupts are disabled. '.' otherwise. - .. caution:: If the architecture does not support a way to - read the irq flags variable, an 'X' will always - be printed here. need-resched: + - 'B' all, TIF_NEED_RESCHED, PREEMPT_NEED_RESCHED and TIF_RESCHED_LAZY is set, - 'N' both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED is set, - 'n' only TIF_NEED_RESCHED is set, - 'p' only PREEMPT_NEED_RESCHED is set, + - 'L' both PREEMPT_NEED_RESCHED and TIF_RESCHED_LAZY is set, + - 'b' both TIF_NEED_RESCHED and TIF_RESCHED_LAZY is set, + - 'l' only TIF_RESCHED_LAZY is set - '.' otherwise. hardirq/softirq: @@ -1186,6 +1193,31 @@ Here are the available options: trace_printk Can disable trace_printk() from writing into the buffer. + trace_printk_dest + Set to have trace_printk() and similar internal tracing functions + write into this instance. Note, only one trace instance can have + this set. By setting this flag, it clears the trace_printk_dest flag + of the instance that had it set previously. By default, the top + level trace has this set, and will get it set again if another + instance has it set then clears it. + + This flag cannot be cleared by the top level instance, as it is the + default instance. The only way the top level instance has this flag + cleared, is by it being set in another instance. + + copy_trace_marker + If there are applications that hard code writing into the top level + trace_marker file (/sys/kernel/tracing/trace_marker or trace_marker_raw), + and the tooling would like it to go into an instance, this option can + be used. Create an instance and set this option, and then all writes + into the top level trace_marker file will also be redirected into this + instance. + + Note, by default this option is set for the top level instance. If it + is disabled, then writes to the trace_marker or trace_marker_raw files + will not be written into the top level file. If no instance has this + option set, then a write will error with the errno of ENODEV. + annotate It is sometimes confusing when the CPU buffers are full and one CPU buffer had a lot of events recently, thus @@ -3058,7 +3090,7 @@ Notice that we lost the sys_nanosleep. # cat set_ftrace_filter hrtimer_run_queues hrtimer_run_pending - hrtimer_init + hrtimer_setup hrtimer_cancel hrtimer_try_to_cancel hrtimer_forward @@ -3096,7 +3128,7 @@ Again, now we want to append. # cat set_ftrace_filter hrtimer_run_queues hrtimer_run_pending - hrtimer_init + hrtimer_setup hrtimer_cancel hrtimer_try_to_cancel hrtimer_forward diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst index 989255eb5622..6eef28ebb0c7 100644 --- a/Documentation/trace/hisi-ptt.rst +++ b/Documentation/trace/hisi-ptt.rst @@ -40,7 +40,7 @@ IO dies (SICL, Super I/O Cluster), where there's one PCIe Root Complex for each SICL. :: - /sys/devices/hisi_ptt<sicl_id>_<core_id> + /sys/bus/event_source/devices/hisi_ptt<sicl_id>_<core_id> Tune ==== @@ -53,7 +53,7 @@ Each event is presented as a file under $(PTT PMU dir)/tune, and a simple open/read/write/close cycle will be used to tune the event. :: - $ cd /sys/devices/hisi_ptt<sicl_id>_<core_id>/tune + $ cd /sys/bus/event_source/devices/hisi_ptt<sicl_id>_<core_id>/tune $ ls qos_tx_cpl qos_tx_np qos_tx_p tx_path_rx_req_alloc_buf_level diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst index 3c9b263de9c2..0aada18c38c6 100644 --- a/Documentation/trace/histogram.rst +++ b/Documentation/trace/histogram.rst @@ -81,7 +81,7 @@ Documentation written by Tom Zanussi .usecs display a common_timestamp in microseconds .percent display a number of percentage value .graph display a bar-graph of a value - .stacktrace display as a stacktrace (must by a long[] type) + .stacktrace display as a stacktrace (must be a long[] type) ============= ================================================= Note that in general the semantics of a given field aren't diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst index 5092d6c13af5..cc1dc5a087e8 100644 --- a/Documentation/trace/index.rst +++ b/Documentation/trace/index.rst @@ -1,37 +1,103 @@ -========================== -Linux Tracing Technologies -========================== +================================ +Linux Tracing Technologies Guide +================================ + +Tracing in the Linux kernel is a powerful mechanism that allows +developers and system administrators to analyze and debug system +behavior. This guide provides documentation on various tracing +frameworks and tools available in the Linux kernel. + +Introduction to Tracing +----------------------- + +This section provides an overview of Linux tracing mechanisms +and debugging approaches. .. toctree:: - :maxdepth: 2 + :maxdepth: 1 - ftrace-design + debugging + tracepoints tracepoint-analysis + ring-buffer-map + +Core Tracing Frameworks +----------------------- + +The following are the primary tracing frameworks integrated into +the Linux kernel. + +.. toctree:: + :maxdepth: 1 + ftrace + ftrace-design ftrace-uses - fprobe kprobes kprobetrace - uprobetracer fprobetrace - tracepoints + fprobe + ring-buffer-design + +Event Tracing and Analysis +-------------------------- + +A detailed explanation of event tracing mechanisms and their +applications. + +.. toctree:: + :maxdepth: 1 + events events-kmem events-power events-nmi events-msr - mmiotrace + boottime-trace histogram histogram-design - boottime-trace - hwlat_detector - osnoise-tracer - timerlat-tracer + +Hardware and Performance Tracing +-------------------------------- + +This section covers tracing features that monitor hardware +interactions and system performance. + +.. toctree:: + :maxdepth: 1 + intel_th - ring-buffer-design stm sys-t coresight/index - user_events rv/index hisi-ptt + mmiotrace + hwlat_detector + osnoise-tracer + timerlat-tracer + +User-Space Tracing +------------------ + +These tools allow tracing user-space applications and +interactions. + +.. toctree:: + :maxdepth: 1 + + user_events + uprobetracer + +Additional Resources +-------------------- + +For more details, refer to the respective documentation of each +tracing tool and framework. + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst index 69cb7776ae99..3b6791c17e9b 100644 --- a/Documentation/trace/kprobetrace.rst +++ b/Documentation/trace/kprobetrace.rst @@ -58,8 +58,9 @@ Synopsis of kprobe_events NAME=FETCHARG : Set NAME as the argument name of FETCHARG. FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types - (x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr" - and bitfield are supported. + (x8/x16/x32/x64), VFS layer common type(%pd/%pD), "char", + "string", "ustring", "symbol", "symstr" and bitfield are + supported. (\*1) only for the probe on function entry (offs == 0). Note, this argument access is best effort, because depending on the argument type, it may be passed on @@ -122,6 +123,9 @@ With 'symstr' type, you can filter the event with wildcard pattern of the symbols, and you don't need to solve symbol name by yourself. For $comm, the default type is "string"; any other type is invalid. +VFS layer common type(%pd/%pD) is a special type, which fetches dentry's or +file's name from struct dentry's address or struct file's address. + .. _user_mem_access: User Memory Access diff --git a/Documentation/trace/osnoise-tracer.rst b/Documentation/trace/osnoise-tracer.rst index 140ef2533d26..a520adbd3476 100644 --- a/Documentation/trace/osnoise-tracer.rst +++ b/Documentation/trace/osnoise-tracer.rst @@ -108,7 +108,7 @@ The tracer has a set of options inside the osnoise directory, they are: option. - tracing_threshold: the minimum delta between two time() reads to be considered as noise, in us. When set to 0, the default value will - be used, which is currently 5 us. + be used, which is currently 1 us. - osnoise/options: a set of on/off options that can be enabled by writing the option name to the file or disabled by writing the option name preceded with the 'NO\_' prefix. For example, writing diff --git a/Documentation/trace/postprocess/decode_msr.py b/Documentation/trace/postprocess/decode_msr.py index aa9cc7abd5c2..f5609b16f589 100644 --- a/Documentation/trace/postprocess/decode_msr.py +++ b/Documentation/trace/postprocess/decode_msr.py @@ -32,6 +32,6 @@ for j in sys.stdin: break if r: j = j.replace(" " + m.group(2), " " + r + "(" + m.group(2) + ")") - print j, + print(j) diff --git a/Documentation/trace/ring-buffer-map.rst b/Documentation/trace/ring-buffer-map.rst new file mode 100644 index 000000000000..8e296bcc0d7f --- /dev/null +++ b/Documentation/trace/ring-buffer-map.rst @@ -0,0 +1,106 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================== +Tracefs ring-buffer memory mapping +================================== + +:Author: Vincent Donnefort <vdonnefort@google.com> + +Overview +======== +Tracefs ring-buffer memory map provides an efficient method to stream data +as no memory copy is necessary. The application mapping the ring-buffer becomes +then a consumer for that ring-buffer, in a similar fashion to trace_pipe. + +Memory mapping setup +==================== +The mapping works with a mmap() of the trace_pipe_raw interface. + +The first system page of the mapping contains ring-buffer statistics and +description. It is referred to as the meta-page. One of the most important +fields of the meta-page is the reader. It contains the sub-buffer ID which can +be safely read by the mapper (see ring-buffer-design.rst). + +The meta-page is followed by all the sub-buffers, ordered by ascending ID. It is +therefore effortless to know where the reader starts in the mapping: + +.. code-block:: c + + reader_id = meta->reader->id; + reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size; + +When the application is done with the current reader, it can get a new one using +the trace_pipe_raw ioctl() TRACE_MMAP_IOCTL_GET_READER. This ioctl also updates +the meta-page fields. + +Limitations +=========== +When a mapping is in place on a Tracefs ring-buffer, it is not possible to +either resize it (either by increasing the entire size of the ring-buffer or +each subbuf). It is also not possible to use snapshot and causes splice to copy +the ring buffer data instead of using the copyless swap from the ring buffer. + +Concurrent readers (either another application mapping that ring-buffer or the +kernel with trace_pipe) are allowed but not recommended. They will compete for +the ring-buffer and the output is unpredictable, just like concurrent readers on +trace_pipe would be. + +Example +======= + +.. code-block:: c + + #include <fcntl.h> + #include <stdio.h> + #include <stdlib.h> + #include <unistd.h> + + #include <linux/trace_mmap.h> + + #include <sys/mman.h> + #include <sys/ioctl.h> + + #define TRACE_PIPE_RAW "/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw" + + int main(void) + { + int page_size = getpagesize(), fd, reader_id; + unsigned long meta_len, data_len; + struct trace_buffer_meta *meta; + void *map, *reader, *data; + + fd = open(TRACE_PIPE_RAW, O_RDONLY | O_NONBLOCK); + if (fd < 0) + exit(EXIT_FAILURE); + + map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0); + if (map == MAP_FAILED) + exit(EXIT_FAILURE); + + meta = (struct trace_buffer_meta *)map; + meta_len = meta->meta_page_size; + + printf("entries: %llu\n", meta->entries); + printf("overrun: %llu\n", meta->overrun); + printf("read: %llu\n", meta->read); + printf("nr_subbufs: %u\n", meta->nr_subbufs); + + data_len = meta->subbuf_size * meta->nr_subbufs; + data = mmap(NULL, data_len, PROT_READ, MAP_SHARED, fd, meta_len); + if (data == MAP_FAILED) + exit(EXIT_FAILURE); + + if (ioctl(fd, TRACE_MMAP_IOCTL_GET_READER) < 0) + exit(EXIT_FAILURE); + + reader_id = meta->reader.id; + reader = data + meta->subbuf_size * reader_id; + + printf("Current reader address: %p\n", reader); + + munmap(data, data_len); + munmap(meta, meta_len); + close (fd); + + return 0; + } diff --git a/Documentation/trace/rv/index.rst b/Documentation/trace/rv/index.rst index 15fa966102c0..e80e0057feb4 100644 --- a/Documentation/trace/rv/index.rst +++ b/Documentation/trace/rv/index.rst @@ -12,3 +12,4 @@ Runtime Verification da_monitor_instrumentation.rst monitor_wip.rst monitor_wwnr.rst + monitor_sched.rst diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst new file mode 100644 index 000000000000..24b2c62a3bc2 --- /dev/null +++ b/Documentation/trace/rv/monitor_sched.rst @@ -0,0 +1,171 @@ +Scheduler monitors +================== + +- Name: sched +- Type: container for multiple monitors +- Author: Gabriele Monaco <gmonaco@redhat.com>, Daniel Bristot de Oliveira <bristot@kernel.org> + +Description +----------- + +Monitors describing complex systems, such as the scheduler, can easily grow to +the point where they are just hard to understand because of the many possible +state transitions. +Often it is possible to break such descriptions into smaller monitors, +sharing some or all events. Enabling those smaller monitors concurrently is, +in fact, testing the system as if we had one single larger monitor. +Splitting models into multiple specification is not only easier to +understand, but gives some more clues when we see errors. + +The sched monitor is a set of specifications to describe the scheduler behaviour. +It includes several per-cpu and per-task monitors that work independently to verify +different specifications the scheduler should follow. + +To make this system as straightforward as possible, sched specifications are *nested* +monitors, whereas sched itself is a *container*. +From the interface perspective, sched includes other monitors as sub-directories, +enabling/disabling or setting reactors to sched, propagates the change to all monitors, +however single monitors can be used independently as well. + +It is important that future modules are built after their container (sched, in +this case), otherwise the linker would not respect the order and the nesting +wouldn't work as expected. +To do so, simply add them after sched in the Makefile. + +Specifications +-------------- + +The specifications included in sched are currently a work in progress, adapting the ones +defined in by Daniel Bristot in [1]. + +Currently we included the following: + +Monitor tss +~~~~~~~~~~~ + +The task switch while scheduling (tss) monitor ensures a task switch happens +only in scheduling context, that is inside a call to `__schedule`:: + + | + | + v + +-----------------+ + | thread | <+ + +-----------------+ | + | | + | schedule_entry | schedule_exit + v | + sched_switch | + +--------------- | + | sched | + +--------------> -+ + +Monitor sco +~~~~~~~~~~~ + +The scheduling context operations (sco) monitor ensures changes in a task state +happen only in thread context:: + + + | + | + v + sched_set_state +------------------+ + +------------------ | | + | | thread_context | + +-----------------> | | <+ + +------------------+ | + | | + | schedule_entry | schedule_exit + v | + | + scheduling_context -+ + +Monitor snroc +~~~~~~~~~~~~~ + +The set non runnable on its own context (snroc) monitor ensures changes in a +task state happens only in the respective task's context. This is a per-task +monitor:: + + | + | + v + +------------------+ + | other_context | <+ + +------------------+ | + | | + | sched_switch_in | sched_switch_out + v | + sched_set_state | + +------------------ | + | own_context | + +-----------------> -+ + +Monitor scpd +~~~~~~~~~~~~ + +The schedule called with preemption disabled (scpd) monitor ensures schedule is +called with preemption disabled:: + + | + | + v + +------------------+ + | cant_sched | <+ + +------------------+ | + | | + | preempt_disable | preempt_enable + v | + schedule_entry | + schedule_exit | + +----------------- can_sched | + | | + +----------------> -+ + +Monitor snep +~~~~~~~~~~~~ + +The schedule does not enable preempt (snep) monitor ensures a schedule call +does not enable preemption:: + + | + | + v + preempt_disable +------------------------+ + preempt_enable | | + +------------------ | non_scheduling_context | + | | | + +-----------------> | | <+ + +------------------------+ | + | | + | schedule_entry | schedule_exit + v | + | + scheduling_contex -+ + +Monitor sncid +~~~~~~~~~~~~~ + +The schedule not called with interrupt disabled (sncid) monitor ensures +schedule is not called with interrupt disabled:: + + | + | + v + schedule_entry +--------------+ + schedule_exit | | + +----------------- | can_sched | + | | | + +----------------> | | <+ + +--------------+ | + | | + | irq_disable | irq_enable + v | + | + cant_sched -+ + +References +---------- + +[1] - https://bristot.me/linux-task-model diff --git a/Documentation/trace/rv/runtime-verification.rst b/Documentation/trace/rv/runtime-verification.rst index dae78dfa7cdc..c700dde9259c 100644 --- a/Documentation/trace/rv/runtime-verification.rst +++ b/Documentation/trace/rv/runtime-verification.rst @@ -8,14 +8,14 @@ checking* and *theorem proving*) with a more practical approach for complex systems. Instead of relying on a fine-grained model of a system (e.g., a -re-implementation a instruction level), RV works by analyzing the trace of the +re-implementation at instruction level), RV works by analyzing the trace of the system's actual execution, comparing it against a formal specification of the system behavior. The main advantage is that RV can give precise information on the runtime behavior of the monitored system, without the pitfalls of developing models that require a re-implementation of the entire system in a modeling language. -Moreover, given an efficient monitoring method, it is possible execute an +Moreover, given an efficient monitoring method, it is possible to execute an *online* verification of a system, enabling the *reaction* for unexpected events, avoiding, for example, the propagation of a failure on safety-critical systems. diff --git a/Documentation/trace/tracepoints.rst b/Documentation/trace/tracepoints.rst index decabcc77b56..b35c40e3abbe 100644 --- a/Documentation/trace/tracepoints.rst +++ b/Documentation/trace/tracepoints.rst @@ -71,7 +71,7 @@ In subsys/file.c (where the tracing statement must be added):: void somefct(void) { ... - trace_subsys_eventname(arg, task); + trace_subsys_eventname_tp(arg, task); ... } @@ -129,12 +129,12 @@ within an if statement with the following:: for (i = 0; i < count; i++) tot += calculate_nuggets(); - trace_foo_bar(tot); + trace_foo_bar_tp(tot); } -All trace_<tracepoint>() calls have a matching trace_<tracepoint>_enabled() +All trace_<tracepoint>_tp() calls have a matching trace_<tracepoint>_enabled() function defined that returns true if the tracepoint is enabled and -false otherwise. The trace_<tracepoint>() should always be within the +false otherwise. The trace_<tracepoint>_tp() should always be within the block of the if (trace_<tracepoint>_enabled()) to prevent races between the tracepoint being enabled and the check being seen. @@ -143,7 +143,10 @@ the static_key of the tracepoint to allow the if statement to be implemented with jump labels and avoid conditional branches. .. note:: The convenience macro TRACE_EVENT provides an alternative way to - define tracepoints. Check http://lwn.net/Articles/379903, + define tracepoints. Note, DECLARE_TRACE(foo) creates a function + "trace_foo_tp()" whereas TRACE_EVENT(foo) creates a function + "trace_foo()", and also exposes the tracepoint as a trace event in + /sys/kernel/tracing/events directory. Check http://lwn.net/Articles/379903, http://lwn.net/Articles/381064 and http://lwn.net/Articles/383362 for a series of articles with more details. @@ -159,7 +162,9 @@ In a C file:: void do_trace_foo_bar_wrapper(args) { - trace_foo_bar(args); + trace_foo_bar_tp(args); // for tracepoints created via DECLARE_TRACE + // or + trace_foo_bar(args); // for tracepoints created via TRACE_EVENT } In the header file:: |