diff options
Diffstat (limited to 'Documentation/trace/histogram.rst')
| -rw-r--r-- | Documentation/trace/histogram.rst | 930 |
1 files changed, 728 insertions, 202 deletions
diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst index 7dda76503127..340bcb5099e7 100644 --- a/Documentation/trace/histogram.rst +++ b/Documentation/trace/histogram.rst @@ -25,7 +25,7 @@ Documentation written by Tom Zanussi hist:keys=<field1[,field2,...]>[:values=<field1[,field2,...]>] [:sort=<field1[,field2,...]>][:size=#entries][:pause][:continue] - [:clear][:name=histname1] [if <filter>] + [:clear][:name=histname1][:nohitcount][:<handler>.<action>] [if <filter>] When a matching event is hit, an entry is added to a hash table using the key(s) and value(s) named. Keys and values correspond to @@ -35,11 +35,11 @@ Documentation written by Tom Zanussi in place of an explicit value field - this is simply a count of event hits. If 'values' isn't specified, an implicit 'hitcount' value will be automatically created and used as the only value. - Keys can be any field, or the special string 'stacktrace', which + Keys can be any field, or the special string 'common_stacktrace', which will use the event's kernel stacktrace as the key. The keywords 'keys' or 'key' can be used to specify keys, and the keywords 'values', 'vals', or 'val' can be used to specify values. Compound - keys consisting of up to two fields can be specified by the 'keys' + keys consisting of up to three fields can be specified by the 'keys' keyword. Hashing a compound key produces a unique entry in the table for each unique combination of component keys, and can be useful for providing more fine-grained summaries of event data. @@ -54,7 +54,7 @@ Documentation written by Tom Zanussi 'compatible' if the fields named in the trigger share the same number and type of fields and those fields also have the same names. Note that any two events always share the compatible 'hitcount' and - 'stacktrace' fields and can therefore be combined using those + 'common_stacktrace' fields and can therefore be combined using those fields, however pointless that may be. 'hist' triggers add a 'hist' file to each event's subdirectory. @@ -70,15 +70,19 @@ Documentation written by Tom Zanussi modified by appending any of the following modifiers to the field name: - =========== ========================================== - .hex display a number as a hex value - .sym display an address as a symbol - .sym-offset display an address as a symbol and offset - .syscall display a syscall id as a system call name - .execname display a common_pid as a program name - .log2 display log2 value rather than raw number - .usecs display a common_timestamp in microseconds - =========== ========================================== + ============= ================================================= + .hex display a number as a hex value + .sym display an address as a symbol + .sym-offset display an address as a symbol and offset + .syscall display a syscall id as a system call name + .execname display a common_pid as a program name + .log2 display log2 value rather than raw number + .buckets=size display grouping of values rather than raw number + .usecs display a common_timestamp in microseconds + .percent display a number of percentage value + .graph display a bar-graph of a value + .stacktrace display as a stacktrace (must be a long[] type) + ============= ================================================= Note that in general the semantics of a given field aren't interpreted when applying a modifier to it, but there are some @@ -99,12 +103,12 @@ Documentation written by Tom Zanussi trigger, read its current contents, and then turn it off:: # echo 'hist:keys=skbaddr.hex:vals=len' > \ - /sys/kernel/debug/tracing/events/net/netif_rx/trigger + /sys/kernel/tracing/events/net/netif_rx/trigger - # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist + # cat /sys/kernel/tracing/events/net/netif_rx/hist # echo '!hist:keys=skbaddr.hex:vals=len' > \ - /sys/kernel/debug/tracing/events/net/netif_rx/trigger + /sys/kernel/tracing/events/net/netif_rx/trigger The trigger file itself can be read to show the details of the currently attached hist trigger. This information is also displayed @@ -136,6 +140,12 @@ Documentation written by Tom Zanussi existing trigger, rather than via the '>' operator, which will cause the trigger to be removed through truncation. + The 'nohitcount' (or NOHC) parameter will suppress display of + raw hitcount in the histogram. This option requires at least one + value field which is not a 'raw hitcount'. For example, + 'hist:...:vals=hitcount:nohitcount' is rejected, but + 'hist:...:vals=hitcount.percent:nohitcount' is OK. + - enable_hist/disable_hist The enable_hist and disable_hist triggers can be used to have one @@ -160,13 +170,13 @@ Documentation written by Tom Zanussi aggregation on and off when conditions of interest are hit:: # echo 'hist:keys=skbaddr.hex:vals=len:pause' > \ - /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + /sys/kernel/tracing/events/net/netif_receive_skb/trigger # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \ - /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + /sys/kernel/tracing/events/sched/sched_process_exec/trigger # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \ - /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + /sys/kernel/tracing/events/sched/sched_process_exit/trigger The above sets up an initially paused hist trigger which is unpaused and starts aggregating events when a given program is executed, and @@ -176,8 +186,8 @@ Documentation written by Tom Zanussi The examples below provide a more concrete illustration of the concepts and typical usage patterns discussed above. -'special' event fields ------------------------- +2.1. 'special' event fields +--------------------------- There are a number of 'special event fields' available for use as keys or values in a hist trigger. These look like and behave as if @@ -191,37 +201,25 @@ Documentation written by Tom Zanussi with the event, in nanoseconds. May be modified by .usecs to have timestamps interpreted as microseconds. - cpu int the cpu on which the event occurred. + common_cpu int the cpu on which the event occurred. ====================== ==== ======================================= -Extended error information --------------------------- +2.2. Extended error information +------------------------------- For some error conditions encountered when invoking a hist trigger command, extended error information is available via the - corresponding event's 'hist' file. Reading the hist file after an - error will display more detailed information about what went wrong, - if information is available. This extended error information will - be available until the next hist trigger command for that event. - - If available for a given error condition, the extended error - information and usage takes the following form:: - - # echo xxx > /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger - echo: write error: Invalid argument + tracing/error_log file. See "Error conditions" section in + Documentation/trace/ftrace.rst for details. - # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist - ERROR: Couldn't yyy: zzz - Last command: xxx - -6.2 'hist' trigger examples ---------------------------- +2.3. 'hist' trigger examples +---------------------------- The first set of examples creates aggregations using the kmalloc event. The fields that can be used for the hist trigger are listed in the kmalloc event's format file:: - # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/format + # cat /sys/kernel/tracing/events/kmem/kmalloc/format name: kmalloc ID: 374 format: @@ -240,8 +238,8 @@ Extended error information that lists the total number of bytes requested for each function in the kernel that made one or more calls to kmalloc:: - # echo 'hist:key=call_site:val=bytes_req' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + # echo 'hist:key=call_site:val=bytes_req.buckets=32' > \ + /sys/kernel/tracing/events/kmem/kmalloc/trigger This tells the tracing system to create a 'hist' trigger using the call_site field of the kmalloc event as the key for the table, which @@ -251,11 +249,11 @@ Extended error information table, it should keep a running total of the number of bytes requested by that call_site. - We'll let it run for awhile and then dump the contents of the 'hist' + We'll let it run for a while and then dump the contents of the 'hist' file in the kmalloc event's subdirectory (for readability, a number of entries have been omitted):: - # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # cat /sys/kernel/tracing/events/kmem/kmalloc/hist # trigger info: hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active] { call_site: 18446744072106379007 } hitcount: 1 bytes_req: 176 @@ -295,7 +293,7 @@ Extended error information the trigger info, which can also be displayed by reading the 'trigger' file:: - # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + # cat /sys/kernel/tracing/events/kmem/kmalloc/trigger hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active] At the end of the output are a few lines that display the overall @@ -326,7 +324,7 @@ Extended error information command history and re-execute it with a '!' prepended:: # echo '!hist:key=call_site:val=bytes_req' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + /sys/kernel/tracing/events/kmem/kmalloc/trigger Finally, notice that the call_site as displayed in the output above isn't really very useful. It's an address, but normally addresses @@ -334,9 +332,9 @@ Extended error information value, simply append '.hex' to the field name in the trigger:: # echo 'hist:key=call_site.hex:val=bytes_req' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + /sys/kernel/tracing/events/kmem/kmalloc/trigger - # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # cat /sys/kernel/tracing/events/kmem/kmalloc/hist # trigger info: hist:keys=call_site.hex:vals=bytes_req:sort=hitcount:size=2048 [active] { call_site: ffffffffa026b291 } hitcount: 1 bytes_req: 433 @@ -379,9 +377,9 @@ Extended error information trigger:: # echo 'hist:key=call_site.sym:val=bytes_req' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + /sys/kernel/tracing/events/kmem/kmalloc/trigger - # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # cat /sys/kernel/tracing/events/kmem/kmalloc/hist # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=hitcount:size=2048 [active] { call_site: [ffffffff810adcb9] syslog_print_all } hitcount: 1 bytes_req: 1024 @@ -423,15 +421,15 @@ Extended error information Because the default sort key above is 'hitcount', the above shows a the list of call_sites by increasing hitcount, so that at the bottom we see the functions that made the most kmalloc calls during the - run. If instead we we wanted to see the top kmalloc callers in + run. If instead we wanted to see the top kmalloc callers in terms of the number of bytes requested rather than the number of calls, and we wanted the top caller to appear at the top, we can use the 'sort' parameter, along with the 'descending' modifier:: # echo 'hist:key=call_site.sym:val=bytes_req:sort=bytes_req.descending' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + /sys/kernel/tracing/events/kmem/kmalloc/trigger - # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # cat /sys/kernel/tracing/events/kmem/kmalloc/hist # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=bytes_req.descending:size=2048 [active] { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2186 bytes_req: 3397464 @@ -470,9 +468,9 @@ Extended error information name, just use 'sym-offset' instead:: # echo 'hist:key=call_site.sym-offset:val=bytes_req:sort=bytes_req.descending' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + /sys/kernel/tracing/events/kmem/kmalloc/trigger - # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # cat /sys/kernel/tracing/events/kmem/kmalloc/hist # trigger info: hist:keys=call_site.sym-offset:vals=bytes_req:sort=bytes_req.descending:size=2048 [active] { call_site: [ffffffffa046041c] i915_gem_execbuffer2+0x6c/0x2c0 [i915] } hitcount: 4569 bytes_req: 3163720 @@ -509,9 +507,9 @@ Extended error information allocated in a descending order:: # echo 'hist:keys=call_site.sym:values=bytes_req,bytes_alloc:sort=bytes_alloc.descending' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + /sys/kernel/tracing/events/kmem/kmalloc/trigger - # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # cat /sys/kernel/tracing/events/kmem/kmalloc/hist # trigger info: hist:keys=call_site.sym:vals=bytes_req,bytes_alloc:sort=bytes_alloc.descending:size=2048 [active] { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 7403 bytes_req: 4084360 bytes_alloc: 5958016 @@ -549,10 +547,10 @@ Extended error information the hist trigger display symbolic call_sites, we can have the hist trigger additionally display the complete set of kernel stack traces that led to each call_site. To do that, we simply use the special - value 'stacktrace' for the key parameter:: + value 'common_stacktrace' for the key parameter:: - # echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + # echo 'hist:keys=common_stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \ + /sys/kernel/tracing/events/kmem/kmalloc/trigger The above trigger will use the kernel stack trace in effect when an event is triggered as the key for the hash table. This allows the @@ -562,10 +560,10 @@ Extended error information every callpath in the system that led up to a kmalloc (in this case every callpath to a kmalloc for a kernel compile):: - # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist - # trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active] + # cat /sys/kernel/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=common_stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active] - { stacktrace: + { common_stacktrace: __kmalloc_track_caller+0x10b/0x1a0 kmemdup+0x20/0x50 hidraw_report_event+0x8a/0x120 [hid] @@ -583,7 +581,7 @@ Extended error information cpu_startup_entry+0x315/0x3e0 rest_init+0x7c/0x80 } hitcount: 3 bytes_req: 21 bytes_alloc: 24 - { stacktrace: + { common_stacktrace: __kmalloc_track_caller+0x10b/0x1a0 kmemdup+0x20/0x50 hidraw_report_event+0x8a/0x120 [hid] @@ -598,7 +596,7 @@ Extended error information do_IRQ+0x5a/0xf0 ret_from_intr+0x0/0x30 } hitcount: 3 bytes_req: 21 bytes_alloc: 24 - { stacktrace: + { common_stacktrace: kmem_cache_alloc_trace+0xeb/0x150 aa_alloc_task_context+0x27/0x40 apparmor_cred_prepare+0x1f/0x50 @@ -610,7 +608,7 @@ Extended error information . . . - { stacktrace: + { common_stacktrace: __kmalloc+0x11b/0x1b0 i915_gem_execbuffer2+0x6c/0x2c0 [i915] drm_ioctl+0x349/0x670 [drm] @@ -618,7 +616,7 @@ Extended error information SyS_ioctl+0x81/0xa0 system_call_fastpath+0x12/0x6a } hitcount: 17726 bytes_req: 13944120 bytes_alloc: 19593808 - { stacktrace: + { common_stacktrace: __kmalloc+0x11b/0x1b0 load_elf_phdrs+0x76/0xa0 load_elf_binary+0x102/0x1650 @@ -627,7 +625,7 @@ Extended error information SyS_execve+0x3a/0x50 return_from_execve+0x0/0x23 } hitcount: 33348 bytes_req: 17152128 bytes_alloc: 20226048 - { stacktrace: + { common_stacktrace: kmem_cache_alloc_trace+0xeb/0x150 apparmor_file_alloc_security+0x27/0x40 security_file_alloc+0x16/0x20 @@ -638,7 +636,7 @@ Extended error information SyS_open+0x1e/0x20 system_call_fastpath+0x12/0x6a } hitcount: 4766422 bytes_req: 9532844 bytes_alloc: 38131376 - { stacktrace: + { common_stacktrace: __kmalloc+0x11b/0x1b0 seq_buf_alloc+0x1b/0x50 seq_read+0x2cc/0x370 @@ -661,9 +659,9 @@ Extended error information keeps a per-process sum of total bytes read:: # echo 'hist:key=common_pid.execname:val=count:sort=count.descending' > \ - /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger + /sys/kernel/tracing/events/syscalls/sys_enter_read/trigger - # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/hist + # cat /sys/kernel/tracing/events/syscalls/sys_enter_read/hist # trigger info: hist:keys=common_pid.execname:vals=count:sort=count.descending:size=2048 [active] { common_pid: gnome-terminal [ 3196] } hitcount: 280 count: 1093512 @@ -702,9 +700,9 @@ Extended error information counts for the system during the run:: # echo 'hist:key=id.syscall:val=hitcount' > \ - /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger + /sys/kernel/tracing/events/raw_syscalls/sys_enter/trigger - # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist + # cat /sys/kernel/tracing/events/raw_syscalls/sys_enter/hist # trigger info: hist:keys=id.syscall:vals=hitcount:sort=hitcount:size=2048 [active] { id: sys_fsync [ 74] } hitcount: 1 @@ -756,9 +754,9 @@ Extended error information hitcount sum as the secondary key:: # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount' > \ - /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger + /sys/kernel/tracing/events/raw_syscalls/sys_enter/trigger - # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist + # cat /sys/kernel/tracing/events/raw_syscalls/sys_enter/hist # trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 [active] { id: sys_read [ 0], common_pid: rtkit-daemon [ 1877] } hitcount: 1 @@ -806,9 +804,9 @@ Extended error information can use that to filter out all the other syscalls:: # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount if id == 16' > \ - /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger + /sys/kernel/tracing/events/raw_syscalls/sys_enter/trigger - # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist + # cat /sys/kernel/tracing/events/raw_syscalls/sys_enter/hist # trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 if id == 16 [active] { id: sys_ioctl [ 16], common_pid: gmain [ 2769] } hitcount: 1 @@ -842,16 +840,16 @@ Extended error information The compound key examples used a key and a sum value (hitcount) to sort the output, but we can just as easily use two keys instead. - Here's an example where we use a compound key composed of the the + Here's an example where we use a compound key composed of the common_pid and size event fields. Sorting with pid as the primary key and 'size' as the secondary key allows us to display an ordered summary of the recvfrom sizes, with counts, received by each process:: # echo 'hist:key=common_pid.execname,size:val=hitcount:sort=common_pid,size' > \ - /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/trigger + /sys/kernel/tracing/events/syscalls/sys_enter_recvfrom/trigger - # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/hist + # cat /sys/kernel/tracing/events/syscalls/sys_enter_recvfrom/hist # trigger info: hist:keys=common_pid.execname,size:vals=hitcount:sort=common_pid.execname,size:size=2048 [active] { common_pid: smbd [ 784], size: 4 } hitcount: 1 @@ -902,9 +900,9 @@ Extended error information much smaller number, say 256:: # echo 'hist:key=child_comm:val=hitcount:size=256' > \ - /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger + /sys/kernel/tracing/events/sched/sched_process_fork/trigger - # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist + # cat /sys/kernel/tracing/events/sched/sched_process_fork/hist # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active] { child_comm: dconf worker } hitcount: 1 @@ -938,9 +936,9 @@ Extended error information displays as [paused]:: # echo 'hist:key=child_comm:val=hitcount:size=256:pause' >> \ - /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger + /sys/kernel/tracing/events/sched/sched_process_fork/trigger - # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist + # cat /sys/kernel/tracing/events/sched/sched_process_fork/hist # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [paused] { child_comm: dconf worker } hitcount: 1 @@ -975,9 +973,9 @@ Extended error information again, and the data has changed:: # echo 'hist:key=child_comm:val=hitcount:size=256:cont' >> \ - /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger + /sys/kernel/tracing/events/sched/sched_process_fork/trigger - # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist + # cat /sys/kernel/tracing/events/sched/sched_process_fork/hist # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active] { child_comm: dconf worker } hitcount: 1 @@ -1022,14 +1020,14 @@ Extended error information For example, suppose we wanted to take a look at the relative weights in terms of skb length for each callpath that leads to a - netif_receieve_skb event when downloading a decent-sized file using + netif_receive_skb event when downloading a decent-sized file using wget. First we set up an initially paused stacktrace trigger on the netif_receive_skb event:: - # echo 'hist:key=stacktrace:vals=len:pause' > \ - /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + # echo 'hist:key=common_stacktrace:vals=len:pause' > \ + /sys/kernel/tracing/events/net/netif_receive_skb/trigger Next, we set up an 'enable_hist' trigger on the sched_process_exec event, with an 'if filename==/usr/bin/wget' filter. The effect of @@ -1040,7 +1038,7 @@ Extended error information hash table keyed on stacktrace:: # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \ - /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + /sys/kernel/tracing/events/sched/sched_process_exec/trigger The aggregation continues until the netif_receive_skb is paused again, which is what the following disable_hist event does by @@ -1048,7 +1046,7 @@ Extended error information filter 'comm==wget':: # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \ - /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + /sys/kernel/tracing/events/sched/sched_process_exit/trigger Whenever a process exits and the comm field of the disable_hist trigger filter matches 'comm==wget', the netif_receive_skb hist @@ -1061,10 +1059,10 @@ Extended error information $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz - # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist - # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused] + # cat /sys/kernel/tracing/events/net/netif_receive_skb/hist + # trigger info: hist:keys=common_stacktrace:vals=len:sort=hitcount:size=2048 [paused] - { stacktrace: + { common_stacktrace: __netif_receive_skb_core+0x46d/0x990 __netif_receive_skb+0x18/0x60 netif_receive_skb_internal+0x23/0x90 @@ -1081,7 +1079,7 @@ Extended error information kthread+0xd2/0xf0 ret_from_fork+0x42/0x70 } hitcount: 85 len: 28884 - { stacktrace: + { common_stacktrace: __netif_receive_skb_core+0x46d/0x990 __netif_receive_skb+0x18/0x60 netif_receive_skb_internal+0x23/0x90 @@ -1099,7 +1097,7 @@ Extended error information irq_thread+0x11f/0x150 kthread+0xd2/0xf0 } hitcount: 98 len: 664329 - { stacktrace: + { common_stacktrace: __netif_receive_skb_core+0x46d/0x990 __netif_receive_skb+0x18/0x60 process_backlog+0xa8/0x150 @@ -1117,7 +1115,7 @@ Extended error information inet_sendmsg+0x64/0xa0 sock_sendmsg+0x3d/0x50 } hitcount: 115 len: 13030 - { stacktrace: + { common_stacktrace: __netif_receive_skb_core+0x46d/0x990 __netif_receive_skb+0x18/0x60 netif_receive_skb_internal+0x23/0x90 @@ -1144,14 +1142,14 @@ Extended error information into the histogram. In order to avoid having to set everything up again, we can just clear the histogram first:: - # echo 'hist:key=stacktrace:vals=len:clear' >> \ - /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + # echo 'hist:key=common_stacktrace:vals=len:clear' >> \ + /sys/kernel/tracing/events/net/netif_receive_skb/trigger Just to verify that it is in fact cleared, here's what we now see in the hist file:: - # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist - # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused] + # cat /sys/kernel/tracing/events/net/netif_receive_skb/hist + # trigger info: hist:keys=common_stacktrace:vals=len:sort=hitcount:size=2048 [paused] Totals: Hits: 0 @@ -1165,21 +1163,21 @@ Extended error information sched_process_exit events as such:: # echo 'enable_event:net:netif_receive_skb if filename==/usr/bin/wget' > \ - /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + /sys/kernel/tracing/events/sched/sched_process_exec/trigger # echo 'disable_event:net:netif_receive_skb if comm==wget' > \ - /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + /sys/kernel/tracing/events/sched/sched_process_exit/trigger If you read the trigger files for the sched_process_exec and sched_process_exit triggers, you should see two triggers for each: one enabling/disabling the hist aggregation and the other enabling/disabling the logging of events:: - # cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + # cat /sys/kernel/tracing/events/sched/sched_process_exec/trigger enable_event:net:netif_receive_skb:unlimited if filename==/usr/bin/wget enable_hist:net:netif_receive_skb:unlimited if filename==/usr/bin/wget - # cat /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + # cat /sys/kernel/tracing/events/sched/sched_process_exit/trigger enable_event:net:netif_receive_skb:unlimited if comm==wget disable_hist:net:netif_receive_skb:unlimited if comm==wget @@ -1195,7 +1193,7 @@ Extended error information saw in the last run, but this time you should also see the individual events in the trace file:: - # cat /sys/kernel/debug/tracing/trace + # cat /sys/kernel/tracing/trace # tracer: nop # @@ -1229,15 +1227,15 @@ Extended error information other things:: # echo 'hist:keys=skbaddr.hex:vals=len if len < 0' >> \ - /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + /sys/kernel/tracing/events/net/netif_receive_skb/trigger # echo 'hist:keys=skbaddr.hex:vals=len if len > 4096' >> \ - /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + /sys/kernel/tracing/events/net/netif_receive_skb/trigger # echo 'hist:keys=skbaddr.hex:vals=len if len == 256' >> \ - /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + /sys/kernel/tracing/events/net/netif_receive_skb/trigger # echo 'hist:keys=skbaddr.hex:vals=len' >> \ - /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + /sys/kernel/tracing/events/net/netif_receive_skb/trigger # echo 'hist:keys=len:vals=common_preempt_count' >> \ - /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + /sys/kernel/tracing/events/net/netif_receive_skb/trigger The above set of commands create four triggers differing only in their filters, along with a completely different though fairly @@ -1249,7 +1247,7 @@ Extended error information Displaying the contents of the 'hist' file for the event shows the contents of all five histograms:: - # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist + # cat /sys/kernel/tracing/events/net/netif_receive_skb/hist # event histogram # @@ -1370,15 +1368,15 @@ Extended error information field in the shared 'foo' histogram data:: # echo 'hist:name=foo:keys=skbaddr.hex:vals=len' > \ - /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + /sys/kernel/tracing/events/net/netif_receive_skb/trigger # echo 'hist:name=foo:keys=skbaddr.hex:vals=len' > \ - /sys/kernel/debug/tracing/events/net/netif_rx/trigger + /sys/kernel/tracing/events/net/netif_rx/trigger You can see that they're updating common histogram data by reading each event's hist files at the same time:: - # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist; - cat /sys/kernel/debug/tracing/events/net/netif_rx/hist + # cat /sys/kernel/tracing/events/net/netif_receive_skb/hist; + cat /sys/kernel/tracing/events/net/netif_rx/hist # event histogram # @@ -1487,32 +1485,32 @@ Extended error information And here's an example that shows how to combine histogram data from any two events even if they don't share any 'compatible' fields - other than 'hitcount' and 'stacktrace'. These commands create a + other than 'hitcount' and 'common_stacktrace'. These commands create a couple of triggers named 'bar' using those fields:: - # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \ - /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger - # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \ - /sys/kernel/debug/tracing/events/net/netif_rx/trigger + # echo 'hist:name=bar:key=common_stacktrace:val=hitcount' > \ + /sys/kernel/tracing/events/sched/sched_process_fork/trigger + # echo 'hist:name=bar:key=common_stacktrace:val=hitcount' > \ + /sys/kernel/tracing/events/net/netif_rx/trigger And displaying the output of either shows some interesting if somewhat confusing output:: - # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist - # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist + # cat /sys/kernel/tracing/events/sched/sched_process_fork/hist + # cat /sys/kernel/tracing/events/net/netif_rx/hist # event histogram # - # trigger info: hist:name=bar:keys=stacktrace:vals=hitcount:sort=hitcount:size=2048 [active] + # trigger info: hist:name=bar:keys=common_stacktrace:vals=hitcount:sort=hitcount:size=2048 [active] # - { stacktrace: - _do_fork+0x18e/0x330 + { common_stacktrace: + kernel_clone+0x18e/0x330 kernel_thread+0x29/0x30 kthreadd+0x154/0x1b0 ret_from_fork+0x3f/0x70 } hitcount: 1 - { stacktrace: + { common_stacktrace: netif_rx_internal+0xb2/0xd0 netif_rx_ni+0x20/0x70 dev_loopback_xmit+0xaa/0xd0 @@ -1530,7 +1528,7 @@ Extended error information call_cpuidle+0x3b/0x60 cpu_startup_entry+0x22d/0x310 } hitcount: 1 - { stacktrace: + { common_stacktrace: netif_rx_internal+0xb2/0xd0 netif_rx_ni+0x20/0x70 dev_loopback_xmit+0xaa/0xd0 @@ -1545,7 +1543,7 @@ Extended error information SyS_sendto+0xe/0x10 entry_SYSCALL_64_fastpath+0x12/0x6a } hitcount: 2 - { stacktrace: + { common_stacktrace: netif_rx_internal+0xb2/0xd0 netif_rx+0x1c/0x60 loopback_xmit+0x6c/0xb0 @@ -1563,7 +1561,7 @@ Extended error information sock_sendmsg+0x38/0x50 ___sys_sendmsg+0x14e/0x270 } hitcount: 76 - { stacktrace: + { common_stacktrace: netif_rx_internal+0xb2/0xd0 netif_rx+0x1c/0x60 loopback_xmit+0x6c/0xb0 @@ -1581,7 +1579,7 @@ Extended error information sock_sendmsg+0x38/0x50 ___sys_sendmsg+0x269/0x270 } hitcount: 77 - { stacktrace: + { common_stacktrace: netif_rx_internal+0xb2/0xd0 netif_rx+0x1c/0x60 loopback_xmit+0x6c/0xb0 @@ -1599,8 +1597,8 @@ Extended error information sock_sendmsg+0x38/0x50 SYSC_sendto+0xef/0x170 } hitcount: 88 - { stacktrace: - _do_fork+0x18e/0x330 + { common_stacktrace: + kernel_clone+0x18e/0x330 SyS_clone+0x19/0x20 entry_SYSCALL_64_fastpath+0x12/0x6a } hitcount: 244 @@ -1610,8 +1608,8 @@ Extended error information Entries: 7 Dropped: 0 -2.2 Inter-event hist triggers ------------------------------ +2.4. Inter-event hist triggers +------------------------------ Inter-event hist triggers are hist triggers that combine values from one or more other events and create a histogram using that data. Data @@ -1687,8 +1685,8 @@ pseudo-file. These features are described in more detail in the following sections. -2.2.1 Histogram Variables -------------------------- +2.5. Histogram Variables +------------------------ Variables are simply named locations used for saving and retrieving values between matching events. A 'matching' event is defined as an @@ -1774,8 +1772,25 @@ using the same key and variable from yet another event:: # echo 'hist:key=pid:wakeupswitch_lat=$wakeup_lat+$switchtime_lat ...' >> event3/trigger -2.2.2 Synthetic Events ----------------------- +Expressions support the use of addition, subtraction, multiplication and +division operators (+-\*/). + +Note if division by zero cannot be detected at parse time (i.e. the +divisor is not a constant), the result will be -1. + +Numeric constants can also be used directly in an expression:: + + # echo 'hist:keys=next_pid:timestamp_secs=common_timestamp/1000000 ...' >> event/trigger + +or assigned to a variable and referenced in a subsequent expression:: + + # echo 'hist:keys=next_pid:us_per_sec=1000000 ...' >> event/trigger + # echo 'hist:keys=next_pid:timestamp_secs=common_timestamp/$us_per_sec ...' >> event/trigger + +Variables can even hold stacktraces, which are useful with synthetic events. + +2.6. Synthetic Events +--------------------- Synthetic events are user-defined events generated from hist trigger variables or fields associated with one or more other events. Their @@ -1788,6 +1803,24 @@ consisting of the name of the new event along with one or more variables and their types, which can be any valid field type, separated by semicolons, to the tracing/synthetic_events file. +See synth_field_size() for available types. + +If field_name contains [n], the field is considered to be a static array. + +If field_names contains[] (no subscript), the field is considered to +be a dynamic array, which will only take as much space in the event as +is required to hold the array. + +A string field can be specified using either the static notation: + + char name[32]; + +Or the dynamic: + + char name[]; + +The size limit for either is 256. + For instance, the following creates a new event named 'wakeup_latency' with 3 fields: lat, pid, and prio. Each of those fields is simply a variable reference to a variable on another event:: @@ -1796,80 +1829,359 @@ variable reference to a variable on another event:: u64 lat; \ pid_t pid; \ int prio' >> \ - /sys/kernel/debug/tracing/synthetic_events + /sys/kernel/tracing/synthetic_events Reading the tracing/synthetic_events file lists all the currently defined synthetic events, in this case the event defined above:: - # cat /sys/kernel/debug/tracing/synthetic_events + # cat /sys/kernel/tracing/synthetic_events wakeup_latency u64 lat; pid_t pid; int prio An existing synthetic event definition can be removed by prepending the command that defined it with a '!':: # echo '!wakeup_latency u64 lat pid_t pid int prio' >> \ - /sys/kernel/debug/tracing/synthetic_events + /sys/kernel/tracing/synthetic_events At this point, there isn't yet an actual 'wakeup_latency' event instantiated in the event subsystem - for this to happen, a 'hist trigger action' needs to be instantiated and bound to actual fields -and variables defined on other events (see Section 2.2.3 below on +and variables defined on other events (see Section 2.7. below on how that is done using hist trigger 'onmatch' action). Once that is done, the 'wakeup_latency' synthetic event instance is created. -A histogram can now be defined for the new synthetic event:: - - # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \ - /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger - The new event is created under the tracing/events/synthetic/ directory and looks and behaves just like any other event:: - # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency + # ls /sys/kernel/tracing/events/synthetic/wakeup_latency enable filter format hist id trigger -Like any other event, once a histogram is enabled for the event, the -output can be displayed by reading the event's 'hist' file. - -2.2.3 Hist trigger 'actions' ----------------------------- - -A hist trigger 'action' is a function that's executed whenever a -histogram entry is added or updated. - -The default 'action' if no special function is explicitly specified is -as it always has been, to simply update the set of values associated -with an entry. Some applications, however, may want to perform -additional actions at that point, such as generate another event, or -compare and save a maximum. +A histogram can now be defined for the new synthetic event:: -The following additional actions are available. To specify an action -for a given event, simply specify the action between colons in the -hist trigger specification. + # echo 'hist:keys=pid,prio,lat.log2:sort=lat' >> \ + /sys/kernel/tracing/events/synthetic/wakeup_latency/trigger - - onmatch(matching.event).<synthetic_event_name>(param list) +The above shows the latency "lat" in a power of 2 grouping. - The 'onmatch(matching.event).<synthetic_event_name>(params)' hist - trigger action is invoked whenever an event matches and the - histogram entry would be added or updated. It causes the named - synthetic event to be generated with the values given in the +Like any other event, once a histogram is enabled for the event, the +output can be displayed by reading the event's 'hist' file:: + + # cat /sys/kernel/tracing/events/synthetic/wakeup_latency/hist + + # event histogram + # + # trigger info: hist:keys=pid,prio,lat.log2:vals=hitcount:sort=lat.log2:size=2048 [active] + # + + { pid: 2035, prio: 9, lat: ~ 2^2 } hitcount: 43 + { pid: 2034, prio: 9, lat: ~ 2^2 } hitcount: 60 + { pid: 2029, prio: 9, lat: ~ 2^2 } hitcount: 965 + { pid: 2034, prio: 120, lat: ~ 2^2 } hitcount: 9 + { pid: 2033, prio: 120, lat: ~ 2^2 } hitcount: 5 + { pid: 2030, prio: 9, lat: ~ 2^2 } hitcount: 335 + { pid: 2030, prio: 120, lat: ~ 2^2 } hitcount: 10 + { pid: 2032, prio: 120, lat: ~ 2^2 } hitcount: 1 + { pid: 2035, prio: 120, lat: ~ 2^2 } hitcount: 2 + { pid: 2031, prio: 9, lat: ~ 2^2 } hitcount: 176 + { pid: 2028, prio: 120, lat: ~ 2^2 } hitcount: 15 + { pid: 2033, prio: 9, lat: ~ 2^2 } hitcount: 91 + { pid: 2032, prio: 9, lat: ~ 2^2 } hitcount: 125 + { pid: 2029, prio: 120, lat: ~ 2^2 } hitcount: 4 + { pid: 2031, prio: 120, lat: ~ 2^2 } hitcount: 3 + { pid: 2029, prio: 120, lat: ~ 2^3 } hitcount: 2 + { pid: 2035, prio: 9, lat: ~ 2^3 } hitcount: 41 + { pid: 2030, prio: 120, lat: ~ 2^3 } hitcount: 1 + { pid: 2032, prio: 9, lat: ~ 2^3 } hitcount: 32 + { pid: 2031, prio: 9, lat: ~ 2^3 } hitcount: 44 + { pid: 2034, prio: 9, lat: ~ 2^3 } hitcount: 40 + { pid: 2030, prio: 9, lat: ~ 2^3 } hitcount: 29 + { pid: 2033, prio: 9, lat: ~ 2^3 } hitcount: 31 + { pid: 2029, prio: 9, lat: ~ 2^3 } hitcount: 31 + { pid: 2028, prio: 120, lat: ~ 2^3 } hitcount: 18 + { pid: 2031, prio: 120, lat: ~ 2^3 } hitcount: 2 + { pid: 2028, prio: 120, lat: ~ 2^4 } hitcount: 1 + { pid: 2029, prio: 9, lat: ~ 2^4 } hitcount: 4 + { pid: 2031, prio: 120, lat: ~ 2^7 } hitcount: 1 + { pid: 2032, prio: 120, lat: ~ 2^7 } hitcount: 1 + + Totals: + Hits: 2122 + Entries: 30 + Dropped: 0 + + +The latency values can also be grouped linearly by a given size with +the ".buckets" modifier and specify a size (in this case groups of 10):: + + # echo 'hist:keys=pid,prio,lat.buckets=10:sort=lat' >> \ + /sys/kernel/tracing/events/synthetic/wakeup_latency/trigger + + # event histogram + # + # trigger info: hist:keys=pid,prio,lat.buckets=10:vals=hitcount:sort=lat.buckets=10:size=2048 [active] + # + + { pid: 2067, prio: 9, lat: ~ 0-9 } hitcount: 220 + { pid: 2068, prio: 9, lat: ~ 0-9 } hitcount: 157 + { pid: 2070, prio: 9, lat: ~ 0-9 } hitcount: 100 + { pid: 2067, prio: 120, lat: ~ 0-9 } hitcount: 6 + { pid: 2065, prio: 120, lat: ~ 0-9 } hitcount: 2 + { pid: 2066, prio: 120, lat: ~ 0-9 } hitcount: 2 + { pid: 2069, prio: 9, lat: ~ 0-9 } hitcount: 122 + { pid: 2069, prio: 120, lat: ~ 0-9 } hitcount: 8 + { pid: 2070, prio: 120, lat: ~ 0-9 } hitcount: 1 + { pid: 2068, prio: 120, lat: ~ 0-9 } hitcount: 7 + { pid: 2066, prio: 9, lat: ~ 0-9 } hitcount: 365 + { pid: 2064, prio: 120, lat: ~ 0-9 } hitcount: 35 + { pid: 2065, prio: 9, lat: ~ 0-9 } hitcount: 998 + { pid: 2071, prio: 9, lat: ~ 0-9 } hitcount: 85 + { pid: 2065, prio: 9, lat: ~ 10-19 } hitcount: 2 + { pid: 2064, prio: 120, lat: ~ 10-19 } hitcount: 2 + + Totals: + Hits: 2112 + Entries: 16 + Dropped: 0 + +To save stacktraces, create a synthetic event with a field of type "unsigned long[]" +or even just "long[]". For example, to see how long a task is blocked in an +uninterruptible state:: + + # cd /sys/kernel/tracing + # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events + # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=common_stacktrace if prev_state == 2' >> events/sched/sched_switch/trigger + # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger + # echo 1 > events/synthetic/block_lat/enable + # cat trace + + # tracer: nop + # + # entries-in-buffer/entries-written: 2/2 #P:8 + # + # _-----=> irqs-off/BH-disabled + # / _----=> need-resched + # | / _---=> hardirq/softirq + # || / _--=> preempt-depth + # ||| / _-=> migrate-disable + # |||| / delay + # TASK-PID CPU# ||||| TIMESTAMP FUNCTION + # | | | ||||| | | + <idle>-0 [005] d..4. 521.164922: block_lat: pid=0 delta=8322 stack=STACK: + => __schedule+0x448/0x7b0 + => schedule+0x5a/0xb0 + => io_schedule+0x42/0x70 + => bit_wait_io+0xd/0x60 + => __wait_on_bit+0x4b/0x140 + => out_of_line_wait_on_bit+0x91/0xb0 + => jbd2_journal_commit_transaction+0x1679/0x1a70 + => kjournald2+0xa9/0x280 + => kthread+0xe9/0x110 + => ret_from_fork+0x2c/0x50 + + <...>-2 [004] d..4. 525.184257: block_lat: pid=2 delta=76 stack=STACK: + => __schedule+0x448/0x7b0 + => schedule+0x5a/0xb0 + => schedule_timeout+0x11a/0x150 + => wait_for_completion_killable+0x144/0x1f0 + => __kthread_create_on_node+0xe7/0x1e0 + => kthread_create_on_node+0x51/0x70 + => create_worker+0xcc/0x1a0 + => worker_thread+0x2ad/0x380 + => kthread+0xe9/0x110 + => ret_from_fork+0x2c/0x50 + +A synthetic event that has a stacktrace field may use it as a key in +histogram:: + + # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > events/synthetic/block_lat/trigger + # cat events/synthetic/block_lat/hist + + # event histogram + # + # trigger info: hist:keys=delta.buckets=100,stack.stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active] + # + { delta: ~ 0-99, stack.stacktrace __schedule+0xa19/0x1520 + schedule+0x6b/0x110 + io_schedule+0x46/0x80 + bit_wait_io+0x11/0x80 + __wait_on_bit+0x4e/0x120 + out_of_line_wait_on_bit+0x8d/0xb0 + __wait_on_buffer+0x33/0x40 + jbd2_journal_commit_transaction+0x155a/0x19b0 + kjournald2+0xab/0x270 + kthread+0xfa/0x130 + ret_from_fork+0x29/0x50 + } hitcount: 1 + { delta: ~ 0-99, stack.stacktrace __schedule+0xa19/0x1520 + schedule+0x6b/0x110 + io_schedule+0x46/0x80 + rq_qos_wait+0xd0/0x170 + wbt_wait+0x9e/0xf0 + __rq_qos_throttle+0x25/0x40 + blk_mq_submit_bio+0x2c3/0x5b0 + __submit_bio+0xff/0x190 + submit_bio_noacct_nocheck+0x25b/0x2b0 + submit_bio_noacct+0x20b/0x600 + submit_bio+0x28/0x90 + ext4_bio_write_page+0x1e0/0x8c0 + mpage_submit_page+0x60/0x80 + mpage_process_page_bufs+0x16c/0x180 + mpage_prepare_extent_to_map+0x23f/0x530 + } hitcount: 1 + { delta: ~ 0-99, stack.stacktrace __schedule+0xa19/0x1520 + schedule+0x6b/0x110 + schedule_hrtimeout_range_clock+0x97/0x110 + schedule_hrtimeout_range+0x13/0x20 + usleep_range_state+0x65/0x90 + __intel_wait_for_register+0x1c1/0x230 [i915] + intel_psr_wait_for_idle_locked+0x171/0x2a0 [i915] + intel_pipe_update_start+0x169/0x360 [i915] + intel_update_crtc+0x112/0x490 [i915] + skl_commit_modeset_enables+0x199/0x600 [i915] + intel_atomic_commit_tail+0x7c4/0x1080 [i915] + intel_atomic_commit_work+0x12/0x20 [i915] + process_one_work+0x21c/0x3f0 + worker_thread+0x50/0x3e0 + kthread+0xfa/0x130 + } hitcount: 3 + { delta: ~ 0-99, stack.stacktrace __schedule+0xa19/0x1520 + schedule+0x6b/0x110 + schedule_timeout+0x11e/0x160 + __wait_for_common+0x8f/0x190 + wait_for_completion+0x24/0x30 + __flush_work.isra.0+0x1cc/0x360 + flush_work+0xe/0x20 + drm_mode_rmfb+0x18b/0x1d0 [drm] + drm_mode_rmfb_ioctl+0x10/0x20 [drm] + drm_ioctl_kernel+0xb8/0x150 [drm] + drm_ioctl+0x243/0x560 [drm] + __x64_sys_ioctl+0x92/0xd0 + do_syscall_64+0x59/0x90 + entry_SYSCALL_64_after_hwframe+0x72/0xdc + } hitcount: 1 + { delta: ~ 0-99, stack.stacktrace __schedule+0xa19/0x1520 + schedule+0x6b/0x110 + schedule_timeout+0x87/0x160 + __wait_for_common+0x8f/0x190 + wait_for_completion_timeout+0x1d/0x30 + drm_atomic_helper_wait_for_flip_done+0x57/0x90 [drm_kms_helper] + intel_atomic_commit_tail+0x8ce/0x1080 [i915] + intel_atomic_commit_work+0x12/0x20 [i915] + process_one_work+0x21c/0x3f0 + worker_thread+0x50/0x3e0 + kthread+0xfa/0x130 + ret_from_fork+0x29/0x50 + } hitcount: 1 + { delta: ~ 100-199, stack.stacktrace __schedule+0xa19/0x1520 + schedule+0x6b/0x110 + schedule_hrtimeout_range_clock+0x97/0x110 + schedule_hrtimeout_range+0x13/0x20 + usleep_range_state+0x65/0x90 + pci_set_low_power_state+0x17f/0x1f0 + pci_set_power_state+0x49/0x250 + pci_finish_runtime_suspend+0x4a/0x90 + pci_pm_runtime_suspend+0xcb/0x1b0 + __rpm_callback+0x48/0x120 + rpm_callback+0x67/0x70 + rpm_suspend+0x167/0x780 + rpm_idle+0x25a/0x380 + pm_runtime_work+0x93/0xc0 + process_one_work+0x21c/0x3f0 + } hitcount: 1 + + Totals: + Hits: 10 + Entries: 7 + Dropped: 0 + +2.7. Hist trigger 'handlers' and 'actions' +------------------------------------------ + +A hist trigger 'action' is a function that's executed (in most cases +conditionally) whenever a histogram entry is added or updated. + +When a histogram entry is added or updated, a hist trigger 'handler' +is what decides whether the corresponding action is actually invoked +or not. + +Hist trigger handlers and actions are paired together in the general +form: + + <handler>.<action> + +To specify a handler.action pair for a given event, simply specify +that handler.action pair between colons in the hist trigger +specification. + +In theory, any handler can be combined with any action, but in +practice, not every handler.action combination is currently supported; +if a given handler.action combination isn't supported, the hist +trigger will fail with -EINVAL; + +The default 'handler.action' if none is explicitly specified is as it +always has been, to simply update the set of values associated with an +entry. Some applications, however, may want to perform additional +actions at that point, such as generate another event, or compare and +save a maximum. + +The supported handlers and actions are listed below, and each is +described in more detail in the following paragraphs, in the context +of descriptions of some common and useful handler.action combinations. + +The available handlers are: + + - onmatch(matching.event) - invoke action on any addition or update + - onmax(var) - invoke action if var exceeds current max + - onchange(var) - invoke action if var changes + +The available actions are: + + - trace(<synthetic_event_name>,param list) - generate synthetic event + - save(field,...) - save current event fields + - snapshot() - snapshot the trace buffer + +The following commonly-used handler.action pairs are available: + + - onmatch(matching.event).trace(<synthetic_event_name>,param list) + + The 'onmatch(matching.event).trace(<synthetic_event_name>,param + list)' hist trigger action is invoked whenever an event matches + and the histogram entry would be added or updated. It causes the + named synthetic event to be generated with the values given in the 'param list'. The result is the generation of a synthetic event that consists of the values contained in those variables at the - time the invoking event was hit. - - The 'param list' consists of one or more parameters which may be - either variables or fields defined on either the 'matching.event' - or the target event. The variables or fields specified in the - param list may be either fully-qualified or unqualified. If a - variable is specified as unqualified, it must be unique between - the two events. A field name used as a param can be unqualified - if it refers to the target event, but must be fully qualified if - it refers to the matching event. A fully-qualified name is of the - form 'system.event_name.$var_name' or 'system.event_name.field'. + time the invoking event was hit. For example, if the synthetic + event name is 'wakeup_latency', a wakeup_latency event is + generated using onmatch(event).trace(wakeup_latency,arg1,arg2). + + There is also an equivalent alternative form available for + generating synthetic events. In this form, the synthetic event + name is used as if it were a function name. For example, using + the 'wakeup_latency' synthetic event name again, the + wakeup_latency event would be generated by invoking it as if it + were a function call, with the event field values passed in as + arguments: onmatch(event).wakeup_latency(arg1,arg2). The syntax + for this form is: + + onmatch(matching.event).<synthetic_event_name>(param list) + + In either case, the 'param list' consists of one or more + parameters which may be either variables or fields defined on + either the 'matching.event' or the target event. The variables or + fields specified in the param list may be either fully-qualified + or unqualified. If a variable is specified as unqualified, it + must be unique between the two events. A field name used as a + param can be unqualified if it refers to the target event, but + must be fully qualified if it refers to the matching event. A + fully-qualified name is of the form 'system.event_name.$var_name' + or 'system.event_name.field'. The 'matching.event' specification is simply the fully qualified event name of the event that matches the target event for the - onmatch() functionality, in the form 'system.event_name'. + onmatch() functionality, in the form 'system.event_name'. Histogram + keys of both events are compared to find if events match. In case + multiple histogram keys are used, they all must match in the specified + order. Finally, the number and type of variables/fields in the 'param list' must match the number and types of the fields in the @@ -1881,9 +2193,9 @@ hist trigger specification. event:: # echo 'wakeup_new_test pid_t pid' >> \ - /sys/kernel/debug/tracing/synthetic_events + /sys/kernel/tracing/synthetic_events - # cat /sys/kernel/debug/tracing/synthetic_events + # cat /sys/kernel/tracing/synthetic_events wakeup_new_test pid_t pid The following hist trigger both defines the missing testpid @@ -1894,20 +2206,26 @@ hist trigger specification. # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\ wakeup_new_test($testpid) if comm=="cyclictest"' >> \ - /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger + /sys/kernel/tracing/events/sched/sched_wakeup_new/trigger + + Or, equivalently, using the 'trace' keyword syntax:: + + # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\ + trace(wakeup_new_test,$testpid) if comm=="cyclictest"' >> \ + /sys/kernel/tracing/events/sched/sched_wakeup_new/trigger Creating and displaying a histogram based on those events is now just a matter of using the fields and new synthetic event in the tracing/events/synthetic directory, as usual:: # echo 'hist:keys=pid:sort=pid' >> \ - /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger + /sys/kernel/tracing/events/synthetic/wakeup_new_test/trigger Running 'cyclictest' should cause wakeup_new events to generate wakeup_new_test synthetic events which should result in histogram output in the wakeup_new_test event's hist file:: - # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/hist + # cat /sys/kernel/tracing/events/synthetic/wakeup_new_test/hist A more typical usage would be to use two events to calculate a latency. The following example uses a set of hist triggers to @@ -1916,36 +2234,36 @@ hist trigger specification. First, we define a 'wakeup_latency' synthetic event:: # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \ - /sys/kernel/debug/tracing/synthetic_events + /sys/kernel/tracing/synthetic_events Next, we specify that whenever we see a sched_waking event for a cyclictest thread, save the timestamp in a 'ts0' variable:: # echo 'hist:keys=$saved_pid:saved_pid=pid:ts0=common_timestamp.usecs \ if comm=="cyclictest"' >> \ - /sys/kernel/debug/tracing/events/sched/sched_waking/trigger + /sys/kernel/tracing/events/sched/sched_waking/trigger Then, when the corresponding thread is actually scheduled onto the - CPU by a sched_switch event, calculate the latency and use that - along with another variable and an event field to generate a - wakeup_latency synthetic event:: + CPU by a sched_switch event (saved_pid matches next_pid), calculate + the latency and use that along with another variable and an event field + to generate a wakeup_latency synthetic event:: # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:\ onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,\ $saved_pid,next_prio) if next_comm=="cyclictest"' >> \ - /sys/kernel/debug/tracing/events/sched/sched_switch/trigger + /sys/kernel/tracing/events/sched/sched_switch/trigger We also need to create a histogram on the wakeup_latency synthetic event in order to aggregate the generated synthetic event data:: # echo 'hist:keys=pid,prio,lat:sort=pid,lat' >> \ - /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger + /sys/kernel/tracing/events/synthetic/wakeup_latency/trigger Finally, once we've run cyclictest to actually generate some events, we can see the output by looking at the wakeup_latency synthetic event's hist file:: - # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist + # cat /sys/kernel/tracing/events/synthetic/wakeup_latency/hist - onmax(var).save(field,.. .) @@ -1971,19 +2289,19 @@ hist trigger specification. # echo 'hist:keys=pid:ts0=common_timestamp.usecs \ if comm=="cyclictest"' >> \ - /sys/kernel/debug/tracing/events/sched/sched_waking/trigger + /sys/kernel/tracing/events/sched/sched_waking/trigger # echo 'hist:keys=next_pid:\ wakeup_lat=common_timestamp.usecs-$ts0:\ onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \ if next_comm=="cyclictest"' >> \ - /sys/kernel/debug/tracing/events/sched/sched_switch/trigger + /sys/kernel/tracing/events/sched/sched_switch/trigger When the histogram is displayed, the max value and the saved values corresponding to the max are displayed following the rest of the fields:: - # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist + # cat /sys/kernel/tracing/events/sched/sched_switch/hist { next_pid: 2255 } hitcount: 239 common_timestamp-ts0: 0 max: 27 @@ -2000,8 +2318,216 @@ hist trigger specification. Entries: 2 Dropped: 0 -3. User space creating a trigger --------------------------------- + - onmax(var).snapshot() + + The 'onmax(var).snapshot()' hist trigger action is invoked + whenever the value of 'var' associated with a histogram entry + exceeds the current maximum contained in that variable. + + The end result is that a global snapshot of the trace buffer will + be saved in the tracing/snapshot file if 'var' exceeds the current + maximum for any hist trigger entry. + + Note that in this case the maximum is a global maximum for the + current trace instance, which is the maximum across all buckets of + the histogram. The key of the specific trace event that caused + the global maximum and the global maximum itself are displayed, + along with a message stating that a snapshot has been taken and + where to find it. The user can use the key information displayed + to locate the corresponding bucket in the histogram for even more + detail. + + As an example the below defines a couple of hist triggers, one for + sched_waking and another for sched_switch, keyed on pid. Whenever + a sched_waking event occurs, the timestamp is saved in the entry + corresponding to the current pid, and when the scheduler switches + back to that pid, the timestamp difference is calculated. If the + resulting latency, stored in wakeup_lat, exceeds the current + maximum latency, a snapshot is taken. As part of the setup, all + the scheduler events are also enabled, which are the events that + will show up in the snapshot when it is taken at some point:: + + # echo 1 > /sys/kernel/tracing/events/sched/enable + + # echo 'hist:keys=pid:ts0=common_timestamp.usecs \ + if comm=="cyclictest"' >> \ + /sys/kernel/tracing/events/sched/sched_waking/trigger + + # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \ + onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio, \ + prev_comm):onmax($wakeup_lat).snapshot() \ + if next_comm=="cyclictest"' >> \ + /sys/kernel/tracing/events/sched/sched_switch/trigger + + When the histogram is displayed, for each bucket the max value + and the saved values corresponding to the max are displayed + following the rest of the fields. + + If a snapshot was taken, there is also a message indicating that, + along with the value and event that triggered the global maximum:: + + # cat /sys/kernel/tracing/events/sched/sched_switch/hist + { next_pid: 2101 } hitcount: 200 + max: 52 next_prio: 120 next_comm: cyclictest \ + prev_pid: 0 prev_prio: 120 prev_comm: swapper/6 + + { next_pid: 2103 } hitcount: 1326 + max: 572 next_prio: 19 next_comm: cyclictest \ + prev_pid: 0 prev_prio: 120 prev_comm: swapper/1 + + { next_pid: 2102 } hitcount: 1982 \ + max: 74 next_prio: 19 next_comm: cyclictest \ + prev_pid: 0 prev_prio: 120 prev_comm: swapper/5 + + Snapshot taken (see tracing/snapshot). Details: + triggering value { onmax($wakeup_lat) }: 572 \ + triggered by event with key: { next_pid: 2103 } + + Totals: + Hits: 3508 + Entries: 3 + Dropped: 0 + + In the above case, the event that triggered the global maximum has + the key with next_pid == 2103. If you look at the bucket that has + 2103 as the key, you'll find the additional values save()'d along + with the local maximum for that bucket, which should be the same + as the global maximum (since that was the same value that + triggered the global snapshot). + + And finally, looking at the snapshot data should show at or near + the end the event that triggered the snapshot (in this case you + can verify the timestamps between the sched_waking and + sched_switch events, which should match the time displayed in the + global maximum):: + + # cat /sys/kernel/tracing/snapshot + + <...>-2103 [005] d..3 309.873125: sched_switch: prev_comm=cyclictest prev_pid=2103 prev_prio=19 prev_state=D ==> next_comm=swapper/5 next_pid=0 next_prio=120 + <idle>-0 [005] d.h3 309.873611: sched_waking: comm=cyclictest pid=2102 prio=19 target_cpu=005 + <idle>-0 [005] dNh4 309.873613: sched_wakeup: comm=cyclictest pid=2102 prio=19 target_cpu=005 + <idle>-0 [005] d..3 309.873616: sched_switch: prev_comm=swapper/5 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=cyclictest next_pid=2102 next_prio=19 + <...>-2102 [005] d..3 309.873625: sched_switch: prev_comm=cyclictest prev_pid=2102 prev_prio=19 prev_state=D ==> next_comm=swapper/5 next_pid=0 next_prio=120 + <idle>-0 [005] d.h3 309.874624: sched_waking: comm=cyclictest pid=2102 prio=19 target_cpu=005 + <idle>-0 [005] dNh4 309.874626: sched_wakeup: comm=cyclictest pid=2102 prio=19 target_cpu=005 + <idle>-0 [005] dNh3 309.874628: sched_waking: comm=cyclictest pid=2103 prio=19 target_cpu=005 + <idle>-0 [005] dNh4 309.874630: sched_wakeup: comm=cyclictest pid=2103 prio=19 target_cpu=005 + <idle>-0 [005] d..3 309.874633: sched_switch: prev_comm=swapper/5 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=cyclictest next_pid=2102 next_prio=19 + <idle>-0 [004] d.h3 309.874757: sched_waking: comm=gnome-terminal- pid=1699 prio=120 target_cpu=004 + <idle>-0 [004] dNh4 309.874762: sched_wakeup: comm=gnome-terminal- pid=1699 prio=120 target_cpu=004 + <idle>-0 [004] d..3 309.874766: sched_switch: prev_comm=swapper/4 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=gnome-terminal- next_pid=1699 next_prio=120 + gnome-terminal--1699 [004] d.h2 309.874941: sched_stat_runtime: comm=gnome-terminal- pid=1699 runtime=180706 [ns] vruntime=1126870572 [ns] + <idle>-0 [003] d.s4 309.874956: sched_waking: comm=rcu_sched pid=9 prio=120 target_cpu=007 + <idle>-0 [003] d.s5 309.874960: sched_wake_idle_without_ipi: cpu=7 + <idle>-0 [003] d.s5 309.874961: sched_wakeup: comm=rcu_sched pid=9 prio=120 target_cpu=007 + <idle>-0 [007] d..3 309.874963: sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=rcu_sched next_pid=9 next_prio=120 + rcu_sched-9 [007] d..3 309.874973: sched_stat_runtime: comm=rcu_sched pid=9 runtime=13646 [ns] vruntime=22531430286 [ns] + rcu_sched-9 [007] d..3 309.874978: sched_switch: prev_comm=rcu_sched prev_pid=9 prev_prio=120 prev_state=R+ ==> next_comm=swapper/7 next_pid=0 next_prio=120 + <...>-2102 [005] d..4 309.874994: sched_migrate_task: comm=cyclictest pid=2103 prio=19 orig_cpu=5 dest_cpu=1 + <...>-2102 [005] d..4 309.875185: sched_wake_idle_without_ipi: cpu=1 + <idle>-0 [001] d..3 309.875200: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=cyclictest next_pid=2103 next_prio=19 + + - onchange(var).save(field,.. .) + + The 'onchange(var).save(field,...)' hist trigger action is invoked + whenever the value of 'var' associated with a histogram entry + changes. + + The end result is that the trace event fields specified as the + onchange.save() params will be saved if 'var' changes for that + hist trigger entry. This allows context from the event that + changed the value to be saved for later reference. When the + histogram is displayed, additional fields displaying the saved + values will be printed. + + - onchange(var).snapshot() + + The 'onchange(var).snapshot()' hist trigger action is invoked + whenever the value of 'var' associated with a histogram entry + changes. + + The end result is that a global snapshot of the trace buffer will + be saved in the tracing/snapshot file if 'var' changes for any + hist trigger entry. + + Note that in this case the changed value is a global variable + associated with current trace instance. The key of the specific + trace event that caused the value to change and the global value + itself are displayed, along with a message stating that a snapshot + has been taken and where to find it. The user can use the key + information displayed to locate the corresponding bucket in the + histogram for even more detail. + + As an example the below defines a hist trigger on the tcp_probe + event, keyed on dport. Whenever a tcp_probe event occurs, the + cwnd field is checked against the current value stored in the + $cwnd variable. If the value has changed, a snapshot is taken. + As part of the setup, all the scheduler and tcp events are also + enabled, which are the events that will show up in the snapshot + when it is taken at some point:: + + # echo 1 > /sys/kernel/tracing/events/sched/enable + # echo 1 > /sys/kernel/tracing/events/tcp/enable + + # echo 'hist:keys=dport:cwnd=snd_cwnd: \ + onchange($cwnd).save(snd_wnd,srtt,rcv_wnd): \ + onchange($cwnd).snapshot()' >> \ + /sys/kernel/tracing/events/tcp/tcp_probe/trigger + + When the histogram is displayed, for each bucket the tracked value + and the saved values corresponding to that value are displayed + following the rest of the fields. + + If a snapshot was taken, there is also a message indicating that, + along with the value and event that triggered the snapshot:: + + # cat /sys/kernel/tracing/events/tcp/tcp_probe/hist + + { dport: 1521 } hitcount: 8 + changed: 10 snd_wnd: 35456 srtt: 154262 rcv_wnd: 42112 + + { dport: 80 } hitcount: 23 + changed: 10 snd_wnd: 28960 srtt: 19604 rcv_wnd: 29312 + + { dport: 9001 } hitcount: 172 + changed: 10 snd_wnd: 48384 srtt: 260444 rcv_wnd: 55168 + + { dport: 443 } hitcount: 211 + changed: 10 snd_wnd: 26960 srtt: 17379 rcv_wnd: 28800 + + Snapshot taken (see tracing/snapshot). Details: + + triggering value { onchange($cwnd) }: 10 + triggered by event with key: { dport: 80 } + + Totals: + Hits: 414 + Entries: 4 + Dropped: 0 + + In the above case, the event that triggered the snapshot has the + key with dport == 80. If you look at the bucket that has 80 as + the key, you'll find the additional values save()'d along with the + changed value for that bucket, which should be the same as the + global changed value (since that was the same value that triggered + the global snapshot). + + And finally, looking at the snapshot data should show at or near + the end the event that triggered the snapshot:: + + # cat /sys/kernel/tracing/snapshot + + gnome-shell-1261 [006] dN.3 49.823113: sched_stat_runtime: comm=gnome-shell pid=1261 runtime=49347 [ns] vruntime=1835730389 [ns] + kworker/u16:4-773 [003] d..3 49.823114: sched_switch: prev_comm=kworker/u16:4 prev_pid=773 prev_prio=120 prev_state=R+ ==> next_comm=kworker/3:2 next_pid=135 next_prio=120 + gnome-shell-1261 [006] d..3 49.823114: sched_switch: prev_comm=gnome-shell prev_pid=1261 prev_prio=120 prev_state=R+ ==> next_comm=kworker/6:2 next_pid=387 next_prio=120 + kworker/3:2-135 [003] d..3 49.823118: sched_stat_runtime: comm=kworker/3:2 pid=135 runtime=5339 [ns] vruntime=17815800388 [ns] + kworker/6:2-387 [006] d..3 49.823120: sched_stat_runtime: comm=kworker/6:2 pid=387 runtime=9594 [ns] vruntime=14589605367 [ns] + kworker/6:2-387 [006] d..3 49.823122: sched_switch: prev_comm=kworker/6:2 prev_pid=387 prev_prio=120 prev_state=R+ ==> next_comm=gnome-shell next_pid=1261 next_prio=120 + kworker/3:2-135 [003] d..3 49.823123: sched_switch: prev_comm=kworker/3:2 prev_pid=135 prev_prio=120 prev_state=T ==> next_comm=swapper/3 next_pid=0 next_prio=120 + <idle>-0 [004] ..s7 49.823798: tcp_probe: src=10.0.0.10:54326 dest=23.215.104.193:80 mark=0x0 length=32 snd_nxt=0xe3ae2ff5 snd_una=0xe3ae2ecd snd_cwnd=10 ssthresh=2147483647 snd_wnd=28960 srtt=19604 rcv_wnd=29312 + +2.8. User space creating a trigger +---------------------------------- Writing into /sys/kernel/tracing/trace_marker writes into the ftrace ring buffer. This can also act like an event, by writing into the trigger |
