Diffstat (limited to 'Documentation/trace')
-rw-r--r--Documentation/trace/ftrace.rst132
-rw-r--r--Documentation/trace/timerlat-tracer.rst78
-rw-r--r--Documentation/trace/user_events.rst7
3 files changed, 213 insertions, 4 deletions
diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index 027437b745a0..f606c5bd1c0d 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -324,6 +324,12 @@ of ftrace. Here is a list of some of the key files:
"set_graph_function", or "set_graph_notrace".
(See the section "dynamic ftrace" below for more details.)
+ available_filter_functions_addrs:
+
+ Similar to available_filter_functions, but with the address displayed
+ for each function. The displayed address is the patch-site address,
+ which can differ from the address shown in /proc/kallsyms.
+
dyn_ftrace_total_info:
This file is for debugging purposes. The number of functions that
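A minimal sketch of parsing one line of available_filter_functions_addrs from C. It assumes each line holds a hexadecimal patch-site address followed by the function name and, for module code, the module name in square brackets. The struct filter_addr type and the parse_filter_addr_line() helper are hypothetical names used only for illustration. They are not kernel or libtracefs API.

```c
#include <stdio.h>

/*
 * Hypothetical record for one line of available_filter_functions_addrs.
 * Assumed line format: "<hex address> <function> [<module>]", where the
 * bracketed module name is only present for module code.
 */
struct filter_addr {
	unsigned long addr;  /* patch-site address; may differ from kallsyms */
	char func[128];      /* function name */
	char module[64];     /* module name, empty for built-in code */
};

/* Returns 1 if the line was parsed, 0 otherwise. */
static int parse_filter_addr_line(const char *line, struct filter_addr *out)
{
	int n;

	out->module[0] = '\0';
	/* Field widths keep sscanf() within the buffers declared above. */
	n = sscanf(line, "%lx %127s [%63[^]]]", &out->addr, out->func,
		   out->module);
	return n >= 2;
}
```

A caller would typically read the file line by line with fgets() and pass each line to the helper. The parsed address is the one to compare against /proc/kallsyms when investigating a mismatch.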
@@ -1359,6 +1365,19 @@ Options for function_graph tracer:
only a closing curly bracket "}" is displayed for
the return of a function.
+ funcgraph-retval
+ When set, the return value of each traced function
+ will be printed after an equal sign "=". By default
+ this is off.
+
+ funcgraph-retval-hex
+ When set, the return value will always be printed
+ in hexadecimal format. If the option is not set and
+ the return value is an error code, it will be printed
+ in signed decimal format; otherwise it will also be
+ printed in hexadecimal format. By default, this option
+ is off.
+
sleep-time
When running function graph tracer, to include
the time a task schedules out in its function.
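The formatting rule described for funcgraph-retval-hex can be approximated in user space as follows. This is a sketch and not the kernel's implementation. The error-code range [-4095, -1] mirrors the kernel's MAX_ERRNO convention. Sign-extending a value whose upper 32 bits are clear and whose bit 31 is set is an assumption inferred from the cgroup_migrate example output, where 0xffffffea in hexadecimal mode appears as -22 in smart mode. The format_retval() helper and the SKETCH_MAX_ERRNO macro are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Error codes occupy [-4095, -1], mirroring the kernel's MAX_ERRNO. */
#define SKETCH_MAX_ERRNO 4095

/*
 * Hypothetical helper: render a raw 64-bit return value either in plain
 * hexadecimal (hex_format != 0, like funcgraph-retval-hex) or in the
 * "smart" style: signed decimal for error codes, hexadecimal otherwise.
 */
static void format_retval(uint64_t raw, int hex_format, char *buf, size_t len)
{
	int64_t val = (int64_t)raw;

	/*
	 * Assumption: a value with the upper 32 bits clear and bit 31 set is
	 * treated as a sign-extended 32-bit int, so 0xffffffea reads as -22.
	 */
	if ((raw >> 32) == 0 && (raw & UINT64_C(0x80000000)))
		val = (int32_t)(uint32_t)raw;

	if (!hex_format && val < 0 && val >= -SKETCH_MAX_ERRNO)
		snprintf(buf, len, "%lld", (long long)val);
	else
		snprintf(buf, len, "0x%llx", (unsigned long long)raw);
}
```

For instance, format_retval(0xffffffea, 0, ...) produces "-22" and format_retval(0xffffffea, 1, ...) produces "0xffffffea", which matches the two cgroup_migrate examples.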
@@ -2704,6 +2723,119 @@ It is default disabled.
0) 1.757 us | } /* kmem_cache_free() */
0) 2.861 us | } /* putname() */
+The return value of each traced function can be displayed after
+an equal sign "=". When a system call fails, this can be very
+helpful for quickly locating the function that first returned
+an error code.
+
+ - hide: echo nofuncgraph-retval > trace_options
+ - show: echo funcgraph-retval > trace_options
+
+ Example with funcgraph-retval::
+
+ 1) | cgroup_migrate() {
+ 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
+ 1) | cgroup_migrate_execute() {
+ 1) | cpu_cgroup_can_attach() {
+ 1) | cgroup_taskset_first() {
+ 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
+ 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
+ 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
+ 1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
+ 1) 4.369 us | } /* cgroup_migrate_execute = -22 */
+ 1) 7.143 us | } /* cgroup_migrate = -22 */
+
+The above example shows that cpu_cgroup_can_attach was the first
+function to return the error code -22, so its code is the place to
+start looking for the root cause.
+
+When the option funcgraph-retval-hex is not set, the return value is
+displayed in a smart way: if it is an error code, it is printed in
+signed decimal format; otherwise, it is printed in hexadecimal format.
+
+ - smart: echo nofuncgraph-retval-hex > trace_options
+ - hexadecimal: echo funcgraph-retval-hex > trace_options
+
+ Example with funcgraph-retval-hex::
+
+ 1) | cgroup_migrate() {
+ 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
+ 1) | cgroup_migrate_execute() {
+ 1) | cpu_cgroup_can_attach() {
+ 1) | cgroup_taskset_first() {
+ 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
+ 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
+ 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
+ 1) 2.335 us | } /* cpu_cgroup_can_attach = 0xffffffea */
+ 1) 4.369 us | } /* cgroup_migrate_execute = 0xffffffea */
+ 1) 7.143 us | } /* cgroup_migrate = 0xffffffea */
+
+At present, there are some limitations when using the funcgraph-retval
+option, and these limitations will be eliminated in the future:
+
+- Even if the function return type is void, a return value will still
+ be printed, and you can just ignore it.
+
+- Even if return values are stored in multiple registers, only the
+ value contained in the first register will be recorded and printed.
+ For example, on 32-bit x86, eax and edx are used to return a
+ 64-bit value, with the lower 32 bits in eax and the upper 32 bits
+ in edx. However, only the value stored in eax will be recorded and
+ printed.
+
+- In certain procedure call standards, such as arm64's AAPCS64, when a
+ type is smaller than a GPR, it is the responsibility of the consumer
+ to perform the narrowing, and the upper bits may contain UNKNOWN values.
+ Therefore, it is advisable to check the code for such cases. For instance,
+ when using a u8 in a 64-bit GPR, bits [63:8] may contain arbitrary values,
+ especially when larger types are truncated, whether explicitly or implicitly.
+ Here are some specific cases to illustrate this point:
+
+ **Case One**:
+
+ The function narrow_to_u8 is defined as follows::
+
+ u8 narrow_to_u8(u64 val)
+ {
+ // implicitly truncated
+ return val;
+ }
+
+ It may be compiled to::
+
+ narrow_to_u8:
+ < ... ftrace instrumentation ... >
+ RET
+
+ If you pass 0x123456789abcdef to this function and want to narrow it,
+ it may be recorded as 0x123456789abcdef instead of 0xef.
+
+ **Case Two**:
+
+ The function error_if_not_4g_aligned is defined as follows::
+
+ int error_if_not_4g_aligned(u64 val)
+ {
+ if (val & GENMASK(31, 0))
+ return -EINVAL;
+
+ return 0;
+ }
+
+ It could be compiled to::
+
+ error_if_not_4g_aligned:
+ CBNZ w0, .Lnot_aligned
+ RET // bits [31:0] are zero, bits
+ // [63:32] are UNKNOWN
+ .Lnot_aligned:
+ MOV x0, #-EINVAL
+ RET
+
+ When passing 0x2_0000_0000 to it, the return value may be recorded as
+ 0x2_0000_0000 instead of 0.
+
You can put some comments on specific functions by using
trace_printk() For example, if you want to put a comment inside
the __might_sleep() function, you just have to include
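The recorded value is the raw content of the first return register. A reader of a funcgraph-retval trace may therefore have to narrow it to the function's declared return type, as the caller would. The sketch below shows that narrowing. It also shows how an edx:eax pair on 32-bit x86 combines into a 64-bit value, of which only eax is recorded. The narrow_retval() and combine_edx_eax() helpers are hypothetical and are given only to illustrate the limitations listed for funcgraph-retval.

```c
#include <stdint.h>

/*
 * Hypothetical helper: keep only the low 'bits' bits of a recorded
 * register value, i.e. the narrowing the consumer performs for a return
 * type smaller than the register (e.g. 8 for u8, 32 for int).
 */
static uint64_t narrow_retval(uint64_t raw, unsigned int bits)
{
	if (bits >= 64)
		return raw;
	return raw & ((UINT64_C(1) << bits) - 1);
}

/*
 * Hypothetical helper: rebuild a 64-bit value returned in edx:eax on
 * 32-bit x86. funcgraph-retval records only the eax half.
 */
static uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
	return ((uint64_t)edx << 32) | eax;
}
```

With these helpers, narrow_retval(0x0123456789abcdef, 8) yields 0xef for Case One, and narrow_retval(0x200000000, 32) yields 0 for Case Two. These are the values the callers actually observe.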
diff --git a/Documentation/trace/timerlat-tracer.rst b/Documentation/trace/timerlat-tracer.rst
index db17df312bc8..53a56823e903 100644
--- a/Documentation/trace/timerlat-tracer.rst
+++ b/Documentation/trace/timerlat-tracer.rst
@@ -180,3 +180,81 @@ dummy_load_1ms_pd_init, which had the following code (on purpose)::
return 0;
}
+
+User-space interface
+---------------------------
+
+Timerlat allows user-space threads to use the timerlat infrastructure to
+measure scheduling latency. This interface is accessible via a per-CPU
+file descriptor inside $tracing_dir/osnoise/per_cpu/cpu$ID/timerlat_fd.
+
+This interface is accessible under the following conditions:
+
+ - The timerlat tracer is enabled
+ - osnoise workload option is set to NO_OSNOISE_WORKLOAD
+ - The user-space thread is affined to a single processor
+ - The thread opens the file associated with its single processor
+ - Only one thread can access the file at a time
+
+The open() syscall will fail if any of these conditions are not met.
+After opening the file descriptor, the user space can read from it.
+
+The read() system call will run timerlat code that arms the timer
+in the future and waits for it, as the regular kernel thread does.
+
+When the timer IRQ fires, the timerlat IRQ handler executes, reports the
+IRQ latency, and wakes up the thread waiting in read(). The thread is then
+scheduled and reports the thread latency via the tracer, as the kernel
+thread does.
+
+The difference from the in-kernel timerlat is that, instead of re-arming
+the timer, timerlat returns from the read() system call to user space.
+At this point, the user can run any code.
+
+If the application reads the timerlat file descriptor again, the tracer
+reports the return-from-user-space latency, which is the total latency.
+If this is the end of the work, it can be interpreted as the response
+time for the request.
+
+After reporting the total latency, timerlat will restart the cycle, arm
+a timer, and go to sleep for the following activation.
+
+If at any time one of the conditions is broken, e.g., the thread migrates
+while in user space, or the timerlat tracer is disabled, the SIGKILL
+signal will be sent to the user-space thread.
+
+Here is a basic example of user-space code for timerlat::
+
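+ #define _GNU_SOURCE /* cpu_set_t, sched_setaffinity(), gettid() (glibc >= 2.30) */
+ #include <errno.h>
+ #include <fcntl.h>
+ #include <sched.h>
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <string.h>
+ #include <unistd.h>
+
+ int main(void)
+ {
+ char buffer[1024];
+ int timerlat_fd;
+ int retval;
+ long cpu = 0; /* place in CPU 0 */
+ cpu_set_t set;
+
+ CPU_ZERO(&set);
+ CPU_SET(cpu, &set);
+
+ /* The thread must be affined to a single CPU. */
+ if (sched_setaffinity(gettid(), sizeof(set), &set) == -1)
+ return 1;
+
+ snprintf(buffer, sizeof(buffer),
+ "/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd",
+ cpu);
+
+ /* Open the file that belongs to the CPU the thread is affined to. */
+ timerlat_fd = open(buffer, O_RDONLY);
+ if (timerlat_fd < 0) {
+ printf("error opening %s: %s\n", buffer, strerror(errno));
+ exit(1);
+ }
+
+ for (;;) {
+ /* read() arms the timer and blocks until the next activation. */
+ retval = read(timerlat_fd, buffer, sizeof(buffer));
+ if (retval < 0)
+ break;
+ /* The work for this activation would run here. */
+ }
+
+ close(timerlat_fd);
+ exit(0);
+ }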
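The first two conditions for opening timerlat_fd must hold before the program runs. The following is a minimal sketch of setting them up from C. It assumes tracefs is mounted at the directory passed in (typically /sys/kernel/tracing) and that the caller has the privileges to write to it. It also assumes the osnoise/options file accepts the NO_OSNOISE_WORKLOAD string named in the condition list. The write_control_file() and setup_timerlat_user() helpers are hypothetical. Writing the option before selecting the tracer is a choice made here, not a documented requirement.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write one string to a tracefs control file.
 * Returns 0 on success, -1 on failure.
 */
static int write_control_file(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the stream, so a failed write is reported here. */
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: set NO_OSNOISE_WORKLOAD and select the timerlat
 * tracer, assuming tracefs is mounted at tracing_dir and is writable.
 */
static int setup_timerlat_user(const char *tracing_dir)
{
	char path[256];
	int n;

	/* Clear the OSNOISE_WORKLOAD option via the "NO_" prefix. */
	n = snprintf(path, sizeof(path), "%s/osnoise/options", tracing_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	if (write_control_file(path, "NO_OSNOISE_WORKLOAD") < 0)
		return -1;

	/* Select timerlat as the current tracer. */
	n = snprintf(path, sizeof(path), "%s/current_tracer", tracing_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return write_control_file(path, "timerlat");
}
```

A launcher could call setup_timerlat_user("/sys/kernel/tracing") once and then start the per-CPU user-space threads. Each thread opens its own timerlat_fd.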
diff --git a/Documentation/trace/user_events.rst b/Documentation/trace/user_events.rst
index f79987e16cf4..e7b07313550a 100644
--- a/Documentation/trace/user_events.rst
+++ b/Documentation/trace/user_events.rst
@@ -14,10 +14,6 @@ Programs can view status of the events via
/sys/kernel/tracing/user_events_status and can both register and write
data out via /sys/kernel/tracing/user_events_data.
-Programs can also use /sys/kernel/tracing/dynamic_events to register and
-delete user based events via the u: prefix. The format of the command to
-dynamic_events is the same as the ioctl with the u: prefix applied.
-
Typically programs will register a set of events that they wish to expose to
tools that can read trace_events (such as ftrace and perf). The registration
process tells the kernel which address and bit to reflect if any tool has
@@ -144,6 +140,9 @@ its name. Delete will only succeed if there are no references left to the
event (in both user and kernel space). User programs should use a separate file
to request deletes than the one used for registration due to this.
+**NOTE:** By default, events will auto-delete when there are no references
+left to the event. Future flags may change this behavior.
+
Unregistering
-------------
If after registering an event it is no longer wanted to be updated then it can